nlp@21 class project task list:

Dataset(Arxiv6K):

About: 6000+ arXiv papers from the AI category, from 2020. The dataset contains LaTeX source files and images, which makes it a good research dataset for multimodal learning.

Task1(ResearchKG):

About: Build a fine-grained knowledge graph from the research papers in Arxiv6K. Consider answering the following questions:

  • Which sentence is most similar to a given sentence?
  • What concepts can be extracted from the corpus?
  • Which concept is relevant to a given phrase/concept and in what manner?
  • Which concepts are relevant to a given research problem?
  • Which concepts are clustered together in one paragraph/section/paper?
  • ... other important questions...
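
As a starting point for the first question, here is a minimal sketch of sentence similarity using bag-of-words cosine similarity. It is only a baseline under simple assumptions (lowercased word tokens, raw term counts), not a prescribed method for the task; the function names `tokenize`, `cosine`, and `most_similar` are hypothetical, and a real solution would likely use learned sentence embeddings instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, sentences):
    """Return (score, sentence) for the candidate closest to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in sentences]
    return max(scored, key=lambda pair: pair[0])


# Illustrative sentences only; they are not taken from the dataset.
corpus = [
    "Graph neural networks learn node embeddings.",
    "We train a transformer for machine translation.",
    "Attention improves translation quality.",
]
score, best = most_similar("transformer models for translation", corpus)
print(best)
```

The same scoring function can be reused for the other questions by swapping sentences for extracted concepts or paragraphs.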

Task2(MultiModalSys):

About: Build a multimodal retrieval or recommendation system supporting text, images, formulas, and tables. Consider answering the following questions:

  • Which image is most relevant to a given sentence/query?
  • Which sentence/paragraph is most relevant to a given image?
  • Which formulas are relevant to a given sentence/query?
  • Which tables are relevant to a given sentence/query?
  • What concepts are relevant to a given formula?
  • ... other important questions ...
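
Because the dataset ships LaTeX source, one possible first step is pulling formulas and image references out of each `.tex` file so they can be indexed next to the surrounding text. The sketch below assumes formulas appear in `equation` environments and images are included via `\includegraphics`; real papers use many other environments and macros, so treat the patterns and the hypothetical `extract_items` helper as illustrative only.

```python
import re

# Assumption: formulas are written in equation / equation* environments.
EQUATION_RE = re.compile(
    r"\\begin\{equation\*?\}(.*?)\\end\{equation\*?\}", re.DOTALL
)
# Assumption: images are included with \includegraphics[options]{path}.
GRAPHICS_RE = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")


def extract_items(latex):
    """Collect formula bodies and image paths from a LaTeX source string."""
    formulas = [m.strip() for m in EQUATION_RE.findall(latex)]
    images = GRAPHICS_RE.findall(latex)
    return {"formulas": formulas, "images": images}


# Made-up LaTeX fragment for illustration; not taken from the dataset.
sample = r"""
We define the energy as
\begin{equation}
E = mc^2
\end{equation}
\includegraphics[width=0.5\linewidth]{figs/model.png}
"""
items = extract_items(sample)
print(items)
```

Each extracted item can then be paired with its neighboring sentences to form the (query, formula) or (query, image) candidates that the questions above ask about.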

Task3(AIHelper):

About: Build an AI helper system for computer science.

Task4(DIY):

About: Build your own dataset and develop some interesting models with it.

Any suggestions are welcome. Current tasks may be updated and new tasks may be added in the future.