Script that classifies content from scientific papers with the topics of the Computer Science Ontology (CSO).
angelosalatino Added new functionalities to the classifier
* extract the network of topics and then plot (using either networkx or webweb)
* extract the coverage of each topic based on how many of its descendants have been identified
Latest commit 6566dae Feb 14, 2019
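The commit note above mentions computing the coverage of each topic "based on how many of its descendants have been identified", but it does not give the formula. The sketch below assumes coverage is the fraction of a topic's descendants (all topics below it in the hierarchy) that appear among the identified topics. The `CHILDREN` hierarchy and the function names are hypothetical illustrations, not CSO data or the repository's API.

```python
from collections import deque

# Hypothetical toy hierarchy (parent -> direct children); not taken from CSO.
CHILDREN = {
    "machine learning": ["neural networks", "support vector machines", "clustering"],
    "neural networks": ["convolutional neural networks", "recurrent neural networks"],
}


def descendants(topic, children):
    """Return every topic below `topic` (breadth-first), excluding `topic` itself."""
    seen = set()
    queue = deque(children.get(topic, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(children.get(current, []))
    return seen


def coverage(topic, identified, children):
    """Assumed definition: share of `topic`'s descendants found in `identified`."""
    desc = descendants(topic, children)
    if not desc:
        # Leaf topics have no descendants; this sketch reports 0.0 for them.
        return 0.0
    return len(desc & set(identified)) / len(desc)


identified = {"neural networks", "convolutional neural networks"}
print(coverage("machine learning", identified, CHILDREN))
print(coverage("neural networks", identified, CHILDREN))
```

Here "machine learning" has five descendants, two of which were identified, giving 0.4; "neural networks" has two descendants, one identified, giving 0.5. Returning 0.0 for leaves is a design choice of this sketch; a real implementation might instead skip leaf topics.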

CSO-Classifier

Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this repository, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles, yielding a significant improvement over alternative methods.

About

The CSO Classifier is a novel application that takes as input the title, abstract, and keywords of a research paper and outputs a list of relevant concepts from CSO. It consists of two main components: (i) the syntactic module and (ii) the semantic module. Figure 1 depicts its architecture. The syntactic module parses the input documents and identifies CSO concepts that are explicitly referred to in the document. The semantic module uses part-of-speech tagging to identify promising terms and then exploits word embeddings to infer semantically related topics. Finally, the CSO Classifier combines the results of these two modules and enhances them by including relevant super-areas.
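The pipeline described above can be illustrated with a minimal, self-contained sketch. It is not the repository's implementation: the topic set, parent links, and 3-dimensional "embeddings" below are hypothetical toy data, the part-of-speech tagging step is omitted (every token is treated as a candidate term), and the semantic module is assumed to keep a topic when the cosine similarity between a term's vector and the topic's vector reaches a fixed threshold.

```python
import math
import re

# Hypothetical toy data; not the real CSO or the classifier's trained embeddings.
TOPICS = {"neural networks", "machine learning", "artificial intelligence",
          "ontology", "semantic web"}
PARENTS = {  # topic -> super-areas
    "neural networks": ["machine learning"],
    "machine learning": ["artificial intelligence"],
    "ontology": ["semantic web"],
}
TOPIC_VECTORS = {"ontology": [1.0, 0.0, 0.0], "neural networks": [0.0, 1.0, 0.0]}
EMBEDDINGS = {"ontologies": [0.95, 0.05, 0.0], "graphs": [0.2, 0.2, 0.95]}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def syntactic_module(text, topics, max_n=3):
    """Keep topics whose label appears verbatim as an n-gram of the text."""
    tokens = tokenize(text)
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in topics:
                found.add(gram)
    return found


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_module(text, topic_vectors, embeddings, threshold=0.9):
    """Infer topics whose vector is close to the vector of some input term."""
    found = set()
    for token in set(tokenize(text)):
        vec = embeddings.get(token)
        if vec is None:
            continue
        for topic, topic_vec in topic_vectors.items():
            if cosine(vec, topic_vec) >= threshold:
                found.add(topic)
    return found


def enhance(topics, parents):
    """Add all super-areas (transitively) of the detected topics."""
    result = set(topics)
    stack = list(topics)
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def classify(text):
    found = syntactic_module(text, TOPICS) | semantic_module(text, TOPIC_VECTORS, EMBEDDINGS)
    return enhance(found, PARENTS)


text = "We train neural networks on knowledge graphs built from ontologies."
print(sorted(classify(text)))
```

In this toy run, the syntactic module finds "neural networks" as an exact bigram, the semantic module maps "ontologies" to "ontology" (cosine about 0.999), and the enhancement step adds "machine learning", "artificial intelligence", and "semantic web" as super-areas.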

Figure 1: Framework of the CSO Classifier.

Repository Structure

  • In the v1 folder you can find the first version of the classifier, published as a poster paper at ISWC 2018. This classifier finds all topics in the ontology that are explicitly mentioned within the processed papers.
  • In the v2 folder you can find the second version, submitted to JCDL 2019 (pre-print available). This classifier instead analyzes papers on both a syntactic and a semantic level and returns a set of pertinent research topics drawn from CSO.

Main Requirements

  1. Ensure you have Python 3 installed.
  2. Each folder has its own requirements.txt file listing all the necessary dependencies. Install them by running the following command: pip install -r requirements.txt

Other Links and Relevant Papers

How to cite this work

If you use the CSO Classifier in your research or work and would like to cite it, we suggest you cite:

  • Salatino, Angelo; Thanapalasingam, Thiviyan; Mannocci, Andrea; Osborne, Francesco and Motta, Enrico (2018). Classifying Research Papers with the Computer Science Ontology. In: ISWC 2018 Posters & Demonstrations and Industry Tracks (van Erp, Marieke ed.).