Covid-on-the-Web Dataset

Covid-on-the-Web Dataset is an RDF dataset that provides two main knowledge graphs produced by analyzing the scholarly articles of the COVID-19 Open Research Dataset (CORD-19) [1], a resource of articles about COVID-19 and the coronavirus family of viruses:

  • the CORD-19 Named Entities Knowledge Graph describes named entities identified and disambiguated by NCBO BioPortal annotator, Entity-fishing and DBpedia Spotlight.
  • the CORD-19 Argumentative Knowledge Graph describes argumentative components and PICO elements (Patient/Population/Problem, Intervention, Comparison, Outcome) extracted from the articles by the Argumentative Clinical Trial Analysis platform (ACTA).

A description of the dataset, in the Turtle format, as well as examples are provided in the dataset directory.

Covid-on-the-Web Dataset is an initiative of the Wimmics team, I3S laboratory, University Côte d'Azur, Inria, CNRS.

Covid-on-the-Web Dataset v1.2 is based on CORD-19 v47.


CORD-19 Named Entities Knowledge Graph (CORD19-NEKG)

To identify and disambiguate named entities, we used DBpedia Spotlight (links to DBpedia), Entity-fishing (links to Wikidata), and NCBO BioPortal annotator (links to ontologies in Bioportal).

Named entities were identified primarily in the articles' titles and abstracts. Entity-fishing was also used to process the articles' bodies.

The table below shows the total number of named entities extracted by each tool, as well as the corresponding number of unique URIs.

                     DBpedia     Wikidata     Bioportal    Total
No. named entities   4,084,979   66,098,777   42,972,551   113,156,307
No. unique URIs      63,750      252,150      429,755      745,655

CORD-19 Argumentative Knowledge Graph (CORD19-AKG)

To extract argumentative components (claims and evidence) and PICO elements, we used the Argumentative Clinical Trial Analysis platform (ACTA) [2].

Argumentative components and PICO elements were extracted from the articles' abstracts.

No. argumentative components 119,053
No. PICO elements linked to UMLS concepts 515,590
No. unique UMLS concepts 31,841

URIs naming scheme

Covid-on-the-Web namespace is All URIs are dereferenceable.

The dataset itself is identified by URI It comes with DCAT and VoID descriptions. All articles, annotations and arguments are linked back to the dataset with property rdfs:isDefinedBy.

Article URIs are formatted as where paper_id may be either the article SHA hash or its PMC identifier. Parts of an article (title, abstract and body) are also identified by URIs so that annotations of named entities can link back to the part they belong to. These URIs are formatted as
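As a rough illustration of the two kinds of paper_id mentioned above, the sketch below tells a SHA hash apart from a PubMed Central (PMC) identifier. It rests on assumptions that the text does not state: that the SHA hash is a 40-character hexadecimal SHA-1 digest, and that a PMC identifier is the letters "PMC" followed by digits. The function name classify_paper_id is hypothetical.

```python
import re

# Assumption: the article SHA hash is a 40-character hexadecimal SHA-1 digest.
_SHA1_RE = re.compile(r"[0-9a-f]{40}")
# Assumption: a PMC identifier is "PMC" followed by one or more digits.
_PMC_RE = re.compile(r"PMC[0-9]+")


def classify_paper_id(paper_id: str) -> str:
    """Return "sha", "pmc" or "unknown" for a given paper_id string."""
    if _SHA1_RE.fullmatch(paper_id.lower()):
        return "sha"
    if _PMC_RE.fullmatch(paper_id.upper()):
        return "pmc"
    return "unknown"
```

Such a check can help validate identifiers before building article URIs from them.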


Downloading and SPARQL Querying

The dataset is downloadable as a set of RDF dumps (in Turtle syntax) from Zenodo.

It can also be queried through our Virtuoso OS SPARQL endpoint

You may use the Faceted Browser to look up text or URIs. As an example, you can look up article Further details about how named entities are represented in RDF are given in the Data Modeling section.

The following named graphs can be queried from our SPARQL endpoint:

Description                                                                                   No. RDF triples
dataset description + definition of a few properties                                          170
articles metadata (title, authors, DOIs, journal etc.)                                        3,722,381
named entities identified by Entity-fishing in articles titles/abstracts                      35,049,832
named entities identified by Entity-fishing in articles bodies                                1,156,611,321
named entities identified by Bioportal Annotator in articles titles/abstracts                 104,430,547
named entities identified by DBpedia Spotlight in articles titles/abstracts                   65,359,664
argumentative components and PICO elements extracted by ACTA from articles titles/abstracts   7,469,234
Total                                                                                         1,361,451,364

The example query below retrieves two articles that have been annotated with at least one common Wikidata entity.

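prefix dct:    <http://purl.org/dc/terms/>
prefix fabio:  <http://purl.org/spar/fabio/>
prefix oa:     <http://www.w3.org/ns/oa#>
prefix schema: <http://schema.org/>

select ?uri ?title1 ?title2
where {
  # articles metadata graph: two distinct research papers and their titles
  graph <> {
    ?paper1 a fabio:ResearchPaper; dct:title ?title1.
    ?paper2 a fabio:ResearchPaper; dct:title ?title2.
    filter (?paper1 != ?paper2)
  }
  # named entities graph: one annotation per paper, both pointing to the same entity URI
  graph <> {
    ?a1 a oa:Annotation;
        schema:about ?paper1;
        oa:hasBody ?uri.
    ?a2 a oa:Annotation;
        schema:about ?paper2;
        oa:hasBody ?uri.
  }
} limit 10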
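A query like the one above can also be submitted programmatically. The sketch below is a minimal illustration using only the Python standard library and the standard SPARQL Protocol convention of passing the query text in a "query" parameter of a GET request. The function name build_sparql_request is hypothetical, and the endpoint URL in the usage lines is a placeholder, not the actual Covid-on-the-Web endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint_url: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request.

    The query text is URL-encoded into the "query" parameter, and the
    Accept header asks for results in the SPARQL JSON results format.
    """
    url = endpoint_url + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Usage with a placeholder endpoint (assumption: replace with the real endpoint URL).
request = build_sparql_request(
    "https://example.org/sparql",
    "select * where { ?s ?p ?o } limit 1",
)
# To actually run it: urllib.request.urlopen(request), then parse the JSON body.
```

Building the request separately from sending it keeps the sketch free of network access and makes the encoded URL easy to inspect.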


License

See the LICENSE file.

Cite this work

When including Covid-on-the-Web data in a publication or redistribution, please cite this paper:

Franck Michel, Fabien Gandon, Valentin Ah-Kane, Anna Bobasheva, Elena Cabrio, Olivier Corby, Raphaël Gazzotti, Alain Giboin, Santiago Marro, Tobias Mayer, Mathieu Simon, Serena Villata, Marco Winckler. Covid-on-the-Web: Knowledge Graph and Services to Advance COVID-19 Research. International Semantic Web Conference (ISWC), Nov 2020, Athens, Greece.


[1] Wang, L.L., Lo, K., Chandrasekhar, Y., Reas, R., Yang, J., Eide, D., Funk, K., Kinney, R.M., Liu, Z., Merrill, W., Mooney, P., Murdick, D.A., Rishi, D., Sheehan, J., Shen, Z., Stilson, B., Wade, A.D., Wang, K., Wilhelm, C., Xie, B., Raymond, D.M., Weld, D.S., Etzioni, O., & Kohlmeier, S. (2020). CORD-19: The Covid-19 Open Research Dataset. ArXiv, abs/2004.10706.

[2] T. Mayer, E. Cabrio, and S. Villata. ACTA a tool for argumentative clinical trial analysis. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pages 6551–6553, 2019.