# Access to the CORD19-NEKG endpoint and data sources
### Maintained by the WIMMICS team

This notebook is a client for the CORD-19 Named Entities Knowledge Graph (CORD19-NEKG), a dataset that describes the named entities identified in the 47,000+ articles provided by the COVID-19 Open Research Dataset (CORD-19).

For now, the named entities published are those identified by DBpedia Spotlight (DBpedia URIs) and entity-fishing (Wikidata URIs) in articles' titles and abstracts.

You can query the dataset from our Virtuoso endpoint: https://covid19.i3s.unice.fr/sparql. Here are the relevant named graphs:

- http://ns.inria.fr/covid19/graph/metadata: dataset description and definition of a few properties
- http://ns.inria.fr/covid19/graph/articles: article metadata (title, authors, DOIs, journal, etc.)
- http://ns.inria.fr/covid19/graph/dbpedia-spotlight: named entities identified by DBpedia Spotlight (1,835,902 named entities)
- http://ns.inria.fr/covid19/graph/entityfishing: named entities identified by entity-fishing (790,922 named entities)
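
To restrict a query to one of these named graphs, a `FROM <graph URI>` clause can be added to it. The following is a minimal sketch that only builds query strings: the dictionary keys and the `count_triples_query` helper are illustrative names, not part of the dataset. The query counts triples, so its result is not expected to match the named-entity counts quoted above, and it may be slow on the larger graphs.

```python
# Named graphs listed above; the short keys are illustrative labels only.
NAMED_GRAPHS = {
    'metadata': 'http://ns.inria.fr/covid19/graph/metadata',
    'articles': 'http://ns.inria.fr/covid19/graph/articles',
    'dbpedia-spotlight': 'http://ns.inria.fr/covid19/graph/dbpedia-spotlight',
    'entityfishing': 'http://ns.inria.fr/covid19/graph/entityfishing',
}


def count_triples_query(graph_uri):
    """Build a SPARQL query that counts the triples of one named graph."""
    return (
        'SELECT (COUNT(*) AS ?triples)\n'
        'FROM <' + graph_uri + '>\n'
        'WHERE { ?s ?p ?o }'
    )


# Print the query text; nothing is sent to the endpoint here.
print(count_triples_query(NAMED_GRAPHS['articles']))
```

The resulting string can be passed to any SPARQL client pointed at the endpoint above.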

The dataset is described here: https://github.com/Wimmics/cord19-nekg
Details on how named entities are represented in RDF are given on this page: https://github.com/Wimmics/cord19-nekg/blob/master/doc/01-data-modeling.md


In [0]:
from __future__ import print_function

## Install required packages

#### SPARQLWrapper 

This package sends SPARQL queries to a remote endpoint and retrieves the results, which this notebook then converts to a Pandas DataFrame. https://rdflib.dev/sparqlwrapper/

#### Pandas

A Pandas DataFrame holds the query results.


NOTE: if you are running the Anaconda distribution, the preferred way to install the packages is:

_conda install -c conda-forge sparqlwrapper_

_conda install pandas_

Run the installation once, then rerun it periodically to check for updates.

In [0]:
!pip install pandas

In [2]:
!pip install SPARQLWrapper

Collecting SPARQLWrapper
  Downloading https://files.pythonhosted.org/packages/00/9b/443fbe06996c080ee9c1f01b04e2f683b2b07e149905f33a2397ee3b80a2/SPARQLWrapper-1.8.5-py3-none-any.whl
Collecting rdflib>=4.0
  Downloading https://files.pythonhosted.org/packages/3c/fe/630bacb652680f6d481b9febbb3e2c3869194a1a5fc3401a4a41195a2f8f/rdflib-4.2.2-py3-none-any.whl (344kB)
Collecting isodate
  Downloading https://files.pythonhosted.org/packages/9b/9f/b36f7774ff5ea8e428fdcfc4bb332c39ee5b9362ddd3d40d9516a55221b2/isodate-0.6.0-py2.py3-none-any.whl (45kB)
Installing collected packages: isodate, rdflib, SPARQLWrapper
Successfully installed SPARQLWrapper-1.8.5 isodate-0.6.0 rdflib-4.2.2


In [3]:
import pandas as pd
print('Pandas ver.', pd.__version__)

import SPARQLWrapper
import json
print('SPARQLWrapper ver.', SPARQLWrapper.__version__)

from SPARQLWrapper import SPARQLWrapper, JSON

Pandas ver. 1.0.3
SPARQLWrapper ver. 1.8.5


In [0]:
def sparql_service_to_dataframe(service, query):
    """
    Helper function to convert SPARQL results into a Pandas DataFrame.
    
    Credit to Ted Lawless https://lawlesst.github.io/notebook/sparql-dataframe.html
    """
    sparql = SPARQLWrapper(service)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    result = sparql.query()

    processed_results = json.load(result.response)
    cols = processed_results['head']['vars']

    out = []
    for row in processed_results['results']['bindings']:
        item = []
        for c in cols:
            item.append(row.get(c, {}).get('value'))
        out.append(item)

    return pd.DataFrame(out, columns=cols)
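
The helper above relies on the layout of the SPARQL JSON results format: `head.vars` lists the selected variables, and each entry of `results.bindings` maps a variable name to an object carrying its `value`. The sketch below replays the same row-building logic on a hand-written sample, so it runs without network access. The sample values are made up for illustration and are not output from the endpoint.

```python
import json

# Hand-written sample in the SPARQL JSON results layout (illustrative values).
sample = json.loads('''
{
  "head": {"vars": ["title", "date"]},
  "results": {"bindings": [
    {"title": {"type": "literal", "value": "Article A"},
     "date": {"type": "literal", "value": "2020-03-01"}},
    {"title": {"type": "literal", "value": "Article B"}}
  ]}
}
''')

cols = sample['head']['vars']

# Same logic as the helper: a variable left unbound (for example by an
# OPTIONAL clause) is absent from the binding and becomes None.
rows = [[binding.get(c, {}).get('value') for c in cols]
        for binding in sample['results']['bindings']]

print(rows)
```

The second row has no `date` binding, so its second cell is `None`; in the DataFrame built by the helper such cells appear as missing values.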


# Run queries

In [0]:
wds_Corese_Covid = 'https://covid19.i3s.unice.fr/sparql'

In [0]:
# Select the 10 most recent articles whose abstract contains "coronavirus"
query = '''
SELECT (group_concat(distinct ?name,", ") AS ?authors)
       ?title 
       (year(?date) as ?year)
       ?pub
       ?url
from <http://ns.inria.fr/covid19/graph/articles>
WHERE {
    ?doc a ?t;
        dce:creator ?name;
        dct:title ?title;
        schema:publication ?pub;
        schema:url ?url;
        dct:abstract [ rdf:value ?abs ].

    optional { ?doc dct:issued ?date }
    filter contains(?abs, "coronavirus")
} 
group by ?doc ?title ?date ?pub ?url
order by desc(?date)
limit 10
	
'''
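
To search for another term, the keyword and the row limit of the query above can be turned into parameters. This is a minimal sketch using `string.Template` from the standard library; `QUERY_TEMPLATE`, `build_keyword_query`, and the `$keyword` / `$limit` placeholders are hypothetical names introduced here. The query text itself is copied from the cell above and, like it, assumes the endpoint predefines the `dce`, `dct`, `schema`, and `rdf` prefixes.

```python
from string import Template

# Same query as above, with the search term and the row limit replaced by
# placeholders. The placeholder names are illustrative.
QUERY_TEMPLATE = Template('''
SELECT (group_concat(distinct ?name,", ") AS ?authors)
       ?title
       (year(?date) as ?year)
       ?pub
       ?url
from <http://ns.inria.fr/covid19/graph/articles>
WHERE {
    ?doc a ?t;
        dce:creator ?name;
        dct:title ?title;
        schema:publication ?pub;
        schema:url ?url;
        dct:abstract [ rdf:value ?abs ].

    optional { ?doc dct:issued ?date }
    filter contains(?abs, "$keyword")
}
group by ?doc ?title ?date ?pub ?url
order by desc(?date)
limit $limit
''')


def build_keyword_query(keyword, limit=10):
    """Return the article query for one abstract keyword (hypothetical helper)."""
    # Escape backslashes first, then double quotes and newlines, so the
    # keyword stays inside the SPARQL string literal.
    escaped = (keyword.replace('\\', '\\\\')
                      .replace('"', '\\"')
                      .replace('\n', '\\n'))
    return QUERY_TEMPLATE.substitute(keyword=escaped, limit=int(limit))


query_sars = build_keyword_query('SARS', limit=5)
```

Note that `contains` is case-sensitive, so `"SARS"` and `"sars"` select different articles.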

In [7]:
%time df = sparql_service_to_dataframe(wds_Corese_Covid, query)
print(df.shape)

CPU times: user 8.22 ms, sys: 2.23 ms, total: 10.5 ms
Wall time: 1.87 s
(10, 5)


In [8]:
pd.set_option('display.max_colwidth', None)
pd.set_option('display.precision', 3)
pd.set_option('display.max_rows', 9999)

df

,authors,title,year,pub,url
0,"Compton, Susan, Macy, James",Chapter 13 Viral Disease,2020,The Laboratory Rat,https://doi.org/10.1016/b978-0-12-814338-4.00013-1
1,"Ennaji, Youssef, Khataby, Khadija, Mustapha, Moulay",Chapter 3 Infectious Bronchitis Virus in Poultry: Molecular Epidemiology and Factors Leading to the Emergence and Reemergence of Novel Strains of Infectious Bronchitis Virus,2020,Emerging and Reemerging Viral Pathogens,https://doi.org/10.1016/b978-0-12-814966-9.00003-2
2,"Khan, Gulfaraz, Sheek-Hussein, Mohamud",Chapter 8 The Middle East Respiratory Syndrome Coronavirus: An Emerging Virus of Global Threat,2020,Emerging and Reemerging Viral Pathogens,https://doi.org/10.1016/b978-0-12-819400-3.00008-9
3,"Al, Hai, Bai, Chunxue, Bai, Li, Chen, Hong, Chen, Rongchang, Dong, Chunling, Han, Baohui, Jiang, Jinjun, Jiang, Yan, Jin, Yang, Li, Jing, Li, Qiang, Li, Shengqing, Liu, Jie, Ma, Xia, Powell, Charles, Qiu, Zhongmin, Shen, Yao, Shi, Guochao, Song, Yuanlin, Song, Zhenju, Sun, Jiayuan, Tan, Fei, Tapuyihai, Al, Tong, Lin, Tu, Chunlin, Wang, Changhui, Wang, Jiwei, Wang, Qi, Wang, Xiongbiao, Wang, Xun, Wang, Yaoli, Wang, Yuehong, Wu, Chaomin, Wu, Xueling, Xiao, Kui, Xu, Tao, Xu, Yu, Yang, Dawei, Ye, Maosong, Yu, Jinming, Yu, Wencheng, Zhang, Ding, Zhang, Lichuan, Zhang, Min, Zhang, Xiaoju, Zhang, Yong, Zhang, Ziqiang, Zhao, Lin, Zhong, Nanshan, Zhou, Jian, Zhou, Xin, Zhu, Huili, Zhu, Xiaodan",Chinese experts’ consensus on the Internet of Things-aided diagnosis and treatment of coronavirus disease 2019 (COVID-19),2020,Clinical eHealth,https://doi.org/10.1016/j.ceh.2020.03.001
4,"Kasmi, Yassine, Khataby, Khadija, Mustapha, Moulay, Souiri, Amal","Chapter 7 Coronaviridae: 100,000 Years of Emergence and Reemergence",2020,Emerging and Reemerging Viral Pathogens,https://doi.org/10.1016/b978-0-12-819400-3.00007-7
5,"Li, Chun, Ren, Linzhu, Yang, Yanling",Genetic evolution analysis of 2019 novel coronavirus and coronavirus from other species,2020,"Infection, Genetics and Evolution",https://doi.org/10.1016/j.meegid.2020.104285
6,"Bashir, Nadia, Kazmi, Abeer, Khan, Suliman, Shereen, Muhammad, Siddique, Rabeea","COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses",2020,Journal of Advanced Research,https://doi.org/10.1016/j.jare.2020.03.005
7,"Chen, Zixian, Ding, Yuxiao, Li, Xiaogang, Lin, Chen, Niu, Meng, Sun, Zhujian, Xie, Bin",Asymptomatic novel coronavirus pneumonia patient outside Wuhan: The value of CT images in the course of the disease,2020,Clinical Imaging,https://doi.org/10.1016/j.clinimag.2020.02.008
8,"Phan, Tung",Genetic diversity and evolution of SARS-CoV-2,2020,"Infection, Genetics and Evolution",https://doi.org/10.1016/j.meegid.2020.104260
9,"Chen, Kaiyu, Wang, Peng, Wang, Pengfei, Zhang, Hongliang, Zhu, Shengqiang",Severe air pollution events not avoided by reduced anthropogenic activities during COVID-19 outbreak,2020,"Resources, Conservation and Recycling",https://doi.org/10.1016/j.resconrec.2020.104814
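
Because the helper keeps only the `value` of each binding, every cell of the DataFrame is a string, including the `year` column. In the SPARQL JSON results format, a typed literal also carries a `datatype` URI, which can be used to cast values. The sketch below shows one way to do this with a small lookup table. `CASTS` and `typed_value` are hypothetical names, the set of handled datatypes is deliberately small, and the sample binding is hand-written rather than taken from the endpoint.

```python
XSD = 'http://www.w3.org/2001/XMLSchema#'

# Map a few XSD datatypes to Python constructors; anything else stays a string.
CASTS = {
    XSD + 'integer': int,
    XSD + 'int': int,
    XSD + 'long': int,
    XSD + 'decimal': float,
    XSD + 'double': float,
    XSD + 'float': float,
}


def typed_value(binding):
    """Cast one SPARQL JSON binding to a Python value (None if unbound)."""
    if binding is None:
        return None
    cast = CASTS.get(binding.get('datatype'), str)
    return cast(binding['value'])


# Hand-written binding, shaped like a typed literal in a JSON result.
year = typed_value({'type': 'literal',
                    'datatype': XSD + 'integer',
                    'value': '2020'})
print(year, type(year).__name__)
```

In the helper, this would amount to replacing `row.get(c, {}).get('value')` with `typed_value(row.get(c))`. For a single column, `pd.to_numeric(df['year'])` applied after the query is a simpler alternative.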
