# Ontology Validation: Competency Question addressing through Knowledge Graph Construction and Access on biomedical, pharmacological and biological heterogeneous data sources
---

***Authors*** : *Ana Solbas, Natalia García Sánchez*

**Date** : *21/01/2023*

***Description***: Knowledge Graph (KG) materialization, description and access on heterogeneous biomedical, pharmacological and biological data sources, using Morph-KGC with a previously written RML mapping file

v1.0

*Dependencies*: `morph-kgc`

---


## KG generation
This is done with the `morph-kgc` Python library.

**[Morph-KGC](https://github.com/oeg-upm/morph-kgc)** is an engine that constructs **[RDF](https://www.w3.org/TR/rdf11-concepts/)** knowledge graphs from heterogeneous data sources using declarative mapping languages, among them **[RML](https://rml.io/specs/rml/)**. The RML document contains the declarative mapping rules we previously wrote in our **[YARRRML](https://rml.io/yarrrml/)** document. It was obtained by converting that document with **[YARRRML's Matey](https://rml.io/yarrrml/assets/pdf/eswc2018.pdf)**, a browser-based application that helps write YARRRML rules and generates the corresponding RML mapping files. All the files mentioned are included in the Supplementary Material Files.

Morph-KGC supports the format of our data sources, **CSV**. The CSV files must be available in the working directory of this Colab script when the mapping is run.

We will run this library via the **command line**. This is the recommended option when working with large volumes of data.


First of all, we need to **install** [Morph-KGC](https://pypi.org/project/morph-kgc) (this will also install [RDFLib](https://pypi.org/project/rdflib/) and [Oxigraph](https://pypi.org/project/pyoxigraph/)).

In [1]:
!pip install morph-kgc

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting morph-kgc
  Downloading morph_kgc-2.3.1-py3-none-any.whl (40 kB)
Collecting duckdb>=0.5.0
  Downloading duckdb-0.6.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.4 MB)
Collecting pyoxigraph>=0.3.0
  Downloading pyoxigraph-0.3.11-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.4 MB)
Collecting sql-metadata>=2.3.0
  Downloading sql_metadata-2.6.0-py3-none-any.whl (21 kB)
Collecting rdflib>=6.1.1
  Downloading rdflib-6.2.0-py3-none-any.whl (500 kB)

We mount Google Drive to access the mapping files and the data sources for the KG.

In [2]:
# Drive access to the CSV files and the YARRRML mapping rules; skip this cell when running the script locally
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Now we just need to **import** Morph-KGC and we are ready to go!

In [4]:
import morph_kgc

### RDF file generation

Now we only need a config file that gives the path to the RML mapping file.

The config file `config.ini` has the following contents:

```
[CONFIGURATION]
na_values: 

[DataSource1]
mappings: /path_to_files/RML_MappingFile.rml.ttl
```

where `RML_MappingFile.rml.ttl` is the RML mapping file. Additional settings, such as the symbols that denote missing values, go in the `na_values` option of the `[CONFIGURATION]` section of the .ini file.

The config file can be generated the following way:

In [24]:
# create the config file
# (path_to_files is assumed to be a Python string ending in "/"; IPython expands {path_to_files} inside shell commands)
!echo "[CONFIGURATION]" > "{path_to_files}config.ini"
!echo "na_values: " >> "{path_to_files}config.ini"
!echo "[DataSource1]" >> "{path_to_files}config.ini"
!echo "mappings: /path/RML_MappingFile.rml.ttl" >> "{path_to_files}config.ini"

# show the config file
!cat "{path_to_files}config.ini"

[CONFIGURATION]
na_values: 
[DataSource1]
mappings: /content/drive/MyDrive/entregase master/OhBOI/YARRML_mapping/YARRRML_Mapping_OhBOI/tsv/Mappings.ttl
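
The same file can also be written from Python rather than with shell redirection. The following is a minimal sketch using the standard-library `configparser`; the temporary directory and the mapping path are placeholders chosen for illustration, not the locations used in this notebook.

```python
import configparser
import tempfile
from pathlib import Path

# Placeholder locations (assumptions): replace with the real working
# directory and the real RML mapping file.
work_dir = Path(tempfile.mkdtemp())
mapping_path = "/path/RML_MappingFile.rml.ttl"

config = configparser.ConfigParser()
config["CONFIGURATION"] = {"na_values": ""}          # left empty, as in the notebook
config["DataSource1"] = {"mappings": mapping_path}   # RML mapping for this source

config_path = work_dir / "config.ini"
with open(config_path, "w", encoding="utf-8") as fh:
    config.write(fh)

# Show the generated file
print(config_path.read_text(encoding="utf-8"))
```

`configparser` writes `key = value` rather than `key: value`. Python's INI parser accepts both delimiters, so this should be equivalent as long as the consuming tool reads the file with a standard INI parser.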


Now, with the following command, `morph_kgc` generates an RDF triple file, `knowledge-graph.nt`, that describes the data following the conceptual schema of the ontology and can subsequently be loaded into a triplestore.

In [25]:
!python3 -m morph_kgc "{path_to_files}config.ini"

INFO | 2023-01-23 12:31:05,031 | 77 mapping rules retrieved.
INFO | 2023-01-23 12:31:05,081 | Mapping partition with 45 groups generated.
INFO | 2023-01-23 12:31:05,082 | Maximum number of rules within mapping group: 21.
INFO | 2023-01-23 12:31:05,085 | Mappings processed in 6.392 seconds.
INFO | 2023-01-23 12:31:06,405 | Number of triples generated in total: 3943.
INFO | 2023-01-23 12:31:06,405 | Materialization finished in 1.257 seconds.
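
The Drive folder used here contains a space in its name, so the config path has to reach Morph-KGC as a single argument. As a sketch of one way to guarantee that outside a notebook, the command can be built as an argument list; the path below is hypothetical.

```python
import shlex
import sys

# Hypothetical config location containing a space, similar to the Drive path above.
config_path = "/content/drive/MyDrive/my folder/config.ini"

# Passing a list keeps the path as one argument, whatever characters it contains.
command = [sys.executable, "-m", "morph_kgc", config_path]

# shlex.join shows the equivalent, safely quoted shell command line.
print(shlex.join(command))

# To actually run it (requires morph-kgc to be installed):
# import subprocess
# subprocess.run(command, check=True)
```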


## RDF triple post-processing
Loading output file with RDF triples

In [26]:
with open('/content/knowledge-graph.nt') as f:
  docstring = f.read()

Removing stray "Â" characters (most likely encoding debris) from the triples

In [27]:
docstring = docstring.replace("Â","")

Writing processed file to drive

In [28]:
with open(path_to_files+'OhBOI_RDF.nt', 'w') as f:
  f.write(docstring)
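
A stray "Â" typically appears when UTF-8 text containing a non-breaking space (U+00A0) is decoded with a Latin-1-like codec somewhere upstream. Whether that is the cause in our data sources is an assumption; the sketch below only reproduces the effect on an illustrative string.

```python
# Assumption: the "Â" comes from UTF-8 bytes being decoded with a Latin-1-like codec.
original = "Lyme\u00a0disease"            # illustrative text with a non-breaking space
utf8_bytes = original.encode("utf-8")     # U+00A0 becomes the two bytes C2 A0

garbled = utf8_bytes.decode("latin-1")    # wrong codec: byte C2 is shown as "Â"
print(repr(garbled))

# Reversing the wrong decoding step recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```

If the source files are UTF-8, passing `encoding="utf-8"` explicitly to `open()` when reading and writing avoids this class of problem.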

# KG Access

Now we can query, navigate and save the graph with **[RDFLib](https://rdflib.readthedocs.io)**, the reference library for working with RDF in Python. Morph-KGC can also load a KG directly into RDFLib.

We inspect the generated KG with several SPARQL queries, which also serves as a validation of KG access:

- Number of distinct subjects
- Distinct properties
- Distinct subject-property tuples


**Load Knowledge Graph to [RDFLib](https://rdflib.readthedocs.io)**
The generated RDF can be loaded into RDFLib (or any triplestore) and queried.

We chose RDFLib for this. Instead of uploading the RDF data to a triplestore such as Blazegraph or GraphDB, we parse the KG file produced by Morph-KGC into an in-memory RDFLib graph with the `rdflib` Python library. The SPARQL queries below are then run against that graph.

First question: number of distinct subjects

In [35]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

# paying careful attention to the typo in the property schema:adress 
# that the materialized KG loaded because of a misspelling in the mapping rule declaration

q0 = """
  PREFIX schema: <http://schema.org/> 
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 

         SELECT (COUNT(DISTINCT ?s) AS ?cdsubjects)
         WHERE {
           ?s ?p ?o .
         }
      """

q0_res = g.query(q0)
print(q0_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f0cc3baa5e0>


In [36]:
import json
results_json = q0_res.serialize(format="json")
resultdict = json.loads(results_json)["results"]["bindings"]
result = resultdict[0]['cdsubjects']['value']
print("Number of distinct subjects : "+str(result))

Number of distinct subjects : 957
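
The JSON produced by `serialize(format="json")` follows the W3C SPARQL 1.1 Query Results JSON layout: variable names under `head.vars` and one object per row under `results.bindings`. The sketch below parses a hand-written document of that shape (built for illustration from the count reported above) to show where the value sits.

```python
import json

# Hand-written document in the SPARQL 1.1 Query Results JSON layout (illustrative).
results_json = """
{
  "head": {"vars": ["cdsubjects"]},
  "results": {
    "bindings": [
      {"cdsubjects": {
         "type": "literal",
         "datatype": "http://www.w3.org/2001/XMLSchema#integer",
         "value": "957"}}
    ]
  }
}
"""

doc = json.loads(results_json)
variables = doc["head"]["vars"]

# Each binding maps a variable name to a dict with "type" and "value" keys.
rows = [
    {var: binding[var]["value"] for var in variables if var in binding}
    for binding in doc["results"]["bindings"]
]

count = int(rows[0]["cdsubjects"])   # values are always strings in this format
print("Number of distinct subjects:", count)
```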


Number of distinct treatments

In [None]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

q11 = """
  PREFIX schema: <http://schema.org/> 
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 

         SELECT (COUNT(DISTINCT ?treatments) AS ?number_of_distinct_treatments)
         WHERE {
           ?treatments a ncit:C28180 .
         }
      """

q11_res = g.query(q11)
print(q11_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f4a72cc2f40>


In [None]:
import json
results_json = q11_res.serialize(format="json")
resultdict = json.loads(results_json)["results"]["bindings"]
result = resultdict[0]['number_of_distinct_treatments']['value']
print("Number of distinct treatments : "+str(result))

Number of distinct treatments : 131
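
Every query in this notebook starts with the same `PREFIX` block. One way to avoid repeating it is to build the header once from a dictionary. This is only a sketch: `with_prefixes` is a hypothetical helper, not part of RDFLib or Morph-KGC.

```python
# Namespace prefixes copied from the queries in this notebook.
PREFIXES = {
    "schema": "http://schema.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "obo": "http://purl.obolibrary.org/obo/",
    "snomedct": "http://purl.bioontology.org/ontology/SNOMEDCT/",
    "ncit": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
    "sio": "http://semanticscience.org/resource/",
    "mesh": "http://purl.bioontology.org/ontology/MESH/",
    "wp": "http://vocabularies.wikipathways.org/wp#",
    "cco": "http://rdf.ebi.ac.uk/terms/chembl#",
    "ohboi": "https://w3id.org/ohboi#",
}


def with_prefixes(body: str, prefixes: dict = PREFIXES) -> str:
    """Prepend one PREFIX line per namespace to a SPARQL query body."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return header + "\n\n" + body.strip() + "\n"


q11 = with_prefixes("""
    SELECT (COUNT(DISTINCT ?treatments) AS ?number_of_distinct_treatments)
    WHERE {
      ?treatments a ncit:C28180 .
    }
""")
print(q11)
```

The resulting string can be passed to `g.query(...)` exactly like the hand-written queries.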


Second question : Distinct properties

In [None]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

q2 = """
  PREFIX schema: <http://schema.org/> 
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 
         SELECT DISTINCT ?p 
         WHERE {
           ?s ?p ?o .
         }
      """

q2_res = g.query(q2)
print(q2_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f4a7278a5b0>


In [None]:
import json
import pandas as pd
df = pd.DataFrame.from_dict(q2_res.bindings)
df

Unnamed: 0,p
0,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
1,https://w3id.org/ohboi#contains
2,https://w3id.org/ohboi#hasPrescription
3,http://semanticscience.org/resource/SIO_001403
4,https://w3id.org/ohboi#hasCommonName
5,http://www.w3.org/2000/01/rdf-schema#subClassOf
6,https://w3id.org/ohboi#belongsTo
7,https://w3id.org/ohboi#hasCode
8,https://w3id.org/ohboi#prescribes
9,https://w3id.org/ohboi#prevalence
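
Full property IRIs are easier to scan in prefixed form. The sketch below shortens a few of the IRIs listed above with a small prefix table; `compact` is a hypothetical helper written for this illustration.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sio": "http://semanticscience.org/resource/",
    "ohboi": "https://w3id.org/ohboi#",
}


def compact(iri: str, prefixes: dict = PREFIXES) -> str:
    """Return prefix:local for a known namespace, else the IRI unchanged."""
    # Try longer namespaces first so the most specific one wins.
    for name, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(namespace):
            return f"{name}:{iri[len(namespace):]}"
    return iri


properties = [
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "https://w3id.org/ohboi#hasPrescription",
    "http://semanticscience.org/resource/SIO_001403",
    "http://www.w3.org/2000/01/rdf-schema#subClassOf",
]
for iri in properties:
    print(compact(iri))
```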


Third question: distinct subject-property tuples compared with all subject-property tuples (checking for duplicates)

In [32]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

q31 = """
  PREFIX schema: <http://schema.org/> 
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 

         SELECT ?s ?p
         WHERE {
           ?s ?p ?o .
         }
      """

q31_res = g.query(q31)
print(q31_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f0cc2d717f0>


In [33]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

q32 = """
  PREFIX schema: <http://schema.org/> 
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 

         SELECT DISTINCT ?s ?p 
         WHERE {
           ?s ?p ?o .
         }
      """

q32_res = g.query(q32)
print(q32_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f0cc3ab8100>


In [34]:
import json
import pandas as pd
df = pd.DataFrame.from_dict(q32_res.bindings)
df_nd = pd.DataFrame.from_dict(q31_res.bindings)
print("Number of distinct subject property tuples: ", df.shape[0])
print("Number of repeated s-p tuples: ", df_nd.shape[0]-df.shape[0] )
print('----------------------')
print("Number of triples: ", df_nd.shape[0])

Number of distinct subject property tuples:  3184
Number of repeated s-p tuples:  744
----------------------
Number of triples:  3928
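
Comparing the row counts of `SELECT ?s ?p` and `SELECT DISTINCT ?s ?p` amounts to counting how often each (subject, property) pair occurs. A minimal sketch of that idea on a handful of toy triples (made-up identifiers, not taken from the KG):

```python
from collections import Counter

# Toy triples (made-up identifiers), standing in for the rows of the graph.
triples = [
    ("ex:disease1", "ohboi:hasPrescription", "ex:presc1"),
    ("ex:disease1", "ohboi:hasPrescription", "ex:presc2"),
    ("ex:disease1", "rdf:type", "ex:Disease"),
    ("ex:drug1", "ohboi:hasCommonName", "cefuroxime axetil"),
]

pair_counts = Counter((s, p) for s, p, _ in triples)

n_rows = sum(pair_counts.values())   # rows returned without DISTINCT
n_distinct = len(pair_counts)        # rows returned with DISTINCT
n_repeated = n_rows - n_distinct     # surplus rows caused by repeated pairs

print("Distinct subject-property pairs:", n_distinct)
print("Repeated subject-property pairs:", n_repeated)
print("Pairs seen more than once:", [pair for pair, n in pair_counts.items() if n > 1])
```

A repeated pair is not necessarily an error: it also occurs whenever a subject legitimately has several values for the same property, as with the two prescriptions in the toy data.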


# Competency Questions

What pathogens exist for Y disease

In [None]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

cq1 = """
  PREFIX schema: <http://schema.org/> 
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#>  

  SELECT DISTINCT ?Pathogen ?pathogen_NCIT_code ?Disease ?Disease_SNOMED_code 
         WHERE {
           ?pathogen_XCO_Code rdfs:subClassOf ?pathogen_NCIT_code .
           ?pathogen_NCIT_code ohboi:hasBacteriaName ?Pathogen .
           ?pathogen_XCO_Code snomedct:cause_of ?Disease_DOID_code .
           ?Disease_DOID_code snomedct:due_to ?pathogen_XCO_Code .
           ?Disease_DOID_code rdfs:subClassOf ?Disease_SNOMED_code .
           ?Disease_SNOMED_code ohboi:hasDiseaseLabel ?Disease .
         }
      """

cq1_res = g.query(cq1)
print(cq1_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f6dca6db3a0>


In [None]:
import json
import pandas as pd
df_CQ1 = pd.DataFrame.from_dict(cq1_res.bindings)
df_CQ1

Unnamed: 0,Disease,Disease_SNOMED_code,Pathogen,pathogen_NCIT_code
0,Pneumonia,http://purl.bioontology.org/ontology/SNOMEDCT/...,Streptococcus pneumoniae,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
1,Chronic tonsillitis,http://purl.bioontology.org/ontology/SNOMEDCT/...,Streptococcus pneumoniae,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
2,Cholera due to Vibrio cholerae O1 Classical bi...,http://purl.bioontology.org/ontology/SNOMEDCT/...,Vibrio cholerae-asiaticae,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
3,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,Mycobacterium tuberculosis,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
4,Tuberculosis of genitourinary system,http://purl.bioontology.org/ontology/SNOMEDCT/...,Mycobacterium tuberculosis,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
5,Tuberculosis of bones and/or joints,http://purl.bioontology.org/ontology/SNOMEDCT/...,Mycobacterium tuberculosis,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
6,Disseminated tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,Mycobacterium tuberculosis,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
7,Multidrug resistant tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,Mycobacterium tuberculosis,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
8,Relapsing fever,http://purl.bioontology.org/ontology/SNOMEDCT/...,Borrelia,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
9,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Borrelia,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...


In [None]:
# from a specific disease SNOMED TERM : Lyme Disease -->  http://purl.bioontology.org/ontology/SNOMEDCT/23502006
df_CQ1[df_CQ1[rdflib.term.Variable('Disease_SNOMED_code')]==rdflib.term.URIRef("http://purl.bioontology.org/ontology/SNOMEDCT/23502006")]

Unnamed: 0,Disease,Disease_SNOMED_code,Pathogen,pathogen_NCIT_code
9,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Borrelia,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...
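
Instead of filtering the DataFrame after the query has run, the restriction to one disease can be placed in the query itself with a SPARQL 1.1 `VALUES` clause. The sketch below only builds the query text; `values_clause` is a hypothetical helper, and the `PREFIX` declarations used throughout this notebook still have to be prepended before the string is passed to `g.query(...)`.

```python
def values_clause(variable: str, iris: list) -> str:
    """Build a SPARQL 1.1 VALUES clause binding `variable` to the given IRIs."""
    terms = " ".join(f"<{iri}>" for iri in iris)
    return f"VALUES ?{variable} {{ {terms} }}"


# SNOMED CT term for Lyme disease, as used in the filter above.
LYME_DISEASE = "http://purl.bioontology.org/ontology/SNOMEDCT/23502006"

query_body = f"""
SELECT DISTINCT ?Disease ?Disease_SNOMED_code
WHERE {{
  {values_clause("Disease_SNOMED_code", [LYME_DISEASE])}
  ?Disease_DOID_code rdfs:subClassOf ?Disease_SNOMED_code .
  ?Disease_SNOMED_code ohboi:hasDiseaseLabel ?Disease .
}}
"""
print(query_body)
```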


Saving example to csv

In [None]:
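# NOTE: df_example is not defined in the cells shown here; it is assumed to hold the rows selected in the previous cell
df_example.to_csv(path_to_files+"Data_Example.csv", index=False)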

Treatment approved for X infection

In [None]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

cq2 = """
  PREFIX schema: <http://schema.org/> 
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#> 
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 

  SELECT DISTINCT ?Disease ?Disease_SNOMED_code ?Drug_ID ?Drug_BrandName ?Drug_CommonName ?Substance ?atc ?mass ?formula ?smiles
         WHERE {
           ?Disease_DOID_code rdfs:subClassOf ?Disease_SNOMED_code .
           ?Disease_SNOMED_code ohboi:hasDiseaseLabel ?Disease .
           ?Disease_DOID_code ohboi:hasPrescription ?Presc .
           ?Presc ohboi:prescribes ?Drug_ID .
           ?Drug_ID ohboi:prescribedIn ?Presc .
           OPTIONAL { ?Drug_ID ohboi:hasBrandName ?Drug_BrandName }
           OPTIONAL { ?Drug_ID ohboi:hasCommonName ?Drug_CommonName }
           ?Drug_ID rdfs:subClassOf ?Substance .
           OPTIONAL { ?Substance cco:atcClassification ?atc }
           ?Substance skos:exactMatch ?ChemEntity .
           OPTIONAL { ?ChemEntity obo:formula ?formula }
           OPTIONAL { ?ChemEntity obo:mass ?mass }
           OPTIONAL { ?ChemEntity obo:smiles ?smiles }
         }
      """

cq2_res = g.query(cq2)
print(cq2_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f6dc796b370>


In [None]:
import json
import pandas as pd
df_CQ2 = pd.DataFrame.from_dict(cq2_res.bindings)
df_CQ2

Unnamed: 0,Disease,Disease_SNOMED_code,Drug_BrandName,Drug_CommonName,Drug_ID,Substance,atc,formula,mass,smiles
0,Chronic gonorrhea lower genitourinary tract,http://purl.bioontology.org/ontology/SNOMEDCT/...,,cefpodoxime,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,J01DD13,C15H17N5O6S2,427.46,COCC1=C(N2C(C(C2=O)NC(=O)C(=NOC)C3=CSC(=N3)N)S...
1,Chronic tonsillitis,http://purl.bioontology.org/ontology/SNOMEDCT/...,Solfoton,phenobarbital,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,N03AA02,C12H12N2O3,232.24,CCC1(C(=O)NC(=O)NC1=O)C2=CC=CC=C2
2,Cholera due to Vibrio cholerae O1 Classical bi...,http://purl.bioontology.org/ontology/SNOMEDCT/...,Solfoton,phenobarbital,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,N03AA02,C12H12N2O3,232.24,CCC1(C(=O)NC(=O)NC1=O)C2=CC=CC=C2
3,Pneumonia,http://purl.bioontology.org/ontology/SNOMEDCT/...,Solfoton,phenobarbital,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,N03AA02,C12H12N2O3,232.24,CCC1(C(=O)NC(=O)NC1=O)C2=CC=CC=C2
4,Chronic tonsillitis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,cefuroxime axetil,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,,C20H22N4O10S,510.48,CC(OC(=O)C)OC(=O)C1=C(CSC2N1C(=O)C2NC(=O)C(=NO...
5,Pneumonia,http://purl.bioontology.org/ontology/SNOMEDCT/...,,cefuroxime axetil,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,,C20H22N4O10S,510.48,CC(OC(=O)C)OC(=O)C1=C(CSC2N1C(=O)C2NC(=O)C(=NO...
6,Cellulitis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,cefuroxime axetil,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,,C20H22N4O10S,510.48,CC(OC(=O)C)OC(=O)C1=C(CSC2N1C(=O)C2NC(=O)C(=NO...
7,Pneumonia,http://purl.bioontology.org/ontology/SNOMEDCT/...,Foscavir,foscarnet,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,,CNa3O5P,191.95,C(=O)(O)P(=O)(O)O
8,Chronic tonsillitis,http://purl.bioontology.org/ontology/SNOMEDCT/...,Aminopar,Aminopar,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,J04AA01,C7H7NO3,153.14,C1=CC(=C(C=C1N)O)C(=O)O
9,Chronic tonsillitis,http://purl.bioontology.org/ontology/SNOMEDCT/...,Foscavir,foscarnet,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,,CNa3O5P,191.95,C(=O)(O)P(=O)(O)O


In [None]:
# from a specific disease SNOMED TERM : Lyme Disease -->  http://purl.bioontology.org/ontology/SNOMEDCT/23502006
df_CQ2[df_CQ2[rdflib.term.Variable('Disease_SNOMED_code')]==rdflib.term.URIRef("http://purl.bioontology.org/ontology/SNOMEDCT/23502006")]

Unnamed: 0,Disease,Disease_SNOMED_code,Drug_BrandName,Drug_CommonName,Drug_ID,Substance,atc,formula,mass,smiles
27,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,,cefditoren,https://pubchem.ncbi.nlm.nih.gov/compound/1004...,https://www.ebi.ac.uk/chembl/compound_report_c...,J01DD16,C19H18N6O5S3,506.59,CC1=C(SC=N1)C=CC2=C(N3C(C(C3=O)NC(=O)C(=NOC)C4...
28,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,,cefuroxime axetil,https://pubchem.ncbi.nlm.nih.gov/compound/1000...,https://www.ebi.ac.uk/chembl/compound_report_c...,,C20H22N4O10S,510.48,CC(OC(=O)C)OC(=O)C1=C(CSC2N1C(=O)C2NC(=O)C(=NO...


Antitoxin for a bacterial toxin

In [None]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

cq3 = """
  PREFIX schema: <http://schema.org/> 
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 

  SELECT DISTINCT ?Pathogen ?pathogen_NCIT_code ?gene_symbol ?Toxin ?toxin_name ?number_aminoacids ?Antitoxin ?Drug_Term_Antitoxin
         WHERE {
           ?pathogen_NCIT_code ohboi:hasBacteriaName ?Pathogen .
           ?pathogen_NCIT_code ncit:R41 ?gene .
           OPTIONAL{ ?gene ohboi:hasSymbol ?gs} 
           OPTIONAL{ ?gs ohboi:hasGeneSymbolCode ?gene_symbol} 
           ?protein sio:SIO_010079 ?gene .
           OPTIONAL{ ?protein ohboi:hasLength ?number_aminoacids} 
           OPTIONAL{ ?protein ohboi:hasName ?toxin_name} 
           ?proteinToxin rdfs:subClassOf ?Toxin .
           ?Toxin ohboi:isNeutralizedBy ?Antitoxin .
           OPTIONAL{ ?Antitoxin rdfs:subClassOf ?Drug_Term_Antitoxin}
         }
      """

cq3_res = g.query(cq3)
print(cq3_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f6dc8723ca0>


In [None]:
import json
import pandas as pd
df_CQ3 = pd.DataFrame.from_dict(cq3_res.bindings)
df_CQ3.head()

Unnamed: 0,Antitoxin,Drug_Term_Antitoxin,Pathogen,Toxin,gene_symbol,number_aminoacids,pathogen_NCIT_code,toxin_name
0,https://pubchem.ncbi.nlm.nih.gov/substance/160...,https://www.ebi.ac.uk/chembl/compound_report_c...,Streptococcus pneumoniae,http://purl.uniprot.org/uniprot/P0DPI1,purB,432,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...,Adenylosuccinate lyase
1,https://pubchem.ncbi.nlm.nih.gov/substance/160...,https://pubchem.ncbi.nlm.nih.gov/substance/160...,Streptococcus pneumoniae,http://purl.uniprot.org/uniprot/P0DPI1,purB,432,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...,Adenylosuccinate lyase
2,https://pubchem.ncbi.nlm.nih.gov/substance/160...,https://www.ebi.ac.uk/chembl/compound_report_c...,staphylococcus aureus,http://purl.uniprot.org/uniprot/P0DPI1,floA,329,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...,Flotillin-like protein FloA
3,https://pubchem.ncbi.nlm.nih.gov/substance/160...,https://pubchem.ncbi.nlm.nih.gov/substance/160...,staphylococcus aureus,http://purl.uniprot.org/uniprot/P0DPI1,floA,329,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...,Flotillin-like protein FloA
4,https://pubchem.ncbi.nlm.nih.gov/substance/160...,https://www.ebi.ac.uk/chembl/compound_report_c...,VPI 4440 Clostridium putrificum,http://purl.uniprot.org/uniprot/P0DPI1,FDG75_00035,206,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...,Antitoxin botulinum a


In [None]:
# from a specific microbial Taxon - Clostridium Botulinum NCBI-TAXON ID 1491 -->  VPI 4440 Clostridium putrificum

columnslist = [rdflib.term.Variable(name) for name in (
    'Pathogen', 'pathogen_NCIT_code', 'Drug_Term_Antitoxin', 'Toxin',
    'toxin_name', 'gene_symbol', 'number_aminoacids')]
df_CQ3[df_CQ3[rdflib.term.Variable('Pathogen')]==rdflib.term.Literal('VPI 4440 Clostridium putrificum')][columnslist]

Unnamed: 0,Pathogen,pathogen_NCIT_code,Drug_Term_Antitoxin,Toxin,toxin_name,gene_symbol,number_aminoacids
4,VPI 4440 Clostridium putrificum,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...,https://www.ebi.ac.uk/chembl/compound_report_c...,http://purl.uniprot.org/uniprot/P0DPI1,Antitoxin botulinum a,FDG75_00035,206
5,VPI 4440 Clostridium putrificum,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...,https://pubchem.ncbi.nlm.nih.gov/substance/160...,http://purl.uniprot.org/uniprot/P0DPI1,Antitoxin botulinum a,FDG75_00035,206
26,VPI 4440 Clostridium putrificum,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...,https://www.ebi.ac.uk/chembl/compound_report_c...,http://purl.uniprot.org/uniprot/P0DPI1,Botulinum neurotoxin type A,bont,1296
27,VPI 4440 Clostridium putrificum,http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus...,https://pubchem.ncbi.nlm.nih.gov/substance/160...,http://purl.uniprot.org/uniprot/P0DPI1,Botulinum neurotoxin type A,bont,1296


Symptoms associated with a disease

In [None]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

cq4 = """
  PREFIX schema: <http://schema.org/> 
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 
  SELECT DISTINCT ?Disease ?Disease_SNOMED_code ?Symptom_code ?Symptom
         WHERE {
           ?Disease_DOID_code rdfs:subClassOf ?Disease_SNOMED_code .
           ?Disease_SNOMED_code ohboi:hasDiseaseLabel ?Disease .
           ?Disease_DOID_code ohboi:presents ?Symptom_code .
           ?Symptom_code ohboi:hasPAName ?Symptom
         }
      """

cq4_res = g.query(cq4)
print(cq4_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f6dc77841f0>


In [None]:
import json
import pandas as pd
df_CQ4 = pd.DataFrame.from_dict(cq4_res.bindings)
df_CQ4

Unnamed: 0,Disease,Disease_SNOMED_code,Symptom,Symptom_code
0,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Decreased facial muscle strength,http://purl.obolibrary.org/obo/HP_0030319
1,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Abnormality of the facial nerve,http://purl.obolibrary.org/obo/HP_0010827
2,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,CSF pleocytosis,http://purl.obolibrary.org/obo/HP_0012229
3,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Bell's palsy,http://purl.obolibrary.org/obo/HP_0010628
4,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Erythema chronicum migrans,http://purl.obolibrary.org/obo/HP_0031180
...,...,...,...,...
169,Cholera due to Vibrio cholerae O1 Classical bi...,http://purl.bioontology.org/ontology/SNOMEDCT/...,Non-occlusive coronary artery disease,http://purl.obolibrary.org/obo/HP_0012436
170,Cholera due to Vibrio cholerae O1 Classical bi...,http://purl.bioontology.org/ontology/SNOMEDCT/...,decreased itch response,http://purl.obolibrary.org/obo/MP_0010073
171,Cholera due to Vibrio cholerae O1 Classical bi...,http://purl.bioontology.org/ontology/SNOMEDCT/...,increased thigmotaxis,http://purl.obolibrary.org/obo/MP_0002797
172,Cholera due to Vibrio cholerae O1 Classical bi...,http://purl.bioontology.org/ontology/SNOMEDCT/...,decreased mast cell degranulation,http://purl.obolibrary.org/obo/MP_0008765


In [None]:
# from a specific disease SNOMED TERM : Lyme Disease -->  http://purl.bioontology.org/ontology/SNOMEDCT/23502006
df_CQ4[df_CQ4[rdflib.term.Variable('Disease_SNOMED_code')]==rdflib.term.URIRef("http://purl.bioontology.org/ontology/SNOMEDCT/23502006")]

Unnamed: 0,Disease,Disease_SNOMED_code,Symptom,Symptom_code
0,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Decreased facial muscle strength,http://purl.obolibrary.org/obo/HP_0030319
1,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Abnormality of the facial nerve,http://purl.obolibrary.org/obo/HP_0010827
2,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,CSF pleocytosis,http://purl.obolibrary.org/obo/HP_0012229
3,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Bell's palsy,http://purl.obolibrary.org/obo/HP_0010628
4,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Erythema chronicum migrans,http://purl.obolibrary.org/obo/HP_0031180
5,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,abnormal visual contrast sensitivity,http://purl.obolibrary.org/obo/MP_0011831
6,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Erythema,http://purl.obolibrary.org/obo/HP_0010783
7,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Fever,http://purl.obolibrary.org/obo/HP_0001945
8,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Lymphocytoma cutis,http://purl.obolibrary.org/obo/HP_0031549
9,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,Headache,http://purl.obolibrary.org/obo/HP_0002315


Prevalence of disease

In [None]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

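# Note: the materialized KG contains the misspelled property schema:adress, caused by a typo
# in the mapping rule declaration; this query does not use that property.
# Also note that no triple pattern below links ?P_ID (the prevalence record) to the disease,
# so every disease is paired with every prevalence record in the result.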

cq5 = """
  PREFIX schema: <http://schema.org/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 
  SELECT DISTINCT ?Disease ?Disease_SNOMED_code ?Prevalence ?Date ?Country ?NameCity
  WHERE {
           ?Disease_DOID_code rdfs:subClassOf ?Disease_SNOMED_code .
           ?Disease_SNOMED_code ohboi:hasDiseaseLabel ?Disease .
           ?P_ID ohboi:inDate ?Date .
           ?P_ID ohboi:prevalence ?Prevalence .
           ?P_ID sio:SIO_001403 ?C_ID .
           ?C_ID ohboi:hasCode ?Country .
           OPTIONAL{ ?C_ID ohboi:contains ?City .
            ?City ohboi:hasCityCode ?NameCity }
         }
      """

cq5_res = g.query(cq5)
print(cq5_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f6dc7f0c340>


In [None]:
import pandas as pd
df_CQ5 = pd.DataFrame.from_dict(cq5_res.bindings)
df_CQ5

Unnamed: 0,Country,Date,Disease,Disease_SNOMED_code,NameCity,Prevalence
0,Guatemala,01/01/2000,Lyme disease,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.041
1,Guatemala,01/01/2000,Disseminated tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.041
2,Guatemala,01/01/2000,Cellulitis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.041
3,Guatemala,01/01/2000,Pneumonia,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.041
4,Guatemala,01/01/2000,Listeriosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.041
...,...,...,...,...,...,...
3362,Venezuela (Bolivarian Republic of),01/01/2000,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.04
3363,Venezuela (Bolivarian Republic of),01/01/2000,Relapsing fever,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.04
3364,Venezuela (Bolivarian Republic of),01/01/2000,Multidrug resistant tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.04
3365,Venezuela (Bolivarian Republic of),01/01/2000,Tuberculosis of bones and/or joints,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.04


In [None]:
# Filter the results for a specific disease by its SNOMED term: Pulmonary tuberculosis --> http://purl.bioontology.org/ontology/SNOMEDCT/154283005
df_CQ5[df_CQ5[rdflib.term.Variable('Disease_SNOMED_code')]==rdflib.term.URIRef("http://purl.bioontology.org/ontology/SNOMEDCT/154283005")]

Unnamed: 0,Country,Date,Disease,Disease_SNOMED_code,NameCity,Prevalence
8,Guatemala,01/01/2000,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.041
21,Guatemala,01/01/2021,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.035
34,Kiribati,01/01/2000,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.457
47,Kiribati,01/01/2002,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.328
60,New Zealand,01/01/2008,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.0091
...,...,...,...,...,...,...
3310,Palau,01/01/2018,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.126
3323,Syrian Arab Republic,01/01/2000,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.061
3336,Angola,01/01/2000,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.404
3349,Algeria,01/01/2008,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,,0.098


First Case Reported

In [None]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

# Note: the materialized KG contains the misspelled property schema:adress, caused by a typo
# in the mapping rule declaration; this query does not use that property.

cq6 = """
  PREFIX schema: <http://schema.org/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 

  SELECT DISTINCT ?Disease ?Disease_SNOMED_code ?NameCity
         WHERE {
          ?Disease_DOID_code rdfs:subClassOf ?Disease_SNOMED_code .
          ?Disease_SNOMED_code ohboi:hasDiseaseLabel ?Disease .
          ?Disease_DOID_code ohboi:hasFirstCaseReportedIn ?City .
          ?City ohboi:hasCityCode ?NameCity
         }
      """

cq6_res = g.query(cq6)
print(cq6_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f4a71ab5970>
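Every competency question repeats the same PREFIX declarations. One possible way to keep them in a single place is sketched below: the prefixes are stored once in a dictionary and prepended to each query body. The names `PREFIXES`, `with_prefixes`, `cq6_body` and `cq6_query` are hypothetical; the IRIs and the query body are copied from the cell above.

```python
# Prefixes used by the competency questions, declared once.
PREFIXES = {
    "schema": "http://schema.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "obo": "http://purl.obolibrary.org/obo/",
    "snomedct": "http://purl.bioontology.org/ontology/SNOMEDCT/",
    "ncit": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
    "sio": "http://semanticscience.org/resource/",
    "mesh": "http://purl.bioontology.org/ontology/MESH/",
    "wp": "http://vocabularies.wikipathways.org/wp#",
    "cco": "http://rdf.ebi.ac.uk/terms/chembl#",
    "ohboi": "https://w3id.org/ohboi#",
}


def with_prefixes(body, prefixes=PREFIXES):
    """Hypothetical helper: prepend one PREFIX line per namespace to a query body."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return f"{header}\n{body}"


cq6_body = """
SELECT DISTINCT ?Disease ?Disease_SNOMED_code ?NameCity
WHERE {
  ?Disease_DOID_code rdfs:subClassOf ?Disease_SNOMED_code .
  ?Disease_SNOMED_code ohboi:hasDiseaseLabel ?Disease .
  ?Disease_DOID_code ohboi:hasFirstCaseReportedIn ?City .
  ?City ohboi:hasCityCode ?NameCity
}
"""

cq6_query = with_prefixes(cq6_body)
print(cq6_query)
```

The resulting string can be passed to `g.query(...)` exactly like the hand-written query, and each prefix is declared only once.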


In [None]:
import pandas as pd
df_CQ6 = pd.DataFrame.from_dict(cq6_res.bindings)
df_CQ6

Unnamed: 0,Disease,Disease_SNOMED_code,NameCity
0,Pulmonary tuberculosis,http://purl.bioontology.org/ontology/SNOMEDCT/...,Monaco


Pathways affected by organism and disease

In [None]:
import rdflib

g = rdflib.Graph()
g.parse(path_to_files+'OhBOI_RDF.nt')

# Note: the materialized KG contains the misspelled property schema:adress, caused by a typo
# in the mapping rule declaration; this query does not use that property.

cq7 = """
  PREFIX schema: <http://schema.org/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  PREFIX obo: <http://purl.obolibrary.org/obo/> 
  PREFIX snomedct: <http://purl.bioontology.org/ontology/SNOMEDCT/> 
  PREFIX ncit:  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
  PREFIX sio: <http://semanticscience.org/resource/> 
  PREFIX mesh: <http://purl.bioontology.org/ontology/MESH/> 
  PREFIX wp: <http://vocabularies.wikipathways.org/wp#> 
  PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> 
  PREFIX ohboi: <https://w3id.org/ohboi#> 

  SELECT DISTINCT ?Disease ?Pathogen ?PathwayAffected_ReactomeCode
         WHERE {
           ?Disease_DOID_code rdfs:subClassOf ?Disease_SNOMED_code .
           ?Disease_SNOMED_code ohboi:hasDiseaseLabel ?Disease .
           ?pathogen_XCO_Code rdfs:subClassOf ?pathogen_NCIT_code .
           ?pathogen_NCIT_code ohboi:hasBacteriaName ?Pathogen .
           ?pathogen_XCO_Code snomedct:cause_of ?Disease_DOID_code .
           ?Disease_DOID_code snomedct:due_to ?pathogen_XCO_Code .
           ?Disease_DOID_code sio:SIO_001158 ?Pathway_ID .
           ?Pathway_ID ohboi:hasPathwayLabel ?PathwayAffected_ReactomeCode .
         }
      """

cq7_res = g.query(cq7)
print(cq7_res)

<rdflib.plugins.sparql.processor.SPARQLResult object at 0x7f6dca676bb0>


In [None]:
import pandas as pd
df_CQ7 = pd.DataFrame.from_dict(cq7_res.bindings)
df_CQ7

Unnamed: 0,Disease,Pathogen,PathwayAffected_ReactomeCode
0,Disseminated tuberculosis,Mycobacterium tuberculosis,R-HSA-9637698.1
1,Pulmonary tuberculosis,Mycobacterium tuberculosis,R-HSA-9637698.1
2,Multidrug resistant tuberculosis,Mycobacterium tuberculosis,R-HSA-9637698.1
3,Tuberculosis of genitourinary system,Mycobacterium tuberculosis,R-HSA-9637698.1
4,Tuberculosis of bones and/or joints,Mycobacterium tuberculosis,R-HSA-9637698.1
