# Querying ComPath with CX RDF

This notebook gives a brief introduction to the [CX-RDF](https://github.com/cthoyt/cx-rdf) library, which exports CX networks to RDF so they can be queried with SPARQL. This allows multiple networks to be represented together and queried with the same queries, provided they share a schema.

## Environment

In [1]:
# built-in
import sys
import os
import time

# third party
from cx_rdf import CX, cx_to_rdf_graph
import ndex2
from ndex2.niceCXNetwork import NiceCXNetwork
import pandas as pd

INFO:rdflib:RDFLib Version: 4.2.1


In [2]:
print(sys.version)

3.6.5 (default, Jun 17 2018, 12:13:06) 
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]


In [3]:
print(time.asctime())

Wed Jul 25 10:48:17 2018


## Get Data from ComPath

ComPath is a resource of curated equivalency and hierarchical mappings between various pathway databases from Domingo-Fernandez, *et al.*, 2018. More information can be found at https://compath.scai.fraunhofer.de and its accompanying manuscript can be viewed on [bioRxiv](https://doi.org/10.1101/353235) while it is under review.

In [4]:
compath_mappings_url = 'https://compath.scai.fraunhofer.de/export_mappings'
names = [
    'p1.name',
    'p1.id',
    'p1.source',
    'relation',
    'p2.name',
    'p2.id',
    'p2.source',
]

df = pd.read_csv(compath_mappings_url, sep='\t', header=0, names=names)
df.head()

Unnamed: 0,p1.name,p1.id,p1.source,relation,p2.name,p2.id,p2.source
0,AMPK signaling pathway - Homo sapiens (human),path:hsa04152,kegg,equivalentTo,AMP-activated Protein Kinase (AMPK) Signaling,WP1403,wikipathways
1,Leptin and adiponectin,WP3934,wikipathways,isPartOf,Adipocytokine signaling pathway - Homo sapiens...,path:hsa04920,kegg
2,"Alanine, aspartate and glutamate metabolism - ...",path:hsa00250,kegg,isPartOf,Amino Acid metabolism,WP3925,wikipathways
3,Alanine and aspartate metabolism,WP106,wikipathways,isPartOf,"Alanine, aspartate and glutamate metabolism - ...",path:hsa00250,kegg
4,Alcoholism - Homo sapiens (human),path:hsa05034,kegg,isPartOf,Common Pathways Underlying Drug Addiction,WP2636,wikipathways


## Conversion to CX

Because each row describes a source pathway, a relation, and a target pathway, this table can be converted directly to CX with the `ndex2` utility `create_nice_cx_from_pandas`.

In [5]:
network = ndex2.create_nice_cx_from_pandas(
    df, 
    source_field='p1.id', 
    source_node_attr=['p1.name', 'p1.source'],
    target_field='p2.id',
    target_node_attr=['p2.name', 'p2.source'],
    edge_interaction='relation',
)
network.set_name('Pathway Mappings from ComPath')

In [6]:
network.upload_to(None, os.environ['NDEX_USERNAME'], os.environ['NDEX_PASSWORD'])

'http://public.ndexbio.org/v2/network/7bef0760-8fe7-11e8-a4bf-0ac135e8bacf'

## Conversion to RDF

While CX is a powerful interchange format, converting it to RDF allows the underlying data to be queried with SPARQL.

In [7]:
cx = network.to_cx()

In [8]:
rdf = cx_to_rdf_graph(cx)

print(f'Serialization resulted in {len(rdf)} triples')

Serialization resulted in 23833 triples


## Querying with SPARQL

The dictionary below is passed to the RDFLib query function as `initNs`, so that queries can use the `cx:` prefix without declaring it in the SPARQL text itself.

In [9]:
init_ns = {
    'cx': CX,
}
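For comparison, the same effect can be had by writing the prefix declarations into the query text. The sketch below is only an illustration of that alternative: `with_prefixes` is a hypothetical helper (not part of CX-RDF or RDFLib), and the `cx` URI is a placeholder, since the real value is the `CX` namespace object imported from `cx_rdf`.

```python
def with_prefixes(query, namespaces):
    """Prepend one SPARQL PREFIX declaration per namespace to a query string."""
    header = '\n'.join(
        f'PREFIX {prefix}: <{uri}>'
        for prefix, uri in sorted(namespaces.items())
    )
    return f'{header}\n{query.strip()}'


# The cx URI is a placeholder for illustration only; the rdfs URI is the
# standard RDF Schema namespace.
example_ns = {
    'cx': 'http://example.org/cx#',
    'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
}

example_query = """
SELECT ?label
WHERE {
    ?node a cx:node .
    ?node rdfs:label ?label .
}
"""

print(with_prefixes(example_query, example_ns))
```

Passing the namespaces through `initNs` keeps every query in this notebook free of that repeated header.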

### Finding a property of a node

In [15]:
query = """
SELECT ?label ?name
WHERE {
    ?node a cx:node .
    ?node rdfs:label ?label .
    ?node cx:node_has_attribute ?name_attribute .
    ?name_attribute cx:attribute_has_name "p2.name" .
    ?name_attribute cx:attribute_has_value ?name .
}
LIMIT 5
"""

result = rdf.query(query, initNs=init_ns)
pd.DataFrame(list(result), columns=['Identifier', 'Label'])

Unnamed: 0,Identifier,Label
0,path:hsa03010,Ribosome - Homo sapiens (human)
1,path:hsa04080,Neuroactive ligand-receptor interaction - Homo...
2,R-HSA-392499,Metabolism of proteins
3,R-HSA-5693548,Sensing of DNA Double Strand Breaks
4,WP357,Fatty Acid Biosynthesis


### Finding multiple properties of a node concurrently

In [14]:
query = """
SELECT ?source ?label ?name 
WHERE {
    ?node a cx:node .
    ?node rdfs:label ?label .

    ?node cx:node_has_attribute ?name_attribute .
    ?name_attribute cx:attribute_has_name "p2.name" .
    ?name_attribute cx:attribute_has_value ?name .

    ?node cx:node_has_attribute ?source_attribute .
    ?source_attribute cx:attribute_has_name "p2.source" .
    ?source_attribute cx:attribute_has_value ?source .
}
LIMIT 5
"""

result = rdf.query(query, initNs=init_ns)
pd.DataFrame(list(result), columns=['Database', 'Identifier', 'Label'])

Unnamed: 0,Database,Identifier,Label
0,kegg,path:hsa00532,Glycosaminoglycan biosynthesis - chondroitin s...
1,kegg,path:hsa01230,Biosynthesis of amino acids - Homo sapiens (hu...
2,reactome,R-HSA-71182,Phenylalanine and tyrosine catabolism
3,kegg,path:hsa04710,Circadian rhythm - Homo sapiens (human)
4,wikipathways,WP23,B Cell Receptor Signaling Pathway


### Reconstituting relationships from RDF

In [16]:
query = """
SELECT ?a_label ?b_label ?c_label
WHERE {
    ?a ?b ?c .
    ?a a cx:node .
    ?c a cx:node .
    ?a rdfs:label ?a_label .
    ?b cx:edge_has_interaction ?b_label .
    ?c rdfs:label ?c_label .
}
LIMIT 5
"""

result = rdf.query(query, initNs=init_ns)
pd.DataFrame(list(result), columns=['Source', 'Relation', 'Target'])

Unnamed: 0,Source,Relation,Target
0,R-HSA-8941284,isPartOf,WP474
1,R-HSA-5689880,isPartOf,path:hsa03050
2,path:hsa00532,isPartOf,R-HSA-1430728
3,path:hsa01230,isPartOf,R-HSA-1430728
4,path:hsa01230,isPartOf,R-HSA-71291
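The query above relies on each edge appearing twice: once as the predicate linking two nodes, and once as a subject carrying its own interaction label. The toy model below sketches that join in plain Python. It is not how CX-RDF or RDFLib store data; the identifiers `node-a`, `node-c`, and `edge-1` and the string predicates are made up for illustration, while the labels are taken from the first result row.

```python
# Toy triples mirroring the pattern matched by the SPARQL query.
triples = {
    ('node-a', 'type', 'cx:node'),
    ('node-c', 'type', 'cx:node'),
    ('node-a', 'rdfs:label', 'R-HSA-8941284'),
    ('node-c', 'rdfs:label', 'WP474'),
    ('node-a', 'edge-1', 'node-c'),                     # ?a ?b ?c
    ('edge-1', 'cx:edge_has_interaction', 'isPartOf'),  # ?b has a label
}


def objects(subject, predicate):
    """Return every object found for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def is_node(term):
    """Check whether a term is typed as a CX node in the toy data."""
    return 'cx:node' in objects(term, 'type')


relations = []
for a, b, c in sorted(triples):
    # Keep only triples whose subject and object are both CX nodes.
    if not (is_node(a) and is_node(c)):
        continue
    # The predicate doubles as a subject that carries the interaction label.
    for b_label in objects(b, 'cx:edge_has_interaction'):
        for a_label in objects(a, 'rdfs:label'):
            for c_label in objects(c, 'rdfs:label'):
                relations.append((a_label, b_label, c_label))

print(relations)
```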


### Find all subpathways for a KEGG pathway

This query finds every pathway mapped as `isPartOf` the KEGG pathway path:hsa05200, "Pathways in cancer - Homo sapiens (human)".

In [17]:
query = """

SELECT ?node ?s_label
WHERE {
    ?node a cx:node ; 
        rdfs:label "path:hsa05200" .
    
    ?s ?p ?node .
    ?p cx:edge_has_interaction "isPartOf" .
    
    ?s rdfs:label ?s_label   
}
"""

result = rdf.query(query, initNs=init_ns)
pd.DataFrame(list(result), columns=['Node', 'Label'])

Unnamed: 0,Node,Label
0,N5767cb49e0b24f889edc26fb9cfc3e14,WP2261
1,N5767cb49e0b24f889edc26fb9cfc3e14,R-HSA-1169092
2,N5767cb49e0b24f889edc26fb9cfc3e14,R-HSA-5632927
3,N5767cb49e0b24f889edc26fb9cfc3e14,R-HSA-211163
4,N5767cb49e0b24f889edc26fb9cfc3e14,R-HSA-5632928
5,N5767cb49e0b24f889edc26fb9cfc3e14,R-HSA-111461
6,N5767cb49e0b24f889edc26fb9cfc3e14,R-HSA-2033519
7,N5767cb49e0b24f889edc26fb9cfc3e14,R-HSA-111452
8,N5767cb49e0b24f889edc26fb9cfc3e14,WP2828
9,N5767cb49e0b24f889edc26fb9cfc3e14,WP1971
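To run the same lookup for other pathways, the identifier can be factored out of the query text. The sketch below uses a hypothetical helper, `build_subpathway_query`, which is not part of CX-RDF. It only builds the query string and assumes the same `cx:` and `rdfs:` prefixes as the queries above.

```python
SUBPATHWAY_TEMPLATE = """
SELECT ?s_label
WHERE {{
    ?node a cx:node ;
        rdfs:label "{identifier}" .

    ?s ?p ?node .
    ?p cx:edge_has_interaction "isPartOf" .

    ?s rdfs:label ?s_label .
}}
"""


def build_subpathway_query(identifier):
    """Build the sub-pathway query for one pathway identifier."""
    # Escape backslashes and double quotes so the identifier stays a
    # valid SPARQL string literal.
    escaped = identifier.replace('\\', '\\\\').replace('"', '\\"')
    return SUBPATHWAY_TEMPLATE.format(identifier=escaped)


print(build_subpathway_query('path:hsa05200'))
```

The resulting string can be passed to `rdf.query(..., initNs=init_ns)` in the same way as the queries above.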
