This notebook provides a very brief introduction to using the [CX-RDF](https://github.com/cthoyt/cx-rdf) library, which exports CX networks to RDF for querying with SPARQL. This enables multiple networks to be represented together and easily queried, provided they share the same schema.

## Environment

In [1]:
# built-in
import sys
import os
import time

# third party
from cx_rdf import CX, cx_to_rdf_graph
import ndex2
from ndex2.niceCXNetwork import NiceCXNetwork
import pandas as pd

INFO:rdflib:RDFLib Version: 4.2.1


In [2]:
print(sys.version)

3.6.5 (default, Jun 17 2018, 12:13:06) 
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]


In [3]:
print(time.asctime())

Tue Jul 24 16:55:43 2018


## Get Data from ComPath

ComPath (Domingo-Fernandez, *et al.*, 2018) is a resource of curated equivalency and hierarchical mappings between various pathway databases. More information can be found at https://compath.scai.fraunhofer.de, and the accompanying manuscript can be viewed on [bioRxiv](https://doi.org/10.1101/353235) while it is under review.

In [4]:
compath_mappings_url = 'https://compath.scai.fraunhofer.de/export_mappings'
names = [
    'p1.name',
    'p1.id',
    'p1.source',
    'relation',
    'p2.name',
    'p2.id',
    'p2.source',
]

df = pd.read_csv(compath_mappings_url, sep='\t', header=0, names=names)
df.head()

Unnamed: 0,p1.name,p1.id,p1.source,relation,p2.name,p2.id,p2.source
0,AMPK signaling pathway - Homo sapiens (human),path:hsa04152,kegg,equivalentTo,AMP-activated Protein Kinase (AMPK) Signaling,WP1403,wikipathways
1,Leptin and adiponectin,WP3934,wikipathways,isPartOf,Adipocytokine signaling pathway - Homo sapiens...,path:hsa04920,kegg
2,"Alanine, aspartate and glutamate metabolism - ...",path:hsa00250,kegg,isPartOf,Amino Acid metabolism,WP3925,wikipathways
3,Alanine and aspartate metabolism,WP106,wikipathways,isPartOf,"Alanine, aspartate and glutamate metabolism - ...",path:hsa00250,kegg
4,Alcoholism - Homo sapiens (human),path:hsa05034,kegg,isPartOf,Common Pathways Underlying Drug Addiction,WP2636,wikipathways
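
Passing `header=0` together with `names` tells pandas to discard the file's own header row and use the supplied column names instead. A minimal sketch of that behaviour on an in-memory sample, assuming a made-up header line (the real export's header is not shown in this notebook) and one data row taken from the output above:

```python
import io

import pandas as pd

names = ['p1.name', 'p1.id', 'p1.source', 'relation', 'p2.name', 'p2.id', 'p2.source']

# Hypothetical header line; only the data row comes from the table above.
sample_tsv = (
    "Pathway 1\tID 1\tResource 1\tMapping\tPathway 2\tID 2\tResource 2\n"
    "Alcoholism - Homo sapiens (human)\tpath:hsa05034\tkegg\tisPartOf\t"
    "Common Pathways Underlying Drug Addiction\tWP2636\twikipathways\n"
)

# header=0 marks the first line as a header row, which `names` then replaces.
sample_df = pd.read_csv(io.StringIO(sample_tsv), sep='\t', header=0, names=names)
print(sample_df[['p1.id', 'relation', 'p2.id']])
```

Without `header=0`, the original header line would be read as a data row, so it is worth checking the first few rows whenever `names` is supplied.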


## Conversion to CX

Data with this schema can be converted to CX directly with the ndex2 utility.

In [5]:
network = ndex2.create_nice_cx_from_pandas(
    df, 
    source_field='p1.id', 
    source_node_attr=['p1.name', 'p1.source'],
    target_field='p2.id',
    target_node_attr=['p2.name', 'p2.source'],
    edge_interaction='relation',
)
network.set_name('Pathway Mappings from ComPath')
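
Conceptually, the conversion treats each row as one edge: the `source_field` and `target_field` values become nodes, the listed columns become node attributes, and `edge_interaction` supplies the edge label. The library-free sketch below mimics that mapping on two rows from the table above. It illustrates the idea only and is not the ndex2 implementation; the `rows_to_network` helper and its output layout are hypothetical.

```python
def rows_to_network(rows):
    """Build node and edge collections from mapping rows (hypothetical helper)."""
    nodes = {}  # node id -> attribute dict
    edges = []  # (source id, interaction, target id)
    for row in rows:
        # Source nodes carry the p1.* columns as attributes.
        nodes.setdefault(row['p1.id'], {}).update(
            {'p1.name': row['p1.name'], 'p1.source': row['p1.source']}
        )
        # Target nodes carry the p2.* columns as attributes.
        nodes.setdefault(row['p2.id'], {}).update(
            {'p2.name': row['p2.name'], 'p2.source': row['p2.source']}
        )
        edges.append((row['p1.id'], row['relation'], row['p2.id']))
    return nodes, edges


rows = [
    {'p1.name': 'AMPK signaling pathway - Homo sapiens (human)',
     'p1.id': 'path:hsa04152', 'p1.source': 'kegg', 'relation': 'equivalentTo',
     'p2.name': 'AMP-activated Protein Kinase (AMPK) Signaling',
     'p2.id': 'WP1403', 'p2.source': 'wikipathways'},
    {'p1.name': 'Alcoholism - Homo sapiens (human)',
     'p1.id': 'path:hsa05034', 'p1.source': 'kegg', 'relation': 'isPartOf',
     'p2.name': 'Common Pathways Underlying Drug Addiction',
     'p2.id': 'WP2636', 'p2.source': 'wikipathways'},
]

nodes, edges = rows_to_network(rows)
print(edges)
```

In this sketch a pathway only receives `p2.*` attributes if it appears as a target in some row, which is worth keeping in mind when querying attributes by name.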

In [6]:
network.upload_to(None, os.environ['NDEX_USERNAME'], os.environ['NDEX_PASSWORD'])

'http://public.ndexbio.org/v2/network/a58ffc9c-8f51-11e8-a4bf-0ac135e8bacf'

## Conversion to RDF

While CX is a powerful interchange format, converting it to RDF allows the underlying data to be queried.

In [7]:
cx = network.to_cx()

In [8]:
rdf = cx_to_rdf_graph(cx)

## Querying with SPARQL

The following dictionary is passed to the RDFLib query function as `initNs`, so the queries can use the `cx:` prefix without declaring it in the SPARQL itself.

In [9]:
init_ns = {
    'cx': CX,
}

**Query**: Finding the property of a node

In [10]:
query = """
SELECT ?label ?name
WHERE {
    ?node a cx:node .
    ?node rdfs:label ?label .
    ?node cx:node_has_attribute ?name_attribute .
    ?name_attribute cx:attribute_has_name "p2.name" .
    ?name_attribute cx:attribute_has_value ?name .
}
LIMIT 5
"""

result = rdf.query(query, initNs=init_ns)
pd.DataFrame(list(result))

Unnamed: 0,0,1
0,path:hsa04060,Cytokine-cytokine receptor interaction - Homo ...
1,path:hsa04216,Ferroptosis - Homo sapiens (human)
2,R-HSA-9006936,Signaling by TGF-beta family members
3,WP3869,Cannabinoid receptor signaling
4,WP1982,Sterol Regulatory Element-Binding Proteins (SR...


**Query**: Finding multiple properties

In [11]:
query = """
SELECT ?label ?name ?source
WHERE {
    ?node a cx:node .
    ?node rdfs:label ?label .

    ?node cx:node_has_attribute ?name_attribute .
    ?name_attribute cx:attribute_has_name "p2.name" .
    ?name_attribute cx:attribute_has_value ?name .

    ?node cx:node_has_attribute ?source_attribute .
    ?source_attribute cx:attribute_has_name "p2.source" .
    ?source_attribute cx:attribute_has_value ?source .
}
LIMIT 5
"""

result = rdf.query(query, initNs=init_ns)
pd.DataFrame(list(result))

Unnamed: 0,0,1,2
0,R-HSA-212436,Generic Transcription Pathway,reactome
1,R-HSA-163359,Glucagon signaling in metabolic regulation,reactome
2,R-HSA-72613,Eukaryotic Translation Initiation,reactome
3,WP167,Eicosanoid Synthesis,wikipathways
4,R-HSA-71403,Citric acid cycle (TCA cycle),reactome


**Query**: Reconstituting relationships from RDF

In [12]:
query = """
SELECT ?a_label ?b_label ?c_label
WHERE {
    ?a ?b ?c .
    ?a a cx:node .
    ?c a cx:node .
    ?a rdfs:label ?a_label .
    ?b cx:edge_has_interaction ?b_label .
    ?c rdfs:label ?c_label .
}
LIMIT 5
"""

result = rdf.query(query, initNs=init_ns)
pd.DataFrame(list(result))

Unnamed: 0,0,1,2
0,R-HSA-168142,isPartOf,path:hsa04620
1,R-HSA-2559580,isPartOf,WP408
2,path:hsa00072,equivalentTo,R-HSA-74182
3,path:hsa00072,equivalentTo,WP311
4,path:hsa00072,isPartOf,R-HSA-1430728
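
Judging from the query, the pattern `?a ?b ?c` works because each edge's own resource sits in the predicate position, while a separate triple attaches the interaction label to that edge resource. A plain-Python analogue of this join, using tuples in place of RDF triples (all identifiers below are illustrative placeholders, not the IRIs CX-RDF generates):

```python
# (subject, predicate, object) tuples standing in for RDF triples.
triples = [
    ('n1', 'rdf:type', 'cx:node'),
    ('n1', 'rdfs:label', 'WP106'),
    ('n2', 'rdf:type', 'cx:node'),
    ('n2', 'rdfs:label', 'path:hsa00250'),
    ('n1', 'e1', 'n2'),  # the edge resource e1 is used as the predicate
    ('e1', 'cx:edge_has_interaction', 'isPartOf'),
]


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def is_node(resource):
    """Check whether the resource is typed as a node."""
    return 'cx:node' in objects(resource, 'rdf:type')


# Keep triples whose subject and object are both nodes, then look up labels.
relationships = [
    (a_label, b_label, c_label)
    for a, b, c in triples
    if is_node(a) and is_node(c)
    for a_label in objects(a, 'rdfs:label')
    for b_label in objects(b, 'cx:edge_has_interaction')
    for c_label in objects(c, 'rdfs:label')
]
print(relationships)
```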


**Query**: Finding all subpathways of the KEGG pathway Pathways in cancer - Homo sapiens (human) (path:hsa05200)

In [41]:
query = """
SELECT ?s_label
WHERE {
    ?node a cx:node ;
        rdfs:label "path:hsa05200" .

    ?s ?p ?node .
    ?p cx:edge_has_interaction "isPartOf" .

    ?s rdfs:label ?s_label .
}
"""

result = rdf.query(query, initNs=init_ns)
pd.DataFrame(list(result))

Unnamed: 0,0
0,WP2828
1,R-HSA-211163
2,R-HSA-111461
3,WP1971
4,R-HSA-111452
5,R-HSA-2033519
6,R-HSA-1169092
7,R-HSA-5632928
8,WP2261
9,R-HSA-5632927
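
The query above matches a single `isPartOf` hop, so it returns only direct subpathways. If nested subpathways are also wanted, one option is to follow the hierarchy repeatedly on the client side. A rough sketch on plain tuples, using three `isPartOf` rows from the ComPath table shown earlier; the `all_subpathways` helper is hypothetical and not part of CX-RDF:

```python
from collections import defaultdict, deque

# (child, parent) pairs for isPartOf rows from the mapping table.
is_part_of = [
    ('WP106', 'path:hsa00250'),
    ('path:hsa00250', 'WP3925'),
    ('WP3934', 'path:hsa04920'),
]


def all_subpathways(root, pairs):
    """Return every pathway that is directly or indirectly part of root."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)

    found = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        # Skip children already seen so that cycles cannot loop forever.
        for child in children[current] - found:
            found.add(child)
            queue.append(child)
    return found


print(sorted(all_subpathways('WP3925', is_part_of)))
```

The same pairs could be collected from the SPARQL result of an unrestricted `isPartOf` query before running the traversal.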
