# Introduction

This notebook is a hands-on introduction to querying chemicals and reactions of biological interest using SPARQL.

| Source | Description | web site | SPARQL endpoint |
| :---- | :------ | -------: | -------: |
| ChEBI/Rhea | chemicals & reactions | [https://www.rhea-db.org/](https://www.rhea-db.org/) | [https://sparql.rhea-db.org/sparql](https://sparql.rhea-db.org/sparql) |  


  
**rhea_chemicals.ipynb** focuses on chemistry.  
Reactions and enzymes are presented in the **rhea_enzymes.ipynb** notebook.  

This notebook is built with a **Python3 kernel** and uses the **SPARQLWrapper**, **pandas** and **numpy** Python libraries.  

**Note that you can simply use this notebook as a list of SPARQL queries that you can execute somewhere else.**  
Each query starts with a comment line (#) indicating the SPARQL endpoint to be used to run the query.  

Example: query Q1  
\#endpoint: https://sparql.rhea-db.org/sparql  
Go to https://sparql.rhea-db.org/sparql and paste query Q1 into the query form.
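
If the queries are run from a script rather than pasted into the web form, the endpoint can be read back from that comment line. This is a minimal sketch, assuming the comment follows the `#endpoint:` convention used in this notebook; `endpoint_from_query` is a hypothetical helper, not part of SPARQLWrapper.

```python
import re

def endpoint_from_query(query):
    """Return the URL given on the first '#endpoint:' comment line, or None."""
    for line in query.splitlines():
        # Accept optional spaces around '#' and after the colon.
        match = re.match(r"\s*#\s*endpoint:\s*(\S+)", line)
        if match:
            return match.group(1)
    return None

# Illustrative query text (not one of the numbered tutorial queries).
example_query = """
#endpoint: https://sparql.rhea-db.org/sparql
SELECT ?rhea WHERE { ?rhea ?p ?o } LIMIT 1
"""
print(endpoint_from_query(example_query))
```

A query without such a comment line yields `None`, so the caller can fall back to a default endpoint.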



<div>
<img src="Image/what_is_rhea.png" alt="What is Rhea?" width="70%" height="70%" border="1" />
</div>

**Rhea website**: `https://www.rhea-db.org`  
<div>
<img src="Image/rhea_website.png" alt="Rhea website" width="70%" height="70%" border="1" />
</div>


**Rhea SPARQL endpoint**: `https://sparql.rhea-db.org/`
<div>
<img src="Image/rhea_sparql.png" alt="Rhea SPARQL endpoint" width="70%" height="70%" border="1" />
</div>


## Rhea SPARQL endpoint content

**Preliminary remarks**
* The Rhea SPARQL endpoint (https://sparql.rhea-db.org/sparql) is built from
both rhea.rdf and chebi.owl (ftp://ftp.expasy.org/databases/rhea/rdf/).
The datasets are synchronized with UniProt RDF data releases (4 to 6 releases per year).  

* <span style="color:red">rhea.rdf does not contain the cross-references to UniProt protein entries</span>. They are available through the UniProt SPARQL endpoint
(https://sparql.uniprot.org/sparql).  


![Rhea stats](Image/rhea_sparql_stats.png)  

The statistics are available [here](https://sparql.rhea-db.org/.well-known/void)  


## Rhea data model


In the Rhea RDF data model, all data are represented as subclasses of rdfs:Class, i.e.
there are no instances.  

![Rhea schema](Image/rhea_rdf_schema.png) 

The documentation is available [here](https://ftp.expasy.org/databases/rhea/rdf/rhea_rdf_documentation.pdf)  


# Initialisation

## Python libraries

**Dependencies:**
- **SPARQLWrapper**  
SPARQLWrapper is a wrapper around a SPARQL service. It helps in creating the query URI and, possibly, converting the result into a more manageable format. The package is licensed under the W3C license.  
useful links: https://rdflib.github.io/sparqlwrapper/ and https://pypi.org/project/SPARQLWrapper/
- **pandas**  
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive.  
useful link: https://pandas.pydata.org/   
- **numpy**  
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays.  
useful link: https://numpy.org/   



In [1]:
from SPARQLWrapper import SPARQLWrapper, JSON

# pandas
import pandas as pd
from pandas import json_normalize
# display options
pd.options.display.max_colwidth=200

# numpy
import numpy as np


# Run a SPARQL query step by step: retrieve Rhea reactions and their chemical equations

**SPARQL query**   
`PREFIX rh:<http://rdf.rhea-db.org/>`  
`PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>` 

`SELECT ?rhea ?equation`   
`WHERE {`  
`     ?rhea rdfs:subClassOf rh:Reaction .`  
`     ?rhea rh:equation ?equation .`  
` }  `  

<div>
<img src="Image/rhea_equation.png" alt="Rhea equation" width="80%" height="80%" border="1" />
</div>



In [2]:
# Simple use of SPARQLWrapper module
# the URL of the Rhea SPARQL endpoint is given and the JSON return format is used

# Initialize sparql with Rhea SPARQL endpoint
sparql=SPARQLWrapper('https://sparql.rhea-db.org/sparql')    

# Define the SPARQL query and return the first 5 rows
sparql.setQuery("""
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

SELECT ?rhea ?equation 
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
}
LIMIT 5
""")

sparql.setReturnFormat(JSON)    
res = sparql.query().convert()
print('SPARQL query result (JSON format):\n')
print(res)


SPARQL query result (JSON format):

{'head': {'vars': ['rhea', 'equation']}, 'results': {'bindings': [{'rhea': {'type': 'uri', 'value': 'http://rdf.rhea-db.org/10000'}, 'equation': {'type': 'literal', 'value': 'H2O + pentanamide = NH4(+) + pentanoate'}}, {'rhea': {'type': 'uri', 'value': 'http://rdf.rhea-db.org/10004'}, 'equation': {'type': 'literal', 'value': 'benzyl isothiocyanate = benzyl thiocyanate'}}, {'rhea': {'type': 'uri', 'value': 'http://rdf.rhea-db.org/10008'}, 'equation': {'type': 'literal', 'value': '[protein]-dithiol + a hydroperoxide = [protein]-disulfide + an alcohol + H2O'}}, {'rhea': {'type': 'uri', 'value': 'http://rdf.rhea-db.org/10012'}, 'equation': {'type': 'literal', 'value': '(R)-6-hydroxynicotine + H2O + O2 = 6-hydroxypseudooxynicotine + H2O2'}}, {'rhea': {'type': 'uri', 'value': 'http://rdf.rhea-db.org/10016'}, 'equation': {'type': 'literal', 'value': 'H2O + O-sinapoylcholine = (E)-sinapate + choline + H(+)'}}]}}


**JSON** stands for **J**ava**S**cript **O**bject **N**otation (https://www.w3schools.com/whatis/whatis_json.asp)  

JSON is a lightweight format for storing and transporting data.  
JSON is often used when data is sent from a server to a web page.  

JSON is "self-describing" and easy to understand.  

In [3]:
res.keys()

dict_keys(['head', 'results'])

In [4]:
res['head']

{'vars': ['rhea', 'equation']}

In [5]:
res['results']

{'bindings': [{'rhea': {'type': 'uri',
    'value': 'http://rdf.rhea-db.org/10000'},
   'equation': {'type': 'literal',
    'value': 'H2O + pentanamide = NH4(+) + pentanoate'}},
  {'rhea': {'type': 'uri', 'value': 'http://rdf.rhea-db.org/10004'},
   'equation': {'type': 'literal',
    'value': 'benzyl isothiocyanate = benzyl thiocyanate'}},
  {'rhea': {'type': 'uri', 'value': 'http://rdf.rhea-db.org/10008'},
   'equation': {'type': 'literal',
    'value': '[protein]-dithiol + a hydroperoxide = [protein]-disulfide + an alcohol + H2O'}},
  {'rhea': {'type': 'uri', 'value': 'http://rdf.rhea-db.org/10012'},
   'equation': {'type': 'literal',
    'value': '(R)-6-hydroxynicotine + H2O + O2 = 6-hydroxypseudooxynicotine + H2O2'}},
  {'rhea': {'type': 'uri', 'value': 'http://rdf.rhea-db.org/10016'},
   'equation': {'type': 'literal',
    'value': 'H2O + O-sinapoylcholine = (E)-sinapate + choline + H(+)'}}]}
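
The nesting shown above follows the SPARQL JSON results layout: each entry of `bindings` maps a variable name to a small dict holding its `type` and `value`. Here is a minimal sketch of flattening that structure with plain Python. `sample` is a hand-copied excerpt of the output above, and `bindings_to_rows` is a hypothetical helper.

```python
def bindings_to_rows(result):
    """Turn a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # A variable may be absent from a binding (e.g. an unmatched OPTIONAL).
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in variables})
    return rows

# Excerpt of the result shown above (first binding only).
sample = {
    "head": {"vars": ["rhea", "equation"]},
    "results": {"bindings": [
        {"rhea": {"type": "uri", "value": "http://rdf.rhea-db.org/10000"},
         "equation": {"type": "literal",
                      "value": "H2O + pentanamide = NH4(+) + pentanoate"}},
    ]},
}

for row in bindings_to_rows(sample):
    print(row["rhea"], "|", row["equation"])
```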

## Store results in a pandas DataFrame

In [6]:
print('SPARQL query result (pandas df):')
json_normalize(res["results"]["bindings"])

SPARQL query result (pandas df):


Unnamed: 0,rhea.type,rhea.value,equation.type,equation.value
0,uri,http://rdf.rhea-db.org/10000,literal,H2O + pentanamide = NH4(+) + pentanoate
1,uri,http://rdf.rhea-db.org/10004,literal,benzyl isothiocyanate = benzyl thiocyanate
2,uri,http://rdf.rhea-db.org/10008,literal,[protein]-dithiol + a hydroperoxide = [protein]-disulfide + an alcohol + H2O
3,uri,http://rdf.rhea-db.org/10012,literal,(R)-6-hydroxynicotine + H2O + O2 = 6-hydroxypseudooxynicotine + H2O2
4,uri,http://rdf.rhea-db.org/10016,literal,H2O + O-sinapoylcholine = (E)-sinapate + choline + H(+)


## Define a function to run a SPARQL query and process its results 

### sparql2pandas function

In [7]:
#sparql_uniprot_url = "https://sparql.uniprot.org/sparql/"
sparql_rhea_url = "https://sparql.rhea-db.org/sparql"

def sparql2pandas(sparqlQuery, sparql_service_url):
    """
    Query a SPARQL endpoint with a given query string and return the results as a processed pandas DataFrame.
    """
    sparql=SPARQLWrapper(sparql_service_url)
    sparql.setQuery(sparqlQuery)
    sparql.setReturnFormat(JSON)

    # run the SPARQL query
    res = sparql.query().convert()
    # convert the JSON result into a pandas DataFrame
    res_sparql_df = json_normalize(res["results"]["bindings"])

    # distinguish .type and .datatype columns
    col_type = [c for c in res_sparql_df.columns if c.endswith(".type")]
    col_datatype = [c for c in res_sparql_df.columns if c.endswith(".datatype")]

    # Remove prefix part from URI
    # (each .type column is paired with its .value column by name, not by position)
    for type_col in col_type:
        value_col = type_col[:-len(".type")] + ".value"
        if 'uri' in res_sparql_df[type_col].unique().tolist():
            res_sparql_df[value_col] = res_sparql_df[value_col].str.split(pat='/').str.get(-1)

    # Remove .type columns
    res_sparql_df.drop(col_type,axis=1,inplace=True)
    # Remove .datatype columns
    res_sparql_df.drop(col_datatype,axis=1,inplace=True)

    # Remove ".value" from column names
    res_sparql_df = res_sparql_df.rename(columns = lambda col: col.replace(".value", ""))

    return res_sparql_df
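
The URI-shortening step of `sparql2pandas` keeps only the text after the last `/`. The same rule is shown below on single strings as a minimal sketch; `local_name` is a hypothetical helper introduced for illustration.

```python
def local_name(uri):
    """Return the part of a URI after its last '/'."""
    return uri.rsplit("/", 1)[-1]

print(local_name("http://rdf.rhea-db.org/10000"))
print(local_name("http://purl.obolibrary.org/obo/CHEBI_15377"))
```

Note that this rule does not split on `#`, so a URI such as `http://www.w3.org/2000/01/rdf-schema#label` would be shortened to `rdf-schema#label`.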

## Re-run our simple query



In [8]:
# define and print sparql_Q1

Q1="""
#endpoint: https://sparql.rhea-db.org/sparql
#query Q1: retrieve all Rhea reactions and their chemical equations.

PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

SELECT ?rhea ?equation 
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
}
"""

print(Q1)


#endpoint: https://sparql.rhea-db.org/sparql
#query Q1: retrieve all Rhea reactions and their chemical equations.

PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

SELECT ?rhea ?equation 
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
}



In [9]:
# Execute query Q1
try:
    df  = sparql2pandas(Q1,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query Q1')
    print(e)
    df = pd.DataFrame()

print('Q1 results: df.shape',df.shape)
# Display the first 5 rows
df.head(5)

Q1 results: df.shape (15453, 2)


Unnamed: 0,rhea,equation
0,10000,H2O + pentanamide = NH4(+) + pentanoate
1,10004,benzyl isothiocyanate = benzyl thiocyanate
2,10008,[protein]-dithiol + a hydroperoxide = [protein]-disulfide + an alcohol + H2O
3,10012,(R)-6-hydroxynicotine + H2O + O2 = 6-hydroxypseudooxynicotine + H2O2
4,10016,H2O + O-sinapoylcholine = (E)-sinapate + choline + H(+)


Compare the results with the Rhea web site:  
* https://www.rhea-db.org/rhea?query=  

<div>
<img src="Image/rhea_web_browse.png" alt="Rhea equation" width="60%" height="60%" border="1" />
</div>



# Rhea reaction, status, transport

<div>
<img src="Image/rhea_status_istransport.png" alt="Rhea equation" width="80%" height="80%" border="1" />
</div>


In [10]:
# define and print Q2

Q2="""
#endpoint: https://sparql.rhea-db.org/sparql
#query Q2: Retrieve Rhea reaction, equation, status, isTransport
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

SELECT ?rhea
       ?equation
       ?status
       ?isTransport
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
  ?rhea rh:status ?status .
  ?rhea rh:isTransport ?isTransport .
}
"""

print(Q2)


#endpoint: https://sparql.rhea-db.org/sparql
#query Q2: Retrieve Rhea reaction, equation, status, isTransport
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

SELECT ?rhea
       ?equation
       ?status
       ?isTransport
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
  ?rhea rh:status ?status .
  ?rhea rh:isTransport ?isTransport .
}



In [11]:
# Execute Q2
try:
    df  = sparql2pandas(Q2,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query Q2')
    print(e)
    # avoid displaying stale results from a previous query
    df = pd.DataFrame()

print('Q2 results: df.shape',df.shape)
# Display the first 5 rows
df.head(5)

Q2 results: df.shape (15453, 4)


Unnamed: 0,rhea,isTransport,status,equation
0,10000,False,Approved,H2O + pentanamide = NH4(+) + pentanoate
1,10004,False,Approved,benzyl isothiocyanate = benzyl thiocyanate
2,10008,False,Approved,[protein]-dithiol + a hydroperoxide = [protein]-disulfide + an alcohol + H2O
3,10012,False,Approved,(R)-6-hydroxynicotine + H2O + O2 = 6-hydroxypseudooxynicotine + H2O2
4,10016,False,Approved,H2O + O-sinapoylcholine = (E)-sinapate + choline + H(+)


In [12]:
# Check how many Rhea reactions are approved and preliminary
df.status.value_counts()

Approved       15426
Preliminary       27
Name: status, dtype: int64

In [13]:
# Check how many Rhea reactions are transport
df.isTransport.value_counts()

false    14197
true      1256
Name: isTransport, dtype: int64
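
The same kind of count can in principle be computed by the endpoint instead of in pandas. Here is a sketch using SPARQL 1.1 aggregation (`COUNT` with `GROUP BY`). The query is an assumption built from the predicates used in Q2 and has not been run for this tutorial; `Q2_COUNTS` is a name introduced here.

```python
# Hypothetical variant of Q2: count reactions per status on the server side.
Q2_COUNTS = """
#endpoint: https://sparql.rhea-db.org/sparql
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>

SELECT ?status (COUNT(DISTINCT ?rhea) AS ?reactionCount)
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:status ?status .
}
GROUP BY ?status
ORDER BY DESC(?reactionCount)
"""
print(Q2_COUNTS)
```

It could be passed to `sparql2pandas(Q2_COUNTS, sparql_rhea_url)` like the other queries. Aggregating on the server avoids downloading one row per reaction when only the totals are needed.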

# Rhea reaction, reaction sides and reaction participants

In this part of the tutorial, we will learn how to search Rhea by individual reaction participants.

**Reaction sides and participants:**  
<div>
<img src="Image/rhea_reaction_side_participant.png" alt="Rhea reaction, side and participants" width="70%" height="70%" border="1" />
</div>

**Reaction participants:**  
<div>
<img src="Image/rhea_reaction_participant_01.png" alt="Rhea participants" width="70%" height="70%" border="1" />
</div>

**Macromolecule as reaction participant:**  
<div>
<img src="Image/rhea_reaction_participant_02.png" alt="Rhea participant: macromolecule" width="70%" height="70%" border="1" />
</div>



## Q4: Retrieve Rhea reactions, sides, participants and their ChEBI 



In [14]:
Q4="""
#endpoint:https://sparql.rhea-db.org/sparql
#query sparql_Q4: Retrieve Rhea reactions, reaction sides, participants and ChEBI

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>

SELECT distinct ?rhea
                ?reactionSide
                ?compoundClass
                ?accession
                ?chebi

WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .

  ?rhea rh:side ?reactionSide .
  ?reactionSide rh:contains ?participant .
  ?participant rh:compound ?compound .

  # Reaction participant accession
  ?compound rh:accession ?accession .

  VALUES ?compoundClass {
    rh:SmallMolecule
    rh:GenericPolypeptide
    rh:GenericPolynucleotide
    rh:Polymer
  }
  # Reaction participant type (compoundClass)
  ?compound rdfs:subClassOf ?compoundClass .
  
  # ChEBI participant
  ?compound rh:chebi | rh:reactivePart/rh:chebi | rh:underlyingChebi ?chebi .
}
ORDER BY ?rhea
"""
print(Q4)


#endpoint:https://sparql.rhea-db.org/sparql
#query sparql_Q4: Retrieve Rhea reactions, reaction sides, participants and ChEBI

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>

SELECT distinct ?rhea
                ?reactionSide
                ?compoundClass
                ?accession
                ?chebi

WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .

  ?rhea rh:side ?reactionSide .
  ?reactionSide rh:contains ?participant .
  ?participant rh:compound ?compound .

  # Reaction participant accession
  ?compound rh:accession ?accession .

  VALUES ?compoundClass {
    rh:SmallMolecule
    rh:GenericPolypeptide
    rh:GenericPolynucleotide
    rh:Polymer
  }
  # Reaction participant type (compoundClass)
  ?compound rdfs:subClassOf ?compoundClass .
  
  # ChEBI participant
  ?compound rh:chebi | rh:reactivePart/rh:chebi | rh:underlyingChebi ?chebi .
}
ORDER BY ?rhea



In [15]:
try:
    df  = sparql2pandas(Q4,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query Q4')
    print(e)
    # avoid displaying stale results from a previous query
    df = pd.DataFrame()

print('Q4 results: df.shape',df.shape)
# Display the first 10 results
df.head(10)

Q4 results: df.shape (75381, 5)


Unnamed: 0,rhea,compoundClass,chebi,reactionSide,accession
0,10000,SmallMolecule,CHEBI_15377,10000_L,CHEBI:15377
1,10000,SmallMolecule,CHEBI_16459,10000_L,CHEBI:16459
2,10000,SmallMolecule,CHEBI_28938,10000_R,CHEBI:28938
3,10000,SmallMolecule,CHEBI_31011,10000_R,CHEBI:31011
4,10004,SmallMolecule,CHEBI_16017,10004_R,CHEBI:16017
5,10004,SmallMolecule,CHEBI_17484,10004_L,CHEBI:17484
6,10008,SmallMolecule,CHEBI_15377,10008_R,CHEBI:15377
7,10008,SmallMolecule,CHEBI_30879,10008_R,CHEBI:30879
8,10008,SmallMolecule,CHEBI_35924,10008_L,CHEBI:35924
9,10008,GenericPolypeptide,CHEBI_50058,10008_R,GENERIC:10593


In [34]:
# Unique ChEBIs per compoundClass
df[['compoundClass','chebi']].drop_duplicates().compoundClass.value_counts()

SmallMolecule            10970
GenericPolypeptide         701
GenericPolynucleotide      251
Polymer                    176
Name: compoundClass, dtype: int64

In [35]:
# Number of Rhea reactions involving proteins (compoundClass == GenericPolypeptide)
lst_rhea_prot = df[df.compoundClass=='GenericPolypeptide'].rhea.unique()
print('#rhea involving proteins as reaction participants =',len(lst_rhea_prot))

#rhea involving proteins as reaction participants = 2360


The result differs slightly from the Rhea website ([https://www.rhea-db.org/rhea?query=](https://www.rhea-db.org/rhea?query=)) with the filter "reaction involving proteins" applied.  

Reason: the few proteins with a polymer as a functional residue have compoundClass == Polymer, so they are not counted here.  

Example: [https://www.rhea-db.org/rhea/16213](https://www.rhea-db.org/rhea/16213)  


## Q5: Retrieve all reactions using L-glutamate (CHEBI:29985) AND L-glutamine (CHEBI:58359) on opposite reaction sides



In [16]:
chebi1 = '29985'
chebi2 = '58359'

Q5="""
#endpoint:https://sparql.rhea-db.org/sparql
#query Q5: Retrieve reactions using
#          L-glutamate (CHEBI:29985) AND L-glutamine (CHEBI:58359)
#          in opposite reaction sides.

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>


SELECT distinct ?rhea ?equation
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .

  ?rhea rh:side ?side1 .
  ?side1 rh:contains ?participant1 .
  ?participant1 rh:compound ?compound1 .
  ?compound1 rh:chebi CHEBI:"""+chebi1+""" .

  ?rhea rh:side ?side2 .
  ?side2 rh:contains ?participant2 .
  ?participant2 rh:compound ?compound2 .
  ?compound2 rh:chebi CHEBI:""" +chebi2+ """ .
  
  # we want the two sides belonging to the same reaction
  ?side1 rh:transformableTo ?side2 .
}"""

print(Q5)


#endpoint:https://sparql.rhea-db.org/sparql
#query Q5: Retrieve reactions using
#          L-glutamate (CHEBI:29985) AND L-glutamine (CHEBI:58359)
#          in opposite reaction sides.

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>


SELECT distinct ?rhea ?equation
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .

  ?rhea rh:side ?side1 .
  ?side1 rh:contains ?participant1 .
  ?participant1 rh:compound ?compound1 .
  ?compound1 rh:chebi CHEBI:29985 .

  ?rhea rh:side ?side2 .
  ?side2 rh:contains ?participant2 .
  ?participant2 rh:compound ?compound2 .
  ?compound2 rh:chebi CHEBI:58359 .
  
  # we want the two sides belonging to the same reaction
  ?side1 rh:transformableTo ?side2 .
}


In [17]:
try:
    df  = sparql2pandas(Q5,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query Q5')
    print(e)
    # avoid displaying stale results from a previous query
    df = pd.DataFrame()

print('Q5 results: df.shape',df.shape)
# Display result
df

Q5 results: df.shape (31, 2)


Unnamed: 0,rhea,equation
0,11672,chorismate + L-glutamine = 4-amino-4-deoxychorismate + L-glutamate
1,11680,ATP + H2O + L-glutamine + XMP = AMP + diphosphate + GMP + 2 H(+) + L-glutamate
2,12128,2 L-glutamate + 2 oxidized [2Fe-2S]-[ferredoxin] = 2-oxoglutarate + 2 H(+) + L-glutamine + 2 reduced [2Fe-2S]-[ferredoxin]
3,12228,ATP + H2O + L-aspartate + L-glutamine = AMP + diphosphate + H(+) + L-asparagine + L-glutamate
4,12544,"2 ATP + 2 H2O + hydrogenobyrinate + 2 L-glutamine = 2 ADP + 2 H(+) + hydrogenobyrinate a,c-diamide + 2 L-glutamate + 2 phosphate"
5,13237,D-fructose 6-phosphate + L-glutamine = D-glucosamine 6-phosphate + L-glutamate
6,13753,2 L-glutamate + NAD(+) = 2-oxoglutarate + H(+) + L-glutamine + NADH
7,14513,ATP + H2O + L-aspartyl-tRNA(Asn) + L-glutamine = ADP + 2 H(+) + L-asparaginyl-tRNA(Asn) + L-glutamate + phosphate
8,14905,5-phospho-beta-D-ribosylamine + diphosphate + L-glutamate = 5-phospho-alpha-D-ribose 1-diphosphate + H2O + L-glutamine
9,15501,2 L-glutamate + NADP(+) = 2-oxoglutarate + H(+) + L-glutamine + NADPH


### <span style='color:blue'>Exercise: Modify the query to only retrieve the transport reactions</span>



In [23]:
QEx2="""
#endpoint:https://sparql.rhea-db.org/sparql
#query Q6: Retrieve all approved reactions using 
#          L-glutamate (CHEBI:29985) AND L-glutamine (CHEBI:58359)
#          in opposite reaction sides.

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>


SELECT distinct ?rhea ?equation
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .

  ?rhea rh:side ?side1 .
  ?side1 rh:contains ?participant1 .
  ?participant1 rh:compound ?compound1 .
  ?compound1 rh:chebi CHEBI:"""+chebi1+""" .

  ?rhea rh:side ?side2 .
  ?side2 rh:contains ?participant2 .
  ?participant2 rh:compound ?compound2 .
  ?compound2 rh:chebi CHEBI:""" +chebi2+ """ .
  
  # we want the two sides belonging to the same reaction
  ?side1 rh:transformableTo ?side2 .
  
  # RESTRICT TO TRANSPORT REACTION
  ?rhea rh:isTransport TRUE .
}"""

print(QEx2)


#endpoint:https://sparql.rhea-db.org/sparql
#query Q6: Retrieve all approved reactions using 
#          L-glutamate (CHEBI:29985) AND L-glutamine (CHEBI:58359)
#          in opposite reaction sides.

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>


SELECT distinct ?rhea ?equation
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .

  ?rhea rh:side ?side1 .
  ?side1 rh:contains ?participant1 .
  ?participant1 rh:compound ?compound1 .
  ?compound1 rh:chebi CHEBI:29985 .

  ?rhea rh:side ?side2 .
  ?side2 rh:contains ?participant2 .
  ?participant2 rh:compound ?compound2 .
  ?compound2 rh:chebi CHEBI:58359 .
  
  # we want the two sides belonging to the same reaction
  ?side1 rh:transformableTo ?side2 .
  
  # RESTRICT TO TRANSPORT REACTION
  ?rhea rh:isTransport TRUE .
}


In [24]:
try:
    df  = sparql2pandas(QEx2,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query QEx2')
    print(e)
    # avoid displaying stale results from a previous query
    df = pd.DataFrame()

In [25]:
# Display result
df

Unnamed: 0,rhea,equation
0,70883,H(+)(out) + L-glutamate(out) + L-glutamine(in) + Na(+)(out) = H(+)(in) + L-glutamate(in) + L-glutamine(out) + Na(+)(in)


# Structural search: search by InChIKey

<div>
<img src="Image/chemical_data_formats_01.png" alt="chemical_data_formats_01" width="70%" height="70%" border="1" />
</div>

**InChI and InChIKey:**  

<div>
<img src="Image/inchikey.png" alt="What is an InChIKey?" width="70%" height="70%" border="1" />
</div>

**WARNING: only small molecules with fully defined structure have an InChIKey:**  

<div>
<img src="Image/chemical_data_formats_02.png" alt="chemical_data_formats_02" width="70%" height="70%" border="1" />
</div>



## Search by full InChIKey

In [26]:
# CHEBI:29985 L-glutamate InChIKey
inchikey = 'WHUUTDBJXJRKMK-VKHMYHEASA-M'
Q7="""
#endpoint:https://sparql.rhea-db.org/sparql
#query Q7: Search ChEBI by InChiKey

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX chebislash:<http://purl.obolibrary.org/obo/chebi/>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?chebi
                ?chebiName
                ?chebiUniprotName
WHERE {
  ?chebi rdfs:label ?chebiName .
  # retrieve UniProt synonym (if it exists)
  OPTIONAL{?chebi up:name ?chebiUniprotName .}
  ?chebi chebislash:inchikey ?inchikey .
  VALUES (?inchikey) {('""" + inchikey + """')}
}
"""
print(Q7)


#endpoint:https://sparql.rhea-db.org/sparql
#query Q7: Search ChEBI by InChiKey

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX chebislash:<http://purl.obolibrary.org/obo/chebi/>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?chebi
                ?chebiName
                ?chebiUniprotName
WHERE {
  ?chebi rdfs:label ?chebiName .
  # retrieve UniProt synonym (if it exists)
  OPTIONAL{?chebi up:name ?chebiUniprotName .}
  ?chebi chebislash:inchikey ?inchikey .
  VALUES (?inchikey) {('WHUUTDBJXJRKMK-VKHMYHEASA-M')}
}



In [27]:
try:
    df  = sparql2pandas(Q7,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query Q7')
    print(e)
    raise

# Display result
df

Unnamed: 0,chebi,chebiName,chebiUniprotName
0,CHEBI_29985,L-glutamate(1-),L-glutamate


## Search by partial InChIKey (relax charge restriction)

In [30]:
# CHEBI:29985 L-glutamate InChIKey

# modify the inchikey to relax charge restriction
#inchikey = 'WHUUTDBJXJRKMK-VKHMYHEASA-M'
inchikey = 'WHUUTDBJXJRKMK'

Q8="""
#endpoint:https://sparql.rhea-db.org/sparql
#query Q8b: Search ChEBI by InChiKey

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX chebislash:<http://purl.obolibrary.org/obo/chebi/>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?inchikey
                ?chebi
                ?chebiName
                ?chebiUniprotName
WHERE {
  ?chebi rdfs:label ?chebiName .
  OPTIONAL{?chebi up:name ?chebiUniprotName .}
  ?chebi chebislash:inchikey ?inchikey .
  FILTER regex(str(?inchikey), '""" + inchikey + """') .
}
"""
print(Q8)


#endpoint:https://sparql.rhea-db.org/sparql
#query Q8b: Search ChEBI by InChiKey

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX chebislash:<http://purl.obolibrary.org/obo/chebi/>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?inchikey
                ?chebi
                ?chebiName
                ?chebiUniprotName
WHERE {
  ?chebi rdfs:label ?chebiName .
  OPTIONAL{?chebi up:name ?chebiUniprotName .}
  ?chebi chebislash:inchikey ?inchikey .
  FILTER regex(str(?inchikey), 'WHUUTDBJXJRKMK') .
}



In [31]:
try:
    df  = sparql2pandas(Q8,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query Q8')
    print(e)
    raise

# Display result
df

Unnamed: 0,inchikey,chebi,chebiName,chebiUniprotName
0,WHUUTDBJXJRKMK-UHFFFAOYSA-M,CHEBI_14321,glutamate(1-),glutamate
1,WHUUTDBJXJRKMK-GSVOUGTGSA-N,CHEBI_15966,D-glutamic acid,
2,WHUUTDBJXJRKMK-VKHMYHEASA-N,CHEBI_16015,L-glutamic acid,
3,WHUUTDBJXJRKMK-UHFFFAOYSA-N,CHEBI_18237,glutamic acid,
4,WHUUTDBJXJRKMK-VKHMYHEASA-M,CHEBI_29985,L-glutamate(1-),L-glutamate
5,WHUUTDBJXJRKMK-GSVOUGTGSA-M,CHEBI_29986,D-glutamate(1-),D-glutamate
6,WHUUTDBJXJRKMK-UHFFFAOYSA-L,CHEBI_29987,glutamate(2-),
7,WHUUTDBJXJRKMK-VKHMYHEASA-L,CHEBI_29988,L-glutamate(2-),
8,WHUUTDBJXJRKMK-GSVOUGTGSA-L,CHEBI_29989,D-glutamate(2-),
9,WHUUTDBJXJRKMK-UXXIZXEISA-N,CHEBI_76051,"glutamic acid-2,3,3,4,4-d5",


## <span style="color:blue">Exercise: relax constraint on stereochemistry</span>

In [34]:
# CHEBI:29985 L-glutamate InChIKey

# modify the inchikey to relax charge and stereochemistry restrictions
inchikey = 'WHUUTDBJXJRKMK'

QEx3="""
#endpoint:https://sparql.rhea-db.org/sparql
#query Q8c: Search ChEBI by InChiKey

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX chebislash:<http://purl.obolibrary.org/obo/chebi/>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?inchikey
                ?chebi
                ?chebiName
                ?chebiUniprotName 
WHERE {
  ?chebi rdfs:label ?chebiName .
  OPTIONAL{?chebi up:name ?chebiUniprotName .}
  ?chebi chebislash:inchikey ?inchikey .
  FILTER regex(str(?inchikey), '""" + inchikey + """') .
}
"""
print(QEx3)


#endpoint:https://sparql.rhea-db.org/sparql
#query Q8c: Search ChEBI by InChiKey

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX chebislash:<http://purl.obolibrary.org/obo/chebi/>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?inchikey
                ?chebi
                ?chebiName
                ?chebiUniprotName 
WHERE {
  ?chebi rdfs:label ?chebiName .
  OPTIONAL{?chebi up:name ?chebiUniprotName .}
  ?chebi chebislash:inchikey ?inchikey .
  FILTER regex(str(?inchikey), 'WHUUTDBJXJRKMK') .
}



In [35]:
try:
    df  = sparql2pandas(QEx3,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query QEx3')
    print(e)
    raise

print('QEx3 results: df.shape',df.shape)

# Display result
df.sort_values('chebiName')

QEx3 results: df.shape (11, 4)


Unnamed: 0,inchikey,chebi,chebiName,chebiUniprotName
5,WHUUTDBJXJRKMK-GSVOUGTGSA-M,CHEBI_29986,D-glutamate(1-),D-glutamate
8,WHUUTDBJXJRKMK-GSVOUGTGSA-L,CHEBI_29989,D-glutamate(2-),
1,WHUUTDBJXJRKMK-GSVOUGTGSA-N,CHEBI_15966,D-glutamic acid,
4,WHUUTDBJXJRKMK-VKHMYHEASA-M,CHEBI_29985,L-glutamate(1-),L-glutamate
7,WHUUTDBJXJRKMK-VKHMYHEASA-L,CHEBI_29988,L-glutamate(2-),
2,WHUUTDBJXJRKMK-VKHMYHEASA-N,CHEBI_16015,L-glutamic acid,
10,WHUUTDBJXJRKMK-NKXUJHECSA-N,CHEBI_192079,L-glutamic acid-d5,
0,WHUUTDBJXJRKMK-UHFFFAOYSA-M,CHEBI_14321,glutamate(1-),glutamate
6,WHUUTDBJXJRKMK-UHFFFAOYSA-L,CHEBI_29987,glutamate(2-),
3,WHUUTDBJXJRKMK-UHFFFAOYSA-N,CHEBI_18237,glutamic acid,


# ChEBI hierarchy

<div>
<img src="Image/rhea_chebi_hierarchy.png" alt="Search through ChEBI hierarchy" width="80%" height="80%" />
</div>


<div>
<img src="Image/rhea_chebi_classification.png" alt="We take advantage of ChEBI classification" width="80%" height="80%" />
</div>


## Q9: Select all reactions with CHEBI:31488 (N-acylsphinganine) or one of its descendants

In [43]:
chebi = '31488'
Q9="""
#endpoint:https://sparql.rhea-db.org/sparql
#query Q9: Select Rhea reactions using CHEBI:31488 (N-acylsphinganine) or one of its descendant as reaction participant

PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?chebi ?chebiUniprotName ?rhea ?equation 
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
  ?rhea rh:side/rh:contains/rh:compound/rh:chebi ?chebi .
  ?chebi rdfs:subClassOf CHEBI:""" + chebi + """ .
  ?chebi up:name ?chebiUniprotName .
}
ORDER BY ?chebi
"""
print(Q9)


#endpoint:https://sparql.rhea-db.org/sparql
#query Q9: Select Rhea reactions using CHEBI:31488 (N-acylsphinganine) or one of its descendant as reaction participant

PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?chebi ?chebiUniprotName ?rhea ?equation 
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
  ?rhea rh:side/rh:contains/rh:compound/rh:chebi ?chebi .
  ?chebi rdfs:subClassOf CHEBI:31488 .
  ?chebi up:name ?chebiUniprotName .
}
ORDER BY ?chebi



In [45]:
try:
    df  = sparql2pandas(Q9,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query Q9')
    print(e)
    raise

print('Q9 results: df.shape',df.shape)
# Display up to the first 50 rows
df.head(50)

Q9 results: df.shape (27, 4)


|  | chebi | chebiUniprotName | rhea | equation |
| ----: | :---- | :------ | ----: | :------ |
| 0 | CHEBI_149661 | N-(13Z-docosenoyl)-sphinganine | 64048 | (13Z)-docosenoyl-CoA + sphinganine = CoA + H(+) + N-(13Z-docosenoyl)-sphinganine |
| 1 | CHEBI_52962 | N-hexacosanoylsphinganine | 33599 | 2 Fe(II)-[cytochrome b5] + 2 H(+) + N-hexacosanoylsphinganine + O2 = 2 Fe(III)-[cytochrome b5] + H2O + N-hexacosanoyl-(4R)-hydroxysphinganine |
| 2 | CHEBI_52962 | N-hexacosanoylsphinganine | 33603 | H(+) + N-hexacosanoylsphinganine + NADPH + O2 = H2O + N-(2-hydroxyhexacosanoyl)-sphinganine + NADP(+) |
| 3 | CHEBI_52962 | N-hexacosanoylsphinganine | 33719 | a 1,2-diacyl-sn-glycero-3-phospho-(1D-myo-inositol) + N-hexacosanoylsphinganine = a 1,2-diacyl-sn-glycerol + N-(hexacosanoyl)-sphinganine-1-(1D-myo-inositol) |
| 4 | CHEBI_52962 | N-hexacosanoylsphinganine | 33351 | hexacosanoyl-CoA + sphinganine = CoA + H(+) + N-hexacosanoylsphinganine |
| 5 | CHEBI_67021 | N-docosanoylsphinganine | 36535 | docosanoyl-CoA + sphinganine = CoA + H(+) + N-docosanoylsphinganine |
| 6 | CHEBI_67027 | N-eicosanoylsphinganine | 36555 | eicosanoyl-CoA + sphinganine = CoA + H(+) + N-eicosanoylsphinganine |
| 7 | CHEBI_67033 | N-(octadecanoyl)-sphinganine | 45008 | H2O + N-(octadecanoyl)-sphinganine = octadecanoate + sphinganine |
| 8 | CHEBI_67033 | N-(octadecanoyl)-sphinganine | 36547 | octadecanoyl-CoA + sphinganine = CoA + H(+) + N-(octadecanoyl)-sphinganine |
| 9 | CHEBI_67042 | N-hexadecanoylsphinganine | 41796 | a 1,2-diacyl-sn-glycero-3-phosphocholine + N-hexadecanoylsphinganine = a 1,2-diacyl-sn-glycerol + N-hexadecanoyl-sphinganine-1-phosphocholine |
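
`sparql2pandas` is defined earlier in this notebook and is not repeated here. Purely as an illustration, the sketch below flattens a response in the standard SPARQL 1.1 JSON results format into a DataFrame. The function name `bindings_to_frame` and the sample payload are hypothetical, and the notebook's own helper may behave differently (for instance, the outputs above show shortened identifiers such as `CHEBI_52962` rather than full IRIs).

```python
import pandas as pd

def bindings_to_frame(payload):
    """Flatten a SPARQL 1.1 JSON results payload into a DataFrame (hypothetical helper)."""
    columns = payload["head"]["vars"]
    rows = [
        # A variable can be unbound in a given solution, so fall back to None.
        {var: binding[var]["value"] if var in binding else None for var in columns}
        for binding in payload["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=columns)

# Hand-written payload for illustration; not an actual endpoint response.
sample = {
    "head": {"vars": ["chebi", "chebiUniprotName", "rhea"]},
    "results": {"bindings": [
        {
            "chebi": {"type": "uri", "value": "http://purl.obolibrary.org/obo/CHEBI_52962"},
            "chebiUniprotName": {"type": "literal", "value": "N-hexacosanoylsphinganine"},
            "rhea": {"type": "uri", "value": "http://rdf.rhea-db.org/33351"},
        },
        {
            "chebi": {"type": "uri", "value": "http://purl.obolibrary.org/obo/CHEBI_67021"},
            "rhea": {"type": "uri", "value": "http://rdf.rhea-db.org/36535"},
        },
    ]},
}

frame = bindings_to_frame(sample)
print(frame.shape)
```

Keeping the column order from `head.vars` makes the frame layout independent of which variables happen to be bound in the first row.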


### Retrieve descendants of N-acylsphinganine used in Rhea

In [46]:
df[['chebi','chebiUniprotName']].drop_duplicates().sort_values('chebiUniprotName',ascending=False)

|  | chebi | chebiUniprotName |
| ----: | :---- | :------ |
| 17 | CHEBI_74160 | an N-tetracosenoylsphinganine |
| 21 | CHEBI_83247 | an N-(2-hydroxyacyl)-sphinganine |
| 25 | CHEBI_86265 | an N-(1,2-saturated acyl)sphinganine |
| 20 | CHEBI_82841 | N-octanoylsphinganine |
| 19 | CHEBI_76226 | N-hexanoyl-sphinganine |
| 9 | CHEBI_67042 | N-hexadecanoylsphinganine |
| 18 | CHEBI_74162 | N-hexacosenoylsphinganine |
| 1 | CHEBI_52962 | N-hexacosanoylsphinganine |
| 6 | CHEBI_67027 | N-eicosanoylsphinganine |
| 5 | CHEBI_67021 | N-docosanoylsphinganine |


# Search by (sub)structure 
[IDSM](https://idsm.elixir-czech.cz/) (**I**ntegrated **D**atabase of **S**mall **M**olecules) provides the Sachem chemical cartridge for fingerprint-guided substructure and similarity searches.  



## Q10: Retrieve the Rhea reactions that involve cholesterol or cholesterol derivatives

Perform a chemical substructure search using the IDSM/Sachem service.  
Input: SMILES

<div>
<img src="Image/cholesterol.png" alt="cholesterol" width="80%" height="80%" />
</div>





In [47]:
smiles = "C1[C@@]2([C@]3(CC[C@]4([C@]([C@@]3(CC=C2C[C@H](C1)O)[H])(CC[C@@]4([C@H](C)CCCC(C)C)[H])[H])C)[H])C"

Q10='''
#endpoint:https://sparql.rhea-db.org/sparql

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX sachem:<http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#>
PREFIX idsm:<https://idsm.elixir-czech.cz/sparql/endpoint/>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX rh:<http://rdf.rhea-db.org/>

SELECT DISTINCT ?chebi 
                ?chebiUniprotName 
                (count(?rhea) AS ?countRhea) 
WHERE {
  SERVICE idsm:chebi {
    ?chebi sachem:substructureSearch
    [ sachem:query "''' + smiles + '''" ] .
  }
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
  ?rhea rh:side/rh:contains/rh:compound/rh:chebi ?chebi .
  ?chebi up:name ?chebiUniprotName .
}
GROUP BY ?chebi  ?chebiUniprotName 
'''

print(Q10)


#endpoint:https://sparql.rhea-db.org/sparql

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX sachem:<http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#>
PREFIX idsm:<https://idsm.elixir-czech.cz/sparql/endpoint/>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX rh:<http://rdf.rhea-db.org/>

SELECT DISTINCT ?chebi 
                ?chebiUniprotName 
                (count(?rhea) AS ?countRhea) 
WHERE {
  SERVICE idsm:chebi {
    ?chebi sachem:substructureSearch
    [ sachem:query "C1[C@@]2([C@]3(CC[C@]4([C@]([C@@]3(CC=C2C[C@H](C1)O)[H])(CC[C@@]4([C@H](C)CCCC(C)C)[H])[H])C)[H])C" ] .
  }
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
  ?rhea rh:side/rh:contains/rh:compound/rh:chebi ?chebi .
  ?chebi up:name ?chebiUniprotName .
}
GROUP BY ?chebi  ?chebiUniprotName 



In [48]:
try:
    df  = sparql2pandas(Q10,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query Q10')
    print(e)
    raise  # re-raise so the cell stops here instead of using a stale df

print('Q10 results: df.shape',df.shape)

# Display the first 5 rows
df.head(5)

Q10 results: df.shape (88, 3)


|  | chebi | chebiUniprotName | countRhea |
| ----: | :---- | :------ | ----: |
| 0 | CHEBI_88756 | (6Z,9Z,12Z-octadecatrienoyl)-cholesterol | 1 |
| 1 | CHEBI_17703 | 26-hydroxycholesterol | 3 |
| 2 | CHEBI_180497 | 7-oxo-25-hydroxycholesterol | 1 |
| 3 | CHEBI_87653 | (25R)-3beta,26-dihydroxycholest-5-en-7-one | 2 |
| 4 | CHEBI_84341 | (9Z,12Z,15Z-octadecatrienoyl)-cholesterol | 2 |
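
To see which cholesterol derivatives take part in the most reactions, the counts can be ranked. Depending on how the results were converted, `countRhea` may arrive as strings, so this sketch converts it before sorting. The toy frame below simply reuses the five rows displayed above, with the counts written as strings on purpose.

```python
import pandas as pd

toy = pd.DataFrame({
    "chebi": ["CHEBI_88756", "CHEBI_17703", "CHEBI_180497", "CHEBI_87653", "CHEBI_84341"],
    "countRhea": ["1", "3", "1", "2", "2"],  # strings on purpose, to show the conversion
})

# Convert to numbers so that sorting is numeric rather than lexicographic.
toy["countRhea"] = pd.to_numeric(toy["countRhea"])
ranked = toy.sort_values("countRhea", ascending=False).reset_index(drop=True)
print(ranked)
```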


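# Retrieve stoichiometric coefficients

As the query below shows, the stoichiometric coefficient is not attached to the participant itself: each reaction side is linked to a participant through a sub-property of `rh:contains`, and that sub-property carries the `rh:coefficient` value.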


## Q5: Q4 + stoichiometric coefficient and type of participants

<div>
<img src="Image/rhea_reactionParticipant.png" alt=“rh:ReactionParticipant” width="80%" height="80%" border=1/>
</div>

<div>
<img src="Image/rhea_contains.png" alt=“rh:contains” width="80%" height="80%" border=1/>
</div>



In [49]:
sparql_Q5="""
#endpoint:https://sparql.rhea-db.org/sparql
#query sparql_Q5: Retrieve Rhea reactions, reaction sides, participants and ChEBI 

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>

SELECT distinct ?rhea
                ?reactionSide
                ?compoundClass
                ?accession
                ?coeff
                ?chebi

WHERE {
  VALUES ?compoundClass {
    rh:SmallMolecule
    rh:GenericPolypeptide
    rh:GenericPolynucleotide
    rh:Polymer
  }
  ?rhea rh:side ?reactionSide .
  ?reactionSide rh:contains ?participant .
  ?participant rh:compound ?compound .

  # accession
  ?compound rh:accession ?accession .

  # stoichiometric coefficient
  ?reactionSide ?contains ?participant .
  ?contains rdfs:subPropertyOf rh:contains .
  ?contains rh:coefficient ?coeff .

  # compound class
  ?compound rdfs:subClassOf ?compoundClass .

  # ChEBI participants
  {
    ?compound rh:chebi ?chebi .
  }
  UNION {
    ?compound rh:reactivePart/rh:chebi ?chebi .
  }
  UNION {
    ?compound rh:underlyingChebi ?chebi .
  }
}
"""

print(sparql_Q5)


#endpoint:https://sparql.rhea-db.org/sparql
#query sparql_Q5: Retrieve Rhea reactions, reaction sides, participants and ChEBI 

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rh:<http://rdf.rhea-db.org/>

SELECT distinct ?rhea
                ?reactionSide
                ?compoundClass
                ?accession
                ?coeff
                ?chebi

WHERE {
  VALUES ?compoundClass {
    rh:SmallMolecule
    rh:GenericPolypeptide
    rh:GenericPolynucleotide
    rh:Polymer
  }
  ?rhea rh:side ?reactionSide .
  ?reactionSide rh:contains ?participant .
  ?participant rh:compound ?compound .

  # accession
  ?compound rh:accession ?accession .

  # stoichiometric coefficient
  ?reactionSide ?contains ?participant .
  ?contains rdfs:subPropertyOf rh:contains .
  ?contains rh:coefficient ?coeff .

  # compound class
  ?compound rdfs:subClassOf ?compoundClass .

  # ChEBI participants
  {
    ?compound rh:chebi ?chebi .
  }
  UNION {
    ?compound rh:reactivePart/rh:chebi ?chebi .
  }
  UNION {
    ?compound rh:underlyingChebi ?chebi .
  }
}

In [50]:
try:
    df  = sparql2pandas(sparql_Q5,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query sparql_Q5')
    print(e)

# Display the first 5 rows
df.head(5)

|  | coeff | compoundClass | reactionSide | chebi | rhea | accession |
| ----: | ----: | :---- | :---- | :---- | ----: | :---- |
| 0 | 1 | SmallMolecule | 10000_L | CHEBI_15377 | 10000 | CHEBI:15377 |
| 1 | 1 | SmallMolecule | 10008_R | CHEBI_15377 | 10008 | CHEBI:15377 |
| 2 | 1 | SmallMolecule | 10012_L | CHEBI_15377 | 10012 | CHEBI:15377 |
| 3 | 1 | SmallMolecule | 10016_L | CHEBI_15377 | 10016 | CHEBI:15377 |
| 4 | 1 | SmallMolecule | 10020_L | CHEBI_15377 | 10020 | CHEBI:15377 |


### Process results

In [21]:
# Rhea ID as integer
df.rhea = df.rhea.astype(int)
# ChEBI ID as integer
df.chebi = df.chebi.str.replace('CHEBI_','').astype(int)

# Replace reactionSide accessions (e.g. 10000_L, 10000_R) with a leftSide flag (1 = left side, 0 = otherwise)
df['leftSide'] = np.where(df.reactionSide.str.split(pat='_').str.get(-1)  == 'L',1,0)
# remove reactionSide
df = df.drop(columns=['reactionSide']).copy()


print('df.shape',df.shape)

df.head()

df.shape (75381, 6)


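|  | coeff | compoundClass | chebi | rhea | accession | leftSide |
| ----: | ----: | :---- | ----: | ----: | :---- | ----: |
| 0 | 1 | SmallMolecule | 15377 | 10000 | CHEBI:15377 | 1 |
| 1 | 1 | SmallMolecule | 15377 | 10008 | CHEBI:15377 | 0 |
| 2 | 1 | SmallMolecule | 15377 | 10012 | CHEBI:15377 | 1 |
| 3 | 1 | SmallMolecule | 15377 | 10016 | CHEBI:15377 | 1 |
| 4 | 1 | SmallMolecule | 15377 | 10020 | CHEBI:15377 | 1 |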
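
The processing step above converts the identifiers but leaves `coeff` untouched. The sketch below assumes that some coefficients may be symbolic rather than numeric (for example `n` or `n+1` for polymer reactions; check this against the actual values in your results). Under that assumption, `pd.to_numeric` with `errors="coerce"` gives a numeric column and lets you flag the rows that need special handling. The toy data is made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "rhea": [10000, 10000, 99999],   # 99999 is a made-up reaction ID
    "coeff": ["1", "2", "n+1"],      # "n+1" stands in for a symbolic coefficient
})

# Non-numeric coefficients become NaN instead of raising an error.
toy["coeff_num"] = pd.to_numeric(toy["coeff"], errors="coerce")
# Flag the rows whose coefficient could not be parsed as a number.
toy["symbolic"] = toy["coeff_num"].isna()
print(toy)
```

Keeping the original `coeff` column alongside `coeff_num` preserves the symbolic values for later inspection.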


### Retrieve data for RHEA:10000

In [51]:
df[df.rhea==10000]

Unnamed: 0,coeff,compoundClass,reactionSide,chebi,rhea,accession
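
The empty result above still shows the `reactionSide` column, which suggests the filter ran on the raw query result rather than on the processed frame (the cell numbers indicate the cells were executed out of order). Assuming the raw identifiers are strings, comparing them with the integer `10000` matches nothing. The toy example below reproduces that behaviour and shows two ways around it.

```python
import pandas as pd

# Toy stand-in for the raw query result, where identifiers are strings.
raw = pd.DataFrame({"rhea": ["10000", "10000", "10008"], "coeff": ["1", "1", "1"]})

as_int = raw[raw.rhea == 10000]                 # string compared with int: no rows
as_str = raw[raw.rhea == "10000"]               # compare with a string instead
converted = raw[raw.rhea.astype(int) == 10000]  # or convert the column first
print(len(as_int), len(as_str), len(converted))
```

Re-running the processing cell before filtering has the same effect as the `astype(int)` variant.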


# Protonation states

<div>
<img src="Image/chebi_protonation_states.png" alt=“Protonation states in ChEBI” width="70%" height="70%" border=1/>
</div>

<div>
<img src="Image/SIB_compound_normalization.png" alt=“SIB compound normalization” width="70%" height="70%" border=1/>
</div>


In [52]:
# L-glutamic acid
chebi = '16015'
sparql_Q10="""
#endpoint:https://sparql.rhea-db.org/sparql
# Query 10
# Query: retrieve the protonation state used in Rhea
# 
# This query corresponds to the Rhea web site query :
# https://www.rhea-db.org/rhea?query=chebi:16015
#
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>
PREFIX chebihash: <http://purl.obolibrary.org/obo/chebi#>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?query ?query_label ?chebi ?chebi_label ?chebiUniprotName
WHERE {
  BIND(CHEBI:""" +chebi+""" AS ?query)
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:side/rh:contains/rh:compound ?compound .
  {
    ?compound rh:chebi ?query .
  }
  UNION
  {
    ?query rdfs:subClassOf ?chebiRestriction .
    ?chebiRestriction a owl:Restriction .
    ?chebiRestriction owl:onProperty chebihash:has_major_microspecies_at_pH_7_3 .
    ?chebiRestriction owl:someValuesFrom ?chebi .

    ?compound (rh:chebi|(rh:reactivePart/rh:chebi)|(rh:underlyingChebi/rh:chebi)) ?chebi .
  }
  ?chebi up:name ?chebiUniprotName .
  ?chebi rdfs:label ?chebi_label .
  ?query rdfs:label ?query_label .
}
"""

print(sparql_Q10)


#endpoint:https://sparql.rhea-db.org/sparql
# Query 10
# Query: retrieve the protonation state used in Rhea
# 
# This query corresponds to the Rhea web site query :
# https://www.rhea-db.org/rhea?query=chebi:16015
#
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>
PREFIX chebihash: <http://purl.obolibrary.org/obo/chebi#>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?query ?query_label ?chebi ?chebi_label ?chebiUniprotName
WHERE {
  BIND(CHEBI:16015 AS ?query)
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:side/rh:contains/rh:compound ?compound .
  {
    ?compound rh:chebi ?query .
  }
  UNION
  {
    ?query rdfs:subClassOf ?chebiRestriction .
    ?chebiRestriction a owl:Restriction .
    ?chebiRestriction owl:onProperty chebihash:has_major_microspecies_at_pH_7_3 .
    ?chebiRestriction owl:someValuesFrom ?chebi .

    ?compound (rh:chebi|(rh:reactivePart/rh:chebi)|(rh:underlyingChebi/rh:chebi)) ?chebi .
  }
  ?chebi up:name ?chebiUniprotName .
  ?chebi rdfs:label ?chebi_label .
  ?query rdfs:label ?query_label .
}

In [53]:
try:
    df  = sparql2pandas(sparql_Q10,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query sparql_Q10')
    print(e)
    raise  # re-raise so the cell stops here instead of using a stale df

print('df.shape',df.shape)
# Display result
df

df.shape (1, 5)


|  | query | query_label | chebi | chebiUniprotName | chebi_label |
| ----: | :---- | :---- | :---- | :---- | :---- |
| 0 | CHEBI_16015 | L-glutamic acid | CHEBI_29985 | L-glutamate | L-glutamate(1-) |


## <span style='color:blue'>Exercise: Retrieve all lipids used in Rhea \(hierarchical search \+ protonation states\)</span>

Tip: CHEBI:18059 (lipid)



In [60]:
# Put your code here



chebi = '18059'

sparql_QEx4="""
#endpoint:https://sparql.rhea-db.org/sparql
#query QEx4: Select Rhea reactions using a descendant of CHEBI:18059 (lipid) as reaction participant

PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?chebi ?chebiUniprotName ?rhea ?equation 
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
  ?rhea rh:side/rh:contains/rh:compound/rh:chebi ?chebi .
  ?chebi rdfs:subClassOf/rdfs:subClassOf/rdfs:subClassOf CHEBI:""" + chebi + """ .
  ?chebi up:name ?chebiUniprotName .
}
ORDER BY ?chebi
"""


print(sparql_QEx4)


#endpoint:https://sparql.rhea-db.org/sparql
#query QEx4: Select Rhea reactions using a descendant of CHEBI:18059 (lipid) as reaction participant

PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX CHEBI:<http://purl.obolibrary.org/obo/CHEBI_>
PREFIX up:<http://purl.uniprot.org/core/>

SELECT distinct ?chebi ?chebiUniprotName ?rhea ?equation 
WHERE {
  ?rhea rdfs:subClassOf rh:Reaction .
  ?rhea rh:equation ?equation .
  ?rhea rh:side/rh:contains/rh:compound/rh:chebi ?chebi .
  ?chebi rdfs:subClassOf/rdfs:subClassOf/rdfs:subClassOf CHEBI:18059 .
  ?chebi up:name ?chebiUniprotName .
}
ORDER BY ?chebi



In [61]:
try:
    df  = sparql2pandas(sparql_QEx4,sparql_rhea_url)
except Exception as e:
    print('ERROR in SPARQL query sparql_QEx4')
    print(e)
    raise  # re-raise so the cell stops here instead of using a stale df

# Display result
df

|  | chebi | chebiUniprotName | rhea | equation |
| ----: | :---- | :------ | ----: | :------ |
| 0 | CHEBI_10036 | a wax ester | 38443 | a fatty acyl-CoA + a long chain fatty alcohol = a wax ester + CoA |
| 1 | CHEBI_10036 | a wax ester | 13577 | a wax ester + H2O = a long chain fatty alcohol + a long-chain fatty acid + H(+) |
| 2 | CHEBI_11320 | 13-hydroxydocosanoate | 22316 | 13-hydroxydocosanoate + UDP-alpha-D-glucose = 13-(beta-D-glucosyloxy)docosanoate + H(+) + UDP |
| 3 | CHEBI_11851 | 3-methyl-2-oxobutanoate | 24809 | (2R)-2,3-dihydroxy-3-methylbutanoate = 3-methyl-2-oxobutanoate + H2O |
| 4 | CHEBI_11851 | 3-methyl-2-oxobutanoate | 11824 | (6R)-5,10-methylene-5,6,7,8-tetrahydrofolate + 3-methyl-2-oxobutanoate + H2O = (6S)-5,6,7,8-tetrahydrofolate + 2-dehydropantoate |
| ... | ... | ... | ... | ... |
| 1855 | CHEBI_91294 | 12,18-dihydroxyoctadecanoate | 49376 | 12-hydroxyoctadecanoate + O2 + reduced [NADPH--hemoprotein reductase] = 12,18-dihydroxyoctadecanoate + H(+) + H2O + oxidized [NADPH--hemoprotein reductase] |
| 1856 | CHEBI_91295 | (12R)-hydroxy-(9Z)-octadecenoate | 49384 | (12R)-hydroxy-(9Z)-octadecenoate + O2 + reduced [NADPH--hemoprotein reductase] = (12R),18-dihydroxy-(9Z)-octadecenoate + H(+) + H2O + oxidized [NADPH--hemoprotein reductase] |
| 1857 | CHEBI_91295 | (12R)-hydroxy-(9Z)-octadecenoate | 55956 | (9Z)-octadecenoate + AH2 + O2 = (12R)-hydroxy-(9Z)-octadecenoate + A + H2O |
| 1858 | CHEBI_91300 | (12R),18-dihydroxy-(9Z)-octadecenoate | 49384 | (12R)-hydroxy-(9Z)-octadecenoate + O2 + reduced [NADPH--hemoprotein reductase] = (12R),18-dihydroxy-(9Z)-octadecenoate + H(+) + H2O + oxidized [NADPH--hemoprotein reductase] |
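
The path `rdfs:subClassOf/rdfs:subClassOf/rdfs:subClassOf` in the query above follows exactly three subclass links. SPARQL 1.1 property paths also offer `rdfs:subClassOf+` (one or more links), which reaches classes at any depth. Whether the endpoint already stores inferred subclass links is not stated here, so it is worth comparing result counts for both forms. The toy example below uses a hypothetical class hierarchy (the names `A` to `D` are made up) to contrast the two behaviours in plain Python.

```python
def ancestors_at_depth(parents, node, depth):
    """Classes reached from `node` by following exactly `depth` subclass links."""
    frontier = {node}
    for _ in range(depth):
        frontier = {p for n in frontier for p in parents.get(n, ())}
    return frontier

def all_ancestors(parents, node):
    """Classes reached by one or more subclass links (the `+` path operator)."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

# Hypothetical hierarchy: each key is a direct subclass of the listed parents.
parents = {"A": ["lipid"], "B": ["A"], "C": ["B"], "D": ["C"]}

fixed_depth = [n for n in "ABCD" if "lipid" in ancestors_at_depth(parents, n, 3)]
any_depth = [n for n in "ABCD" if "lipid" in all_ancestors(parents, n)]
print("exactly three links:", fixed_depth)
print("one or more links:  ", any_depth)
```

In this toy hierarchy only the class sitting three levels below `lipid` matches the fixed-length path, while the `+` form finds every descendant.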
