# Reaction Ontology (RXNO) Traversal
The goal of this notebook is to: 
1. Load the Royal Society of Chemistry's Reaction Ontology, [RXNO](https://raw.githubusercontent.com/rsc-ontologies/rxno/master/rxno.owl), into RDFLib
2. Traverse that graph with SPARQL to investigate its structure
3. Structure and output the data about individual reactions

In [1]:
from rdflib import Graph
import pandas as pd

input_file = "https://raw.githubusercontent.com/rsc-ontologies/rxno/master/rxno.owl"
output_file = "reactions.csv"

## Downloading and Parsing RXNO
RDFLib can download the file and parse it into a `Graph` in a single call, so there is no need for the urllib or requests modules.

In [2]:
g = Graph()
g.parse(input_file, format='xml')

<Graph identifier=Nd3f33fa4027e4248a3c47ff8136ce118 (<class 'rdflib.graph.Graph'>)>

## Annotations
All reactions are represented as classes within RXNO. The reactions themselves are the leaves of an irregular classification tree, so they can be isolated by selecting classes that have no subclasses: `FILTER NOT EXISTS { ?x rdfs:subClassOf ?c }`.

The first query lists the distinct predicates used across all of these leaf reactions.

In [3]:
q1 = """
    PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
    select distinct ?predicate
    where { 
        ?reaction ?predicate ?object .
        ?reaction rdfs:subClassOf ?o .
        ?reaction oboInOwl:id ?reaction_id .
        FILTER NOT EXISTS { ?x rdfs:subClassOf ?reaction }   
        FILTER regex(?reaction_id, "^RXNO")
    } 
"""
res1 = g.query(q1)
pd.DataFrame(list(res1), columns=("predicate",))

| | predicate |
|---|---|
| 0 | http://www.w3.org/2000/01/rdf-schema#subClassOf |
| 1 | http://www.geneontology.org/formats/oboInOwl#id |
| 2 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
| 3 | http://www.w3.org/2000/01/rdf-schema#label |
| 4 | http://www.geneontology.org/formats/oboInOwl#h... |
| 5 | http://purl.obolibrary.org/obo/IAO_0000115 |
| 6 | http://www.geneontology.org/formats/oboInOwl#h... |
| 7 | http://www.w3.org/2000/01/rdf-schema#comment |
| 8 | http://www.geneontology.org/formats/oboInOwl#h... |
| 9 | http://www.geneontology.org/formats/oboInOwl#c... |


## SMIRKS
[SMIRKS](http://www.daylight.com/dayhtml/doc/theory/theory.smirks.html) is one of the reaction transformation languages defined by Daylight as an extension to SMILES. Some reactions in RXNO carry a SMIRKS annotation, but most do not, and filling in the gaps will take considerable manual effort. The query below lists the reactions that already have SMIRKS annotations in RXNO.

In [4]:
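# SMIRKS annotation property: http://purl.obolibrary.org/obo/rxno.obo#SMIRKS
# The rxno: and rdfs: prefixes are not declared in the query; rdflib resolves them from the graph's namespace bindings.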
q2 = """
    select ?label ?smirks
    where {
        ?s rxno:SMIRKS ?smirks .
        ?s rdfs:label ?label .
    }
"""
res2 = g.query(q2)
pd.DataFrame(list(res2), columns=("label", "smirks"))

| | label | smirks |
|---|---|---|
| 0 | Scholl reaction | `[c:1].[c:2]>>[c:1][c:2]` |
| 1 | Paal-Knorr pyrrole synthesis | `O=[C:1]-[C:2]-[C:3]-[C:4]=O.[N:5]>>[n:5]1[c:1]...` |
| 2 | Paternò–Büchi reaction | `[C:1]=[O:2].[C:3]=[C:4]>>[C:1]1[O:2][C:3][C:4]1` |
| 3 | Demko-Sharpless reaction | `[#6,S,N;H0:1][C:2]#[N:3].[Na+].[N-:4]=[N+:5]=[...` |
| 4 | Tishchenko reaction | `[C:1](=O).[C:2](=O)>>[C:1](=O)O[C:2]` |
| 5 | Crabbé homologation | `[C:1]=[C:2].[C:3]=O>>[C:1]=[C:2]=[C:3]` |
| 6 | Paal-Knorr thiophene synthesis | `O=[C:1]-[C:2]-[C:3]-[C:4]=O.S=P13SP2(=S)SP(=S)...` |
| 7 | Michael addition | `[C:1].[C:2]=[C:3][C:4]=[O:5]>>[C:1][C:2][C:3][...` |
| 8 | Hofmann elimination | `N1[C:2][C:3][C:4][C:5][C:6]1[C:7]>>[C:2]=[C:3]...` |
| 9 | Diels-Alder reaction | `[C:1]=[C:2][C:3]=[C:4].[C:5]=[C:6]>>[C:1]1[C:2...` |
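
To make the structure of these strings concrete, the sketch below splits one of the untruncated SMIRKS from the results (the Paternò–Büchi entry) into its reactant and product components and collects the atom-map numbers on each side. It uses plain string handling only, on the assumption that the string follows the `reactants>agents>products` layout with `.` separating components; the helper names `split_reaction` and `atom_maps` are made up for this illustration, and a cheminformatics toolkit would be the better choice for real validation.

```python
import re


def split_reaction(smirks):
    """Split 'reactants>agents>products' into lists of dot-separated components."""
    reactants, agents, products = smirks.split(">")

    def as_list(side):
        # An empty side (e.g. no agents) becomes an empty list.
        return side.split(".") if side else []

    return as_list(reactants), as_list(agents), as_list(products)


def atom_maps(components):
    """Collect atom-map numbers, such as the 1 in [C:1]."""
    return {int(n) for part in components for n in re.findall(r":(\d+)\]", part)}


smirks = "[C:1]=[O:2].[C:3]=[C:4]>>[C:1]1[O:2][C:3][C:4]1"
reactants, agents, products = split_reaction(smirks)

print(reactants)
print(products)
# Mapped atoms on both sides should agree if every mapped atom is carried over.
print(atom_maps(reactants) == atom_maps(products))
```

Comparing the two sets of map numbers is a cheap way to spot annotations where a mapped atom appears on only one side of the transformation.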


## Bringing It All Together
The final query collects the ID, label, parent label, description, and SMIRKS (where one exists) for each reaction.

In [5]:
q3 = """
    select distinct ?rid ?label ?parent ?des ?smirks
    where { 
        ?c rdfs:subClassOf ?o .
        ?o rdfs:label ?parent .
        ?c rdfs:label ?label .
        ?c oboInOwl:id ?rid .
        ?c obo:IAO_0000115 ?des .
        OPTIONAL { ?c rxno:SMIRKS ?smirks}
        FILTER NOT EXISTS { ?x rdfs:subClassOf ?c }   
        FILTER regex(?rid, "^RXNO")
    } 
"""
res3 = g.query(q3)
df = pd.DataFrame(list(res3), columns=("id", "label", "parent_label", "description", "smirks"))
df.head()

| | id | label | parent_label | description | smirks |
|---|---|---|---|---|---|
| 0 | RXNO:0000451 | aldehyde Sakurai reaction | Sakurai reaction, aldehyde or ketone | A carbon-carbon coupling reaction in which an ... | |
| 1 | RXNO:0000026 | Beckmann rearrangement | rearrangement step | A rearrangement where an oxime rearranges to f... | |
| 2 | RXNO:0000447 | Duff reaction | formylation | Formylation of a phenol or aromatic amine with... | |
| 3 | RXNO:0000061 | Mukaiyama aldol condensation | aldol condensation | An aldol condensation between an aldehyde or k... | |
| 4 | RXNO:0000193 | Hiyama coupling | carbon-carbon coupling reaction | A carbon-carbon coupling reaction where an org... | |
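
Because the SMIRKS column comes from an `OPTIONAL` pattern, reactions without an annotation end up with a missing value in that column. One way to measure how large the gap is can be sketched as follows; the frame below is hypothetical stand-in data (made-up `TOY:` IDs and a placeholder SMIRKS), not RXNO output.

```python
import pandas as pd

# Hypothetical stand-in for the reactions DataFrame built above.
toy_df = pd.DataFrame(
    {
        "id": ["TOY:1", "TOY:2", "TOY:3", "TOY:4"],
        "smirks": ["[C:1]>>[C:1]", None, None, None],  # placeholder SMIRKS
    }
)

# True where a SMIRKS annotation is present.
has_smirks = toy_df["smirks"].notna()

# Fraction of reactions that already carry a SMIRKS annotation.
coverage = has_smirks.mean()

# IDs of the reactions that still need one.
missing_ids = toy_df.loc[~has_smirks, "id"].tolist()

print(f"{coverage:.0%} annotated, {len(missing_ids)} missing")
```

Applied to `df`, the same three lines would give the share of RXNO leaf reactions with SMIRKS and a worklist of the ones without.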


In [6]:
df.to_csv(output_file)
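
By default, `to_csv` also writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. If that column is not wanted, passing `index=False` is one option. The sketch below compares both behaviours on a hypothetical one-row frame, writing to in-memory buffers rather than to `reactions.csv`.

```python
import io

import pandas as pd

# Hypothetical one-row frame standing in for the reactions DataFrame.
toy_df = pd.DataFrame({"id": ["TOY:1"], "label": ["toy reaction"]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
toy_df.to_csv(with_index)

# index=False leaves the index out of the file.
without_index = io.StringIO()
toy_df.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]

# Reading the default output back shows how pandas names the index column.
reread_columns = list(pd.read_csv(io.StringIO(with_index.getvalue())).columns)

print(header_with)
print(header_without)
print(reread_columns)
```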