# RXNO Traversal
The goal of this notebook is to: 
1. Load the Royal Society of Chemistry's Reaction Ontology, [RXNO](https://raw.githubusercontent.com/rsc-ontologies/rxno/master/rxno.owl), into RDFLib
2. Traverse that graph with SPARQL
3. Structure the data about individual reactions
4. Write the results to CSV or a relational database

In [1]:
from rdflib import Graph  # RDF graph container with a built-in SPARQL engine
import pandas as pd       # tabular wrangling and CSV export

input_file = "https://raw.githubusercontent.com/rsc-ontologies/rxno/master/rxno.owl"
output_file = "reactions.csv"

Download the ontology and parse it into an RDFLib `Graph`. The file is serialized as RDF/XML, hence `format='xml'`.

In [2]:
g = Graph()
g.parse(input_file, format='xml')

<Graph identifier=Nfbad4c2752cf46bd9ab10756b3744571 (<class 'rdflib.graph.Graph'>)>

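Query the graph for leaf classes only, i.e. classes that have no subclasses of their own. For each one, fetch its RXNO id, label, definition, and the label of its parent class.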

Example: http://www.ontobee.org/ontology/RXNO?iri=http://purl.obolibrary.org/obo/RXNO_0000251

In [3]:
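q = """
    # Declare every prefix the query uses, rather than relying on
    # bindings picked up from the parsed file.
    PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
    PREFIX obo:      <http://purl.obolibrary.org/obo/>
    select distinct ?rid ?label ?parent ?des
    where {
        ?c rdfs:subClassOf ?o .
        ?o rdfs:label ?parent .
        ?c rdfs:label ?label .
        ?c oboInOwl:id ?rid .
        # IAO_0000115 is the "definition" annotation property
        ?c obo:IAO_0000115 ?des .
        # Keep only leaf classes: nothing is declared a subclass of ?c
        FILTER NOT EXISTS { ?x rdfs:subClassOf ?c }
        # Drop classes imported from other ontologies
        FILTER regex(?rid, "^RXNO")
    }
"""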

res = g.query(q)
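
To make the leaf test concrete, here is a minimal plain-Python sketch of what `FILTER NOT EXISTS { ?x rdfs:subClassOf ?c }` selects. It does not touch the ontology: the `(child, parent)` pairs are toy data, the `heterocycle synthesis` parent is a hypothetical placeholder, and `leaf_classes` is an illustrative helper, not part of RDFLib.

```python
# Toy (child, parent) subclass pairs. Illustrative only, not read from RXNO.
subclass_of = [
    ("Knorr quinoline synthesis", "quinoline synthesis"),
    ("quinoline synthesis", "heterocycle synthesis"),  # hypothetical parent
    ("Barton-Zard reaction", "pyrrole synthesis"),
    ("pyrrole synthesis", "heterocycle synthesis"),    # hypothetical parent
]


def leaf_classes(pairs):
    """Return the (child, parent) pairs whose child never appears as a parent."""
    parents = {parent for _, parent in pairs}
    return [(child, parent) for child, parent in pairs if child not in parents]


leaves = leaf_classes(subclass_of)
for child, parent in leaves:
    print(f"{child} -> {parent}")
```

The intermediate classes drop out because something else is declared a subclass of them, which is the same condition the SPARQL filter checks for `?c`.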

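Wrangle the results into a pandas `DataFrame`. Each result row holds the four selected variables in `select` order (`?rid`, `?label`, `?parent`, `?des`), so they map directly onto the column names below.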

In [4]:
df = pd.DataFrame(list(res), columns=("id", "name", "parent", "details"))
df.head()

| | id | name | parent | details |
|---|---|---|---|---|
| 0 | RXNO:0000522 | Betti reaction | molecular skeleton joining reaction | Reaction of a phenol, an aromatic aldehyde and... |
| 1 | RXNO:0000394 | Knorr quinoline synthesis | quinoline synthesis | Synthesis of an alpha-hydroxyquinoline by reac... |
| 2 | RXNO:0000083 | Paternò–Büchi reaction | [2+2] cycloaddition | A photochemical cycloaddition between an aldeh... |
| 3 | RXNO:0000496 | Barton-Zard reaction | pyrrole synthesis | Reaction of a nitroalkene with an alpha-isocya... |
| 4 | RXNO:0000276 | Aufbau reaction | polymerisation reaction | The insertion of alkenes (usually ethene) into... |


Output Structured Data to CSV

In [5]:
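# index=False keeps the positional row index out of the file, so reading it
# back does not produce an extra "Unnamed: 0" column.
df.to_csv(output_file, index=False)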
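
The goals list also mentions a relational database, which the notebook does not demonstrate. As a rough sketch of one way to do it, the block below uses the standard-library `sqlite3` module. The table name `reactions`, the helper `write_reactions`, and the in-memory database are all assumptions made for illustration. The sample rows are copied from the preview above, truncated details included.

```python
import sqlite3


def write_reactions(rows, db_path=":memory:"):
    """Insert (id, name, parent, details) tuples into a 'reactions' table.

    Hypothetical helper: the schema and table name are illustrative choices.
    """
    conn = sqlite3.connect(db_path)
    # A class can have more than one parent, so the key is (id, parent).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reactions ("
        "id TEXT, name TEXT, parent TEXT, details TEXT, "
        "PRIMARY KEY (id, parent))"
    )
    # INSERT OR REPLACE lets the cell be rerun without duplicate-key errors.
    conn.executemany(
        "INSERT OR REPLACE INTO reactions VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


# Sample rows taken from the DataFrame preview (details are truncated there).
sample_rows = [
    ("RXNO:0000522", "Betti reaction",
     "molecular skeleton joining reaction",
     "Reaction of a phenol, an aromatic aldehyde and..."),
    ("RXNO:0000394", "Knorr quinoline synthesis",
     "quinoline synthesis",
     "Synthesis of an alpha-hydroxyquinoline by reac..."),
]

conn = write_reactions(sample_rows)
count = conn.execute("SELECT COUNT(*) FROM reactions").fetchone()[0]
names = [row[0] for row in conn.execute("SELECT name FROM reactions ORDER BY id")]
print(count, names)
```

With the real data, the rows could come from `df.astype(str).itertuples(index=False, name=None)`, which converts the RDFLib terms to plain strings first, and `db_path` would point at a file instead of `:memory:`.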