# RXNO Traversal
The goal of this notebook is to: 
1. Load the Royal Society of Chemistry's Reaction Ontology, [RXNO](https://raw.githubusercontent.com/rsc-ontologies/rxno/master/rxno.owl), into RDFLib
2. Traverse that graph with SPARQL
3. Structure the data about individual reactions
4. Output to CSV or relational database

In [1]:
from rdflib import Graph
import pandas as pd

input_file = "https://raw.githubusercontent.com/rsc-ontologies/rxno/master/rxno.owl"
output_file = "reactions.csv"

Download the data and parse it into an RDFLib `Graph`.

In [2]:
g = Graph()
g.parse(input_file, format='xml')

<Graph identifier=N4f4ad04ed3cc418da7a89d807899d9a1 (<class 'rdflib.graph.Graph'>)>

Traverse the graph to find only leaf classes (classes that have no subclasses) whose ID starts with `RXNO`, along with each class's label, parent label, and definition.

In [3]:
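# Declare every prefix the query uses instead of relying on the
# namespace bindings that rdflib picks up while parsing the file.
q = """
    PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX obo:      <http://purl.obolibrary.org/obo/>
    PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
    SELECT DISTINCT ?rid ?label ?parent ?des
    WHERE {
        ?c rdfs:subClassOf ?o .
        ?o rdfs:label ?parent .
        ?c rdfs:label ?label .
        ?c oboInOwl:id ?rid .
        ?c obo:IAO_0000115 ?des .    # IAO_0000115 is the "definition" annotation
        FILTER NOT EXISTS { ?x rdfs:subClassOf ?c }    # keep leaf classes only
        FILTER regex(?rid, "^RXNO")
    }
"""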

res = g.query(q)
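
The `FILTER NOT EXISTS { ?x rdfs:subClassOf ?c }` clause keeps a class only if nothing else is declared as its subclass. As a rough plain-Python sketch of the same idea, assume the subclass relation is available as `(child, parent)` pairs; the helper name `leaf_classes` and the toy hierarchy below are hypothetical and are not read from RXNO.

```python
# Toy (child, parent) pairs standing in for rdfs:subClassOf triples.
# The "(hypothetical ancestor ...)" entries are invented for illustration.
subclass_pairs = [
    ("alcohol oxidation", "(hypothetical ancestor A)"),
    ("Collins oxidation", "alcohol oxidation"),
    ("Friedel-Crafts reaction", "(hypothetical ancestor B)"),
    ("Friedel-Crafts acylation", "Friedel-Crafts reaction"),
]


def leaf_classes(pairs):
    """Return the (child, parent) pairs whose child is never used as a parent."""
    parents = {parent for _, parent in pairs}
    return [(child, parent) for child, parent in pairs if child not in parents]


leaves = leaf_classes(subclass_pairs)
for child, parent in leaves:
    print(f"{child} (parent: {parent})")
```

Intermediate classes such as "alcohol oxidation" drop out because they appear as a parent of another class, which mirrors what the SPARQL filter does on the real graph.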

Wrangle the results into a Pandas DataFrame

In [4]:
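# Convert rdflib terms to plain strings so the DataFrame holds ordinary Python values
rows = [[str(value) for value in row] for row in res]
df = pd.DataFrame(rows, columns=["id", "name", "parent", "details"])
df.head()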

|   | id | name | parent | details |
|---|---|---|---|---|
| 0 | RXNO:0000550 | Collins oxidation | alcohol oxidation | Oxidation of a primary alcohol with a pyridine... |
| 1 | RXNO:0000340 | amide N-alkylation | N-alkylation | An N-alkylation where the reactive centre is a... |
| 2 | RXNO:0000038 | Clemmensen reduction | functional group reduction | A functional group reduction where an aldehyde... |
| 3 | RXNO:0000050 | pinacol rearrangement | rearrangement step | A rearrangement of a 1,2-diol to give a carbon... |
| 4 | RXNO:0000045 | Friedel-Crafts acylation | Friedel-Crafts reaction | A carbon-carbon coupling reaction between an a... |


Output Structured Data to CSV

In [5]:
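# index=False keeps the positional index out of the file, so it does not
# come back as an "Unnamed: 0" column when the CSV is read again
df.to_csv(output_file, index=False)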
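
The goals above also mention a relational database as an output target. A minimal sketch of that step, using only the standard-library `sqlite3` module and an in-memory database, could look like the following. The table name `reactions`, the column types, and the sample rows with placeholder descriptions are assumptions made for illustration; in the notebook, the rows could come from something like `df.itertuples(index=False, name=None)`.

```python
import sqlite3

# Sample rows in the same (id, name, parent, details) shape as the DataFrame.
# The details values are placeholders, not the real RXNO definitions.
rows = [
    ("RXNO:0000550", "Collins oxidation", "alcohol oxidation", "placeholder description"),
    ("RXNO:0000045", "Friedel-Crafts acylation", "Friedel-Crafts reaction", "placeholder description"),
]

conn = sqlite3.connect(":memory:")  # swap in a file path to persist the data

# No PRIMARY KEY on id: the query yields one row per (class, parent) pair,
# so a class with several parents would appear more than once.
conn.execute(
    "CREATE TABLE reactions (id TEXT, name TEXT, parent TEXT, details TEXT)"
)
conn.executemany("INSERT INTO reactions VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Read the data back to confirm the round trip
stored = conn.execute("SELECT id, name FROM reactions ORDER BY id").fetchall()
for rid, name in stored:
    print(rid, name)
```

Parameterised `?` placeholders are used so that quotes or commas inside reaction descriptions do not need manual escaping.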