# RXNO Traversal
The goal of this notebook is to: 
1. Load the Royal Society of Chemistry's Reaction Ontology, [RXNO](https://raw.githubusercontent.com/rsc-ontologies/rxno/master/rxno.owl), into RDFLib
2. Traverse that graph with SPARQL
3. Structure the data about individual reactions in a pandas DataFrame
4. Write the result to CSV or a relational database

In [1]:
from rdflib import Graph  # RDF graph container and parser
import pandas as pd       # tabular handling of the query results

input_file = "https://raw.githubusercontent.com/rsc-ontologies/rxno/master/rxno.owl"
output_file = "reactions.csv"

Download the data and parse it into an RDFLib `Graph`

In [2]:
g = Graph()
g.parse(input_file, format='xml')

<Graph identifier=N0b52165575f44a818c5bd311d08c4cd7 (<class 'rdflib.graph.Graph'>)>

Query the graph for leaf classes, i.e. classes that have no subclasses, and collect each one's ID, label, parent label, and definition (`obo:IAO_0000115`)

In [3]:
# Declare every prefix the query uses (rdfs and oboInOwl were previously
# undeclared and resolved only through the graph's own namespace bindings).
q = """
    PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX obo:      <http://purl.obolibrary.org/obo/>
    PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
    SELECT DISTINCT ?rid ?label ?parent ?des
    WHERE {
        ?c rdfs:subClassOf ?o .
        ?o rdfs:label ?parent .
        ?c rdfs:label ?label .
        ?c oboInOwl:id ?rid .
        ?c obo:IAO_0000115 ?des .
        FILTER NOT EXISTS { ?x rdfs:subClassOf ?c }
        FILTER regex(?rid, "^RXNO")
    }
"""

res = g.query(q)
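
The `FILTER NOT EXISTS` clause is what restricts the result to leaf classes: a class is kept only if no other class names it as a parent. As a rough illustration of that rule outside SPARQL, the plain-Python sketch below applies it to a hand-written list of (child, parent) pairs. Two pairs reuse labels from the result table in this notebook, one pair is invented so that a non-leaf class exists, and `find_leaves` is a hypothetical helper, not part of RDFLib. The real query works on class IRIs rather than labels.

```python
def find_leaves(edges):
    """Return every child that never appears as a parent, sorted for stable output."""
    children = {child for child, _ in edges}
    parents = {parent for _, parent in edges}
    # A leaf is a class that has a parent but is nobody's parent.
    return sorted(children - parents)


edges = [
    # Invented edge (assumption) so that "Sakurai reaction" is both child and parent.
    ("Sakurai reaction", "toy parent class"),
    # Pairs taken from the name/parent columns of the result table.
    ("Sakurai reaction, enone", "Sakurai reaction"),
    ("alkyne reduction to alkane", "hydrogenation"),
]

leaves = find_leaves(edges)
print(leaves)
```

"Sakurai reaction" is dropped because another class lists it as a parent, which mirrors what `FILTER NOT EXISTS { ?x rdfs:subClassOf ?c }` does for `?c`.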

Load the results into a pandas DataFrame

In [4]:
df = pd.DataFrame(list(res), columns=("id", "name", "parent", "details"))
df

| | id | name | parent | details |
|---|---|---|---|---|
| 0 | RXNO:0000276 | Aufbau reaction | polymerisation reaction | The insertion of alkenes (usually ethene) into... |
| 1 | RXNO:0000350 | alkyne reduction to alkane | hydrogenation | A hydrogenation reaction where an alkyne is re... |
| 2 | RXNO:0000183 | Perkow reaction | molecular skeleton joining reaction | The reaction between an alpha-halocarbonyl com... |
| 3 | RXNO:0000146 | Corey-Fuchs reaction | chain lengthening | A homologation reaction of an aldehyde to yiel... |
| 4 | RXNO:0000208 | aromatic Claisen rearrangement | rearrangement step | The [3,3]-sigmatropic rearrangement, and subse... |
| 5 | RXNO:0000213 | Sakurai reaction, enone | Sakurai reaction | A carbon-carbon coupling reaction in which an ... |
| 6 | RXNO:0000040 | Ullmann reaction | carbon-carbon homocoupling reaction | A carbon-carbon homocoupling reaction of an ar... |
| 7 | RXNO:0000538 | Chugaev reaction | molecular skeleton elimination reaction | Elimination of a water molecule from a seconda... |
| 8 | RXNO:0000159 | Doebner-Miller reaction | fused-ring-system formation | The formation of a quinoline from a primary ar... |
| 9 | RXNO:0000419 | phthalimide deprotection | amino group deprotection | Conversion of a phthalimide into the correspon... |


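Write the structured data to CSV

In [5]:
# index=False omits the positional row index, which carries no information
# and would otherwise be read back as an "Unnamed: 0" column.
df.to_csv(output_file, index=False)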
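
The goals at the top of the notebook also mention a relational database. A minimal sketch of that path, using only Python's standard-library `sqlite3` module, might look like the following. The table name `reactions`, the in-memory database, and the `write_reactions` helper are assumptions made for illustration. The sample rows are copied from the result table in this notebook, with the truncated `details` column left out.

```python
import sqlite3


def write_reactions(conn, rows):
    """Create the (hypothetical) reactions table if needed and insert the rows."""
    # The query returns one row per (class, parent) pair, so a class with
    # several parents can appear more than once; key on both columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reactions ("
        "id TEXT NOT NULL, name TEXT NOT NULL, parent TEXT NOT NULL, "
        "PRIMARY KEY (id, parent))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO reactions (id, name, parent) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# Sample rows taken from the id/name/parent columns of the result table.
rows = [
    ("RXNO:0000276", "Aufbau reaction", "polymerisation reaction"),
    ("RXNO:0000350", "alkyne reduction to alkane", "hydrogenation"),
    ("RXNO:0000183", "Perkow reaction", "molecular skeleton joining reaction"),
]

conn = sqlite3.connect(":memory:")  # assumption: swap in a file path to persist
write_reactions(conn, rows)

count = conn.execute("SELECT COUNT(*) FROM reactions").fetchone()[0]
print(count)
```

With the DataFrame from this notebook, something like `df.astype(str).to_sql("reactions", conn, if_exists="replace", index=False)` should achieve a similar result in one call; the `astype(str)` step is there because the cells hold RDFLib literals rather than plain strings.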