## Obtaining a Pandas dataframe from a SPARQL query

The package supporting SPARQL endpoint queries in Python is **`SPARQLWrapper`**.

The primary return formats of SPARQL endpoints for SELECT queries, XML and JSON, can be a bit unwieldy to deal with programmatically if the goal is a tabular object:

In [1]:
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper(endpoint="https://ubergraph.apps.renci.org/sparql",
                       returnFormat=JSON)
sparql.setQuery("""
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?o (STR(?oLabel) AS ?label)
FROM <http://reasoner.renci.org/ontology>
FROM <http://reasoner.renci.org/nonredundant>
WHERE {
  obo:UBERON_0001423 rdfs:subClassOf ?o .
  ?o rdfs:label ?oLabel .
}
""")

In [2]:
sparql.query().convert()

{'head': {'vars': ['o', 'label']},
 'results': {'bindings': [{'o': {'type': 'uri',
     'value': 'http://purl.obolibrary.org/obo/UBERON_0003466'},
    'label': {'type': 'literal', 'value': 'forelimb zeugopod bone'}},
   {'o': {'type': 'uri',
     'value': 'http://purl.obolibrary.org/obo/UBERON_0003607'},
    'label': {'type': 'literal', 'value': 'forelimb long bone'}},
   {'o': {'type': 'uri',
     'value': 'http://purl.obolibrary.org/obo/UBERON_0015001'},
    'label': {'type': 'literal', 'value': 'radius endochondral element'}}]}}
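To get from this nested structure to rows, each binding has to be unpacked by hand. A minimal sketch, assuming the SPARQL JSON results layout shown above; `bindings_to_rows` is a hypothetical helper name and not part of `SPARQLWrapper`:

```python
def bindings_to_rows(results):
    """Flatten a SPARQL JSON results dict into a list of plain dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable may be unbound in a solution (e.g. with OPTIONAL), in
        # which case its key is absent; such cells become None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


# Sample input: the first two bindings from the output above.
sample = {
    "head": {"vars": ["o", "label"]},
    "results": {"bindings": [
        {"o": {"type": "uri",
               "value": "http://purl.obolibrary.org/obo/UBERON_0003466"},
         "label": {"type": "literal", "value": "forelimb zeugopod bone"}},
        {"o": {"type": "uri",
               "value": "http://purl.obolibrary.org/obo/UBERON_0003607"},
         "label": {"type": "literal", "value": "forelimb long bone"}},
    ]},
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row["o"], row["label"])
```

A list of dicts like this can be handed to `pandas.DataFrame(rows)`, but note that this sketch throws away the `type` information and leaves every value as a string.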

There are other return formats possible (for tabular results, in particular CSV and TSV), but they can require using POST and HTTP content negotiation:

In [3]:
from SPARQLWrapper import CSV

sparql.setReturnFormat(CSV)
sparql.query().convert()



<xml.dom.minidom.Document at 0xffff8cbfa2d0>

Instead of CSV we get a parsed XML document: the server falls back to XML results, which one may not expect. Making it work requires two more adjustments:

In [4]:
from SPARQLWrapper import POST

sparql.setReturnFormat(CSV)
sparql.setOnlyConneg(True)
sparql.setMethod(POST)
res = sparql.query().convert()
res

b'o,label\r\nhttp://purl.obolibrary.org/obo/UBERON_0003466,forelimb zeugopod bone\r\nhttp://purl.obolibrary.org/obo/UBERON_0003607,forelimb long bone\r\nhttp://purl.obolibrary.org/obo/UBERON_0015001,radius endochondral element\r\n'
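As an aside, "POST plus content negotiation" roughly means that the query travels in a form-encoded request body while the `Accept` header alone names the desired result format. The sketch below builds such a request with the standard library, following the "query via URL-encoded POST" option of the SPARQL 1.1 Protocol. It only illustrates the idea and is not a trace of what `SPARQLWrapper` sends; the query is a placeholder, and the request is constructed but never sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

endpoint = "https://ubergraph.apps.renci.org/sparql"
query = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"  # placeholder query

# The query goes into the form-encoded body; no format/output URL parameter
# is added, so the Accept header is the only statement of the wanted format.
request = Request(
    endpoint,
    data=urlencode({"query": query}).encode("utf-8"),
    headers={
        "Accept": "text/csv",
        "Content-Type": "application/x-www-form-urlencoded",
    },
    method="POST",
)

print(request.get_method())
print(request.get_header("Accept"))
print(request.data)
```

Passing `request` to `urllib.request.urlopen()` would actually send it; whether a given endpoint honors `text/csv` is up to the server.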

Now at least we have CSV, albeit as bytes rather than a string and with CRLF rather than Unix line endings. We can decode it and wrap it in a text stream to read into Pandas:

In [5]:
from io import StringIO
import pandas as pd

pd.read_csv(StringIO(res.decode("utf-8")))

|   | o | label |
|---|---|---|
| 0 | http://purl.obolibrary.org/obo/UBERON_0003466 | forelimb zeugopod bone |
| 1 | http://purl.obolibrary.org/obo/UBERON_0003607 | forelimb long bone |
| 2 | http://purl.obolibrary.org/obo/UBERON_0015001 | radius endochondral element |


Fortunately, `SPARQLWrapper.get_sparql_dataframe()` offers a much more concise route that is also more type-aware. It uses the JSON return format under the hood, which includes type information (though for this query it makes no difference):

In [6]:
from SPARQLWrapper import get_sparql_dataframe

df = get_sparql_dataframe(endpoint="https://ubergraph.apps.renci.org/sparql", query="""
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?o (STR(?oLabel) AS ?label)
FROM <http://reasoner.renci.org/ontology>
FROM <http://reasoner.renci.org/nonredundant>
WHERE {
  obo:UBERON_0001423 rdfs:subClassOf ?o .
  ?o rdfs:label ?oLabel .
}
""")
df

|   | o | label |
|---|---|---|
| 0 | http://purl.obolibrary.org/obo/UBERON_0003466 | forelimb zeugopod bone |
| 1 | http://purl.obolibrary.org/obo/UBERON_0003607 | forelimb long bone |
| 2 | http://purl.obolibrary.org/obo/UBERON_0015001 | radius endochondral element |
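
To illustrate what type information in the JSON format buys: a typed literal carries a `datatype` IRI next to its `value`, and a client can use that to choose a Python type. The following is a sketch under stated assumptions; `cast_binding` and the small, deliberately incomplete datatype table are hypothetical and do not claim to mirror how `get_sparql_dataframe()` is implemented:

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical mapping from a few XSD datatypes to Python conversions.
CASTS = {
    XSD + "integer": int,
    XSD + "double": float,
    XSD + "decimal": float,  # lossy for high-precision decimals
    XSD + "boolean": lambda v: v in ("true", "1"),
}


def cast_binding(term):
    """Convert one JSON binding term to a Python value using its datatype."""
    # URIs and literals without a known datatype stay plain strings.
    cast = CASTS.get(term.get("datatype"), str)
    return cast(term["value"])


typed = {"type": "literal", "datatype": XSD + "integer", "value": "42"}
plain = {"type": "literal", "value": "forelimb long bone"}

print(repr(cast_binding(typed)))
print(repr(cast_binding(plain)))
```

In the query above both columns are strings (a URI and the result of `STR()`), which is why type handling makes no visible difference there.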
