# RDF & SPARQL 

<a href="https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Sparql.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook we will explore basic RDF and SPARQL queries.

# Setup Environment

In [None]:
%%capture
!pip3 install rdflib sparqlwrapper pydotplus graphviz

In [None]:
import rdflib
from rdflib import Graph
from rdflib.namespace import DC, RDF, FOAF, RDFS
from rdflib import URIRef, BNode, Literal
import networkx as nx
import io
import pydotplus
from IPython.display import display, Image
from rdflib.tools.rdf2dot import rdf2dot

In [None]:
# Helper function for visualizing an RDF graph
def visualize(g):
    stream = io.StringIO()
    rdf2dot(g, stream)  # write the graph in Graphviz DOT format
    dg = pydotplus.graph_from_dot_data(stream.getvalue())
    png = dg.create_png()
    display(Image(png))

# First Graph

In [None]:
g = Graph()
# Graph using N3 syntax
n3data = """\
@prefix : <http://www.snee.com/ns/demo#> .
:Jane :hasParent :Gene .
:Gene :hasParent :Pat ;
      :gender    :female .
:Joan :hasParent :Pat ;
      :gender    :female .
:Pat  :gender    :male .
:Mike :hasParent :Joan ."""
g.parse(data=n3data, format="n3")

Let us print all triples:

In [None]:
# Print all triples
for s, p, o in g:
    print((s, p, o))

As this is hard to read, let us visualize the RDF graph:

In [None]:
visualize(g)


As we have global identifiers, we can also look up facts (triples) about a specific entity:

In [None]:
# Look up Jane by her global identifier and list her (predicate, object) pairs
jane = URIRef('http://www.snee.com/ns/demo#Jane')
print(list(g.predicate_objects(subject=jane)))

# Custom Graph

Next, let us create a graph explicitly (i.e., by constructing nodes). 

Nodes can have different types: URI references, blank nodes (BNode), or literals.

In [None]:
bob = URIRef("http://example.org/people/Bob")
linda = BNode()  # Blank node with an autogenerated identifier

name = Literal('Bob') # passing a string
age = Literal(24) # passing a python int
height = Literal(76.5) # passing a python float

g = Graph()

g.add( (bob, RDF.type, FOAF.Person) )
g.add( (bob, FOAF.name, name) )
g.add( (bob, FOAF.knows, linda) )
g.add( (linda, RDF.type, FOAF.Person) )
g.add( (linda, FOAF.name, Literal('Linda') ) )

# Print all triples
for s, p, o in g:
    print((s, p, o))

# Visualize the graph for easy interpretation
visualize(g)

# SPARQL

SPARQL allows us to query our graph using an SQL-like language:

In [None]:
# list all facts (triples)
result = g.query(
    """SELECT *
  WHERE
  {?s ?p ?o}
""")

# Output result
for row in result:
    print(row)


We can combine URIs, variables, and predicates to specify the pattern we are looking for. In this case we want to identify all pairs of people where the first knows the second; the pattern matches each stated `foaf:knows` triple in one direction only.

In [None]:
result = g.query(
    """SELECT DISTINCT ?aname ?bname
       WHERE {
          ?a foaf:knows ?b .
          ?a foaf:name ?aname .
          ?b foaf:name ?bname .
       }""", initNs={ 'foaf': FOAF })

# Output result
for row in result:
    print("%s knows %s" % row)

# Import external Data

There are a large number of RDF data sources available on the web, which we can leverage:

In [None]:
g1 = rdflib.Graph()
g1.parse("http://www.w3.org/People/Berners-Lee/card")

print("Graph has %s statements." % len(g1))

# Print all triples
for s, p, o in g1:
    print((s, p, o))

# RDF Schema

RDF Schema allows us to specify classes and class hierarchies. These hierarchies can be leveraged for reasoning/inference.

In [None]:
g = Graph()
# Adapted from https://www.w3.org/TR/rdf-primer/
n3vehicledata = """\
@prefix ex: <http://example.org/schemas/vehicles#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:MotorVehicle       rdf:type          rdfs:Class .
ex:PassengerVehicle   rdf:type          rdfs:Class .
ex:Van                rdf:type          rdfs:Class .
ex:Truck              rdf:type          rdfs:Class .
ex:MiniVan            rdf:type          rdfs:Class .

ex:PassengerVehicle   rdfs:subClassOf   ex:MotorVehicle .
ex:Van                rdfs:subClassOf   ex:MotorVehicle .
ex:Truck              rdfs:subClassOf   ex:MotorVehicle .

ex:MiniVan            rdfs:subClassOf   ex:Van .
ex:MiniVan            rdfs:subClassOf   ex:PassengerVehicle .
"""


g.parse(data=n3vehicledata, format="n3")
# Print all triples
for s, p, o in g:
    print((s, p, o))

# Visualize the graph for easy interpretation
visualize(g)

`rdfs:subClassOf` is transitive, and the SPARQL property path `rdfs:subClassOf+` follows one or more such links, so we can deduce facts which are not directly stated in the original triples.

In [None]:
# Which classes are (direct or indirect) subclasses of MotorVehicle?
result = g.query(
    """SELECT DISTINCT ?s
  WHERE
  {
    ?s ?p ?o .
    ?s rdfs:subClassOf+ ex:MotorVehicle .
  }""", initNs={ 'rdfs': RDFS, 'rdf' : RDF, 'ex' : 'http://example.org/schemas/vehicles#' })


for row in result:
    print(row)



Note that MiniVan also shows up as a MotorVehicle, even though no triple states this directly. Feel free to check the original statements about the subject MiniVan.

# DBpedia

As discussed, there are a number of public RDF data sources available. [DBpedia](https://wiki.dbpedia.org/) is a semantic version of Wikipedia.

Let us query DBpedia to identify scientists whose birthday is today (adapted from https://open.hpi.de/courses/knowledgegraphs2020).

In [None]:
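from datetime import datetime
# Import only the names we need: SPARQLWrapper also exports a constant named RDF,
# which would shadow the rdflib RDF namespace imported earlier.
from SPARQLWrapper import SPARQLWrapper, JSON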

In [None]:
sparql = SPARQLWrapper("http://dbpedia.org/sparql")  # set the SPARQL endpoint

In [None]:
sparql.setQuery("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc:  <http://purl.org/dc/elements/1.1/>

Select distinct ?birthdate ?scientist ?name ?description  WHERE {
?scientist rdf:type dbo:Scientist ;
        dbo:birthDate ?birthdate ;
        rdfs:label ?name ;
        rdfs:comment ?description 
 FILTER ((lang(?name)="en")&&(lang(?description)="en")&&(STRLEN(STR(?birthdate))>6)&&(SUBSTR(STR(?birthdate),6)=SUBSTR(STR(bif:curdate('')),6))) .
} ORDER BY ?birthdate
""")

sparql.setReturnFormat(JSON)   # Return format is JSON
results = sparql.query().convert()   # execute SPARQL query and write result to "results"

In [None]:
print(results)
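
The raw dictionary is hard to read. With the JSON return format, the result follows the SPARQL JSON results layout: a `head` with the variable names and a list of `bindings`, where each variable maps to a small dictionary containing a `value`. The sketch below flattens that structure; `sample_results` is made-up placeholder data shaped like such a response (not real DBpedia output), and `flatten_bindings` is a hypothetical helper.

```python
# Made-up placeholder data in the SPARQL JSON results layout.
sample_results = {
    "head": {"vars": ["birthdate", "scientist", "name", "description"]},
    "results": {"bindings": [
        {
            "birthdate": {"type": "literal", "value": "1900-01-01"},
            "scientist": {"type": "uri", "value": "http://example.org/Scientist1"},
            "name": {"type": "literal", "xml:lang": "en", "value": "Example Scientist"},
            "description": {"type": "literal", "xml:lang": "en", "value": "A placeholder entry."},
        },
    ]},
}

def flatten_bindings(results):
    """Turn SPARQL JSON results into a list of {variable: value} dicts.
    Variables that are unbound in a row are mapped to None."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows

rows = flatten_bindings(sample_results)
for row in rows:
    print("%s: %s (%s)" % (row["birthdate"], row["name"], row["scientist"]))
```

Assuming the endpoint returned JSON as requested, calling `flatten_bindings(results)` on the query result from the cell above should yield one dictionary per scientist.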