# Knowledge Representation on the Web -- RDF tutorial

In this tutorial we'll learn the basics of interacting with RDF graphs in Python. We'll be using rdflib, a widely used Python library for RDF (the full documentation can be found [here](https://rdflib.readthedocs.io/en/stable/index.html)).

1. Could not parse "https://csarven.ca/" -> added format="n3"
2. hadAge is not in the foaf namespace -> changed to foaf:age
3. Teacher is not in the foaf namespace -> changed to TEACH:Teacher, and added a section on adding a new namespace

## Imports
These are the main classes and types we'll be using from rdflib.

In [1]:
import sys
!{sys.executable} -m pip install rdflib

from rdflib import Graph, ConjunctiveGraph, Literal, BNode, Namespace, RDF, URIRef
from rdflib.namespace import DC, FOAF

import pprint




## Loading data remotely and from files

rdflib can import RDF data from a variety of sources: locally from a file (with support for many serialization formats), or remotely via a URI. Loading from a URI is also a practical way to check whether that URI returns RDF, as the 3rd Linked Data principle requires.

A Graph object is always required to load triples.
**Note**: to load quads, and hence support named graphs, you'll need to use an instance of ConjunctiveGraph instead.


In [2]:
g = Graph()
h = Graph()
result = g.parse("http://www.w3.org/People/Berners-Lee/card")

result2 = h.parse("https://csarven.ca/", format="n3")

print("Graph has %s statements." % len(g))
print("Graph has %s statements." % len(h))

Graph has 86 statements.
Graph has 525 statements.


In [4]:
with open("demo.nt", "r") as f:
    print("This is an N-Triples file")
    print("=========================")
    for line in f:
        print(line)
    
with open("demo.xml", "r") as f:
    print("This is a RDF/XML file")
    print("======================")
    for line in f:
        print(line)
    
with open("demo.ttl", "r") as f:
    print("This is a Turtle file")
    print("=====================")
    for line in f:
        print(line)

This is an N-Triples file
<http://bigasterisk.com/foaf.rdf#drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .

<http://bigasterisk.com/foaf.rdf#drewp> <http://example.com/says> "Hello world" .

This is a RDF/XML file
<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF

   xmlns:ns1="http://example.com/"

   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

>

  <rdf:Description rdf:about="http://bigasterisk.com/foaf.rdf#drewp">

    <ns1:says>Hello world</ns1:says>

    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>

  </rdf:Description>

</rdf:RDF>

This is a Turtle file
@prefix ns1: <http://example.com/> .

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

@prefix xml: <http://www.w3.org/XML/1998/namespace> .

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .



<http://bigasterisk.com/foaf.rdf#drewp> a <http://xmlns.com/foaf/0.1/Person> ;

    ns1:says "H

In [None]:
g.serialize(destination='output.ttl', format='turtle')    

In [5]:
g = Graph()
g.parse("demo.ttl", format='turtle')

len(g) # prints 2

2

In [6]:
for stmt in g:
    pprint.pprint(stmt)

(rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.term.URIRef('http://example.com/says'),
 rdflib.term.Literal('Hello world'))
(rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))


In [7]:
# You can also parse directly from a string
g = Graph()

g.parse(data = '<urn:a> <urn:p> <urn:b>.', format='n3')
for stmt in g:
    pprint.pprint(stmt)

(rdflib.term.URIRef('urn:a'),
 rdflib.term.URIRef('urn:p'),
 rdflib.term.URIRef('urn:b'))


## Saving RDF graphs

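We use the method Graph.serialize(format=...). Without a destination it returns the serialized document; with destination=... it writes the document to a file.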

In [8]:
g = Graph()
g.parse("demo.nt", format='n3')
# for a,b,c in g:
#     print(a,b,c)

# rdflib 6+ returns a str here; on older releases append .decode() to the result
print(g.serialize(format='nt')) # other serializer formats include 'n3', 'turtle', 'xml'



<http://bigasterisk.com/foaf.rdf#drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://bigasterisk.com/foaf.rdf#drewp> <http://example.com/says> "Hello world" .




##  Merging graphs

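Graphs can be merged either by parsing several sources into the same Graph or with the overloaded + operator.

**Note:** Set-theoretic graph semantics apply: a triple that occurs in more than one input appears only once in the result.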

In [9]:
graph = Graph()

# Sequential parsings merge *new* triples

graph.parse("demo.nt", format='nt')
graph.parse("demo.xml", format='xml')

print("Graph has {} triples".format(len(graph)))

Graph has 2 triples


In [10]:
g1 = Graph()
g1.parse("demo.nt", format='nt')
print("g1 has {} triples".format(len(g1)))

g2 = Graph()
g2.parse("demo.xml", format='xml')
print("g2 has {} triples".format(len(g2)))

graph = g1 + g2
print("g1 + g2 has {} triples".format(len(graph)))

g1 has 2 triples
g2 has 2 triples
g1 + g2 has 2 triples


In [11]:
# Now, if we merge graphs with different contents

tim_g = Graph()
tim_g.parse("http://www.w3.org/People/Berners-Lee/card")
print("Tim graph has {} triples".format(len(tim_g)))

g3 = g1 + tim_g
print("g3 has {} triples".format(len(g3)))

Tim graph has 86 triples
g3 has 88 triples



## Creating RDF triples

Triples are added to the graph with the method Graph.add().

Its parameter is a triple given as a Python **tuple** (subject, predicate, object).

Notice the namespace convenience syntax!

In [12]:
g = Graph()

# Create an identifier to use as the subject for Donna.
donna = BNode()
# donna = URIRef("http://example.org/donna")

# Add triples using store's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )

for i in range(100):
    g.add((donna, FOAF.age, Literal(i)))

print(len(g))

for s in g:
    pprint.pprint(s)
    print()


104
(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 rdflib.term.Literal('25', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 rdflib.term.Literal('66', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 rdflib.term.Literal('51', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 rdflib.term.Literal('95', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 rdflib.term.Literal('93', dataty

In [13]:
for stmt in g:
    print(stmt)

(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'), rdflib.term.Literal('25', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'), rdflib.term.Literal('66', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'), rdflib.term.Literal('51', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'), rdflib.term.Literal('95', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.BNode('Na793f5c4a7114b81bb41f17862dbad11'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'), rdflib.term.Literal('93', datatype=rdflib.term.URI

In [14]:
print(FOAF.Person)

http://xmlns.com/foaf/0.1/Person


In [17]:
print(FOAF.imadethisup)

AttributeError: "term 'imadethisup' not in namespace 'http://xmlns.com/foaf/0.1/'"

## Namespaces 
The namespace module defines many common namespaces such as RDF, RDFS, OWL, FOAF, SKOS, etc., but you can also easily add URIs within a different namespace:


In [24]:
TEACH = Namespace("http://linkedscience.org/teach/ns#")
TEACH.Teacher

rdflib.term.URIRef('http://linkedscience.org/teach/ns#Teacher')

Check the specification at http://linkedscience.org/teach/ns/#sec-specification to see which other terms are defined in the TEACH namespace.
You can use a NamespaceManager to bind a prefix to a namespace:

In [19]:
g.namespace_manager.bind('TEACH', URIRef('http://linkedscience.org/teach/ns#'))
TEACH.Teacher.n3(g.namespace_manager)

'TEACH:Teacher'

In [29]:
KRW = Namespace("http://krw.vu.nl/data#")
KRW.Teacher
KRW.Student


rdflib.term.URIRef('http://krw.vu.nl/data#Student')

## Navigating graphs

rdflib uses iterators to navigate Graphs. The methods for iterating over subjects, predicates and objects are Graph.subjects, Graph.predicates and Graph.objects.

In [20]:
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")

# Iterate over the triples in the store and print the predicate of each.
print("--- printing raw triples ---")
for s, p, o in g:
    print(p)

--- printing raw triples ---
http://xmlns.com/foaf/0.1/member
http://xmlns.com/foaf/0.1/maker
http://www.w3.org/2000/10/swap/pim/contact#street
http://rdfs.org/sioc/ns#avatar
http://www.w3.org/2006/vcard/ns#street-address
http://www.w3.org/2003/01/geo/wgs84_pos#lat
http://xmlns.com/foaf/0.1/maker
http://www.w3.org/2006/vcard/ns#hasAddress
http://www.w3.org/ns/pim/space#preferencesFile
http://xmlns.com/foaf/0.1/account
http://xmlns.com/foaf/0.1/maker
http://xmlns.com/foaf/0.1/account
http://www.w3.org/2000/10/swap/pim/contact#homePage
http://www.w3.org/2003/01/geo/wgs84_pos#long
http://www.w3.org/ns/solid/terms#profileHighlightColor
http://xmlns.com/foaf/0.1/based_near
http://usefulinc.com/ns/doap#developer
http://www.w3.org/2000/10/swap/pim/contact#country
http://xmlns.com/foaf/0.1/maker
http://schema.org/owns
http://purl.org/dc/elements/1.1/title
http://usefulinc.com/ns/doap#developer
http://xmlns.com/foaf/0.1/title
http://xmlns.com/foaf/0.1/workplaceHomepage
http://xmlns.com/foaf/0.1

In [21]:
# Printing the terms of each tuple individually shows their plain values, without the Python type wrappers
print("--- printing raw triples ---")
for s, p, o in g:
    print(s, p, o)

--- printing raw triples ---
http://dig.csail.mit.edu/data#DIG http://xmlns.com/foaf/0.1/member https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card http://xmlns.com/foaf/0.1/maker https://www.w3.org/People/Berners-Lee/card#i
N4b04e4b796334726b2e922d66e6ee606 http://www.w3.org/2000/10/swap/pim/contact#street 32 Vassar Street
https://www.w3.org/People/Berners-Lee/card#i http://rdfs.org/sioc/ns#avatar http://www.w3.org/People/Berners-Lee/images/timbl-image-by-Coz-cropped.jpg
N95834223ae17482ab5740e011402d730 http://www.w3.org/2006/vcard/ns#street-address 32 Vassar Street
Nd45e1b87c3d8472bb648d16c784acab7 http://www.w3.org/2003/01/geo/wgs84_pos#lat 42.361860
http://dig.csail.mit.edu/2007/01/camp/data#course http://xmlns.com/foaf/0.1/maker https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/2006/vcard/ns#hasAddress N95834223ae17482ab5740e011402d730
https://www.w3.org/People/Berners-Lee/card#i http://w

In [22]:
print("PRINTING SUBJECTS")
for s in g.subjects():
    print(s)

PRINTING SUBJECTS
http://dig.csail.mit.edu/data#DIG
http://www.w3.org/People/Berners-Lee/card
N4b04e4b796334726b2e922d66e6ee606
https://www.w3.org/People/Berners-Lee/card#i
N95834223ae17482ab5740e011402d730
Nd45e1b87c3d8472bb648d16c784acab7
http://dig.csail.mit.edu/2007/01/camp/data#course
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/DesignIssues/Overview.html
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
Nd45e1b87c3d8472bb648d16c784acab7
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
http://dig.csail.mit.edu/2005/ajar/ajaw/data#Tabulator
N4b04e4b796334726b2e922d66e6ee606
http://www.w3.org/2011/Talks/0331-hyderabad-tbl/data#talk
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card
http://www.w3.org/2000/10/swap/data#Cwm
https://www.w3.org/People/Berners-Lee/car

In [23]:
print("PRINTING PREDICATES")
for p in g.predicates():
    if len(p) > 13:
        print(p)


PRINTING PREDICATES
http://xmlns.com/foaf/0.1/member
http://xmlns.com/foaf/0.1/maker
http://www.w3.org/2000/10/swap/pim/contact#street
http://rdfs.org/sioc/ns#avatar
http://www.w3.org/2006/vcard/ns#street-address
http://www.w3.org/2003/01/geo/wgs84_pos#lat
http://xmlns.com/foaf/0.1/maker
http://www.w3.org/2006/vcard/ns#hasAddress
http://www.w3.org/ns/pim/space#preferencesFile
http://xmlns.com/foaf/0.1/account
http://xmlns.com/foaf/0.1/maker
http://xmlns.com/foaf/0.1/account
http://www.w3.org/2000/10/swap/pim/contact#homePage
http://www.w3.org/2003/01/geo/wgs84_pos#long
http://www.w3.org/ns/solid/terms#profileHighlightColor
http://xmlns.com/foaf/0.1/based_near
http://usefulinc.com/ns/doap#developer
http://www.w3.org/2000/10/swap/pim/contact#country
http://xmlns.com/foaf/0.1/maker
http://schema.org/owns
http://purl.org/dc/elements/1.1/title
http://usefulinc.com/ns/doap#developer
http://xmlns.com/foaf/0.1/title
http://xmlns.com/foaf/0.1/workplaceHomepage
http://xmlns.com/foaf/0.1/mbox_sha

In [None]:
print("PRINTING OBJECTS")
for o in g.objects():
    print(o)

We can also filter the subjects, predicates and objects we want to retrieve, and match their values as in a database "join" operation.

In [None]:
g = Graph()

# Create an identifier to use as the subject for Donna.
donna = URIRef('urn:donna')
ila = URIRef('urn:ila')
# Add triples using store's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (ila, RDF.type, FOAF.Person) )
g.add( (ila, RDF.type, TEACH.Teacher) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )
g.add( (ila, FOAF.mbox, URIRef("mailto:ila@example.org")) )

# For each foaf:Person in the store print out its mbox property.
# print("--- printing mboxes ---")
# for person in g.subjects(RDF.type, FOAF.Person):
#     for mbox in g.objects(person, FOAF.mbox):
#         print(mbox)

# You can reuse matches of subjects to filter further e.g. objects
for entity in g.subjects(RDF.type, None):
    print(entity)
    for objects in g.objects(entity, RDF.type):
        print(objects)

### Basic triple matching (almost querying!)

We use the method Graph.triples with a Python tuple that acts as a mask for our criteria: None in a position matches any value.

In [None]:
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")

tim = URIRef("https://www.w3.org/People/Berners-Lee/card#i")

if ( tim, RDF.type, FOAF.Person ) in g:
    print("This graph knows that Tim is a person!")

if ( tim, None, None ) in g:
    print("This graph contains triples about Tim!")

In [None]:
for s,p,o in g.triples( (None, None, None) ):
    print(s,p,o)

In [None]:
for s,p,o in g.triples( (tim, RDF.type, None) ):
    print(s,p,o)

## Namespaces and bindings

In [None]:
mid_uri = URIRef("http://purl.org/midi-ld/midi#")
mid = Namespace(mid_uri)

print(mid['hello'])  # as item - for things that are not valid python identifiers
print(mid.hello)     # as attribute

In [None]:
g = Graph()

# Create an identifier to use as the subject for Donna.
donna = BNode()

# Add triples using store's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )

# rdflib 6+ returns a str here; on older releases append .decode() to the result
print(g.serialize(format='turtle'))

In [None]:
foaf_uri = URIRef("http://xmlns.com/foaf/0.1/")
foaf_namespace = Namespace(foaf_uri)

g = Graph()

# Bind a few prefix, namespace pairs for more readable output
g.bind("foaf", foaf_namespace)

# Create an identifier to use as the subject for Donna.
donna = BNode()

# Add triples using store's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )

# rdflib 6+ returns a str here; on older releases append .decode() to the result
print(g.serialize(format='turtle'))