# Knowledge Representation on the Web -- RDF tutorial

In this tutorial we'll learn the basics of working with RDF graphs in Python. We'll use rdflib, a widely used Python library for RDF (the full documentation is available [here](https://rdflib.readthedocs.io/en/stable/index.html)).

## Imports
These are the main classes and types we'll be using from rdflib.

In [1]:
import sys
!{sys.executable} -m pip install rdflib

from rdflib import Graph, ConjunctiveGraph, Literal, BNode, Namespace, RDF, URIRef
from rdflib.namespace import DC, FOAF

import pprint




## Loading data remotely and from files

rdflib can load RDF data from a variety of sources: locally from a file (many serialization formats are supported) or remotely via a URI. Parsing a URI directly is a practical way to check whether it returns RDF, as the 3rd Linked Data principle requires.

Triples are always loaded into a Graph object.
**Note**: to load quads, and hence support named graphs, you'll need to use an instance of ConjunctiveGraph instead.


In [2]:
g = Graph()
h = Graph()
result = g.parse("http://www.w3.org/People/Berners-Lee/card")

result2 = h.parse("https://csarven.ca/")

print("Graph has %s statements." % len(g))
print("Graph has %s statements." % len(h))


Graph has 86 statements.
Graph has 50 statements.


In [5]:
with open("demo.nt", "r") as f:
    print("This is an N-Triples file")
    print("=========================")
    for line in f:
        print(line)
    
with open("demo.xml", "r") as f:
    print("This is a RDF/XML file")
    print("======================")
    for line in f:
        print(line)
    
with open("demo.ttl", "r") as f:
    print("This is a Turtle file")
    print("=====================")
    for line in f:
        print(line)

This is an N-Triples file
<http://bigasterisk.com/foaf.rdf#drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .

<http://bigasterisk.com/foaf.rdf#drewp> <http://example.com/says> "Hello world" .

This is a RDF/XML file
<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF

   xmlns:ns1="http://example.com/"

   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

>

  <rdf:Description rdf:about="http://bigasterisk.com/foaf.rdf#drewp">

    <ns1:says>Hello world</ns1:says>

    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>

  </rdf:Description>

</rdf:RDF>

This is a Turtle file
@prefix ns1: <http://example.com/> .

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

@prefix xml: <http://www.w3.org/XML/1998/namespace> .

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .



<http://bigasterisk.com/foaf.rdf#drewp> a <http://xmlns.com/foaf/0.1/Person> ;

    ns1:says "H

In [6]:
g = Graph()
g.parse("demo.ttl", format='turtle')

len(g) # prints 2

2

In [7]:
for stmt in g:
    pprint.pprint(stmt)

(rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))
(rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.term.URIRef('http://example.com/says'),
 rdflib.term.Literal('Hello world'))


In [8]:
# You can also parse directly from a string
g = Graph()

g.parse(data = '<urn:a> <urn:p> <urn:b>.', format='n3')
for stmt in g:
    pprint.pprint(stmt)

(rdflib.term.URIRef('urn:a'),
 rdflib.term.URIRef('urn:p'),
 rdflib.term.URIRef('urn:b'))


## Saving RDF graphs

We use the method Graph.serialize(), passing the target serialization with the `format` argument.

In [9]:
g = Graph()
g.parse("demo.nt", format='nt')
# for a,b,c in g:
#     print(a,b,c)

out = g.serialize(format='nt')  # other serializer formats include 'n3', 'nquads', 'pretty-xml', 'trig', 'trix', 'turtle', 'xml'
# rdflib < 6 returns bytes here, while rdflib >= 6 returns str
print(out.decode() if isinstance(out, bytes) else out)

<http://bigasterisk.com/foaf.rdf#drewp> <http://example.com/says> "Hello world" .
<http://bigasterisk.com/foaf.rdf#drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .




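## Merging graphs

Graphs can be merged by parsing several sources into the same Graph or with the overloaded operator +.

**Note:** Set-theoretic graph semantics apply: a graph is a set of triples, so a triple that occurs in both inputs appears only once in the result.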

In [10]:
graph = Graph()

# Sequential parsings merge *new* triples

graph.parse("demo.nt", format='nt')
graph.parse("demo.xml", format='xml')

print("Graph has {} triples".format(len(graph)))

Graph has 2 triples


In [11]:
g1 = Graph()
g1.parse("demo.nt", format='nt')
print("g1 has {} triples".format(len(g1)))

g2 = Graph()
g2.parse("demo.xml", format='xml')
print("g2 has {} triples".format(len(g2)))

graph = g1 + g2
print("g1 + g2 has {} triples".format(len(graph)))

g1 has 2 triples
g2 has 2 triples
g1 + g2 has 2 triples


In [12]:
# Now, if we merge graphs with different contents

tim_g = Graph()
tim_g.parse("http://www.w3.org/People/Berners-Lee/card")
print("Tim graph has {} triples".format(len(tim_g)))

g3 = g1 + tim_g
print("g3 has {} triples".format(len(g3)))

Tim graph has 86 triples
g3 has 88 triples



## Creating RDF triples

Triples are added to the graph with the method Graph.add().

Its parameter is a triple given as a Python **tuple** (subject, predicate, object).

Notice the namespace convenience syntax: an attribute access such as FOAF.Person or RDF.type expands to the full URI of that term.

In [14]:
g = Graph()

# Create an identifier to use as the subject for Donna.
donna = BNode()
# donna = URIRef("http://example.org/donna")

# Add triples using store's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )

for i in range(100):
    g.add((donna, FOAF.hadAge, Literal(i)))

print(len(g))

for s in g:
    pprint.pprint(s)
    print()


104
(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'),
 rdflib.term.Literal('5', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'),
 rdflib.term.Literal('48', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'),
 rdflib.term.Literal('49', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'),
 rdflib.term.Literal('12', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'),
 rdflib.term.Litera

In [15]:
for stmt in g:
    print(stmt)

(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'), rdflib.term.Literal('5', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'), rdflib.term.Literal('48', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'), rdflib.term.Literal('49', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'), rdflib.term.Literal('12', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.BNode('N491440306f7843c4b941c235b7ac206e'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/hadAge'), rdflib.term.Literal('68', datatype=r

In [16]:
print(FOAF.Person)

http://xmlns.com/foaf/0.1/Person


In [18]:
print(FOAF.imadethisup)



http://xmlns.com/foaf/0.1/imadethisup


## Navigating graphs

rdflib uses iterators (generators) to navigate Graphs. Iterating over a Graph yields its triples, and the methods Graph.subjects, Graph.predicates and Graph.objects iterate over the subjects, predicates and objects.

In [19]:
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")

# Iterate over triples in store and print them out.
print("--- printing raw triples ---")
for s, p, o in g:
    print(p)

--- printing raw triples ---
http://xmlns.com/foaf/0.1/mbox_sha1sum
http://xmlns.com/foaf/0.1/primaryTopic
http://www.w3.org/2000/10/swap/pim/contact#participant
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://xmlns.com/foaf/0.1/account
http://www.w3.org/2000/10/swap/pim/contact#preferredURI
http://rdfs.org/sioc/ns#avatar
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/10/swap/pim/contact#street2
http://www.w3.org/2006/vcard/ns#street-address
http://www.w3.org/ns/solid/terms#profileBackgroundColor
http://www.w3.org/2000/10/swap/pim/contact#postalCode
http://purl.org/dc/elements/1.1/title
http://www.w3.org/2006/vcard/ns#fn
http://www.w3.org/2000/10/swap/pim/contact#participant
http://xmlns.com/foaf/0.1/primaryTopic
http://xmlns.com/foaf/0.1/openid
http://purl.org/dc/elements/1.1/title
http://www.w3.org/ns/auth/cert#exponent
http://www.w3.org/ns/ldp#inbox
http://xmlns.com/foaf/0.1/nick
http://purl.org/dc/elements/1.1/title
http://xmlns.com/foaf/0.1/maker
htt

In [20]:
# Printing the terms of each triple individually shows their plain string values instead of their Python repr
print("--- printing raw triples ---")
for s, p, o in g:
    print(s, p, o)

--- printing raw triples ---
https://www.w3.org/People/Berners-Lee/card#i http://xmlns.com/foaf/0.1/mbox_sha1sum 965c47c5a70db7407210cef6e4e6f5374a525c5c
http://www.w3.org/People/Berners-Lee/card http://xmlns.com/foaf/0.1/primaryTopic https://www.w3.org/People/Berners-Lee/card#i
http://www.ecs.soton.ac.uk/~dt2/dlstuff/www2006_data#panel-panelk01 http://www.w3.org/2000/10/swap/pim/contact#participant https://www.w3.org/People/Berners-Lee/card#i
N180720b2d4904ba2873355cbef7c8c72 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/ns/auth/cert#RSAPublicKey
https://www.w3.org/People/Berners-Lee/card#i http://xmlns.com/foaf/0.1/account http://twitter.com/timberners_lee
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/2000/10/swap/pim/contact#preferredURI https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i http://rdfs.org/sioc/ns#avatar http://www.w3.org/People/Berners-Lee/images/timbl-image-by-Coz-cropped.jpg
https://www.w3.o

In [21]:
print("PRINTING SUBJECTS")
for s in g.subjects():
    print(s)

PRINTING SUBJECTS
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card
http://www.ecs.soton.ac.uk/~dt2/dlstuff/www2006_data#panel-panelk01
N180720b2d4904ba2873355cbef7c8c72
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
Nc1dccd062a3048aa95a4679a6c2a779b
N7ff06c19838144c2b9e2ec042d01987e
https://www.w3.org/People/Berners-Lee/card#i
Nc1dccd062a3048aa95a4679a6c2a779b
http://www.w3.org/People/Berners-Lee/card
https://www.w3.org/People/Berners-Lee/card#i
http://wiki.ontoworld.org/index.php/_IRW2006
https://timbl.com/timbl/Public/friends.ttl
https://www.w3.org/People/Berners-Lee/card#i
http://dig.csail.mit.edu/breadcrumbs/blog/4
N180720b2d4904ba2873355cbef7c8c72
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/DesignIssues/Overview.html
https://timbl.com/timbl/Public/fr

In [22]:
print("PRINTING PREDICATES")
for p in g.predicates():
    if len(p) > 13:
        print(p)


PRINTING PREDICATES
http://xmlns.com/foaf/0.1/mbox_sha1sum
http://xmlns.com/foaf/0.1/primaryTopic
http://www.w3.org/2000/10/swap/pim/contact#participant
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://xmlns.com/foaf/0.1/account
http://www.w3.org/2000/10/swap/pim/contact#preferredURI
http://rdfs.org/sioc/ns#avatar
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/10/swap/pim/contact#street2
http://www.w3.org/2006/vcard/ns#street-address
http://www.w3.org/ns/solid/terms#profileBackgroundColor
http://www.w3.org/2000/10/swap/pim/contact#postalCode
http://purl.org/dc/elements/1.1/title
http://www.w3.org/2006/vcard/ns#fn
http://www.w3.org/2000/10/swap/pim/contact#participant
http://xmlns.com/foaf/0.1/primaryTopic
http://xmlns.com/foaf/0.1/openid
http://purl.org/dc/elements/1.1/title
http://www.w3.org/ns/auth/cert#exponent
http://www.w3.org/ns/ldp#inbox
http://xmlns.com/foaf/0.1/nick
http://purl.org/dc/elements/1.1/title
http://xmlns.com/foaf/0.1/maker
http://www.w

In [23]:
print("PRINTING OBJECTS")
for o in g.objects():
    print(o)

PRINTING OBJECTS
965c47c5a70db7407210cef6e4e6f5374a525c5c
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/ns/auth/cert#RSAPublicKey
http://twitter.com/timberners_lee
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/images/timbl-image-by-Coz-cropped.jpg
http://xmlns.com/foaf/0.1/Person
MIT CSAIL Building 32
32 Vassar Street
#ffffff
02139
Tim Berners-Lee's FOAF file
Tim Berners-Lee
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/
timbl's blog on DIG
65537
https://timbl.com/timbl/Public/Inbox
timbl
Design Issues for the World Wide Web
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/
N180720b2d4904ba2873355cbef7c8c72
https://timbl.com/timbl/Data/preferences.n3
https://www.w3.org/People/Berners-Lee/card#i
Timothy Berners-Lee
Cambridge
https://timbl.com/timbl/Public/friends.ttl
mailto:tim

We can also restrict which subjects, predicates and objects are retrieved by passing the terms they must match, and chain these calls so the results of one lookup feed the next, much like a database "join" operation.

In [25]:
g = Graph()

# Create an identifier to use as the subject for Donna.
donna = URIRef('urn:donna')
ila = URIRef('urn:ila')
# Add triples using store's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (ila, RDF.type, FOAF.Person) )
g.add( (ila, RDF.type, FOAF.Teacher) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )
g.add( (ila, FOAF.mbox, URIRef("mailto:ila@example.org")) )

# For each foaf:Person in the store print out its mbox property.
# print("--- printing mboxes ---")
# for person in g.subjects(RDF.type, FOAF.Person):
#     for mbox in g.objects(person, FOAF.mbox):
#         print(mbox)

# You can reuse matched subjects to filter further, e.g. to look up their objects
for entity in g.subjects(RDF.type, None):
    print(entity)
    for obj in g.objects(entity, RDF.type):
        print(obj)

urn:ila
http://xmlns.com/foaf/0.1/Person
http://xmlns.com/foaf/0.1/Teacher
urn:ila
http://xmlns.com/foaf/0.1/Person
http://xmlns.com/foaf/0.1/Teacher
urn:donna
http://xmlns.com/foaf/0.1/Person


### Basic triple matching (almost querying!)

We use the method Graph.triples with a Python tuple that acts as a mask: each position holds either a term to match or None, which matches anything. The same mask works with the `in` operator.

In [26]:
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")

tim = URIRef("https://www.w3.org/People/Berners-Lee/card#i")

if ( tim, RDF.type, FOAF.Person ) in g:
    print("This graph knows that Tim is a person!")

if ( tim, None, None ) in g:
    print("This graph contains triples about Tim!")

This graph knows that Tim is a person!
This graph contains triples about Tim!


In [27]:
for s,p,o in g.triples( (None, None, None) ):
    print(s,p,o)

N726f445920bf4493b24fea73ee244d1f http://www.w3.org/2000/10/swap/pim/contact#street 32 Vassar Street
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/ns/pim/space#storage https://timbl.com/timbl/Public/
http://www.w3.org/DesignIssues/Overview.html http://purl.org/dc/elements/1.1/title Design Issues for the World Wide Web
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/2000/10/swap/pim/contact#publicHomePage http://www.w3.org/People/Berners-Lee/
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/2006/vcard/ns#hasAddress Nd7a9b3cdc3724a75859ef35c855492c1
https://www.w3.org/People/Berners-Lee/card#i http://xmlns.com/foaf/0.1/openid https://www.w3.org/People/Berners-Lee/
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/ns/pim/space#preferencesFile https://timbl.com/timbl/Data/preferences.n3
https://www.w3.org/People/Berners-Lee/card#i http://xmlns.com/foaf/0.1/account http://www.reddit.com/user/timbl/
N726f445920bf4493b24fea73ee244d1f 

In [28]:
for s,p,o in g.triples( (tim, RDF.type, None) ):
    print(s,p,o)

https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2000/10/swap/pim/contact#Male
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/Person


## Namespaces and bindings

In [29]:
mid_uri = URIRef("http://purl.org/midi-ld/midi#")
mid = Namespace(mid_uri)

print(mid['hello'])  # as item - for things that are not valid python identifiers
print(mid.hello)     # as attribute

http://purl.org/midi-ld/midi#hello
http://purl.org/midi-ld/midi#hello


In [30]:
g = Graph()

# Create an identifier to use as the subject for Donna.
donna = BNode()

# Add triples using store's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )

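out = g.serialize(format='turtle')
# rdflib < 6 returns bytes here, while rdflib >= 6 returns str
print(out.decode() if isinstance(out, bytes) else out)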

@prefix ns1: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] a ns1:Person ;
    ns1:mbox <mailto:donna@example.org> ;
    ns1:name "Donna Fales" ;
    ns1:nick "donna"@foo .




In [31]:
foaf_uri = URIRef("http://xmlns.com/foaf/0.1/")
foaf_namespace = Namespace(foaf_uri)

g = Graph()

# Bind a prefix to a namespace for more readable output
g.bind("foaf", foaf_namespace)

# Create an identifier to use as the subject for Donna.
donna = BNode()

# Add triples using store's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )

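out = g.serialize(format='turtle')
# rdflib < 6 returns bytes here, while rdflib >= 6 returns str
print(out.decode() if isinstance(out, bytes) else out)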

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] a foaf:Person ;
    foaf:mbox <mailto:donna@example.org> ;
    foaf:name "Donna Fales" ;
    foaf:nick "donna"@foo .


