# Knowledge Representation on the Web -- RDF tutorial

In this tutorial we'll learn the basics of interacting with RDF graphs in Python. We'll use rdflib, a widely used Python library for RDF (all documentation can be found [here](https://rdflib.readthedocs.io/en/stable/index.html)).

## Imports
These are the main classes and types we'll be using from rdflib

In [1]:
import sys
!{sys.executable} -m pip install rdflib

from rdflib import Graph, ConjunctiveGraph, Literal, BNode, Namespace, RDF, URIRef
from rdflib.namespace import DC, FOAF

import pprint





## Loading data remotely and from files

rdflib can import RDF data from a variety of sources: locally from a file (it supports an extensive range of serializations) or remotely via a URI. Loading from a URI is also a practical way to check whether that URI actually returns RDF, as the 3rd Linked Data principle asks.

A Graph object is always required to load triples.
**Note**: to load quads, and hence support named graphs, you'll need an instance of ConjunctiveGraph instead.


In [2]:
g = Graph()
h = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")

h.parse("https://csarven.ca/")

print("Graph g has %s statements." % len(g))
print("Graph h has %s statements." % len(h))


Graph g has 86 statements.
Graph h has 50 statements.


In [3]:
# with open("demo.nt", "r") as f:
#     print("This is an N-Triples file")
#     print("=========================")
#     for line in f:
#         print(line)
    
# with open("demo.xml", "r") as f:
#     print("This is a RDF/XML file")
#     print("======================")
#     for line in f:
#         print(line)
    
with open("demo.ttl", "r") as f:
    print("This is a Turtle file")
    print("=====================")
    # Each line already ends with a newline, so don't let print() add another
    for line in f:
        print(line, end="")

This is a Turtle file
=====================
@prefix ns1: <http://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://bigasterisk.com/foaf.rdf#drewp> a <http://xmlns.com/foaf/0.1/Person> ;
    ns1:says "Hello world" .




In [4]:
g = Graph()
g.parse("demo.ttl", format='turtle')

len(g) # prints 2

2

In [5]:
for stmt in g:
    pprint.pprint(stmt)

(rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))
(rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.term.URIRef('http://example.com/says'),
 rdflib.term.Literal('Hello world'))


In [7]:
# You can also parse directly from a string
g = Graph()

g.parse(data = '<urn:a> <urn:p> <urn:b>.', format='n3')
for stmt in g:
    pprint.pprint(stmt)

(rdflib.term.URIRef('urn:a'),
 rdflib.term.URIRef('urn:p'),
 rdflib.term.URIRef('urn:b'))


## Saving RDF graphs

Graphs are saved, or simply printed, with the method Graph.serialize(format=...).

In [11]:
g = Graph()
g.parse("demo.nt", format='nt')
# for a,b,c in g:
#     print(a,b,c)

# serialize() returns a str in rdflib 6 and later, so no .decode() is needed.
# Other serializer formats include 'xml', 'pretty-xml', 'n3', 'nt', 'nquads', 'trig', 'trix' and 'json-ld'
print(g.serialize(format='turtle'))

@prefix ns1: <http://example.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://bigasterisk.com/foaf.rdf#drewp> a <http://xmlns.com/foaf/0.1/Person> ;
    ns1:says "Hello world" .




## Merging graphs

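Graphs can be merged either by parsing several sources into the same Graph, one after another, or with the overloaded + operator.

**Note:** set-theoretic graph semantics apply. A graph is a set of triples, so a triple that appears in several sources is stored only once.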

In [12]:
graph = Graph()

# Sequential parsings merge *new* triples

graph.parse("demo.nt", format='nt')
graph.parse("demo.xml", format='xml')

print("Graph has {} triples".format(len(graph)))

Graph has 2 triples


In [13]:
g1 = Graph()
g1.parse("demo.nt", format='nt')
print("g1 has {} triples".format(len(g1)))

g2 = Graph()
g2.parse("demo.xml", format='xml')
print("g2 has {} triples".format(len(g2)))

graph = g1 + g2
print("g1 + g2 has {} triples".format(len(graph)))

g1 has 2 triples
g2 has 2 triples
g1 + g2 has 2 triples


In [14]:
# Now, if we merge graphs with different contents

tim_g = Graph()
tim_g.parse("http://www.w3.org/People/Berners-Lee/card")
print("Tim graph has {} triples".format(len(tim_g)))

g3 = g1 + tim_g
print("g3 has {} triples".format(len(g3)))

Tim graph has 86 triples
g3 has 88 triples



## Creating RDF triples

Triples are added to a graph with the method Graph.add().

Its argument is a single triple given as a Python **tuple** (subject, predicate, object).

Notice the namespace convenience syntax: FOAF.Person stands for the full URI http://xmlns.com/foaf/0.1/Person.

In [15]:
g = Graph()

# Create an identifier to use as the subject for Donna.
donna = BNode()
# donna = URIRef("http://example.org/donna")

# Add triples using the graph's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )

# FOAF's term for age is foaf:age; each distinct value adds a new triple
for i in range(100):
    g.add((donna, FOAF.age, Literal(i)))

print(len(g))

for stmt in g:
    pprint.pprint(stmt)
    print()


104
(rdflib.term.BNode('N73308942226f48be8b11481235e03031'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 rdflib.term.Literal('41', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('N73308942226f48be8b11481235e03031'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 rdflib.term.Literal('36', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('N73308942226f48be8b11481235e03031'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 rdflib.term.Literal('72', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('N73308942226f48be8b11481235e03031'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 rdflib.term.Literal('54', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))

(rdflib.term.BNode('N73308942226f48be8b11481235e03031'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/age'),
 ...

In [None]:
for stmt in g:
    print(stmt)

In [None]:
print(FOAF.Person)

In [None]:
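# FOAF is a closed namespace in recent rdflib versions, so an undefined term
# like this one raises AttributeError instead of printing a URI
print(FOAF.imadethisup)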


## Navigating graphs

rdflib uses iterators to navigate graphs. The methods for retrieving subjects, predicates and objects are Graph.subjects(), Graph.predicates() and Graph.objects(); each yields one item per matching triple, so the same term can appear several times.

In [17]:
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")

# Iterate over the triples in the graph and print the subject of each one
print("--- printing raw triples ---")
for s, p, o in g:
    print(s)

--- printing raw triples ---
http://www.ecs.soton.ac.uk/~dt2/dlstuff/www2006_data#panel-panelk01
http://www.w3.org/2011/Talks/0331-hyderabad-tbl/data#talk
https://timbl.com/timbl/Public/friends.ttl
N2f96ba07be9b485c8b66078c4be18410
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card#i
N855db7ebb43547b9ac908665e4a8cf63
https://timbl.com/timbl/Public/friends.ttl
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
http://dig.csail.mit.edu/2007/01/camp/data#course
N0f7c32b6e5fd4e7c8552a03c0b88a8ac
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
Nb802d6439ab445fa8c078d7518a89c83
http://www.w3.org/People/Berners-Lee/card
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card
http://dig.csail.mit.edu/breadcrumbs/blog/4
...

In [18]:
# print() shows each term's string form, so the rdflib types (URIRef, Literal, BNode) that pprint displays are omitted
print("--- printing raw triples ---")
for s, p, o in g:
    print(s, p, o)

--- printing raw triples ---
http://www.ecs.soton.ac.uk/~dt2/dlstuff/www2006_data#panel-panelk01 http://www.w3.org/2000/10/swap/pim/contact#participant https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/2011/Talks/0331-hyderabad-tbl/data#talk http://xmlns.com/foaf/0.1/maker https://www.w3.org/People/Berners-Lee/card#i
https://timbl.com/timbl/Public/friends.ttl http://xmlns.com/foaf/0.1/primaryTopic https://www.w3.org/People/Berners-Lee/card#i
N2f96ba07be9b485c8b66078c4be18410 http://www.w3.org/2003/01/geo/wgs84_pos#lat 42.361860
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/ns/pim/space#storage https://timbl.inrupt.net/
http://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/ns/auth/cert#key N0f7c32b6e5fd4e7c8552a03c0b88a8ac
N855db7ebb43547b9ac908665e4a8cf63 http://www.w3.org/2006/vcard/ns#postal-code 02139
https://timbl.com/timbl/Public/friends.ttl http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/PersonalProfileDocument
...

In [19]:
print("PRINTING SUBJECTS")
for s in g.subjects():
    print(s)

PRINTING SUBJECTS
http://www.ecs.soton.ac.uk/~dt2/dlstuff/www2006_data#panel-panelk01
http://www.w3.org/2011/Talks/0331-hyderabad-tbl/data#talk
https://timbl.com/timbl/Public/friends.ttl
N2f96ba07be9b485c8b66078c4be18410
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card#i
N855db7ebb43547b9ac908665e4a8cf63
https://timbl.com/timbl/Public/friends.ttl
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
http://dig.csail.mit.edu/2007/01/camp/data#course
N0f7c32b6e5fd4e7c8552a03c0b88a8ac
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
Nb802d6439ab445fa8c078d7518a89c83
http://www.w3.org/People/Berners-Lee/card
https://www.w3.org/People/Berners-Lee/card#i
https://www.w3.org/People/Berners-Lee/card#i
http://www.w3.org/People/Berners-Lee/card
http://dig.csail.mit.edu/breadcrumbs/blog/4
https://www.w3.org/People/Berners-Lee/card#i
...

In [20]:
print("PRINTING PREDICATES")
for p in g.predicates():
    if len(p) > 13:  # only print predicate URIs longer than 13 characters
        print(p)


PRINTING PREDICATES
http://www.w3.org/2000/10/swap/pim/contact#participant
http://xmlns.com/foaf/0.1/maker
http://xmlns.com/foaf/0.1/primaryTopic
http://www.w3.org/2003/01/geo/wgs84_pos#lat
http://www.w3.org/ns/pim/space#storage
http://www.w3.org/ns/auth/cert#key
http://www.w3.org/2006/vcard/ns#postal-code
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://xmlns.com/foaf/0.1/givenname
http://www.w3.org/ns/solid/terms#profileHighlightColor
http://xmlns.com/foaf/0.1/account
http://xmlns.com/foaf/0.1/maker
http://www.w3.org/ns/auth/cert#modulus
http://xmlns.com/foaf/0.1/nick
http://www.w3.org/2006/vcard/ns#hasAddress
http://www.w3.org/2003/01/geo/wgs84_pos#location
http://xmlns.com/foaf/0.1/maker
http://xmlns.com/foaf/0.1/nick
http://www.w3.org/ns/ldp#inbox
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/01/rdf-schema#seeAlso
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/ns/pim/space#storage
http://www.w3.org/2003/01/geo/wgs84_pos#long
...

In [None]:
print("PRINTING OBJECTS")
for o in g.objects():
    print(o)

We can also restrict which subjects, predicates and objects are returned by fixing the other parts of the triple, and feed the values we get back into further lookups, much like a "join" in a database.

In [21]:
g = Graph()

# Create URIs to use as the subjects for Donna and Ila.
donna = URIRef('urn:donna')
ila = URIRef('urn:ila')
# FOAF defines no Teacher class, so this example uses a placeholder namespace for it
EX = Namespace('http://example.org/')
# Add triples using the graph's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (ila, RDF.type, FOAF.Person) )
g.add( (ila, RDF.type, EX.Teacher) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )
g.add( (ila, FOAF.mbox, URIRef("mailto:ila@example.org")) )

# For each foaf:Person in the store print out its mbox property.
# print("--- printing mboxes ---")
# for person in g.subjects(RDF.type, FOAF.Person):
#     for mbox in g.objects(person, FOAF.mbox):
#         print(mbox)

# You can reuse matched subjects to filter further, e.g. their objects.
# urn:ila is listed twice because it is the subject of two rdf:type triples.
for entity in g.subjects(RDF.type, None):
    print(entity)
    for rdf_type in g.objects(entity, RDF.type):
        print(rdf_type)

urn:ila
http://example.org/Teacher
http://xmlns.com/foaf/0.1/Person
urn:ila
http://example.org/Teacher
http://xmlns.com/foaf/0.1/Person
urn:donna
http://xmlns.com/foaf/0.1/Person


### Basic triple matching (almost querying!)

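We use the method Graph.triples() with a Python tuple (subject, predicate, object) that acts as a mask for our criteria: None in a position matches any term. The same masks can also be tested with the in operator.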

In [22]:
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card")

tim = URIRef("https://www.w3.org/People/Berners-Lee/card#i")

if ( tim, RDF.type, FOAF.Person ) in g:
    print("This graph knows that Tim is a person!")

if ( tim, None, None ) in g:
    print("This graph contains triples about Tim!")

This graph knows that Tim is a person!
This graph contains triples about Tim!


In [23]:
for s,p,o in g.triples( (None, None, None) ):
    print(s,p,o)

https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/2000/10/swap/pim/contact#assistant https://www.w3.org/People/Berners-Lee/card#amy
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/ns/solid/terms#oidcIssuer https://timbl.com
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/ns/solid/terms#publicTypeIndex https://timbl.com/timbl/Public/PublicTypeIndex.ttl
https://www.w3.org/People/Berners-Lee/card#i http://xmlns.com/foaf/0.1/weblog http://dig.csail.mit.edu/breadcrumbs/blog/4
Nf2ec36081fb445eb9cacf8d531c29d4f http://www.w3.org/2006/vcard/ns#locality Cambridge
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/ns/pim/space#storage https://timbl.com/timbl/Public/
http://www.w3.org/People/Berners-Lee/card http://creativecommons.org/ns#license http://creativecommons.org/licenses/by-nc/3.0/
N0df9e65de2954efba0d6cb1855b53ae9 http://www.w3.org/2000/10/swap/pim/contact#street 32 Vassar Street
...

In [24]:
for s,p,o in g.triples( (tim, RDF.type, None) ):
    print(s,p,o)

https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2000/10/swap/pim/contact#Male
https://www.w3.org/People/Berners-Lee/card#i http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/Person


## Namespaces and bindings

In [None]:
mid_uri = URIRef("http://purl.org/midi-ld/midi#")
mid = Namespace(mid_uri)

print(mid['hello'])  # as item - for things that are not valid python identifiers
print(mid.hello)     # as attribute

In [25]:
g = Graph()

# Create an identifier to use as the subject for Donna.
donna = BNode()

# Add triples using the graph's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )

print(g.serialize(format='turtle'))

@prefix ns1: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] a ns1:Person ;
    ns1:mbox <mailto:donna@example.org> ;
    ns1:name "Donna Fales" ;
    ns1:nick "donna"@foo .




In [1]:
# Requires the imports from the first cell (re-run it after a kernel restart)
foaf_uri = URIRef("http://xmlns.com/foaf/0.1/")
foaf_namespace = Namespace(foaf_uri)

g = Graph()

# Bind the "foaf" prefix so the output uses foaf: instead of an ns1: prefix
g.bind("foaf", foaf_namespace)

# Create an identifier to use as the subject for Donna.
donna = BNode()

# Add triples using the graph's add method.
g.add( (donna, RDF.type, FOAF.Person) )
g.add( (donna, FOAF.nick, Literal("donna", lang="foo")) )
g.add( (donna, FOAF.name, Literal("Donna Fales")) )
g.add( (donna, FOAF.mbox, URIRef("mailto:donna@example.org")) )

# Write the Turtle serialization straight to a file
g.serialize(destination="email.txt", format='turtle')
#print(g.serialize(format='turtle'))