# The Semantic Web Lab – RDF and RDFlib

## Session objectives:

- Familiarization with the Python language for creating simple Semantic Web programs.

- Familiarization with the RDFLib library to load and manipulate RDF graphs.

## 1.  Introduction

In this lab we will use Python 2 to develop simple programs that access, query, and manipulate RDF semantic data.

A quick Python reference is available here:

- Basic tutorial: http://docs.python.org/2/tutorial/

A Python program is run by invoking `python` followed by the name of the `.py` file. Interactive mode is also available through the Python shell.

The ITL machines are already set up to run this lab sheet. If you want to work on your own computer, you will need to install Python and the following additional Python libraries:

- sparql-wrapper https://github.com/RDFLib/sparqlwrapper 

- rdflib https://github.com/RDFLib/rdflib

- pyparsing http://pyparsing.wikispaces.com/

- networkx http://networkx.lanl.gov/

You can also download sample scripts for this week from QMplus. 


## 2. RDFLib Graph Processing

At http://workingontologist.org/Examples/Chapter3/shakespeare.n3 you can download an RDF graph serialized using Turtle syntax. Download the file (saving it as `shakespeare.n3`), open it in a text editor, and count how many triples it defines.

Have a look at the prefixes and defined triples. How many different concepts are defined in the graph? How many properties?

Before starting Python, if you are in the ITL, set up the Python environment by running the following command from the shell:

```
export PYTHONPATH=/import/linux/python/lib/python
```

We will now use Python to process the file programmatically. Open Python in interactive shell mode (type `python` at the command line). You should see console output similar to:
```
Python 2.7.2 (default, Oct 27 2011, 01:40:22)
[GCC 4.6.1 20111003 (Red Hat 4.6.1-10)] on linux2
Type "help", "copyright", "credits" or "license" for more
information.
>>>
```

You can familiarize yourself with Python syntax and the interactive shell by running some commands from:

http://docs.python.org/tutorial/introduction.html#using-python-as-a-calculator

Before using RDFLib classes you will need to add some import declarations. Enter the following statements in the shell:

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import rdflib
from rdflib.graph import Graph, Store, URIRef, BNode, Literal
from rdflib.namespace import Namespace, RDF, RDFS
from rdflib import plugin

In RDFLib, the `Graph` class represents a set of triples. Create a new `Graph` object and populate it with the definitions contained in the `shakespeare.n3` file:

In [3]:
g = Graph()
g.parse('shakespeare.n3',format='n3')

<Graph identifier=N6754178a53da47ff97c95b825fc3faa7 (<class 'rdflib.graph.Graph'>)>

Alternatively, you can parse a file directly from the internet:

In [4]:
g.parse('http://workingontologist.org/Examples/Chapter3/shakespeare.n3',format='n3')

<Graph identifier=N6754178a53da47ff97c95b825fc3faa7 (<class 'rdflib.graph.Graph'>)>

Your model is now loaded in memory. You can check the number of statements with `len(g)` (the built-in `len` function works with any Python collection). Another useful command is `repr(g)`, which shows a representation of any in-memory object (similar to Java's `toString()`).

You can iterate through the contents of a graph using Python `for` loops (see the loop syntax at http://docs.python.org/tutorial/controlflow.html#for-statements). Be careful with indentation: instead of using `{}`, Python requires the statements inside the loop to be indented with spaces or tabs.
```
for st in g:
    print(st)
```

You can also retrieve the subject, predicate, and object of each triple directly by slightly changing the loop:
```
for s,p,o in g:
    print "\nsubject is: " + str(s) +  "\npredicate is: " + str(p) + "\nobject is: " + str(o)
```

The elements of each triple are instances of Python classes representing URI references (`URIRef`), blank nodes (`BNode`) and literals (`Literal`). You can check this with the Python function `type`, which returns the type of a variable; e.g., try `type(s)`, `type(p)` and `type(o)`.

You can also see a complete overview of the Graph class here:
https://rdflib.readthedocs.io/en/latest/intro_to_graphs.html

## 3. Serialization formats

With your Shakespeare graph, you can compare the representation in different serialization formats:
```
print g.serialize(format='nt')
print g.serialize(format='turtle')
print g.serialize(format='xml')
```
Note the decreased verbosity of turtle compared to N-triples (nt). 

Also familiarize yourself with the XML serialization. 

Try saving these to a file for later study (see the Python file I/O docs).

## 4. RDF Store

In this part of the lab, you will learn how to set up an RDF store (a repository that persists RDF triples) with RDFLib.
RDFLib defines a plugin mechanism that allows RDF triples to be persisted in multiple ways, including external repositories such as relational databases (e.g. MySQL). To keep things simple, we use a storage option that does not depend on external storage (e.g. a MySQL database).

### Example Python script:

In [5]:
import rdflib
from rdflib.graph import Graph, Store, URIRef, Literal
from rdflib.namespace import Namespace, RDF, RDFS
from rdflib import plugin
# An RDF/XML example (taken from: http://www.w3schools.com/rdf/rdf_example.asp)
rdf_xml_data = '''<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire_Burlesque">
  <cd:artist>Bob_Dylan</cd:artist>
  <cd:country>USA</cd:country>
  <cd:company>Columbia</cd:company>
  <cd:price>10.90</cd:price>
  <cd:year>1985</cd:year>
</rdf:Description>
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Hide_your_heart">
  <cd:artist>Bonnie_Tyler</cd:artist>
  <cd:country>UK</cd:country>
  <cd:company>CBS_Records</cd:company>
  <cd:price>9.90</cd:price>
  <cd:year>1988</cd:year>
</rdf:Description>
</rdf:RDF>
'''

### PART 1: create an RDF store in memory
 
NOTE: other storage backend types include Sleepycat (Berkeley DB), MySQL, SQLite, etc.

In [6]:
memory_store = plugin.get('IOMemory', Store)()

Create a URI to identify the graph held in the store:

In [7]:
graph_id = URIRef('http://example.com/foo')

Create an RDF graph using the store and ID defined above:

In [8]:
g = Graph(store=memory_store, identifier=graph_id)

### PART 2: manually add a triple to the graph

Think of a few literals that share the same properties (e.g. wrote, partOf, married) of the previous graph and define a new graph based on the same concepts.

To define triples programmatically in RDFLib you have two options. You can either define a `URIRef` by providing the fully qualified name as a string (see the previous piece of code), or you can define `URIRef`s through namespaces, following the syntax `ns['resourceFragment']`.

The RDF and RDFS schemas are predefined namespaces you can already use (e.g. to retrieve). The following lines define additional namespace prefixes that appeared in the imported Turtle file:

In [9]:
nslit = Namespace('http://www.workingontologist.org/Examples/Chapter3/shakespeare.owl#')
nsbio = Namespace('http://www.workingontologist.org/Examples/Chapter3/biography.owl#')
g.bind('lit',nslit)
g.bind('bio', nsbio)

You can check the prefixes in your bound namespaces with:
```
for (p,n) in g.namespaces():
    print "Prefix: " + str(p) + ". Corresponds to namespace: " + str(n)
```
Once you have defined your namespaces, you can add triples to the graph with the method `g.add((a, b, c))`:

In [10]:
g.add((nsbio['Cervantes'],RDF.type, nsbio['Person'] ) )
g.add((nsbio['Cervantes'],RDFS.label, Literal('Miguel_de_Cervantes')))
g.add((URIRef(u'http://example.com/bar'), RDFS.label, Literal('bar')))

Print the serialized graph in the Turtle RDF syntax:

In [11]:
print g.serialize(destination=None, format='turtle', base=None, encoding=None)

@prefix bio: <http://www.workingontologist.org/Examples/Chapter3/biography.owl#> .
@prefix lit: <http://www.workingontologist.org/Examples/Chapter3/shakespeare.owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.com/bar> rdfs:label "bar" .

bio:Cervantes a bio:Person ;
    rdfs:label "Miguel_de_Cervantes" .




NOTE: the predicate `RDFS.label` is expanded to the full URI http://www.w3.org/2000/01/rdf-schema#label

Bind a namespace prefix for the example URI reference:

In [12]:
g.bind('ex','http://example.com/')

Print the serialized graph in the Turtle RDF syntax.

Note that the subject `bar` is now abbreviated, because we bound a prefix to the http://example.com/ namespace.

In [13]:
print g.serialize(destination=None, format='turtle', base=None, encoding=None)

@prefix bio: <http://www.workingontologist.org/Examples/Chapter3/biography.owl#> .
@prefix ex: <http://example.com/> .
@prefix lit: <http://www.workingontologist.org/Examples/Chapter3/shakespeare.owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:bar rdfs:label "bar" .

bio:Cervantes a bio:Person ;
    rdfs:label "Miguel_de_Cervantes" .




### PART 3: parse the RDF/XML data provided earlier in a string (`rdf_xml_data`) and add it to the graph

In [14]:
print "Number of triples in the graph: %i" %len(g)

Number of triples in the graph: 3


In [15]:
g.parse(data=rdf_xml_data, format="application/rdf+xml")
print "Number of triples in the graph after parsing the string: %i" %len(g)

Number of triples in the graph after parsing the string: 13


Check the actual namespaces bound in the graph store:

NOTE: there should be one additional namespace after parsing the string.

In [16]:
for ns in g.namespaces():
     print "Prefix: %s => URI: %s" %ns

Prefix: xml => URI: http://www.w3.org/XML/1998/namespace
Prefix: bio => URI: http://www.workingontologist.org/Examples/Chapter3/biography.owl#
Prefix: rdfs => URI: http://www.w3.org/2000/01/rdf-schema#
Prefix: cd => URI: http://www.recshop.fake/cd#
Prefix: lit => URI: http://www.workingontologist.org/Examples/Chapter3/shakespeare.owl#
Prefix: rdf => URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#
Prefix: ex => URI: http://example.com/
Prefix: xsd => URI: http://www.w3.org/2001/XMLSchema#


Serialise the graph:

Note that we have simply merged the two graphs, although they describe different datasets.

In [17]:
print "\nContents of Graph store in Turtle format\n"
print g.serialize(destination=None, format='turtle')


Contents of Graph store in Turtle format

@prefix bio: <http://www.workingontologist.org/Examples/Chapter3/biography.owl#> .
@prefix cd: <http://www.recshop.fake/cd#> .
@prefix ex: <http://example.com/> .
@prefix lit: <http://www.workingontologist.org/Examples/Chapter3/shakespeare.owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:bar rdfs:label "bar" .

<http://www.recshop.fake/cd/Empire_Burlesque> cd:artist "Bob_Dylan" ;
    cd:company "Columbia" ;
    cd:country "USA" ;
    cd:price "10.90" ;
    cd:year "1985" .

<http://www.recshop.fake/cd/Hide_your_heart> cd:artist "Bonnie_Tyler" ;
    cd:company "CBS_Records" ;
    cd:country "UK" ;
    cd:price "9.90" ;
    cd:year "1988" .

bio:Cervantes a bio:Person ;
    rdfs:label "Miguel_de_Cervantes" .




Manually retrieve the names of artists stored in a graph:
    
`g.objects` is a generator for objects matching the specified subject and predicate. 

(See https://rdflib.readthedocs.org/en/latest/using_graphs.html)

In [18]:
for artist in g.objects(subject=None, predicate=URIRef("http://www.recshop.fake/cd#artist")):
    print artist

Bonnie_Tyler
Bob_Dylan


#### Experiment with
1. Adding new CD/artist data to the record store graph.
2. Searching the graph for subjects (so far we have searched for objects).
3. Searching the graph for predicates (e.g., `Graph.predicates`).

## 5. Visualizing Graphs

Use Graphviz (see the O'Reilly book, Chapter 3) to produce and view graphical renderings of the graphs you have created.

The following code creates a DOT file containing the triples of the graph, for use with Graphviz.

In [19]:
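def triplestodot(triples, filename, nsdict):
    # Write the triples as an undirected Graphviz graph in DOT format.
    # The `with` block closes the file once everything has been written.
    with open(filename, 'w') as out:
        out.write('graph "SimpleGraph" {\n')
        out.write('overlap = "scale";\n')
        for t in triples:
            # One edge per triple: subject -- object, labelled with the predicate.
            write_string = '"%s" -- "%s" [label="%s"] ;\n' % (t[0].encode('utf-8'), t[2].encode('utf-8'), t[1].encode('utf-8'))
            # Abbreviate full namespace URIs to their prefixes.
            for item in nsdict:
                write_string = write_string.replace(item, nsdict[item])
            out.write(write_string)
        out.write('}')

# Map each namespace URI to its "prefix:" abbreviation.
namespacedict = {}
for (p,n) in g.namespaces():
    print n, p+ ':'
    namespacedict[n] = p + ':'

triplestodot(g,'my_graph_lab1.dot', namespacedict)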

http://www.w3.org/XML/1998/namespace xml:
http://www.workingontologist.org/Examples/Chapter3/biography.owl# bio:
http://www.w3.org/2000/01/rdf-schema# rdfs:
http://www.recshop.fake/cd# cd:
http://www.workingontologist.org/Examples/Chapter3/shakespeare.owl# lit:
http://www.w3.org/1999/02/22-rdf-syntax-ns# rdf:
http://example.com/ ex:
http://www.w3.org/2001/XMLSchema# xsd:


This should save a file called `my_graph_lab1.dot`. 

Some online DOT viewers are handy for viewing a `.dot` file as an image, such as:
- http://www.webgraphviz.com/
- http://viz-js.com/
- https://dreampuf.github.io/GraphvizOnline/

## 6. Extra Practice

Work through the examples available in the RDFLib documentation: https://rdflib.readthedocs.org/en/latest/index.html

You can also have a look at the examples from the O'Reilly book (http://proquestcombo.safaribooksonline.com/book/web-development/9780596802141) to see how to manipulate RDFLib graphs. However, keep in mind that they were written using an older version of the library, and some small differences may appear (such as the packages from which classes are imported).