# Chapter 4: Just Enough RDF

## Basic capabilities of RDFLib

This first part of the chapter explores the main capabilities of RDFLib for working with RDF and semantic information. To that end, we define two basic functions for loading and printing semantic information:

In [18]:
import rdflib
from rdflib import ConjunctiveGraph, URIRef, Graph, Literal
from rdflib.namespace import FOAF, RDF
from rdflib_sqlalchemy.store import SQLAlchemy
from rdflib import Namespace

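def getGraph(url, strLang):
    # Parse the RDF document at `url`; when strLang is None, RDFLib guesses the format.
    graph = ConjunctiveGraph()
    if strLang is None:
        graph.parse(url)
    else:
        graph.parse(url, format=strLang)
    return graph

def printTriples(graph):
    # Print every triple (or query result row) of the given iterable, one per line.
    for triple in graph:
        print(triple)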

Using these functions, we first load Colin's information from the web:

In [3]:
g = getGraph("https://raw.githubusercontent.com/agnantis/semantic.o/master/semantico/data/colin.nt", "nt")
printTriples(g)

(rdflib.term.URIRef('http://semprog.com/people/colin'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.URIRef('http://kiwitobes.com/toby.rdf#ts'))
(rdflib.term.URIRef('http://kiwitobes.com/toby.rdf#ts'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/mbox'), rdflib.term.URIRef('mailto:toby@segaran.com'))
(rdflib.term.BNode('N2089fc7c85794078b43a39f0bc7c8d7e'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/name'), rdflib.term.Literal('Jamie Taylor'))
(rdflib.term.URIRef('http://semprog.com/people/colin'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'), rdflib.term.BNode('N2089fc7c85794078b43a39f0bc7c8d7e'))
(rdflib.term.URIRef('http://kiwitobes.com/toby.rdf#ts'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))
(rdflib.term.BNode('N2089fc7c85794078b43a39f0bc7c8d7e'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Pe

As the reader can see, all the information is expressed as triples, that is, in the form <subject, predicate, object>. Using simple pattern-based filtering (we will see SPARQL for querying later), we can retrieve every `foaf:knows` statement in the graph, which here lists the people Colin knows:

In [10]:
list(g.triples((None, FOAF.knows, None)))

[(rdflib.term.URIRef('http://semprog.com/people/colin'),
  rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'),
  rdflib.term.URIRef('http://kiwitobes.com/toby.rdf#ts')),
 (rdflib.term.URIRef('http://semprog.com/people/colin'),
  rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'),
  rdflib.term.BNode('N2089fc7c85794078b43a39f0bc7c8d7e'))]

Another interesting aspect is serializing a graph into different formats. Here we write the graph to a file as RDF/XML:

In [39]:
# Let RDFLib write the file itself; this works whether serialize() returns bytes or str.
g.serialize(destination="colin.xml", format="pretty-xml")

Next, we reload the serialized file and print it as N3 to check that it contains the same information as the in-memory graph:

In [40]:
newg = getGraph("colin.xml",None)
newg.serialize(format="n3")

b'@prefix ns1: <http://xmlns.com/foaf/0.1/> .\n@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n@prefix xml: <http://www.w3.org/XML/1998/namespace> .\n@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n<http://semprog.com/people/colin> a ns1:Person ;\n    ns1:knows [ a ns1:Person ;\n            ns1:mbox <mailto:jamie@semprog.com> ;\n            ns1:name "Jamie Taylor" ],\n        <http://kiwitobes.com/toby.rdf#ts> ;\n    ns1:mbox <mailto:colin@metaweb.com> ;\n    ns1:name "Colin Evans" .\n\n<http://kiwitobes.com/toby.rdf#ts> a ns1:Person ;\n    ns1:mbox <mailto:toby@segaran.com> ;\n    ns1:name "Toby Segaran" .\n\n'

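Now we check whether both graphs are the same by subtracting one from the other. Subtraction compares triples term by term, and blank nodes receive fresh identifiers every time a file is parsed, so the four triples that involve the blank node for Jamie Taylor do not cancel out: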

In [42]:
newg -= g
len (newg)

4

In [43]:
g.isomorphic(newg)

False

## Add Triples into the Graph
This part shows how to add triples to a graph using its `add` method. Here we add information about ourselves to Colin's graph:


In [46]:
me = URIRef("http://my.uri.com/robert")
g.add ((me, RDF.type, FOAF.Person))
g.add ((URIRef("http://semprog.com/people/colin"), FOAF.knows, me))
list(g.triples((None, FOAF.knows, me)))

[(rdflib.term.URIRef('http://semprog.com/people/colin'),
  rdflib.term.URIRef('http://xmlns.com/foaf/0.1/knows'),
  rdflib.term.URIRef('http://my.uri.com/robert'))]

## Persisting Semantic Data

To persist semantic data we need to install RDFLib-SQLAlchemy, which lets us store triples in MySQL or PostgreSQL, among other options. Once it is installed, we open a database and store the corresponding triples:

In [12]:
store = SQLAlchemy(configuration="postgresql://aitor:1234@localhost:32769")
gPersistent = Graph(store, identifier=URIRef("rdflib_test"))
gPersistent.open("postgresql://aitor:1234@localhost:32769", create=True)
semprog = rdflib.Namespace("http://semprog.com/people/")
foaf = rdflib.Namespace("http://xmlns.com/foaf/0.1/")
gPersistent.add((semprog["jamie"], foaf["name"], Literal("Jamie Taylor")))
gPersistent.add((semprog["jamie"], foaf["mbox"], Literal("jamie@semprog.com")))
gPersistent.serialize(format="nt")  # just to check our work
gPersistent.commit()

Once the information is stored, we can read it back as follows:

In [48]:
store.open("postgresql://aitor:1234@localhost:32769", create=False)
readg= rdflib.ConjunctiveGraph(store)
readg.serialize(format="nt")

b'<http://semprog.com/people/jamie> <http://xmlns.com/foaf/0.1/name> "Jamie Taylor" .\n<http://semprog.com/people/jamie> <http://xmlns.com/foaf/0.1/mbox> "jamie@semprog.com" .\n\n'

## Using SPARQL in RDFLib

To understand how to use SPARQL queries in RDFLib, we will run a series of SELECT, CONSTRUCT, and ASK queries illustrated in the book.

### 1. Getting all films and their release year

To get all films with their release year we need to match the following triple pattern:

<film, fb:film.film.initial_release_date, year>

Translated to SPARQL, this becomes:
```
SELECT ?film ?year
WHERE{
  ?film fb:film.film.initial_release_date ?year.
}
``` 

In [52]:
FBNAMESPACE = Namespace("http://rdf.freebase.com/ns/")
g = ConjunctiveGraph()
g.parse("moviedata.n3", format="n3")
results = g.query("""SELECT ?film ?year
    WHERE { ?film fb:film.film.initial_release_date ?year. }""", \
                      initNs={'fb': FBNAMESPACE})


As a result, we get all films and their corresponding release date. 

In [51]:
printTriples(results)

(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.hollywood_homicide'), rdflib.term.Literal('2003'))
(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.becoming_dick'), rdflib.term.Literal('2000'))
(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.the_weight_of_water_2002'), rdflib.term.Literal('2002'))
(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.k_19_the_widowmaker'), rdflib.term.Literal('2002'))
(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.body_of_lies'), rdflib.term.Literal('2008'))


### 2. Creating a graph of when the actors were employed

The goal of this query is to try out CONSTRUCT queries, which build new triples from the matched data. To do so, we will use the following query:

```
CONSTRUCT{
  ?who <http://employment.history/was_employed_in> ?year
}
WHERE{
  ?film fb:film.film.starring ?who .
  ?film fb:film.film.initial_release_date ?year .
}
```

In [53]:
results = g.query("""CONSTRUCT {
?who <http://employment.history/was_employed_in> ?year }
WHERE {
?film fb:film.film.starring ?who .
?film fb:film.film.initial_release_date ?year .
}""", initNs={'fb':FBNAMESPACE}).serialize(format="xml")

results

b'<?xml version="1.0" encoding="UTF-8"?>\n<rdf:RDF\n   xmlns:ns1="http://employment.history/"\n   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n>\n  <rdf:Description rdf:about="http://rdf.freebase.com/ns/en.mark_strong">\n    <ns1:was_employed_in>2008</ns1:was_employed_in>\n  </rdf:Description>\n  <rdf:Description rdf:about="http://rdf.freebase.com/ns/en.joss_ackland">\n    <ns1:was_employed_in>2002</ns1:was_employed_in>\n  </rdf:Description>\n  <rdf:Description rdf:about="http://rdf.freebase.com/ns/en.elizabeth_hurley">\n    <ns1:was_employed_in>2002</ns1:was_employed_in>\n  </rdf:Description>\n  <rdf:Description rdf:about="http://rdf.freebase.com/ns/en.sean_penn">\n    <ns1:was_employed_in>2002</ns1:was_employed_in>\n  </rdf:Description>\n  <rdf:Description rdf:about="http://rdf.freebase.com/ns/en.robert_wagner">\n    <ns1:was_employed_in>2003</ns1:was_employed_in>\n    <ns1:was_employed_in>2000</ns1:was_employed_in>\n  </rdf:Description>\n  <rdf:Description rdf:about="htt

### 3. Getting all films released after 2005

In this query we make use of the FILTER clause, which keeps only the results that satisfy a condition. The following query will be run:

```
SELECT ?film ?year
WHERE{
?film fb:film.film.initial_release_date ?year .
FILTER (?year > "2005")
}
``` 

In [57]:
results = g.query("""
Select ?film ?year where{ ?film fb:film.film.initial_release_date ?year . FILTER (?year > "2005") }""", \
                      initNs={'fb': FBNAMESPACE})

printTriples(results)

(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.body_of_lies'), rdflib.term.Literal('2008'))


### 4. Paginating the query results

This example focuses on ordering and paginating query results when a query returns many rows. As an example, suppose we want to paginate the following query:
```
SELECT ?name ?year 
WHERE{
  ?movie fb:film.film.initial_release_date ?year . 
  ?movie fb:film.film.starring ?actor .
  ?actor fb:type.object.name ?name . 
}
```

First we need to order the query results:

In [59]:
results = g.query("""
Select ?name ?year where{ ?movie fb:film.film.initial_release_date ?year . 
?movie fb:film.film.starring ?actor . 
?actor fb:type.object.name ?name .} ORDER BY ?year ?name""", \
                      initNs={'fb': FBNAMESPACE})

printTriples(results)

(rdflib.term.Literal('Bob Saget'), rdflib.term.Literal('2000'))
(rdflib.term.Literal('Robert Wagner'), rdflib.term.Literal('2000'))
(rdflib.term.Literal('Elizabeth Hurley'), rdflib.term.Literal('2002'))
(rdflib.term.Literal('Harrison Ford'), rdflib.term.Literal('2002'))
(rdflib.term.Literal('Sean Penn'), rdflib.term.Literal('2002'))
(rdflib.term.Literal('Harrison Ford'), rdflib.term.Literal('2003'))
(rdflib.term.Literal('Kurupt'), rdflib.term.Literal('2003'))
(rdflib.term.Literal('Robert Wagner'), rdflib.term.Literal('2003'))
(rdflib.term.Literal('Mark Strong'), rdflib.term.Literal('2008'))
(rdflib.term.Literal('Russell Crowe'), rdflib.term.Literal('2008'))


Now, if we want to paginate the results, we need to use the LIMIT and OFFSET query modifiers:

In [65]:
limit = 2
page = 0
results = True

while results:
    print ("---- page: " + str(page+1) + "----")
    results = g.query("""PREFIX fb:<http://rdf.freebase.com/ns/> 
    SELECT ?film ?year
    WHERE { ?film fb:film.film.initial_release_date ?year. 
    } ORDER BY ?year LIMIT """ + str(limit) + " OFFSET " + str((page*limit)))
    
    for triple in results: 
        print (triple)
    page += 1

---- page: 1----
(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.becoming_dick'), rdflib.term.Literal('2000'))
(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.the_weight_of_water_2002'), rdflib.term.Literal('2002'))
---- page: 2----
(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.k_19_the_widowmaker'), rdflib.term.Literal('2002'))
(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.hollywood_homicide'), rdflib.term.Literal('2003'))
---- page: 3----
(rdflib.term.URIRef('http://rdf.freebase.com/ns/en.body_of_lies'), rdflib.term.Literal('2008'))
---- page: 4----


As we can see, the results are split across several pages. This kind of query is really useful when a huge amount of information is being queried.