# Semantic Web - Technologies - Presentation



* **Technology**: rdflib Python package
* **Student**: Brebu Andrei-Alexandru, SCPD
* **Deadline**: 2022.03.23

### Short recap

[RDF](https://www.w3.org/RDF/) allows us to make statements about resources. A statement always has the following structure:

```<subject> <predicate> <object>```

An RDF statement expresses a relationship between two resources. The subject and the object represent the two resources being related; the predicate represents the nature of their relationship. The relationship is directional (from subject to object) and is called a property in RDF. Because RDF statements consist of three elements, they are called triples.

# rdflib

RDFLib is a pure Python package for working with RDF. RDFLib contains most of the things you need to work with RDF, including the following:
- parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, RDFa, Microdata and JSON-LD (built into the core package since RDFLib 6.0; earlier releases needed the `rdflib-jsonld` plugin)
- a Graph interface that can be backed by any one of a number of Store implementations
- store implementations for in-memory storage and persistent storage on top of the Berkeley DB
- a SPARQL 1.1 implementation supporting Queries and Update statements

See https://rdflib.readthedocs.org/en/latest/

In [8]:
!pip install rdflib


## More information about rdflib package

In [9]:
!pip show rdflib

Name: rdflib
Version: 6.1.1
Summary: RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
Home-page: https://github.com/RDFLib/rdflib
Author: Daniel 'eikeon' Krech
Author-email: eikeon@eikeon.com
License: bsd-3-clause
Location: /usr/local/lib/python3.9/site-packages
Requires: isodate, pyparsing, setuptools
Required-by: 


## Short intro

The primary interface that RDFLib exposes for working with RDF is a [Graph](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.graph.Graph).

RDFLib graphs are un-sorted containers; they have ordinary set operations (e.g. [add](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.Graph.add)() to add a triple) plus methods that search triples and return them in arbitrary order.

RDFLib graphs also redefine certain built-in Python methods in order to behave in a predictable way: they [emulate container types](https://docs.python.org/3/reference/datamodel.html#emulating-container-types) and are best thought of as a set of 3-item tuples (“triples”, in RDF-speak):

```python
[
    (subject0, predicate0, object0),
    (subject1, predicate1, object1),
    ...
    (subjectN, predicateN, objectN)
]
```

### Example 1

In [10]:
from rdflib import Graph

In [11]:
g = Graph() # init a graph

In [12]:
RDF_GRAPH = 'http://dbpedia.org/resource/Michael_Jackson'

# there are several ways to load data into a graph:
# - local graph
#     - from the file
#     - from a string
# - external graph
#     - URL


In [13]:
g.parse(RDF_GRAPH) # fetch and parse the RDF published at this URL

<Graph identifier=N20e78d32b88849d1b258ac9b5440670d (<class 'rdflib.graph.Graph'>)>

In [14]:
# Loop through each triple in the graph (subj, pred, obj)
for index, (sub, pred, obj) in enumerate(g):
    print(sub, pred, obj)
    if index == 10:
        break

http://dbpedia.org/resource/Michael_Jackson http://dbpedia.org/ontology/wikiPageWikiLink http://dbpedia.org/resource/Rolling_Stone
http://dbpedia.org/resource/Ricky_Lawson http://dbpedia.org/ontology/associatedMusicalArtist http://dbpedia.org/resource/Michael_Jackson
http://dbpedia.org/resource/Martin_Scorsese_filmography http://dbpedia.org/ontology/wikiPageWikiLink http://dbpedia.org/resource/Michael_Jackson
http://dbpedia.org/resource/Triumph_(The_Jacksons_album) http://dbpedia.org/ontology/wikiPageWikiLink http://dbpedia.org/resource/Michael_Jackson
http://dbpedia.org/resource/Shake_Your_Body_(Down_to_the_Ground) http://dbpedia.org/ontology/wikiPageWikiLink http://dbpedia.org/resource/Michael_Jackson
http://dbpedia.org/resource/Meanwhile,_Back_at_the_Ranch_(album) http://dbpedia.org/ontology/wikiPageWikiLink http://dbpedia.org/resource/Michael_Jackson
http://dbpedia.org/resource/Holland–Dozier–Holland http://dbpedia.org/ontology/associatedMusicalArtist http://dbpedia.org/resource/Mi

In [15]:
# Print the size of the Graph
print(f"graph has {len(g)} triples/facts")

graph has 9635 triples/facts


In [16]:

# Print out the entire Graph in the RDF Turtle format
# print(g.serialize(format='ttl'))

### Creating Nodes

The subjects and objects of the triples make up the nodes of the graph. Nodes are URI references, blank nodes or literals. In RDFLib, these node types are represented by the classes **URIRef**, **BNode** and **Literal**. URIRefs and BNodes can both be thought of as resources, such as a person, a company, a website, etc.
- A BNode (blank node) is a node where the exact URI is not known.
- A URIRef is a node where the exact URI is known. URIRefs are also used to represent the properties/predicates in the RDF graph.
- Literals represent attribute values, such as a name, a date, a number, etc. The most common literal values are XML Schema datatypes (e.g. string, int, ...).


### Example 2
In this example, we are going to create a node and identify it by a URI.

In [17]:
from rdflib import Graph
g = Graph()

In [18]:
# Create an RDF URI node to use as the subject for multiple triples

from rdflib import URIRef

In [19]:

cosmin = URIRef("http://example.org/cosmin")
cosmin

rdflib.term.URIRef('http://example.org/cosmin')

In [20]:
# Add triples/facts

from rdflib import RDF, Literal
from rdflib.namespace import FOAF, XSD



In [21]:
g.add((cosmin, RDF.type, FOAF.Person))

<Graph identifier=Nfc0fd35812f84ffa86a391ab9263f6f0 (<class 'rdflib.graph.Graph'>)>

In [22]:
g.add((cosmin, FOAF.nick, Literal("cosmos", lang="en")))

<Graph identifier=Nfc0fd35812f84ffa86a391ab9263f6f0 (<class 'rdflib.graph.Graph'>)>

In [23]:
g.add((cosmin, FOAF.name, Literal("Cosmin Kosmin")))

<Graph identifier=Nfc0fd35812f84ffa86a391ab9263f6f0 (<class 'rdflib.graph.Graph'>)>

In [24]:
g.add((cosmin, FOAF.mbox, URIRef("mailto:cosmin@yahoo.com")))

<Graph identifier=Nfc0fd35812f84ffa86a391ab9263f6f0 (<class 'rdflib.graph.Graph'>)>

In [25]:
# Add another person

maria = URIRef("http://example.org/maria")

# Add triples/facts
g.add((maria, RDF.type, FOAF.Person))
g.add((maria, FOAF.nick, Literal("maria", datatype=XSD.string))) # here we defined a datatype instead of a language for this literal
g.add((maria, FOAF.name, Literal("Maria Santana")))
g.add((maria, FOAF.mbox, URIRef("mailto:maria.santana@stud.acs.upb.ro")))

<Graph identifier=Nfc0fd35812f84ffa86a391ab9263f6f0 (<class 'rdflib.graph.Graph'>)>

In [26]:
for sub, pred, obj in g:
    print(sub, pred, obj)

http://example.org/maria http://xmlns.com/foaf/0.1/nick maria
http://example.org/cosmin http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/Person
http://example.org/cosmin http://xmlns.com/foaf/0.1/mbox mailto:cosmin@yahoo.com
http://example.org/maria http://xmlns.com/foaf/0.1/name Maria Santana
http://example.org/cosmin http://xmlns.com/foaf/0.1/nick cosmos
http://example.org/maria http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/Person
http://example.org/cosmin http://xmlns.com/foaf/0.1/name Cosmin Kosmin
http://example.org/maria http://xmlns.com/foaf/0.1/mbox mailto:maria.santana@stud.acs.upb.ro


In [27]:
# For each foaf:Person in the graph, print out their nickname's value
for person in g.subjects(RDF.type, FOAF.Person):
    for nick in g.objects(person, FOAF.nick):
        print(nick)

cosmos
maria


In [28]:
# Bind the FOAF namespace to a prefix for more readable output
g.bind("foaf", FOAF)

In [29]:
# Print all the data in the n3 format
print(g.serialize(format="n3"))
# g.serialize(destination="tbl.ttl")

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/cosmin> a foaf:Person ;
    foaf:mbox <mailto:cosmin@yahoo.com> ;
    foaf:name "Cosmin Kosmin" ;
    foaf:nick "cosmos"@en .

<http://example.org/maria> a foaf:Person ;
    foaf:mbox <mailto:maria.santana@stud.acs.upb.ro> ;
    foaf:name "Maria Santana" ;
    foaf:nick "maria"^^xsd:string .




To visualise our RDF, go to https://www.ldf.fi/service/rdf-grapher.

## More about RDFLib
RDF data stores can be classified as follows:

- **Native RDF Stores**: RDF stores that implement their own database engine without reusing the storage and retrieval functionalities of other database management systems (e.g. `Apache Jena TDB`)
- **DBMS-backed Stores**: RDF Stores that use the storage and retrieval functionality provided by another database management system (e.g. `rdflib`, `Apache Jena SDB`)
- **Non-RDF DB support**
- **Hybrid Stores**: RDF stores that support both architectural styles (native and DBMS-backed) (e.g. `rdf4j`)

### Latest release
RDFlib 6.1.1 on 20 Dec 2021

### RDFlib Family of packages
The RDFlib community maintains many RDF-related Python code repositories with different purposes. For example:

 - rdflib - the RDFLib core
 - sparqlwrapper - a simple Python wrapper around a SPARQL service to remotely execute your queries
 - pyLODE - An OWL ontology documentation tool using Python and templating, based on LODE.
 
Please see the list for all packages/repositories here:

https://github.com/RDFLib

### Persistence

RDFLib provides an [abstracted Store API](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.store.Store) for persistence of RDF and Notation 3.

The [Graph](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.graph.Graph) class works with instances of this API (as the first argument to its constructor) for triple-based management of an RDF store including: garbage collection, transaction management, update, pattern matching, removal, length, and database management ([open](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.graph.Graph.open)() / [close](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.graph.Graph.close)() / [destroy](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.html#rdflib.graph.Graph.destroy)()).


###  Stores currently shipped with core RDFLib

 - [Memory](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.plugins.stores.html#rdflib.plugins.stores.memory.Memory) - not persistent!
 - [BerkeleyDB](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.plugins.stores.html#rdflib.plugins.stores.berkeleydb.BerkeleyDB) - on disk persistence via Python’s berkeleydb package
 - [SPARQLStore](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.plugins.stores.html#rdflib.plugins.stores.sparqlstore.SPARQLStore) - a read-only wrapper around a remote SPARQL Query endpoint
 - [SPARQLUpdateStore](https://rdflib.readthedocs.io/en/stable/apidocs/rdflib.plugins.stores.html#rdflib.plugins.stores.sparqlstore.SPARQLUpdateStore) - a read-write wrapper around a remote SPARQL query/update endpoint pair


See https://rdflib.readthedocs.io/en/stable/persistence.html


### Example 3 - BerkeleyDB for persistence

Please make sure you have installed `Oracle Berkeley DB` before running the next cells.

On macOS, you can install it with Homebrew:
```bash
brew install berkeley-db
```

Once the installation has completed, install the Python package `berkeleydb`:

In [30]:
!(YES_I_HAVE_THE_RIGHT_TO_USE_THIS_BERKELEY_DB_VERSION=1 pip install berkeleydb)
# --berkeley-db=/usr/local/Cellar/berkeley-db


In [64]:
from rdflib import Graph

graph = Graph(store="BerkeleyDB")

The first time, we create the store.

In [65]:
graph.open("../data/berkeleydb_rdflib_store", create=True)


1

Now, we will store the data that we created earlier in our new persistent store.

In [66]:
data = g.serialize(format="ttl")
print(data)

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/cosmin> a foaf:Person ;
    foaf:mbox <mailto:cosmin@yahoo.com> ;
    foaf:name "Cosmin Kosmin" ;
    foaf:nick "cosmos"@en .

<http://example.org/maria> a foaf:Person ;
    foaf:mbox <mailto:maria.santana@stud.acs.upb.ro> ;
    foaf:name "Maria Santana" ;
    foaf:nick "maria"^^xsd:string .




In [68]:
graph.parse(data=data, format="ttl")

<Graph identifier=N38b51482611c4a658c0d9ccad0e32159 (<class 'rdflib.graph.Graph'>)>

Now the graph is persisted on disk.

In [69]:
# close the store when done
graph.close()