# Knowledge Graphs and Semantic Technologies -- RDF tutorial

In this tutorial we'll learn the basics of working with RDF graphs in Python. We'll use rdflib, a widely used Python library for RDF (the documentation can be found [here](https://rdflib.readthedocs.io/en/stable/index.html)).

## Imports
These are the main classes and types we'll be using from rdflib:

In [1]:
import sys

from rdflib import Graph, ConjunctiveGraph, Literal, BNode, Namespace, RDF, URIRef, RDFS
from rdflib.namespace import DC, FOAF

import pprint


## Loading data remotely and from files

rdflib can import RDF data from a variety of sources: locally from a file (with support for many serializations) or remotely via a URI (a practical way to check whether a URI returns RDF, in line with the 3rd Linked Data principle).

A Graph object is always required to load triples.
**Note**: to load quads, and hence support named graphs, you'll need to use an instance of ConjunctiveGraph instead.

**Exercise 1** 

For each step, use a different cell: 
1. create two graphs using rdflib:
    - load one with triples from the site https://csarven.ca/ and/or http://www.w3.org/People/Berners-Lee/card
    - load the other with triples from ./data/ingredients.rdf.

In [2]:
#TIP: look at the documentation of the rdflib library for how to LOAD and PARSE a graph - https://rdflib.readthedocs.io/en/stable/gettingstarted.html
graph = Graph()
ingredients_graph = Graph()
result = graph.parse("http://www.w3.org/People/Berners-Lee/card")
ingredients = ingredients_graph.parse("data/ingredients.rdf")

print("Graph has %s statements." % len(ingredients_graph))

Graph has 837 statements.


## Serialising and saving RDF graphs

There are different formats for storing RDF triples. Semantically they mean the same; they differ only in their syntax.


Use the function Graph.serialize(format). 

**Exercise 2**

1. serialise one of the graphs to the .ttl, .xml and .nt format, and print the first n lines to compare the syntax
1. save your graph in the turtle format to the ./data/ folder

In [3]:
#serialize the chosen graph
n=5
ttl_data = ingredients_graph.serialize(format="ttl")
print("Turtle format:")
for line in ttl_data.splitlines()[:n]:
    print(line)
print(" ")

xml_data = ingredients_graph.serialize(format="xml")
print("Xml format:")
for line in xml_data.splitlines()[:n]:
    print(line)
print(" ")

nt_data = ingredients_graph.serialize(format="nt")
print("Nt format:")
for line in nt_data.splitlines()[:n]:
    print(line)

#save the graph in ttl format
ingredients_graph.serialize(destination="data/ingredients.ttl", format="ttl")


Turtle format:
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ind: <http://purl.org/heals/ingredient/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
 
Xml format:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
   xmlns:dcterms="http://purl.org/dc/terms/"
   xmlns:owl="http://www.w3.org/2002/07/owl#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 
Nt format:
<http://purl.org/heals/ingredient/AppleCiderVinegar> <http://purl.org/heals/food/hasGluten> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://purl.org/heals/ingredient/Coconut> <http://www.w3.org/2004/02/skos/core#definition> "the large, hard-shelled seed of the coconut palm, lined with a white edible meat, and containing a milky liquid" .
<http://purl.org/heals/ingredient/Tart> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#NamedIndividual> .
<http://purl.or

<Graph identifier=N2bddf24bf4a34ab3a34c195d15387000 (<class 'rdflib.graph.Graph'>)>

## Merging graphs

Merging graphs can be done via sequential parsing or with the overloaded operator +.

**Note:** Set-theoretic graph semantics apply

The Food knowledge graph FoodKG contains a graph of statements about ingredients, as well as a graph with statements about recipes. 

**Exercise 3**: 

1. load ./data/ingredients.rdf and ./data/ghostbusters.ttl into a single graph, either by sequential parsing or using the operator +.

2. count the number of statements in each graph, and the intersection of the two graphs. 

3. check whether the combined graph is connected (using graph.connected()) 

4. load ./data/ingredients.rdf and ./data/recipes.rdf into a single graph, either by sequential parsing or using the operator +. 

5. count the number of statements in each graph, and the intersection of the two graphs. 

6. check whether the combined graph is connected (using graph.connected()). Explain the result with respect to point 3! 

In [4]:
# + operator
ingredients = Graph()
ingredients.parse("data/ingredients.rdf")
print("ingredients has {} triples".format(len(ingredients)))

recipes = Graph()
recipes.parse("data/ghostbusters.ttl", format="ttl")
print("Ghost graph has {} triples".format(len(recipes)))

graph = ingredients + recipes
print("The union has {} triples".format(len(graph)))

intersection = ingredients & recipes
print("The intersection has {} triples".format(len(intersection)))

print(f"is graph connected : {graph.connected()}")

ingredients has 837 triples
Ghost graph has 52337 triples
The union has 53174 triples
The intersection has 0 triples
is graph connected : False


In [5]:
# + operator
ingredients = Graph()
ingredients.parse("data/ingredients.rdf")
print("ingredients has {} triples".format(len(ingredients)))

recipes = Graph()
recipes.parse("data/recipes.rdf")
print("recipes has {} triples".format(len(recipes)))

graph = ingredients + recipes
print("The union has {} triples".format(len(graph)))

intersection = ingredients & recipes
print("The intersection has {} triples".format(len(intersection)))

print(f"is graph connected : {graph.connected()}")


ingredients has 837 triples
recipes has 480 triples
The union has 1299 triples
The intersection has 18 triples
is graph connected : False


Neither of the merged graphs is fully connected, because ...

## Namespaces 

Remind yourself what namespaces are. 

In RDFLib, the namespace module defines many common namespaces such as RDF, RDFS, OWL, FOAF, SKOS, etc., but you can also easily add URIs within a different namespace:


In [6]:
TEACH = Namespace("http://linkedscience.org/teach/ns#")
TEACH.Teacher

rdflib.term.URIRef('http://linkedscience.org/teach/ns#Teacher')

Check out the specification at http://linkedscience.org/teach/ns/#sec-specification to see which other terms are used within the TEACH namespace.
You can use a NamespaceManager to bind a prefix to a namespace:

In [7]:
g = Graph()
g.namespace_manager.bind('TEACH', URIRef('http://linkedscience.org/teach/ns#'))
TEACH.Teacher.n3(g.namespace_manager)

'TEACH:Teacher'

In [8]:
KRW = Namespace("http://krw.vu.nl/data#")

#creating individuals within your namespace
KRW.Teacher
KRW.Student

rdflib.term.URIRef('http://krw.vu.nl/data#Student')

**Exercise 4:**
1. create your own namespace (can be made up) 

In [9]:
MY = Namespace("https://github.com/")
MY.FieBergli
MY.repos

rdflib.term.URIRef('https://github.com/repos')


## Creating RDF triples

Triples are added to the graph with the function Graph.add().

The parameter is a triple given as a Python **tuple** (subject, predicate, object).

Notice the namespace convenience syntax!

**Exercise 5:** 

1. create a new graph and add triples (~10) within your made-up namespace using Graph.add(). These triples can be about anything, for instance ingredients or recipes. Make sure they include the predicates RDF.type, RDFS.label and RDFS.subClassOf

2. open yourRDF.ttl, and write your triples out by hand in a syntax of your choice (turtle is recommended, notice the file extension!). Load the triples here with rdflib. 

In [31]:
#create graph
food_graph = Graph()

#example namespace
EX = Namespace("http://cookbook.org/")

# Add triples using the graph's add method.
food_graph.add((EX.ice_cream, RDF.type, EX.Dessert))
food_graph.add((EX.ice_cream, RDFS.label, Literal("Ice Cream")))
food_graph.add((EX.olives, RDF.type, EX.Starter))
food_graph.add((EX.Starter, RDFS.subClassOf, EX.Meal))
food_graph.add((EX.boiled_egg, RDF.type, EX.Breakfast))
food_graph.add((EX.Pizza, RDFS.subClassOf, EX.Dinner))
food_graph.add((EX.margaritha, RDF.type, EX.Pizza))
food_graph.add((EX.margaritha, RDFS.label, Literal("Margaritha")))
food_graph.add((EX.Cake, RDFS.subClassOf, EX.Dessert))
food_graph.add((EX.brownie, RDF.type, EX.Cake))
food_graph.add((EX.coffee, RDF.type, EX.Drink))

<Graph identifier=N2f0b1361584e486fbe481a820fc0dace (<class 'rdflib.graph.Graph'>)>

In [32]:
#save the graph to its destination in ttl format - cookbook_RDF.ttl (look at RDFLib documentation - Loading and saving RDF)
save = food_graph.serialize(destination="cookbook_RDF.ttl", format="ttl")

In [33]:
#load the saved graph and print it in ttl format
cookbook = Graph()
cookbook.parse("cookbook_RDF.ttl")
print(cookbook.serialize(format='ttl'))

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://cookbook.org/boiled_egg> a <http://cookbook.org/Breakfast> .

<http://cookbook.org/brownie> a <http://cookbook.org/Cake> .

<http://cookbook.org/coffee> a <http://cookbook.org/Drink> .

<http://cookbook.org/ice_cream> a <http://cookbook.org/Dessert> ;
    rdfs:label "Ice Cream" .

<http://cookbook.org/margaritha> a <http://cookbook.org/Pizza> ;
    rdfs:label "Margaritha" .

<http://cookbook.org/olives> a <http://cookbook.org/Starter> .

<http://cookbook.org/Cake> rdfs:subClassOf <http://cookbook.org/Dessert> .

<http://cookbook.org/Pizza> rdfs:subClassOf <http://cookbook.org/Dinner> .

<http://cookbook.org/Starter> rdfs:subClassOf <http://cookbook.org/Meal> .




## Navigating graphs

rdflib uses iterators to navigate graphs. The methods for navigating subjects, predicates and objects are Graph.subjects, Graph.predicates and Graph.objects.

**Exercise 6:**

1. print all the triples in yourRDF.ttl
2. print all subjects in yourRDF.ttl
3. print all predicates in yourRDF.ttl
4. print all objects in yourRDF.ttl


In [34]:
#TIP you have to loop over the graph
#6.1 
for s, p, o in cookbook.triples((None, None, None)):
    print(f'triple: {s, p, o}')

triple: (rdflib.term.URIRef('http://cookbook.org/boiled_egg'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://cookbook.org/Breakfast'))
triple: (rdflib.term.URIRef('http://cookbook.org/margaritha'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'), rdflib.term.Literal('Margaritha'))
triple: (rdflib.term.URIRef('http://cookbook.org/Cake'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#subClassOf'), rdflib.term.URIRef('http://cookbook.org/Dessert'))
triple: (rdflib.term.URIRef('http://cookbook.org/Starter'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#subClassOf'), rdflib.term.URIRef('http://cookbook.org/Meal'))
triple: (rdflib.term.URIRef('http://cookbook.org/ice_cream'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://cookbook.org/Dessert'))
triple: (rdflib.term.URIRef('http://cookbook.org/coffee'), rdflib.term.URIRef('http://www.w3.org/1999/02/22

In [35]:
#6.2 
for s in cookbook.subjects():
    print(f'subject: {s}')

subject: http://cookbook.org/boiled_egg
subject: http://cookbook.org/margaritha
subject: http://cookbook.org/Cake
subject: http://cookbook.org/Starter
subject: http://cookbook.org/ice_cream
subject: http://cookbook.org/coffee
subject: http://cookbook.org/olives
subject: http://cookbook.org/brownie
subject: http://cookbook.org/ice_cream
subject: http://cookbook.org/margaritha
subject: http://cookbook.org/Pizza


In [36]:
#6.3
for p in cookbook.predicates():
    print(f'predicate: {p}')

predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
predicate: http://www.w3.org/2000/01/rdf-schema#label
predicate: http://www.w3.org/2000/01/rdf-schema#subClassOf
predicate: http://www.w3.org/2000/01/rdf-schema#subClassOf
predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
predicate: http://www.w3.org/2000/01/rdf-schema#label
predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
predicate: http://www.w3.org/2000/01/rdf-schema#subClassOf


In [37]:
#6.4
for o in cookbook.objects():
    print(f'object: {o}')

object: http://cookbook.org/Breakfast
object: Margaritha
object: http://cookbook.org/Dessert
object: http://cookbook.org/Meal
object: http://cookbook.org/Dessert
object: http://cookbook.org/Drink
object: http://cookbook.org/Starter
object: http://cookbook.org/Cake
object: Ice Cream
object: http://cookbook.org/Pizza
object: http://cookbook.org/Dinner


We can also filter the subjects, predicates and objects we want to retrieve, and match their values like in a database "join" operation.


**Exercise 7:**

1. print all subject types in yourRDF.ttl
2. print all subject labels in yourRDF.ttl

In [None]:
#7.1
for s,p,o in cookbook.triples( (None, RDF.type, None) ):
    print(f'subject type: {o}')

subject type: http://cookbook.org/Breakfast
subject type: http://cookbook.org/Cake
subject type: http://cookbook.org/Drink
subject type: http://cookbook.org/Dessert
subject type: http://cookbook.org/Pizza
subject type: http://cookbook.org/Starter


In [None]:
#7.2
for s, p, o in cookbook.triples((None, RDFS.label, None)):
    print(f'subject label: {o}')

subject label: Ice Cream
subject label: Margaritha


### Basic triple matching (almost querying!)

We use the method Graph.triples and a Python tuple that acts as a mask for specifying our criteria: None in a position matches any value.

**Exercise 8:**

1. check whether a triple is in your graph -> print true or false
2. print all triples related to a certain subject in your graph
3. print all triples related to a certain object in your graph

In [None]:
#8.1 
# the "in" operator already returns True or False
print((EX.ice_cream, RDF.type, EX.Dessert) in cookbook)
print((EX.pizza, RDFS.subClassOf, EX.Breakfast) in cookbook)

True
False


In [48]:
#8.2
for s, p, o in cookbook.triples((EX.margaritha, None, None)):
    print(s, p, o )

http://cookbook.org/margaritha http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://cookbook.org/Pizza
http://cookbook.org/margaritha http://www.w3.org/2000/01/rdf-schema#label Margaritha


In [49]:
#8.3
for s, p, o in cookbook.triples((None, None, EX.Dessert)):
    print(s, p, o)

http://cookbook.org/ice_cream http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://cookbook.org/Dessert
http://cookbook.org/Cake http://www.w3.org/2000/01/rdf-schema#subClassOf http://cookbook.org/Dessert


## Restaurant Exercise - Part 1

You are a chef in a restaurant, and you need to serve someone that is gluten intolerant. 

1. load the ./data/recipes.rdf and ./data/ingredients.rdf datasets in one graph
2. query your graph (as we did in previous exercises) to retrieve all recipes without gluten
3. query your graph for all recipes that you can make for your gluten intolerant guest. 
4. the guest asks you whether there are more options. Can you find the recipes for which an ingredient with gluten can be replaced, solely using pattern matching? (Hint: you need to write several of these pattern matching queries and check the predicate __substitutesFor__)
5. another guest is allergic to pecan nuts. Which recipes could you serve them (including those for which pecan nuts can be replaced)?

**Note that this is a bit tedious: later on, we will be querying more complicated patterns with SPARQL!**

## HI ontology exploration

In your project, you will be working with a Hybrid Intelligence (HI) ontology. This is an opportunity for you to get acquainted with its structure. Applying the skills from the exercises above, perform the following actions:

1. Load the HI ontology from the data folder (hi_ontology.ttl) with rdflib.
2. Create an "HI" Namespace.
3. Count the number of triples.
4. List all subjects.
5. List all predicates.
6. List all pairs of subjects and their corresponding objects linked by an rdf:type predicate.