### <font color='red'>NOTE: Please do not edit this file. </font> Go to <font color='blue'>*File > Save a copy in Drive*</font>.

# **openHPI Course: Knowledge Graphs 2023**
## **Week 2: Basic Knowledge Graph Infrastructure**
### **Notebook 2.1: RDFLib**

---


This is the Python notebook for week 2 (Basic Knowledge Graph Infrastructure) of the openHPI course **Knowledge Graphs 2023**.

In this Colab notebook you will learn how to use the RDFLib library in Python for RDF serialization and graph visualization.

## RDFLib

**[RDFLib](https://github.com/RDFLib/rdflib)** is a Python package for working with RDF. It contains:
* Parsers & Serializers
  * for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, JSON-LD, HexTuples, RDFa and Microdata
* Store implementations
  * memory stores
  * persistent, on-disk stores, using databases such as BerkeleyDB
  * remote SPARQL endpoints
* Graph interface
  * to a single graph
  * or to multiple Named Graphs within a dataset
* SPARQL 1.1 implementation
  * both Queries and Updates are supported



We have to install the following packages:


*   **RDFLib** for working with RDF
*   **PyDotPlus**, **Graphviz** and **kglab** for visualization



In [1]:
%%capture
!pip3 install rdflib pydotplus graphviz kglab

In [6]:
pip install pydotplus


Defaulting to user installation because normal site-packages is not writeable
Collecting pydotplus
  Using cached pydotplus-2.0.2.tar.gz (278 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pydotplus
  Building wheel for pydotplus (setup.py) ... done
  Created wheel for pydotplus: filename=pydotplus-2.0.2-py3-none-any.whl size=24575 sha256=788ee8022ac6550d3f2368f92fbb2babbbd881426ee0a35abd26ccc0557e8b17
  Stored in directory: /Users/gokcesoylu/Library/Caches/pip/wheels/89/e5/de/6966007cf223872eedfbebbe0e074534e72e9128c8fd4b55eb
Successfully built pydotplus
Installing collected packages: pydotplus
Successfully installed pydotplus-2.0.2

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: /Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip

In [2]:
import rdflib
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, FOAF, RDFS, XSD  # namespaces that ship with RDFLib
from rdflib import URIRef, BNode, Literal          # in case we need URIs, blank nodes, or literals

# the rest is for visualization
import io
import pydotplus
from IPython.display import display, Image
from rdflib.tools.rdf2dot import rdf2dot

First, let's **create an RDF graph** about movies. The example is given in RDF Turtle serialization.

In [3]:
g = Graph()
# create graph using turtle
turtledata = """\
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

dbr:John_Travolta          rdf:type               dbo:Actor ;
                           dbo:awards             dbr:67th_Academy_Awards ;
                           ex:portrays            dbr:Vincent_Vega .
dbr:Pulp_Fiction           rdf:type               dbo:Film ;
                           rdfs:label             "Pulp_Fiction"@en ,
                                                  "Кримінальне чтиво"@uk ;
                           dbo:genre              dbr:Neo_noir ;
                           ex:playsIn             dbr:Los_Angeles ;
                           ex:fictionalCharacter  dbr:Vincent_Vega ;
                           dbo:starring           dbr:John_Travolta ,
                                                  dbr:Uma_Thurman ,
                                                  dbr:Bruce_Willis .
dbr:Vincent_Vega           rdf:type               dbo:Fictional_character .
dbr:Quentin_Tarantino      rdf:type               dbo:Director .
dbr:Uma_Thurman            rdf:type               dbo:Actor ;
                           ex:portrays            dbr:Mia_Wallace ;
                           dbo:awards             dbr:67th_Academy_Awards .
dbr:Bruce_Willis           rdf:type               dbo:Actor .
dbr:The_Green_Mile         rdf:type               dbo:Film ;
                           rdfs:label             "The Green Mile"@en ,
                                                  "Зелена миля"@uk ;
                           dbo:starring           dbr:Tom_Hanks ,
                                                  dbr:David_Morse .
dbr:Tom_Hanks              rdf:type               dbo:Actor .
dbr:David_Morse            rdf:type               dbo:Actor .
dbr:Tenet                  rdf:type               dbo:Film ;
                           rdfs:label             "Tenet"@en ;
                           dbo:starring           dbr:Robert_Pattinson ,
                                                  dbr:Elizabeth_Debicki ,
                                                  dbr:John_David_Washington .
dbr:Robert_Pattinson       rdf:type               dbo:Actor .
dbr:Elizabeth_Debicki      rdf:type               dbo:Actor .
dbr:John_David_Washington  rdf:type               dbo:Actor ."""

g.parse(data=turtledata, format="turtle")

<Graph identifier=Ne9c51d57221548fc84265f308d83f02e (<class 'rdflib.graph.Graph'>)>

Let's print out all the triples in our graph.

In [4]:
# print all triples
for s, p, o in g:
    print((s, p, o))

(rdflib.term.URIRef('http://dbpedia.org/resource/The_Green_Mile'), rdflib.term.URIRef('http://dbpedia.org/ontology/starring'), rdflib.term.URIRef('http://dbpedia.org/resource/David_Morse'))
(rdflib.term.URIRef('http://dbpedia.org/resource/Pulp_Fiction'), rdflib.term.URIRef('http://example.org/fictionalCharacter'), rdflib.term.URIRef('http://dbpedia.org/resource/Vincent_Vega'))
(rdflib.term.URIRef('http://dbpedia.org/resource/Pulp_Fiction'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'), rdflib.term.Literal('Pulp_Fiction', lang='en'))
(rdflib.term.URIRef('http://dbpedia.org/resource/Pulp_Fiction'), rdflib.term.URIRef('http://example.org/playsIn'), rdflib.term.URIRef('http://dbpedia.org/resource/Los_Angeles'))
(rdflib.term.URIRef('http://dbpedia.org/resource/Tenet'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'), rdflib.term.Literal('Tenet', lang='en'))
(rdflib.term.URIRef('http://dbpedia.org/resource/John_Travolta'), rdflib.term.URIRef('http://examp

In [5]:
# Save the graph as Turtle
g.serialize(destination="filmgraph.ttl", format="turtle")

<Graph identifier=Ne9c51d57221548fc84265f308d83f02e (<class 'rdflib.graph.Graph'>)>

### RDF Graph Serialization
We can select different serialization formats.

In [None]:
print(g.serialize(format="xml"))  # print the graph as RDF/XML

### Visualizing the Graph

In [16]:
pip install kglab


Defaulting to user installation because normal site-packages is not writeable
Collecting kglab
  Using cached kglab-0.6.6-py3-none-any.whl.metadata (14 kB)
Collecting aiohttp>=3.8 (from kglab)
  Using cached aiohttp-3.11.10-cp39-cp39-macosx_11_0_arm64.whl.metadata (7.7 kB)
Collecting chocolate>=0.0.2 (from kglab)
  Using cached chocolate-0.0.2-py3-none-any.whl.metadata (1.2 kB)
Collecting csvwlib>=0.3.2 (from kglab)
  Using cached csvwlib-0.3.2-py3-none-any.whl.metadata (4.9 kB)
Collecting cryptography>=35.0 (from kglab)
  Using cached cryptography-44.0.0-cp39-abi3-macosx_10_9_universal2.whl.metadata (5.7 kB)
Collecting fsspec>=2022.2 (from fsspec[gs,s3]>=2022.2->kglab)
  Using cached fsspec-2024.10.0-py3-none-any.whl.metadata (11 kB)
Collecting gcsfs>=2022.2 (from kglab)
  Using cached gcsfs-2024.10.0-py2.py3-none-any.whl.metadata (1.6 kB)
Collecting icecream>=2.1 (from kglab)
  Using cached icecream-2.1.3-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting morph-kgc>=2.0.0 (fro

**[kglab](https://github.com/DerwenAI/kglab)** provides a simple abstraction layer in Python 3.7+ for building knowledge graphs, leveraging Pandas, NetworkX, RAPIDS, RDFLib, Morph-KGC, pythonPSL, and many more.

In [None]:
pip install rdflib matplotlib numpy pandas pyvis


In [17]:
pip show kglab


Note: you may need to restart the kernel to use updated packages.


In [18]:
import kglab
kg = kglab.KnowledgeGraph().load_rdf("filmgraph.ttl")


ModuleNotFoundError: No module named 'kglab'

In [14]:
# Let's measure the graph and print the number of nodes and edges.
measure = kglab.Measure()
measure.measure_graph(kg)
print(f"edges: {measure.get_edge_count()}\n")
print(f"nodes: {measure.get_node_count()}\n")

NameError: name 'kglab' is not defined

In [9]:
# Nodes with a dbr prefix (instances) are drawn in orange and nodes with a
# dbo prefix (classes) in blue, so that instances and classes can be told apart.
VIS_STYLE = {
    "dbr": {
        "color": "orange",
        "size": 40,
    },
    "dbo":{
        "color": "blue",
        "size": 50,
    },
}

subgraph = kglab.SubgraphTensor(kg)
pyvis_graph = subgraph.build_pyvis_graph(notebook=True, style=VIS_STYLE)



In [21]:
# Next, we can create the HTML document containing the visualization.

from google.colab import files  # only available when running on Google Colab

pyvis_graph.force_atlas_2based()
pyvis_graph.show("/content/filmgraph.html")

# To display the graph, download the HTML file and open it in the browser of your choice.
files.download('/content/filmgraph.html')

ModuleNotFoundError: No module named 'google.colab'

Here is another graph visualization that simply creates a PNG image.
This visualization is static, but it also shows the URIs of all resources.


In [19]:
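# Helper function for visualizing RDF graphs
def visualize(g):
    """Render an RDF graph as a PNG via Graphviz and display it inline."""
    stream = io.StringIO()
    rdf2dot(g, stream)  # write the graph in DOT format to the stream
    dg = pydotplus.graph_from_dot_data(stream.getvalue())
    png = dg.create_png()  # requires the Graphviz executables to be installed
    display(Image(png))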

In [20]:
visualize(g)

InvocationException: GraphViz's executables not found
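The `InvocationException` above usually means that the Graphviz program `dot` is not installed on the system: the `graphviz` package from pip only provides Python bindings, not the executables. The following sketch checks for `dot` before calling `visualize`. The function name `graphviz_available` is made up for this example, and the install commands in the message are common choices, not the only ones.

```python
import shutil


def graphviz_available():
    """Return True if the Graphviz `dot` executable is found on the PATH."""
    return shutil.which("dot") is not None


if graphviz_available():
    print("Graphviz found, visualize(g) should be able to create a PNG.")
else:
    print(
        "Graphviz 'dot' not found. Install it with the system package manager, "
        "e.g. 'apt-get install graphviz' or 'brew install graphviz'."
    )
```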