# **Creating RDF triples**
 This notebook was adapted from Patimir's code: https://github.com/Patimir/yt-notebooks/blob/main/Creating_RDF_triples.ipynb



---


RDF allows us to make statements about resources. A statement always has the following structure:\
`<subject> <predicate> <object>`.

An RDF statement expresses a relationship between two resources. The subject and the object represent the two resources being related; the predicate represents the nature of their relationship. The relationship is phrased in a directional way (from subject to object) and is called a property in RDF. Because RDF statements consist of three elements, they are called triples.
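
Before any library is involved, the shape of a statement can be sketched in plain Python. The `Triple` class and `describe` helper below are hypothetical names used only for illustration, not part of RDF or RDFLib; the point is that a triple is an ordered three-element structure and that swapping subject and object yields a different statement.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A hypothetical stand-in for an RDF statement."""
    subject: str
    predicate: str
    obj: str


def describe(t: Triple) -> str:
    # Render the statement in the informal <subject> <predicate> <object> form
    return f"<{t.subject}> <{t.predicate}> <{t.obj}>."


statement = Triple("Vincent_van_Gogh", "born in", "Zundert")
print(describe(statement))

# The relationship is directional: reversing it gives a different statement
reversed_statement = Triple(statement.obj, statement.predicate, statement.subject)
print(statement == reversed_statement)
```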

In [1]:
!pip install rdflib

Collecting rdflib
  Downloading rdflib-6.1.1-py3-none-any.whl (482 kB)

## Creating Nodes

The subjects and objects of the triples are the nodes of the graph. Each node is a URI reference, a blank node, or a literal. In RDFLib, these node types are represented by the classes **URIRef**, **BNode**, and **Literal**. *URIRefs* and *BNodes* can both be thought of as resources, such as a person, a company, a website, etc.
- A *BNode* is a node where the exact URI is not known.
- A *URIRef* is a node where the exact URI is known. *URIRefs* are also used to represent the properties/predicates in the RDF graph.
- *Literals* represent attribute values, such as a name, a date, a number, etc. The most common literal values are XML Schema data types, e.g. string, int.

## Example RDF Graph

![RDF Graph](https://raw.githubusercontent.com/MaastrichtU-IDS/UM_KEN4256_KnowledgeGraphs/master/2022-resources/lab3-rdf-basics/RDFGraphExample.png)

### (Informal) Representation of the Graph
``` 
<Vincent_van_Gogh> <is a> <Artist>. 
<Vincent_van_Gogh> <born in> <Zundert>. 
<Vincent_van_Gogh> <is born on> <the 30th of March 1853>.
<Zundert> <is a> <City>.
<Zundert> <part of> <the Netherlands>. 
<The Starry Night> <is created by> <Vincent_van_Gogh>. 
<The Starry Night> <is a> <Artwork>.
```

In [2]:
from rdflib import URIRef, BNode, Literal, Namespace
from rdflib.namespace import XSD, RDF, RDFS

# Define a namespace
EX = Namespace('http://example.org/')
starry_night = EX['Starry_Night']
# Equivalent: starry_night = URIRef('http://example.org/Starry_Night')

vincent_van_gogh = EX['Vincent_van_Gogh']
zundert = EX['Zundert']
netherlands = EX['Netherlands']

birth_date = Literal("1853-03-30", datatype=XSD['date'])
title = Literal('The Starry Night', lang='en')

In [3]:
title

rdflib.term.Literal('The Starry Night', lang='en')

## Add triples (statements) to an RDF graph

In [4]:
from rdflib import Graph
# initialize a graph
g = Graph()

# Bind prefix to namespace
g.bind('ex', EX)

# individuals and types
g.add((vincent_van_gogh, RDF.type, EX['Artist']))
g.add((zundert, RDF.type, EX['City']))
g.add((netherlands, RDF.type, EX['Country']))
g.add((starry_night, RDF.type, EX['Artwork']))

# relations
g.add((vincent_van_gogh, EX['bornIn'], zundert))
g.add((zundert, EX['partOf'], netherlands))
g.add((starry_night, EX['createdBy'], vincent_van_gogh))

# data properties
g.add((vincent_van_gogh, EX['birthDate'], birth_date))
g.add((starry_night, RDFS.label, title))


<Graph identifier=Naeca2756ab8f42cfb3c05f87c40f0cec (<class 'rdflib.graph.Graph'>)>

### Serialize/save RDF graph

In [5]:
print(g.serialize(format='nt'))

<http://example.org/Netherlands> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Country> .
<http://example.org/Starry_Night> <http://www.w3.org/2000/01/rdf-schema#label> "The Starry Night"@en .
<http://example.org/Vincent_van_Gogh> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Artist> .
<http://example.org/Starry_Night> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Artwork> .
<http://example.org/Vincent_van_Gogh> <http://example.org/bornIn> <http://example.org/Zundert> .
<http://example.org/Zundert> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/City> .
<http://example.org/Starry_Night> <http://example.org/createdBy> <http://example.org/Vincent_van_Gogh> .
<http://example.org/Zundert> <http://example.org/partOf> <http://example.org/Netherlands> .
<http://example.org/Vincent_van_Gogh> <http://example.org/birthDate> "1853-03-30"^^<http://www.w3.org/2001/XMLSchema#date> .




## Use of namespaces


In [6]:
#Define the namespaces
WD = Namespace('https://www.wikidata.org/wiki/')
DB = Namespace('http://dbpedia.org/ontology/')
PAV = Namespace('http://purl.org/pav/')
starry_night = WD['Q45585']
vincent_van_gogh = WD['Q5582']
zundert = EX['Zundert']
netherlands = EX['Netherlands']

In [7]:
# Create an RDF graph and bind a prefix to a namespace
from rdflib import Graph
g = Graph()

# Bind prefix to namespace
g.bind('ex', EX)

# individuals and types
g.add((vincent_van_gogh, RDF.type, DB['Artist']))
g.add((zundert, RDF.type, DB['City']))
g.add((netherlands, RDF.type, EX['Country']))
g.add((starry_night, RDF.type, DB['Artwork']))

# relations
g.add((vincent_van_gogh, DB['birthPlace'], zundert))
g.add((zundert, DB['locatedInArea'], netherlands))
g.add((starry_night, PAV['createdBy'], vincent_van_gogh))

# data properties
g.add((vincent_van_gogh, DB['birthDate'], birth_date))
g.add((starry_night, RDFS.label, title))

<Graph identifier=Nd5f1ba510dcf4e1887bb8a4f6f7dad68 (<class 'rdflib.graph.Graph'>)>

In [9]:
print(g.serialize(format='ttl'))


@prefix ex: <http://example.org/> .
@prefix ns1: <http://purl.org/pav/> .
@prefix ns2: <http://dbpedia.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://www.wikidata.org/wiki/Q45585> a ns2:Artwork ;
    rdfs:label "The Starry Night"@en ;
    ns1:createdBy <https://www.wikidata.org/wiki/Q5582> .

ex:Netherlands a ex:Country .

ex:Zundert a ns2:City ;
    ns2:locatedInArea ex:Netherlands .

<https://www.wikidata.org/wiki/Q5582> a ns2:Artist ;
    ns2:birthDate "1853-03-30"^^xsd:date ;
    ns2:birthPlace ex:Zundert .




In [10]:
# The auto-generated prefixes ns1 and ns2 are not meaningful or readable.
# Binding an explicit prefix to each namespace makes the output easier to read.
g = Graph()

# Bind prefixes to namespaces
g.bind('ex', EX)
g.bind('wd', WD)
g.bind('db', DB)
g.bind('pav', PAV)

# individuals and types
g.add((vincent_van_gogh, RDF.type, DB['Artist']))
g.add((zundert, RDF.type, DB['City']))
g.add((netherlands, RDF.type, EX['Country']))
g.add((starry_night, RDF.type, DB['Artwork']))

# relations
g.add((vincent_van_gogh, DB['birthPlace'], zundert))
g.add((zundert, DB['locatedInArea'], netherlands))
g.add((starry_night, PAV['createdBy'], vincent_van_gogh))

# data properties
g.add((vincent_van_gogh, DB['birthDate'], birth_date))
g.add((starry_night, RDFS.label, title))

<Graph identifier=Nc8f9939091fd4803a4c594a9dd0681dc (<class 'rdflib.graph.Graph'>)>

In [None]:
print(g.serialize(format='ttl'))

In [None]:
# Show the working directory where the output files will be written
import os
os.getcwd()

'/content'

### Save the graph in the Turtle syntax

In [11]:
g.serialize(destination='output.ttl', format='ttl')

<Graph identifier=Nc8f9939091fd4803a4c594a9dd0681dc (<class 'rdflib.graph.Graph'>)>

In [None]:
# Save the graph in the N-Triples syntax
g.serialize(destination='output.nt', format='nt')

[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']].


In [None]:
# Save the graph in the JSON-LD syntax
g.serialize(destination='output.json-ld', format='json-ld')

[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']].


In [12]:
# Loop through each triple in the graph
for (sub, pred, obj) in g:
    print(sub, pred, obj)

http://example.org/Netherlands http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/Country
https://www.wikidata.org/wiki/Q45585 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Artwork
http://example.org/Zundert http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/City
https://www.wikidata.org/wiki/Q45585 http://purl.org/pav/createdBy https://www.wikidata.org/wiki/Q5582
https://www.wikidata.org/wiki/Q5582 http://dbpedia.org/ontology/birthDate 1853-03-30
https://www.wikidata.org/wiki/Q5582 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Artist
https://www.wikidata.org/wiki/Q5582 http://dbpedia.org/ontology/birthPlace http://example.org/Zundert
http://example.org/Zundert http://dbpedia.org/ontology/locatedInArea http://example.org/Netherlands
https://www.wikidata.org/wiki/Q45585 http://www.w3.org/2000/01/rdf-schema#label The Starry Night


In [None]:
# Load (parse) the RDF file from the local drive
g.parse('output.ttl', format='ttl')

<Graph identifier=Nd45edea276d34fb5b46e9d39bc2b0b01 (<class 'rdflib.graph.Graph'>)>

## Load or parse the RDF file from the web

In [13]:

# Fetch the RDF description of the resource; the triples are added to the existing graph g
g.parse('https://dbpedia.org/resource/Maastricht')

<Graph identifier=Nc8f9939091fd4803a4c594a9dd0681dc (<class 'rdflib.graph.Graph'>)>

In [14]:
# Loop through each triple in the graph
for (sub, pred, obj) in g:
    print(sub, pred, obj)

http://dbpedia.org/resource/Artifort http://dbpedia.org/ontology/wikiPageWikiLink http://dbpedia.org/resource/Maastricht
http://dbpedia.org/resource/Timeline_of_paleontology http://dbpedia.org/ontology/wikiPageWikiLink http://dbpedia.org/resource/Maastricht
http://dbpedia.org/resource/Maastricht http://xmlns.com/foaf/0.1/depiction http://commons.wikimedia.org/wiki/Special:FilePath/Shang-Chi_Master_of_Kung_Fu_Vol_1_126.png
http://dbpedia.org/resource/Maastricht http://dbpedia.org/property/augHighC 23
http://dbpedia.org/resource/Willemina_Ogterop http://dbpedia.org/property/birthPlace http://dbpedia.org/resource/Maastricht
http://dbpedia.org/resource/Jean_de_Hocsem http://dbpedia.org/ontology/wikiPageWikiLink http://dbpedia.org/resource/Maastricht
http://dbpedia.org/resource/Maastricht http://dbpedia.org/ontology/wikiPageWikiLink http://dbpedia.org/resource/Ryanair
http://dbpedia.org/resource/Karin_Stevens http://dbpedia.org/ontology/birthPlace http://dbpedia.org/resource/Maastricht
http

In [15]:
# Print the size of the graph (number of triples)
print(len(g))

3165


In [None]:
# List all subjects other than Maastricht
for sub in g.subjects():
    if sub != URIRef('http://dbpedia.org/resource/Maastricht'):
        print(sub)