# **Creating RDF triples**
## This notebook was adapted from Patimir's code: https://github.com/Patimir/yt-notebooks/blob/main/Creating_RDF_triples.ipynb

RDF allows us to make statements about resources. A statement always has the following structure:\
`<subject> <predicate> <object>`.

An RDF statement expresses a relationship between two resources. The subject and the object represent the two resources being related; the predicate represents the nature of their relationship. The relationship is phrased in a directional way (from subject to object) and is called a property in RDF. Because RDF statements consist of three elements, they are called triples.

In [2]:
!pip install rdflib

Collecting rdflib
  Downloading rdflib-6.1.1-py3-none-any.whl (482 kB)
[?25l[K     |▊                               | 10 kB 24.9 MB/s eta 0:00:01[K     |█▍                              | 20 kB 27.6 MB/s eta 0:00:01[K     |██                              | 30 kB 17.0 MB/s eta 0:00:01[K     |██▊                             | 40 kB 12.0 MB/s eta 0:00:01[K     |███▍                            | 51 kB 8.1 MB/s eta 0:00:01[K     |████                            | 61 kB 8.5 MB/s eta 0:00:01[K     |████▊                           | 71 kB 7.2 MB/s eta 0:00:01[K     |█████▍                          | 81 kB 8.0 MB/s eta 0:00:01[K     |██████                          | 92 kB 6.7 MB/s eta 0:00:01[K     |██████▉                         | 102 kB 7.3 MB/s eta 0:00:01[K     |███████▌                        | 112 kB 7.3 MB/s eta 0:00:01[K     |████████▏                       | 122 kB 7.3 MB/s eta 0:00:01[K     |████████▉                       | 133 kB 7.3 MB/s eta 0:00:01[K 

## Creating Nodes

The subjects and objects of triples are the nodes of the graph. A node is either a URI reference, a blank node or a literal; in RDFLib these node types are represented by the classes **URIRef**, **BNode** and **Literal**. *URIRefs* and *BNodes* can both be thought of as resources, such as a person, a company or a website.
- A *BNode* (blank node) is a node without a URI of its own, used for example when the exact URI of a resource is not known; its identifier is only meaningful inside the graph.
- A *URIRef* is a node whose exact URI is known. *URIRefs* are also used to represent the properties (predicates) in the RDF graph.
- *Literals* represent attribute values, such as a name, a date or a number, and can only appear as the object of a triple. Literals usually carry an XML Schema datatype (e.g. `xsd:string`, `xsd:integer`, `xsd:date`) or, for text, a language tag.

## Example RDF Graph

![RDF Graph](https://raw.githubusercontent.com/MaastrichtU-IDS/UM_KEN4256_KnowledgeGraphs/master/2022-resources/lab3-rdf-basics/RDFGraphExample.png)

### (Informal) Representation of the Graph
`<Vincent_van_Gogh> <is a> <Artist>.`\
`<Vincent_van_Gogh> <born in> <Zundert>.`\
`<Vincent_van_Gogh> <is born on> <the 30th of March 1853>.`\
`<Zundert> <is a> <City>.`\
`<Zundert> <part of> <the Netherlands>.` \
`<The Starry Night> <is created by> <Vincent_van_Gogh>.` \
`<The Starry Night> <is a> <Artwork>.`

In [3]:
from rdflib import URIRef, BNode, Literal, Namespace
from rdflib.namespace import XSD, RDF, RDFS

# Define a namespace: EX['name'] builds the URIRef http://example.org/name
EX = Namespace('http://example.org/')
starry_night = EX['Starry_Night']
# equivalent to: starry_night = URIRef('http://example.org/Starry_Night')

vincent_van_gogh = EX['Vincent_van_Gogh']
zundert = EX['Zundert']
netherlands = EX['Netherlands']

# Literals: a typed date and a language-tagged string
birth_date = Literal("1853-03-30", datatype=XSD['date'])
title = Literal('The Starry Night', lang='en')

In [4]:
title

rdflib.term.Literal('The Starry Night', lang='en')

In [5]:
from rdflib import Graph
# initialize a graph
g = Graph()

# Bind prefix to namespace
g.bind('ex', EX)

# individuals and types
g.add((vincent_van_gogh, RDF.type, EX['Artist']))
g.add((zundert, RDF.type, EX['City']))
g.add((netherlands, RDF.type, EX['Country']))
g.add((starry_night, RDF.type, EX['Artwork']))

# relations
g.add((vincent_van_gogh, EX['bornIn'], zundert))
g.add((zundert, EX['partOf'], netherlands))
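# Caution: as written, this triple says "Vincent van Gogh createdBy The Starry Night", the reverse
# of the informal triple <The Starry Night> <is created by> <Vincent_van_Gogh>;
# the intended form is (starry_night, EX['createdBy'], vincent_van_gogh)
g.add((vincent_van_gogh, EX['createdBy'], starry_night))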

# data properties
g.add((vincent_van_gogh, EX['birthDate'], birth_date))
g.add((starry_night, RDFS.label, title))


<Graph identifier=Nbb88fd4157dc4decbb93f5150d4532c6 (<class 'rdflib.graph.Graph'>)>

In [6]:
print(g.serialize(format='ttl'))

@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Vincent_van_Gogh a ex:Artist ;
    ex:birthDate "1853-03-30"^^xsd:date ;
    ex:bornIn ex:Zundert ;
    ex:createdBy ex:Starry_Night .

ex:Netherlands a ex:Country .

ex:Starry_Night a ex:Artwork ;
    rdfs:label "The Starry Night"@en .

ex:Zundert a ex:City ;
    ex:partOf ex:Netherlands .




In [7]:

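# Note: Wikidata's canonical entity IRIs have the form http://www.wikidata.org/entity/Q45585;
# the https://www.wikidata.org/wiki/ URLs used here identify the human-readable pages
WD = Namespace('https://www.wikidata.org/wiki/')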
DB = Namespace('http://dbpedia.org/ontology/')
PAV = Namespace('http://purl.org/pav/')
starry_night = WD['Q45585']
vincent_van_gogh = WD['Q5582']
zundert = EX['Zundert']
netherlands = EX['Netherlands']

In [8]:
from rdflib import Graph
g = Graph()

# Bind prefix to namespace
g.bind('ex', EX)

# individuals and types
g.add((vincent_van_gogh, RDF.type, DB['Artist']))
g.add((zundert, RDF.type, DB['City']))
g.add((netherlands, RDF.type, EX['Country']))
g.add((starry_night, RDF.type, DB['Artwork']))

# relations
g.add((vincent_van_gogh, DB['birthPlace'], zundert))
g.add((zundert, DB['locatedInArea'], netherlands))
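# Caution: pav:createdBy takes the created resource as subject, i.e.
# (starry_night, PAV['createdBy'], vincent_van_gogh); as written the direction is reversed
g.add((vincent_van_gogh, PAV['createdBy'], starry_night))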

# data properties
g.add((vincent_van_gogh, DB['birthDate'], birth_date))
g.add((starry_night, RDFS.label, title))

<Graph identifier=N827d3aac209a49d7ab31952c4a5df58c (<class 'rdflib.graph.Graph'>)>

In [9]:
print(g.serialize(format='ttl'))


@prefix ex: <http://example.org/> .
@prefix ns1: <http://dbpedia.org/ontology/> .
@prefix ns2: <http://purl.org/pav/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://www.wikidata.org/wiki/Q5582> a ns1:Artist ;
    ns1:birthDate "1853-03-30"^^xsd:date ;
    ns1:birthPlace ex:Zundert ;
    ns2:createdBy <https://www.wikidata.org/wiki/Q45585> .

ex:Netherlands a ex:Country .

ex:Zundert a ns1:City ;
    ns1:locatedInArea ex:Netherlands .

<https://www.wikidata.org/wiki/Q45585> a ns1:Artwork ;
    rdfs:label "The Starry Night"@en .




### As you can see, the automatically generated prefixes ns1 and ns2 are not meaningful. Binding our own prefixes to these namespaces makes the serialization more readable.



In [10]:
g = Graph()

# Bind prefixes to namespaces
g.bind('ex', EX)
g.bind('wd', WD)
g.bind('db', DB)
g.bind('pav', PAV)

# individuals and types
g.add((vincent_van_gogh, RDF.type, DB['Artist']))
g.add((zundert, RDF.type, DB['City']))
g.add((netherlands, RDF.type, EX['Country']))
g.add((starry_night, RDF.type, DB['Artwork']))

# relations
g.add((vincent_van_gogh, DB['birthPlace'], zundert))
g.add((zundert, DB['locatedInArea'], netherlands))
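# Caution: pav:createdBy takes the created resource as subject, i.e.
# (starry_night, PAV['createdBy'], vincent_van_gogh); as written the direction is reversed
g.add((vincent_van_gogh, PAV['createdBy'], starry_night))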

# data properties
g.add((vincent_van_gogh, DB['birthDate'], birth_date))
g.add((starry_night, RDFS.label, title))

<Graph identifier=Nd45edea276d34fb5b46e9d39bc2b0b01 (<class 'rdflib.graph.Graph'>)>

In [11]:
print(g.serialize(format='ttl'))

@prefix db: <http://dbpedia.org/ontology/> .
@prefix ex: <http://example.org/> .
@prefix pav: <http://purl.org/pav/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wd: <https://www.wikidata.org/wiki/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

wd:Q5582 a db:Artist ;
    db:birthDate "1853-03-30"^^xsd:date ;
    db:birthPlace ex:Zundert ;
    pav:createdBy wd:Q45585 .

ex:Netherlands a ex:Country .

ex:Zundert a db:City ;
    db:locatedInArea ex:Netherlands .

wd:Q45585 a db:Artwork ;
    rdfs:label "The Starry Night"@en .




In [13]:
import os
os.getcwd()

'/content'

In [12]:
#save the graph in the Turtle syntax
g.serialize(destination='output.ttl', format='ttl')

[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']].


In [16]:
# Save the graph in N-Triples syntax
g.serialize(destination='output.nt', format='nt')

[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']].


  "NTSerializer always uses UTF-8 encoding. "


In [18]:
# Save the graph in JSON-LD syntax
g.serialize(destination='output.json-ld', format='json-ld')

[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']].


In [19]:
# Loop through each triple in the graph
for (sub, pred, obj) in g:
    print(sub, pred, obj)

http://example.org/Zundert http://dbpedia.org/ontology/locatedInArea http://example.org/Netherlands
http://example.org/Netherlands http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://example.org/Country
https://www.wikidata.org/wiki/Q5582 http://dbpedia.org/ontology/birthDate 1853-03-30
https://www.wikidata.org/wiki/Q45585 http://www.w3.org/2000/01/rdf-schema#label The Starry Night
https://www.wikidata.org/wiki/Q5582 http://dbpedia.org/ontology/birthPlace http://example.org/Zundert
https://www.wikidata.org/wiki/Q45585 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Artwork
https://www.wikidata.org/wiki/Q5582 http://purl.org/pav/createdBy https://www.wikidata.org/wiki/Q45585
http://example.org/Zundert http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/City
https://www.wikidata.org/wiki/Q5582 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Artist


In [22]:
# Load (parse) the Turtle file saved above from the local drive; its triples are added to g
g.parse('output.ttl', format='ttl')

<Graph identifier=Nd45edea276d34fb5b46e9d39bc2b0b01 (<class 'rdflib.graph.Graph'>)>

In [24]:
# Parse RDF directly from a URL (the DBpedia resource for Maastricht); the triples are added to g
g.parse('https://dbpedia.org/resource/Maastricht')

<Graph identifier=Nd45edea276d34fb5b46e9d39bc2b0b01 (<class 'rdflib.graph.Graph'>)>

In [None]:
# Loop through each triple in the graph
for (sub, pred, obj) in g:
    print(sub, pred, obj)

In [29]:
# Print the size of the graph (number of triples)
print(len(g))

3165


In [None]:
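# List all distinct subjects other than Maastricht
# (g.subjects() yields a subject once per triple, so set() removes the repeats)
maastricht = URIRef('http://dbpedia.org/resource/Maastricht')
for sub in set(g.subjects()):
    if sub != maastricht:
        print(sub)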