# KEN 3140: Lab 2 (RDF basics)

**In this lab we are going to:**

- Create RDF triples with rdflib
- Save these triples to files in various RDF serialisation syntaxes
- Verify the validity of a given list of IRIs

**Creating RDF triples**

RDF allows us to make statements about resources. A statement has the following structure:

`<subject> <predicate> <object>`

An RDF statement expresses a relationship between two resources. The subject and the object represent the two resources being related; the predicate represents the nature of their relationship. The relationship is directional (from subject to object), and in RDF it is called a property. Because RDF statements consist of three elements, they are called triples.

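To make the directionality concrete, here is a plain-Python sketch (no rdflib yet) that models a triple as an ordered 3-tuple and a graph as a set of such tuples. The IRIs reuse names from this lab, and the `to_ntriples` helper is a hypothetical illustration that only handles IRI terms (no literals or blank nodes).

```python
# A triple is an ordered (subject, predicate, object) tuple; the order encodes direction.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triple = (
    "http://maastrichtuniversity.nl/Remzi",               # subject
    RDF_TYPE,                                             # predicate (property)
    "http://maastrichtuniversity.nl/Computer_Scientist",  # object
)

# A graph is a set of triples, so stating the same triple twice has no effect.
graph = {triple, triple}

def to_ntriples(t):
    """Render a triple whose three terms are all IRIs as one N-Triples line."""
    s, p, o = t
    return f"<{s}> <{p}> <{o}> ."

for t in graph:
    print(to_ntriples(t))
```

Swapping subject and object yields a different tuple, i.e. a different statement, which is the directionality described above.
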
In [25]:
!pip install rdflib



## Creating Nodes

The subjects and objects of the triples make up the nodes of the graph. Nodes are URI references, blank nodes or literals, which RDFLib represents with the classes **URIRef**, **BNode** and **Literal**. *URIRefs* and *BNodes* can both be thought of as resources, such as a person, a company or a website.
- A *BNode* is a node whose exact URI is not known.
- A *URIRef* is a node whose exact URI is known. *URIRefs* are also used to represent the properties/predicates in the RDF graph.
- *Literals* represent attribute values, such as a name, a date or a number. The most common literal datatypes are the XML Schema (XSD) datatypes, e.g. `xsd:string`, `xsd:integer`.


In [1]:
from rdflib import URIRef, BNode, Literal, Namespace
from rdflib.namespace import FOAF, DCTERMS, XSD, RDF, SDO

# Option 1: create a URIRef from the full URI string
remzi = URIRef('http://maastrichtuniversity.nl/Remzi')
computerScientist = URIRef('http://maastrichtuniversity.nl/Computer_Scientist')

# Option 2: URI = namespace + local identifier
UM = Namespace('http://maastrichtuniversity.nl/')

# URI for entity remzi: http://maastrichtuniversity.nl/Remzi
remzi = UM['Remzi']
# URI for entity computerScientist: http://maastrichtuniversity.nl/Computer_Scientist
computerScientist = UM['Computer_Scientist']



Task: Create URIRef entities for `mona_lisa`, `davinci` (Leonardo da Vinci) and `lajoconde` (La Joconde).

In [27]:
# mona_lisa = URIRef('http://www.wikidata.org/entity/Q12418')
# davinci = URIRef('http://dbpedia.org/resource/Leonardo_da_Vinci')
# lajoconde = URIRef('http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619')

In [28]:
name = Literal("Nicholas")  # the name 'Nicholas', as a string

age = Literal(39, datatype=XSD.integer)  # the number 39, as an integer

bn = BNode()


In [29]:
from rdflib import Graph

#initialise an empty RDF graph
g = Graph()


**Example:**

Create triples with rdflib for this sentence: Remzi is a computer scientist.

In [30]:
# Bind prefix to namespace
g.bind('um', UM)

g.add((remzi, RDF.type, computerScientist))
g.add((remzi, FOAF.firstName, Literal('Remzi')))
g.add((remzi, FOAF.lastName, Literal('Celebi')))

<Graph identifier=N4193756a353141bcb8f69a18e3511b9d (<class 'rdflib.graph.Graph'>)>

In [31]:
print(g.serialize(format='ttl'))

@prefix ns1: <http://xmlns.com/foaf/0.1/> .
@prefix um: <http://maastrichtuniversity.nl/> .

um:Remzi a um:Computer_Scientist ;
    ns1:firstName "Remzi" ;
    ns1:lastName "Celebi" .




In [53]:
print("Entities in this graph:")
print("-----------------------")

# Print the entities in our graph
print("Remzi entity: " + str(remzi))
print("Computer Scientist entity: " + str(computerScientist))

print("----------------------")

print("Triples in this graph:")
print("----------------------")

# Each triple unpacks into subject, predicate and object
for (s, p, o) in g:
    print(s, p, o)

print("----------------------")
# Printing the tuple itself shows the rdflib term types
for triple in g:
    print(triple)

Entities in this graph:
-----------------------
Remzi entity: http://maastrichtuniversity.nl/Remzi
Computer Scientist entity: http://maastrichtuniversity.nl/Computer_Scientist
----------------------
Triples in this graph:
----------------------
http://maastrichtuniversity.nl/Remzi http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://maastrichtuniversity.nl/Computer_Scientist
http://maastrichtuniversity.nl/Remzi http://xmlns.com/foaf/0.1/lastName Celebi
http://maastrichtuniversity.nl/Remzi http://xmlns.com/foaf/0.1/firstName Remzi
----------------------
(rdflib.term.URIRef('http://maastrichtuniversity.nl/Remzi'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://maastrichtuniversity.nl/Computer_Scientist'))
(rdflib.term.URIRef('http://maastrichtuniversity.nl/Remzi'), rdflib.term.URIRef('http://xmlns.com/foaf/0.1/lastName'), rdflib.term.Literal('Celebi'))
(rdflib.term.URIRef('http://maastrichtuniversity.nl/Remzi'), rdflib.term.URIRef('htt

In [33]:
# Write the graph as RDF/XML; with a destination, serialize() writes the file
# and (in rdflib 6+) returns the Graph itself, so there is nothing useful to print
g.serialize(destination='KEN3140_Lab2_example.rdf', format='xml')


In [34]:
# Write the graph as Turtle
g.serialize(destination='KEN3140_Lab2_example.ttl', format='turtle')


In [35]:
# Write the graph as N-Triples
g.serialize(destination='KEN3140_Lab2_example.nt', format='ntriples')


**IRI validation**

In [41]:
!pip install validators



In [42]:
import validators

In [43]:
validators.url("http://google.com")

True

In [54]:
if not validators.url("http://google"):
  print("not valid")

not valid
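
`validators.url` only gives a yes/no answer. To see which component of a candidate IRI is missing, the standard-library function `urllib.parse.urlsplit` splits a string into scheme, authority (`netloc`), path and fragment. It performs no validation, so it complements rather than replaces the check above; the Wikidata IRI with a `#sitelinks` fragment is just an illustrative string.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://www.wikidata.org/wiki/Q937#sitelinks")
print(parts.scheme)    # https
print(parts.netloc)    # www.wikidata.org
print(parts.path)      # /wiki/Q937
print(parts.fragment)  # sitelinks

# urlsplit does not validate: a string without a scheme is split without error,
# the whole string simply ends up in the path and the scheme is empty
print(urlsplit("myIRI"))
```
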


### **Lab Tasks**

**Task 1: IRI validation**

In this task you are going to check which of the following strings are valid IRIs.
Verify them with the `validators` package.

If you find some of them to be invalid IRIs, consult the [RFC 3987](https://tools.ietf.org/html/rfc3987)
IRI specification to give reasons why they are invalid. For each valid IRI in the list, think about
and discuss with your classmates to what extent it complies with the Linked Data principles.

1. ``myIRI``
2. ``myIRI/``
3. ``myIRI#``
4. ``ftp:/myIRI``
5. ``ftp://myIRI/``
6. ``ftp://myIRI#``
7. ``http://myIRI#``
8. ``http:///myIRI/folder1/folder2/``
9. ``http:///myIRI/folder1/folder2/my name``
10. ``http:///myIRI/folder1/folder2/my_name``
11. ``my_own_protocol:///myIRI/folder1/folder2/my_name``
12. ``:///myIRI/folder1/folder2/my_name``
13. ``https://myIRI/$/my_name``
14. ``https://myIRI/#$#/my_name``
15. ``https://136.292.181.23/#12/my_name``
16. ``https://136.255.181.23/!210382/my_name``
17. ``https://schema.org/parent``
18. ``https://www.wikidata.org/wiki/Q937``
19. ``https://en.wikipedia.org/wiki/Albert_Einstein``
20. ``https://www.w3.org/Consortium/``
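
`validators.url` targets URLs rather than the generic IRI grammar, so it is worth cross-checking some answers by hand. The `rough_iri_check` function below is a hypothetical, deliberately incomplete sketch of a few RFC 3987 rules (scheme syntax, no unescaped whitespace or forbidden characters, at most one `#`, well-formed percent-encoding). It does not check scheme-specific rules, such as whether `http` requires a host, so its verdicts can differ from `validators.url`.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")
# Characters that may not appear unescaped anywhere in an IRI
FORBIDDEN = set('<>"{}|\\^`')
# A "%" must start a percent-encoded octet such as %20
BAD_PERCENT_RE = re.compile(r"%(?![0-9A-Fa-f]{2})")

def rough_iri_check(s: str) -> bool:
    """Rough syntactic IRI check; NOT a complete RFC 3987 parser."""
    if not SCHEME_RE.match(s):
        return False  # missing or malformed scheme
    if any(ch.isspace() or ch in FORBIDDEN for ch in s):
        return False  # spaces and similar characters must be percent-encoded
    if s.count("#") > 1:
        return False  # the fragment starts at the first "#" and may not contain another
    if BAD_PERCENT_RE.search(s):
        return False  # "%" not followed by two hex digits
    return True

candidates = [
    "myIRI",
    "ftp://myIRI/",
    "http:///myIRI/folder1/folder2/my name",
    "my_own_protocol:///myIRI/folder1/folder2/my_name",
    "https://myIRI/#$#/my_name",
    "https://schema.org/parent",
]
for c in candidates:
    print(f"{rough_iri_check(c)!s:5}  {c}")
```
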
    

**Task 1 solution:**

In [51]:
#IRI validation
if not validators.url("myIRI"):
   print("not valid")


not valid


**Task 2: Formulating RDF triples**

Using a text editor of your choice (e.g. Notepad or Sublime Text) **or** rdflib, create RDF triples that capture as fully as possible the information in the following piece of text:

“Vincent van Gogh was a Dutch artist born in Zundert, a city in the country of the Netherlands, on 30 March 1853. One of the most famous artworks created by him is ‘The Starry Night’ oil on canvas painting.”

**Requirements:**
1. Write down the triples in Turtle syntax and save the document as a .ttl file.
2. Ensure that the triples are generated using valid RDF syntax and valid IRIs. 
3. Make sure to **reuse** existing vocabularies where possible.

For convenience, a conceptual diagram of the information in the above text is given below.

![image.png](task2-vangogh.png)

**Task 2 solution:**

**Task 3: Identifying components of an RDF graph**

Study the following diagram:

![image.png](task3.png)

Now, list all the:

1. object properties in the graph
2. data properties in the graph
3. instances in the graph
4. data types in the graph
5. prefix shorthands in the graph

Discuss your answers with your classmates. You may write the answers down in a new markdown cell below this one if you wish.

**Task 3 solution:**