# KEN 3140: Lab 2 (RDF basics)

**In this lab we are going to:**

- Create RDF triples with rdflib
- Manipulate and edit RDF triples with rdflib and save them in various RDF serialisation syntaxes
- Verify the validity of a given list of IRIs

**Creating RDF triples**

RDF allows us to make statements about resources. A statement always has the following structure:
# `<subject> <predicate> <object>`.

An RDF statement expresses a relationship between two resources. The subject and the object represent the two resources being related; the predicate represents the nature of their relationship. The relationship is phrased in a directional way (from subject to object) and is called a property in RDF. Because RDF statements consist of three elements, they are called triples.
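To make the structure concrete before rdflib is installed, here is a minimal sketch that models a triple as a plain Python tuple. The `example.org` IRIs and the variable names are placeholders chosen for illustration; rdflib uses its own node classes rather than bare strings.

```python
# A triple modelled as a plain Python tuple: (subject, predicate, object).
# The example.org IRIs below are placeholders, not real vocabulary terms.
alice = "http://example.org/alice"
bob = "http://example.org/bob"
knows = "http://example.org/knows"

triple = (alice, knows, bob)
subject, predicate, obj = triple

# Direction matters: swapping subject and object gives a different statement.
reverse = (bob, knows, alice)

print(len(triple))
print(triple == reverse)
```

The tuple form is also what rdflib expects later on when triples are added to a graph.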

In [None]:
!pip install rdflib

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting rdflib
  Downloading rdflib-6.2.0-py3-none-any.whl (500 kB)
Collecting isodate
  Downloading isodate-0.6.1-py2.py3-none-any.whl (41 kB)
Installing collected packages: isodate, rdflib
Successfully installed isodate-0.6.1 rdflib-6.2.0


## Creating Nodes

The subjects and objects of the triples make up the nodes in the graph. Nodes are URI references, blank nodes or literals. In RDFLib, these node types are represented by the classes **URIRef**, **BNode** and **Literal**. *URIRefs* and *BNodes* can both be thought of as resources, such as a person, a company, a website, etc.
- A *BNode* is a node whose exact URI is not known.
- A *URIRef* is a node whose exact URI is known. *URIRefs* are also used to represent the properties/predicates in the RDF graph.
- *Literals* represent attribute values, such as a name, a date, a number, etc. The most common literal values are XML Schema data types, e.g. string, int.

**Example:**

Create a triple with rdflib for this sentence: Remzi is a computer scientist.




In [None]:
from rdflib import URIRef, BNode, Literal, Namespace
from rdflib.namespace import FOAF, DCTERMS, XSD, RDF, SDO

# A URI is a namespace followed by a local identifier: uri = ns + identifier

# URI for entity Remzi: http://maastrichtuniversity.nl/Remzi
UM = Namespace('http://maastrichtuniversity.nl/')
remzi = UM['Remzi']
computerScientist = UM['Computer_Scientist']

# A URIRef can also be built directly from a full URI string
uri = "https://www.schema.org/Book"
s = URIRef(uri)
print(s)
remzi = URIRef('http://maastrichtuniversity.nl/Remzi')  # same node as UM['Remzi']
mona_lisa = URIRef('http://www.wikidata.org/entity/Q12418')
davinci = URIRef('http://dbpedia.org/resource/Leonardo_da_Vinci')
lajoconde = URIRef('http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619')


In [None]:
name = Literal("Nicholas")  # the name 'Nicholas', as a string

age = Literal(39, datatype=XSD.integer)  # the number 39, as an integer

bn = BNode()  # a blank node with an automatically generated identifier


In [None]:
from rdflib import Graph

#initialise an empty RDF graph
g = Graph()


In [None]:
g.add((remzi, RDF.type, computerScientist))
g.add((remzi, FOAF.firstName, Literal('Remzi')))
g.add((remzi, FOAF.lastName, Literal('Celebi')))
#g.add((remzi, schema.Occupation, computerScientist))

<Graph identifier=Nfd5764483958446cab66afe68d5f1b97 (<class 'rdflib.graph.Graph'>)>

In [None]:
print(g.serialize(format='ttl'))

@prefix ns1: <http://xmlns.com/foaf/0.1/> .
@prefix um: <http://maastrichtuniversity.nl/> .

um:remzi a um:Computer_Scientist ;
    ns1:firstName "Remzi" ;
    ns1:lastName "Celebi" .




In [None]:
print("Entities in this graph:")
print("-----------------------")

# Print the entities in our graph
print(f"Remzi entity: {remzi}")
print(f"Computer Scientist entity: {computerScientist}")

print("Triples in this graph:")
print("----------------------")

# Iterating over a graph yields (subject, predicate, object) tuples
for s, p, o in g:
    print(s, p, o)



Entities in this graph:
-----------------------
Remzi entity: http://maastrichtuniversity.nl/remzi
Computer Scientist entity: http://maastrichtuniversity.nl/Computer_Scientist
Triples in this graph:
----------------------
http://maastrichtuniversity.nl/remzi http://xmlns.com/foaf/0.1/lastName Celebi
http://maastrichtuniversity.nl/remzi http://xmlns.com/foaf/0.1/firstName Remzi
http://maastrichtuniversity.nl/remzi http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://maastrichtuniversity.nl/Computer_Scientist


In [None]:
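# With a destination, serialize() writes the file and returns the graph itself,
# so the printed value is a description of the graph, not the RDF/XML text
print(g.serialize(destination='KEN3140_Lab2_example.rdf', format='xml'))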

[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']].


In [None]:
# Write the same graph to a Turtle file
print(g.serialize(destination='KEN3140_Lab2_example.ttl', format='turtle'))

[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']].


In [None]:
# Write the same graph to an N-Triples file
print(g.serialize(destination='KEN3140_Lab2_example.nt', format='ntriples'))

[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']].


  "NTSerializer always uses UTF-8 encoding. "


In [None]:
g1 = Graph()

In [None]:
g1.parse('KEN3140_Lab2_task1.ttl', format='turtle')



<Graph identifier=N3ab064e56ebf45f39854f55028578d95 (<class 'rdflib.graph.Graph'>)>

In [None]:
print("Triples in this graph:")
print("----------------------")

for s, p, o in g1:
    print(s, p, o)

Triples in this graph:
----------------------
file://paste IRI here//kody file://paste IRI here//name Kody Moodley


### **Lab Tasks**

**Task 1: IRI validation**

In this task you are going to determine which of the following strings are valid IRIs.
Test each one by pasting it into the provided ``KEN3140_Lab2_task1.ttl`` document:
replace the text ``//paste IRI here//`` with the IRI and save the file.
After each replacement, run the cell just below the one titled "Validate code" and monitor the output to see whether the IRI is valid.
If you find that some of these are invalid IRIs, consult the [RFC 3987](https://tools.ietf.org/html/rfc3987)
IRI specification to work out why they are invalid. For each valid IRI in the list, think about
and discuss with your classmates to what extent it complies with the Linked Data principles.

1. ``myIRI``
2. ``myIRI/``
3. ``myIRI#``
4. ``ftp:/myIRI``
5. ``ftp://myIRI/``
6. ``ftp://myIRI#``
7. ``http://myIRI#``
8. ``http:///myIRI/folder1/folder2/``
9. ``http:///myIRI/folder1/folder2/my name``
10. ``http:///myIRI/folder1/folder2/my_name``
11. ``my_own_protocol:///myIRI/folder1/folder2/my_name``
12. ``:///myIRI/folder1/folder2/my_name``
13. ``https://myIRI/$/my_name``
14. ``https://myIRI/#$#/my_name``
15. ``https://136.292.181.23/#12/my_name``
16. ``https://136.255.181.23/!210382/my_name``
17. ``https://schema.org/parent``
18. ``https://www.wikidata.org/wiki/Q937``
19. ``https://en.wikipedia.org/wiki/Albert_Einstein``
20. ``https://www.w3.org/Consortium/``
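When reasoning about why a string fails, two syntactic rules are easy to check by hand or in code: an absolute IRI must start with a scheme (a letter followed by letters, digits, `+`, `-` or `.`, then a colon), and it must not contain whitespace. The sketch below checks only these two rules using the standard library; the function name `quick_iri_check` and the `example.org` test strings are made up for illustration, and passing both checks does not prove that a string is a valid IRI.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

def quick_iri_check(candidate):
    """Return a list of problems found by two simple syntactic checks."""
    problems = []
    if not SCHEME_RE.match(candidate):
        problems.append("no valid scheme, so not an absolute IRI")
    if any(ch.isspace() for ch in candidate):
        problems.append("contains whitespace")
    return problems

# Illustrative strings only; they are not taken from the task list
for text in ["http://example.org/thing", "example.org/thing", "http://example.org/a thing"]:
    print(repr(text), quick_iri_check(text))
```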
    

**Task 1 solution:**

In [None]:
#IRI validation

**Task 2: Formulating RDF triples**

Using a text editor of your choice (e.g. Notepad or Sublime Text) **or** rdflib, create RDF triples capturing as fully as possible the information in the following piece of text:

“Vincent van Gogh was a Dutch artist born in Zundert, a city in the country of the Netherlands, on 30 March 1853. One of the most famous artworks created by him is ‘The Starry Night’ oil on canvas painting.”

**Requirements:**
1. Write down the triples in Turtle syntax and save the document as a .ttl file.
2. Ensure that the triples are generated using valid RDF syntax and valid IRIs. 
3. Make sure to **reuse** existing vocabulary where possible.

For convenience, a conceptual diagram of the information in the above text is given below.

![image.png](vangogh.png)

**Task 2 solution:**

**Task 3: Identifying components of an RDF graph**

Study the following diagram:

![image.png](task3.png)

Now, list all the:

1. object properties in the graph
2. data properties in the graph
3. instances in the graph
4. data types in the graph
5. prefix shorthands in the graph

Discuss your answers with your classmates. You may write the answers down in a new markdown cell below this one if you wish.

**Task 3 solution:**