### KEN 3140: Lab 2 (RDF basics)
**Author:** Kody Moodley  
**Date:** 2021-09-01  
**License:** [https://creativecommons.org/licenses/by/4.0](https://creativecommons.org/licenses/by/4.0)

In this lab we are going to:
1. Create RDF triples by hand and validate them using [RDF4J](https://rdf4j.org/)
2. Manipulate / edit RDF triples using [RDF4J](https://rdf4j.org/) and save them in various RDF [serialisation](https://en.wikipedia.org/wiki/Serialization) syntaxes
3. Verify the validity of a given list of IRIs
4. Examine a given RDF graph and identify its components

#### Learning objectives:
1. How to verify whether a string represents a valid IRI or not
2. How to construct entities, literals, predicates and triples in RDF
3. How to identify and reuse relations and entities from external vocabularies when creating RDF representations
4. How to construct and save RDF documents in different syntaxes and to convert between them (RDF/XML, Turtle, N-Triples)
5. How to identify and add appropriate XML Schema datatypes for literals in an RDF graph
6. How to assess if a particular RDF representation violates the RDF specification

### Setup: import RDF4J and required classes
This section sets up the environment required for you to complete this lab. You do not need to change any code in this section. Just read the information, run the cells and proceed to the next section. 

[RDF4J](https://rdf4j.org/) is a Java library for creating and manipulating [RDF](https://www.w3.org/TR/rdf11-concepts/) information. There is a documentation page [here](https://rdf4j.org/documentation/) to learn how to get started with RDF4J. To import this (and other) Java libraries in Jupyter notebooks, we can use one of two options:

1. Using the [cell magics](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cell-magics) command ``%%loadFromPOM``. This command allows us to import Java libraries using [Maven](https://maven.apache.org/) dependencies as you would in a [pom.xml](https://www.javatpoint.com/maven-pom-xml#:~:text=POM%20is%20an%20acronym%20for,file%2C%20then%20executes%20the%20goal.). **NB:** this code has to be executed within **one** cell and there should not be any other code in this cell. To find the Maven dependencies for your desired libraries you can search for them on [Maven Repository](https://mvnrepository.com/) or [Maven Central](https://search.maven.org/).
2. If you are having problems with getting the cell magics option to work, download the full .jar file for RDF4J [here](https://www.eclipse.org/downloads/download.php?file=/rdf4j/eclipse-rdf4j-3.4.0-onejar.jar&mirror_id=1190). The file is also included as a download with your lab materials on Canvas. **Place this .jar file in the same directory as this notebook.**

We are going to use Option 2 in this notebook:

In [1]:
%jars eclipse-rdf4j-3.4.0-onejar.jar

In [2]:
%jars commons-io-2.11.0.jar

This next block of code suppresses SLF4J logging warning messages by loading the no-operation logger binding:

In [3]:
%%loadFromPOM
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-nop</artifactId>
  <version>1.7.30</version>
</dependency>

And this one imports required classes for reading and writing RDF with RDF4J:

In [4]:
// RDF4J
import org.eclipse.rdf4j.model.*;
import org.eclipse.rdf4j.model.impl.*;
import org.eclipse.rdf4j.model.vocabulary.*;
// Rio
import org.eclipse.rdf4j.rio.*;
import org.eclipse.rdf4j.rio.Rio.*;
import org.eclipse.rdf4j.rio.helpers.*;
// Java IO
import java.io.*;

### Creating a simple RDF graph with RDF4J
For demonstration, we show here how to create a simple RDF graph in RDF4J with some entities, relations and literals. Again, this is not part of the lab tasks and you do not need to change any code in this section. However, this section serves as a reference that gives you code that you can reuse for Task 2 or other assignments and tasks in this course.

#### A. Initialise an RDF graph
Here we set up namespaces for the entities in our graph and initialise an empty RDF graph in RDF4J, which we can then start building.

In [5]:
// We need to get a hold of an instance of the ValueFactory class.
// This class allows you to create IRIs, blank nodes, literals and
// triples in RDF4J. RDF4J does not have a Triples class, rather, it 
// has a class called Statement. Why? Because not only can you make
// statements with three components (triples) but also four! (called Quads)
// Therefore, "Statement" is a more general term to capture either triples or quads.
ValueFactory vf = SimpleValueFactory.getInstance();
// Create a new, empty Model object (this instance represents our RDF graph - which is empty at the moment).
Model model = new TreeModel();
// Create namespaces for our resources
String um = "http://maastrichtuniversity.nl/";
String foaf = "http://xmlns.com/foaf/0.1/";
String schemaorg = "http://schema.org/";
model.setNamespace("um", um);
model.setNamespace("foaf", foaf);
model.setNamespace("schema", schemaorg);

schema :: http://schema.org/
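To make the role of these namespace prefixes concrete, here is a minimal standard-library sketch of how a prefixed name such as `um:kody` expands to a full IRI. It is not how RDF4J implements namespaces; the class `PrefixExpander` and its methods are hypothetical names introduced only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of RDF4J): expands prefixed names to full IRIs.
class PrefixExpander {
    private final Map<String, String> namespaces = new LinkedHashMap<>();

    // Register a prefix and the namespace IRI it stands for.
    void setNamespace(String prefix, String namespace) {
        namespaces.put(prefix, namespace);
    }

    // "um:kody" becomes the namespace registered for "um" followed by "kody".
    String expand(String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String prefix = prefixedName.substring(0, colon);
        String namespace = namespaces.get(prefix);
        if (namespace == null) {
            throw new IllegalArgumentException("Unknown prefix: " + prefix);
        }
        return namespace + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        PrefixExpander p = new PrefixExpander();
        p.setNamespace("um", "http://maastrichtuniversity.nl/");
        p.setNamespace("schema", "http://schema.org/");
        System.out.println(p.expand("um:kody"));           // http://maastrichtuniversity.nl/kody
        System.out.println(p.expand("schema:instructor")); // http://schema.org/instructor
    }
}
```

In a notebook cell you can define the class and then call `PrefixExpander.main(new String[0]);`. The prefix is only a shorthand: the graph itself always stores the full IRI.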

#### B. Create IRIs
Here we create some IRIs for the resources that we want to describe.

In [6]:
// Create an IRI for an entity with resource name "kody"
IRI kody = vf.createIRI(um, "kody");
// Create an IRI for an entity with resource name "Computer_Scientist"
IRI computerScientist = vf.createIRI(um, "Computer_Scientist");
// Create an IRI for an entity with resource name "ken3140"
IRI ken3140 = vf.createIRI(um, "ken3140");
// Create an IRI for a relation with resource name "instructor"
IRI instructor = vf.createIRI(schemaorg,"instructor");

#### C. Create Triples
Here we create some triples about the entities above and add them to our RDF graph. Notice that we reuse the predefined relations `RDF.TYPE`, `FOAF.FIRST_NAME` and `FOAF.LAST_NAME`. [FOAF](http://xmlns.com/foaf/spec/) is a community-maintained vocabulary for describing people and the common relations and properties that apply to them. Consult the [Javadoc](https://rdf4j.org/javadoc/latest/) to see what other built-in relations exist. For example, the FOAF relations available in RDF4J are listed [here](https://rdf4j.org/javadoc/latest/org/eclipse/rdf4j/model/vocabulary/FOAF.html) and the RDF ones [here](https://rdf4j.org/javadoc/latest/org/eclipse/rdf4j/model/vocabulary/RDF.html). To view another vocabulary predefined in RDF4J, replace the name of the .html file in the URL with the name of that vocabulary. The full list of predefined vocabularies is [here](https://rdf4j.org/javadoc/latest/org/eclipse/rdf4j/model/vocabulary/package-summary.html). **NB: you are not restricted to these vocabularies. They are just the ones that RDF4J predefines for convenience. You can also define your own or reuse terms from external vocabularies such as Schema.org.** In fact, we reuse the predicate [http://schema.org/instructor](http://schema.org/instructor) from Schema.org in this example:

In [7]:
// Add our first triple to the graph: kody is a Computer Scientist
model.add(kody, RDF.TYPE, computerScientist);
// Second triple: the entity http://maastrichtuniversity.nl/kody has first name "Kody".
model.add(kody, FOAF.FIRST_NAME, vf.createLiteral("Kody"));
// Third triple: the entity http://maastrichtuniversity.nl/kody has last name "Moodley".
model.add(kody, FOAF.LAST_NAME, vf.createLiteral("Moodley"));
// Fourth triple: http://maastrichtuniversity.nl/ken3140 has an instructor - Kody
model.add(ken3140, instructor, kody);

true

#### D. Print out the entities and triples in our graph

In [8]:
System.out.println("Entities in this graph:");
System.out.println("-----------------------");

// Print the entities in our graph
System.out.println("Kody entity: " + kody);
System.out.println("Computer Scientist entity: " + computerScientist);

System.out.println();

System.out.println("Triples in this graph:");
System.out.println("----------------------");

int i = 1;
// Print the triples as well
for (Statement statement: model) {
    System.out.println(i + ". " + statement);
    i++;
}

Entities in this graph:
-----------------------
Kody entity: http://maastrichtuniversity.nl/kody
Computer Scientist entity: http://maastrichtuniversity.nl/Computer_Scientist

Triples in this graph:
----------------------
1. (http://maastrichtuniversity.nl/ken3140, http://schema.org/instructor, http://maastrichtuniversity.nl/kody) [null]
2. (http://maastrichtuniversity.nl/kody, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://maastrichtuniversity.nl/Computer_Scientist) [null]
3. (http://maastrichtuniversity.nl/kody, http://xmlns.com/foaf/0.1/firstName, "Kody") [null]
4. (http://maastrichtuniversity.nl/kody, http://xmlns.com/foaf/0.1/lastName, "Moodley") [null]


#### E. Serialisation into different RDF syntaxes
We are going to save our graph about entities related to the KEN3140 course in different RDF serialisation syntaxes. Here we demonstrate saving in the RDF/XML, Turtle and N-Triples formats. There are others as well, such as [TriG](https://www.w3.org/TR/trig/).

In [9]:
// RDF/XML syntax
FileOutputStream out = new FileOutputStream("KEN3140_Lab2_example.rdf");
try {
  Rio.write(model, out, RDFFormat.RDFXML);
}
finally {
  out.close();
}

// Turtle syntax
FileOutputStream out2 = new FileOutputStream("KEN3140_Lab2_example.ttl");
try {
  Rio.write(model, out2, RDFFormat.TURTLE);
}
finally {
  out2.close();
}

// N-Triples syntax
FileOutputStream out3 = new FileOutputStream("KEN3140_Lab2_example.nt");
try {
  Rio.write(model, out3, RDFFormat.NTRIPLES);
}
finally {
  out3.close();
}

Try opening these files here in Jupyter, or in your file explorer with the text editor of your choice, and **spot the differences** between the syntaxes for representing the same information. **Which syntax do you prefer? Which do you think is more human-readable?** If you ever want to convert an RDF document between syntaxes, you can use Rio: load and parse the file into an RDF4J model (there is an example in Task 1 below) and then save it in another syntax using code similar to the preceding cell. For smaller RDF files you can also use online tools such as [EasyRDF](https://www.easyrdf.org/converter).
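As a rough guide to what to look for when comparing the files, the sketch below renders one triple of our graph by hand in two styles: an N-Triples line, where every IRI is written out in full, and a Turtle-style line, where declared prefixes shorten the IRIs. It uses only the Java standard library, handles IRI-only triples, and ignores Turtle's restrictions on local names, so treat it as an illustration rather than a serialiser; real output should come from Rio. The class `SyntaxComparison` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; real serialisation should go through Rio.
class SyntaxComparison {
    // N-Triples: each IRI is written in full inside angle brackets.
    static String toNTriples(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    // Turtle: an IRI may be shortened to prefix:localName when a declared
    // namespace matches; otherwise it stays in angle brackets.
    static String shorten(String iri, Map<String, String> prefixes) {
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            if (iri.startsWith(e.getValue())) {
                return e.getKey() + ":" + iri.substring(e.getValue().length());
            }
        }
        return "<" + iri + ">";
    }

    static String toTurtle(String s, String p, String o, Map<String, String> prefixes) {
        return shorten(s, prefixes) + " " + shorten(p, prefixes) + " " + shorten(o, prefixes) + " .";
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("um", "http://maastrichtuniversity.nl/");
        prefixes.put("schema", "http://schema.org/");

        String s = "http://maastrichtuniversity.nl/ken3140";
        String p = "http://schema.org/instructor";
        String o = "http://maastrichtuniversity.nl/kody";

        // <http://maastrichtuniversity.nl/ken3140> <http://schema.org/instructor> <http://maastrichtuniversity.nl/kody> .
        System.out.println(toNTriples(s, p, o));
        // um:ken3140 schema:instructor um:kody .
        System.out.println(toTurtle(s, p, o, prefixes));
    }
}
```

The Turtle line is only meaningful together with matching `@prefix` declarations at the top of the document, which is one of the differences worth noticing in the generated `.ttl` file.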

### Lab Tasks

#### Task 1: IRI validation

In this task you are going to determine which of the following strings are valid IRIs.
Verify them by copying and pasting them into the provided ``KEN3140_Lab2_task1.ttl`` document.
Specifically, replace the text ``//paste IRI here//`` with each of these IRIs in turn and save the file.
After each replacement, run the cell just below the heading **Task 1: Validation code** and monitor the output to see whether the IRI is valid.
If you find some of these to be invalid IRIs, consult the [RFC 3987](https://tools.ietf.org/html/rfc3987)
IRI specification to put forward reasons why they are invalid. **For each valid IRI in the list, think about
and discuss with your classmates to what extent it complies with the Linked Data principles.**

1. ``myIRI``
2. ``myIRI/``
3. ``myIRI#``
4. ``ftp:/myIRI``
5. ``ftp://myIRI/``
6. ``ftp://myIRI#``
7. ``http://myIRI#``
8. ``http:///myIRI/folder1/folder2/``
9. ``http:///myIRI/folder1/folder2/my name``
10. ``http:///myIRI/folder1/folder2/my_name``
11. ``my_own_protocol:///myIRI/folder1/folder2/my_name``
12. ``:///myIRI/folder1/folder2/my_name``
13. ``https://myIRI/$/my_name``
14. ``https://myIRI/#$#/my_name``
15. ``https://136.292.181.23/#12/my_name``
16. ``https://136.255.181.23/!210382/my_name``
17. ``https://schema.org/parent``
18. ``https://www.wikidata.org/wiki/Q937``
19. ``https://en.wikipedia.org/wiki/Albert_Einstein``
20. ``https://www.w3.org/Consortium/``
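If you want a quick first impression of a string before pasting it into the Turtle file, a rough pre-check can be sketched with the standard library's `java.net.URI`. This is an assumption-laden shortcut: `java.net.URI` follows the URI grammar rather than RFC 3987, so its verdicts can differ from those of the RDF4J parser, which remains the reference for this task. The class `IriPreCheck` and the example strings are hypothetical and deliberately not taken from the list above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: a coarse syntax check, not a substitute for the RDF4J parser.
class IriPreCheck {
    // Classifies a candidate as "absolute", "relative reference" or "syntax error".
    static String check(String candidate) {
        try {
            URI uri = new URI(candidate);
            // An absolute URI has a scheme such as http: or ftp:.
            return uri.isAbsolute() ? "absolute" : "relative reference";
        } catch (URISyntaxException e) {
            // Thrown when the string contains characters the grammar does not allow.
            return "syntax error";
        }
    }

    public static void main(String[] args) {
        System.out.println(check("http://example.org/my_name"));  // absolute
        System.out.println(check("example"));                     // relative reference
        System.out.println(check("http://example.org/<draft>"));  // syntax error
    }
}
```

A result of "relative reference" is worth noting in particular: such a string has no scheme of its own, so whether a parser accepts it depends on the base IRI it is resolved against.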
    

#### Task 1: Validation code

Parses ``KEN3140_Lab2_task1.ttl`` to check whether it complies with Turtle syntax.

In [11]:
RDFParser rdfParser = Rio.createParser(RDFFormat.TURTLE);
File initialFile = new File("KEN3140_Lab2_task1.ttl");
InputStream in = new FileInputStream(initialFile);
Model model = new TreeModel();
rdfParser.setRDFHandler(new StatementCollector(model));

try {
    rdfParser.parse(in, initialFile.getAbsolutePath());
    System.out.println("Valid RDF document!");
}
catch (IOException e) {
    // handle IO problems (e.g. the file could not be read)
    System.out.println("I/O error: " + e);
}
catch (RDFParseException e) {
    // handle unrecoverable parse error
    System.out.println("Parse error: " + e);
}
catch (RDFHandlerException e) {
    // handle a problem encountered by the RDFHandler
    System.out.println("Handler error: " + e);
}
finally {
  in.close();
}

Parse error: org.eclipse.rdf4j.rio.RDFParseException: Relative URI 'myIRI' cannot be resolved using the opaque base URI 'C:%5CUsers%5Ckody.moodley%5CDocuments%5CKEN3140%5CLab2%5CKEN3140_Lab2_task1.ttl' [line 1]
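One plausible reading of the message above: the validation cell passes `initialFile.getAbsolutePath()` as the base IRI, and on Windows that is a path such as `C:\Users\...`. Read as an IRI, it has the scheme `C` and no hierarchical path (an "opaque" IRI), so a relative reference like `myIRI` has nothing to be resolved against. The sketch below illustrates the same distinction with the standard library's `java.net.URI`; it is not RDF4J's resolution code, and the class `BaseResolutionSketch` and the example base IRIs are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration of relative-reference resolution with java.net.URI.
class BaseResolutionSketch {
    // Resolves a reference against a base and returns the result as a string.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // Hierarchical base (hypothetical URL): the relative reference is resolved.
        System.out.println(resolve("http://example.org/lab2/task1.ttl", "myIRI"));
        // prints http://example.org/lab2/myIRI

        // Opaque base, shaped like the one in the error message above.
        String opaqueBase = "C:%5CUsers%5Clab%5Ctask1.ttl";
        System.out.println(URI.create(opaqueBase).isOpaque());  // true
        // java.net.URI returns the reference unchanged, so it is still relative.
        System.out.println(resolve(opaqueBase, "myIRI"));       // myIRI
    }
}
```

For Task 1 this means the error says as much about the base IRI used by the cell as about the string `myIRI` itself, which is worth keeping in mind when you argue why a given candidate is or is not a valid IRI.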


#### Task 2: Formulating RDF triples

Using a text editor of your choice (e.g. Notepad or Sublime Text) **or** RDF4J, create RDF triples capturing as fully as possible the information in the following piece of text:

“Vincent van Gogh was a Dutch artist born in Zundert, a city in the country of the Netherlands, on 30 March 1853. One of the most famous artworks created by him is ‘The Starry Night’ oil on canvas painting.”

**Requirements:**
1. Write down the triples in Turtle syntax and save the document as a .ttl file.
2. Ensure that the triples are generated using valid RDF syntax and valid IRIs. **Hint:** RDF4J can validate your triples for you.
3. Make sure to **reuse** existing vocabulary where possible.

For convenience, a conceptual diagram of the information in the above text is given below.

![image.png](vangogh.png)
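Dates are a typical case where a literal needs a datatype. As a minimal sketch, assuming you want an `xsd:date` literal, the code below converts a date written in English prose into the `yyyy-MM-dd` lexical form that `xsd:date` expects and prints it as a Turtle typed literal. It uses only the Java standard library; the class `DateLiteralSketch` is hypothetical, and the example date is this notebook's own date rather than one from the task text.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: builds the text of a typed date literal for a Turtle file.
class DateLiteralSketch {
    // Converts a date such as "1 September 2021" to the xsd:date lexical form.
    static String toXsdDateLexical(String englishDate) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("d MMMM uuuu", Locale.ENGLISH);
        // LocalDate.toString() produces the ISO form yyyy-MM-dd.
        return LocalDate.parse(englishDate, fmt).toString();
    }

    // Wraps the lexical form in Turtle's typed-literal notation.
    static String toTurtleLiteral(String englishDate) {
        return "\"" + toXsdDateLexical(englishDate) + "\"^^xsd:date";
    }

    public static void main(String[] args) {
        System.out.println(toXsdDateLexical("1 September 2021")); // 2021-09-01
        System.out.println(toTurtleLiteral("1 September 2021"));  // "2021-09-01"^^xsd:date
    }
}
```

In a Turtle document the `xsd:` prefix must be declared as `@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .` for this literal to parse. If you build the graph in RDF4J instead, `ValueFactory` has a `createLiteral` overload that takes a label and a datatype IRI.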

#### Task 3: Identifying components of an RDF graph

Study the following diagram:

![image.png](task3.png)

Now, list all the:

1. object properties in the graph
2. data properties in the graph
3. instances in the graph
4. data types in the graph
5. prefix shorthands in the graph

Discuss your answers with your classmates. You may write the answers down in a new markdown cell below this one if you wish.