### KEN 3140: Lab 2 (RDF basics)
#### Learning objectives:
1. How to verify whether a string represents a valid IRI or not
2. How to construct entities, literals, predicates and triples in RDF
3. How to identify and reuse relations and entities from external vocabularies when creating RDF representations
4. How to construct and save RDF documents in different syntaxes (RDF/XML, Turtle, N-Triples) and to convert between them
5. How to identify and add appropriate XML Schema datatypes for literals in an RDF graph
6. How to assess if a particular RDF representation violates the RDF specification

### 1. Introduction & setup 
This section sets up the environment required for you to complete this lab. This information may be useful for your other assignments or future RDF projects that you would like to do in Java Jupyter notebooks.

#### A. Add RDF4J library to our notebook
[RDF4J](https://rdf4j.org/) is a Java library for creating and manipulating [RDF](https://www.w3.org/TR/rdf11-concepts/) information. There is a documentation page [here](https://rdf4j.org/documentation/) to learn how to get started with RDF4J. To import this:

1. Extract the ``rdf4j-full-3.4.0.zip`` archive to the same directory as the notebook.
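2. Run the command ``%jars rdf4j-full-3.4.0/*.jar`` in the next cell. If the RDF4J jars are installed elsewhere (the cell below assumes ``/opt``), adjust the path accordingly.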

In [None]:
%jars /opt/eclipse-rdf4j-*.jar

#### B. Now import the main classes we will need in this notebook

The [model](https://rdf4j.org/documentation/programming/model/) API is the core of RDF4J and contains the main classes for creating and manipulating RDF. We use three of its packages:
1. org.eclipse.rdf4j.model
2. org.eclipse.rdf4j.model.impl
3. org.eclipse.rdf4j.model.vocabulary

[Rio](https://rdf4j.org/documentation/programming/rio/) provides the parsers and writers for RDF4J. The main packages and classes here are:
1. org.eclipse.rdf4j.rio (the package with the parser and writer interfaces)
2. org.eclipse.rdf4j.rio.Rio (a utility class inside that package)
3. org.eclipse.rdf4j.rio.helpers

In [None]:
// RDF4J
import org.eclipse.rdf4j.model.*;
import org.eclipse.rdf4j.model.impl.*;
import org.eclipse.rdf4j.model.vocabulary.*;
// Rio
import org.eclipse.rdf4j.rio.*;
import org.eclipse.rdf4j.rio.helpers.*;
// Java IO
import java.io.*;
// Import the package which contains the DatatypeFactory and XMLGregorianCalendar classes
import javax.xml.datatype.*;

### 2. Creating a simple RDF graph
For demonstration, we show here how to create a simple RDF graph in RDF4J with two IRI entities and several literals.

#### A. Initialise RDF graph
Here we set up a namespace for the entities in our graph and initialise an empty RDF graph in RDF4J, which we can then start building.

In [None]:
// We need to get a hold of an instance of the ValueFactory class.
// This class allows you to create IRIs, blank nodes, literals and
// triples in RDF4J. RDF4J does not have a Triples class, rather, it 
// has a class called Statement. Why? Because not only can you make
// statements with three components (triples) but also four! (called Quads)
// Therefore, "Statement" is a more general term to capture either triples or quads.
ValueFactory vf = SimpleValueFactory.getInstance();
// Create a namespace for our resources
String um = "http://maastrichtuniversity.nl/";
// Create a new, empty Model object (this instance represents our RDF graph - which will be empty at the moment).
Model model = new TreeModel();
// Defining a namespace in RDF4J (first parameter of SimpleNamespace constructor is the prefix i.e. abbreviation for the namespace. second parameter is the full IRI of the namespace)
model.setNamespace(new SimpleNamespace("um",um));

#### B. Create IRIs & Namespaces
Here we create some IRIs for resources that we want to describe. Let's start with an instance and a type.
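
To make the relation between a prefix, a namespace and a full IRI concrete, here is a minimal sketch in plain Java (no RDF4J involved). The class name `PrefixExpansionSketch` and the method `expand` are hypothetical helpers introduced only for illustration; the sketch assumes that a full IRI is simply the namespace string followed by the local name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixExpansionSketch {

    /** Expands a prefixed name such as "um:kody" using a prefix-to-namespace map. */
    public static String expand(Map<String, String> prefixes, String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String prefix = prefixedName.substring(0, colon);
        String localName = prefixedName.substring(colon + 1);
        String namespace = prefixes.get(prefix);
        if (namespace == null) {
            throw new IllegalArgumentException("Unknown prefix: " + prefix);
        }
        // The full IRI is the namespace followed directly by the local name
        return namespace + localName;
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("um", "http://maastrichtuniversity.nl/");
        prefixes.put("schema", "http://schema.org/");
        System.out.println(expand(prefixes, "um:kody"));          // http://maastrichtuniversity.nl/kody
        System.out.println(expand(prefixes, "schema:birthDate")); // http://schema.org/birthDate
    }
}
```

The prefix is only an abbreviation used when writing the graph down; the graph itself always contains the full IRI.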

In [None]:
// Create an IRI for an entity with resource name "kody"
IRI kody = vf.createIRI(um, "kody");
// Create an IRI for an entity with resource name "Computer_Scientist" (this will be a type for kody)
IRI computerScientist = vf.createIRI(um, "Computer_Scientist");
// Create another namespace
String an = "http://anothernamespace.com/";
// define another namespace prefix (just to demonstrate that we can!)
model.setNamespace(new SimpleNamespace("an",an));

...and now some predicates. In this case we add two data properties:

In [None]:
// Predicates Example 1
// Create an IRI for a predicate with resource name "likesPets"
IRI likesPets = vf.createIRI(an, "likesPets");
// Predicates Example 2
// Create an IRI for a predicate with resource name "birthDate"
IRI birthDate = vf.createIRI("http://schema.org/", "birthDate");
// Define some more namespaces that our entities make use of in the graph
model.setNamespace(new SimpleNamespace("schema","http://schema.org/"));
model.setNamespace(new SimpleNamespace("xsd","http://www.w3.org/2001/XMLSchema#"));
// FOAF.NS is a built-in namespace constant in RDF4J (https://rdf4j.org/javadoc/latest/org/eclipse/rdf4j/model/vocabulary/FOAF.html)
model.setNamespace(FOAF.NS);

#### C. Data types
Create some literal values and attach some data types to them.
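
Date literals are built from an `XMLGregorianCalendar`, so it helps to see the text (lexical form) such a calendar produces. The following is a minimal sketch using only the JDK's `javax.xml.datatype` package; the class name `XsdDateSketch` and the method `xsdDate` are hypothetical names chosen for illustration. Note that the last argument of `newXMLGregorianCalendarDate` is the timezone offset in minutes.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

public class XsdDateSketch {

    /** Returns the xsd:date lexical form; offsetMinutes is the timezone offset in minutes. */
    public static String xsdDate(int year, int month, int day, int offsetMinutes) {
        try {
            XMLGregorianCalendar cal = DatatypeFactory.newInstance()
                    .newXMLGregorianCalendarDate(year, month, day, offsetMinutes);
            return cal.toXMLFormat();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("No DatatypeFactory implementation available", e);
        }
    }

    public static void main(String[] args) {
        // No timezone information at all
        System.out.println(xsdDate(1986, 5, 14, DatatypeConstants.FIELD_UNDEFINED)); // 1986-05-14
        // Two hours ahead of UTC is an offset of 120 minutes
        System.out.println(xsdDate(1986, 5, 14, 120)); // 1986-05-14+02:00
        // An offset of 2 means two minutes, not two hours
        System.out.println(xsdDate(1986, 5, 14, 2)); // 1986-05-14+00:02
    }
}
```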

In [None]:
// Literals with data types Example 1 (boolean values)
// Create a boolean literal with value "true" (datatype xsd:boolean)
boolean likesPetsValue = true;
Literal booleanLiteralWithValueTrue = vf.createLiteral(likesPetsValue);
// Literals with data types Example 2 (dates)
// The last argument is the timezone offset in minutes, so 120 means UTC+02:00
XMLGregorianCalendar kodysBirthDate = DatatypeFactory.newInstance().newXMLGregorianCalendarDate(1986, 5, 14, 120);
Literal dateLiteralWithValueKodysBirthDate = vf.createLiteral(kodysBirthDate);
// More examples for other data types are available in the Javadoc for the "ValueFactory" interface in RDF4J.
// NB: look at the "createLiteral" methods in that Javadoc for more information!

#### D. Create Triples
Here we create some triples about the entities above and add them to our RDF graph. Notice that we are reusing the existing relations RDF.TYPE, FOAF.FIRST_NAME and FOAF.LAST_NAME here. [FOAF](http://xmlns.com/foaf/spec/) is a community-maintained vocabulary for describing people, their properties and the relations between them.

Consult the [Javadoc](https://rdf4j.org/javadoc/latest/) to see what other built-in relations there are. E.g. the FOAF relations available in RDF4J are listed [here](https://rdf4j.org/javadoc/latest/org/eclipse/rdf4j/model/vocabulary/FOAF.html) and the RDF ones [here](https://rdf4j.org/javadoc/latest/org/eclipse/rdf4j/model/vocabulary/RDF.html). To view another vocabulary predefined in RDF4J, replace the name of the .html file in the URL with the name of that vocabulary. The full list of predefined vocabularies is [here](https://rdf4j.org/javadoc/latest/org/eclipse/rdf4j/model/vocabulary/package-summary.html).

**NB: you are not restricted to these vocabularies. These are just the ones predefined in RDF4J for convenience, so that you don't have to define them yourself. You can also define your own!**

In [None]:
// Add triples to the graph: kody is a Computer Scientist
model.add(kody, RDF.TYPE, computerScientist);
// the entity http://maastrichtuniversity.nl/kody has first name "Kody".
model.add(kody, FOAF.FIRST_NAME, vf.createLiteral("Kody"));
// the entity http://maastrichtuniversity.nl/kody has last name "Moodley".
model.add(kody, FOAF.LAST_NAME, vf.createLiteral("Moodley"));
// kody likes pets
model.add(kody, likesPets, booleanLiteralWithValueTrue);
// kody was born on 14-05-1986
model.add(kody, birthDate, dateLiteralWithValueKodysBirthDate);

#### E. Print out the entities and triples in our graph

In [None]:
System.out.println("Namespaces / prefixes used in this graph:");
System.out.println("-----------------------------------------");
for (Namespace n: model.getNamespaces()){
        System.out.println(n);
}

System.out.println();

System.out.println("Entities in this graph:");
System.out.println("-----------------------");

// Print the entities in our graph
System.out.println("Kody entity: " + kody);
System.out.println("Computer Scientist entity: " + computerScientist);

System.out.println();

System.out.println("Triples in this graph:");
System.out.println("----------------------");

int i = 1;
// Print the triples as well
for (Statement statement: model) {
    System.out.println(i + ". \n" + statement);
    i++;
}

#### F. Serialisation into different RDF syntaxes
Here we save the graph to files in three different RDF syntaxes.

In [None]:
// RDF/XML syntax
FileOutputStream out = new FileOutputStream("KEN3140_Lab2_example.rdf");
try {
  Rio.write(model, out, RDFFormat.RDFXML);
}
finally {
  out.close();
}

// Turtle syntax
FileOutputStream out2 = new FileOutputStream("KEN3140_Lab2_example.ttl");
try {
  Rio.write(model, out2, RDFFormat.TURTLE);
}
finally {
  out2.close();
}

// N-Triples syntax
FileOutputStream out3 = new FileOutputStream("KEN3140_Lab2_example.nt");
try {
  Rio.write(model, out3, RDFFormat.NTRIPLES);
}
finally {
  out3.close();
}

Open these files here in Jupyter, or in your file explorer with a text editor of your choice, and **spot the differences** between the syntaxes, which all represent the same information. **Which syntax do you prefer? Which do you think is more human-readable?** To convert an RDF document from one syntax to another, you can use Rio: load and parse the file into an RDF4J model (Task 1 below shows an example of parsing), then save the model in the target syntax using code similar to the preceding cell. For smaller RDF files you can also use online tools, e.g. [EasyRDF](https://www.easyrdf.org/converter).

### 3. Lab Tasks

IRI validation

#### Task 1: Instructions

In this task you will determine which of the following strings are valid IRIs.
Test each one by pasting it into the provided ``KEN3140_Lab2_task1.ttl`` document:
replace the text **//paste IRI here//** with the string and save the file.
After each replacement, run the cell below the heading **Task 1: IRI Validation Code** and check the output to see whether the string was accepted.
For each string that is rejected, consult the [RFC 3987](https://tools.ietf.org/html/rfc3987) IRI specification and work out why it is invalid. **For each valid IRI in the list, think about, and discuss with your classmates in the BlackBoard Collaborate chat, whether it complies with the Linked Data Principles.**

1. ``myIRI``
2. ``myIRI/``
3. ``myIRI#``
4. ``ftp:/myIRI``
5. ``ftp://myIRI/``
6. ``ftp://myIRI#``
7. ``http://myIRI#``
8. ``http:///myIRI/folder1/folder2/``
9. ``http:///myIRI/folder1/folder2/my name``
10. ``http:///myIRI/folder1/folder2/my_name``
11. ``my_own_protocol:///myIRI/folder1/folder2/my_name``
12. ``:///myIRI/folder1/folder2/my_name``
13. ``https://myIRI/$/my_name``
14. ``https://myIRI/#$#/my_name``
15. ``https://136.292.181.23/#12/my_name``
16. ``https://136.255.181.23/!210382/my_name``
17. ``https://schema.org/parent``
18. ``https://www.wikidata.org/wiki/Q937``
19. ``https://en.wikipedia.org/wiki/Albert_Einstein``
20. ``https://www.w3.org/Consortium/``
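
As a quick cross-check outside RDF4J, the JDK's `java.net.URI` class can flag many malformed strings. This is only a rough sketch: `java.net.URI` follows the older RFC 2396 rules with some deviations, so it approximates but does not replace the RFC 3987 check performed by the Turtle parser in this task. The class name `IriPreCheck`, the method `looksLikeAbsoluteIri` and the `example.org` strings are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class IriPreCheck {

    /**
     * Returns true if the string parses as a URI and has a scheme (is absolute).
     * This is an approximation based on java.net.URI, not a full RFC 3987 check.
     */
    public static boolean looksLikeAbsoluteIri(String candidate) {
        try {
            return new URI(candidate).isAbsolute();
        } catch (URISyntaxException e) {
            // Characters such as spaces are rejected by the URI parser
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "example",                      // no scheme, so only a relative reference
            "http://example.org/my name",   // contains a space
            "http://example.org/resource"   // scheme, host and path
        };
        for (String c : candidates) {
            System.out.println(c + " -> " + looksLikeAbsoluteIri(c));
        }
    }
}
```

Where this pre-check and the RDF4J parser disagree, the parser is the one that counts for this lab.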
    

#### Task 1: IRI Validation Code

Parses ``KEN3140_Lab2_task1.ttl`` to check whether it is a syntactically valid Turtle document.

In [None]:
RDFParser rdfParser = Rio.createParser(RDFFormat.TURTLE);
InputStream in = new FileInputStream("KEN3140_Lab2_task1.ttl");
Model model = new TreeModel();
rdfParser.setRDFHandler(new StatementCollector(model));

try {
    rdfParser.parse(in, "");
    System.out.println("Valid RDF document!");
}
catch (IOException e) {
    // handle IO problems (e.g. the file could not be read)
    System.out.println("I/O error: " + e);
}
catch (RDFParseException e) {
    // handle unrecoverable parse error
    System.out.println("Parse error: " + e);
}
catch (RDFHandlerException e) {
    // handle a problem encountered by the RDFHandler
    System.out.println("Handler error: " + e);
}
finally {
  in.close();
}

#### Task 2: Instructions

Congratulations on completing Task 1! Now, your task is to create an RDF graph in Turtle syntax to describe yourself and members of your **immediate** (not extended) family. You may create this graph manually by typing the triples out in a text editor and saving the file with the extension ``.ttl`` **or** create and save it using the RDF4J API in this notebook.

Content requirements of the graph:

1. Only describe properties of members of your immediate family (if you do not have at least three members in your immediate family, you may add two or three relatives from your extended family).
2. You are only allowed to capture the following object properties in your graph: parent (e.g. x is a parent of y), child (e.g. x is a child of y), gender (e.g. x has gender female), sibling (e.g. x is a sibling of y) and marriedTo / partnerOf (e.g. x is married to y / x is partnerOf y). In all triples which involve these predicates, the object must be **an IRI** and not a Literal value.
3. You are welcome to add as many data properties as you like to describe the entities in your graph, e.g. age, height, date of birth etc. You must have at least one for each entity in the graph.

RDF practice requirements of the graph:

1. The graph should be a valid RDF graph with valid IRIs in Turtle syntax
2. Use prefixes (abbreviations for namespaces you use)
3. Use Turtle abbreviation to add multiple properties to the same subject without repeating the subject in the graph
4. For objects and subjects in your triples you are welcome to create your own namespaces. **Exception:** for relations in your graph, **reuse** terms defined in **external** vocabularies. For this task we encourage you to go to [Schema.org](http://schema.org/) and type the relation you are looking for in the search box. From the list of results, choose the relation whose description on [Schema.org](http://schema.org/) best captures the meaning that you intend for the relation in your own graph, and reuse the IRI of that relation in your graph. If you cannot find an appropriate relation on Schema.org, search in the same way in other vocabularies, e.g. [Linked Open Vocabularies](https://lov.linkeddata.es/dataset/lov/) (a searchable repository of multiple vocabularies) or [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) (a very large RDF graph for which the Wikidata community has also defined its own vocabulary / ontology [here](https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology)).
5. Requirement 4 also applies to **types** in your graph: reuse types defined in external vocabularies.
6. Provide appropriate datatypes for Literal values in your graph from the [XML datatypes list](https://www.w3.org/TR/xmlschema11-2/) or [here](http://www.datypic.com/sc/xsd/s-datatypes.xsd.html)
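
To illustrate requirements 2, 3 and 6 together, here is a minimal sketch in plain Java that assembles and prints a tiny Turtle document. Everything in the `ex:` namespace (`ex:alice`, `ex:bob`, `ex:carol`), the class name `TurtleSketch` and the date value are made-up placeholders; the `schema:` terms are looked up on Schema.org as described above. It is not a template for your own graph.

```java
public class TurtleSketch {

    /** Assembles a small Turtle document; every ex: name is a made-up placeholder. */
    public static String familyTurtle() {
        return String.join("\n",
            "@prefix ex: <http://example.org/family/> .",
            "@prefix schema: <http://schema.org/> .",
            "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
            "",
            "# ';' lets several predicates share one subject",
            "ex:alice a schema:Person ;",
            "    schema:parent ex:bob ;",
            "    schema:sibling ex:carol ;",
            "    schema:birthDate \"1990-01-31\"^^xsd:date .",
            "");
    }

    public static void main(String[] args) {
        // Print the Turtle text; it could equally be saved to a .ttl file
        System.out.println(familyTurtle());
    }
}
```

The subject is written once and each `;` starts a new predicate-object pair for it, while `^^xsd:date` attaches the datatype to the literal.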

**Side note:** some of you may ask: are large RDF graphs with millions of triples usually created by hand using code or text editors? The answer is **no**. Generally, larger RDF graphs are generated from unstructured sources (e.g. text) - see [here](http://repositorio.uchile.cl/bitstream/handle/2250/174484/Information-extraction-meets-the-Semantic-Web.pdf?sequence=1) and [here](https://portal.research.lu.se/ws/files/3053000/3191702.pdf), or from structured data such as relational databases, CSV, JSON and XML files - see [here](http://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf). Depending on the data sources available in a given situation, a combination of these approaches is often used to convert the required information to RDF.