# KEN 3140 Semantic Web: Assignment 2
### Writing and executing SPARQL queries on RDF graphs

**Authors:** Kody Moodley, Vincent Emonet, Ozge Erten, and Maryam Mohammadi \
**Date:** 2021-09-10  
**License:** [https://creativecommons.org/licenses/by/4.0](https://creativecommons.org/licenses/by/4.0)

This assignment will assess whether you have met the following learning objectives after studying the relevant materials and doing the relevant lab exercises in this course:

#### Learning objectives:
1. How to formulate basic and complex SPARQL queries with valid structure and syntax
2. How to identify and select the appropriate SPARQL features for including in a query, in order to answer a specific question
3. How to design triple and graph patterns to match criteria that a question or task requires
4. How to include new information in an RDF graph using SPARQL queries
5. How to identify, select and include appropriate SPARQL functions in SPARQL queries to filter entities according to their literal values
6. How to distinguish between asserted and inferred statements in RDF graphs using RDFS inference in conjunction with SPARQL queries

#### Assignment task description
Please read all sections of this document very carefully before attempting the assignment, asking questions, or submitting.

* This assignment assesses your ability to formulate SPARQL queries that answer a series of questions about the content of a pre-prepared RDF graph of family relations. The graph is provided in Turtle syntax in the file ``KEN3140_assignment2_familyrelations.ttl``, which is included with your assignment materials. You will also observe the effect of RDFS inference when it is used in conjunction with SPARQL queries.

* There are two parts to this assignment: a **Part A** and a **Part B**. Both parts require you to formulate SPARQL queries to answer the questions asked in that part. Part A questions require less complex SPARQL queries and Part B requires more complex queries and potentially the use of advanced features of the SPARQL language.

* Before you begin formulating your queries, it might be helpful to explore the graph in some way. You are free to do this in whichever way you prefer. At the very least, you can open ``KEN3140_assignment2_familyrelations.ttl`` in the text editor of your choice and examine the triples. A (not very pretty) picture of the graph is also available in the file ``KEN3140_assignment2_familyrelations.png`` which may help you to understand the structure and content of the graph. 

* Make note of the information in the provided graph as well as the vocabularies it uses, i.e., which external vocabularies are used to specify the types, object properties and data properties in the graph.

* You should use either the SPARQL kernel for Jupyter notebooks to run your SPARQL queries in this notebook, or use [YASGUI](http://yasgui.triply.cc) to run them. However, in either case, **please paste your solutions for each query back into the relevant cell of this notebook before submitting!**

#### Deadline & submission instructions
The deadline for your assignment is **Sunday, 27 September 2021 at 23:59 [note the extended deadline]**. Upload a copy of this notebook, in which you record your solutions. Please rename the notebook to include your name and student ID, i.e., name the notebook ``KEN3140_assignment2_(your name)_(your studentID).ipynb``.

#### Grading criteria
We will assess the design of your SPARQL queries on a number of criteria directly related to the learning objectives of the assignment. That is, we will assess the extent to which your queries demonstrate that you have achieved the learning objectives. Please make sure to follow the assignment instructions carefully and meet all the requirements! You will receive a grade out of 10 points for this assignment.

#### Helpful resources
1. KEN3140 Lecture 4 & 5 slides (Canvas)
2. KEN3140 Lab 4 & 5 materials (Canvas)
3. [SPARQL W3C specs](https://www.w3.org/TR/sparql11-overview/)
4. [Learning SPARQL ebook on UM digital library](https://maastrichtuniversity.on.worldcat.org/v2/oclc/853679890)

#### Contact
1. Kody Moodley (kody.moodley@maastrichtuniversity.nl)
2. Vincent Emonet (vincent.emonet@maastrichtuniversity.nl)
3. Michel Dumontier (michel.dumontier@maastrichtuniversity.nl)

#### 1. Install the SPARQL kernel 

Only do these steps if you are going to use the kernel as opposed to using YASGUI for your assignment:

#### Locally

Run the following two commands in your terminal, in the order shown, before starting Jupyter:

```bash
pip install sparqlkernel --user
jupyter sparqlkernel install --user
```

#### With Docker

The SPARQL kernel should already be installed in your Docker image. If it is not, please run the following command:

**Windows**

```bash
docker run -it --rm --name java-notebook -p 8888:8888 -v C:\path\to\current\directory:/home/jovyan/work -e JUPYTER_TOKEN=SET_A_PASSWORD ghcr.io/maastrichtu-ids/jupyterlab:latest
```
**Linux/Mac**

```bash
docker run -it --rm --name java-notebook -p 8888:8888 -v $(pwd):/home/jovyan/work -e JUPYTER_TOKEN=SET_A_PASSWORD ghcr.io/maastrichtu-ids/jupyterlab:latest
```

#### 2. Define the SPARQL endpoint URL

In [1]:
# Set the SPARQL Kernel parameters
%endpoint https://graphdb.dumontierlab.com/repositories/KEN3140_SemanticWeb

# Ignore: this is optional, it would increase the level of detail in the logs
%log debug

In [2]:
# Use one of these commands before the query you want to run

# Use this command to disable inference in the endpoint
%qparam infer false
# Use this command to enable inference in the endpoint
%qparam infer true

## Part A

### Q1: List the top five tallest people in the graph in order from tallest to shortest

**Important notes:** Display both the people and their heights in the query results.

In [3]:
# Insert query here

### Q2: List all family members, order them from shortest to tallest, and count the number of uncles each of them has

**Important notes:** Display the family members, their heights and the number of uncles in the results of your query. Also include family members who have no uncles. Using comments with the "#" symbol before your solution query in the cell below, state in plain English how you define an "uncle".

In [4]:
# Insert query here

### Q3: Identify the shortest person who has at least two uncles 

**Important notes:** Display the person and their height in the results.

In [5]:
# Insert query here

### Q4: Count the number of males and females per family

**Important notes:** The assumption in this question is that people with the same family name are part of the same family. Include the family name, the number of males in that family, and the number of females in that family in your query results.

In [6]:
# Insert query here

### Q5: List all females in the graph born after 1965 from the oldest at the top of the list, to the youngest at the bottom

**Important notes:** Include the person and birth date of that person in the query results.

In [7]:
# Insert query here

## Part B

### Q6: Return the mean (average) height of men, and the mean height of women in the full graph

**Important notes:** You must use a single query for this task.

In [8]:
# Insert query here

### Q7: For each person with a child, calculate their "couple salary per child"

**Important notes:** The "couple salary per child" is the total combined salary of the two parents divided by the number of children they have. Include the person, spouse, couple salary (total combined salary of the parents), number of children, and the couple salary per child in the query results.
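
As a quick sanity check on the arithmetic, the definition above can be worked through in a few lines of plain Python (run outside this SPARQL-kernel notebook). This is only a sketch: the salaries and the number of children are made-up values that do not come from the graph, and the function name `couple_salary_per_child` is hypothetical.

```python
def couple_salary_per_child(salary_a: float, salary_b: float, num_children: int) -> float:
    """Combined salary of both parents divided by their number of children."""
    if num_children <= 0:
        raise ValueError("only defined for couples with at least one child")
    return (salary_a + salary_b) / num_children


# Hypothetical values, not taken from the assignment graph.
couple_salary = 40000 + 50000
per_child = couple_salary_per_child(40000, 50000, 3)
print(couple_salary, per_child)
```

Your SPARQL query needs to produce the same two quantities per couple, computed from the values stored in the graph.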

In [9]:
# Insert query here

### Q8: List persons whose given name ends with the letter "a"

**Important notes:** Include the person and their human-readable given name in the query results.

In [10]:
# Insert query here

### Q9: Identify and list all sibling relationships in the graph

**Important notes:** Include each pair of persons in the graph that are siblings. You will supply two queries for this task. One **with** inference and one **without**. Paste both queries below, one with inference toggled off and one with inference toggled on (the queries should be in separate cells).

In [11]:
# Insert query here (without inference)

In [12]:
# Insert query here (with inference)

### Q10: Create triples

**Perform the following task with a single SPARQL query**

For each person, create 2 new triples:
* a triple capturing the full name of a person in a human-readable string, i.e., the concatenation of the first and last name of the person. Use the `rdfs:label` relation to capture this string.
* a triple capturing the height of each person using `schema:height` again but with the value in metres rather than centimetres (which is the current unit used to represent the height values of the family members in the graph).

**Your query should create and display these triples but it must not attempt to add them to the graph**
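
To make the two expected values concrete, here is a minimal Python sketch of the transformations (run outside this SPARQL-kernel notebook). The helper names `full_name` and `cm_to_m` and the sample person are hypothetical, and separating the two names with a single space is an assumption about what "human-readable" means here.

```python
def full_name(given_name: str, family_name: str) -> str:
    """Concatenate first and last name, separated by a single space (assumed format)."""
    return f"{given_name} {family_name}"


def cm_to_m(height_cm: float) -> float:
    """Convert a height in centimetres to metres."""
    return height_cm / 100


# Hypothetical person, not taken from the assignment graph.
label = full_name("Jane", "Doe")
height_m = cm_to_m(180)
print(label, height_m)
```

In your solution, the equivalent string and numeric operations must be expressed with SPARQL functions and operators inside the single query.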

In [13]:
# Insert query here (with inference)

### Q11: Complex query

Write one SPARQL query that finds the family member with the highest [out-degree](https://xlinux.nist.gov/dads/HTML/outdegree.html) in the graph and then lists the names of all book authors on DBpedia whose name lexically includes this family member's **first** name. Also list the titles of these authors' books. For example, if the family member's name is "Nicole", then book authors called "Nicole Smith" or "Nicolene Smith" should be included in the results. "Nicola Smith" would not be a valid result, since "Nicole" does not appear in that name.

**Important notes:** Include the IRI of the family member, the IRIs of the book authors, and the human-readable labels of their book titles in your query results. **Make sure to use HTTPS in the SPARQL endpoint URLs that you use in your query, and not HTTP.**
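
The two concepts used in this question, out-degree and lexical containment, can be illustrated on a toy example in plain Python (run outside this SPARQL-kernel notebook). This is a sketch under stated assumptions: the triples and the `ex:` names are invented for illustration and are not the assignment graph, and the substring check shown here is case-sensitive.

```python
from collections import Counter

# Toy triples (subject, predicate, object); all names are hypothetical.
triples = [
    ("ex:nicole", "ex:hasSibling", "ex:tom"),
    ("ex:nicole", "ex:givenName", "Nicole"),
    ("ex:nicole", "ex:height", 170),
    ("ex:tom", "ex:givenName", "Tom"),
]

# Out-degree of a node = number of triples in which it appears as the subject.
out_degree = Counter(subject for subject, _, _ in triples)
top_node, top_degree = out_degree.most_common(1)[0]

# Lexical containment: the first name must appear as a substring of the author name.
first_name = "Nicole"
candidates = ["Nicole Smith", "Nicolene Smith", "Nicola Smith"]
matches = [name for name in candidates if first_name in name]

print(top_node, top_degree, matches)
```

Here `ex:nicole` has three outgoing edges, and only the first two candidate names contain "Nicole".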

In [14]:
# Insert query here (with inference)

### Bonus: Create missing relations

Create missing sibling relations when a sibling relation is defined in "one direction" between two family members but not explicitly in the other direction. For example, if John is a sibling of Mary, then we know that Mary is also a sibling of John. But since RDF predicates are directional, you need to state explicitly that a particular relation holds in the other direction as well. **Your query should create and display these missing relations but it must not attempt to add them to the graph.**
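
The idea of a "missing" reverse relation can be sketched on a toy set of pairs in plain Python (run outside this SPARQL-kernel notebook). The names below are invented for illustration, and the set-based representation is an assumption made for the sketch, not how the graph stores its triples.

```python
# Toy sibling statements as (subject, object) pairs; names are hypothetical.
asserted = {("john", "mary"), ("mary", "john"), ("anna", "peter")}

# A reverse statement is missing when (b, a) is absent although (a, b) is asserted.
missing = sorted((b, a) for (a, b) in asserted if (b, a) not in asserted)

print(missing)
```

Only the pair for `anna` and `peter` lacks its reverse statement, so that is the single relation the sketch reports; the pairs are computed and displayed without modifying `asserted`.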

In [15]:
# Insert query here (with inference)