# KEN 3140 Semantic Web: Assignment 2
### Writing and executing SPARQL queries on RDF graphs
#### Assignment task description
Please read all sections of this document carefully before attempting the assignment, asking questions, or submitting.

This assignment assesses your competence in formulating SPARQL queries to answer a series of questions about the content of a pre-prepared RDF graph of family relations. The graph is provided in Turtle syntax in the file ``KEN3140_assignment2_familyrelations.ttl``, included with your assignment materials. You will also observe the effect of RDFS inference when used in conjunction with SPARQL queries.

There are two parts to this assignment: a **Part A** and a **Part B**. Both parts require you to formulate SPARQL queries to answer the questions asked in that part. Part A questions require less complex SPARQL queries and Part B requires more complex queries and potentially the use of advanced features of the SPARQL language.

Before you begin formulating your queries, it might be helpful to explore the graph in some way. You are free to do this in whichever way you prefer. At the very least, you can open ``KEN3140_assignment2_familyrelations.ttl`` in the text editor of your choice and examine the triples. A picture of the graph is also available in the file ``KEN3140_assignment2_familyrelations.png`` which may help you to understand the structure and content of the graph. 

Take note of the information in the provided graph and of the vocabularies it uses, i.e., which external vocabularies specify the types, object properties and data properties in the graph.
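
One way to get such an overview is to ask the graph itself. The query below is a minimal sketch, not part of the graded questions. It assumes the endpoint has already been set with `%endpoint` (see Section 2) and that the repository holds the family relations graph; if the repository contains other data too, that will also show up in the results. The variable names `?predicate` and `?uses` are arbitrary choices.

```sparql
# List every predicate used in the graph, with the number of triples using it.
# The namespace of each predicate IRI shows which vocabulary it comes from.
SELECT ?predicate (COUNT(*) AS ?uses)
WHERE {
    ?s ?predicate ?o .
}
GROUP BY ?predicate
ORDER BY DESC(?uses)
```

Replacing the triple pattern with `?s a ?type` and grouping by `?type` gives the same kind of overview for the types used in the graph.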

##### Learning objectives:
1. How to formulate basic and complex SPARQL queries with valid structure and syntax
2. How to identify and select the appropriate SPARQL features for including in a query, in order to answer a specific question
3. How to design triple and graph patterns to match criteria that a question or task requires
4. How to include new information in an RDF graph using SPARQL queries
5. How to identify, select and include appropriate SPARQL functions in SPARQL queries to filter entities according to their literal values
6. How to distinguish between asserted and inferred statements in RDF graphs using RDFS inference in conjunction with SPARQL queries

##### Deadline & submission instructions
The deadline for your assignment is **Sunday, 27 September 2020 at 23:59 [note the extended deadline]**. Upload a copy of this notebook, in which you will record your solutions. Please rename the notebook to include your name and student ID, i.e., ``KEN3140_assignment2_(your name)_(your studentID).ipynb``.

##### Grading criteria
We will assess the design of your SPARQL queries on a number of criteria directly related to the learning objectives of the assignment, i.e., the extent to which your queries demonstrate that you have achieved or mastered those objectives. Please follow the assignment instructions carefully and meet all the requirements! We will provide a more detailed scheme of our grading process when we release the solution and grades for this assignment later in the course. You will receive a grade out of 10 points for this assignment.

##### Helpful resources
1. KEN3140 Lecture 4 & 5 slides (Canvas)
2. KEN3140 Lab 4 & 5 slides and materials (Canvas & [GitHub](https://github.com/MaastrichtU-IDS/UM_KEN3140_SemanticWeb))
3. [SPARQL W3C specs](https://www.w3.org/TR/sparql11-overview/)

##### Contact
1. Kody Moodley (kody.moodley@maastrichtuniversity.nl)
2. Vincent Emonet (vincent.emonet@maastrichtuniversity.nl)
3. Michel Dumontier (michel.dumontier@maastrichtuniversity.nl)

#### 1. Install the SPARQL kernel

#### Locally, in 2 commands

```bash
pip install sparqlkernel --user
jupyter sparqlkernel install --user
```

#### With docker

As with the previous notebook, run the following command in a terminal from the folder that contains this notebook:

```bash
docker run -it --rm -p 8888:8888 -v $(pwd):/home/jovyan -e JUPYTER_ENABLE_LAB=yes -e JUPYTER_TOKEN=password umids/jupyterlab:sparql
```


#### 2. Define the SPARQL endpoint URL

In [1]:
# Set the SPARQL Kernel parameters
%endpoint https://graphdb.dumontierlab.com/repositories/KEN3140_SemanticWeb

# Optional: increase the log level to debug
%log debug

#### 3. Example query

##### With inference

In [2]:
%qparam infer true
PREFIX schema: <https://schema.org/>
SELECT * 
WHERE {
    ?entity a schema:Person ;
              schema:parent ?parent .
}

entity,parent
https://my-family.org/Miranda,https://my-family.org/Pierre
https://my-family.org/Miranda,https://my-family.org/Mathilde


##### Without inference

In [3]:
%qparam infer false
PREFIX schema: <https://schema.org/>
SELECT * 
WHERE {
    ?entity a schema:Person ;
              schema:parent ?parent .
}

entity,parent
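
To see how much the inference setting changes what a query can match, one option is to count the triples visible under each setting. This is a minimal sketch that reuses the `%qparam infer` magic from the cells above; the variable name `?triples` is an arbitrary choice.

```sparql
# Run once as shown, then change the first line to "%qparam infer true" and run again.
%qparam infer false
SELECT (COUNT(*) AS ?triples)
WHERE {
    ?s ?p ?o .
}
```

If inference adds statements to the graph, the count with inference switched on should be higher than the count with it switched off.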


## Part A

### Q1: List the top five tallest people in the graph in order from tallest to shortest

**Important notes:** Display both the people and their heights in the query results.

In [4]:
# Insert query here

### Q2: List the family members, order them from shortest to tallest, and count the number of uncles they have.

**Important notes:** Display the family members, their heights and the number of uncles in the results of your query. Include family members who have no uncles as well.

In [5]:
# Insert query here

### Q3: Identify the shortest person who has at least two uncles 

**Important notes:** Display the person and their height in the results

In [6]:
# Insert query here

### Q4: Count the number of males and females per family

**Important notes:** The assumption in this question is that people with the same family name are part of the same family. Include the family name, the number of males in that family, and the number of females in that family in your query results.

In [7]:
# Insert query here

### Q5: List all females in the graph born after 1965 from the oldest to the youngest

**Important notes:** Include the person and birth date of that person in the query results.

In [8]:
# Insert query here

## Part B

### Q6: Return the mean height of men and the mean height of women in the full family

In [9]:
# Insert query here

### Q7: For each person with a child, calculate their "couple salary per child"

**Important notes:** The "couple salary per child" is the combined salary of the two parents divided by the number of children they have. Include the person, spouse, couple salary (combined salary of parents), number of children, and the couple salary per child in the query results.

In [10]:
# Insert query here

### Q8: List all persons whose given name ends with the letter "a"

**Important notes:** Include the person and their human-readable given name in the query results.

In [11]:
# Insert query here

### Q9: Identify and list all sibling relationships in the graph

**Important notes:** Include each pair of persons in the graph that are siblings. Answer this question both **with AND without** inference: paste both queries below, one with inference toggled off and one with inference toggled on, in separate cells.

In [12]:
# Insert query here (without inference)

In [13]:
# Insert query here (with inference)

### Q10: Construct triples

**Perform the following task with a single SPARQL query**

For each person, construct 2 new triples:
* a triple capturing the full name of a person as a human-readable string, i.e., the concatenation of the first and last name of the person. Use the `rdfs:label` relation to capture this string.
* a triple capturing the height of each person, again using `schema:height`, but with the value in metres rather than centimetres (the existing height values carry no explicit unit and are implicitly in centimetres).

In [14]:
# Insert query here (with inference)

### Q11: Federated query to DBpedia

Retrieve all the books with the string "family" in their label, and the URI of the graph where this information can be found in the DBpedia triplestore.

**Important notes:** Include the graph IRI, book IRI, and book name (human-readable) in your query results.

In [15]:
# Insert query here (with inference)

### Bonus: Construct missing relations

Construct missing sibling relations when a sibling relation is defined in "one direction" between two entities but not explicitly in the other direction.

In [16]:
# Insert query here (with inference)