# KEN3140: Lab 4 (Part 1)

### Writing and executing basic SPARQL queries on remote SPARQL endpoints (RDF graphs on the Web)

##### Authors:
+ [Vincent Emonet](https://www.maastrichtuniversity.nl/vincent.emonet): [vincent.emonet@maastrichtuniversity.nl](mailto:vincent.emonet@maastrichtuniversity.nl)
+ [Kody Moodley](https://www.maastrichtuniversity.nl/kody.moodley): [kody.moodley@maastrichtuniversity.nl](mailto:kody.moodley@maastrichtuniversity.nl)

##### Affiliation: 
[Institute of Data Science](https://www.maastrichtuniversity.nl/research/institute-data-science)

##### License:
[CC-BY 4.0](https://creativecommons.org/licenses/by/4.0)

##### Date:
2021-09-06

#### In this lab you will learn:

How to compose basic [SPARQL](https://www.w3.org/TR/2013/REC-sparql11-query-20130321/) [SELECT](https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#select) queries to retrieve specific information from an [RDF](https://www.w3.org/TR/rdf11-concepts/) graph, and to answer questions about its content

#### Specific learning goals:

+ How to select the appropriate SPARQL feature(s) or function(s) required to answer the given question or retrieve the result asked for
+ How to represent the retrieval of information from a triplestore using triple patterns and basic graph patterns in SELECT queries
+ How to query existing public SPARQL endpoints using tools such as [YASGUI](https://yasgui.triply.cc)

#### Prerequisite knowledge: 
+ [Lecture 4: Introduction to SPARQL](https://canvas.maastrichtuniversity.nl/courses/4700/files/559320?module_item_id=115828)
+ [SPARQL 1.1 language specification](https://www.w3.org/TR/sparql11-query/)
+ Chapters 1 - 3 of [Learning SPARQL](https://maastrichtuniversity.on.worldcat.org/external-search?queryString=SPARQL#/oclc/853679890)

#### Task information:

+ In this lab, we will ask you to query the [DBpedia](https://dbpedia.org/) knowledge graph!
+ [DBpedia](https://dbpedia.org/) is a crowd-sourced community effort to extract structured content in RDF from the information created in various [Wikimedia](https://www.wikimedia.org/) projects (e.g. [Wikipedia](https://www.wikipedia.org/)). DBpedia is similar in information content to [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page). 
+ **A word on data quality:** remember that DBpedia is crowd-sourced. This means that volunteers and members of the general public are permitted to add to and maintain its content. As a result, you may encounter inaccuracies and omissions in the content, as well as inconsistencies in how the information is represented. Don't be alarmed by this: the learning objectives of this lab do not depend on the content being accurate.
+ **Your task** is to formulate and execute SPARQL queries for Tasks 1 - 3, either in this Jupyter notebook (if you have the SPARQL kernel installed in Jupyter) or on [YASGUI](https://yasgui.triply.cc/).


#### Task information (contd):

+ The DBpedia SPARQL endpoint URL is: [https://dbpedia.org/sparql](https://dbpedia.org/sparql)
+ DBpedia has its own SPARQL query interface at [https://dbpedia.org/sparql](https://dbpedia.org/sparql), which is built on OpenLink's [Virtuoso](https://virtuoso.openlinksw.com/) [RDF](https://www.w3.org/TR/rdf11-concepts/) triplestore management system.
+ In this lab, we will use an alternative SPARQL query interface to query DBpedia, called **[YASGUI](https://yasgui.triply.cc)**. YASGUI has additional user-friendly features, e.g. management of multiple SPARQL queries in separate tabs. It also lets you query any publicly available SPARQL endpoint from the same interface.
+ To install the SPARQL kernel for Jupyter, close Jupyter and run the following commands in order in your terminal before restarting Jupyter:
    + ``pip install sparqlkernel``
    + ``jupyter sparqlkernel install`` **OR** ``jupyter sparqlkernel install --user`` (if the first command gives an error)
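
Besides YASGUI and the SPARQL kernel, the endpoint URL above can also be queried from a plain Python script over HTTP. The following is a minimal sketch using only the standard library. The helper names `build_request` and `run_select` and the sample query are made up for this illustration, and the sketch assumes the endpoint returns SPARQL JSON results when asked through the `Accept` header.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # endpoint URL given in the task information


def build_request(query, endpoint=DBPEDIA_ENDPOINT):
    """Build an HTTP GET request that carries the query in the `query` parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Assumption: the endpoint honours content negotiation for JSON results.
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_select(query, endpoint=DBPEDIA_ENDPOINT, timeout=30):
    """Send a SELECT query and return its result bindings (needs network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # SELECT results in the SPARQL JSON format live under results -> bindings.
    return payload["results"]["bindings"]


# Hypothetical sample query: a few types of the DBpedia resource for Maastricht.
SAMPLE_QUERY = """
SELECT ?type WHERE {
  <http://dbpedia.org/resource/Maastricht> a ?type .
} LIMIT 5
"""
```

With network access, `run_select(SAMPLE_QUERY)` should return a list of dictionaries, one per result row, keyed by variable name (here `type`).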
    

#### Tips 🔎

+ How do I find vocabulary to use in my SPARQL query from DBpedia?

> Search on Google. For example, if you want to know the term for "capital city" in DBpedia, search for "**[dbpedia capital](https://www.google.com/search?&q=dbpedia+capital)**". In general, search for "dbpedia [approximate name of the predicate or class you are looking for]".

> Your search query does not have to exactly match the spelling of the DBpedia resource name you are looking for

> Alternatively, you can formulate SPARQL queries to list the properties and types in DBpedia. Do you know what these queries might look like?

+ Use [prefix.cc](http://prefix.cc/) to discover the full IRIs for unknown prefixes you may encounter
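
To make the relation between a prefix and its full IRI concrete, here is a small hedged sketch of prefix expansion in Python. The prefix table is hand-picked for this lab (prefix.cc lists many more), and the helper names `expand` and `prefix_declarations` are hypothetical.

```python
# A small, hand-picked prefix table; these are the commonly used namespace IRIs.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
    "dct": "http://purl.org/dc/terms/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'dbo:Book' into its full IRI."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local_name


def prefix_declarations(names, prefixes=PREFIXES):
    """Render the PREFIX lines to place at the top of a SPARQL query."""
    return "\n".join(f"PREFIX {name}: <{prefixes[name]}>" for name in names)
```

For instance, `expand("dbo:Book")` gives `http://dbpedia.org/ontology/Book`, which is the IRI a SPARQL engine sees once the `dbo:` prefix has been declared.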

# YASGUI interface 

<img src="yasgui-interface.png">

<!-- # Install the SPARQL kernel

This notebook uses the SPARQL Kernel to define and **execute SPARQL queries in the notebook** codeblocks.
To **install the SPARQL Kernel** in your JupyterLab installation:

```shell
pip install sparqlkernel --user
jupyter sparqlkernel install --user
```

To start running SPARQL query in this notebook, we need to define the **SPARQL kernel parameters**:
* 🔗 **URL of the SPARQL endpoint to query**
* 🌐 Language of preferred labels
* 📜 Log level -->

    # specify which endpoint we are querying
    %endpoint http://dbpedia.org/sparql

    # This is optional, it would increase the log level (messages from the jupyter sparql kernel)
    %log debug

    # Uncomment the next line to return labels in english and avoid duplicates
    # %lang en

# Anatomy of a SPARQL query

As we saw in Lecture 4, these are the main components of a SPARQL query:

<img src="sparql_query_breakdown.png">
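
As a rough textual counterpart to the figure, the sketch below stores a small query in a Python string and labels each part in SPARQL comments. The query itself (universities and their labels) is an illustrative assumption rather than something from the lecture, `dbo:University` is assumed to be the relevant DBpedia class, and the four-part split is one common way to divide a SELECT query that may not match the figure exactly. The helper `strip_comments` is hypothetical.

```python
EXAMPLE_QUERY = """
# 1. Prefix declarations: shorthand for long IRIs
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# 2. Result clause: the variables to return
SELECT ?university ?label

# 3. Query pattern: triple patterns that must match the graph
WHERE {
  ?university a dbo:University ;
              rdfs:label ?label .
}

# 4. Solution modifiers: reorder or trim the matches
ORDER BY ?label
LIMIT 5
"""


def strip_comments(query):
    """Drop blank lines and full-line SPARQL comments, keeping the query text."""
    kept = [
        line
        for line in query.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)
```

The string can be pasted into YASGUI as-is, since `#` starts a comment in SPARQL; `strip_comments(EXAMPLE_QUERY)` gives the bare query if you prefer it without annotations.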

### Task 1 [15min]: Simpler queries

Write SPARQL queries to execute the following tasks.

a) List 10 triples from DBpedia

b) List all the books in DBpedia

c) List the authors of all books in DBpedia

d) Truncate the results of Task 1c) to only 10 results

e) Display the number of authors for books in DBpedia

f) Display the number of UNIQUE authors for books in DBpedia

### Task 2 [15-20min]: Moderately challenging queries
Write SPARQL queries to execute the following tasks.

a) List 10 authors who wrote a book with more than 500 pages

b) List 20 books in DBpedia that have the term "grand" in their name

* **Hint:** use the [contains(string_to_look_in,string_to_look_for)](https://www.w3.org/TR/sparql11-query/#func-contains) function

c) List 20 book names from DBpedia together with the language of their names

* **Hint:** use the [lang](https://www.w3.org/TR/sparql11-query/#func-lang) function.

d) List the top 5 longest books in DBpedia (with the most pages) in descending order

### Task 3 [20min]: Challenging queries
Write SPARQL queries to execute the following tasks.

a) List 10 book authors from DBpedia and the capital cities of the countries in which they were born

b) Display the number of authors for the book that has the English title "1066 and All That"

c) List all DBpedia books whose English name starts with "she" (case-insensitive)

* **Hint:** use [langMatches](https://www.w3.org/TR/rdf-sparql-query/#func-langMatches), [STRSTARTS](https://www.w3.org/TR/sparql11-query/#func-strstarts) and [lcase](https://www.w3.org/TR/sparql11-query/#func-lcase) functions.

d) List all the unique book categories for all short books (fewer than 300 pages) written by authors who were born in Amsterdam

* **Hint:** use the [dct:subject](http://udfr.org/docs/onto/dct_subject.html) property of a [dbo:Book](https://dbpedia.org/ontology/Book) to define "category" in this task.


e) Sort the results in Task 3d) by the number of pages - longest to shortest

# Examples of other public SPARQL endpoints 🔗

* Wikidata, facts powering Wikipedia infoboxes: https://query.wikidata.org/sparql
* Bio2RDF, linked data for the life sciences: https://bio2rdf.org/sparql
* DisGeNET, gene-disease associations: http://rdf.disgenet.org/sparql
* PathwayCommons, resource for biological pathways analysis: http://rdf.pathwaycommons.org/sparql
* EU publications office, court decisions and legislative documents from the EU: http://publications.europa.eu/webapi/rdf/sparql
* Finland legal open data, cases and legislation: https://data.finlex.fi/en/sparql 
* EU Knowledge Graph, open knowledge graph containing general information about the European Union: [SPARQL endpoint](https://query.linkedopendata.eu/#SELECT%20DISTINCT%20%3Fo1%20WHERE%20%7B%0A%20%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fentity%2FQ1%3E%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fprop%2Fdirect%2FP62%3E%20%3Fo1%20.%20%0A%7D%20%0ALIMIT%201000)

# SPARQL applied to the COVID pandemic: 

* Wikidata SPARQL queries around the SARS-CoV-2 virus and pandemic: https://egonw.github.io/SARS-CoV-2-Queries