# KEN 4256: Lab 4


### Writing and executing basic SPARQL queries on RDF graphs

##### Authors:
+ [Vincent Emonet](https://www.maastrichtuniversity.nl/vincent.emonet): [vincent.emonet@maastrichtuniversity.nl](mailto:vincent.emonet@maastrichtuniversity.nl)
+ [Kody Moodley](https://www.maastrichtuniversity.nl/kody.moodley): [kody.moodley@maastrichtuniversity.nl](mailto:kody.moodley@maastrichtuniversity.nl)

##### Affiliation: 
[Institute of Data Science](https://www.maastrichtuniversity.nl/research/institute-data-science)

##### License:
[CC-BY 2.0](https://creativecommons.org/licenses/by/2.0/legalcode)

##### Date:
2021-02-26

#### In this lab you will learn:

How to compose basic [SPARQL](https://www.w3.org/TR/2013/REC-sparql11-query-20130321/) [SELECT](https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#select) queries to retrieve specific information from an [RDF](https://www.w3.org/TR/rdf11-concepts/) graph, and to answer questions about its content

#### Specific learning goals:

+ How to select the appropriate SPARQL feature(s) or function(s) required to answer the given question or retrieve the result asked for
+ How to represent the retrieval of information from a triplestore using triple patterns and basic graph patterns in SELECT queries
+ How to query existing public SPARQL endpoints using tools such as [YASGUI](https://yasgui.triply.cc)

#### Prerequisite knowledge: 
+ [Lecture 6: Introduction to SPARQL](https://canvas.maastrichtuniversity.nl/courses/4700/files/559320?module_item_id=115828)
+ [SPARQL 1.1 language specification](https://www.w3.org/TR/sparql11-query/)
+ Chapters 1 - 3 of [Learning SPARQL](https://maastrichtuniversity.on.worldcat.org/external-search?queryString=SPARQL#/oclc/853679890)

#### Task information:

+ In this lab, we will ask you to query the [DBpedia](https://dbpedia.org/) knowledge graph!
+ [DBpedia](https://dbpedia.org/) is a crowd-sourced community effort to extract structured content in RDF from the information created in various [Wikimedia](https://www.wikimedia.org/) projects (e.g. [Wikipedia](https://www.wikipedia.org/)). DBpedia is similar in information content to [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page). 
+ **A word on data quality:** remember that DBpedia is crowd-sourced. This means that volunteers and members of the general public are permitted to add to and maintain its content. As a result, you may encounter inaccuracies and omissions in the content, as well as inconsistencies in how the information is represented. Don't be alarmed by this: the learning objectives of this lab do not depend on the content being accurate.


#### Task information (contd):

+ The DBpedia SPARQL endpoint URL is: [https://dbpedia.org/sparql](https://dbpedia.org/sparql)
+ DBpedia has its own SPARQL query interface at [https://dbpedia.org/sparql](https://dbpedia.org/sparql), which is built on OpenLink's [Virtuoso](https://virtuoso.openlinksw.com/) [RDF](https://www.w3.org/TR/rdf11-concepts/) triplestore management system.
+ In this lab, we will use an alternative SPARQL query interface to query DBpedia, called **[YASGUI](https://yasgui.triply.cc)**. YASGUI has additional user-friendly features, e.g. management of multiple SPARQL queries in separate tabs. It also allows one to query any publicly available SPARQL endpoint from the same interface.
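
Query interfaces such as YASGUI send the query text to the endpoint over HTTP. The following is a minimal Python sketch of that exchange, assuming the endpoint accepts a GET request with a `query` parameter (as described in the SPARQL 1.1 Protocol) and can return results in the SPARQL JSON results format. The helper names `build_query_url` and `run_select` are hypothetical, and nothing in this lab requires Python.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # endpoint URL given above


def build_query_url(endpoint, query):
    """Return the GET URL that carries `query` in the `query` parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_select(endpoint, query, timeout=30):
    """Send a SELECT query and return the result rows (hypothetical helper).

    Requires network access; asks for the SPARQL JSON results format.
    """
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Each row maps a variable name to a dict such as {"type": ..., "value": ...}.
    return payload["results"]["bindings"]


query = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
url = build_query_url(DBPEDIA_ENDPOINT, query)
print(url)  # the percent-encoded request URL; no request is sent here
```

Calling `run_select(DBPEDIA_ENDPOINT, query)` would perform the actual request. It is left uncalled here so that the sketch runs without network access.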

#### Tips 🔎

+ How do I find vocabulary to use in my SPARQL query from DBpedia?

> Search on Google, e.g., if you want to know the term for "capital city" in DBpedia, search for: "**[dbpedia capital](https://www.google.com/search?&q=dbpedia+capital)**". In general, search for "dbpedia [approximate name of predicate or class you are looking for]".

> Your search query does not have to exactly match the spelling of the DBpedia resource name you are looking for.

> Alternatively, you can formulate SPARQL queries to list properties and types in DBpedia. Do you know what these queries might look like?

+ Use [prefix.cc](http://prefix.cc/) to discover the full IRIs for unknown prefixes you may encounter

# YASGUI interface 

<img src="yasgui-interface.png">

<!-- # Install the SPARQL kernel

This notebook uses the SPARQL Kernel to define and **execute SPARQL queries in the notebook** codeblocks.
To **install the SPARQL Kernel** in your JupyterLab installation:

```shell
pip install sparqlkernel --user
jupyter sparqlkernel install --user
```

To start running SPARQL query in this notebook, we need to define the **SPARQL kernel parameters**:
* 🔗 **URL of the SPARQL endpoint to query**
* 🌐 Language of preferred labels
* 📜 Log level -->

```
%endpoint http://dbpedia.org/sparql

# This is optional, it would increase the log level
%log debug

# Uncomment the next line to return label in english and avoid duplicates
# %lang en
```

# Anatomy of a SPARQL query

As we saw in Lecture 6, these are the main components of a SPARQL query:

<img src="sparql_query_breakdown.png">

# Task 1 [15min]: 
Simpler SPARQL queries.

a) **[List 10 triples from DBpedia](https://api.triplydb.com/s/4c19DjNva)**:

Observe that **DBpedia limits results to 10,000 by default** (and the SPARQL kernel shows only 20 for readability). **Note:** DBpedia contains many more than 10,000 triples.

Do these triples look strange to you? What do they represent? These triples define some of the vocabulary the endpoint uses to describe its own configuration. You are probably less interested in this than in general knowledge about entities in the real world.

b) Get **[all the books in DBpedia 📚](https://yasgui.triply.cc/#query=PREFIX%20rdf%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX%20dbo%3A%20%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0ASELECT%20*%0AWHERE%20%7B%0A%20%20%3Fbook%20rdf%3Atype%20dbo%3ABook%20.%0A%7D%0A&endpoint=https%3A%2F%2Fdbpedia.org%2Fsparql&requestMethod=POST&tabTitle=Query&headers=%7B%7D&contentTypeConstruct=text%2Fturtle%2C*%2F*%3Bq%3D0.9&contentTypeSelect=application%2Fsparql-results%2Bjson%2C*%2F*%3Bq%3D0.9&outputFormat=table)**

Here a prefix is defined for the DBpedia vocabulary / ontology:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
```

The `rdf:` prefix is predefined on the DBpedia endpoint, and `rdf:type` can always be shortened to the keyword `a`:

```sparql
SELECT * WHERE {
  ?book a dbo:Book .
}
```

If we executed this query on the following two triples only, the result would contain just `<http://book1>` (the only valid binding for `?book`):

```turtle
<http://book1> rdf:type <http://dbpedia.org/ontology/Book> .
<http://country1> rdf:type <http://dbpedia.org/ontology/Country> .
```

Experiment with variations of this query, e.g. modify it to return specific variables rather than `*`. Do you notice a difference?

c) **[List the authors of all books in DBpedia 🖋️](http://yasgui.triply.cc/#query=PREFIX%20dbo%3A%20%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0ASELECT%20*%0AWHERE%20%7B%0A%20%20%20%20%3Fbook%20a%20dbo%3ABook%20.%0A%20%20%20%20%3Fbook%20dbo%3Aauthor%20%3Fauthor%20.%0A%7D&endpoint=https%3A%2F%2Fdbpedia.org%2Fsparql&requestMethod=POST&tabTitle=Query&headers=%7B%7D&contentTypeConstruct=text%2Fturtle%2C*%2F*%3Bq%3D0.9&contentTypeSelect=application%2Fsparql-results%2Bjson%2C*%2F*%3Bq%3D0.9&outputFormat=table)** (when an author is defined):

How would you modify the query above to return the number of authors for the book that has the **English** title: "1066 and All That"?

A Turtle-like syntax can also be used in SPARQL to make the query more readable:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT *
WHERE {
    ?book a dbo:Book ; 
        dbo:author ?author .
}
```

Consider a graph with the following 4 statements:

```turtle
<http://book1> rdf:type <http://dbpedia.org/ontology/Book> .
<http://book1> dbo:author <http://author1> .
<http://book2> rdf:type <http://dbpedia.org/ontology/Book> .
<http://book2> dbo:contributor <http://author2> .
```

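The previous query will return only **one row of results with `<http://book1>` and `<http://author1>`**. `<http://book2>` matches the first triple pattern but has a `dbo:contributor` rather than a `dbo:author`, so it fails the second pattern and is left out.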
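
To see why only one row survives, here is a small Python sketch of how the two triple patterns are matched together against the four statements. It is an illustration of the idea only, not how a real triplestore is implemented. The functions `match` and `solve` are hypothetical names, and variables are represented as strings starting with `?`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO = "http://dbpedia.org/ontology/"

# The four statements above as (subject, predicate, object) tuples.
graph = [
    ("http://book1", RDF_TYPE, DBO + "Book"),
    ("http://book1", DBO + "author", "http://author1"),
    ("http://book2", RDF_TYPE, DBO + "Book"),
    ("http://book2", DBO + "contributor", "http://author2"),
]


def match(pattern, triple, bindings):
    """Return `bindings` extended so that `pattern` equals `triple`, or None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solve(patterns, triples):
    """Find all bindings that satisfy every triple pattern at once."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


rows = solve(
    [
        ("?book", RDF_TYPE, DBO + "Book"),     # ?book a dbo:Book ;
        ("?book", DBO + "author", "?author"),  #     dbo:author ?author .
    ],
    graph,
)
print(rows)  # [{'?book': 'http://book1', '?author': 'http://author1'}]
```

Because both patterns share the variable `?book`, a solution must bind it to the same resource in each of them, which is what joins the two patterns.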

d) **[Truncate the results of Task 1c) to only 10 results](https://api.triplydb.com/s/sc_5bOadM)** 

e) **[Display the number of authors for books in DBpedia](https://api.triplydb.com/s/alXTOBBJK)** 👥

f) **[Display the number of UNIQUE authors for books in DBpedia](https://api.triplydb.com/s/O9DeKnsXH)** 👤
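
Tasks 1d) to 1f) differ only in how the same result rows are post-processed. As a rough analogy in Python (not SPARQL, and using made-up result rows), truncating corresponds to slicing a list, counting all authors to the length of a list, and counting unique authors to the size of a set:

```python
# Hypothetical (book, author) result rows; author1 wrote two of the books.
rows = [
    ("http://book1", "http://author1"),
    ("http://book2", "http://author1"),
    ("http://book3", "http://author2"),
]

authors = [author for _book, author in rows]

first_two_rows = rows[:2]           # comparable to LIMIT 2
total_authors = len(authors)        # comparable to COUNT(?author)
unique_authors = len(set(authors))  # comparable to COUNT(DISTINCT ?author)

print(first_two_rows)
print(total_authors, unique_authors)  # 3 2
```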

# Task 2 [15-20min]: 
Moderately challenging SPARQL queries.

a) **[List 10 authors who wrote a book with more than 500 pages](https://api.triplydb.com/s/UbGw805Vr)** 📖

b) **[List 20 books in DBpedia that have the term "grand" in their name](https://api.triplydb.com/s/TGrgpbPsI)**

* **Hint:** use the [contains(string_to_look_in,string_to_look_for)](https://www.w3.org/TR/sparql11-query/#func-contains) function

c) **[List 20 book names from DBpedia together with the language of their names](https://api.triplydb.com/s/voF_vhc41)**

* **Hint:** use the [lang](https://www.w3.org/TR/sparql11-query/#func-lang) function.

d) **[List the top 5 longest books in DBpedia (with the most pages) in descending order](https://api.triplydb.com/s/ao8dNwQx9)**
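
A "top N" question combines sorting with truncation. The Python sketch below mimics that combination on made-up page counts; in SPARQL the same effect is obtained with the `ORDER BY` and `LIMIT` solution modifiers. The book IRIs and numbers are hypothetical.

```python
# Hypothetical (book, number of pages) rows.
books = [
    ("http://bookA", 320),
    ("http://bookB", 1200),
    ("http://bookC", 85),
    ("http://bookD", 640),
    ("http://bookE", 410),
    ("http://bookF", 999),
    ("http://bookG", 150),
]

# Sort by page count, largest first (descending order), then keep the first five.
longest = sorted(books, key=lambda row: row[1], reverse=True)[:5]

for book, pages in longest:
    print(book, pages)
```

The order of the two steps matters: truncating before sorting would return five arbitrary books rather than the five longest ones.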

# Task 3 [20min]: 
Challenging SPARQL queries.

a) **[List 10 book authors from DBpedia and the capital cities of the countries in which they were born](https://api.triplydb.com/s/1EnFo0hbi)**

b) **[Display the number of authors for the book that has the English title "1066 and All That"](https://api.triplydb.com/s/3YIxMgMF1)**

c) **[List all books with a name in English starting with "http" (case-insensitive)](https://api.triplydb.com/s/T8cRK4S5K)**

* **Hint:** use [langMatches](https://www.w3.org/TR/rdf-sparql-query/#func-langMatches), [STRSTARTS](https://www.w3.org/TR/sparql11-query/#func-strstarts) and [lcase](https://www.w3.org/TR/sparql11-query/#func-lcase) functions.
* **Note:** there are no results for this query from DBpedia as of 26 February 2021, possibly due to modifications to the DBpedia triplestore.
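
The string functions named in the hints (including `contains` from Task 2b) are usually combined inside a `FILTER`. The Python sketch below approximates their behaviour on made-up, language-tagged book names. The helper `lang_matches` is a hypothetical, simplified stand-in for `langMatches` that treats the language range as a case-insensitive prefix of the tag, so that `en` also matches `en-GB`.

```python
def lang_matches(tag, lang_range):
    """Simplified langMatches: case-insensitive match of a tag against a range."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""  # "*" matches any non-empty language tag
    return tag == lang_range or tag.startswith(lang_range + "-")


# Hypothetical book names as (text, language tag) pairs.
names = [
    ("HTTP Basics", "en"),
    ("Http in Practice", "en-GB"),
    ("HTTP Grundlagen", "de"),
    ("A Grand Tour", "en"),
]

# English names starting with "http", ignoring case: lower-casing the text
# first plays the role of lcase, and str.startswith the role of STRSTARTS.
english_http = [
    text for text, tag in names
    if lang_matches(tag, "en") and text.lower().startswith("http")
]

# Names containing "grand", ignoring case (the role of contains plus lcase).
contains_grand = [text for text, _tag in names if "grand" in text.lower()]

print(english_http)    # ['HTTP Basics', 'Http in Practice']
print(contains_grand)  # ['A Grand Tour']
```

Lower-casing before comparing matters because `STRSTARTS` and `contains` are case-sensitive on their own.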

d) **[List all the unique book categories for all short books (less than 300 pages) written by authors who were born in Amsterdam](https://api.triplydb.com/s/DmWP3_cZ2)**

* **Hint:** use the [dct:subject](http://udfr.org/docs/onto/dct_subject.html) property of a [dbo:Book](https://dbpedia.org/ontology/Book) to define "category" in this task.


e) **[Variation of Task 3d) in which the results are sorted by number of pages, longest to shortest](https://api.triplydb.com/s/av1NdRr4n)**

* **Note:** this task is an additional task not included in the original version of Lab 6.

# Examples of other public SPARQL endpoints 🔗

* Wikidata, facts powering Wikipedia infoboxes: https://query.wikidata.org/sparql
* Bio2RDF, linked data for the life sciences: https://bio2rdf.org/sparql
* DisGeNET, gene-disease associations: http://rdf.disgenet.org/sparql
* PathwayCommons, resource for biological pathways analysis: http://rdf.pathwaycommons.org/sparql
* EU publications office, court decisions and legislative documents from the EU: http://publications.europa.eu/webapi/rdf/sparql
* Finland legal open data, cases and legislation: https://data.finlex.fi/en/sparql 
* EU Knowledge Graph, open knowledge graph containing general information about the European Union: [SPARQL endpoint](https://query.linkedopendata.eu/#SELECT%20DISTINCT%20%3Fo1%20WHERE%20%7B%0A%20%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fentity%2FQ1%3E%20%3Chttps%3A%2F%2Flinkedopendata.eu%2Fprop%2Fdirect%2FP62%3E%20%3Fo1%20.%20%0A%7D%20%0ALIMIT%201000)

# SPARQL applied to the COVID pandemic: 

* Wikidata SPARQL queries around the SARS-CoV-2 virus and pandemic: https://egonw.github.io/SARS-CoV-2-Queries