# Query RDF with the SPARQL query language


* SPARQL (pronounced sparkle) stands for: **SPARQL Protocol And RDF Query Language**
* SPARQL 1.0 W3C-Recommendation since January 15th 2008
* SPARQL 1.1 W3C-Recommendation since March 21st 2013
* Query language to query instances in RDF documents

**Reference specifications: https://www.w3.org/TR/sparql11-query/**


> w3.org materials are standards and recommendations accepted by the World Wide Web Consortium (W3C, the organization that defines Web standards)

# SPARQL endpoint 🔗

* Databases are usually not built to be publicly available; they live close to their applications
* A triplestore can be fully queried through a publicly available SPARQL endpoint URL
* Some solutions enable user management, but by default SPARQL endpoints are built to be open and give the same access to all their users

We will use the **DBpedia SPARQL endpoint**: 

>**https://dbpedia.org/sparql**

[DBpedia](https://wiki.dbpedia.org/) is a project to represent (parts of) Wikipedia as RDF. It has been used as a playground for the Semantic Web for years. The data is not controlled or curated, which leads to poor data quality (don't be surprised to find weird things).

You can use a nicer query editor that can query any public SPARQL endpoint: 

> **https://yasgui.triply.cc**
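As a first test, a minimal query like the one below can be pasted into the editor with the endpoint set to `https://dbpedia.org/sparql`. This is only a sketch: the resource `dbr:Netherlands` is an arbitrary example chosen for illustration, and any other resource URI would work the same way.

```sparql
PREFIX dbr: <http://dbpedia.org/resource/>
# List a few properties and values attached to one DBpedia resource.
# dbr:Netherlands is an illustrative choice, not prescribed by this notebook.
SELECT ?p ?o
WHERE {
    dbr:Netherlands ?p ?o .
}
LIMIT 10
```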

# Install the SPARQL kernel

This notebook uses the SPARQL Kernel to define and **execute SPARQL queries in the notebook** codeblocks.
To **install the SPARQL Kernel** in your JupyterLab installation:

```shell
pip install sparqlkernel --user
jupyter sparqlkernel install --user
```

To start running SPARQL queries in this notebook, we need to define the **SPARQL kernel parameters**:
* 🔗 **URL of the SPARQL endpoint to query**
* 🌐 Language of preferred labels
* 📜 Log level

```sparql
%endpoint http://dbpedia.org/sparql

# This is optional, it would increase the log level
%log debug

# Uncomment the next line to return labels in English and avoid duplicates
# %lang en
```

# SPARQL query components

Variables to resolve are defined using `?` (e.g. `?my_variable`)

```sparql
# prefix declarations: for abbreviating URIs
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# result clause: what information to return from the query
SELECT *
# dataset definition (optional): which RDF graph(s) are being queried
# (the FROM clause goes after SELECT; here it names DBpedia's default graph)
FROM <http://dbpedia.org>
# query pattern: specifying what to query for in the underlying dataset
WHERE {
    ?s ?p ?o .
}
# query modifiers: slicing, ordering, and rearranging query results
ORDER BY ?s
LIMIT 10
```

# FILTER the results

## Comparison operators: <, =, >, <=, >=, !=
* Comparison of data literals according to natural order
* Support for numerical data types, xsd:dateTime, xsd:string (alphabetic ordering), xsd:boolean (true > false)
* For other types and other RDF terms, only = and != are available
* Comparison of literals of incompatible types (e.g. xsd:string and xsd:integer) is not allowed; they must be cast to a common type first
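A comparison filter could look like the sketch below, which keeps countries above a population threshold. The class `dbo:Country` and the threshold of 100 million are choices made for this example; the explicit cast to `xsd:integer` is there to make the compared type unambiguous.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# Countries with more than 100 million inhabitants (arbitrary threshold).
SELECT ?country ?population
WHERE {
    ?country a dbo:Country ;
             dbo:populationTotal ?population .
    # Cast before comparing so both sides are integers.
    FILTER(xsd:integer(?population) > 100000000)
}
LIMIT 10
```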

## Arithmetic operators: +, -, *, /
* Support for numerical data types
* Used to combine values in filter conditions
* E.g. `FILTER(?weight / (?size * ?size) >= 25)`
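The same idea applied to DBpedia might look like the following sketch, which filters countries on population density. It assumes that `dbo:areaTotal` is present for the country and expressed in square metres; both the property choice and the density threshold are assumptions made for illustration.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?country ?population ?area
WHERE {
    ?country a dbo:Country ;
             dbo:populationTotal ?population ;
             dbo:areaTotal ?area .
    # Assumption: ?area is in square metres, so divide by 1000000 to get km2.
    # Keep countries with at least 500 inhabitants per km2 (arbitrary threshold).
    FILTER(xsd:double(?population) / (xsd:double(?area) / 1000000) >= 500)
}
LIMIT 10
```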


# Perform an arithmetic operation

Calculating the GDP per capita of countries directly from `dbp:gdpNominal` and `dbo:populationTotal` is **impossible due to different datatypes**: the division below returns 0 for every row 🚫

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
# Expressions in SELECT must be wrapped as (expression AS ?variable)
SELECT ?country
       ?gdpValue (datatype(?gdpValue) AS ?gdpType)
       ?population (datatype(?population) AS ?populationType)
       (?gdpValue / ?population AS ?gdpPerCapita)
WHERE {
    ?country dbp:gdpNominal ?gdpValue ;
             dbo:populationTotal ?population .
} LIMIT 10
```

| country | gdpValue | gdpType | population | populationType | gdpPerCapita |
|---|---|---|---|---|---|
| http://dbpedia.org/resource/Arab_League | 3.526E12 | http://dbpedia.org/datatype/usDollar | 423000000 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |
| http://dbpedia.org/resource/Syria | 5.9957E10 | http://dbpedia.org/datatype/usDollar | 17064854 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |
| http://dbpedia.org/resource/Economic_Cooperation_Organization | US $1.9 trillion | http://www.w3.org/1999/02/22-rdf-syntax-ns#langString | 416046863 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |
| http://dbpedia.org/resource/Egypt | 3.30765E11 | http://dbpedia.org/datatype/usDollar | 85783 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |
| http://dbpedia.org/resource/United_States | 1.8558E13 | http://dbpedia.org/datatype/usDollar | 324720797 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |
| http://dbpedia.org/resource/Afghanistan | 1.9654E10 | http://dbpedia.org/datatype/usDollar | 32564342 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |
| http://dbpedia.org/resource/Albania | 1.2204E10 | http://dbpedia.org/datatype/usDollar | 2886026 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |
| http://dbpedia.org/resource/Algeria | 1.8171E11 | http://dbpedia.org/datatype/usDollar | 40400000 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |
| http://dbpedia.org/resource/Andorra | 4.51E9 | http://dbpedia.org/datatype/usDollar | 85470 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |
| http://dbpedia.org/resource/Antigua_and_Barbuda | 1.332E9 | http://dbpedia.org/datatype/usDollar | 91295 | http://www.w3.org/2001/XMLSchema#nonNegativeInteger | 0 |


# Cast a variable to a specific datatype

Casting is especially useful when **comparing two variables or performing an arithmetic operation on them**.

Here we divide a value in `usDollar` by a `nonNegativeInteger`, casting both to `xsd:integer` to calculate the GDP per capita of each country 💶. Values that cannot be cast, such as the free-text `US $1.9 trillion`, leave the result empty.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?country ?gdpValue ?population (xsd:integer(?gdpValue) / xsd:integer(?population) AS ?gdpPerCapita)
WHERE {
    ?country dbp:gdpNominal ?gdpValue ;
             dbo:populationTotal ?population .
} LIMIT 10
```

| country | gdpValue | population | gdpPerCapita |
|---|---|---|---|
| http://dbpedia.org/resource/Arab_League | 3.526E12 | 423000000 | 8335.0 |
| http://dbpedia.org/resource/Syria | 5.9957E10 | 17064854 | 3513.0 |
| http://dbpedia.org/resource/Economic_Cooperation_Organization | US $1.9 trillion | 416046863 | |
| http://dbpedia.org/resource/Egypt | 3.30765E11 | 85783 | 3855833.0 |
| http://dbpedia.org/resource/United_States | 1.8558E13 | 324720797 | 57150.0 |
| http://dbpedia.org/resource/Afghanistan | 1.9654E10 | 32564342 | 603.0 |
| http://dbpedia.org/resource/Albania | 1.2204E10 | 2886026 | 4228.0 |
| http://dbpedia.org/resource/Algeria | 1.8171E11 | 40400000 | 4497.0 |
| http://dbpedia.org/resource/Andorra | 4.51E9 | 85470 | 52767.0 |
| http://dbpedia.org/resource/Antigua_and_Barbuda | 1.332E9 | 91295 | 14590.0 |
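To avoid rows with an unusable GDP value, one option is to filter on the datatype before casting. The sketch below keeps only values typed as `usDollar` (the datatype IRI seen in the earlier output), goes through `STR()` before casting to `xsd:double`, and names the result with `BIND`. Whether the cast behaves identically on every triplestore is an assumption; it has not been checked beyond what the outputs in this notebook show.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?country ?gdpValue ?population ?gdpPerCapita
WHERE {
    ?country dbp:gdpNominal ?gdpValue ;
             dbo:populationTotal ?population .
    # Keep only GDP values typed as usDollar (drops free-text values).
    FILTER(datatype(?gdpValue) = <http://dbpedia.org/datatype/usDollar>)
    # STR() yields the plain lexical form, which is then cast to a number.
    BIND(xsd:double(STR(?gdpValue)) / xsd:double(STR(?population)) AS ?gdpPerCapita)
}
LIMIT 10
```

Casting to `xsd:double` rather than `xsd:integer` is a design choice here: it avoids truncating the operands before the division.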


# Count aggregated results

Count the number of books for each author 📚

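```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?author (COUNT(?book) AS ?book_count)
WHERE {
    ?book a dbo:Book ;
        dbo:author ?author .
}
# Standard SPARQL requires grouping on every non-aggregated selected variable
GROUP BY ?author
LIMIT 10
```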

| author | book_count |
|---|---|
| http://dbpedia.org/resource/Daniel_Carter_Beard | 1 |
| http://dbpedia.org/resource/Raymond_Smullyan | 1 |
| http://dbpedia.org/resource/William_Donaldson | 1 |
| http://dbpedia.org/resource/Joe_Conason | 1 |
| http://dbpedia.org/resource/John_Lennon | 3 |
| http://dbpedia.org/resource/Jürgen_Habermas | 8 |
| http://dbpedia.org/resource/Robert_Nathan | 1 |
| http://dbpedia.org/resource/J._P._Martin | 6 |
| http://dbpedia.org/resource/Anthony_Everitt | 1 |
| http://dbpedia.org/resource/Arthur_R.G._Solmssen | 1 |
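Aggregated counts are often combined with `HAVING` and `ORDER BY` to surface the largest groups. The following is a minimal sketch of that pattern with an explicit `GROUP BY`; the threshold of 5 books is arbitrary and only serves the example.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?author (COUNT(?book) AS ?book_count)
WHERE {
    ?book a dbo:Book ;
        dbo:author ?author .
}
GROUP BY ?author
# Keep only authors with at least 5 books (threshold chosen arbitrarily).
HAVING (COUNT(?book) >= 5)
# Most prolific authors first.
ORDER BY DESC(?book_count)
LIMIT 10
```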


# Count depends on how the rows are grouped

Here we also select the book, so every book and author pair forms its own group and each row gets a count of 1 📘

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?book ?author (COUNT(?book) AS ?book_count)
WHERE {
    ?book a dbo:Book ;
        dbo:author ?author .
}
# Grouping on both variables yields one group per (book, author) pair
GROUP BY ?book ?author
LIMIT 10
```

| book | author | book_count |
|---|---|---|
| http://dbpedia.org/resource/The_Aunt's_Story | http://dbpedia.org/resource/Patrick_White | 1 |
| http://dbpedia.org/resource/The_Circus_of_Dr._Lao_and_Other_Improbable_Stories | http://dbpedia.org/resource/Ray_Bradbury | 1 |
| http://dbpedia.org/resource/The_Four_False_Weapons | http://dbpedia.org/resource/John_Dickson_Carr | 1 |
| http://dbpedia.org/resource/The_Smack_Man | http://dbpedia.org/resource/Nelson_DeMille | 1 |
| http://dbpedia.org/resource/The_Surprising_Archaea | http://dbpedia.org/resource/John_L._Howland | 1 |
| http://dbpedia.org/resource/Trapped_in_the_USSR | http://dbpedia.org/resource/J._J._Fortune | 1 |
| http://dbpedia.org/resource/A_Gent_from_Bear_Creek | http://dbpedia.org/resource/Robert_E._Howard | 1 |
| http://dbpedia.org/resource/A_Golden_Anniversary_Bibliography_of_Edgar_Rice_Burroughs | http://dbpedia.org/resource/Henry_Hardy_Heins | 1 |
| http://dbpedia.org/resource/Act_of_Providence | http://dbpedia.org/resource/Joseph_Payne_Brennan | 1 |
| http://dbpedia.org/resource/Atlas_of_the_British_Flora | http://dbpedia.org/resource/Franklyn_Perring | 1 |
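At the other extreme, selecting no grouping variable at all treats the whole result as a single group. A minimal sketch, assuming we want one overall number of books that have an author (`DISTINCT` is used so a book with several authors is counted once):

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
# No plain variable is selected, so all matches form one group
# and the query returns a single row with the total.
SELECT (COUNT(DISTINCT ?book) AS ?total_books)
WHERE {
    ?book a dbo:Book ;
        dbo:author ?author .
}
```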


# SPARQL query breakdown 🧬

<img src="sparql_query_breakdown.png">

# Search on DBpedia 🔎

Use **[https://yasgui.triply.cc](https://yasgui.triply.cc)** to write and run SPARQL queries on DBpedia

> Find DBpedia classes and relations: search on Google, e.g.: "**[dbpedia capital](https://www.google.com/search?&q=dbpedia+capital)**"

* The capital ([dbo:capital](http://dbpedia.org/ontology/capital)) of the country in which authors of books were born ([dbo:birthPlace](http://dbpedia.org/ontology/birthPlace)), limited to 10 results

* All books with a name in English starting with "http", ignoring case

* Calculate the GDP per capita (GDP/population) of countries from `dbp:gdpNominal` and `dbo:populationTotal`, and compare it to the existing DBpedia property `dbp:gdpNominalPerCapita`

> **Search for functions in the specifications: https://www.w3.org/TR/sparql11-query**


# Public SPARQL endpoints 🔗

* Wikidata, facts powering Wikipedia infoboxes: https://query.wikidata.org/sparql
* Bio2RDF, linked data for the life sciences: https://bio2rdf.org/sparql
* Disgenet, gene-disease association: http://rdf.disgenet.org/sparql
* PathwayCommons, resource for biological pathways analysis: http://rdf.pathwaycommons.org/sparql
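Each endpoint uses its own vocabulary, so queries written for DBpedia need to be adapted. As an illustration, the sketch below targets the Wikidata endpoint and lists items that are an instance of (`wdt:P31`) house cat (`wd:Q146`), with their English labels. The choice of item is only an example and is not taken from this notebook.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Run against https://query.wikidata.org/sparql
SELECT ?item ?label
WHERE {
    ?item wdt:P31 wd:Q146 ;
          rdfs:label ?label .
    # Keep only English labels.
    FILTER(LANG(?label) = "en")
}
LIMIT 10
```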

# Going further

* Wikidata SPARQL queries around the SARS-CoV-2 virus and pandemic: https://egonw.github.io/SARS-CoV-2-Queries
* Use [prefix.cc](http://prefix.cc/) to resolve mysterious prefixes.