# Query RDF with the SPARQL query language


* SPARQL (pronounced "sparkle") stands for **SPARQL Protocol And RDF Query Language**
* SPARQL 1.0: W3C Recommendation since January 15th, 2008
* SPARQL 1.1: W3C Recommendation since March 21st, 2013
* It is a query language for querying instances in RDF documents

Latest specifications: https://www.w3.org/TR/sparql11-query/


> Note: w3.org documents are standards and recommendations accepted by the World Wide Web Consortium (W3C, the organization that defines Web standards)

# SPARQL 

We will use the DBpedia SPARQL endpoint: https://dbpedia.org/sparql

> DBpedia is a project to represent (parts of) Wikipedia as RDF. It has been used as a huge playground for the Semantic Web for years. The data is not controlled or curated, which leads to poor data quality in places.

You can use a nicer query editor that can query any public endpoint: https://yasgui.triply.cc

This notebook uses the SPARQL Kernel to define and execute SPARQL queries in the notebook codeblocks.
To install the SPARQL Kernel in your JupyterLab:

```shell
pip install sparqlkernel
jupyter sparqlkernel install --user
```

> You will need to reinstall it if you stop and restart the JupyterLab Docker container

To start running SPARQL queries in this notebook, we need to define the **SPARQL kernel parameters**:
* 🔗 **URL of the SPARQL endpoint to query**
* 🌐 Language of preferred labels
* 📜 Log level

```sparql
%endpoint http://dbpedia.org/sparql
%lang en

# This is optional, it would increase the log level
%log debug
```
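
Behind these settings, each query is sent to the endpoint over HTTP. As a rough sketch of what the SPARQL 1.1 Protocol prescribes for a GET request (the query text goes in a URL-encoded `query` parameter), the following Python snippet builds such a URL with the standard library only. The helper name `build_query_url` is an assumption made for illustration, and the snippet does not claim to reproduce what the SPARQL kernel does internally.

```python
from urllib.parse import urlencode

# Endpoint used throughout this notebook.
ENDPOINT = "https://dbpedia.org/sparql"


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter.

    Hypothetical helper: it only builds the URL and sends nothing.
    """
    return endpoint + "?" + urlencode({"query": query})


query = "SELECT * WHERE { ?s ?p ?o . } LIMIT 10"
url = build_query_url(ENDPOINT, query)
print(url)
```

Opening the printed URL in a browser, or fetching it with an HTTP client and an `Accept` header such as `application/sparql-results+json`, should return the query results.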

# SPARQL components

Variables to resolve are defined using `?` (e.g. `?my_variable`)

```sparql
# prefix declarations: for abbreviating URIs
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# result clause: what information to return from the query
SELECT *
# dataset definition (optional): which RDF graph(s) are being queried
# (<http://dbpedia.org> is assumed here to be DBpedia's default graph)
FROM <http://dbpedia.org>
# query pattern: specifying what to query for in the underlying dataset
WHERE {
    ?s ?p ?o .
}
# query modifiers: slicing, ordering, and rearranging query results
ORDER BY ?s
LIMIT 10
```

## Run a SPARQL query

Let's get all triples in DBpedia:

```sparql
SELECT *
WHERE {
  ?subject ?predicate ?object .
}
```

subject,predicate,object
http://www.openlinksw.com/virtrdf-data-formats#default-iid,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default-iid-nullable,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default-iid-nonblank-nullable,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#default-nullable,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#sql-varchar,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-nullable,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-dt,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat
http://www.openlinksw.com/virtrdf-data-formats#sql-varchar-dt-nullable,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat


We can see that **DBpedia limits results to 10,000 by default** (and the SPARQL kernel shows only 20 for readability).

The returned triples are not very interesting, though, so we need to filter the results.

# Get all books

Get all the books in DBpedia:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT *
WHERE {
  ?book rdf:type dbo:Book .
}
```

book
http://dbpedia.org/resource/Havana_Storm
http://dbpedia.org/resource/The_Awful_German_Language
http://dbpedia.org/resource/Urmonotheismus
http://dbpedia.org/resource/Modern_C_Design
http://dbpedia.org/resource/1066_and_All_That
http://dbpedia.org/resource/2010:_Odyssey_Two
http://dbpedia.org/resource/401(k)
http://dbpedia.org/resource/A_Crown_of_Swords
http://dbpedia.org/resource/A_Dictionary_of_the_English_Language
http://dbpedia.org/resource/Adultery


Here a prefix is defined for the DBpedia ontology:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
```

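The `rdf:` prefix is predefined by the DBpedia endpoint, so it does not need a `PREFIX` declaration here (standard SPARQL requires one). In any SPARQL query, `rdf:type` can be shortened to the keyword `a`: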

```sparql
SELECT * WHERE {
  ?book a dbo:Book .
}
```

If run on the following 2 statements, this query returns only `<http://book1>`, because `<http://country1>` is not of type `dbo:Book`:

```turtle
<http://book1> rdf:type <http://dbpedia.org/ontology/Book> .
<http://country1> rdf:type <http://dbpedia.org/ontology/Country> .
```
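
To make the two shorthands above concrete, here is a minimal Python sketch of how a prefixed name such as `dbo:Book` and the keyword `a` expand to full IRIs. The function name `expand` and the `PREFIXES` table are assumptions made for illustration; a real SPARQL parser handles many more cases (escapes, literals, relative IRIs).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Prefix table equivalent to the PREFIX declarations used in this notebook.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbo": "http://dbpedia.org/ontology/",
}


def expand(term: str, prefixes: dict) -> str:
    """Expand 'a' or a prefixed name to a full IRI (hypothetical helper)."""
    if term == "a":
        # The keyword 'a' is shorthand for rdf:type.
        return RDF_TYPE
    if term.startswith("?"):
        # Variables are left untouched.
        return term
    if term.startswith("<") and term.endswith(">"):
        # Already a full IRI: just drop the angle brackets.
        return term[1:-1]
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


print(expand("dbo:Book", PREFIXES))
print(expand("a", PREFIXES) == expand("rdf:type", PREFIXES))
```

Under these assumptions, `?book a dbo:Book` and `?book rdf:type <http://dbpedia.org/ontology/Book>` expand to the same triple pattern.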

# Get property of class

Get the **authors of books** (when one is defined):

```sparql
PREFIX dbo:<http://dbpedia.org/ontology/>
SELECT *
WHERE {
  ?book a dbo:Book .
  ?book dbo:author ?author .
}
```

book,author
http://dbpedia.org/resource/1066_and_All_That,http://dbpedia.org/resource/R._J._Yeatman
http://dbpedia.org/resource/1066_and_All_That,http://dbpedia.org/resource/W._C._Sellar
http://dbpedia.org/resource/2010:_Odyssey_Two,http://dbpedia.org/resource/Arthur_C._Clarke
http://dbpedia.org/resource/A_Crown_of_Swords,http://dbpedia.org/resource/Robert_Jordan
http://dbpedia.org/resource/A_Dictionary_of_the_English_Language,http://dbpedia.org/resource/Samuel_Johnson
http://dbpedia.org/resource/Alice's_Adventures_in_Wonderland,http://dbpedia.org/resource/Lewis_Carroll
http://dbpedia.org/resource/Anne_of_Green_Gables,http://dbpedia.org/resource/Lucy_Maud_Montgomery
http://dbpedia.org/resource/Around_the_World_in_Eighty_Days,http://dbpedia.org/resource/Jules_Verne
http://dbpedia.org/resource/Between_Planets,http://dbpedia.org/resource/Robert_A._Heinlein
http://dbpedia.org/resource/Beyond_This_Horizon,http://dbpedia.org/resource/Robert_A._Heinlein


The query can be made more readable with Turtle-like syntax, where `;` reuses the subject of the previous triple pattern:

```sparql
PREFIX dbo:<http://dbpedia.org/ontology/>
SELECT *
WHERE {
  ?book a dbo:Book ; 
      dbo:author ?author .
}
```

In a graph with the following 4 statements:

```turtle
<http://book1> rdf:type <http://dbpedia.org/ontology/Book> .
<http://book1> dbo:author <http://author1> .
<http://book2> rdf:type <http://dbpedia.org/ontology/Book> .
<http://book2> dbo:contributor <http://author2> .
```

The previous query returns only one row, with `<http://book1>` and `<http://author1>`: `<http://book2>` has a `dbo:contributor` but no `dbo:author`, so it does not match the second pattern.
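
The matching described here can be sketched with a toy in-memory matcher in Python. This is only an illustration of how two triple patterns sharing the variable `?book` are joined, not how DBpedia or any real SPARQL engine is implemented; the names `match`, `select`, and `GRAPH` are assumptions introduced for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO = "http://dbpedia.org/ontology/"

# The four statements of the example graph, as (subject, predicate, object).
GRAPH = [
    ("http://book1", RDF_TYPE, DBO + "Book"),
    ("http://book1", DBO + "author", "http://author1"),
    ("http://book2", RDF_TYPE, DBO + "Book"),
    ("http://book2", DBO + "contributor", "http://author2"),
]


def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable binds on first use and must agree with earlier bindings.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(patterns, graph):
    """Return all bindings that satisfy every pattern (join on shared variables)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Equivalent of: ?book a dbo:Book ; dbo:author ?author .
rows = select(
    [("?book", RDF_TYPE, DBO + "Book"), ("?book", DBO + "author", "?author")],
    GRAPH,
)
print(rows)
```

Both books satisfy the first pattern, but only `http://book1` also satisfies the second, so `rows` holds a single solution.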