# What is SPARQL

**SPARQL** (SPARQL Protocol and RDF Query Language) is:
- a query language for data in RDF format
- a protocol for sending those queries to a service over HTTP (REST-style)

For practice, we will use an example graph describing settlements (localities). It contains selected information from the knowledge graph **DBpedia** (https://www.dbpedia.org).
DBpedia contains data extracted from Wikipedia and structured as RDF.

In [None]:
!pip install rdflib

from rdflib import Graph
g = Graph()

g.parse("settlements3.ttl", format="ttl")
print("Graph contains %s triples." % len(g))

In rdflib, SPARQL queries are run against a graph with the `rdflib.graph.Graph.query()` method.

# Triple patterns and basic pattern
The main form of query in SPARQL is the `SELECT` query, which looks a bit like an SQL query. A `SELECT` query consists of two main elements: a header with a list of selected variables and a `WHERE` clause to specify the graph patterns we want to match in the query, specifically the **basic graph pattern** (written in curly brackets). 

The result of a `SELECT` query is a table in which there will be one column for each selected variable and one row for each pattern match.

The basic building blocks of SPARQL queries are **triple patterns**. These are similar to RDF triples, but a variable can appear in any of the triple's positions (subject, predicate, or object). We use them to find matching triples in a graph, and the variables act as wildcards that match any node in the graph.

In [None]:
qres = g.query(
    """PREFIX dbo: <http://dbpedia.org/ontology/>
       
       SELECT ?x ?y
       WHERE {
          ?x dbo:country ?y .
       }""")

for row in qres:
    print("%s is located in country %s" % row)

<span style="color:red">__Exercise 1: Compose a simple query on graph g (containing one triple pattern) for entities whose location (`dbo:location`) is Warsaw. Warsaw is represented by the resource `dbr:Warsaw`, where `dbr` is the prefix associated with the namespace <http://dbpedia.org/resource/>. The list of results should include the resource http://dbpedia.org/resource/Copernicus_Science_Centre.__ </span>

In [None]:
#enter the solution to task 1 here



for row in qres:
    print("%s is located in Warsaw" % row)


Next, let us write a query with two triple patterns that finds geographical objects located in the districts of Warsaw. The shared variable `?district` joins the two patterns:

In [None]:
qres = g.query(
    """PREFIX dbo: <http://dbpedia.org/ontology/>
       PREFIX dbr: <http://dbpedia.org/resource/>
       
       SELECT ?poi ?district
       WHERE {
        dbr:Warsaw dbo:subdivision ?district   .
        ?poi dbo:location ?district .
       }
       """)

for row in qres:
    print("%s located in %s" % row)

<span style="color:red">__Exercise 2: Compose a query on graph g (containing two triple patterns) for the geographic features located in Warsaw together with their types (use the `dbo:location` and `rdf:type` properties). The list of results should include the tuple: (http://dbpedia.org/resource/Copernicus_Science_Centre, http://dbpedia.org/ontology/Museum)__</span>

In [None]:
#enter the solution to task 2 here

for row in qres:
    print("%s is of type %s" % row)

# Query modifiers

To sort the query results by the value of a selected variable, we can add an `ORDER BY` clause:

In [None]:
qres = g.query(
    """PREFIX dbo: <http://dbpedia.org/ontology/>
       PREFIX dbr: <http://dbpedia.org/resource/>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       
       SELECT ?poi ?district
       WHERE {
        dbr:Warsaw dbo:subdivision ?district   .
        ?poi dbo:location ?district .
       }
       ORDER BY ?poi
       """)

for row in qres:
    print("%s located in %s" % row)

The `LIMIT` modifier, in turn, allows us to display a limited number of results:

In [None]:
qres = g.query(
    """PREFIX dbo: <http://dbpedia.org/ontology/>
       PREFIX dbr: <http://dbpedia.org/resource/>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       
       SELECT ?poi ?district
       WHERE {
        dbr:Warsaw dbo:subdivision ?district   .
        ?poi dbo:location ?district .
       }
       LIMIT 3
       """)

for row in qres:
    print("%s located in %s" % row)

<span style="color:red">__Exercise 3: Formulate a query on graph g for cities located in Poland, limiting the results to 5.__ </span>

In [None]:
#enter the solution to task 3 here

# FILTER clause

The `FILTER` clause excludes selected solutions from the query results. It evaluates a Boolean expression for each candidate solution and keeps only the solutions for which the expression is true.

SPARQL supports many built-in operators and functions for writing such expressions, such as:
- comparison operators (`=`, `!=`, `<`, `<=`, `>`, `>=`)
- logical operators (`&&`, `||`, `!`)
- mathematical operators (`+`, `-`, `/`, `*`)


In [None]:
qres = g.query(
    """PREFIX dbo: <http://dbpedia.org/ontology/>
       PREFIX dbr: <http://dbpedia.org/resource/>
       PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

       SELECT ?settlement ?population
       WHERE {
        ?settlement dbo:country dbr:Poland .
        ?settlement dbo:populationTotal ?population .
        FILTER (?population >= "500000"^^xsd:int)
       }""")

for row in qres:
    print("%s has population %s" % row)

<span style="color:red">__Exercise 4: Formulate a query on graph g for the names of towns and their areas (`dbo:areaTotal`), restricted to towns with an area greater than 120000000.__ </span>

In [None]:
#enter the solution to task 4 here

# OPTIONAL clause

With the `OPTIONAL` clause, we can mark parts of a query that do not have to match the graph for a solution to be returned. For example, a knowledge graph such as DBpedia may contain the population of a locality but not its area, yet we may still want to return that locality with whatever partial information is available.

In [None]:
qres = g.query(
    """PREFIX dbo: <http://dbpedia.org/ontology/>
       PREFIX dbr: <http://dbpedia.org/resource/>

       SELECT ?settlement ?population
       WHERE {
        ?settlement dbo:country dbr:Poland .
        OPTIONAL {?settlement dbo:populationTotal ?population . }
       }""")

for row in qres:
    print("%s has population %s" % row)


<span style="color:red">__Exercise 5: Query for city names with optional area information (`dbo:areaTotal`).__ </span>

In [None]:
# enter the solution to task 5 here

# ASK query

To check only whether a triple pattern (or a concrete RDF triple) has any match in the graph, without retrieving the matches themselves, we can use an `ASK` query, which returns `true` or `false`:

In [None]:
qres = g.query(
    """PREFIX dbo: <http://dbpedia.org/ontology/>
       PREFIX dbr: <http://dbpedia.org/resource/>
       ASK 
       WHERE {
          dbr:Warsaw dbo:location dbr:Poland 
       }""")

for row in qres:
    print("%s" % row)

<span style="color:red">__Exercise 6: Write an `ASK` query similar to the one above, but asking if the country of Warsaw is Poland (`dbo:country`).__ </span>

In [None]:
# enter the solution to task 6 here

# DBpedia's SPARQL endpoint

SPARQL queries are executed against RDF datasets, consisting of RDF graphs. 
A SPARQL endpoint is a service that accepts queries and returns results over HTTP. 
SPARQL endpoints have their own addresses, usually associated with specific datasets. 
The address of a SPARQL endpoint associated with the DBpedia knowledge graph is https://dbpedia.org/sparql.
DBpedia also offers interfaces for browsing the graph as well as for querying it: https://dbpedia.org/sparql/
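
To illustrate what "over HTTP" means here, the sketch below builds a GET request in the style of the SPARQL protocol using only the Python standard library. The helper names `build_request` and `run_select` are hypothetical, and the actual network call is left commented out, since the endpoint's availability and response time are outside our control.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?settlement WHERE { ?settlement dbo:country dbr:Poland . } LIMIT 5
"""

def build_request(endpoint, query):
    """Hypothetical helper: put the query in the `query` URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def run_select(endpoint, query, timeout=30):
    """Hypothetical helper: send the request and return the result bindings."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]

request = build_request(ENDPOINT, QUERY)
print(request.full_url)

# Uncomment to send the query (requires network access):
# for binding in run_select(ENDPOINT, QUERY):
#     print(binding["settlement"]["value"])
```

In practice a dedicated client library may be more convenient, but the request itself is just a URL with an encoded query string.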
 

__Exercise 7: Using the interface provided by DBpedia and what you have learned so far about this graph, formulate the following queries in SPARQL and run them against the DBpedia SPARQL endpoint:__

1. a list of people born in Warsaw
2. a list of museums in Kraków
3. a list of people born in Warsaw who have won a Nobel Prize
4. the dates of birth of people born in Kraków

In [None]:
#  enter the solution to task 7 here