In [1]:
# Importing CSS
from IPython.display import HTML

def css_styling():
    # Read the stylesheet and close the file handle afterwards
    with open("./styles.css", "r") as f:
        styles = f.read()
    return HTML(styles)

css_styling()

__Omics Technologien - Tutorial__

***

# Exercise 5: Querying RDFs with SPARQL
<br>

<div class="logos">
    <div>
        <img src="./figures/Universität_Bielefeld.png"/>
    </div>
    <div>
        <img src="./figures/isaslogooffizielleform100k.jpg"/>
    </div>
</div>

***

Robert Heyer, Kay Schallert, Maximilian Wolf

__Content__

- Introduction to SPARQL
- Retrieving information from the web of linked data
 

__Aim__

- Understanding the syntax of SPARQL queries
- Application of SPARQL to perform simple queries

# Content:
1. The SPARQL Query Language
2. SPARQL Endpoints
3. Property paths
4. Filter statements
5. Aggregate functions
6. Solution sequence Modifiers

# 1) The SPARQL Query Language
***
- declarative query language (inspired by SQL) for data manipulation and data definition operations on data represented as RDF statements
- every SPARQL query consists of a head and a body
    - __head__: provides the basis for categorizing different types of query solutions (`SELECT`, `DELETE`, ...)
    - __body__: comprises a collection of RDF statement patterns that represent the entity relationships to which a query is scoped
- SPARQL evaluates a query by graph pattern (triple pattern) matching

## 1.1) Types of SPARQL queries:
- read-oriented query types (different ways to present results):
    - `SELECT` (Returns all, or a subset of, the variables bound in a query pattern match.)
    - `CONSTRUCT` (Returns an RDF graph constructed by substituting variables in a set of triple templates.)
    - `DESCRIBE` (Returns an RDF graph that describes the resources found.)
    - `ASK` (Returns a boolean indicating whether a query pattern matches or not.)
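- write-oriented operations (defined in SPARQL 1.1 Update):
    - `CREATE` (creates a new graph)
    - `INSERT` / `INSERT DATA` (adds triples)
    - `DELETE` / `DELETE DATA` (removes triples)
    - there is no `UPDATE` keyword; changing existing triples is expressed as a combined `DELETE` ... `INSERT` ... `WHERE` operation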

#### Excursus: HTTP Content Negotiation
- if you open "https://dbpedia.org/resource/Aragorn" in your browser, you are redirected to "https://dbpedia.org/page/Aragorn"
- the URI "./resource/Aragorn" identifies the thing itself (Designatum), whereas "./page/Aragorn" is the documentation about the thing (Designator)
- the server redirects to the documentation because a browser's HTTP GET request asks for human-readable HTML (`text/html`) by default
- a resource can exist even if it has no page
- example: try accessing https://dbpedia.org/resource/Andúril via browser
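
The redirect described above depends on the `Accept` header of the request. The following is a minimal sketch using only the Python standard library; the helper names `build_request` and `fetch` are hypothetical, and it is an assumption that the server offers a Turtle representation for this resource.

```python
import urllib.request

def build_request(uri, media_type):
    """Create a GET request that asks the server for a specific representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})

def fetch(request):
    """Send the request (needs network access); return the final URL and the body."""
    with urllib.request.urlopen(request) as response:
        return response.geturl(), response.read().decode("utf-8")

# Same resource URI, two different requested representations
html_req = build_request("https://dbpedia.org/resource/Aragorn", "text/html")
rdf_req = build_request("https://dbpedia.org/resource/Aragorn", "text/turtle")

print(html_req.get_header("Accept"))
print(rdf_req.get_header("Accept"))
```

Calling `fetch(html_req)` and `fetch(rdf_req)` against the live server should then show where each request ends up; the exact target URLs depend on the server configuration.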

## 1.2) Definition of variables in SPARQL:
- variables are distinguished from other terms in the query by a leading "?", e.g., `?name`
- variables are bound to RDF terms and can appear in any position of a triple pattern (subject, predicate, object)
- example: Look for agents and their weapons. --> `?agent dbp:weapon ?thing`
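
To make the idea of variable binding concrete, here is a toy matcher in plain Python. It is only an illustration of how a triple pattern produces bindings, not how a SPARQL engine is implemented; the function `match_pattern` and the `ex:` data are hypothetical.

```python
def match_pattern(pattern, triples):
    """Return one binding dict per triple that matches the pattern.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    solutions = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must be bound to the same value
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No mismatch found: the triple matches the pattern
            solutions.append(binding)
    return solutions

# Hypothetical toy data, not taken from DBpedia
triples = [
    ("ex:Aragorn", "dbp:weapon", "ex:Anduril"),
    ("ex:Gandalf", "dbp:weapon", "ex:Glamdring"),
    ("ex:Aragorn", "dbo:spouse", "ex:Arwen"),
]

for solution in match_pattern(("?agent", "dbp:weapon", "?thing"), triples):
    print(solution)
```

Each printed dictionary corresponds to one row of a `SELECT ?agent ?thing` result.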

## 1.3) General format of a SPARQL SELECT query:

- definition of prefixes
> `SELECT ?OutputVariable1 ?OutputVariable2 ... or * (for "everything")` <br>
> `Optional: FROM <http://graph2query.org>` <br>
> `WHERE clause to define pattern to match`<br>
> `Optional: Solution modifiers`<br>


<div>
    <img src="./figures/SPARQL-WordLift.png"  width="60%"/>
</div>
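
The general format can also be written down as a small string-building helper. This is a sketch for illustration only: `build_select` is a hypothetical function, it performs no validation, and the `dbp` prefix IRI is written out explicitly rather than relying on endpoint defaults.

```python
def build_select(variables, where, prefixes=None, graph=None, modifiers=""):
    """Assemble a SELECT query string from its parts (no validation is done)."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in (prefixes or {}).items()]
    # An empty variable list is rendered as "*" (everything)
    lines.append("SELECT " + (" ".join(variables) if variables else "*"))
    if graph:
        lines.append(f"FROM <{graph}>")
    lines.append("WHERE {")
    lines.extend("    " + pattern for pattern in where)
    lines.append("}")
    if modifiers:
        lines.append(modifiers)
    return "\n".join(lines)

query = build_select(
    variables=["?agent", "?thing"],
    where=["?agent dbp:weapon ?thing ."],
    prefixes={"dbp": "http://dbpedia.org/property/"},
    modifiers="LIMIT 10",
)
print(query)
```

The printed query follows the order given above: prefixes, `SELECT`, optional `FROM`, `WHERE`, and finally the solution modifiers.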

## 1.4) Getting started with a simple query

In [2]:
%pip install sparql-dataframe

Collecting sparql-dataframe
  Downloading sparql_dataframe-0.4-py3-none-any.whl.metadata (1.8 kB)
Collecting SPARQLWrapper>=1.8.1 (from sparql-dataframe)
  Downloading SPARQLWrapper-2.0.0-py3-none-any.whl.metadata (2.0 kB)
Collecting rdflib>=6.1.1 (from SPARQLWrapper>=1.8.1->sparql-dataframe)
  Downloading rdflib-7.5.0-py3-none-any.whl.metadata (12 kB)
Downloading sparql_dataframe-0.4-py3-none-any.whl (3.5 kB)
Downloading SPARQLWrapper-2.0.0-py3-none-any.whl (28 kB)
Downloading rdflib-7.5.0-py3-none-any.whl (587 kB)
Installing collected packages: rdflib, SPARQLWrapper, sparql-dataframe


Successfully

In [3]:
import sparql_dataframe 

-  `sparql_dataframe` is the helper used in this exercise to convert SPARQLWrapper results into human-readable Pandas dataframes
-  use SPARQLWrapper or RDFlib directly for results that are easier to process further (for instance in Turtle format)
-  (installing `sparql-dataframe` also installs RDFlib and SPARQLWrapper)
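
To give an idea of what such a conversion involves, the sketch below flattens a `SELECT` result in the standard SPARQL 1.1 Query Results JSON Format into plain rows. It is not the actual implementation of `sparql_dataframe`; the function `bindings_to_rows` is hypothetical and the sample document is hand-written.

```python
import json

# Hand-written sample in the SPARQL 1.1 Query Results JSON Format
sample = """
{
  "head": {"vars": ["character", "WeaponName"]},
  "results": {"bindings": [
    {"character": {"type": "uri", "value": "http://dbpedia.org/resource/Gandalf"},
     "WeaponName": {"type": "literal", "value": "Glamdring", "xml:lang": "en"}},
    {"character": {"type": "uri", "value": "http://dbpedia.org/resource/Sauron"}}
  ]}
}
"""

def bindings_to_rows(document):
    """Flatten a SELECT result into a list of dicts, one per solution."""
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # Unbound variables are absent from a binding; represent them as None
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows

rows = bindings_to_rows(json.loads(sample))
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed to `pandas.DataFrame(rows)` to obtain a table similar to the ones shown in this exercise.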

In [4]:
# simple SELECT query (for a resource with a specific label)
endpoint = "http://dbpedia.org/sparql"
q = """
    PREFIX : <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT ?character
    WHERE {
            ?character rdfs:label "Aragorn"@en .
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character
0,http://dbpedia.org/resource/Aragorn


# 2) SPARQL Endpoints
***
- SPARQL endpoints are points of presence on an HTTP network that are capable of receiving and processing SPARQL Protocol requests
- (some) available endpoints are listed at https://www.w3.org/wiki/SparqlEndpoints
- predefined namespaces of the DBpedia endpoint: https://dbpedia.org/sparql/?help=nsdecl (using them avoids typos in prefix declarations; at other endpoints, prefixes may have to be declared in the query head)
- predefined namespaces differ between SPARQL endpoints (e.g., https://sparql.uniprot.org/ vs. https://dbpedia.org/sparql/?help=nsdecl)
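
A SPARQL Protocol request is an ordinary HTTP request that carries the query text in a `query` parameter. The sketch below only builds such a request with the standard library and does not send it; `build_sparql_request` is a hypothetical helper, and the query relies on `rdfs:` being predefined at the DBpedia endpoint.

```python
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request: the query goes into the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for SELECT results in the standard JSON results format
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

query = 'SELECT ?character WHERE { ?character rdfs:label "Aragorn"@en . }'
request = build_sparql_request("https://dbpedia.org/sparql", query)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it to the endpoint; helpers such as SPARQLWrapper take care of this step for you.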

In [5]:
#conjunctive SELECT query

endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?WeaponName
    WHERE {
            ?character rdfs:label "Aragorn"@en .
            ?character dbp:weapon ?WeaponName .
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,WeaponName
0,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/List_of_weapons_an...


# 3) Property paths
***
- a property path is a possible route through a graph between two graph nodes, without being limited to directly adjoining neighbours
- possibilities: 
    - alternatives
    - sequences
    - inverse property paths


<div>
     <img src="./figures/PropertyPathSyntax.png" width=50%/>
</div>

- uri = URI / prefixed name;  elt = path element

- https://www.w3.org/TR/sparql11-query/#propertypaths
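
The three kinds of paths listed above can be mimicked on a toy graph in plain Python. This is an illustrative sketch only; the functions and the `ex:` triples are hypothetical and not taken from DBpedia.

```python
# Hypothetical toy graph, not taken from DBpedia
TRIPLES = {
    ("ex:Aragorn", "dbo:spouse", "ex:Arwen"),
    ("ex:Arwen", "dbo:parent", "ex:Elrond"),
    ("ex:Aragorn", "rdfs:label", "Aragorn"),
}

def step(nodes, predicate):
    """Follow one predicate forwards from every node in `nodes`."""
    return {o for s, p, o in TRIPLES if p == predicate and s in nodes}

def inverse(nodes, predicate):
    """^predicate: follow the predicate backwards (object to subject)."""
    return {s for s, p, o in TRIPLES if p == predicate and o in nodes}

def sequence(nodes, *predicates):
    """p1/p2/...: follow the predicates one after another."""
    for predicate in predicates:
        nodes = step(nodes, predicate)
    return nodes

def alternative(nodes, *predicates):
    """p1|p2|...: follow any one of the predicates."""
    return set().union(*(step(nodes, p) for p in predicates))

print(inverse({"Aragorn"}, "rdfs:label"))
print(sequence({"ex:Aragorn"}, "dbo:spouse", "dbo:parent"))
print(alternative({"ex:Aragorn"}, "dbo:spouse", "rdfs:label"))
```

The first call corresponds to `"Aragorn" ^rdfs:label ?character`, the second to `dbo:spouse/dbo:parent`, and the third to `dbo:spouse|rdfs:label`.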

## 3.1) using `SELECT` statements and property paths (^elt)

In [6]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?linkedObject
    WHERE {
            "Aragorn"@en ^rdfs:label ?character.
            ?character ?d ?linkedObject
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,linkedObject
0,http://dbpedia.org/resource/Aragorn,http://www.w3.org/2002/07/owl#Thing
1,http://dbpedia.org/resource/Aragorn,http://www.ontologydesignpatterns.org/ont/dul/...
2,http://dbpedia.org/resource/Aragorn,http://www.wikidata.org/entity/Q24229398
3,http://dbpedia.org/resource/Aragorn,http://www.wikidata.org/entity/Q95074
4,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/ontology/Agent
...,...,...
327,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/Arwen
328,http://dbpedia.org/resource/Aragorn,2.25
329,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/List_of_weapons_an...
330,http://dbpedia.org/resource/Aragorn,40.0


## 3.2) using `SELECT` statements and property paths ({n})

__try:__ <br>
> `{1}`, `{2}`, `{2,3}`
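
The `{n}` and `{n,m}` forms come from drafts of the property path syntax and, to our knowledge, are not part of the final SPARQL 1.1 grammar, although some endpoints (the one used here among them) accept them. A fixed repetition can be rewritten with `/` and `|` only. The helper `expand_repetition` below is a hypothetical sketch of that rewriting; the rewritten path should connect the same pairs of nodes, but this is not checked against any endpoint here.

```python
def expand_repetition(path, low, high=None):
    """Rewrite path{low} or path{low,high} using only '/' and '|'."""
    high = low if high is None else high
    if low < 1 or high < low:
        raise ValueError("this sketch only handles 1 <= low <= high")
    # One '/'-joined sequence per allowed repetition count
    alternatives = ["/".join([path] * n) for n in range(low, high + 1)]
    if len(alternatives) == 1:
        return alternatives[0]
    return "|".join(f"({alt})" for alt in alternatives)

print(expand_repetition("dbo:wikiPageWikiLink", 2))
print(expand_repetition("dbo:wikiPageWikiLink", 2, 3))
```

The output strings can be pasted into the query in place of `dbo:wikiPageWikiLink{2}` or `dbo:wikiPageWikiLink{2,3}` when an endpoint rejects the brace syntax.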

In [7]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?linkedObject
    WHERE {
            "Aragorn"@en ^rdfs:label ?character.
            ?character dbo:wikiPageWikiLink{2} ?linkedObject
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,linkedObject
0,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/Urban_poverty
1,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/Torture_in_popular...
2,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/Torture_in_China
3,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/The_Inquisition
4,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/Psychologically_re...
...,...,...
9995,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/Ariel_Dorfman
9996,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/American_Yakuza
9997,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/Concentration_camp
9998,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/Sean_Penn


## 3.3) using `SELECT` statements and property paths

__try:__ <br>
> `|` (alternative) or `/` (sequence)

In [8]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?linkedObject
    WHERE {
            "Aragorn"@en ^rdfs:label ?character.
            ?character dbo:spouse | rdfs:label ?linkedObject .
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,linkedObject
0,http://dbpedia.org/resource/Aragorn,Aragorn
1,http://dbpedia.org/resource/Aragorn,Àragorn
2,http://dbpedia.org/resource/Aragorn,أراغورن
3,http://dbpedia.org/resource/Aragorn,Aragorn
4,http://dbpedia.org/resource/Aragorn,Aragorn (Tolkiens Welt)
5,http://dbpedia.org/resource/Aragorn,Aragorn
6,http://dbpedia.org/resource/Aragorn,Aragorn
7,http://dbpedia.org/resource/Aragorn,Aragorn
8,http://dbpedia.org/resource/Aragorn,Aragorn
9,http://dbpedia.org/resource/Aragorn,アラゴルン


## 3.4) Conjunctive `SELECT` query with repeated subjects / predicates

In [9]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?WeaponName
    WHERE {
            ?character dbo:wikiPageWikiLink dbr:Middle-earth .
            ?character dbo:wikiPageWikiLink dbr:Protagonist .
            ?character dbo:firstAppearance "The Fellowship of the Ring (1954)" .
            ?character dbp:weapon ?WeaponName .
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,WeaponName
0,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/List_of_weapons_an...


## 3.5) Conjunctive query using predicate and object list

In [10]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?WeaponName
    WHERE {
            ?character dbo:wikiPageWikiLink dbr:Middle-earth , dbr:Protagonist ;
                dbo:firstAppearance "The Fellowship of the Ring (1954)" ;
                dbp:weapon ?WeaponName .
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,WeaponName
0,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/List_of_weapons_an...


## 3.6) Conjunctive query with alternatives using `UNION` (compare `OR` in Neo4j)

In [11]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?WeaponName
    WHERE {
            ?character dbo:wikiPageWikiLink dbr:Middle-earth , dbr:Protagonist .
            {?character dbo:firstAppearance "The Fellowship of the Ring (1954)"} 
            UNION 
            {?character dbo:firstAppearance "The Hobbit (1937)"}
            ?character dbp:weapon ?WeaponName .
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,WeaponName
0,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/List_of_weapons_an...
1,http://dbpedia.org/resource/Gandalf,
2,http://dbpedia.org/resource/Gandalf,Glamdring
3,http://dbpedia.org/resource/Gandalf,Wizard's staff


## 3.7) Conjunctive query using `OPTIONAL` matching

In [12]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?WeaponName
    WHERE {
            ?character dbo:wikiPageWikiLink dbr:Middle-earth, dbc:The_Lord_of_the_Rings_characters.
            OPTIONAL {?character dbp:weapon ?WeaponName}
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,WeaponName
0,http://dbpedia.org/resource/Elendil,The sword Narsil
1,http://dbpedia.org/resource/Aragorn,http://dbpedia.org/resource/List_of_weapons_an...
2,http://dbpedia.org/resource/Gandalf,
3,http://dbpedia.org/resource/Gandalf,Glamdring
4,http://dbpedia.org/resource/Gandalf,Wizard's staff
5,http://dbpedia.org/resource/Merry_Brandybuck,
6,http://dbpedia.org/resource/Samwise_Gamgee,
7,http://dbpedia.org/resource/Saruman,
8,http://dbpedia.org/resource/Éomer,
9,http://dbpedia.org/resource/Sauron,


# 4) `FILTER` statements
***
A constraint, expressed by the keyword `FILTER`, is a restriction on solutions over the whole group in which the filter appears. The position of the `FILTER` inside that group therefore does not change the solutions.
- __SPARQL Filter Functions:__
    - Logical: !, &&, ||
    - Comparison: =, !=, >, <, >=, <=, IN, NOT IN
    - SPARQL tests: isIRI, isURI, isBlank, isLiteral, isNumeric, bound
    - Strings: STRLEN, SUBSTR, UCASE, LCASE, STRSTARTS, STRENDS, CONTAINS, ...
    - many more ... (see e.g. https://docs.data.world/tutorials/sparql/list-of-sparql-filter-functions.html)

<br> <br>
__try:__ <br>
`FILTER CONTAINS(str(?character), "Aragorn")` <br> 
`FILTER ISURI(?WeaponName)`
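
What the two suggested filters do to a set of solutions can be mimicked in plain Python. This is a toy illustration: the solutions are hypothetical, each bound variable is stored as a `(kind, value)` pair, and the helper names are not part of any library.

```python
# Hypothetical solutions: each bound variable maps to a (kind, value) pair
solutions = [
    {"character": ("uri", "http://dbpedia.org/resource/Aragorn"),
     "WeaponName": ("uri", "http://dbpedia.org/resource/Anduril")},
    {"character": ("uri", "http://dbpedia.org/resource/Gandalf"),
     "WeaponName": ("literal", "Glamdring")},
    {"character": ("uri", "http://dbpedia.org/resource/Sauron")},
]

def contains_str(solution, variable, needle):
    """Mimics FILTER CONTAINS(str(?variable), needle)."""
    term = solution.get(variable)
    return term is not None and needle in term[1]

def is_uri(solution, variable):
    """Mimics FILTER isURI(?variable); an unbound variable fails the filter."""
    term = solution.get(variable)
    return term is not None and term[0] == "uri"

aragorn_only = [s for s in solutions if contains_str(s, "character", "Aragorn")]
uri_weapons = [s for s in solutions if is_uri(s, "WeaponName")]
print(len(aragorn_only), len(uri_weapons))
```

Note that the third solution has no weapon bound at all, so it is dropped by the `isURI` filter just like the literal-valued one.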

In [13]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?WeaponName ?wikiPageLength
    WHERE {
            ?character dbo:wikiPageWikiLink dbr:Middle-earth, dbc:The_Lord_of_the_Rings_characters;
                dbo:wikiPageLength ?wikiPageLength.
            OPTIONAL {?character dbp:weapon ?WeaponName}
            FILTER CONTAINS(str(?character), "Aragorn")
            }
    """
df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,WeaponName,wikiPageLength


# 5) Aggregate functions 
***
- `COUNT`
- `SUM`
- `AVG`
- `MIN`
- `MAX`
- `GROUP_CONCAT`
- `GROUP BY` is not an aggregate function itself; it defines the groups over which the aggregates are computed
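
How grouping and aggregation interact can be sketched in plain Python. The data and both helper functions are hypothetical; they only imitate `COUNT` and `GROUP_CONCAT` per group, with `None` standing for a variable left unbound by an `OPTIONAL` pattern.

```python
from collections import defaultdict

# Hypothetical solutions of a pattern with an OPTIONAL weapon (None = unbound)
solutions = [
    ("Gandalf", "Glamdring"),
    ("Gandalf", "Wizard's staff"),
    ("Sauron", None),
]

def count_by_group(rows):
    """Mimics SELECT ?character (COUNT(?WeaponName) AS ?n) ... GROUP BY ?character."""
    counts = defaultdict(int)
    for character, weapon in rows:
        # COUNT(?var) only counts solutions in which ?var is bound
        counts[character] += 0 if weapon is None else 1
    return dict(counts)

def group_concat(rows, separator=" "):
    """Mimics GROUP_CONCAT(?WeaponName) per character."""
    groups = defaultdict(list)
    for character, weapon in rows:
        weapons = groups[character]  # creates the group even if nothing is bound
        if weapon is not None:
            weapons.append(weapon)
    return {character: separator.join(weapons) for character, weapons in groups.items()}

print(count_by_group(solutions))
print(group_concat(solutions, separator="; "))
```

A character without any bound weapon still forms a group, which is why such characters appear with a count of 0 in the query result.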
    
***
 __Aggregate function `COUNT`:__

In [14]:
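endpoint = "http://dbpedia.org/sparql"
# Note: standard SPARQL 1.1 expects "(COUNT(?WeaponName) AS ?weaponCount)" plus "GROUP BY ?character";
# the shorthand below relies on the leniency of the DBpedia (Virtuoso) endpoint.
q = """
    SELECT ?character COUNT(?WeaponName)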
    WHERE {
            ?character dbo:wikiPageWikiLink dbr:Middle-earth, dbc:The_Lord_of_the_Rings_characters.
            OPTIONAL {?character dbp:weapon ?WeaponName}
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,callret-1
0,http://dbpedia.org/resource/Merry_Brandybuck,0
1,http://dbpedia.org/resource/Goldberry,0
2,http://dbpedia.org/resource/Saruman,0
3,http://dbpedia.org/resource/Faramir,0
4,http://dbpedia.org/resource/Éomer,0
5,http://dbpedia.org/resource/Gandalf,3
6,http://dbpedia.org/resource/Sauron,0
7,http://dbpedia.org/resource/Gollum,0
8,http://dbpedia.org/resource/Arwen,0
9,http://dbpedia.org/resource/Galadriel,0


# 6) Solution sequence Modifiers:
***
- `ORDER BY` modifier: put the solutions in order
- projection modifier (the variable list after `SELECT`): choose certain variables
- `DISTINCT` modifier: ensure solutions in the sequence are unique
- `REDUCED` modifier: permit elimination of some non-unique solutions
- `OFFSET` modifier: control where the solutions start from in the overall sequence of solutions
- `LIMIT` modifier: restrict the number of solutions
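
The modifiers are applied in the order listed above. The sketch below imitates that order on a list of tuples; `apply_modifiers` is a hypothetical helper, the data are made up, and projection and `REDUCED` are left out for brevity.

```python
def apply_modifiers(solutions, order_key=None, descending=False,
                    distinct=False, offset=0, limit=None):
    """Apply ORDER BY, DISTINCT, OFFSET and LIMIT in that order."""
    result = list(solutions)
    if order_key is not None:
        result.sort(key=lambda row: row[order_key], reverse=descending)
    if distinct:
        # dict.fromkeys keeps the first occurrence and preserves the order
        result = list(dict.fromkeys(result))
    result = result[offset:]
    if limit is not None:
        result = result[:limit]
    return result

# Hypothetical (character, wikiPageLength) solutions
rows = [("Gandalf", 90), ("Aragorn", 70), ("Gandalf", 90), ("Sauron", 80)]
page = apply_modifiers(rows, order_key=1, descending=True, distinct=True, offset=1, limit=2)
print(page)
```

Combining `ORDER BY` with `OFFSET` and `LIMIT` in this way is the usual approach to paging through a large result set.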

## 6.1) Conjunctive query using `ORDER BY` statements

try different orders: <br>
> `ORDER BY ?character DESC(?wikiPageLength)`<br>
> `ORDER BY ASC(?character) ?wikiPageLength`

In [15]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT ?character ?WeaponName ?wikiPageLength
    WHERE {
            ?character dbo:wikiPageWikiLink dbr:Middle-earth, dbc:The_Lord_of_the_Rings_characters;
                dbo:wikiPageLength ?wikiPageLength.
            OPTIONAL {?character dbp:weapon ?WeaponName}
            }
    ORDER BY ?character
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character,WeaponName,wikiPageLength


## 6.2) Conjunctive query using `DISTINCT` and `REDUCED`

- `SELECT DISTINCT`: remove duplicate solutions
- `SELECT REDUCED`: permit, but do not require, the removal of duplicate solutions (rarely used; can be cheaper than `DISTINCT`)

In [16]:
endpoint = "http://dbpedia.org/sparql"
q = """
    SELECT REDUCED ?character
    WHERE {
            ?character dbo:wikiPageWikiLink dbr:Middle-earth, dbc:The_Lord_of_the_Rings_characters;
                dbo:wikiPageLength ?wikiPageLength.
            OPTIONAL {?character dbp:weapon ?WeaponName}
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

Unnamed: 0,character


In [17]:
endpoint = "http://dbpedia.org/sparql"
q = """
    DESCRIBE ?character 
    WHERE {
            ?character rdfs:label "Aragorn"@en .
            }
    """

df = sparql_dataframe.get(endpoint, q)
df

QueryException: Only SPARQL SELECT queries are supported.
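
The exception shows that `sparql_dataframe.get` only handles `SELECT` queries. A `DESCRIBE` query returns an RDF graph rather than a table, so one option is to send it via the SPARQL Protocol and ask for an RDF serialization. The following is a sketch with the standard library only; `describe_request` and `run` are hypothetical helpers, and it is an assumption that the endpoint can answer with `text/turtle`.

```python
import urllib.parse
import urllib.request

def describe_request(endpoint, query, media_type="text/turtle"):
    """Build a SPARQL Protocol GET request that asks for an RDF serialization."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": media_type})

def run(request):
    """Send the request (needs network access) and return the response text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")

q = """
    DESCRIBE ?character
    WHERE {
            ?character rdfs:label "Aragorn"@en .
            }
    """
request = describe_request("https://dbpedia.org/sparql", q)
# turtle_text = run(request)  # uncomment to query the live endpoint
```

SPARQLWrapper or RDFlib, which were installed together with `sparql-dataframe`, are alternatives for working with the returned graph.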