# 3. All about SPARQL

This module covers the main parts of the SPARQL query language.

---

## Finding out all about SPARQL

We will review the SPARQL documents, in particular the Query Language:

* <https://www.w3.org/TR/sparql11-query/>
    * also <https://www.w3.org/TR/sparql12-query/>

We will cover:

1. The multiple SPARQL specifications
2. Property Paths
3. Assignment
4. Aggregation, ORDER BY & LIMIT
5. GRAPH
6. Functions
    * SPARQL 1.1 Functions
7. DESCRIBE, CONSTRUCT & INSERT



## 3.1. The multiple SPARQL specifications

1. SPARQL Query Language - 1.1 & 1.2
2. SPARQL Update


## 3.2. Property Paths

```mermaid
flowchart LR
    d["Dataset A"]
    px["Person X"]
    d --prov:qualifiedAttribution--> qa1
    qa1 --prov:agent--> px    
    qa1 --prov:hadRole--> dr:custodian
```

There are two property paths here:

```mermaid
flowchart LR
    d --prov:qualifiedAttribution / prov:agent--> px
```

and

```mermaid
flowchart LR
    d --prov:qualifiedAttribution / prov:hadRole--> dr:custodian
```

and remember the person's name is a further step:

```mermaid
flowchart LR
    d["Dataset A"]
    px["Person X"]
    d --prov:qualifiedAttribution / prov:agent / schema:name--> px
```
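
As a rough mental model (not how any particular engine implements it), a sequence path can be read as following one predicate after another. The sketch below uses plain Python tuples rather than an RDF library; the node identifier `_:qa1` and the helper `follow_sequence` are hypothetical names chosen for illustration.

```python
# Triples as (subject, predicate, object) tuples mirroring the diagrams above.
# "_:qa1" stands in for the qualified attribution node.
TRIPLES = {
    ("ex:a", "prov:qualifiedAttribution", "_:qa1"),
    ("_:qa1", "prov:agent", "people:px"),
    ("_:qa1", "prov:hadRole", "dr:custodian"),
    ("people:px", "schema:name", "Person X"),
}


def follow_sequence(triples, start, predicates):
    """Follow each predicate in turn from `start`; return the nodes reached."""
    current = {start}
    for predicate in predicates:
        current = {o for (s, p, o) in triples if s in current and p == predicate}
    return current


agents = follow_sequence(TRIPLES, "ex:a", ["prov:qualifiedAttribution", "prov:agent"])
names = follow_sequence(
    TRIPLES, "ex:a", ["prov:qualifiedAttribution", "prov:agent", "schema:name"]
)
print(agents)  # the agent node
print(names)   # the agent's name, one step further
```

A real engine keeps duplicate results for sequence paths; the sets used here drop them for brevity.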

```python
from IPython.display import display, Markdown
from kurra.sparql import query
from kurra.utils import render_sparql_result


def table_print(r):
    display(Markdown(render_sparql_result(r)))


rdf_bn = """
PREFIX dr: <https://linked.data.gov.au/def/data-roles/>
PREFIX ex: <http://example.com/>
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX schema: <https://schema.org/>

ex:a
    a schema:Dataset ;
    schema:name "Dataset A" ;
    prov:qualifiedAttribution [
        prov:agent people:px ;
        prov:hadRole dr:custodian ;
    ] ,
    [
        prov:agent people:py ;
        prov:hadRole dr:rightsHolder
    ] ;
.

people:px
    a schema:Person ;
    schema:name "Person X" ;
.

people:py
    a schema:Person ;
    schema:name "Person Y" ;
.
"""

# the ex: prefix must be declared in the query as well as in the data
q = """
PREFIX dr: <https://linked.data.gov.au/def/data-roles/>
PREFIX ex: <http://example.com/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX schema: <https://schema.org/>

SELECT ?name
WHERE {
    ex:a prov:qualifiedAttribution/prov:agent/schema:name ?name .
}
"""
```

```python
r = query(rdf_bn, q)
table_print(r)
```

| name |
| --- |
| Person X |
| Person Y |


### 3.2.1 QALI Example

A QALI Property Path query that gets all the values for all the parts of an Address:

```sparql
PREFIX schema: <https://schema.org/>
PREFIX addr: <https://linked.data.gov.au/def/addr/>

SELECT *
WHERE {
  GRAPH ?g {
    <https://linked.data.gov.au/dataset/qld-addr/address/65cb1e52-fc1d-5dee-a2d2-ea7882d12c7e>
      schema:hasPart/schema:value ?v
  }
}
```

Try it at <https://training.cam.kurrawong.ai>...

### 3.2.2 More Path expressions

As per <https://www.w3.org/TR/sparql12-query/#pp-language>.


#### Sequence & Inverse

Operator | Name | Description
--- | --- | ---
`/` | Sequence | `x/y/z` is `x` then `y` then `z`
`^` | Inverse | reverse direction

```sparql
PREFIX schema: <https://schema.org/>
SELECT ?a ?v
WHERE {
  GRAPH ?g {
    BIND (<https://linked.data.gov.au/dataset/qld-addr/address/65cb1e52-fc1d-5dee-a2d2-ea7882d12c7e> AS ?a)
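    # Property Path (sequence): uncomment to use instead of the inverse form
    # ?a schema:hasPart/schema:value ?v .

    # Property inverse: same results, walking schema:hasPart backwards
    ?p ^schema:hasPart ?a .
    ?p schema:value ?v .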
  }
}
```

#### Alternative & Negation

Operator | Name | Description
--- | --- | ---
`\|` | Alternative | `a\|b` is `a` or `b`
`!` | Negation | `!x` is anything but `x`

```sparql
PREFIX addr: <https://linked.data.gov.au/def/addr/>
PREFIX schema: <https://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?v ?lbl
WHERE {
  GRAPH ?g {
    VALUES ?v {
      <https://linked.data.gov.au/dataset/qld-addr/road-label/QLDRBAR1531342627625024590>
      <https://linked.data.gov.au/dataset/qld-addr/gn/45306>
      <https://sws.geonames.org/2152274/>
      <https://sws.geonames.org/2077456/>
    }

    # Alternative
    ?v schema:name|skos:prefLabel ?lbl .

    # Negation
    # NOTE: with both patterns active, only values of ?lbl that satisfy both
    # are returned; comment one out to see each on its own
    ?v !skos:prefLabel ?lbl .
  }
}
```
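
As a hedged illustration of what these three operators select, here is a plain-Python model over a handful of made-up triples. The `ex:` identifiers and the helper names `alternative`, `negated` and `inverse` are hypothetical and are not part of any SPARQL library.

```python
# Made-up triples, loosely shaped like the address data used in this module.
TRIPLES = {
    ("ex:addr1", "schema:hasPart", "ex:part1"),
    ("ex:part1", "schema:value", "ex:place1"),
    ("ex:place1", "schema:name", "Springwood"),
    ("ex:place1", "skos:prefLabel", "Springwood (locality)"),
    ("ex:place1", "schema:identifier", "45306"),
}


def alternative(triples, subject, predicates):
    """Model of `subject p1|p2 ?o`: objects reached via any listed predicate."""
    return {o for (s, p, o) in triples if s == subject and p in predicates}


def negated(triples, subject, excluded):
    """Model of `subject !p ?o`: objects reached via any predicate not excluded."""
    return {o for (s, p, o) in triples if s == subject and p not in excluded}


def inverse(triples, node, predicate):
    """Model of `node ^p ?x`: nodes that point at `node` via `predicate`."""
    return {s for (s, p, o) in triples if o == node and p == predicate}


labels = alternative(TRIPLES, "ex:place1", {"schema:name", "skos:prefLabel"})
not_pref = negated(TRIPLES, "ex:place1", {"skos:prefLabel"})
owners = inverse(TRIPLES, "ex:part1", "schema:hasPart")
print(labels, not_pref, owners)
```

Note that negation returns every other predicate's values, including the identifier, which is why it is rarely used on its own to fetch labels.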

#### Path operators

Operator | Name | Description
--- | --- | ---
`+` | One or more | Path of `x` or `x/x` or `x/x/x`...
`*` | Zero or more | Path of None, `x` or `x/x` or `x/x/x`...
`?` | One or Zero | None or `x`
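
One way to picture these three operators is as graph reachability. The following is a minimal sketch in plain Python over a small hypothetical family tree; the function names `one_or_more`, `zero_or_more` and `zero_or_one` are illustrative only and say nothing about how a SPARQL engine is implemented.

```python
# Child -> parents edges for schema:parent; the ex: names are illustrative.
PARENT = {
    "ex:mickie": {"ex:nick"},
    "ex:nick": {"ex:george"},
    "ex:george": {"ex:miko"},
    "ex:miko": {"ex:ivan"},
}


def one_or_more(edges, start):
    """Model of `start p+ ?x`: nodes reachable in one or more steps."""
    reached = set()
    frontier = list(edges.get(start, ()))
    while frontier:
        node = frontier.pop()
        if node not in reached:
            reached.add(node)
            frontier.extend(edges.get(node, ()))
    return reached


def zero_or_more(edges, start):
    """Model of `start p* ?x`: the start node plus everything `p+` reaches."""
    return {start} | one_or_more(edges, start)


def zero_or_one(edges, start):
    """Model of `start p? ?x`: the start node plus its direct neighbours."""
    return {start} | set(edges.get(start, ()))


ancestors = one_or_more(PARENT, "ex:nick")
print(sorted(ancestors))
```

The key difference to remember is that `*` and `?` include the starting node itself, while `+` does not (unless the data loops back to it).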

```python
from rdflib import Graph

rdf_data = """
    PREFIX ex: <http://example.com/>
    PREFIX schema: <https://schema.org/>

    ex:nick
        a schema:Person ;
        schema:parent ex:george ;
    .

    ex:george
        a schema:Person ;
        schema:parent ex:miko ;
    .

    ex:miko
        a schema:Person ;
        schema:parent ex:ivan ;
    .

    ex:ivan
        a schema:Person ;
    .

    ex:mickie
        a schema:Person ;
        schema:parent ex:nick ;
    .
    """

g = Graph().parse(data=rdf_data, format="turtle")

# how many triples?
print(len(g))
```

```python
# Parent of nick
q = """
    PREFIX ex: <http://example.com/>
    PREFIX schema: <https://schema.org/>

    SELECT ?p
    WHERE {
        ex:nick schema:parent ?p .
    }
    """

r = query(rdf_data, q)
table_print(r)
```

```python
# All ancestors of nick
q = """
    PREFIX ex: <http://example.com/>
    PREFIX schema: <https://schema.org/>

    SELECT ?p
    WHERE {
        ex:nick schema:parent+ ?p .
    }
    """

r = query(rdf_data, q)
table_print(r)
```

```python
# nick and all his ancestors
q = """
    PREFIX ex: <http://example.com/>
    PREFIX schema: <https://schema.org/>

    SELECT ?p
    WHERE {
        ex:nick schema:parent* ?p
    }
    """

r = query(rdf_data, q)
table_print(r)
```

```python
# nick's parent and grandparent
# (schema:parent, then optionally one more schema:parent step)
q = """
    PREFIX ex: <http://example.com/>
    PREFIX schema: <https://schema.org/>

    SELECT ?p
    WHERE {
        ex:nick schema:parent/schema:parent? ?p
    }
    """

r = query(rdf_data, q)
table_print(r)
```

There are more... but that's enough! See <https://www.w3.org/TR/sparql12-query/#pp-language>

## 3.3. Assignment

Assigning values to variables in queries.

* BIND
* VALUES

### BIND

Assigning a static value or the result of a calculation to a variable.

For static values, see the **Sequence & Inverse section**, above.

For results of calculation:

```python
from IPython.display import display, Markdown
from kurra.sparql import query
from kurra.utils import render_sparql_result


def table_print(r):
    display(Markdown(render_sparql_result(r)))


rdf = """
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

people:nick
    a schema:Person ;
    schema:name "Nick" ;
    schema:age 42 ;
.
"""

q = """
PREFIX schema: <https://schema.org/>

SELECT ?age ?ageInMonths
WHERE {
    ?p schema:age ?age .

    BIND ((?age*12) AS ?ageInMonths)
}
"""

r = query(rdf, q)
table_print(r)
```

### VALUES

Assigning multiple static values (IRIs or literals) to one or more variables. Unlike `BIND`, `VALUES` takes only constants, not the results of calculations.

Literals example:

```python
from IPython.display import display, Markdown
from kurra.sparql import query
from kurra.utils import render_sparql_result


def table_print(r):
    display(Markdown(render_sparql_result(r)))


rdf = """
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

people:nick
    a schema:Person ;
    schema:name "Nick" ;
    schema:age 42 ;
.

people:george
    a schema:Person ;
    schema:name "George" ;
    schema:age 70 ;
.

people:cathy
    a schema:Person ;
    schema:name "Cathy" ;
    schema:age 68 ;
.
"""

q = """
PREFIX schema: <https://schema.org/>

SELECT ?p
WHERE {
    VALUES ?name {
        "Nick"
        "Bob"
    }

    ?p schema:name ?name .
}
"""

r = query(rdf, q)
table_print(r)
```

For an IRIs example, see the **Alternative & Negation section**, above.
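
A useful way to think about `VALUES` is as a small inline table that is joined with the rest of the pattern. The sketch below models that join in plain Python; the row dictionaries and the `join` helper are assumptions made for illustration, not an engine's real data structures.

```python
# Solutions of the pattern `?p schema:name ?name`, one dict per row.
pattern_rows = [
    {"p": "people:nick", "name": "Nick"},
    {"p": "people:george", "name": "George"},
    {"p": "people:cathy", "name": "Cathy"},
]

# The inline table from `VALUES ?name { "Nick" "Bob" }`.
values_rows = [{"name": "Nick"}, {"name": "Bob"}]


def join(left, right):
    """Keep merged rows whose shared variables have equal values."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[v] == r_row[v] for v in shared):
                joined.append({**l_row, **r_row})
    return joined


result = join(values_rows, pattern_rows)
print([row["p"] for row in result])
```

`"Bob"` matches nothing in the data, so it simply drops out of the result rather than causing an error.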

## 3.4. Aggregation, ORDER BY & LIMIT

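`ORDER BY` and `LIMIT` work just like their SQL counterparts, as do aggregates such as `COUNT`, `SUM` and `AVG` (with `GROUP BY`).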

```python
from IPython.display import display, Markdown
from kurra.sparql import query
from kurra.utils import render_sparql_result


def table_print(r):
    display(Markdown(render_sparql_result(r)))


rdf = """
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

people:nick
    a schema:Person ;
    schema:name "Nick" ;
    schema:age 42 ;
.

people:george
    a schema:Person ;
    schema:name "George" ;
    schema:age 70 ;
.

people:cathy
    a schema:Person ;
    schema:name "Cathy" ;
    schema:age 68 ;
.
"""

# the two oldest people
q = """
PREFIX schema: <https://schema.org/>

SELECT ?p
WHERE {
    ?p
        a schema:Person ;
        schema:age ?age ;
    .
}
ORDER BY DESC(?age)
LIMIT 2
"""

r = query(rdf, q)
table_print(r)
```
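
The cell above demonstrates `ORDER BY` and `LIMIT` but no aggregate. As a sketch of what the aggregate functions compute, the plain-Python below works over the same three ages, and `q_agg` is an aggregate query that should be equivalent. The variable names `?people`, `?meanAge` and `?oldest` are my own choices, and the query string is not executed here.

```python
# (person, age) rows matching the data in the cell above.
rows = [("people:nick", 42), ("people:george", 70), ("people:cathy", 68)]

# ORDER BY DESC(?age) LIMIT 2: sort on age, largest first, keep two rows.
oldest_two = [p for p, age in sorted(rows, key=lambda r: r[1], reverse=True)][:2]

# COUNT(?p), AVG(?age) and MAX(?age) over the single implicit group.
ages = [age for _, age in rows]
count = len(ages)
mean_age = sum(ages) / count
max_age = max(ages)

# An aggregate query for the same figures; in this notebook it could be
# passed to query(rdf, q_agg).
q_agg = """
PREFIX schema: <https://schema.org/>

SELECT (COUNT(?p) AS ?people) (AVG(?age) AS ?meanAge) (MAX(?age) AS ?oldest)
WHERE {
    ?p
        a schema:Person ;
        schema:age ?age ;
    .
}
"""

print(oldest_two, count, mean_age, max_age)
```

With no `GROUP BY`, all matching rows form one group, so the query returns a single row of totals.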

## 3.5. GRAPH

Selecting data from only a portion of a database - one graph amongst many.

Most modern RDF DBs store multiple **GRAPH**s of data:

&nbsp; | &nbsp; | &nbsp; | &nbsp; 
--- | --- | --- | --- 
`subject` | `predicate` | `object` | `graph`

`graph` then is just another filter/subset...

```sparql
SELECT * 
WHERE {
    GRAPH ?g {
        ?s ?p ?o
    }
}
```

> **NOTE**: RDF DBs are _sometimes_ configured to search all GRAPHs when none is specified, but not always. If you get no results, try the query both with and without `GRAPH`.
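
To make the "fourth column" idea concrete, here is a hedged plain-Python model of a quad store. The graph names and the `union_default` switch are hypothetical; real databases expose this behaviour through their own configuration settings.

```python
# Quads as (subject, predicate, object, graph); graph names are hypothetical.
QUADS = [
    ("ex:addr1", "schema:hasPart", "ex:part1", "ex:graph-addresses"),
    ("ex:part1", "schema:value", "ex:place1", "ex:graph-addresses"),
    ("ex:place1", "schema:name", "Springwood", "ex:graph-places"),
]


def in_graph(quads, graph):
    """Model of `GRAPH <graph> { ?s ?p ?o }`: triples stored in one named graph."""
    return [(s, p, o) for (s, p, o, g) in quads if g == graph]


def default_graph(quads, union_default):
    """Model of a query with no GRAPH clause.

    With `union_default` the store treats all named graphs together as the
    default graph; without it, only data loaded with no graph name is visible,
    and this sample has none.
    """
    return [(s, p, o) for (s, p, o, g) in quads] if union_default else []


print(in_graph(QUADS, "ex:graph-places"))
print(len(default_graph(QUADS, union_default=True)))
print(len(default_graph(QUADS, union_default=False)))
```

This is why the same query can return everything on one server and nothing on another.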

### A complex example using all of the above

```sparql
# Gets all the values of the parts of an Address and their labels
PREFIX addr: <https://linked.data.gov.au/def/addr/>
PREFIX schema: <https://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?v ?lbl
WHERE {
  BIND (<https://linked.data.gov.au/dataset/qld-addr/address/65cb1e52-fc1d-5dee-a2d2-ea7882d12c7e> AS ?a)
  {
    # if the Address Part is...
    GRAPH ?gAddr {
      # ... a Literal, bind its value to ?lbl
      ?a schema:hasPart/schema:value ?v .

      FILTER ISLITERAL(?v)
      BIND (?v AS ?lbl)
    }
  }
  UNION
  {
    # ...not a Literal, get its label for ?lbl from other graphs
    # and allow either schema:name or skos:prefLabel
    GRAPH ?gAddr2 {
      ?a schema:hasPart/schema:value ?v .
    }
    GRAPH ?gOther {
      ?v schema:name|skos:prefLabel ?lbl .
    }
  }
}
```

Run it at <https://training.cam.kurrawong.ai/#/dataset/qali/query>, then work through each part of the query:

#### Each query part

##### BIND

`BIND (<https://linked.data.gov.au/dataset/qld-addr/address/65cb1e52-fc1d-5dee-a2d2-ea7882d12c7e> AS ?a)`

Assigns the long IRI to a variable, `?a`, for reuse.

##### GRAPH

The Address data and the GeoNames & Roads data are in different graphs.

`GRAPH ?gAddr {...}` and `GRAPH ?gAddr2 {...}` match the Address data.

`GRAPH ?gOther {...}` is needed for the labels of GeoNames & Roads.

##### Property Paths

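`?a schema:hasPart/schema:value ?v .` is a sequence path: from the Address to each part, then to that part's value.

`?v schema:name|skos:prefLabel ?lbl .` is an alternative path: either predicate can supply the label.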

## 3.6. Functions

SPARQL [defines](https://www.w3.org/TR/sparql12-query/#SparqlOps) a long list of in-built functions. Above we have used `ISLITERAL(...)`.

Here are a few function examples:

```python
# generate a UUID URN
rdf = "PREFIX : <http://example.com/> :a :b :c ."
q = """
    SELECT (UUID() AS ?uuid)
    WHERE {
      ?s ?p ?o
    }
    LIMIT 1
    """
r = query(rdf, q)
table_print(r)
```

```python
# cast it to a string literal with STR()
rdf = "PREFIX : <http://example.com/> :a :b :c ."
q = """
    SELECT (STR(UUID()) AS ?uuid)
    WHERE {
      ?s ?p ?o
    }
    LIMIT 1
    """
r = query(rdf, q)
table_print(r)
```

```python
# split the string to tidy with STRAFTER()
# or get the bare UUID string directly with STRUUID()
rdf = "PREFIX : <http://example.com/> :a :b :c ."
q = """
    SELECT ?uuid ?str_uuid
    WHERE {
      ?s ?p ?o .

      BIND (STRAFTER(STR(UUID()), "uuid:") AS ?uuid)

      BIND (STRUUID() AS ?str_uuid)
    }
    LIMIT 1
    """
r = query(rdf, q)
table_print(r)
```
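
For readers more at home in Python, the string handling in these cells has close standard-library analogues. The sketch below is only an analogy: `sparql_strafter` is a hypothetical helper written to mimic `STRAFTER`, and `uuid.uuid4().urn` stands in for `STR(UUID())`.

```python
import uuid


def sparql_strafter(text, separator):
    """Model of STRAFTER: text after the first `separator`, or "" if absent."""
    if separator == "":
        return text
    _, found, tail = text.partition(separator)
    return tail if found else ""


urn = uuid.uuid4().urn                 # like STR(UUID()): "urn:uuid:..."
bare = sparql_strafter(urn, "uuid:")   # like STRUUID(): just the UUID part
print(urn)
print(bare)
```

Every call generates a fresh UUID, both here and in SPARQL, which is why `?uuid` and `?str_uuid` in the last cell hold different values.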

### Literal filters

`CONTAINS(...)` will check to see if a literal contains a substring:

`CONTAINS("Springwood", "wood")` --> True  
`CONTAINS("Queensland", "wood")` --> False

> WARNING: The order of applying filters like CONTAINS is really important!

#### Ugly

```sparql
PREFIX addr: <https://linked.data.gov.au/def/addr/>
PREFIX schema: <https://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?a ?lbl
WHERE {
  GRAPH ?g {
    ?a 
      a addr:Address ;
      schema:hasPart/schema:value ?v ;
    .
  }
  GRAPH ?gOther {
    ?v schema:name|skos:prefLabel ?lbl .
  }
  
  FILTER CONTAINS(?lbl, "wood")  
}
LIMIT 100
```

The `GRAPH` patterns get _ALL_ addresses, values & labels (2.5M * x * y), _THEN_ the `FILTER` is applied, _THEN_ the `LIMIT`.

#### Bad

```sparql
PREFIX addr: <https://linked.data.gov.au/def/addr/>
PREFIX schema: <https://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?a ?lbl
WHERE {
  GRAPH ?g {
    ?a 
      a addr:Address ;
      schema:hasPart/schema:value ?v ;
    .
  }
  GRAPH ?gOther {
    ?v schema:name|skos:prefLabel ?lbl .

    FILTER CONTAINS(?lbl, "wood") 
  } 
}
LIMIT 100
```

The first `GRAPH` pattern gets _ALL_ addresses, then the second `GRAPH` pattern gets all the labels for all addresses, _THEN_ the `FILTER` is applied, _THEN_ the `LIMIT`.

#### Good

```sparql
PREFIX addr: <https://linked.data.gov.au/def/addr/>
PREFIX schema: <https://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?a ?lbl
WHERE {
  {
    SELECT *
    WHERE {
      GRAPH ?g {
        ?a 
          a addr:Address ;
          schema:hasPart/schema:value ?v ;
        .
      }
      GRAPH ?gOther {
        ?v schema:name|skos:prefLabel ?lbl .
      }
    }
    LIMIT 100
  }
  FILTER CONTAINS(?lbl, "wood")  
}
```

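The subquery gets 100 Addresses & parts & labels _THEN_ the `FILTER` is applied. Note the trade-off: only those 100 rows are tested, so this query may return fewer than 100 matches, or none at all.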
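
The difference between filtering before and after `LIMIT` affects results as well as speed. Here is a minimal plain-Python sketch with a made-up list of labels; the function names are hypothetical and the `in` test stands in for `CONTAINS`.

```python
from itertools import islice

# Made-up labels standing in for the ?lbl values a query would find.
LABELS = ["Brisbane", "Springwood", "Toowoomba", "Woodridge", "Redwood", "Cairns"]


def filter_then_limit(labels, fragment, limit):
    """FILTER before LIMIT: keep scanning until `limit` matches are found."""
    matches = (label for label in labels if fragment in label)
    return list(islice(matches, limit))


def limit_then_filter(labels, fragment, limit):
    """LIMIT in a subquery, FILTER outside: only the first `limit` rows are tested."""
    return [label for label in labels[:limit] if fragment in label]


# Like CONTAINS, the `in` test is case-sensitive, so "Woodridge" does not match.
print(filter_then_limit(LABELS, "wood", 2))
print(limit_then_filter(LABELS, "wood", 2))
```

The first form does more work but fills the limit when enough matches exist; the second does a fixed amount of work but can miss matches that lie beyond the limit.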

> NOTE: we will look at text indexes in Module 4

## 3.7. DESCRIBE

Simple but powerful!

Gets the edges & nodes around a selected node. Exactly what comes back is up to the server: typically the node's outbound edges, and often the inbound ones too.

Static value:

```sparql
# Three separate queries: run each DESCRIBE on its own
DESCRIBE <https://linked.data.gov.au/dataset/qld-addr/address/65cb1e52-fc1d-5dee-a2d2-ea7882d12c7e>

DESCRIBE <https://linked.data.gov.au/def/addr-part-types/countryName>

DESCRIBE <https://linked.data.gov.au/def/addr-part-types>
```

Selected value:

```sparql
PREFIX schema: <https://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

DESCRIBE ?v
WHERE {
  GRAPH ?g {
    BIND (<https://linked.data.gov.au/dataset/qld-addr/address/65cb1e52-fc1d-5dee-a2d2-ea7882d12c7e> AS ?a)
    
    ?a
      schema:hasPart/schema:value ?v ;
    .
  }
}
```
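
As a rough model of one common `DESCRIBE` behaviour, the sketch below collects every triple that touches a node. This is an assumption about a typical server, since the SPARQL specification leaves the exact result to the implementation; the `describe` helper and the `ex:` data are hypothetical.

```python
# Made-up triples around a single address part.
TRIPLES = {
    ("ex:addr1", "schema:hasPart", "ex:part1"),
    ("ex:part1", "schema:value", "ex:place1"),
    ("ex:place1", "schema:name", "Springwood"),
}


def describe(triples, node, include_inbound=True):
    """Return triples with `node` as subject, plus those with it as object if asked."""
    outbound = {t for t in triples if t[0] == node}
    inbound = {t for t in triples if t[2] == node} if include_inbound else set()
    return outbound | inbound


print(sorted(describe(TRIPLES, "ex:part1")))
print(sorted(describe(TRIPLES, "ex:part1", include_inbound=False)))
```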

## 3.8. CONSTRUCT

`CONSTRUCT` returns a graph rather than a table.

`CONSTRUCT` is frequently used to return an RDF subgraph of a larger graph. APIs often use it; people less so.

```python
from rdflib import Graph


# a pretty-print function
def construct_print(rdf, q):
    # parse the RDF passed in, keeping only the prefixes declared in the data
    g = Graph(bind_namespaces="none").parse(data=rdf, format="turtle")
    x = g.query(q)
    # copy the data's prefixes onto the result graph for tidy output
    for prefix, namespace in g.namespaces():
        x.graph.bind(prefix, namespace)

    print(x.serialize(format="turtle").decode())
```

```python
rdf_data = """
    PREFIX ex: <http://example.com/>
    PREFIX schema: <https://schema.org/>

    ex:nick
        a schema:Person ;
        schema:parent ex:george ;
    .

    ex:george
        a schema:Person ;
        schema:parent ex:miko ;
    .

    ex:miko
        a schema:Person ;
        schema:parent ex:ivan ;
    .

    ex:ivan
        a schema:Person ;
    .

    ex:mickie
        a schema:Person ;
        schema:parent ex:nick ;
    .
    """

# just get all the people, in graph form
q = """
    PREFIX ex: <http://example.com/>
    PREFIX schema: <https://schema.org/>

    CONSTRUCT {
        ?p a schema:Person .
    }
    WHERE {
        ?p a schema:Person .
    }
    """

construct_print(rdf_data, q)
```

```python
# Create a new predicate: grandparent = parent + parent
# and return all values in a graph
q = """
    PREFIX ex: <http://example.com/>
    PREFIX schema: <https://schema.org/>

    CONSTRUCT {
        ?p1 ex:grandParent ?p2
    }
    WHERE {
        ?p1
            a schema:Person ;
            schema:parent/schema:parent ?p2 ;
        .
    }
    """

construct_print(rdf_data, q)
```
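
Conceptually, `CONSTRUCT` fills in its template once for every solution of the `WHERE` clause. The plain-Python sketch below mimics the grandparent query over the same parent links; `construct_grandparents` is a hypothetical helper for illustration, not part of rdflib or kurra.

```python
# The schema:parent links from the data above, as plain tuples.
PARENT_TRIPLES = [
    ("ex:mickie", "schema:parent", "ex:nick"),
    ("ex:nick", "schema:parent", "ex:george"),
    ("ex:george", "schema:parent", "ex:miko"),
    ("ex:miko", "schema:parent", "ex:ivan"),
]


def construct_grandparents(triples):
    """Fill the template `?p1 ex:grandParent ?p2` once per WHERE solution."""
    parents = [(s, o) for (s, p, o) in triples if p == "schema:parent"]
    return {
        (child, "ex:grandParent", grandparent)
        for (child, parent) in parents
        for (middle, grandparent) in parents
        if parent == middle
    }


for triple in sorted(construct_grandparents(PARENT_TRIPLES)):
    print(triple)
```

The result is a set of new triples, which is why the output of `CONSTRUCT` can be loaded straight back into a graph.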