# 2. Querying RDF - SPARQL

This module applies simple SPARQL queries to simple data.

Some pre-built Python functions are used to run the query and present neat results:

* `query()` - the [kurra]() toolkit's general-purpose query function that works with inline RDF data or databases
* `table_print()` - a function that lets Jupyter Notebooks render a SPARQL query result in Markdown nicely

---

## 2.1. Revision: Running a basic query

The Turtle data from Notebook 1 was:

```turtle
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

people:nick
    a
        schema:Person ,
        schema:Patient ;
    schema:name "Nick" ;
    schema:age 42 ;
    schema:parent people:george ;
.

people:george
    a schema:Person ;
    schema:name "George" ;
    schema:age 70 ;
.
```

Here there are two people, `people:nick` and `people:george`. To find all the people with age greater than 50 (just George), we can query the data like this:

```sparql
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

SELECT ?p
WHERE {
    ?p
        a schema:Person ;
        schema:age ?age ;
    .

    FILTER (?age > 50)
}
```

This part matches a "subgraph":

```
    ?p
        a schema:Person ;
        schema:age ?age ;
    .
```

where `?p` & `?age` are variables and `a`, `schema:Person` & `schema:age` are all fixed values.

This part filters all the matched subgraphs:

```FILTER (?age > 50)```
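
To make those two steps concrete, here is a minimal pure-Python sketch of "match a subgraph, then filter the matches". It is only an illustration of the idea, not how kurra or any real SPARQL engine is implemented; the tuple representation and the `match_pattern` helper are assumptions made for this example.

```python
# A toy model of SPARQL evaluation: match a pattern, then filter the matches.
# Plain Python only; real engines are far more sophisticated.

SCHEMA = "https://schema.org/"
PEOPLE = "https://linked.data.gov.au/dataset/people/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # what `a` abbreviates

# the Notebook 1 data as (subject, predicate, object) tuples
triples = {
    (PEOPLE + "nick", RDF_TYPE, SCHEMA + "Person"),
    (PEOPLE + "nick", RDF_TYPE, SCHEMA + "Patient"),
    (PEOPLE + "nick", SCHEMA + "name", "Nick"),
    (PEOPLE + "nick", SCHEMA + "age", 42),
    (PEOPLE + "nick", SCHEMA + "parent", PEOPLE + "george"),
    (PEOPLE + "george", RDF_TYPE, SCHEMA + "Person"),
    (PEOPLE + "george", SCHEMA + "name", "George"),
    (PEOPLE + "george", SCHEMA + "age", 70),
}


def match_pattern(data):
    """Yield one {variable: value} binding per subgraph matching
    `?p a schema:Person ; schema:age ?age .` (hypothetical helper)."""
    for s, p, o in data:
        if p == RDF_TYPE and o == SCHEMA + "Person":
            # same subject must also have a schema:age triple
            for s2, p2, age in data:
                if s2 == s and p2 == SCHEMA + "age":
                    yield {"p": s, "age": age}


# step 1: match the subgraph pattern (binds ?p and ?age)
solutions = list(match_pattern(triples))

# step 2: apply FILTER (?age > 50) to every matched subgraph
filtered = [b for b in solutions if b["age"] > 50]

for b in sorted(filtered, key=lambda b: b["p"]):
    print(b["p"], b["age"])
```

The pattern matches both Nick and George; only George survives the filter.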

Let's really run this:

In [2]:
# importing some things we need
from IPython.display import display, Markdown
from kurra.sparql import query
from kurra.utils import render_sparql_result

# a pretty table printing function
def table_print(r):
    display(Markdown(render_sparql_result(r)))


# our data, in Turtle format
rdf_data = """
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

people:nick
    a
        schema:Person ,
        schema:Patient ;
    schema:name "Nick" ;
    schema:age 42 ;
    schema:parent people:george ;
.

people:george
    a schema:Person ;
    schema:name "George" ;
    schema:age 70 ;
.
"""

# our SPARQL query
q = """
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

SELECT ?p ?name
WHERE {
    ?p
        a schema:Person ;
        schema:name ?name ;
        schema:age ?age ;
    .

    FILTER (?age > 50)
}
"""

In [None]:
# run the query on the data
r = query(rdf_data, q)
table_print(r)

If the data contains two people older than 50 and we want their names and ages:

In [None]:
rdf_data2 = """
PREFIX ex: <http://example.com/>
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

people:nick
    a
        schema:Person ,
        schema:Patient ;
    schema:name "Nick" ;
    schema:age 42 ;
    schema:parent people:george ;
.

people:george
    a schema:Person ;
    schema:name "George" ;
    schema:age 70 ;
    schema:spouse people:cathy ;  # NEW
.

people:cathy
    a schema:Person ;
    schema:name "Cathy" ;
    schema:age 68 ;
    schema:spouse people:george ; # symmetrical
.
"""

q2 = """
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

SELECT ?name ?age
WHERE {
    ?p
        a schema:Person ;
        schema:name ?name ;
        schema:age ?age ;
    .

    FILTER (?age > 50)
}
"""

r = query(rdf_data2, q2)
table_print(r)

## 2.2 Interactive query-making session

For the following data, we will make some queries interactively.

In [13]:
rdf_data3 = """
PREFIX ex: <http://example.com/>
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>

# linking Patient to Person, using information from schema.org
schema:Patient
    rdfs:subClassOf schema:Person ;
.

people:nick
    a schema:Patient ;
    schema:name "Nick" ;
    schema:age 42 ;
    schema:parent people:george ;
.

people:george
    a schema:Person ;
    schema:name "George" ;
    schema:age 70 ;
    ex:sex ex:male ;
    schema:spouse people:cathy ;
    schema:parent
        people:miko ,
        people:vasma ;
.

people:cathy
    a schema:Person ;
    schema:name "Cathy" ;
    ex:sex ex:female ;
    schema:age 68 ;
    schema:spouse people:george ;
.

people:miko
    a schema:Person ;
    schema:name "Miko" ;
    ex:sex ex:male ;
.

people:vasma
    a schema:Person ;
    schema:name "Vasma" ;
    ex:sex ex:female ;
.
"""

Make queries to answer the following:

1. Who are all the `Person` instances?
    * hint: not just those typed as `schema:Person`
2. Find all the persons with a spouse
3. Find all the male spouses
4. Find all the parents of Nick
    * hint: not just those people declared directly as such
5. Find all the grandparents of Nick

In [None]:
# incomplete answer for 1. above
q3 = """
    PREFIX schema: <https://schema.org/>

    SELECT ?p ?name
    WHERE {
        ?p
            a schema:Person ;
            schema:name ?name ;
        .
    }
    ORDER BY ?name
    """

r = query(rdf_data3, q3)
table_print(r)

## 2.3 Querying RDF files

We will now query an RDF file stored on disk rather than data held in a Python variable.

The data in the `rdf_data3` Python variable above is replicated in the file `module-2-files/persons.ttl`, which we can query like this:

In [19]:
# create the query
q = """
    PREFIX schema: <https://schema.org/>

    SELECT ?p ?name
    WHERE {
        ?p
            a schema:Person ;
            schema:name ?name ;
        .
    }
    ORDER BY ?name
    """

# import Python's Path class (from the pathlib module) to refer to files
from pathlib import Path

# query the file, just like querying data
r = query(Path("module-2-files/persons.ttl"), q)
table_print(r)

| p | name |
| --- | --- |
| [cathy](https://linked.data.gov.au/dataset/people/cathy) | Cathy |
| [george](https://linked.data.gov.au/dataset/people/george) | George |
| [miko](https://linked.data.gov.au/dataset/people/miko) | Miko |
| [vasma](https://linked.data.gov.au/dataset/people/vasma) | Vasma |


### 2.3.1 Additional file handling

To query multiple files, load each into an in-memory graph like this:

In [21]:
from rdflib import Graph

g = Graph()
print(f"Created g, g now has {len(g)} triples")
g.parse("module-2-files/persons.ttl")
print(f"Loaded persons.ttl, g now has {len(g)} triples")
g.parse("module-2-files/schema-org-bits.ttl")
print(f"Loaded schema-org-bits.ttl, g now has {len(g)} triples")

q = """
    PREFIX schema: <https://schema.org/>

    SELECT ?p ?name
    WHERE {
        VALUES ?t {
            schema:Person
            schema:Patient
        }

        ?p
            a ?t ;
            schema:name ?name ;
        .
    }
    ORDER BY ?name
    """

r = query(g, q)  # querying a Graph object
table_print(r)

Created g, g now has 0 triples
Loaded persons.ttl, g now has 23 triples
Loaded schema-org-bits.ttl, g now has 24 triples


| p | name |
| --- | --- |
| [cathy](https://linked.data.gov.au/dataset/people/cathy) | Cathy |
| [george](https://linked.data.gov.au/dataset/people/george) | George |
| [miko](https://linked.data.gov.au/dataset/people/miko) | Miko |
| [nick](https://linked.data.gov.au/dataset/people/nick) | Nick |
| [vasma](https://linked.data.gov.au/dataset/people/vasma) | Vasma |


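One more demo: a predicate ("property") path. The `*` after a predicate means "zero or more steps", so the second pattern below matches `schema:Person` itself as well as any of its subclasses: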

In [31]:
q = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX schema: <https://schema.org/>

    SELECT DISTINCT ?p ?name ?cls
    WHERE {
        ?p
            a ?cls ;
            schema:name ?name ;
        .

        ?cls
            rdfs:subClassOf* schema:Person ;
        .
    }
    ORDER BY ?name
    """

r = query(g, q)  # querying a Graph object
table_print(r)

| p | name | cls |
| --- | --- | --- |
| [cathy](https://linked.data.gov.au/dataset/people/cathy) | Cathy | [Person](https://schema.org/Person) |
| [george](https://linked.data.gov.au/dataset/people/george) | George | [Person](https://schema.org/Person) |
| [miko](https://linked.data.gov.au/dataset/people/miko) | Miko | [Person](https://schema.org/Person) |
| [nick](https://linked.data.gov.au/dataset/people/nick) | Nick | [Patient](https://schema.org/Patient) |
| [vasma](https://linked.data.gov.au/dataset/people/vasma) | Vasma | [Person](https://schema.org/Person) |
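
The "zero or more" behaviour of the `*` path can be sketched in plain Python as a reachability check over subclass links. This is a rough illustration under simplifying assumptions (prefixed names as plain strings, one type per person); the `reaches` helper is hypothetical and not part of kurra or rdflib.

```python
# Toy evaluation of a "zero or more subclass steps" path ending at schema:Person.

# direct subclass links: class -> set of its direct superclasses
subclass_of = {
    "schema:Patient": {"schema:Person"},
}

# each person's declared type, as in the data above
types = {
    "people:nick": "schema:Patient",
    "people:george": "schema:Person",
    "people:cathy": "schema:Person",
    "people:miko": "schema:Person",
    "people:vasma": "schema:Person",
}


def reaches(cls, target, edges):
    """True if `target` is reachable from `cls` in zero or more steps."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current == target:  # zero steps also counts as a match
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(edges.get(current, ()))
    return False


persons = sorted(
    p for p, cls in types.items() if reaches(cls, "schema:Person", subclass_of)
)
print(persons)
```

Nick is included because `schema:Patient` reaches `schema:Person` in one step; everyone else matches in zero steps.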


## 2.4. Running a query on a remote DB

In this section we run a query for 10 Concepts in the _Seabed geomorphology - Part 1 Morphology_ vocabulary, hosted on the KurrawongAI demo server. First, some background.

SPARQL consists not only of a query language, but also:

* **update extension** - how to perform write queries and manipulate whole datasets
* **service description** - how to describe the capabilities of a SPARQL DB
* **protocol** - defining how queries are to be sent to remote servers
* **several results formats**

Each SPARQL specification document links to the others; see <https://www.w3.org/TR/sparql12-query/#related>.
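
As an illustration of one of the results formats, the sketch below parses a small document in the SPARQL JSON results format using only Python's standard library. The sample is hand-written for this example (it is not real server output), and helpers such as `render_sparql_result()` may well process results differently.

```python
import json

# a hand-written sample in the SPARQL JSON results format (not real server output)
sample = """
{
  "head": {"vars": ["p", "name"]},
  "results": {
    "bindings": [
      {
        "p": {"type": "uri", "value": "https://linked.data.gov.au/dataset/people/george"},
        "name": {"type": "literal", "value": "George"}
      }
    ]
  }
}
"""

doc = json.loads(sample)

# "head.vars" lists the SELECT variables, in order
variables = doc["head"]["vars"]

# each binding maps a variable name to a {"type": ..., "value": ...} object;
# a variable may be absent from a binding if it was unbound in that row
rows = [
    {var: binding[var]["value"] for var in variables if var in binding}
    for binding in doc["results"]["bindings"]
]
print(rows)
```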

Having a defined _protocol_ allows us to know how to interact with any DB claiming to conform to SPARQL.

We do this via HTTP (web) requests.
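
Under the SPARQL Protocol, a query can be sent as an HTTP GET request with the query text in a `query` URL parameter. The sketch below builds such a request with Python's standard library without sending it. The endpoint URL and the `build_sparql_get` helper are placeholders assumed for this example; kurra's `query()` may build its requests differently.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint, for illustration only


def build_sparql_get(endpoint, sparql):
    """Build (but do not send) a SPARQL Protocol GET request."""
    # the query text is URL-encoded into the `query` parameter
    url = endpoint + "?" + urlencode({"query": sparql})
    # the Accept header asks for the JSON results format
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_get(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 3")
print(req.get_method())
print(req.full_url)

# to actually send it (requires network access and a real endpoint):
# from urllib.request import urlopen
# with urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```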

Let's just get on and perform a SPARQL query using a publicly available "SPARQL Endpoint":

In [None]:
# importing some Python things we need
from IPython.display import display, Markdown
from kurra.sparql import query
from kurra.utils import render_sparql_result

# a pretty table printing function
def table_print(r):
    display(Markdown(render_sparql_result(r)))

# a simple query to list terms in a vocabulary - Seabed geomorphology - Part 1 Morphology
q = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?c ?pl
WHERE {
  ?c
    a skos:Concept ;
    skos:inScheme <https://pid.geoscience.gov.au/def/voc/ga/SeabedGeomorphologyMorphology> ;
    skos:prefLabel ?pl ;
  .
}
ORDER BY ?pl
LIMIT 10
"""

# run the query against the DB
r = query("https://prez.niceforest-128e6d31.australiaeast.azurecontainerapps.io/sparql", q)

# pretty-print the result
table_print(r)

## 2.5. Using a SPARQL Endpoint UI

Now we will try this directly on the DB UI and talk through that interface: 

* <https://demo.dev.kurrawong.ai/sparql>

## 2.6 Using a local SPARQL-enabled DB

We can run a database locally that can respond to SPARQL queries.

There are many DBs that implement the SPARQL standard, and today we will be using GraphDB locally, a free version of which can be downloaded from <https://graphdb.ontotext.com/>.

<div class="alert alert-block alert-info"><b>Tip:</b>
<p>GraphDB is a polished product but it's not open source and fully functional versions are not free. The free version is totally fine for non-enterprise use.</p>
<p>For open source, free alternatives, see <a href="https://jena.apache.org/documentation/fuseki2/">Fuseki</a> and <a href="https://rdf4j.org">RDF4J</a>.</p>
</div>

To be demonstrated with GraphDB:

* Downloading
* Running
* Loading an RDF data file
* Querying
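
Once a local GraphDB is running, it can be queried over the same SPARQL Protocol as a remote endpoint. The sketch below builds (without sending) an HTTP POST request for a local repository, assuming GraphDB's default port of 7200 and a repository named `training`; the repository name and the `build_sparql_post` helper are hypothetical, so substitute your own.

```python
from urllib.request import Request

GRAPHDB_BASE = "http://localhost:7200"  # assumes GraphDB's default port
REPOSITORY_ID = "training"              # hypothetical repository name


def build_sparql_post(repository_id, sparql):
    """Build (but do not send) a SPARQL Protocol POST request."""
    url = f"{GRAPHDB_BASE}/repositories/{repository_id}"
    return Request(
        url,
        data=sparql.encode("utf-8"),  # the query text is the request body
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


req = build_sparql_post(REPOSITORY_ID, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(req.get_method(), req.full_url)

# to send it, with GraphDB running and the repository created:
# from urllib.request import urlopen
# with urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```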

## 2.7 Training Resources

Here are some SPARQL training resources for homework:

1. SPARQL 1.2 Query Language
    * https://www.w3.org/TR/sparql12-query/
    * the main SPARQL resource
    * links to supplementary resources for SPARQL Protocol, Results formats etc. See "Set of Documents" section
2. GraphDB
    * Free version download: https://graphdb.ontotext.com/
    * good to play with, on your desktop
3. _Learning SPARQL_ textbook
    * https://www.learningsparql.com/
    * the most commonly used intro to SPARQL book
4. YouTube: SPARQL Tutorial
    * https://www.youtube.com/playlist?list=PLea0WJq13cnA6k4B6Tr1ljj2nleUl9dZt
    * a 29-part video series on SPARQL
    * uses _Learning SPARQL_
5. YouTube: _SPARQL in 11 minutes_
    * https://www.youtube.com/watch?v=FvGndkpa4K0
6. YouTube: _Stardog Academy Fundamentals: Getting Started with RDF & SPARQL_
    * https://www.youtube.com/watch?v=bDxclRhDb-o
    * a half-hour intro to RDF & SPARQL