# 4. Extending SPARQL

Today's outline:

0. Module 3 revision
1. Custom functions
    1. GeoSPARQL
    2. Timefuncs
    3. Others
2. Text Indexes

## 4.0 Module 3 revision

Before starting this module, revise Module 3.

## 4.1 Custom Functions

The SPARQL language allows for the definition of custom functions. 

While built-in functions are called by keyword, such as `CONTAINS(...)`, custom functions are identified by IRIs, e.g. `ex:myfunc(...)`, and so are used like `FILTER (ex:myfunc(...))`.

* 4.1.A: A widely used SPARQL function extension is [GeoSPARQL](http://www.opengis.net/doc/IS/geosparql/1.1).

* 4.1.B: [RDFLib Time functions](https://github.com/RDFLib/timefuncs) are not widely used but are a good demonstration of how to implement SPARQL extension functions in an RDF toolkit.

* 4.2: Extension functions that leverage text indexes are implemented by most modern RDF databases that support SPARQL queries.

### 4.1.A GeoSPARQL

The [GeoSPARQL Standard](http://www.opengis.net/doc/IS/geosparql/1.1) defines topological and other functions. For example, the 'Simple Features _Within_' function is `<http://www.opengis.net/def/function/geosparql/sfWithin>`, or `geof:sfWithin` when prefixed.

The following query returns all the Features, `?f`, whose name starts with "N" and whose Geometry's coordinates are within "POLYGON((...))".


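```sparql
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX schema: <https://schema.org/>

SELECT *
WHERE {
    ?f
        a geo:Feature ;
        schema:name ?l ;
        geo:hasGeometry/geo:asWKT ?wkt ;
    .

    # Keep only Features whose name starts with "N"
    FILTER STRSTARTS(?l, "N")

    # Keep only Features whose geometry lies within the given polygon
    FILTER geof:sfWithin(?wkt, "POLYGON((...))"^^geo:wktLiteral)
}
```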

All GeoSPARQL examples: <https://docs.ogc.org/is/22-047r1/22-047r1.html#_7bbe8aba-08ea-4910-bca1-bfaa79c79a71>

#### 4.1.A.1 EIA Demo System GeoSPARQL

We will now test some GeoSPARQL functions using the EIA Demonstrator system at https://eia.testing.bdr.gov.au/sparql.

##### 4.1.A.1.1 Hand-created query

A straightforward GeoSPARQL intersection:

```sparql
PREFIX geo: <http://www.opengis.net/ont/geosparql#> 
PREFIX geof: <http://www.opengis.net/def/function/geosparql/> 
PREFIX schema: <https://schema.org/> 

SELECT DISTINCT ?iri ?name
WHERE {
  ?iri a schema:Dataset ; 
  schema:name ?name ;
  geo:hasBoundingBox/geo:asWKT ?wktGeometry ; . 

  BIND ("POLYGON ((148.23641690634832 -35.34799618325066, 148.23641690634832 -35.90143822828261, 148.99954757917214 -35.90143822828261, 148.99954757917214 -35.34799618325066, 148.23641690634832 -35.34799618325066))"^^geo:wktLiteral AS ?input_area)

  FILTER geof:sfIntersects(?input_area, ?wktGeometry) 
} 
ORDER BY ?name
```

This should return about 7 results.

Also see the example queries on the page https://eia.testing.bdr.gov.au/sparql.
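Queries like the one above can also be sent from a script rather than the web form. The following is a minimal sketch using only the Python standard library, assuming the endpoint accepts URL-encoded `POST` requests as described by the SPARQL 1.1 Protocol and can return the SPARQL JSON results format. The helper names `build_sparql_request` and `bindings_to_rows` and the sample response are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol POST request."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


# Made-up response in the SPARQL 1.1 Query Results JSON Format
sample = """
{
  "head": {"vars": ["iri", "name"]},
  "results": {"bindings": [
    {"iri": {"type": "uri", "value": "http://example.com/dataset/1"},
     "name": {"type": "literal", "value": "Example dataset"}}
  ]}
}
"""
rows = bindings_to_rows(sample)
print(rows)

request = build_sparql_request(
    "https://eia.testing.bdr.gov.au/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
)
# To actually run the query (needs network access):
# with urllib.request.urlopen(request) as response:
#     rows = bindings_to_rows(response.read().decode("utf-8"))
```

The network call is left commented out so the sketch runs offline; whether this particular endpoint behaves as assumed has to be checked against the live system.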

##### 4.1.A.1.2 Custom areas by hand

Use a GIS tool to draw an area and get its coordinates, for example https://geojson.io together with https://geojson-to-wkt-converter.onrender.com/ to convert the result to WKT, then use those coordinates in queries at https://eia.testing.bdr.gov.au/sparql.
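If a converter site is not at hand, the GeoJSON to WKT step for a simple polygon can be sketched in a few lines of Python. This is an illustrative helper (the name `geojson_polygon_to_wkt` and the sample square are made up) that handles only GeoJSON `Polygon` geometries and keeps the longitude/latitude order that GeoJSON uses.

```python
import json


def geojson_polygon_to_wkt(geometry):
    """Convert a GeoJSON Polygon geometry (a dict) into a WKT POLYGON string."""
    if geometry.get("type") != "Polygon":
        raise ValueError("only GeoJSON Polygon geometries are handled here")
    rings = []
    for ring in geometry["coordinates"]:
        # GeoJSON positions are [longitude, latitude]; WKT separates the two
        # numbers with a space and the points with commas
        points = ", ".join(f"{lon} {lat}" for lon, lat, *_ in ring)
        rings.append(f"({points})")
    return f"POLYGON ({', '.join(rings)})"


# A made-up square, shaped like the geometry member of a GeoJSON Feature
geometry = json.loads("""
{
  "type": "Polygon",
  "coordinates": [[[148.0, -35.0], [148.0, -36.0], [149.0, -36.0],
                   [149.0, -35.0], [148.0, -35.0]]]
}
""")

wkt = geojson_polygon_to_wkt(geometry)
print(wkt)
```

The resulting string can be pasted into a `BIND ("..."^^geo:wktLiteral AS ?input_area)` clause like the one in the hand-created query.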


##### 4.1.A.1.3 UI-created query

Try the "S1" scenario at https://eia.testing.bdr.gov.au/eia-demo

### 4.1.B Time functions

The time functions library, <https://github.com/RDFLib/timefuncs>, implements custom SPARQL functions in RDFLib for temporal relations. It is the temporal equivalent of GeoSPARQL.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import TIME
from timefuncs import is_after, is_before

# Namespace of the timefuncs SPARQL functions
TFUN = Namespace("https://w3id.org/timefuncs/")


def timefuncs_print(rdf, query, relation="is before"):
    """Run `query` over the Turtle data in `rdf` and print each result pair."""
    def iri_clean(s):
        # Strip the example namespace so only the local name is printed
        return str(s).replace("http://example.com/", "")

    g = Graph().parse(data=rdf, format="turtle")

    for r in g.query(
        query,
        initNs={
            "time": TIME,
            "tfun": TFUN,
            "ex": Namespace("http://example.com/")
        }):
        print(iri_clean(r[0]), relation, iri_clean(r[1]))


rdf_data = \
    """
    PREFIX time: <http://www.w3.org/2006/time#>
    PREFIX : <http://example.com/>

    :a
        a time:TemporalEntity ;
        time:inXSDDateTimeStamp "2021-07-15T23:59:59Z" ;
    .

    :b
        a time:TemporalEntity ;
        time:inXSDDateTimeStamp "2021-07-16T23:59:59Z" ;
    .

    :c
        a time:TemporalEntity ;
        time:inXSDDateTimeStamp "2021-07-17T23:59:59Z" ;
    .

    :d
        a time:TemporalEntity ;
        time:inXSDDateTimeStamp "2021-07-10T23:59:59Z" ;
    .
    """

q = """
     SELECT ?a ?b
     WHERE {
         ?a a time:TemporalEntity .
         ?b a time:TemporalEntity .

         FILTER tfun:isBefore(?a, ?b)
     }
     """

timefuncs_print(rdf_data, q)
```

```python
q = """
     SELECT ?a ?b
     WHERE {
         ?a a time:TemporalEntity .
         ?b a time:TemporalEntity .

         FILTER tfun:isAfter(?a, ?b)
     }
     """

timefuncs_print(rdf_data, q, relation="is after")
```
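To check the query output by eye, the expected "is before" pairs can be worked out independently in plain Python. The sketch below does not use timefuncs at all; it assumes that, for simple instants like these, "is before" reduces to comparing the two timestamps, which may not capture everything the library does for other kinds of temporal entity. The helper names are illustrative.

```python
from datetime import datetime
from itertools import permutations

# The same four instants as in the Turtle data above
instants = {
    "a": "2021-07-15T23:59:59Z",
    "b": "2021-07-16T23:59:59Z",
    "c": "2021-07-17T23:59:59Z",
    "d": "2021-07-10T23:59:59Z",
}


def parse_stamp(value):
    # datetime.fromisoformat() in older Python versions rejects a trailing "Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def before_pairs(stamps):
    """Return every ordered pair (x, y) where x's instant precedes y's."""
    parsed = {name: parse_stamp(value) for name, value in stamps.items()}
    return sorted(
        (x, y) for x, y in permutations(parsed, 2) if parsed[x] < parsed[y]
    )


for x, y in before_pairs(instants):
    print(x, "is before", y)
```

Swapping the arguments of each pair gives the corresponding "is after" expectations.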

### 4.1.C Others

Queensland Spatial Information has a custom function, implemented in both GraphDB and Fuseki, that recursively assembles an address from multiple related objects, following the national Address Model: https://linked.data.gov.au/def/addr.

This saves users a lot of time and works both "manually" - via the SPARQL UI - and within systems.

## 4.2 Lucene Text Index

RDF databases maintain graph indexes and can add others too, such as spatial indexes (using GeoSPARQL / GDAL) and full-text indexes built with tools like the well-known text indexer [Lucene](https://lucene.apache.org/).

This means you don't need to do this:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?a ?lbl
WHERE {
  ?a rdfs:label ?lbl .
    
  FILTER STRSTARTS(?lbl, "Something")
}
```

Instead, you can do this:

```sparql
PREFIX  text: <http://jena.apache.org/text#>

SELECT DISTINCT ?a ?lbl
WHERE {
  ( ?a ?score ?lbl ) text:query "Something"
}
ORDER BY DESC(?score)
LIMIT 5
```

Indexes other than graph indexes are custom per RDF DB product. Jena's full text index documentation is at <https://jena.apache.org/documentation/query/text-query.html>.

The Lucene search syntax is described here: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html.
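Because characters such as `:`, `(` and `*` have special meaning in that syntax, search text taken from users usually needs escaping before it is placed in a `text:query` string. The sketch below shows one way to do this in Python, based on the special-character list on the syntax page linked above. The helper names are hypothetical, and the generated query simply mirrors the Jena example earlier in this section.

```python
import re

# Special characters listed on the Lucene query parser syntax page linked above
_LUCENE_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\])')


def escape_lucene(term):
    """Backslash-escape Lucene query syntax characters in `term`."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def sparql_string(value):
    """Quote `value` as a SPARQL string literal (backslashes and quotes escaped)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_text_query(user_input, limit=5):
    """Build a Jena text:query that treats special characters as plain text."""
    lucene = escape_lucene(user_input)
    return (
        "PREFIX text: <http://jena.apache.org/text#>\n"
        "SELECT DISTINCT ?a ?lbl\n"
        "WHERE {\n"
        f"  ( ?a ?score ?lbl ) text:query {sparql_string(lucene)}\n"
        "}\n"
        "ORDER BY DESC(?score)\n"
        f"LIMIT {int(limit)}"
    )


print(escape_lucene("title:(draft)"))
print(build_text_query("title:(draft)"))
```

Note that there are two layers of escaping: one for Lucene and one for the SPARQL string literal that carries the Lucene query. Whitespace is left alone, so multi-word input is still split into separate search terms by Lucene.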

#### 4.2.A

We will now compare full-text search (FTS) with `FILTER`s at https://data.idnau.org/sparql. First, a `REGEX` filter, then the equivalent text index query:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {
  ?iri rdfs:label ?name .
  
  FILTER (REGEX(?name, "ajamanu", "i"))
}
```

```sparql
PREFIX  text: <http://jena.apache.org/text#>

SELECT DISTINCT ?a ?lbl
WHERE {
  ( ?a ?score ?lbl ) text:query "*ajamanu*"
}
ORDER BY DESC(?score)
LIMIT 5
```

A note on what fields are FTS indexed...