# Prerequisites

We start by importing `rdflib` and defining the RDF prefixes and other common variables used throughout the notebook.

In [1]:
from rdflib import Graph

# SPARQL-compatible format so that we can use it both for data and queries
prefixes = '''
# Common vocabularies
prefix qb:   <http://purl.org/linked-data/cube#>
prefix qudt: <http://qudt.org/schema/qudt/>
prefix unit: <http://qudt.org/vocab/unit/>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix dct:  <http://purl.org/dc/terms/>
prefix owl:  <http://www.w3.org/2002/07/owl#>

# Some observable property vocabularies
prefix cf: <http://purl.oclc.org/NET/ssnx/cf/cf-property#>
prefix eea: <https://www.eea.europa.eu/help/glossary/eea-glossary/>
prefix wmo: <https://space.oscar.wmo.int/variables/view/>

# AD4GD prefixes
prefix ad4gd-prop: <https://w3id.org/ad4gd/properties/>

# Shorthand for defining resources in this document
prefix : <http://example.com/c/>
'''

# OGC Hosted SPARQL endpoint
sparql_endpoint = 'https://defs-dev.opengis.net/fuseki-hosted/query'

# Querying a DCAT catalog

The following example shows how a DCAT catalog can be queried to find all datasets that observe a given property.

Let us start by building the catalog itself, in Turtle format, with three datasets:

In [2]:
catalog_ttl = prefixes + '''
:catalog a dcat:Catalog ;
  dct:title "My catalog" ;
  dcat:dataset :dataset1, :dataset2, :dataset3 ;
.

:dataset1 a dcat:Dataset ;
  dct:title "Dataset 1 about PM10" ;
  dcat:distribution [
    dcat:downloadURL <http://example.com/downloads/d11.csv> ;
    dct:format <https://www.iana.org/assignments/media-types/text/csv> ;
  ] ;
  qb:structure [
    qb:component [
      qb:measure eea:pm10 ;
      qudt:hasUnit unit:MicroGM-PER-M3 ;
    ]
  ] ;
.

:dataset2 a dcat:Dataset ;
  dct:title "Dataset 2 about NO2" ;
  dcat:distribution [
    dcat:downloadURL <http://example.com/export/d2.xlsx> ;
    dct:format <https://www.iana.org/assignments/media-types/application/vnd.openxmlformats-officedocument.spreadsheetml.sheet> ;
  ] ;
  qb:structure [
    qb:component [
      qb:measure wmo:no2 ;
      qudt:hasUnit unit:MicroGM-PER-M3 ;
    ]
  ] ;
.

:dataset3 a dcat:Dataset ;
  dct:title "Dataset 3 about PM10, but different" ;
  dcat:distribution [
    dcat:downloadURL <http://example.net/files/d33.csv> ;
    dct:format <https://www.iana.org/assignments/media-types/text/csv> ;
  ] ;
  qb:structure [
    qb:component [
      qb:measure cf:mass_fraction_of_pm10_ambient_aerosol_in_air ;
    ]
  ] ;
.
'''
catalog = Graph().parse(data=catalog_ttl, format='ttl')
print('Loaded', len(catalog), 'triples')

Loaded 31 triples


Each dataset describes the components (`qb:component`) of its structure (`qb:structure`), which in this case are its measured properties; two of the datasets also state the unit of measurement.

Next, we will find which datasets hold data about PM10. We will use `https://w3id.org/ad4gd/properties/pm10` (or `ad4gd-prop:pm10` with the prefixes declared above), the AD4GD observable property defined for that purpose:

In [3]:
query = prefixes + '''
SELECT DISTINCT ?dataset ?title ?property ?unit WHERE {
  {
    # This subquery is resolved first, and it retrieves the aliases for PM10, both direct and inverse
    SELECT DISTINCT ?propertyAlias WHERE {
      SERVICE <@SPARQL_ENDPOINT@> {
        { ad4gd-prop:pm10 owl:sameAs ?propertyAlias }
        UNION
        { ?propertyAlias owl:sameAs ad4gd-prop:pm10 }
      }
    }
  }
  
  ?dataset a dcat:Dataset;                                # Find a Dataset
    dct:title ?title ;                                    # with a title
    qb:structure/qb:component ?structureComponent ;       # and with a structure component
  . 
  ?structureComponent qb:measure ?property .              # that has a measure with a property
  FILTER (?property IN (ad4gd-prop:pm10, ?propertyAlias)) # and the property is PM10 or one of its aliases
  OPTIONAL { ?structureComponent qudt:hasUnit ?unit }     # we also retrieve the unit, if any
}
'''.replace('@SPARQL_ENDPOINT@', sparql_endpoint)

# Uncomment this to see the query with line numbers
# print('\n'.join(f"{i: 3}: {l}" for i, l in enumerate(query.split('\n'))))

pm10_bindings = catalog.query(query).bindings
print(' -', '\n - '.join(f"{b['title']} ({b['dataset']}), measuring {b['property']}"
                         f"{' in ' + str(b['unit']) if b.get('unit') else ''}"
                         for b in pm10_bindings))

 - Dataset 3 about PM10, but different (http://example.com/c/dataset3), measuring http://purl.oclc.org/NET/ssnx/cf/cf-property#mass_fraction_of_pm10_ambient_aerosol_in_air
 - Dataset 1 about PM10 (http://example.com/c/dataset1), measuring https://www.eea.europa.eu/help/glossary/eea-glossary/pm10 in http://qudt.org/vocab/unit/MicroGM-PER-M3


This is just one of the many ways to query our catalog; for example, a subquery is used here, but two separate queries (one for the property aliases and another for the datasets) could have been run instead.
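
The two-query alternative could look roughly like the sketch below. The first step (collecting the alias IRIs from the endpoint) is not shown because it needs network access; the second step builds a query for the local catalog and injects the IRIs through a `VALUES` clause. The helper name `build_dataset_query`, the `demo_prefixes` string and the hard-coded IRI list are assumptions made for illustration; in the notebook the list would come from the bindings of the alias query.

```python
def build_dataset_query(prefixes, property_iris):
    """Return a SPARQL query selecting datasets that measure any of the given properties."""
    # Render each IRI as <...> for the VALUES clause (assumes well-formed IRIs)
    values = ' '.join(f'<{iri}>' for iri in property_iris)
    return prefixes + f'''
SELECT DISTINCT ?dataset ?title ?property ?unit WHERE {{
  VALUES ?property {{ {values} }}
  ?dataset a dcat:Dataset ;
    dct:title ?title ;
    qb:structure/qb:component ?structureComponent .
  ?structureComponent qb:measure ?property .
  OPTIONAL {{ ?structureComponent qudt:hasUnit ?unit }}
}}
'''

# Illustrative input: the AD4GD property plus one alias taken from the output above.
# In practice the aliases would come from a first query against the endpoint.
pm10_iris = [
    'https://w3id.org/ad4gd/properties/pm10',
    'https://www.eea.europa.eu/help/glossary/eea-glossary/pm10',
]

# Minimal prefixes so the sketch runs on its own; the notebook's `prefixes` works too
demo_prefixes = '''
prefix qb:   <http://purl.org/linked-data/cube#>
prefix qudt: <http://qudt.org/schema/qudt/>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix dct:  <http://purl.org/dc/terms/>
'''

two_step_query = build_dataset_query(demo_prefixes, pm10_iris)
print(two_step_query)
```

Given the full alias list, passing the resulting string to `catalog.query(...)` should play the same role as the dataset-matching half of the original query, without the `SERVICE` call.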

Additionally, the lower part of the query (the dataset patterns that follow the subquery) could be re-run against the SPARQL endpoints of different catalog sources, merging the results into a single graph.
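
Since an RDF graph is a set of triples, merging results from several catalogs amounts to a set union. The sketch below illustrates the idea with plain tuples standing in for triples; the function name `merge_triples`, the sample rows and the `example.org` dataset IRI are hypothetical, and no endpoint is contacted.

```python
def merge_triples(*sources):
    """Union triples from several sources into one set, dropping duplicates."""
    merged = set()
    for triples in sources:
        merged.update(triples)
    return merged

DCT_TITLE = 'http://purl.org/dc/terms/title'

# Hypothetical per-catalog results, e.g. as returned by CONSTRUCT queries
catalog_a = [
    ('http://example.com/c/dataset1', DCT_TITLE, 'Dataset 1 about PM10'),
]
catalog_b = [
    ('http://example.com/c/dataset1', DCT_TITLE, 'Dataset 1 about PM10'),  # also in catalog_a
    ('http://example.org/other/dataset9', DCT_TITLE, 'Another PM10 dataset'),  # made-up IRI
]

merged = merge_triples(catalog_a, catalog_b)
print(len(merged), 'distinct triples')
```

With rdflib, the same effect could be had by adding each result graph to a single `Graph` with `+=`; note that blank nodes coming from different sources remain distinct after the merge.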

# Adding sensor metadata to observations

TBD