# Data Discovery, Search and Transactions

This part of the Workshop will demonstrate the EOEPCA Resource Catalogue building block. The Resource Catalogue is built upon the OSGeo project [pycsw](https://pycsw.org/). 

[pycsw](https://pycsw.org/) is an OGC API - Records and OGC CSW server implementation written in Python. Started in 2010 (more formally announced in 2011), pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OGC API - Records, OAI-PMH, SRU), providing a standards-based metadata and catalogue component of spatial data infrastructures. 

pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X).

pycsw is [Certified OGC Compliant](https://www.ogc.org/resources/product-details/?pid=1661) and is an [OGC Reference Implementation](https://www.ogc.org/resources/product-details/?pid=1661).

The EOEPCA Resource Catalogue is powered by the upcoming version 3.x of pycsw. You can find more details in the [pycsw documentation](https://docs.pycsw.org/en/latest/) or the recent [FOSS4G presentation](https://pycsw.org/publications/foss4g2023/#/).

In this notebook, we will demonstrate data discovery, search and transactions using Python.

More specifically, we are going to use the OWSLib Python library.

[OWSLib](https://geopython.github.io/OWSLib) is a Python package for client programming with Open Geospatial Consortium (OGC) web service (hence OWS) interface standards, and their related content models. In this demo we’ll work with the CSW, OGC API - Records and OpenSearch interfaces.

In [None]:
from owslib.csw import CatalogueServiceWeb
from owslib.ogcapi.records import Records
from owslib.opensearch import OpenSearch
from owslib.fes import And, Or, PropertyIsEqualTo, PropertyIsGreaterThanOrEqualTo, PropertyIsLessThanOrEqualTo, PropertyIsLike, BBox, SortBy, SortProperty
from geolinks import sniff_link
import folium
import json

First we define the URL of the EOEPCA Resource Catalogue (demonstration cluster).

In [None]:
domain = "demo.eoepca.org"
#domain = "develop.eoepca.org"

In [None]:
system_catalogue_endpoint = f"https://resource-catalogue.{domain}/"

### Data Discovery with OGC CSW

In this part of the workshop, the user will use the Resource Catalogue CSW endpoint to discover data collections and datasets.
The `CatalogueServiceWeb` class from the `owslib.csw` module of OWSLib is instantiated and service metadata are shown.

In [None]:
# Strip the trailing slash so the URL does not contain '//csw'
csw_endpoint = f'{system_catalogue_endpoint.rstrip("/")}/csw'

In [None]:
csw = CatalogueServiceWeb(csw_endpoint, timeout=30)

Service metadata shown here includes the identification type (from ISO 19115), the CSW version and the supported operations.

In [None]:
csw.identification.type

In [None]:
csw.version

In [None]:
[op.name for op in csw.operations]

As well as catalogue queryables:

In [None]:
csw.get_operation_by_name('GetRecords').constraints

The user can make a GetRecords request to get all records of the catalogue, with a page limit of 10.

In [None]:
csw.getrecords2(maxrecords=10)
csw.results

In [None]:
for rec in csw.records:
    print(f'identifier: {csw.records[rec].identifier}\ntype: {csw.records[rec].type}\ntitle: {csw.records[rec].title}\n')

To narrow the search, the user can apply an OGC Filter. Here we demonstrate how to create spatial (`bbox`), temporal (`time`), and attribute (`apiso:CloudCover`) filters and combine them with logical operators such as `And`/`Or`.

In [None]:
bbox_query = BBox([37, 13.9, 37.9, 15.1])

In [None]:
begin = PropertyIsGreaterThanOrEqualTo(propertyname='apiso:TempExtent_begin', literal='2019-09-10 00:00')

In [None]:
end = PropertyIsLessThanOrEqualTo(propertyname='apiso:TempExtent_end', literal='2019-09-12 00:00')

In [None]:
cloud = PropertyIsLessThanOrEqualTo(propertyname='apiso:CloudCover', literal='20')

In [None]:
filter_list = [
    And(
        [
            bbox_query,  # bounding box
            begin, end,  # start and end date
            cloud        # cloud
        ]
    )
]

The filter is then applied to the GetRecords request and results are shown:

In [None]:
csw.getrecords2(constraints=filter_list, outputschema='http://www.isotc211.org/2005/gmd')
csw.results

In [None]:
selected_record = list(csw.records)[1]

In [None]:
for rec in csw.records:
    print(f'identifier: {csw.records[rec].identifier}\ntype: {csw.records[rec].identification[0].identtype}\ntitle: {csw.records[rec].identification[0].title}\n')
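To make the meaning of the combined filter more concrete, the sketch below evaluates the same four conditions client-side on plain dictionaries. It is only an illustration: the record layout (`bbox`, `begin`, `end`, `cloud_cover`), the `matches` helper, the reading of the bounding box as `[min_lat, min_lon, max_lat, max_lon]`, and the treatment of the spatial condition as a box intersection are all assumptions. The catalogue evaluates the real filter server-side.

```python
from datetime import datetime


def matches(record, bbox, begin, end, max_cloud):
    """Return True when a record satisfies every condition (logical AND).

    `record` is a hypothetical dict, not an OWSLib object.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    r_min_lat, r_min_lon, r_max_lat, r_max_lon = record["bbox"]
    # Two boxes intersect when they overlap on both axes.
    intersects = (r_min_lat <= max_lat and r_max_lat >= min_lat
                  and r_min_lon <= max_lon and r_max_lon >= min_lon)
    return (intersects
            and record["begin"] >= begin            # temporal start condition
            and record["end"] <= end                # temporal end condition
            and record["cloud_cover"] <= max_cloud)  # attribute condition


# Same values as the filters defined above.
query = dict(bbox=[37, 13.9, 37.9, 15.1],
             begin=datetime(2019, 9, 10),
             end=datetime(2019, 9, 12),
             max_cloud=20)

# Hypothetical record used only to exercise the predicate.
sample = {"bbox": [37.2, 14.0, 37.8, 15.0],
          "begin": datetime(2019, 9, 10, 9, 50),
          "end": datetime(2019, 9, 10, 9, 50),
          "cloud_cover": 12.5}

print(matches(sample, **query))
```

Replacing `and` with `or` in the return expression would give the behaviour of an `Or` filter over the same conditions.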

Another option is to perform a collection-level search, using the `apiso:ParentIdentifier` queryable. Here only the Sentinel-2 L1C datasets will be discovered.

In [None]:
collection_query = PropertyIsEqualTo('apiso:ParentIdentifier', 'S2MSI1C')

In [None]:
csw.getrecords2(constraints=[collection_query], outputschema='http://www.isotc211.org/2005/gmd')
csw.results

Or we can search using only the bbox filter:

In [None]:
csw.getrecords2(constraints=[bbox_query], outputschema='http://www.isotc211.org/2005/gmd')
csw.results

Or we can perform a full-text search (here the keyword Orthoimagery is used):

In [None]:
anytext_query = PropertyIsEqualTo('csw:AnyText', 'Orthoimagery')

In [None]:
filter_list = [
    And(
        [
            bbox_query,  # bounding box
            anytext_query # any text
        ]
    )
]

In [None]:
csw.getrecords2(constraints=filter_list)
csw.results

We can also iterate through the catalogue search results by passing the `startposition` and `maxrecords` parameters to the GetRecords request:

In [None]:
csw_records = {}
sortby = SortBy([SortProperty('dc:title', 'ASC')])
pagesize = 10     # records requested per page
maxrecords = 20   # upper bound on the records collected
startposition = 0
while startposition < maxrecords:
    csw.getrecords2(constraints=[anytext_query], startposition=startposition,
                    maxrecords=pagesize, sortby=sortby)
    csw_records.update(csw.records)
    # nextrecord is 0 when the catalogue has no further results
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
# Merge all collected pages back into csw.records
csw.records.update(csw_records)
print(f'Found {len(csw.records)} records.\n')
for value in csw.records.values():
    print(f'identifier: {value.identifier}\ntype: {value.type}\ntitle: {value.title}\n')
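The paging logic can also be factored into a small generator, which keeps the loop separate from the catalogue call. The sketch below is a generic illustration, not part of OWSLib: `paginate` and the `fetch_page` callback are hypothetical names, and `fake_fetch` only simulates a catalogue holding 25 records so that the example runs offline.

```python
def paginate(fetch_page, pagesize=10, maxrecords=20, startposition=0):
    """Yield records page by page.

    `fetch_page(start, size)` is a hypothetical callback that returns
    a tuple (records, nextrecord), where nextrecord == 0 means "no more pages".
    """
    requested = 0
    while requested < maxrecords:
        records, nextrecord = fetch_page(startposition, pagesize)
        yield from records
        if nextrecord == 0:
            break
        startposition += pagesize
        requested += pagesize


def fake_fetch(start, size, total=25):
    """Simulated catalogue with `total` records, used only for this demo."""
    ids = list(range(start, min(start + size, total)))
    nextrecord = start + size if start + size < total else 0
    return ids, nextrecord


print(list(paginate(fake_fetch)))
```

Against the real service, `fetch_page` could wrap the calls shown above, for example by running `csw.getrecords2(startposition=start, maxrecords=size)` and returning `list(csw.records.values())` together with `csw.results['nextrecord']`.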

The user then selects a record identifier and asks the catalogue to fetch the full record. Here we demonstrate how to obtain properties like the title, bbox, full XML and links from the metadata record.

In [None]:
csw.getrecordbyid(id=[selected_record])

In [None]:
rec = csw.records[selected_record]

In [None]:
rec.title

In [None]:
rec.xml

In [None]:
rec.references

In [None]:
print("dataset bbox = (%s, %s, %s, %s)" % (rec.bbox.miny, rec.bbox.minx, rec.bbox.maxy, rec.bbox.maxx))

Using the [geolinks](https://github.com/geopython/geolinks) Python library, we can filter the links that are of a specific type (here WMS and WCS links to be used for visualization):

In [None]:
msg = 'geolink: {geolink}\nscheme: {scheme}\nURL: {url}\n'.format
for ref in rec.references:
    print(msg(geolink=sniff_link(ref['url']), **ref))

In [None]:
for ref in rec.references:
    url = ref['url']
    if 'WMS' in url:
        print(msg(geolink=sniff_link(url), **ref))
        break

In [None]:
for ref in rec.references:
    url = ref['url']
    if 'WCS' in url:
        print(msg(geolink=sniff_link(url), **ref))
        break
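The two loops above differ only in the keyword they look for, so they can be folded into one helper. This is a minimal sketch under stated assumptions: `first_reference` is a hypothetical name, the references are assumed to be dictionaries with a `url` key as in the cells above, and the sample URLs are placeholders. Unlike the loops above, the match here is case-insensitive.

```python
def first_reference(references, keyword):
    """Return the first reference whose URL contains `keyword`, or None."""
    keyword = keyword.lower()
    for ref in references:
        # Tolerate references whose url is missing or None.
        if keyword in (ref.get("url") or "").lower():
            return ref
    return None


# Placeholder references, shaped like the entries of rec.references.
sample_references = [
    {"scheme": None, "url": "https://data.example.org/ows?service=WCS"},
    {"scheme": None, "url": "https://data.example.org/ows?service=wms"},
    {"scheme": None, "url": None},
]

print(first_reference(sample_references, "WMS"))
```

With a real record, the call would be `first_reference(rec.references, 'WMS')`.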

We demonstrate how to show the record footprint on a map, using the [Folium](https://github.com/python-visualization/folium) Python library:

In [None]:
m = folium.Map(location=[38, 20], zoom_start=6, tiles='OpenStreetMap')
folium.Rectangle(bounds=[[float(rec.bbox.miny), float(rec.bbox.minx)], [float(rec.bbox.maxy), float(rec.bbox.maxx)]]).add_to(m)
m

### Data Discovery with OpenSearch

In this part of the demo, the user will use the OpenSearch capability of the system resource catalogue to discover datasets.

In [None]:
opensearch_endpoint = f'{csw_endpoint}?service=CSW&version=3.0.0&request=GetCapabilities&mode=opensearch'

Here, the OWSLib OpenSearch client is used.

In [None]:
os = OpenSearch(opensearch_endpoint)

We use the description object to retrieve service metadata.

In [None]:
os.description.shortname

In [None]:
os.description.longname

In [None]:
os.description.description

In [None]:
os.description.urls

In [None]:
os.description.tags

Another possibility is to issue the HTTP GET requests directly with a generic client such as `requests`.

In [None]:
import requests
from bs4 import BeautifulSoup

In [None]:
S = requests.Session()

Here, the user asks for the OpenSearch entrypoint template:

In [None]:
R = S.get(url=opensearch_endpoint)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

A GetRecords request (in the context of OpenSearch) is performed:

In [None]:
url = csw_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

A collection search is also demonstrated, for Sentinel-2 Level 2A results:

In [None]:
url = csw_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&eo:parentIdentifier=S2MSI2A'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

The user can also use the OpenSearch EO mathematical notation to filter on other parameters:

In [None]:
# OpenSearch EO mathematical notation
# n1          means field = n1
# {n1,n2,…}   means field = n1 OR field = n2 OR …
# [n1,n2]     means n1 <= field <= n2
# [n1,n2[     means n1 <= field < n2
# ]n1,n2[     means n1 < field < n2
# ]n1,n2]     means n1 < field <= n2
# [n1         means n1 <= field
# ]n1         means n1 < field
# n2]         means field <= n2
# n2[         means field < n2

url = csw_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&eo:cloudCover=]20'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())
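The notation listed in the cell above can be checked locally with a small parser that turns an expression into a predicate. This is a sketch for numeric fields only: `parse_eo_range` is a hypothetical helper, not part of pycsw or OWSLib, and it does no input validation.

```python
def parse_eo_range(expr):
    """Turn an OpenSearch EO range expression into a predicate on a number."""
    expr = expr.strip()
    # {n1,n2,...}: the field must equal one of the listed values.
    if expr.startswith("{") and expr.endswith("}"):
        allowed = {float(v) for v in expr[1:-1].split(",")}
        return lambda x: x in allowed
    opens = expr[0] in "[]"
    closes = expr[-1] in "[]"
    # Two-sided interval, e.g. [n1,n2] or ]n1,n2[.
    if opens and closes and "," in expr:
        low, high = (float(v) for v in expr[1:-1].split(","))
        low_ok = (lambda x: x >= low) if expr[0] == "[" else (lambda x: x > low)
        high_ok = (lambda x: x <= high) if expr[-1] == "]" else (lambda x: x < high)
        return lambda x: low_ok(x) and high_ok(x)
    # Lower bound only: [n1 (inclusive) or ]n1 (exclusive).
    if opens:
        low = float(expr[1:])
        return (lambda x: x >= low) if expr[0] == "[" else (lambda x: x > low)
    # Upper bound only: n2] (inclusive) or n2[ (exclusive).
    if closes:
        high = float(expr[:-1])
        return (lambda x: x <= high) if expr[-1] == "]" else (lambda x: x < high)
    # Plain value: equality.
    value = float(expr)
    return lambda x: x == value


in_range = parse_eo_range("[10,20[")
print(in_range(10), in_range(20))
```

Read with this notation, the `eo:cloudCover=]20` parameter in the request above selects records whose cloud cover is greater than 20, while `20]` would select cloud cover of at most 20.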

The user can get one record using the identifier through the OpenSearch EO API:

In [None]:
url = csw_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&recordids=S2B_MSIL1C_20190910T095029_N0208_R079_T33SUB_20190910T120214.SAFE'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

The user can search using a bbox parameter:

In [None]:
url = csw_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&bbox=13.9,37,15.1,37.9'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

The user can search using a time parameter:

In [None]:
url = csw_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&time=2019-09-10/2019-09-12'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

Or separate time start/stop parameters:

In [None]:
url = csw_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&start=2019-09-10&stop=2019-09-12'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())
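All of the OpenSearch requests above share the same base parameters and differ only in the filter that is appended. A possible way to avoid repeating the long query string is to build it from a dictionary. In this sketch, `opensearch_url` and `BASE_PARAMS` are hypothetical names and the endpoint is a placeholder. Note that `urlencode` percent-encodes characters such as `,` and `:`, which a server is expected to decode back to the original values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Parameters common to every GetRecords request shown above.
BASE_PARAMS = {
    "mode": "opensearch",
    "service": "CSW",
    "version": "3.0.0",
    "request": "GetRecords",
    "elementsetname": "full",
    "resulttype": "results",
    "typenames": "csw:Record",
}


def opensearch_url(endpoint, filters=None):
    """Return a GetRecords URL with optional filter parameters appended."""
    params = {**BASE_PARAMS, **(filters or {})}
    return f"{endpoint}?{urlencode(params)}"


endpoint = "https://catalogue.example.org/csw"  # placeholder, not the real service
url = opensearch_url(endpoint, {"bbox": "13.9,37,15.1,37.9",
                                "time": "2019-09-10/2019-09-12"})
print(url)
# Decoding the query string recovers the original values.
print(parse_qs(urlsplit(url).query)["bbox"])
```

With `requests`, the same effect is available without building the string by hand, by passing the dictionary through the `params` argument, as in `S.get(csw_endpoint, params=params)`.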

### Data Discovery with OGC API - Records

In this part of the workshop, the user will use the system-level resource catalogue OGC API - Records endpoint to discover datasets.
The `Records` class from the `owslib.ogcapi.records` module of OWSLib is instantiated and service metadata are shown.

In [None]:
domain = "demo.eoepca.org"
#domain = "develop.eoepca.org"
system_catalogue_endpoint = f"https://resource-catalogue.{domain}/"

In [None]:
w = Records(system_catalogue_endpoint)

In [None]:
w.url

Conformance classes supported by the OGC API Records server:

In [None]:
w.conformance()

OpenAPI document of the OGC API Records server:

In [None]:
w.api()

Collections available on the catalogue:

In [None]:
w.collections()

In [None]:
records = w.records()

In [None]:
len(records)

The user can then specify the collection to search within the catalogue:

In [None]:
my_catalogue = w.collection('metadata:main')

In [None]:
my_catalogue['id']

Collection level queryables:

In [None]:
w.collection_queryables('metadata:main')

Query the catalogue for all records:

In [None]:
my_catalogue_query = w.collection_items('metadata:main')

In [None]:
my_catalogue_query['numberMatched']

Metadata of first result:

In [None]:
my_catalogue_query['features'][0]['properties'].keys()

In [None]:
my_catalogue_query['features'][0]['properties']['title']

Query the catalogue using filters:

In [None]:
#my_catalogue_query2 = w.collection_items('metadata:main', q='Orthoimagery')
my_catalogue_query2 = w.collection_items('metadata:main', bbox=['13.9','37','15.1','37.9'])

In [None]:
my_catalogue_query2['numberMatched']

Full query result:

In [None]:
my_catalogue_query2

Text CQL query:

In [None]:
my_catalogue_cql_text_query = w.collection_items('metadata:main', filter="title LIKE 'S2B_MSIL1C_%'")

In [None]:
my_catalogue_cql_text_query['numberMatched']

In [None]:
my_catalogue_cql_text_query['features'][0]['properties']['title']

JSON CQL query:

In [None]:
my_catalogue_cql_json_query = w.collection_items('metadata:main', limit=1, cql={'op': '=', 'args': [{ 'property': 'title' }, 'S2B_MSIL1C_20190917T112119_N0208_R037_T31UCB_20190917T132014.SAFE']})

In [None]:
my_catalogue_cql_json_query['features'][0]['properties']['title']
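The JSON form of a CQL query is a nested dictionary, so it can be assembled from small helper functions. The sketch below is an illustration only: `cql_eq` and `cql_and` are hypothetical helpers, the `and` operator follows the CQL2 JSON encoding, and whether a given server release accepts a particular combination should be checked against that service.

```python
import json


def cql_eq(prop, value):
    """Equality test in the same shape as the query used above."""
    return {"op": "=", "args": [{"property": prop}, value]}


def cql_and(*conditions):
    """Combine several conditions with a logical AND (CQL2 JSON encoding)."""
    return {"op": "and", "args": list(conditions)}


query = cql_and(
    cql_eq("title", "S2B_MSIL1C_20190917T112119_N0208_R037_T31UCB_20190917T132014.SAFE"),
    cql_eq("type", "dataset"),
)
print(json.dumps(query, indent=2))
```

The resulting dictionary could then be passed through the `cql` argument in the same way as in the cell above.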

### Data Discovery with STAC API

In this part of the demo, the user will use the system-level resource catalogue STAC API endpoint to discover datasets.
The `pystac_client` library is used as a STAC client.

In [None]:
from pystac_client import Client
from pystac_client import ConformanceClasses

In [None]:
domain = "demo.eoepca.org"
#domain = "develop.eoepca.org"
system_catalogue_endpoint = f"https://resource-catalogue.{domain}/"

In [None]:
catalog = Client.open(system_catalogue_endpoint)

STAC catalogue metadata:

In [None]:
catalog.id

In [None]:
catalog.title

In [None]:
catalog.description

Conformance classes supported by the STAC API endpoint:

In [None]:
dir(ConformanceClasses)

Validation of STAC API using `pystac_client`:

In [None]:
catalog._stac_io.assert_conforms_to(ConformanceClasses.ITEM_SEARCH)

In [None]:
catalog._stac_io.assert_conforms_to(ConformanceClasses.CORE)

Query the catalogue for all STAC items:

In [None]:
mysearch = catalog.search(collections=['metadata:main'], max_items=10)
print(f"{mysearch.matched()} items found")

Query the STAC API using filters:

In [None]:
mysearch = catalog.search(collections=['metadata:main'], bbox=[13.9,37,15.1,37.9], max_items=10)
#mysearch = catalog.search(collections=['metadata:main'], bbox=[-72.5,40.5,-72,41], max_items=10)
print(f"{mysearch.matched()} items found")

Iterate through the query results:

In [None]:
items = mysearch.get_items()
for item in items:
    print(item.id)

Show last STAC item JSON:

In [None]:
print(json.dumps(item.to_dict(), indent=2))
#print(item.to_dict())
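For readers who want to see what such a search looks like on the wire, the sketch below builds the JSON body of a STAC API item search request (`POST` to the `search` endpoint) without sending it. `stac_search_body` is a hypothetical helper. Note that `limit` is the page size requested from the server, whereas the `max_items` argument used above is a cap applied by `pystac_client` on the client side.

```python
import json


def stac_search_body(collections, bbox=None, datetime_range=None, limit=10):
    """Build the JSON body of a STAC API item search request."""
    body = {"collections": list(collections), "limit": limit}
    if bbox is not None:
        if len(bbox) not in (4, 6):
            raise ValueError("bbox needs 4 (2D) or 6 (3D) numbers")
        body["bbox"] = [float(v) for v in bbox]
    if datetime_range is not None:
        # Single instant or 'start/end' interval, as an RFC 3339 string.
        body["datetime"] = datetime_range
    return body


body = stac_search_body(["metadata:main"], bbox=[13.9, 37, 15.1, 37.9])
print(json.dumps(body, indent=2))
```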

### Transactions with EOEPCA Registration API

This is an example of how to use the EOEPCA Registration API to register metadata records in the Resource Catalogue.

In [None]:
import requests
import datetime
import time
import json

In [None]:
#domain = "demo.eoepca.org"
domain = "develop.eoepca.org"
registration_endpoint = f'https://registration-api-open.{domain}'

We will now register an ADES instance through the Registration API:

In [None]:
response = requests.post(
    f"{registration_endpoint}/register",
    json={
        "type": "ades",
        "url": "https://demo.pygeoapi.io/stable/processes",
    }
)
response.raise_for_status()
response

Check if the ADES is actually registered:

In [None]:
time.sleep(3)
response = requests.get(f"https://resource-catalogue.{domain}/collections/metadata:main/items?type=service")
response.raise_for_status()
response.json()
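The fixed `time.sleep(3)` above assumes that registration completes within three seconds. A more tolerant alternative is to poll until the record shows up or a timeout expires. The sketch below shows the pattern with a hypothetical `wait_for` helper, and `fake_check` stands in for the real catalogue query so that the example runs offline.

```python
import time


def wait_for(check, timeout=30.0, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call `check()` until it returns a truthy value or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)


# Simulated check that succeeds on the third attempt.
attempts = {"n": 0}


def fake_check():
    attempts["n"] += 1
    return attempts["n"] >= 3


result = wait_for(fake_check, timeout=5, interval=0.01)
print(result, attempts["n"])
```

For the registration case, `check` could issue the same GET request as the cell above and return a truthy value once the response lists the newly registered service.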

We will now register a STAC Item through the Registration API JSON endpoint.

In [None]:
url='https://raw.githubusercontent.com/radiantearth/stac-spec/master/examples/core-item.json'
stac_item=requests.get(url).text
#print(stac_item)

In [None]:
response = requests.post(
    f"{registration_endpoint}/register-json",
    json=json.loads(stac_item)
)
response.raise_for_status()
response

Check if the STAC Item is actually registered:

In [None]:
time.sleep(3)
# response = requests.get(f"https://resource-catalogue.{domain}/collections/metadata:main/items/20201211_223832_CS2")
response = requests.get(f"https://resource-catalogue.{domain}/stac/collections/metadata:main/items/20201211_223832_CS2")
response.raise_for_status()
response.json()

### OGC API Records Transactions demo

This section shows examples of transactions based on OGC API - Features - Part 4: Create, Replace, Update and Delete, applied to OGC API - Records metadata management.

In this part of the demo, the user will use the OWSLib client library to create/update/delete a record using the OGC API Transactions interface.

In [None]:
import json
import requests
from owslib.ogcapi.records import Records

In [None]:
record_data = '../data/sample-record.json'
base_domain = "develop.eoepca.org"
system_catalogue_endpoint = f'https://resource-catalogue.{base_domain}'
collection_id = 'metadata:main'

In [None]:
r = Records(system_catalogue_endpoint)

In [None]:
cat = r.collection(collection_id)

In [None]:
with open(record_data) as fh:
    data = json.load(fh)

In [None]:
identifier = data['id']

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Delete metadata in case the identifier already exists.

In [None]:
r.collection_item_delete(collection_id, identifier)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Insert metadata.

In [None]:
r.collection_item_create(collection_id, data)

# Similar approach using requests
#url = f'{system_catalogue_endpoint}/collections/metadata:main/items'
#headers = {'content-type': 'application/geo+json'}
#payload = open(record_data)
#req = requests.post(url, data=payload, headers=headers)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Update metadata.

In [None]:
data['properties']['description'] = "Update description"

In [None]:
r.collection_item_update(collection_id, identifier, data)

Delete metadata.

In [None]:
r.collection_item_delete(collection_id, identifier)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']
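The delete-then-insert sequence used above can be wrapped in one function so that re-running the notebook does not fail on an existing identifier. This is a minimal sketch under stated assumptions: `replace_record` is a hypothetical helper, `InMemoryCatalogue` is a stand-in that exposes only the two client methods used above so that the example runs offline, and the broad `except` reflects that the error raised for a missing record is not known here.

```python
class InMemoryCatalogue:
    """Hypothetical stand-in for the catalogue client, used for this demo only."""

    def __init__(self):
        self.items = {}

    def collection_item_create(self, collection_id, data):
        self.items[(collection_id, data["id"])] = data

    def collection_item_delete(self, collection_id, identifier):
        # Raises KeyError when the record does not exist.
        del self.items[(collection_id, identifier)]


def replace_record(client, collection_id, record):
    """Delete any record with the same id, then insert the new one."""
    identifier = record["id"]
    try:
        client.collection_item_delete(collection_id, identifier)
    except Exception as err:  # assumption: the exact error type is unknown
        print(f"delete skipped: {err!r}")
    client.collection_item_create(collection_id, record)
    return identifier


catalogue = InMemoryCatalogue()
record = {"id": "foorecord", "properties": {"description": "First version"}}
replace_record(catalogue, "metadata:main", record)

record["properties"]["description"] = "Update description"
replace_record(catalogue, "metadata:main", record)
print(len(catalogue.items))
```

With the real service, the `r` object created above would take the place of `catalogue`.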

Next, the user will ingest a STAC Item through the OGC API transactions.

In [None]:
stac_data = '../data/S2B_MSIL2A_20190910T095029_N0213_R079_T33UWQ_20190910T124513.json'
with open(stac_data) as sf:
    si = json.load(sf)

In [None]:
identifier = si['id']

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Delete metadata in case the identifier already exists.

In [None]:
r.collection_item_delete(collection_id, identifier)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Insert metadata.

In [None]:
#Disabled due to POST issue https://github.com/geopython/pycsw/issues/809
#r.collection_item_create(collection_id, si)

url = f'{system_catalogue_endpoint}/collections/metadata:main/items'
headers = {'content-type': 'application/geo+json'}
# Use a context manager so the file handle is closed after the upload
with open(stac_data, 'rb') as payload:
    req = requests.post(url, data=payload, headers=headers)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Delete metadata.

In [None]:
r.collection_item_delete(collection_id, identifier)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Alternative method to demo Transactions using curl from terminal:

In [None]:
# insert metadata
# curl -v -H "Content-Type: application/geo+json" -XPOST https://resource-catalogue.develop.eoepca.org/collections/metadata:main/items -d @sample-record.json
# update metadata
# curl -v -H "Content-Type: application/geo+json" -XPUT https://resource-catalogue.develop.eoepca.org/collections/metadata:main/items/foorecord -d @sample-record.json
# delete metadata
# curl -v -XDELETE https://resource-catalogue.develop.eoepca.org/collections/metadata:main/items/foorecord

### QGIS catalogue demo

In this part of the demo, the user will use the system-level resource catalogue endpoint to discover and visualize datasets through the QGIS desktop application.

![QGIS main window with OSM loaded](../images/Screenshot_QGIS_01.png)

The user starts up QGIS and loads some data, in this case the OSM base map. The MetaSearch tool is available on the toolbar.

![MetaSearch main window](../images/Screenshot_QGIS_02.png)

The user can add a new catalogue endpoint by pressing the New button and entering the Resource Catalogue endpoint in the URL text box.

![EOEPCA Resource Catalogue in QGIS](../images/Screenshot_QGIS_03.png)

The service metadata for the Resource Catalogue are available from the Service Info button.

![Service Info](../images/Screenshot_QGIS_04.png)

The Service Capabilities are also available from the 'GetCapabilities Response' button.

![Service Capabilities](../images/Screenshot_QGIS_05.png)

The user moves to the 'Search' tab of the MetaSearch main window to perform a catalogue search (in this case by adding a bounding box).

![Search with bbox](../images/Screenshot_QGIS_06.png)

Pressing 'Search' makes the Resource Catalogue perform a dataset search. When the user then selects a result, the dataset bbox is shown on the map.

![Search for datasets](../images/Screenshot_QGIS_07.png)

The user can view the selected dataset by pressing the 'Add Data' button. MetaSearch automatically discovers the available WMS, WFS and WCS links and lets the user load the data directly into the QGIS map. In this case, the WMS and WCS endpoints are available from the Data Access links included in the catalogue record.

![Adding Data](../images/Screenshot_QGIS_08.png)

By selecting 'Add WMS/WMTS' the default QGIS WMS dialog shows up. Here, the user can browse through the available layers offered by the Data Access component and select a layer to add to the map.

![QGIS WMS dialog](../images/Screenshot_QGIS_09.png)

After selecting a layer, QGIS will add the layer to the map and the user can preview the selected dataset.

![Data preview](../images/Screenshot_QGIS_10.png)

By double-clicking the catalogue record, the user can also preview the record metadata.

![Metadata preview](../images/Screenshot_QGIS_11.png)