## Resource Catalogue demo

[OWSLib](https://geopython.github.io/OWSLib) is a Python package for client programming with Open Geospatial Consortium (OGC) web service (hence OWS) interface standards, and their related content models. In this demo we’ll work with the CSW, OpenSearch and OGC API - Records interfaces of the catalogue, and look up the WMS and WCS links advertised in the records.

In [None]:
from owslib.csw import CatalogueServiceWeb
from owslib.ogcapi.records import Records
from owslib.opensearch import OpenSearch
from owslib.fes import And, Or, PropertyIsEqualTo, PropertyIsGreaterThanOrEqualTo, PropertyIsLessThanOrEqualTo, PropertyIsLike, BBox, SortBy, SortProperty
from geolinks import sniff_link
import folium
import json
import requests

In [None]:
base_domain = "dev-1.hsc.eofarm.com"

### System Catalogue Discovery

In this part of the demo, the user will use the system-level resource catalogue endpoint to discover data collections and datasets.
The `CatalogueServiceWeb` class from the `owslib.csw` module is instantiated and the service metadata are shown.

In [None]:
system_catalogue_endpoint = f'https://catalogue.{base_domain}/csw'

In [None]:
csw = CatalogueServiceWeb(system_catalogue_endpoint, timeout=30)

The service metadata shown here include the identification type (from ISO 19115), the CSW version and the supported operations:

In [None]:
csw.identification.type

In [None]:
csw.version

In [None]:
[op.name for op in csw.operations]

As well as catalogue queryables:

In [None]:
csw.get_operation_by_name('GetRecords').constraints

The user can make a GetRecords request to get all records of the catalogue, with a page limit of 10.

In [None]:
csw.getrecords2(maxrecords=10)
csw.results

In [None]:
for rec in csw.records:
    print(f'identifier: {csw.records[rec].identifier}\ntype: {csw.records[rec].type}\ntitle: {csw.records[rec].title}\n')

To narrow the search, the user can apply an OGC Filter. Here we demonstrate how to create spatial (`bbox`), temporal (`apiso:TempExtent_begin`, `apiso:TempExtent_end`) and attribute (`apiso:CloudCover`) filters and combine them with logical operators such as `And` and `Or`:

In [None]:
bbox_query = BBox([37.8, 23.3, 38.8, 24.5])
#bbox_query = BBox([39.66, 19.82, 40.64, 21.11])
# bbox_query = BBox([37, 13.9, 37.9, 15.1])
# bbox_query = BBox([47.7, 14.9, 48.7, 16.4])

In [None]:
begin = PropertyIsGreaterThanOrEqualTo(propertyname='apiso:TempExtent_begin', literal='2024-11-20 00:00')

In [None]:
end = PropertyIsLessThanOrEqualTo(propertyname='apiso:TempExtent_end', literal='2024-11-21 00:00')

In [None]:
cloud = PropertyIsLessThanOrEqualTo(propertyname='apiso:CloudCover', literal='20')

In [None]:
filter_list = [
    And(
        [
            bbox_query,  # bounding box
            begin, end,  # start and end date
            cloud        # cloud
        ]
    )
]

The filter is then applied to the GetRecords request and results are shown:

In [None]:
csw.getrecords2(constraints=filter_list, outputschema='http://www.isotc211.org/2005/gmd')
csw.results

In [None]:
selected_record = list(csw.records)[0]

In [None]:
for rec in csw.records:
    print(f'identifier: {csw.records[rec].identifier}\ntype: {csw.records[rec].identification[0].identtype}\ntitle: {csw.records[rec].identification[0].title}\n')

Another option is to perform a collection-level search, using the `apiso:ParentIdentifier` queryable. Here only the Sentinel-2 L2A datasets (collection `sentinel-2-l2a`) will be discovered.

In [None]:
collection_query = PropertyIsEqualTo('apiso:ParentIdentifier', 'sentinel-2-l2a')

In [None]:
csw.getrecords2(constraints=[collection_query], outputschema='http://www.isotc211.org/2005/gmd')
csw.results

Or we can search using only the bbox filter:

In [None]:
csw.getrecords2(constraints=[bbox_query], outputschema='http://www.isotc211.org/2005/gmd')
csw.results

In [None]:
for rec in csw.records:
    print(f'identifier: {csw.records[rec].identifier}\ntype: {csw.records[rec].identification[0].identtype}\ntitle: {csw.records[rec].identification[0].title}\n')

Or we can perform a full-text search (here the keyword `msi` is used; the `Orthoimagery` variant is left commented out):

In [None]:
# anytext_query = PropertyIsEqualTo('csw:AnyText', 'Orthoimagery')
anytext_query = PropertyIsEqualTo('csw:AnyText', 'msi')

In [None]:
filter_list = [
    And(
        [
            bbox_query,  # bounding box
            anytext_query # any text
        ]
    )
]

In [None]:
csw.getrecords2(constraints=filter_list)
csw.results

We can also iterate through the catalogue search results by passing the `startposition` and `maxrecords` parameters to the GetRecords request (the loop is left commented out here):

In [None]:
# csw_records = {}
# sortby = SortBy([SortProperty('dc:title', 'ASC')])
# pagesize=10
# maxrecords=1000
# startposition = 0
# nextrecord = getattr(csw, 'results', 1)
# while nextrecord != 0:
#    csw.getrecords2(constraints=[anytext_query], startposition=startposition,
#                    maxrecords=pagesize, sortby=sortby)
#    csw_records.update(csw.records)
#    if csw.results['nextrecord'] == 0:
#        break
#    startposition += pagesize
#    if startposition >= maxrecords:
#        break
# csw.records.update(csw_records)
# records = '\n'.join(csw.records.keys())
# print('Found {} records.\n'.format(len(csw.records.keys())))
# for key, value in list(csw.records.items()):
#    print(f'identifier: {value.identifier}\ntype: {value.type}\ntitle: {value.title}\n')
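
The paging logic in the commented loop can be tried without a live catalogue. The sketch below is one possible way to structure it, not the notebook's own method: `collect_records` and `fetch_page` are hypothetical names, and `fake_fetch` merely stands in for a wrapper around `csw.getrecords2(...)` that would return `dict(csw.records)` together with `csw.results['nextrecord']`. It assumes start positions are 1-based and that the server signals the last page with `nextrecord == 0`.

```python
from typing import Callable, Dict, Tuple

# Hypothetical callable: (startposition, maxrecords) -> (records, nextrecord).
FetchPage = Callable[[int, int], Tuple[Dict[str, object], int]]


def collect_records(fetch_page: FetchPage, pagesize: int = 10,
                    limit: int = 1000, first: int = 1) -> Dict[str, object]:
    """Accumulate records page by page until no next record is reported."""
    collected: Dict[str, object] = {}
    position = first
    # The limit is checked between pages, so the last page may overshoot it.
    while len(collected) < limit:
        records, nextrecord = fetch_page(position, pagesize)
        collected.update(records)
        # Stop on the last page, on an empty page, or if the position stalls.
        if nextrecord == 0 or not records or nextrecord <= position:
            break
        position = nextrecord
    return collected


def fake_fetch(startposition: int, maxrecords: int):
    """Stand-in for a catalogue holding 25 records, numbered from 1."""
    total = 25
    last = min(startposition + maxrecords - 1, total)
    page = {f"rec-{i}": i for i in range(startposition, last + 1)}
    nextrecord = last + 1 if last < total else 0
    return page, nextrecord


all_records = collect_records(fake_fetch, pagesize=10)
print(len(all_records))
```

Following the `nextrecord` value reported by the server, rather than adding `pagesize` to a local counter, avoids having to know how the server numbers its pages.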

The user then selects a record identifier and asks the catalogue to fetch the full record. Here we demonstrate how to obtain properties such as the title, the bbox, the full XML and the links from the metadata record.

In [None]:
#csw.getrecordbyid(id=['S2B_MSIL2A_20200902T090559_N0214_R050_T34SGH_20200902T113910.SAFE'])
csw.getrecordbyid(id=[selected_record])

In [None]:
#rec = csw.records['S2B_MSIL2A_20200902T090559_N0214_R050_T34SGH_20200902T113910.SAFE']
rec = csw.records[selected_record]

In [None]:
selected_record

In [None]:
rec.title

In [None]:
rec.xml

In [None]:
rec.references

In [None]:
print("dataset bbox = (%s, %s, %s, %s)" % (rec.bbox.miny, rec.bbox.minx, rec.bbox.maxy, rec.bbox.maxx))

Using the [geolinks](https://github.com/geopython/geolinks) Python library we can filter the links that are of a specific type (here WMS and WCS links to be used for visualization):

In [None]:
msg = 'geolink: {geolink}\nscheme: {scheme}\nURL: {url}\n'.format
for ref in rec.references:
    print(msg(geolink=sniff_link(ref['url']), **ref))

In [None]:
for ref in rec.references:
    url = ref['url']
    if 'WMS' in url:
        print(msg(geolink=sniff_link(url), **ref))
        break

In [None]:
for ref in rec.references:
    url = ref['url']
    if 'WCS' in url:
        print(msg(geolink=sniff_link(url), **ref))
        break
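
The two loops above test for `'WMS'` and `'WCS'` in the URL with a case-sensitive match, which would miss a link written as `service=wms`. As a hedged alternative, the helper below (a hypothetical `first_link`, not part of OWSLib or geolinks) looks at both the `scheme` and the `url` of each reference, ignoring case. The sample references are invented for illustration and only mimic the `scheme`/`url` keys used above.

```python
from typing import Dict, List, Optional


def first_link(references: List[Dict[str, str]], keyword: str) -> Optional[Dict[str, str]]:
    """Return the first reference whose scheme or URL mentions keyword."""
    needle = keyword.lower()
    for ref in references:
        # Missing or None values are treated as empty strings.
        haystack = f"{ref.get('scheme') or ''} {ref.get('url') or ''}".lower()
        if needle in haystack:
            return ref
    return None


# Invented sample data with the same keys as rec.references.
sample_references = [
    {"scheme": "WWW:LINK", "url": "https://example.org/data/scene.zip"},
    {"scheme": "OGC:WMS", "url": "https://example.org/ows?service=wms&request=GetCapabilities"},
    {"scheme": "OGC:WCS", "url": "https://example.org/ows?service=WCS&request=GetCapabilities"},
]

wms_link = first_link(sample_references, "WMS")
wcs_link = first_link(sample_references, "wcs")
print(wms_link["url"])
print(wcs_link["url"])
```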

Finally we demonstrate how to show the record footprint on a map, using the [Folium](https://github.com/python-visualization/folium) Python library:

In [None]:
centre_x = float(rec.bbox.minx) + ((float(rec.bbox.maxx) - float(rec.bbox.minx))/2)
centre_y = float(rec.bbox.miny) + ((float(rec.bbox.maxy) - float(rec.bbox.miny))/2)
m = folium.Map(location=[centre_y, centre_x], zoom_start=10, tiles='OpenStreetMap')
folium.Rectangle(bounds=[[float(rec.bbox.miny), float(rec.bbox.minx)], [float(rec.bbox.maxy), float(rec.bbox.maxx)]]).add_to(m)
m
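
The centre and rectangle arithmetic above can be factored into two small functions, which keeps the latitude/longitude ordering in one place. This is only a sketch: `bbox_centre` and `bbox_bounds` are hypothetical helper names, and the inputs are assumed to be the `minx`, `miny`, `maxx`, `maxy` values of the record bbox, given as strings or numbers.

```python
from typing import List, Tuple


def bbox_centre(minx, miny, maxx, maxy) -> Tuple[float, float]:
    """Return the centre as (lat, lon), the order used for the map location."""
    minx, miny, maxx, maxy = (float(v) for v in (minx, miny, maxx, maxy))
    return (miny + maxy) / 2, (minx + maxx) / 2


def bbox_bounds(minx, miny, maxx, maxy) -> List[List[float]]:
    """Return [[south, west], [north, east]] corner pairs for a rectangle."""
    minx, miny, maxx, maxy = (float(v) for v in (minx, miny, maxx, maxy))
    return [[miny, minx], [maxy, maxx]]


# String inputs are accepted, mirroring the float() conversions used above.
lat, lon = bbox_centre("23.3", "37.8", "24.5", "38.8")
bounds = bbox_bounds("23.3", "37.8", "24.5", "38.8")
print(lat, lon)
print(bounds)
```

With these helpers the map cell could pass `[lat, lon]` as the location and `bounds` as the rectangle bounds.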

### OpenSearch

In this part of the demo, the user will use the OpenSearch capability of the system resource catalogue to discover datasets.

In [None]:
endpoint = system_catalogue_endpoint + '?service=CSW&version=3.0.0&request=GetCapabilities&mode=opensearch'

Here, the OWSLib OpenSearch client is used.

In [None]:
os = OpenSearch(endpoint)

We use the description object to retrieve service metadata.

In [None]:
os.description.shortname

In [None]:
os.description.longname

In [None]:
os.description.description

In [None]:
os.description.urls

In [None]:
os.description.tags

Then we perform a search using the Atom encoding:

In [None]:
results = os.search('application/atom+xml')
len(results)

Another possibility is to send the HTTP GET requests directly with a generic client such as `requests`.

In [None]:
import requests
from bs4 import BeautifulSoup

In [None]:
S = requests.Session()

Here, the user asks for the OpenSearch entrypoint template:

In [None]:
R = S.get(url=endpoint)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

A GetRecords request (in the context of OpenSearch) is performed:

In [None]:
url = system_catalogue_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

A collection search is also demonstrated, for Sentinel-2 Level 2A results:

In [None]:
url = system_catalogue_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&eo:parentIdentifier=sentinel-2-l2a'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

The user can also use the OpenSearch EO mathematical notation to filter on other parameters. In the request below, `eo:cloudCover=]20` selects records with a cloud cover greater than 20:

In [None]:
# OpenSearch EO mathematical notation
# n1          means field = n1
# {n1,n2,…}   means field = n1 OR field = n2 OR …
# [n1,n2]     means n1 <= field <= n2
# [n1,n2[     means n1 <= field < n2
# ]n1,n2[     means n1 < field < n2
# ]n1,n2]     means n1 < field <= n2
# [n1         means n1 <= field
# ]n1         means n1 < field
# n2]         means field <= n2
# n2[         means field < n2

url = system_catalogue_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&eo:cloudCover=]20'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())
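
Because the bracket direction in this notation is easy to get wrong, it can help to generate the expression from named arguments. The sketch below follows the notation table in the cell above; `eo_interval` and `eo_set` are hypothetical helpers and not part of OWSLib or of the catalogue.

```python
from typing import Iterable, Optional, Union

Number = Union[int, float]


def eo_interval(lower: Optional[Number] = None, upper: Optional[Number] = None,
                include_lower: bool = True, include_upper: bool = True) -> str:
    """Build an interval expression such as '[0,20[', ']20' or '20]'."""
    if lower is None and upper is None:
        raise ValueError("at least one of lower/upper is required")
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower must not exceed upper")
    left = "[" if include_lower else "]"    # '[' is inclusive, ']' exclusive
    right = "]" if include_upper else "["   # ']' is inclusive, '[' exclusive
    if upper is None:
        return f"{left}{lower}"
    if lower is None:
        return f"{upper}{right}"
    return f"{left}{lower},{upper}{right}"


def eo_set(values: Iterable[Number]) -> str:
    """Build a set expression such as '{1,2,3}' (field equals any value)."""
    return "{" + ",".join(str(v) for v in values) + "}"


print(eo_interval(lower=20, include_lower=False))   # cloud cover above 20
print(eo_interval(upper=20))                        # cloud cover at most 20
print(eo_interval(0, 20, include_upper=False))
print(eo_set([1, 2, 3]))
```

Note that the CSW filter earlier in the demo asked for a cloud cover of at most 20, which in this notation corresponds to `20]` rather than `]20`.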

The user can get one record using the identifier through the OpenSearch EO API:

In [None]:
url = system_catalogue_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&recordids=S2B_MSIL2A_20241120T091209_R050_T34SGH_20241120T115657'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

The user can search using a bbox parameter:

In [None]:
url = system_catalogue_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&bbox=23.3,37.8,24.5,38.8'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())
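
The CSW filter earlier in the demo was built from `[37.8, 23.3, 38.8, 24.5]`, while the OpenSearch request above passes `bbox=23.3,37.8,24.5,38.8`. The two lists appear to describe the same area with the axes swapped in each corner. A small helper, hypothetical and shown only as a sketch, makes that conversion explicit:

```python
from typing import List, Sequence


def swap_axis_order(bbox: Sequence[float]) -> List[float]:
    """Swap the two coordinates of each corner: [a, b, c, d] -> [b, a, d, c]."""
    if len(bbox) != 4:
        raise ValueError("a bbox needs exactly four values")
    a, b, c, d = bbox
    return [b, a, d, c]


csw_style = [37.8, 23.3, 38.8, 24.5]
opensearch_style = swap_axis_order(csw_style)
bbox_param = ",".join(str(v) for v in opensearch_style)
print(bbox_param)
```

Applying the function twice returns the original list, so it works in both directions.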

The user can search using a time parameter:

In [None]:
url = system_catalogue_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&time=2024-11-20/2024-11-21'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())

Or separate time start/stop parameters:

In [None]:
url = system_catalogue_endpoint + '?mode=opensearch&service=CSW&version=3.0.0&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&start=2024-11-20&stop=2024-11-21'
R = S.get(url=url)
bs = BeautifulSoup(R.text, 'xml')
print(bs.prettify())
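
The request URLs in this section repeat the same long query string and differ only in their last parameter. One way to reduce the repetition, sketched here with the standard library only, is to keep the shared parameters in a dictionary and let `urllib.parse.urlencode` assemble and escape the query. `opensearch_url` is a hypothetical helper, the `example.org` endpoint is a placeholder, and the sketch assumes the server decodes percent-encoded characters in the usual way.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Parameters shared by every GetRecords request shown above.
BASE_PARAMS = {
    "mode": "opensearch",
    "service": "CSW",
    "version": "3.0.0",
    "request": "GetRecords",
    "elementsetname": "full",
    "resulttype": "results",
    "typenames": "csw:Record",
}


def opensearch_url(endpoint: str, filters: Optional[Mapping[str, str]] = None) -> str:
    """Return endpoint plus the shared parameters and any extra filters."""
    params = {**BASE_PARAMS, **(filters or {})}
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint; the demo uses system_catalogue_endpoint instead.
url = opensearch_url(
    "https://catalogue.example.org/csw",
    {"bbox": "23.3,37.8,24.5,38.8", "time": "2024-11-20/2024-11-21", "eo:cloudCover": "]20"},
)
query = parse_qs(urlsplit(url).query)
print(url)
```

The resulting `url` can be passed to `S.get(url=url)` exactly as in the cells above.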

### OGC API Records demo

In this part of the demo, the user will use the system-level resource catalogue OGC API Records endpoint to discover datasets.
The `Records` class from the `owslib.ogcapi.records` module is instantiated and the service metadata are shown.

In [None]:
system_catalogue_endpoint = f'https://catalogue.{base_domain}'

In [None]:
w = Records(system_catalogue_endpoint)

In [None]:
w.url

Conformance classes supported by the OGC API Records server:

In [None]:
w.conformance()

OpenAPI document of the OGC API Records server:

In [None]:
w.api()

Collections available on the catalogue:

In [None]:
w.collections()

In [None]:
records = w.records()

In [None]:
len(records)

The user can then specify the collection to search within the catalogue:

In [None]:
my_catalogue = w.collection('metadata:main')

In [None]:
my_catalogue['id']

Collection level queryables:

In [None]:
w.collection_queryables('metadata:main')

Query the catalogue for all records:

In [None]:
my_catalogue_query = w.collection_items('metadata:main')

In [None]:
my_catalogue_query['numberMatched']

Metadata of first result:

In [None]:
my_catalogue_query['features'][0]

In [None]:
my_catalogue_query['features'][1]['properties'].keys()

In [None]:
my_catalogue_query['features'][1]['properties']['platform']
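
Indexing the result directly, as above, raises an error when a record lacks the requested property or when fewer features are returned than expected. The sketch below shows a more tolerant way to tabulate a few properties from a GeoJSON-style response. `summarise_features` is a hypothetical helper and `sample_response` is invented data that only mimics the `features`, `id` and `properties` keys used above.

```python
from typing import Any, Dict, List, Sequence


def summarise_features(feature_collection: Dict[str, Any],
                       keys: Sequence[str] = ("title", "platform")) -> List[Dict[str, Any]]:
    """Return one row per feature holding its id and the requested properties."""
    rows = []
    for feature in feature_collection.get("features", []):
        props = feature.get("properties") or {}
        row = {"id": feature.get("id")}
        for key in keys:
            row[key] = props.get(key)  # None when the property is absent
        rows.append(row)
    return rows


# Invented response; a real one would come from w.collection_items(...).
sample_response = {
    "numberMatched": 2,
    "features": [
        {"id": "item-a", "properties": {"title": "Scene A", "platform": "sentinel-2b"}},
        {"id": "item-b", "properties": {"title": "Scene B"}},
    ],
}

rows = summarise_features(sample_response)
for row in rows:
    print(row)
```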

Query the catalogue using filters:

In [None]:
my_catalogue_query2 = w.collection_items('metadata:main', bbox=['23.3','37.8','24.5','38.8'])

In [None]:
my_catalogue_query2['numberMatched']

Full query result:

In [None]:
my_catalogue_query2

Text CQL query:

In [None]:
my_catalogue_cql_text_query = w.collection_items('metadata:main', filter="identifier LIKE 'S2B_%'")

In [None]:
my_catalogue_cql_text_query['numberMatched']

In [None]:
my_catalogue_cql_text_query['features'][0]['id']

JSON CQL query:

In [None]:
my_catalogue_cql_json_query = w.collection_items('metadata:main', limit=1, cql={'op': '=', 'args': [{ 'property': 'identifier' }, 'S2B_MSIL2A_20241120T091209_R050_T34SGH_20241120T115657']})

In [None]:
my_catalogue_cql_json_query['features'][0]['id']
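
The JSON filter above is a nested dictionary with an `op` and a list of `args`. When several conditions are needed, small builder functions keep the nesting readable. The following is a sketch under assumptions: `cql_compare` and `cql_and` are hypothetical helpers, the `and` form follows the same `op`/`args` shape as the query above, and whether the catalogue accepts `platform` as a queryable has not been checked here.

```python
import json
from typing import Any, Dict


def cql_compare(op: str, prop: str, value: Any) -> Dict[str, Any]:
    """Build a comparison such as {'op': '=', 'args': [{'property': ...}, value]}."""
    return {"op": op, "args": [{"property": prop}, value]}


def cql_and(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """Combine clauses so that all of them must hold."""
    return {"op": "and", "args": list(clauses)}


single = cql_compare("=", "identifier",
                     "S2B_MSIL2A_20241120T091209_R050_T34SGH_20241120T115657")
combined = cql_and(
    single,
    cql_compare("=", "platform", "sentinel-2b"),  # assumed queryable and value
)
print(json.dumps(combined, indent=2))
```

Here `single` has the same structure as the dictionary passed through the `cql` argument in the cell above.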

### OGC API Records Transactions demo

Examples of OGC API transactions using OGC API - Features - Part 4: Create, Replace, Update and Delete in the context of OGC API - Records metadata management.

In this part of the demo, the user will use the OWSLib client library to create/update/delete a record using the OGC API Transactions interface.

In [None]:
record_data = '../data/sample-record.json'
# record_data = '../data/record.json'
system_catalogue_endpoint = f'https://catalogue.{base_domain}'
collection_id = 'metadata:main'

In [None]:
r = Records(system_catalogue_endpoint)

In [None]:
cat = r.collection(collection_id)

In [None]:
with open(record_data) as fh:
    data = json.load(fh)

In [None]:
identifier = data['id']

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Delete metadata in case the identifier already exists.

In [None]:
r.collection_item_delete(collection_id, identifier)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Insert metadata.

In [None]:
r.collection_item_create(collection_id, data)

# Similar approach using requests
#url = f'{system_catalogue_endpoint}/collections/metadata:main/items'
#headers = {'content-type': 'application/geo+json'}
#payload = open(record_data)
#req = requests.post(url, data=payload, headers=headers)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Update metadata.

In [None]:
data['properties']['description'] = "Update description"

In [None]:
r.collection_item_update(collection_id, identifier, data)

Delete metadata.

In [None]:
r.collection_item_delete(collection_id, identifier)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Next, the user will ingest a STAC Item through the OGC API transactions.

In [None]:
stac_data = '../data/S2B_MSIL2A_20241120T091209_R050_T34SGH_20241120T115657.json'
with open(stac_data) as sf:
    si = json.load(sf)

In [None]:
identifier = si['id']

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Delete metadata in case the identifier already exists.

In [None]:
r.collection_item_delete(collection_id, identifier)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']

Insert metadata.

In [None]:
r.collection_item_create(collection_id, si)

In [None]:
my_catalogue_query = r.collection_items(collection_id)
my_catalogue_query['numberMatched']
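
Both ingestion examples delete the item first and then create it. That pattern can be wrapped in one function so that a missing item does not interrupt the flow. The sketch below runs against an in-memory stand-in rather than a live server: `replace_item` and `FakeRecords` are hypothetical, and the exception raised by the real client when the item does not exist is an assumption, which is why the exception types are a parameter.

```python
from typing import Any, Dict, Tuple, Type


class FakeRecords:
    """In-memory stand-in exposing the two client methods used above."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}

    def collection_item_delete(self, collection_id: str, identifier: str) -> bool:
        if identifier not in self.items:
            raise RuntimeError("item not found")  # assumed failure mode
        del self.items[identifier]
        return True

    def collection_item_create(self, collection_id: str, data: Dict[str, Any]) -> bool:
        self.items[data["id"]] = data
        return True


def replace_item(client: Any, collection_id: str, item: Dict[str, Any],
                 missing_errors: Tuple[Type[BaseException], ...] = (RuntimeError,)) -> bool:
    """Delete the item if present, then create it; return True if it existed."""
    try:
        client.collection_item_delete(collection_id, item["id"])
        existed = True
    except missing_errors:
        existed = False
    client.collection_item_create(collection_id, item)
    return existed


client = FakeRecords()
item = {"id": "sample-item", "properties": {"description": "first version"}}
first_run = replace_item(client, "metadata:main", item)
second_run = replace_item(client, "metadata:main", item)
print(first_run, second_run, len(client.items))
```

With the real client, `r` would take the place of `client`, after checking which exception it raises for a missing identifier.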

### STAC API demo

In this part of the demo, the user will use the system-level resource catalogue STAC API endpoint to discover datasets.
The `pystac_client` library is used as a STAC client.

In [None]:
from pystac_client import Client
from pystac_client import ConformanceClasses

In [None]:
system_catalogue_endpoint = f'https://catalogue.{base_domain}/stac'

In [None]:
catalog = Client.open(system_catalogue_endpoint)

STAC catalogue metadata:

In [None]:
catalog.id

In [None]:
catalog.title

In [None]:
catalog.description

Conformance classes that `pystac_client` can check for (the members of its `ConformanceClasses` enumeration):

In [None]:
dir(ConformanceClasses)

Validation of STAC API using `pystac_client`:

In [None]:
catalog._stac_io.assert_conforms_to(ConformanceClasses.ITEM_SEARCH)

In [None]:
catalog._stac_io.assert_conforms_to(ConformanceClasses.CORE)

Query the catalogue for STAC items in the `sentinel-2-l2a` collection (at most 10 items are returned):

In [None]:
mysearch = catalog.search(collections=['sentinel-2-l2a'], max_items=10)
print(f"{mysearch.matched()} items found")

Query the STAC API using filters:

In [None]:
# mysearch = catalog.search(collections=['sentinel-2-l2a'], bbox=[23.3,37.8,24.5,38.8], max_items=10)
# print(f"{mysearch.matched()} items found")

Iterate through the query results:

In [None]:
items = mysearch.get_items()
for item in items:
    print(item.id)

Show first STAC item JSON:

In [None]:
# print(json.dumps(item.to_dict(), indent=2))

In [None]:
first_item = next(mysearch.get_items())
first_item