# Metadata API Introduction

This notebook shows basic use of our metadata API with the Python library pystac-client. It demonstrates how to fetch all collections, fetch a given collection or item, and run simple searches.

# Connecting to the metadata API

We first connect to the metadata API by retrieving the root catalog.

To do this, go to https://dashboard.staging.reefdata.io/ and copy your authentication token, then paste it into the password prompt.
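If you run the notebook non-interactively, one option is to read the token from an environment variable instead of typing it at a prompt. The sketch below assumes a hypothetical variable name, `REEFDATA_API_KEY`; it falls back to `getpass` when the variable is unset and builds the same `Authorization: Bearer ...` header that the next cell passes to `Client.open`.

```python
import os
from getpass import getpass


def get_api_token(env_var: str = "REEFDATA_API_KEY") -> str:
    """Return the API token from an environment variable, else prompt for it.

    REEFDATA_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var)
    if token:
        return token.strip()
    return getpass("Paste your API key: ").strip()


def bearer_headers(token: str) -> dict:
    """Build the HTTP headers that authenticate requests to the STAC API."""
    if not token:
        raise ValueError("an empty token would produce an invalid Authorization header")
    return {"Authorization": f"Bearer {token}"}
```

With these helpers the client could be created with `pystac_client.Client.open(URL, headers=bearer_headers(get_api_token()))`, which keeps the token out of the notebook itself.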

In [4]:
from getpass import getpass
import pystac_client

# Metadata STAC API root URL
URL = 'https://stac.reefdata.io/'

# Go to https://dashboard.reefdata.io/, copy your API key and paste it into the password box

# Create the client, sending the key as a bearer token with every request
api = pystac_client.Client.open(
    URL,
    headers={
        'Authorization': f"Bearer {getpass('API key: ')}"
    },
)

api.title

'GBR-DMS Data Catalogue'

# Fetch all STAC collections

In [8]:
for collection in api.get_collections():
    print(collection)

<CollectionClient id=bom-reeftemp>
<CollectionClient id=jcu-tropwater-seagrass-mapping>
<CollectionClient id=jcu-nerp-effectiveness-inshore-monitoring>
<CollectionClient id=abs-spatial>
<CollectionClient id=imos-anmn-moorings>
<CollectionClient id=abs-census>
<CollectionClient id=aims-ltmp-mmp-coralreef>
<CollectionClient id=aims-temp>
<CollectionClient id=seltmp>
<CollectionClient id=coral-sea-boundary>
<CollectionClient id=imos-anmn-nrs>
<CollectionClient id=qtmr-vessels>
<CollectionClient id=des-slats>
<CollectionClient id=imos-anfog>
<CollectionClient id=des-qlump>
<CollectionClient id=imos-nrmn>
<CollectionClient id=mmp>
<CollectionClient id=aims-weather>
<CollectionClient id=des-wildnet>
<CollectionClient id=des-wq>
<CollectionClient id=amsa-vessel-tracking>
<CollectionClient id=bom-auswave>
<CollectionClient id=noaa-crw>
<CollectionClient id=ereefs>
<CollectionClient id=nasa-jpl-mursst>
<CollectionClient id=ga-gbr-hr-depth-model>
<CollectionClient id=imos-satellite-remote-sensin

# Fetch a given collection by ID

In [9]:
collection = api.get_collection('aims-temp')
collection

# Fetch all items

The `get_items` function returns an iterator, and pystac-client retrieves additional pages from the API only when they are needed. In the cell below, one request is made for the first ten items; calling `get_ten_items` again on the same iterator makes a second request for the next ten.
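This lazy paging can be illustrated without a network connection. In the sketch below a hypothetical `fetch_page` function stands in for the server and records every request; a generator yields items one at a time and only requests the next page once the current one is used up. `itertools.islice` takes ten items at a time from the same iterator, as an alternative to the `get_ten_items` helper. The page size of ten is an assumption for this illustration; the real page size is chosen by the server or by the `limit` argument of a search.

```python
from itertools import islice
from typing import Iterator, List

TOTAL_ITEMS = 25  # size of the hypothetical collection
PAGE_SIZE = 10    # assumed page size for this illustration

requests_made: List[int] = []  # page number of every simulated request


def fetch_page(page: int) -> List[str]:
    """Hypothetical stand-in for one HTTP request returning one page of item ids."""
    requests_made.append(page)
    start = page * PAGE_SIZE
    stop = min(start + PAGE_SIZE, TOTAL_ITEMS)
    return [f"item-{i}" for i in range(start, stop)]


def iter_items() -> Iterator[str]:
    """Yield items lazily, fetching a new page only when the previous one is exhausted."""
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:  # an empty page means there is nothing left
            return
        yield from batch
        page += 1


items = iter_items()
first = list(islice(items, 10))   # triggers the first request only
second = list(islice(items, 10))  # resumes the generator and triggers the second request
print(len(first), len(second), requests_made)
# prints: 10 10 [0, 1]
```

Because the generator is suspended between the two `islice` calls, no page is fetched until an item from it is actually requested.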

In [10]:
items = collection.get_items()

# flush stdout so we can see the exact order that things happen
def get_ten_items(items):
    for i, item in enumerate(items):
        print(f"{i}: {item}", flush=True)
        if i == 9:
            return

print('First page', flush=True)
get_ten_items(items)

First page
0: <Item id=aims-temp-loggers>


# Fetch a given item


In [12]:
item = collection.get_item('aims-temp-loggers')
item

# Inspect an item for assets

In [13]:
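# Inspect assets: get_assets() returns a dict mapping asset keys to Asset objects
item_assets = item.get_assets()
data_asset = item_assets.get('data')  # None if the item has no 'data' asset
if data_asset is not None:
    print(data_asset.to_dict())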

{'href': 's3://gbr-dms-data-public/aims-temp-loggers/data.parquet', 'type': 'application/x-parquet', 'title': 'AIMS - Sea Water Temperature Observing System', 'description': 'S3 address of the AIMS - Sea Water Temperature Observing System in GeoParquet format.\n***\n\n### Connect to the data via Python:\n```python\nimport pyarrow.dataset as ds\n\ndataset = ds.dataset("s3://gbr-dms-data-public/aims-temp-loggers/data.parquet")\nprint(dataset.schema)\n# See https://arrow.apache.org/docs/python/dataset.html for the use of pyarrow library\n```\n\n### Connect to the data via R:\n```r\nlibrary(arrow)\n\nbucket <- s3_bucket("s3://gbr-dms-data-public/aims-temp-loggers/data.parquet")\ndf <- open_dataset(bucket)\nprint(df$schema)\n# See https://arrow.apache.org/docs/r/index.html for the use of arrow library\n```\n', 'roles': ['data']}
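Assets are keyed by name, but each one also lists its `roles`. The sketch below works on the plain dictionary printed above (with the long description left out), keeps the assets whose roles include `data`, and splits the `s3://` href into a bucket name and an object key with `urllib.parse.urlparse`, following the usual S3 URI layout `s3://bucket/key`. The helper names are hypothetical; opening the object itself (for example with pyarrow, as the asset description suggests) is outside this sketch.

```python
from urllib.parse import urlparse

# Asset dictionaries as returned by Asset.to_dict(); the description is omitted here.
assets = {
    "data": {
        "href": "s3://gbr-dms-data-public/aims-temp-loggers/data.parquet",
        "type": "application/x-parquet",
        "title": "AIMS - Sea Water Temperature Observing System",
        "roles": ["data"],
    },
}


def data_assets(assets: dict) -> dict:
    """Keep only the assets whose roles include 'data'."""
    return {key: a for key, a in assets.items() if "data" in a.get("roles", [])}


def split_s3_href(href: str) -> tuple:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(href)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {href}")
    return parsed.netloc, parsed.path.lstrip("/")


for key, asset in data_assets(assets).items():
    bucket, object_key = split_s3_href(asset["href"])
    print(key, bucket, object_key)
# prints: data gbr-dms-data-public aims-temp-loggers/data.parquet
```

Selecting by role rather than by key keeps the code working if an item names its data asset differently.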


# Inspect an item for a link to the data API

In [14]:
# Inspect link to data API
link = item.get_single_link(rel="describedby")
if link is not None:
    print(link.to_dict())

{'rel': 'describedby', 'href': 'https://pygeoapi.reefdata.io/collections/aims-temp-loggers', 'title': 'Link to Data API'}
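The `describedby` link points at a pygeoapi collection. pygeoapi implements OGC API - Features, where a collection's features are normally served from an `/items` sub-path that accepts `limit` and `bbox` query parameters, and pygeoapi uses `f=json` to request GeoJSON. Assuming the Data API follows those conventions (it may also require the same bearer token), the sketch below only builds a request URL from the link's href; the helper name is hypothetical and no request is sent.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def features_url(collection_href: str, limit: int = 10,
                 bbox: Optional[Sequence[float]] = None) -> str:
    """Build an OGC API - Features items URL from a collection URL.

    Assumes the Data API follows the OGC API - Features '/items' convention.
    """
    params = {"f": "json", "limit": limit}
    if bbox is not None:
        # bbox order: min lon, min lat, max lon, max lat
        params["bbox"] = ",".join(str(v) for v in bbox)
    # safe="," keeps the bbox commas readable instead of encoding them as %2C
    return f"{collection_href.rstrip('/')}/items?{urlencode(params, safe=',')}"


url = features_url(
    "https://pygeoapi.reefdata.io/collections/aims-temp-loggers",
    limit=5,
    bbox=[136, -33, 162, -3],
)
print(url)
```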


# Search for items by spatial and temporal extent

In [15]:
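# Rectangle from 136E to 162E and 33S to 3S (GeoJSON order is [longitude, latitude])
geom = {
    "type": "Polygon",
    "coordinates": [[
        [162, -33],
        [162, -3],
        [136, -3],
        [136, -33],
        [162, -33],
    ]],
}

# limit is the page size (items per request); max_items caps the total number returned
results = api.search(
    max_items=15,
    limit=5,
    intersects=geom,
    datetime="2000-01-01/2023-03-07",
)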

for item in results.items():
    print(item.id)

des-wq-gr-buoy3-env-nearrealtime
des-wq-gr-buoy3-nearrealtime
des-wq-burnett-nearrealtime
des-wq-el-buoy1-nearrealtime
des-wq-el-buoy1-env-nearrealtime
des-wq-el-buoy1-current-nearrealtime
abs-lgas-2022
amsa-ais
csiro-seltmp-baseline-surveys-jul22
abs-postal-areas-2021
abs-lgas-2021
gbrmpa-geomorphic-map
gbrmpa-benthic-map
des-qlump-gbr-nrm-2021
des-wq-dempster-nearrealtime
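The polygon used above is an axis-aligned rectangle, so the same area can be described by a bounding box `[min_lon, min_lat, max_lon, max_lat]`, and `api.search` also accepts a `bbox` argument in place of `intersects`. The hypothetical helper below rebuilds the GeoJSON polygon from a bounding box, with a closed, counter-clockwise exterior ring as RFC 7946 recommends, and reproduces the coordinates of `geom`. It does not handle boxes that cross the antimeridian.

```python
def bbox_to_polygon(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> dict:
    """Return a closed, counter-clockwise GeoJSON Polygon covering the bbox.

    Boxes crossing the antimeridian (min_lon > max_lon) are not handled.
    """
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("expected min < max for both longitude and latitude")
    ring = [
        [max_lon, min_lat],  # south-east corner
        [max_lon, max_lat],  # north-east
        [min_lon, max_lat],  # north-west
        [min_lon, min_lat],  # south-west
        [max_lon, min_lat],  # repeat the first point to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}


geom_from_bbox = bbox_to_polygon(136, -33, 162, -3)
print(geom_from_bbox["coordinates"][0])
# prints: [[162, -33], [162, -3], [136, -3], [136, -33], [162, -33]]

# Equivalent search using a bounding box instead of a polygon:
# results = api.search(max_items=15, limit=5, bbox=[136, -33, 162, -3],
#                      datetime="2000-01-01/2023-03-07")
```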


# Search for items using query

In [27]:
# Search for items using a query to filter by keywords.
# The "contains" operator is not currently supported, so only the "eq" operator
# can be used, and it finds exact matches only.

results = api.search(
    max_items=15,
    limit=5,
    query={"themes": {"eq": [{'scheme': 'Topic Category Keywords',
                              'concepts': [{'id': 'oceans', 'title': 'Oceans'}]}]}},
)


for item in results.items():
    print(item.id)
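Assuming `eq` compares the whole `themes` value, the query above only matches items whose themes equal the given list exactly, not items that list `oceans` alongside other concepts. One workaround is to run a broader search and filter on the client side. The sketch below assumes the `themes` structure used in the query (a list of schemes, each with a list of `concepts`); the helper names and the sample items are hypothetical.

```python
from typing import Iterable, List, Optional


def has_concept(properties: dict, concept_id: str, scheme: Optional[str] = None) -> bool:
    """True if any theme (optionally limited to one scheme) lists the given concept id."""
    for theme in properties.get("themes", []):
        if scheme is not None and theme.get("scheme") != scheme:
            continue
        if any(c.get("id") == concept_id for c in theme.get("concepts", [])):
            return True
    return False


def filter_by_concept(items: Iterable[dict], concept_id: str,
                      scheme: Optional[str] = None) -> List[str]:
    """Return the ids of item dicts whose properties contain the concept."""
    return [it["id"] for it in items if has_concept(it["properties"], concept_id, scheme)]


# Hypothetical sample items shaped like Item.to_dict() output
sample = [
    {"id": "a", "properties": {"themes": [{"scheme": "Topic Category Keywords",
        "concepts": [{"id": "oceans", "title": "Oceans"}, {"id": "biota", "title": "Biota"}]}]}},
    {"id": "b", "properties": {"themes": [{"scheme": "Topic Category Keywords",
        "concepts": [{"id": "biota", "title": "Biota"}]}]}},
    {"id": "c", "properties": {}},
]
print(filter_by_concept(sample, "oceans", scheme="Topic Category Keywords"))
# prints: ['a']
```

With pystac-client the same check can be applied to live results, e.g. `[item.id for item in results.items() if has_concept(item.properties, "oceans")]`, at the cost of downloading items that are then discarded.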