# Spatio GeoCatalog (PREVIEW/CONFIDENTIAL) - Getting Started

Welcome! In this tutorial you will create your very own STAC collection within a Spatio GeoCatalog. The first step to creating a STAC Collection is to define and create the Collection itself. Once created, you will be able to ingest STAC Items and assets into your collection, then search for and retrieve those items via a STAC API.

#### Prerequisites

You'll need to have created a Spatio GeoCatalog instance in your Azure subscription. You'll also need an environment to execute this notebook and install the necessary packages. We suggest running this tutorial through Visual Studio Code's notebook integration in a Python virtual environment. However, this notebook should run wherever you can run Jupyter notebooks, provided the following requirements are met:

* Python 3.8 or later
* Azure CLI is installed, and you have run az login to log into your Azure account
* You've installed the necessary requirements with pip install -r requirements.txt

#### What is a STAC Collection?

A SpatioTemporal Asset Catalog (STAC) Collection is a group of related STAC Items that share common metadata and are organized in a structured way. STAC Collections are used to describe a set of spatiotemporal assets that are related to each other, such as a set of satellite images that are taken by the same satellite over a period of time. STAC Collections provide additional metadata about the data, such as the spatial and temporal extent of the data, the license, keywords, providers, etc. They also can define common properties so that items in the collection don’t have to duplicate common data for each item.

#### How is a STAC Collection Defined?

A STAC Collection is defined as a JSON document that contains both required and optional fields. The required fields include:

* **id** - A unique ID for the collection
* **description** - A short description of the type of data contained within the collection
* **license** - The license under which the data is made available
* **extent** - The spatial and temporal bounds of the collection

The STAC specification also requires the **type**, **stac_version**, and **links** fields. A STAC Collection can also contain additional optional fields, including:

* **title** - A descriptive title for the collection
* **keywords** - Keywords that describe the content of the collection
* **providers** - The organizations that provided the data
* **summaries** - Summaries of the data in the collection
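To make these field lists concrete, here is a minimal sketch of a collection document written as a Python dict, plus a small helper that reports missing top-level fields. The `REQUIRED_FIELDS` tuple follows the STAC 1.0.0 Collection specification, which requires `type`, `stac_version`, `id`, `description`, `license`, `extent`, and `links`. The example values and the `missing_fields` helper are hypothetical, and this is only a key check, not full JSON Schema validation.

```python
from typing import Any, Dict, List

# Top-level fields the STAC 1.0.0 Collection specification marks as required
REQUIRED_FIELDS = ("type", "stac_version", "id", "description", "license", "extent", "links")


def missing_fields(collection: Dict[str, Any]) -> List[str]:
    """Return the required top-level fields that are absent from a collection dict."""
    return [field for field in REQUIRED_FIELDS if field not in collection]


# Hypothetical minimal collection; all values are illustrative only
example_collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "example-collection",
    "title": "Example collection",  # optional
    "description": "Illustrative collection showing the field layout",
    "license": "CC-BY-4.0",
    "extent": {
        "spatial": {"bbox": [[-22.46, 63.83, -22.39, 63.88]]},
        "temporal": {"interval": [["2024-02-04T00:00:00Z", "2024-02-11T23:59:59Z"]]},
    },
    "links": [],
    "keywords": ["example"],  # optional
}

print(missing_fields(example_collection))  # prints []
print(missing_fields({"id": "only-an-id"}))
```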


#### Log in to Azure with the Azure CLI
The following command logs you into Azure using the Azure CLI. Run the command and follow the instructions to log in.

In [None]:
!az login

# Select Tutorial Options

Before running this tutorial, you need contributor access to an existing GeoCatalog instance. Enter the URL of your GeoCatalog instance in the cell below. In this tutorial, you will create a collection for Sentinel-2 imagery provided by the European Space Agency (ESA) that is currently stored in Microsoft's Planetary Computer.

In [None]:
# URL for your given GeoCatalog
geocatalog_url = (
    "<GEOCATALOG_URL>"
)
geocatalog_url = geocatalog_url.rstrip("/")  # Remove trailing slash if present

# User Selections for Demo

# Collection within the Planetary Computer
pc_collection = "sentinel-2-l2a"

# Bounding box for AOI
bbox_aoi = [-22.455626, 63.834083, -22.395201, 63.880750]

# Date range to search for imagery
param_date_range = "2024-02-04/2024-02-11"

# Maximum number of items to ingest
param_max_items = 6

#### Create a STAC Collection

##### Import the Required Packages

Before we can create a STAC collection, we need to install and import a few Python packages. We also define helper functions to retrieve the required access token and to print any error messages.

In [None]:
# Install the required python packages via pip
%pip install -r requirements.txt

In [None]:
# Import the required packages
import json
import random
import string
import time
from datetime import datetime, timedelta, timezone
from io import BytesIO
from typing import Any, Optional, Dict

import requests
from azure.identity import AzureCliCredential
from IPython.display import Markdown as md
from IPython.display import clear_output
from PIL import Image
from pystac_client import Client

# Application ID (scope) used to request tokens for the Spatio API
SPATIO_APP_ID = "https://geocatalog.spatio.azure.com"

# Cached access token, reused until it is within 5 minutes of expiring
_access_token = None
def getBearerToken():
    """Return an Authorization header, refreshing the cached Azure CLI token when needed."""
    global _access_token
    if not _access_token or datetime.fromtimestamp(_access_token.expires_on) < datetime.now() + timedelta(minutes=5):
        credential = AzureCliCredential()
        _access_token = credential.get_token(f"{SPATIO_APP_ID}/.default")

    return {"Authorization": f"Bearer {_access_token.token}"}

# Raise on HTTP errors, printing the response body (as JSON if possible) first
def raise_for_status(r: requests.Response) -> None:
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError:
        try:
            print(json.dumps(r.json(), indent=2))
        except ValueError:
            # The response body is not JSON; print the raw bytes instead
            print(r.content)
        raise
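The caching in getBearerToken comes down to one rule: reuse the cached token unless there is none or it expires within five minutes. Below is a minimal, offline sketch of that rule as a pure function. `token_needs_refresh` is a hypothetical name, not part of `azure-identity`. `expires_on` is a Unix timestamp in seconds, which is what `AccessToken.expires_on` holds.

```python
import time
from typing import Optional


def token_needs_refresh(expires_on: Optional[float], now: Optional[float] = None,
                        margin_seconds: float = 300) -> bool:
    """Return True if no token is cached or the token expires within the margin.

    expires_on: Unix timestamp (seconds) when the token expires, or None if nothing is cached.
    now: current Unix timestamp; defaults to time.time() so tests can pass fixed values.
    """
    if expires_on is None:
        return True
    current = time.time() if now is None else now
    return expires_on < current + margin_seconds


print(token_needs_refresh(None))                    # prints True (nothing cached)
print(token_needs_refresh(1_000_000, now=999_000))  # prints False (expires in ~16.7 minutes)
print(token_needs_refresh(1_000_000, now=999_800))  # prints True (expires in 200 seconds)
```

Comparing raw Unix timestamps sidesteps any confusion between naive local datetimes and UTC.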

##### Define a STAC Collection JSON
Next, we need to define the STAC collection as a JSON object. For the purposes of this tutorial, we will read the existing STAC collection JSON for the Sentinel-2-l2a collection in Microsoft's Planetary Computer. Your collection is assigned a random id and title so that it doesn't conflict with other existing collections.

In [None]:
# Load example STAC Collection JSON

response = requests.get(
    f"https://planetarycomputer.microsoft.com/api/stac/v1/collections/{pc_collection}"
)
raise_for_status(response)
stac_collection = response.json()

# Generate a unique name for the test collection
collection_id = pc_collection + "-tutorial-" + str(random.randint(0, 1000))

stac_collection["id"] = collection_id
stac_collection["title"] = collection_id

# Determine the storage account and container for the assets
pc_storage_account = stac_collection.pop("msft:storage_account")
pc_storage_container = stac_collection.pop("msft:container")
pc_collection_asset_container = (
    f"https://{pc_storage_account}.blob.core.windows.net/{pc_storage_container}"
)

# View your STAC Collection JSON
stac_collection
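`random.randint(0, 1000)` yields only 1,001 possible suffixes, so two people running this tutorial against the same GeoCatalog could collide. If that matters, one option (a suggestion, not a GeoCatalog requirement) is a short hex suffix from `uuid.uuid4()`, which keeps the same lowercase-letters, digits, and hyphens character set. `make_collection_id` is a hypothetical helper.

```python
import uuid


def make_collection_id(base: str, suffix_length: int = 8) -> str:
    """Build '<base>-tutorial-<hex>' using a random UUID4 hex suffix."""
    return f"{base}-tutorial-{uuid.uuid4().hex[:suffix_length]}"


new_id = make_collection_id("sentinel-2-l2a")
print(new_id)  # random on each run
```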

When you create a collection within GeoCatalog, the collection JSON can't have any assets associated with it, so we will remove the existing assets below (don't worry, we will show you how to add them back later). Before removing the assets, we save the thumbnail URL for this collection to use later on.

In [None]:
# Save the thumbnail URL
thumbnail_url = stac_collection['assets']['thumbnail']['href']

# Remove the assets field from the JSON (you will see how to add this back later)
print("Removed the following assets from the STAC Collection JSON:")
stac_collection.pop('assets')
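The cell above mutates `stac_collection` in place, so re-running it raises a `KeyError`. If you prefer to leave the source document untouched, a minimal sketch is to deep-copy the dict and return the collection without assets alongside the thumbnail href. `split_collection_assets` is a hypothetical helper, and it assumes the thumbnail lives under `assets['thumbnail']['href']`, as it does in the Planetary Computer collection used here.

```python
import copy
from typing import Any, Dict, Optional, Tuple


def split_collection_assets(collection: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[str]]:
    """Return (copy of collection without 'assets', thumbnail href or None)."""
    cleaned = copy.deepcopy(collection)
    assets = cleaned.pop("assets", {})
    thumbnail_href = assets.get("thumbnail", {}).get("href")
    return cleaned, thumbnail_href


source = {"id": "demo", "assets": {"thumbnail": {"href": "https://example.com/thumb.png"}}}
cleaned, href = split_collection_assets(source)
print("assets" in cleaned, href)  # prints False https://example.com/thumb.png
print("assets" in source)         # prints True (the original is left untouched)
```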

In [None]:
# Create a STAC Collection by posting to the STAC collections API

collections_endpoint = f"{geocatalog_url}/api/collections"

response = requests.post(
    collections_endpoint,
    json=stac_collection,
    headers=getBearerToken(),
    params={"api-version": "2024-01-31-preview"}
)

if response.status_code==201:
    print("STAC Collection created named:",stac_collection['title'])
else:
    raise_for_status(response)

If you open your GeoCatalog instance in a web browser you should see your new collection.

##### Access Collection Thumbnail

Next, we want to add a thumbnail that is displayed along with our collection. For the purposes of this demo, we will use the thumbnail from the existing Sentinel-2 collection in Microsoft's Planetary Computer.

In [None]:
# Read thumbnail for your collection

thumbnail_response = requests.get(thumbnail_url)
raise_for_status(thumbnail_response)
img = Image.open(BytesIO(thumbnail_response.content))
img

##### Add Thumbnail to your Spatio GeoCatalog

After reading the thumbnail, we can add it to our Spatio collection by posting it to Spatio's collection assets API endpoint along with the required asset JSON.

In [None]:
# Define the Spatio collection assets API endpoint
collection_assets_endpoint = f"{geocatalog_url}/api/collections/{collection_id}/assets"

# Package the thumbnail bytes downloaded from the Planetary Computer as a multipart file upload
thumbnail = {"file": ("thumbnail.png", thumbnail_response.content)}

# Define the STAC Collection asset type - thumbnail in this case
asset = {
    "data": '{"key": "thumbnail", "href":"", "type": "image/png", '
    '"roles":  ["test_asset"], "title": "test_asset"}'
}

# Post the thumbnail to the Spatio collections asset endpoint
response = requests.post(
    collection_assets_endpoint,
    data=asset,
    files=thumbnail,
    headers=getBearerToken(),
    params={"api-version": "2024-01-31-preview"}
)

if response.status_code==201:
    print("STAC Collection thumbnail updated for:",stac_collection['title'])
else:
    raise_for_status(response)
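The `data` form field above is a hand-written JSON string, which is easy to break with a stray quote. A minimal sketch that builds the same multipart payload with `json.dumps` is shown below. `build_asset_form` is a hypothetical helper; the metadata field names (`key`, `href`, `type`, `roles`, `title`) are copied from the cell above. The returned dicts are meant for `requests.post(..., data=data, files=files)`.

```python
import json
from typing import Dict, List, Tuple


def build_asset_form(key: str, filename: str, content: bytes, media_type: str,
                     roles: List[str], title: str) -> Tuple[Dict[str, str], Dict[str, Tuple[str, bytes]]]:
    """Return (data, files) dicts for a multipart collection-asset upload."""
    metadata = {"key": key, "href": "", "type": media_type, "roles": roles, "title": title}
    data = {"data": json.dumps(metadata)}  # serialized metadata goes in the 'data' form field
    files = {"file": (filename, content)}  # raw bytes go in the 'file' part
    return data, files


data, files = build_asset_form("thumbnail", "thumbnail.png", b"\x89PNG placeholder", "image/png",
                               ["test_asset"], "test_asset")
print(json.loads(data["data"])["key"])  # prints thumbnail
```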

#### Read new collection from within your Spatio GeoCatalog

If you refresh your browser, you should see the thumbnail. You can also retrieve the collection JSON programmatically with the following call:

In [None]:
# Request the collection JSON from Spatio using the collection's own endpoint
response = requests.get(
    f"{collections_endpoint}/{collection_id}",
    headers=getBearerToken(),
    params={"api-version": "2024-01-31-preview"}
)

if response.status_code==200:
    print("STAC Collection successfully read:",stac_collection['title'])
else:
    raise_for_status(response)

response.json()

In [None]:
print(f"""
You successfully created a new STAC Collection in GeoCatalog named {collection_id}.
You can view your collection by visiting the GeoCatalog Explorer: {geocatalog_url}/collections
""")

# Ingest STAC Items & Assets

After creating the collection above, you are ready to ingest new STAC items into your STAC collection using Spatio's Items API! In this tutorial you will:

1. Obtain a SAS token from Microsoft's Planetary Computer
2. Register that token as an Ingestion Source within GeoCatalog
3. Post STAC Items from that collection to GeoCatalog's Item API
4. Verify the Items were ingested successfully


In [None]:
ingestion_sources_endpoint = f"{geocatalog_url}/api/ingestion-sources"
ingestion_source_endpoint = lambda id: f"{geocatalog_url}/api/ingestion-sources/{id}"


def find_ingestion_source(container_url: str) -> Optional[Dict[str, Any]]:
    """Return the registered ingestion source for container_url, or None if there isn't one."""
    response = requests.get(
        ingestion_sources_endpoint,
        headers=getBearerToken(),
        params={"api-version": "2024-01-31-preview"},
    )
    raise_for_status(response)

    for source in response.json():
        # Fetch each source's details to read its container URL
        detail_response = requests.get(
            ingestion_source_endpoint(source["id"]),
            headers=getBearerToken(),
            params={"api-version": "2024-01-31-preview"},
        )
        raise_for_status(detail_response)
        details = detail_response.json()

        if details["connectionInfo"]["containerUrl"] == container_url:
            return details

    return None


def create_ingestion_source(container_url: str, sas_token: str):
    response = requests.post(
        ingestion_sources_endpoint,
        json={
            "sourceType": "SasToken",
            "connectionInfo": {
                "containerUrl": container_url,
                "sasToken": sas_token,
            },
        },
        headers=getBearerToken(),
        params={"api-version": "2024-01-31-preview"},
    )
    raise_for_status(response)


def remove_ingestion_source(ingestion_source_id: str):
    response = requests.delete(
        ingestion_source_endpoint(ingestion_source_id),
        headers=getBearerToken(),
        params={"api-version": "2024-01-31-preview"},
    )
    raise_for_status(response)
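`find_ingestion_source` issues one GET per registered source; the matching step itself is a scan over dicts shaped like the responses used above (`id` plus `connectionInfo.containerUrl`). Here is that step as an offline sketch. `match_ingestion_source` is a hypothetical name, and ignoring a trailing slash is an assumption about how container URLs might be stored, not documented GeoCatalog behavior.

```python
from typing import Any, Dict, Iterable, Optional


def match_ingestion_source(sources: Iterable[Dict[str, Any]],
                           container_url: str) -> Optional[Dict[str, Any]]:
    """Return the first source whose connectionInfo.containerUrl matches container_url.

    Trailing slashes are ignored on both sides (an assumption, see above).
    """
    target = container_url.rstrip("/")
    for source in sources:
        url = source.get("connectionInfo", {}).get("containerUrl", "")
        if url.rstrip("/") == target:
            return source
    return None


sources = [
    {"id": "a", "connectionInfo": {"containerUrl": "https://acct1.blob.core.windows.net/data"}},
    {"id": "b", "connectionInfo": {"containerUrl": "https://acct2.blob.core.windows.net/imagery/"}},
]
print(match_ingestion_source(sources, "https://acct2.blob.core.windows.net/imagery")["id"])  # prints b
print(match_ingestion_source(sources, "https://acct3.blob.core.windows.net/other"))        # prints None
```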

#### Query the Planetary Computer
First, we need to query the Planetary Computer for Sentinel-2 images that match our specific requirements. In this case, we are looking for Sentinel-2 imagery in the Planetary Computer that matches the following criteria:

* Collection - Imagery from the Sentinel-2-l2a collection
* Time range - Collected between February 4th and February 11th, 2024
* Area of interest - Imagery collected over southwestern Iceland (defined as a bounding box)

After performing this search, we can see which matching STAC items exist in the Planetary Computer.
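Behind `catalog.search(...)`, `pystac_client` sends a STAC API item search. The sketch below builds the equivalent JSON body by hand using the standard item-search parameters `collections`, `bbox`, `datetime`, and `limit`; the same shape is what this tutorial later POSTs to GeoCatalog's `/api/search`. `build_search_body` is hypothetical, and its bbox sanity checks are an illustrative addition. Note that in the STAC API, `limit` is a page size, not a cap on the total number of results.

```python
from typing import Any, Dict, List, Optional


def build_search_body(collections: List[str], bbox: List[float], datetime_range: str,
                      limit: Optional[int] = None) -> Dict[str, Any]:
    """Build a STAC API item-search body for POST /search."""
    if len(bbox) != 4:
        raise ValueError("bbox must be [west, south, east, north]")
    _, south, _, north = bbox
    if south > north:
        raise ValueError("south must not exceed north")
    body: Dict[str, Any] = {"collections": collections, "bbox": bbox, "datetime": datetime_range}
    if limit is not None:
        body["limit"] = limit  # page size per response, not a total cap
    return body


body = build_search_body(["sentinel-2-l2a"], [-22.455626, 63.834083, -22.395201, 63.880750],
                         "2024-02-04/2024-02-11", limit=6)
print(body["datetime"])  # prints 2024-02-04/2024-02-11
```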

In [None]:
# Search Criteria
print("Using the below parameters to search the Planetary Computer:\n")
print("Collection:", pc_collection)
print("Bounding box for area of interest:",bbox_aoi)
print("Date range:",param_date_range)
print("Max number of items:",param_max_items)

In [None]:
# Query the Planetary Computer

# Connect to the Planetary Computer
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = catalog.search(collections=[pc_collection], bbox=bbox_aoi, datetime=param_date_range)
total_items = search.item_collection()

# Keep only the first param_max_items items for ingestion
items = list(total_items)[:param_max_items]
print("Total number of matching items:", len(total_items))
print("Total number of items for ingest based on user selected parameter:", len(items))

if len(total_items) == 0:
    print("No items matched the parameters you set at the top of this tutorial. Update these parameters and rerun the search.")

In [None]:
# Print an example STAC item returned by the Planetary Computer
items[0]

### Register an Ingestion Source
Before we can ingest these STAC items and their related assets (images) into our GeoCatalog collection, we need to determine whether to register a new ingestion source for them. GeoCatalog uses ingestion sources to track which storage locations (Azure Blob Storage containers) it has been given access to.

Within GeoCatalog, you grant this access by providing the location of the storage container and a SAS token with read permissions on that container. If STAC items or their related assets are located in a storage container that GeoCatalog has not been given access to, it will be unable to read those items and the ingestion will fail.

First, we request a SAS token from the Planetary Computer that grants read access to the container where the Sentinel-2 images reside.

In [None]:
# Request a SAS token for the collection's storage container from the Planetary Computer
pc_token_response = requests.get(f"https://planetarycomputer.microsoft.com/api/sas/v1/token/{pc_collection}")
raise_for_status(pc_token_response)
pc_token = pc_token_response.json()
print(f"Planetary Computer SAS token will expire {pc_token['msft:expiry']}")

Next, we attempt to register this Azure Blob Storage container and its SAS token as an ingestion source with GeoCatalog. An ingestion source may already exist for this storage container; if so, the code below finds it and reuses it.

**Warning** - If a duplicate ingestion source is found with a token that expires in the next 15 minutes, it will be deleted and replaced. Note that deleting an ingestion source that is in use by currently running ingestions may break those ingestions.

In [None]:
existing_ingestion_source: Optional[Dict[str, Any]] = find_ingestion_source(pc_collection_asset_container)

if existing_ingestion_source:
    connection_info = existing_ingestion_source["connectionInfo"]
    # Drop fractional seconds and any trailing "Z" so fromisoformat can parse the value on Python 3.7+
    expiration = datetime.fromisoformat(connection_info["expiration"].split('.')[0].rstrip("Z"))
    expiration = expiration.replace(tzinfo=timezone.utc)  # treat the timestamp as UTC
    if expiration < datetime.now(tz=timezone.utc) + timedelta(minutes=15):
        print(f"Recreating existing ingestion source for {pc_collection_asset_container}")
        remove_ingestion_source(existing_ingestion_source["id"])
        create_ingestion_source(pc_collection_asset_container, pc_token["token"])
    else:
        print(f"Using existing ingestion source for {pc_collection_asset_container} with expiration {expiration}")
else:
    print(f"Creating ingestion source for {pc_collection_asset_container}")
    create_ingestion_source(pc_collection_asset_container, pc_token["token"])
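The expiration check above strips fractional seconds before calling `datetime.fromisoformat`, because older Python versions accept only a limited set of ISO 8601 forms. A slightly more defensive sketch also keeps an explicit UTC offset and converts it correctly instead of overwriting it. `parse_expiration` and `expires_within` are hypothetical helpers, and the timestamp shapes they accept are assumptions about what the `expiration` field may contain.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_expiration(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into an aware UTC datetime.

    Handles a trailing 'Z', drops fractional seconds (older Pythons reject more
    than 6 digits), and keeps any UTC offset. Naive values are assumed to be UTC.
    """
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    if "." in text:
        main, rest = text.split(".", 1)
        # Keep any UTC offset that follows the fractional seconds
        offset_start = max(rest.find("+"), rest.find("-"))
        text = main + (rest[offset_start:] if offset_start != -1 else "")
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def expires_within(value: str, margin: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True if the timestamp falls before now + margin."""
    current = datetime.now(tz=timezone.utc) if now is None else now
    return parse_expiration(value) < current + margin


now = datetime(2024, 2, 12, 12, 0, tzinfo=timezone.utc)
print(expires_within("2024-02-12T12:10:00.1234567Z", timedelta(minutes=15), now))  # prints True
print(expires_within("2024-02-12T13:00:00+00:00", timedelta(minutes=15), now))     # prints False
```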


### Ingest STAC Items using GeoCatalog's Items API
Now that we have registered an ingestion source or confirmed that one exists, we will ingest the STAC items we found in the Planetary Computer using GeoCatalog's Items API. We do this by posting each item to the Items API, which creates a new ingestion operation within GeoCatalog.

In [None]:
# Ingest Items

items_endpoint = f"{geocatalog_url}/api/collections/{collection_id}/items"

for item in items:

    # Point each item at the new collection before posting it
    item_json = item.to_dict()
    item_json['collection'] = collection_id

    response = requests.post(
        items_endpoint,
        json=item_json,
        headers=getBearerToken(),
        params={"api-version": "2024-01-31-preview"}
    )
    raise_for_status(response)


Because Sentinel-2 item ingestion can take some time, you can run the code below to check the status of your ingestion operations using GeoCatalog's operations API.

In [None]:
# Check the status of the ingestion operations
operations_endpoint = f"{geocatalog_url}/api/collections/{collection_id}/operations"

start = time.time()

while True:
    response = requests.get(
        operations_endpoint,
        headers=getBearerToken(),
        params={"api-version": "2024-01-31-preview"}
    )
    raise_for_status(response)

    # Parse the response once and tally operations by status
    statuses = [operation['status'] for operation in response.json()]
    finished_count = statuses.count("Finished")
    running_count = statuses.count("Running")
    pending_count = statuses.count("Pending")
    failed_count = statuses.count("Failed")

    clear_output(wait=True)
    print("Ingesting Imagery:")
    print("Finished Items:", finished_count)
    print("Running Items:", running_count)
    print("Pending Items:", pending_count)
    print("Failed Items:", failed_count)
    print("Time Elapsed (seconds):", round(time.time() - start, 1))

    # Stop polling once nothing is queued or in progress
    if pending_count == 0 and running_count == 0:
        print(f"Ingestion Complete! {finished_count} items ingested")
        break

    time.sleep(10)
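The polling loop above runs until nothing is pending or running, with no upper bound on how long it waits. Here is a generic sketch with a timeout; `poll_until` is a hypothetical helper. You would pass a function that fetches and tallies operation statuses (as the cell above does) as `fetch`, and a predicate such as `lambda counts: counts["Pending"] == 0 and counts["Running"] == 0` as `is_done`. The `sleep` parameter defaults to `time.sleep` and exists only to make the helper easy to test.

```python
import time
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T], is_done: Callable[[T], bool],
               interval_seconds: float = 10, timeout_seconds: float = 1800,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() until is_done(result) is True or the timeout elapses.

    Returns the final result; raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout_seconds} seconds")
        sleep(interval_seconds)


# Offline usage example: a fake status source that finishes on the third call
snapshots = iter([
    Counter({"Pending": 2, "Running": 1}),
    Counter({"Running": 1, "Finished": 2}),
    Counter({"Finished": 3}),
])
final = poll_until(lambda: next(snapshots),
                   lambda counts: counts["Pending"] == 0 and counts["Running"] == 0,
                   interval_seconds=0)
print(final["Finished"])  # prints 3
```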

You should be able to refresh your web browser and click on the Items tab to see these newly uploaded items.

# Collection Management

Now that we have ingested our STAC items and their associated assets (images) into our STAC collection, we will provide GeoCatalog with some additional configuration files so that we can visualize the items in our collection using GeoCatalog's Explorer.

### Collection Render Config
First, we will download a render configuration for our collection from the Planetary Computer. GeoCatalog reads this config to render images in different ways within the Explorer. This is needed because STAC items may contain many different assets (images) that can be combined to create entirely new images of a given area, highlighting visible or non-visible phenomena. For instance, Sentinel-2 STAC items contain a dozen or more images from different portions of the electromagnetic spectrum. The render config below instructs GeoCatalog on how to combine these images so it can display them in Natural Color or False Color (Color Infrared).

In [None]:
# Read the render options JSON from the Planetary Computer

render_response = requests.get(f"https://planetarycomputer.microsoft.com/api/data/v1/mosaic/info?collection={pc_collection}")
raise_for_status(render_response)
render_json = render_response.json()
render_json['renderOptions']

After reading this render options config from the Planetary Computer we can enable these render options for our collection by posting this config to the render-options endpoint.

In [None]:
# Post Render Options Config to GeoCatalog render-options API

render_config_endpoint = f"{geocatalog_url}/api/collections/{collection_id}/config/render-options"

for render_option in render_json['renderOptions']:

    # Derive an id from the option name: strip punctuation, lowercase, replace spaces with hyphens, max 30 characters
    render_option['id'] = render_option['name'].translate(str.maketrans('', '', string.punctuation)).lower().replace(" ","-")[:30]

    # Post Render Definition
    response = requests.post(
        render_config_endpoint,
        json=render_option,
        headers=getBearerToken(),
        params={"api-version": "2024-01-31-preview"}
    )
    raise_for_status(response)
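The id rule applied in the render-options loop (strip ASCII punctuation, lowercase, replace spaces with hyphens, truncate to 30 characters) is easier to check as a standalone function. `render_option_id` is a hypothetical name, and the 30-character limit is copied from that loop, not from a documented GeoCatalog constraint.

```python
import string

# Translation table that deletes every ASCII punctuation character
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def render_option_id(name: str, max_length: int = 30) -> str:
    """Derive a render-option id from a display name, mirroring the loop above."""
    return name.translate(_PUNCTUATION_TABLE).lower().replace(" ", "-")[:max_length]


print(render_option_id("Natural color"))         # prints natural-color
print(render_option_id("Color infrared (NIR)"))  # prints color-infrared-nir
```

Because punctuation is deleted rather than replaced, two names that differ only by punctuation map to the same id, so checking the derived ids for duplicates before posting may be worthwhile.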

### Mosaic Definitions

Similar to the render config discussed above, GeoCatalog's Explorer allows us to specify one or more mosaic definitions for our collection. These mosaic definitions tell GeoCatalog's Explorer which items to display. For example, the basic mosaic definition below instructs GeoCatalog to simply display the most recent image for any given area. More advanced mosaic definitions let us render different views, such as the least cloudy image for a given location captured in October 2023.

In [None]:
# Post Mosaic Definition

mosaics_config_endpoint = f"{geocatalog_url}/api/collections/{collection_id}/config/mosaics"

response = requests.post(
    mosaics_config_endpoint,
    json={"id": "mos1",
          "name": "Most recent available",
          "description": "Most recent available imagery in this collection",
          "cql": []  # empty filter list: no items are excluded
    },
    headers=getBearerToken(),
    params={"api-version": "2024-01-31-preview"}
)
raise_for_status(response)
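To make the difference between "most recent" and "least cloudy" concrete, the sketch below applies both rules to a plain list of STAC item dicts on the client side. It does not call GeoCatalog and says nothing about how the mosaic `cql` field is evaluated; it only illustrates the selection logic. It assumes items carry `properties.datetime` and, for the cloud rule, the `eo:cloud_cover` property that Sentinel-2 items in the Planetary Computer provide. The item ids and values are made up.

```python
from datetime import datetime
from typing import Any, Dict, List


def _item_time(item: Dict[str, Any]) -> datetime:
    # STAC datetimes are RFC 3339; swap a trailing 'Z' so fromisoformat accepts it
    return datetime.fromisoformat(item["properties"]["datetime"].replace("Z", "+00:00"))


def most_recent(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the item with the latest acquisition datetime."""
    return max(items, key=_item_time)


def least_cloudy(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the item with the lowest eo:cloud_cover, breaking ties by recency."""
    return min(items, key=lambda item: (item["properties"]["eo:cloud_cover"],
                                        -_item_time(item).timestamp()))


items_example = [
    {"id": "a", "properties": {"datetime": "2024-02-05T13:00:00Z", "eo:cloud_cover": 12.5}},
    {"id": "b", "properties": {"datetime": "2024-02-10T13:00:00Z", "eo:cloud_cover": 80.0}},
    {"id": "c", "properties": {"datetime": "2024-02-08T13:00:00Z", "eo:cloud_cover": 3.1}},
]
print(most_recent(items_example)["id"])   # prints b
print(least_cloudy(items_example)["id"])  # prints c
```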

### Open GeoCatalog Explorer

Congrats! You have now created a collection, added STAC items and assets, and updated your collection to include the required configuration files so it can be viewed through the GeoCatalog Explorer.

**Navigate back to the GeoCatalog Explorer to view your collection!**

# Query Collection via STAC API

Now that you have viewed your collection in the GeoCatalog Explorer, we will walk through how to use GeoCatalog's STAC APIs to search for and retrieve STAC items and assets for further analysis.

This process starts by posting a search to your GeoCatalog's STAC API. Specifically, you will search for imagery within your collection that falls within the original bounding box you used to extract imagery from the Planetary Computer.

Unsurprisingly, this query returns all the STAC items you previously placed in your collection.

In [None]:
stac_search_endpoint = f"{geocatalog_url}/api/search"

response = requests.post(
    stac_search_endpoint,
    json={"collections":[collection_id],
          "bbox":bbox_aoi
    },
    headers=getBearerToken(),
    params={"api-version": "2024-01-31-preview", "sign": "true"}
)
raise_for_status(response)

matching_items = response.json()['features']
print(len(matching_items))
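A single POST to a STAC API search returns one page of results. STAC API item search signals further pages with a link whose `rel` is `next`; for POST searches, that link can also carry a `method` and a `body` to send back. With only a handful of items you will likely get a single page, but the sketch below shows how to pull the next-page link out of a response dict. `next_page_link` is hypothetical, and the exact shape of GeoCatalog's next link (body token versus query string) is an assumption to confirm against real responses.

```python
from typing import Any, Dict, Optional


def next_page_link(feature_collection: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the link with rel == 'next' from a search response, or None."""
    for link in feature_collection.get("links", []):
        if link.get("rel") == "next":
            return link
    return None


page = {
    "type": "FeatureCollection",
    "features": [],
    "links": [
        {"rel": "self", "href": "https://example.com/api/search"},
        {"rel": "next", "href": "https://example.com/api/search", "method": "POST",
         "body": {"token": "abc"}},
    ],
}
link = next_page_link(page)
print(link["method"], link["body"])      # prints POST {'token': 'abc'}
print(next_page_link({"features": []}))  # prints None
```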

In your prior query, you also provided an additional parameter: **sign:true**. This instructs GeoCatalog to return a signed href (item href + SAS token) which allows you to read the given assets from Azure Blob Storage as shown below.
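Conceptually, a signed href is the blob URL with a SAS token appended as its query string. The sketch below shows that composition with `urllib.parse`, for example if you hold a SAS token (such as `pc_token['token']` from earlier) and an unsigned blob href. `sign_href` is hypothetical and the token shown is a fake placeholder; with `sign=true`, GeoCatalog does this signing for you, so the cells here don't need it.

```python
from urllib.parse import urlsplit, urlunsplit


def sign_href(href: str, sas_token: str) -> str:
    """Append a SAS token (with or without a leading '?') to a URL's query string."""
    token = sas_token.lstrip("?")
    parts = urlsplit(href)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(sign_href("https://acct.blob.core.windows.net/c/img.tif", "sv=2021-06-08&sig=FAKE"))
# prints https://acct.blob.core.windows.net/c/img.tif?sv=2021-06-08&sig=FAKE
```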

In [None]:
asset_href = matching_items[0]['assets']['rendered_preview']['href']
print(asset_href)

response = requests.get(asset_href)
raise_for_status(response)
img = Image.open(BytesIO(response.content))
img

### Delete Items

At this point you have created a GeoCatalog Collection, added items and assets to the collection, and retrieved those items and assets using GeoCatalog's STAC API. For the final phase of this tutorial you are going to remove these items and delete your collection.

In [None]:
# Delete all items

for item in matching_items:
    response = requests.delete(
        f"{items_endpoint}/{item['id']}",
        headers=getBearerToken(),
        params={"api-version": "2024-01-31-preview"}
    )
    raise_for_status(response)

You can confirm all of your items were deleted by running the command below. Note it may take a minute or two to fully delete items and their associated assets.

In [None]:
response = requests.post(
    stac_search_endpoint,
    json={"collections":[stac_collection['id']],
          "bbox": bbox_aoi
    },
    headers=getBearerToken(),
    params={"api-version": "2024-01-31-preview", "sign": "true"}
)
raise_for_status(response)

matching_items = response.json()['features']
print(len(matching_items))

### Delete Collection

Now as a final step, you may want to fully delete your collection from your GeoCatalog instance. This can be accomplished by running the code below.

In [None]:
response = requests.delete(
    f"{collections_endpoint}/{collection_id}",
    headers=getBearerToken(),
    params={"api-version": "2024-01-31-preview"}
)

raise_for_status(response)
print(f"STAC Collection deleted: {collection_id}")

# Tutorial Complete

Congrats! You just created your first collection with GeoCatalog. We hope you found this tutorial helpful! If you have any questions about this tutorial or GeoCatalog, please contact spatio-support-team@microsoft.com.