# Tutorial: Using the Microsoft Planetary Computer Pro SDK to ingest and visualize data

STAC (SpatioTemporal Asset Catalog) Collections are used within a GeoCatalog to index and store related spatiotemporal assets. In this end-to-end tutorial, you'll use the **[azure-planetarycomputer SDK](https://learn.microsoft.com/python/api/overview/azure/planetarycomputer-readme)** to create a new STAC collection, ingest Sentinel-2 images into the collection, and query those images via GeoCatalog's APIs.

In this tutorial, you:
* Create a STAC collection within a Planetary Computer Pro GeoCatalog using the Python SDK
* Ingest satellite imagery from the European Space Agency into that collection
* Configure the collection so that its imagery can be visualized in the Planetary Computer Pro web interface
* Query data from within the STAC collection using the SDK's STAC API client

## Prerequisites

Before running this tutorial, you need a Planetary Computer Pro GeoCatalog deployed in your Azure subscription. You also need an environment to execute this notebook and install the necessary packages. We suggest running this tutorial through an Azure Machine Learning Virtual Machine or Visual Studio Code's notebook integration in a Python virtual environment. However, this notebook should run wherever you can run Jupyter notebooks, provided the following requirements are met:

* Python 3.10 or later
* The Azure CLI is installed, and you have run `az login` to log in to your Azure account
* The [azure-planetarycomputer](https://pypi.org/project/azure-planetarycomputer/) SDK package and the other packages listed in the "Import the required packages" section are installed

## Open a Jupyter notebook in Azure Machine Learning or VS Code

### Log in to Azure with the Azure CLI
The following command logs you into Azure using the Azure CLI. Run the command and follow the instructions to log in.

In [None]:
!az login

## Select tutorial options

Before running this tutorial, you need contributor access to an existing GeoCatalog instance. Enter the URL of your GeoCatalog instance in the `geocatalog_url` variable. This URL is used to initialize the `PlanetaryComputerProClient` from the azure-planetarycomputer SDK.

In this tutorial, you'll create a collection for Sentinel-2 imagery provided by the European Space Agency (ESA) that is currently stored in Microsoft's Planetary Computer Data Catalog.

In [None]:
# URL for your given GeoCatalog
geocatalog_url = (
    "<GEOCATALOG_URL>"
)
geocatalog_url = geocatalog_url.rstrip("/")  # Remove trailing slash if present

# User selections for demo

# Collection within the Planetary Computer
pc_collection = "sentinel-2-l2a"

# Bounding box for AOI
bbox_aoi = [-22.455626, 63.834083, -22.395201, 63.880750]

# Date range to search for imagery
param_date_range = "2024-02-04/2024-02-11"

# Maximum number of items to ingest
param_max_items = 6
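
A malformed bounding box or date range only surfaces later as an empty or failed search, so it can help to check the options up front. The following is a minimal sketch, not part of the SDK: `validate_search_params` is a hypothetical helper, and it assumes a `[west, south, east, north]` box that doesn't cross the antimeridian and a closed `start/end` date range in ISO format.

```python
from datetime import date


def validate_search_params(bbox, date_range, max_items):
    """Raise ValueError if the tutorial options look malformed (hypothetical helper)."""
    if len(bbox) != 4:
        raise ValueError("bbox must be [west, south, east, north]")
    west, south, east, north = bbox
    if not (-180 <= west < east <= 180):
        raise ValueError("longitudes must satisfy -180 <= west < east <= 180")
    if not (-90 <= south < north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south < north <= 90")
    # Assumes a closed range such as "2024-02-04/2024-02-11"
    start_text, end_text = date_range.split("/")
    start, end = date.fromisoformat(start_text), date.fromisoformat(end_text)
    if start > end:
        raise ValueError("date range start must not be after its end")
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    return start, end


start, end = validate_search_params(
    [-22.455626, 63.834083, -22.395201, 63.880750], "2024-02-04/2024-02-11", 6
)
print(f"Date range spans {(end - start).days} days")
```

Open-ended STAC ranges (for example, `2024-02-04/..`) would need extra handling that this sketch leaves out.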

### Import the required packages

Before you can create a STAC collection, you need to import a few Python packages, including the **[azure-planetarycomputer SDK](https://learn.microsoft.com/python/api/overview/azure/planetarycomputer-readme)**. The SDK simplifies interaction with the Planetary Computer Pro APIs by providing strongly typed Python classes and methods.

In [None]:
!pip install pystac-client azure-identity azure-planetarycomputer requests pillow

In [None]:
# Import the required packages
import json
import random
import string
import time
from datetime import datetime, timedelta, timezone
from io import BytesIO
from typing import Any, Optional, Dict

import requests
from azure.core.exceptions import ResourceNotFoundError, HttpResponseError
from azure.identity import DefaultAzureCredential
from azure.planetarycomputer import PlanetaryComputerProClient
from azure.planetarycomputer.models import (
    IngestionSourceType,
    SharedAccessSignatureTokenConnection,
    SharedAccessSignatureTokenIngestionSource,
    StacMosaic,
    StacSearchParameters,
    StacAssetUrlSigningMode,
)
from IPython.display import Markdown as md
from IPython.display import clear_output
from PIL import Image
from pystac_client import Client

# Initialize the Planetary Computer Pro client
credential = DefaultAzureCredential()
pc_client = PlanetaryComputerProClient(endpoint=geocatalog_url, credential=credential)

# Helper that prints the response body before re-raising an HTTP error
def raise_for_status(r: requests.Response) -> None:
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError:
        try:
            # Prefer the structured JSON error body when one is returned
            print(json.dumps(r.json(), indent=2))
        except ValueError:
            # The body isn't valid JSON, so print the raw content instead
            print(r.content)
        raise

## Create a STAC collection

### Define a STAC Collection JSON
Next, you define a STAC collection as a JSON item. For this tutorial, use an existing STAC collection JSON for the Sentinel-2-l2a collection within Microsoft's Planetary Computer. Your collection is assigned a random ID and title so as not to conflict with other existing collections. The SDK will handle the API calls to create this collection in your GeoCatalog.

In [None]:
# Load example STAC collection JSON

response = requests.get(
    f"https://planetarycomputer.microsoft.com/api/stac/v1/collections/{pc_collection}"
)
raise_for_status(response)
stac_collection = response.json()

# Generate a unique name for the test collection
collection_id = pc_collection + "-tutorial-" + str(random.randint(0, 1000))

stac_collection["id"] = collection_id
stac_collection["title"] = collection_id

# Determine the storage account and container for the assets
pc_storage_account = stac_collection.pop("msft:storage_account")
pc_storage_container = stac_collection.pop("msft:container")
pc_collection_asset_container = (
    f"https://{pc_storage_account}.blob.core.windows.net/{pc_storage_container}"
)

# View your STAC collection JSON
stac_collection
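
Because `random.randint(0, 1000)` yields only 1,001 possible suffixes, repeated runs against the same GeoCatalog can eventually produce an ID that already exists. One possible alternative, shown here as a sketch, derives the suffix from a UUID. The helper name `make_collection_id` and the suffix length are illustrative choices, not something the tutorial or SDK prescribes.

```python
import uuid


def make_collection_id(base: str, suffix_length: int = 8) -> str:
    """Append a random hexadecimal suffix to the base collection name."""
    suffix = uuid.uuid4().hex[:suffix_length]
    return f"{base}-tutorial-{suffix}"


print(make_collection_id("sentinel-2-l2a"))
```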

A collection JSON submitted to GeoCatalog can't contain collection-level assets (such as a collection thumbnail), so first remove any existing assets. You add the thumbnail back later in this tutorial.


In [None]:
# Save the thumbnail url
thumbnail_url = stac_collection['assets']['thumbnail']['href']

# Remove the assets field from the JSON (you'll see how to add this back later)
print("Removed the following items from the STAC Collection JSON:")
stac_collection.pop('assets')
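
The cell above mutates `stac_collection` in place. If you want to keep the original document intact, the same step can be sketched as a small function that returns a cleaned copy together with the removed assets. `split_collection_assets` and the trimmed sample document are hypothetical and exist only for illustration.

```python
import copy


def split_collection_assets(collection: dict) -> tuple[dict, dict]:
    """Return (a copy of the collection without assets, the removed assets)."""
    cleaned = copy.deepcopy(collection)
    assets = cleaned.pop("assets", {})
    return cleaned, assets


# Hypothetical, heavily trimmed collection document for illustration
sample = {
    "id": "example",
    "assets": {"thumbnail": {"href": "https://example.com/thumb.png"}},
}
cleaned, assets = split_collection_assets(sample)
print(sorted(cleaned), assets["thumbnail"]["href"])
```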

In [None]:
# Create a STAC collection using the SDK
collection_create_operation = pc_client.stac.begin_create_collection(body=stac_collection, polling=True)
collection_create_operation.result()
print("STAC Collection created named:", stac_collection["title"])

Open your GeoCatalog web interface and you should see your new collection listed under the "Collections" tab.

### Access collection thumbnail

Next, add a thumbnail to be displayed along with your collection. For the purposes of this demo, use the thumbnail from the existing Sentinel-2 collection within Microsoft's Planetary Computer.


In [None]:
# Read thumbnail for your collection

thumbnail_response = requests.get(thumbnail_url)
raise_for_status(thumbnail_response)
img = Image.open(BytesIO(thumbnail_response.content))
img

### Add thumbnail to your Planetary Computer Pro GeoCatalog

After reading the thumbnail, you can add it to your collection using the SDK's `create_collection_asset` method, which handles the upload automatically.

In [None]:
# Add thumbnail to collection using the SDK

# Define thumbnail asset metadata
asset_data = {
    "key": "thumbnail",
    "href": "",
    "type": "image/png",
    "roles": ["thumbnail"],
    "title": "Thumbnail"
}

# Prepare thumbnail file for upload
thumbnail_file = ("thumbnail.png", thumbnail_response.content)

# Post the thumbnail to the GeoCatalog collections asset endpoint using SDK
pc_client.stac.create_collection_asset(
    collection_id=collection_id,
    body={"data": asset_data, "file": thumbnail_file}
)

print("STAC Collection thumbnail updated for:", stac_collection['title'])
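
The asset metadata above hard-codes `"type": "image/png"`. If you reuse this cell with a different thumbnail, one way to avoid a mismatched media type is to inspect the leading bytes of the downloaded file. This is a minimal sketch covering only PNG and JPEG, and `sniff_image_media_type` is a hypothetical helper rather than part of the SDK.

```python
def sniff_image_media_type(data: bytes) -> str:
    """Guess a media type from the leading magic bytes of an image."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    raise ValueError("unrecognized image format")


# Only the 8-byte PNG signature is needed for the check
print(sniff_image_media_type(b"\x89PNG\r\n\x1a\n" + b"rest-of-file"))
```

In the notebook, the result of `sniff_image_media_type(thumbnail_response.content)` could then populate the `type` field.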


### Read new collection from within your Planetary Computer Pro GeoCatalog

Refresh your browser and you should be able to see the thumbnail. You can also retrieve the collection JSON programmatically using the SDK's `get_collection` method:

In [None]:
# Request the collection JSON from your GeoCatalog using the SDK

collection = pc_client.stac.get_collection(collection_id=collection_id)
print("STAC Collection successfully read:", collection.title)
collection.as_dict()

In [None]:
print(f"""
You successfully created a new STAC Collection in GeoCatalog named {collection_id}.
You can view your collection by visiting the GeoCatalog Explorer: {geocatalog_url}/collections
""")

## Ingest STAC items & assets

After creating this collection, you're ready to ingest new STAC items into it using the SDK's ingestion management methods. This process has four steps:

1. Obtain a SAS token from Microsoft's Planetary Computer
2. Register that token as an ingestion source within GeoCatalog using the SDK
3. Ingest STAC items from that collection using the SDK's item creation methods
4. Verify that the items were ingested successfully

### Query the Planetary Computer
First, query the Planetary Computer for Sentinel-2 images that match your requirements. In this case, you're looking for Sentinel-2 imagery in the Planetary Computer that matches the following criteria:

* Collection - Imagery from the Sentinel-2-l2a collection
* Time range - Collected between February 4 and February 11, 2024
* Area of interest - Imagery collected over southern Iceland (defined as a bounding box)

Running this search shows which matching STAC items exist within the Planetary Computer.


In [None]:
# Search criteria
print("Using the below parameters to search the Planetary Computer:\n")
print("Collection:", pc_collection)
print("Bounding box for area of interest:", bbox_aoi)
print("Date range:", param_date_range)
print("Max number of items:", param_max_items)

In [None]:
# Query the Planetary Computer

# Connect to the Planetary Computer
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = catalog.search(collections=[pc_collection], bbox=bbox_aoi, datetime=param_date_range)
total_items = search.item_collection()

items = total_items[:param_max_items]
print("Total number of matching items:", len(total_items))
print("Total number of items to ingest based on the user-selected parameter:", len(items))

if len(total_items) == 0:
    print("No items matched the parameters specified at the top of this tutorial. Update these parameters and rerun the search.")

In [None]:
# Print an example STAC item returned by the Planetary Computer
items[0]

### Register an ingestion source
Before you can ingest these STAC items and their related assets (images) into a GeoCatalog collection, determine whether you need to register a new ingestion source. GeoCatalog uses ingestion sources to track which storage locations (Azure Blob Storage containers) it has access to.

Registering an ingestion source is accomplished using the SDK by providing the location of the storage container and a SAS token with read permissions to access the container. If STAC items or their related assets are located in a storage container your GeoCatalog hasn't been given access to, the ingest will fail.

To start this process, request a SAS token from the Planetary Computer that grants you read access to the container where the Sentinel-2 images reside.


In [None]:
# Request a SAS token from the Planetary Computer

token_response = requests.get(f"https://planetarycomputer.microsoft.com/api/sas/v1/token/{pc_collection}")
raise_for_status(token_response)
pc_token = token_response.json()
print(f"Planetary Computer SAS token will expire {pc_token['msft:expiry']}")

Next, attempt to register this Azure Blob Storage container and its associated SAS token as an ingestion source with GeoCatalog. An ingestion source might already exist for this storage container. If so, the code finds and reuses the existing ingestion source.

**Warning**
If an existing ingestion source for this container has a token that expires within the next 15 minutes, it's deleted and replaced. Deleting an ingestion source that is in use by currently running ingestions may break those ingestions.


In [None]:
# Find existing ingestion source for this container
existing_source = None
for source_summary in pc_client.ingestion.list_sources():
    if source_summary.kind == IngestionSourceType.SHARED_ACCESS_SIGNATURE_TOKEN:
        source = pc_client.ingestion.get_source(source_summary.id)
        if source.connection_info.container_uri == pc_collection_asset_container:
            existing_source = source
            break

# Check if we need to create or recreate the source
should_create = True
if existing_source:
    expiration = existing_source.connection_info.expiration.replace(tzinfo=timezone.utc)
    if expiration < datetime.now(tz=timezone.utc) + timedelta(minutes=15):
        print(f"Deleting expired ingestion source for {pc_collection_asset_container}")
        pc_client.ingestion.delete_source(id=existing_source.id)
    else:
        print(f"Using existing ingestion source with expiration {expiration}")
        should_create = False

if should_create:
    print(f"Creating ingestion source for {pc_collection_asset_container}")
    pc_client.ingestion.create_source(
        body=SharedAccessSignatureTokenIngestionSource(
            connection_info=SharedAccessSignatureTokenConnection(
                container_uri=pc_collection_asset_container,
                shared_access_signature_token=pc_token["token"]
            )
        )
    )
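
The create-or-recreate decision above comes down to comparing an expiration time against "now plus 15 minutes". The sketch below isolates that check so it can be reasoned about without a GeoCatalog. `parse_expiry` and `needs_refresh` are hypothetical helpers, and the sketch assumes expiry timestamps are ISO 8601 strings that end in `Z` or carry no offset, in which case they're treated as UTC.

```python
from datetime import datetime, timedelta, timezone


def parse_expiry(text: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating 'Z' or a missing offset as UTC."""
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def needs_refresh(
    expiration: datetime,
    now: datetime,
    margin: timedelta = timedelta(minutes=15),
) -> bool:
    """Return True when the token expires within the safety margin."""
    return expiration < now + margin


now = datetime(2024, 2, 11, 12, 0, tzinfo=timezone.utc)
print(needs_refresh(parse_expiry("2024-02-11T12:10:00Z"), now))  # expires in 10 minutes
print(needs_refresh(parse_expiry("2024-02-11T13:00:00Z"), now))  # expires in 60 minutes
```

Passing `now` explicitly keeps the check deterministic, which makes the 15-minute margin easy to test or adjust.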

### Ingest STAC items using the SDK
Now that you've registered an ingestion source or validated that one exists, ingest the STAC items you found within the Planetary Computer using the SDK's `begin_create_item` method. This creates a new ingestion operation within GeoCatalog for each item.

In [None]:
# Ingest items

operations = []

for item in items:

    item_json = item.to_dict()
    item_json['collection'] = collection_id

    # Remove non-static assets (skip any that are absent from this item)
    for asset_key in ("rendered_preview", "preview", "tilejson"):
        item_json['assets'].pop(asset_key, None)

    # Use SDK to create item
    operation = pc_client.stac.begin_create_item(
        collection_id=collection_id,
        body=item_json,
        polling=True
    )

    operations.append(operation)
    print(f"Ingesting item {item_json['id']}")
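
The per-item preparation in the loop (retarget the collection, drop dynamically generated assets) can also be expressed as a pure function, which makes it easy to try on a sample dictionary before submitting anything. This is a sketch under the tutorial's own assumptions: `prepare_item` is a hypothetical helper, and the asset keys are the three the loop above removes.

```python
import copy

# Asset keys the tutorial removes because they aren't static files
NON_STATIC_ASSETS = ("rendered_preview", "preview", "tilejson")


def prepare_item(item_json: dict, collection_id: str) -> dict:
    """Return a copy of the item retargeted at collection_id without non-static assets."""
    prepared = copy.deepcopy(item_json)
    prepared["collection"] = collection_id
    assets = prepared.get("assets", {})
    for key in NON_STATIC_ASSETS:
        assets.pop(key, None)  # ignore keys that aren't present
    return prepared


# Hypothetical, heavily trimmed item for illustration
sample_item = {
    "id": "example-item",
    "collection": "sentinel-2-l2a",
    "assets": {"B09": {"href": "https://example.com/B09.tif"}, "tilejson": {}},
}
print(prepare_item(sample_item, "my-collection"))
```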

Given that Sentinel-2 item ingestion can take a little time, you can run this code to check the status of your ingestion operations by calling `.status()` on the poller objects returned from `begin_create_item`.


In [None]:
# Check the status of the operations
pending=True

start = time.time()

while pending:
    # Count the number of operations that are finished vs unfinished
    num_running = 0
    num_finished = 0
    num_failed = 0
    clear_output(wait=True)
    for operation in operations:
        status = operation.status()
        print(f"Operation status: {status}")
        if status in ("Pending", "Running"):
            num_running += 1
        elif status == "Failed":
            num_failed += 1
        elif status == "Succeeded":
            num_finished += 1

    stop = time.time()
    # Print a summary of the finished, running, and failed operations
    
    print("Ingesting Imagery:")
    print(f"\tFinished: {num_finished}\n\tRunning: {num_running}\n\tFailed: {num_failed}")
    print("Time Elapsed (seconds):",str(stop-start))
    
    if num_running == 0:
        pending=False
        print(f"Ingestion Complete!\n\t{num_finished} items ingested.\n\t{num_failed} items failed.")

    else:
        print(f"Waiting for {num_running} operations to finish")
        time.sleep(5)
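
The counting inside the loop can be sketched separately from the polling so that the tallying logic is easy to verify. The status strings below are the ones this tutorial checks for. Whether the poller can report other values is not confirmed here, so the hypothetical `summarize_statuses` helper leaves any other value out of all three tallies, as the loop above does.

```python
from collections import Counter

# Status strings the tutorial treats as still in progress
IN_PROGRESS = ("Pending", "Running")


def summarize_statuses(statuses):
    """Tally operation statuses into finished, running, and failed counts."""
    counts = Counter(statuses)
    return {
        "finished": counts["Succeeded"],
        "running": sum(counts[name] for name in IN_PROGRESS),
        "failed": counts["Failed"],
    }


print(summarize_statuses(["Succeeded", "Running", "Pending", "Failed", "Succeeded"]))
```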

In [None]:
# Show detailed error messages
for operation in operations:
    try:
        operation.result()
    except HttpResponseError as e:
        exception_body = json.loads(e.response.body())
        error = exception_body["statusHistory"][-1]
        print(json.dumps(error, indent=2))


You should be able to refresh your web browser and click on the Items tab to see these newly uploaded items.

## Collection management

Now that you've ingested these STAC items and their associated assets (images) into the STAC collection, you need to provide your GeoCatalog with some other configuration files before you can visualize these items in the GeoCatalog web interface.

### Collection render config
First, download a render configuration file for this collection from the Planetary Computer. GeoCatalog reads this config file to render images in different ways within the Explorer. A render configuration is needed because STAC items may contain many different assets (images) that can be combined to create entirely new images of a given area that highlight visible or nonvisible features. For instance, Sentinel-2 STAC items have more than 12 images from different portions of the electromagnetic spectrum. The render config tells GeoCatalog how to combine these images so it can display them in Natural Color or False Color (Color Infrared).


In [None]:
# Read render JSON from Planetary Computer

render_json = requests.get("https://planetarycomputer.microsoft.com/api/data/v1/mosaic/info?collection={}".format(pc_collection)).json()
render_json['renderOptions']

After reading this render options config from the Planetary Computer, you can enable these render options for the collection using the SDK's `create_render_option` method.

In [None]:
# Post render options config to GeoCatalog using the SDK

for render_option in render_json['renderOptions']:

    # Rename render configs such that they can be stored by GeoCatalog
    render_option['id'] = render_option['name'].translate(str.maketrans('', '', string.punctuation)).lower().replace(" ", "-")[:30]

    # Post render definition using SDK
    pc_client.stac.create_render_option(
        collection_id=collection_id,
        body=render_option
    )
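
The one-line ID expression in the loop is dense, so here it is unpacked into a function with the same steps: strip punctuation, lowercase, replace spaces with hyphens, and truncate to 30 characters. The function name `render_option_id` and the sample names are illustrative. The 30-character limit is taken from the cell above rather than from a documented GeoCatalog constraint.

```python
import string


def render_option_id(name: str, max_length: int = 30) -> str:
    """Derive an ID from a render option name using the tutorial's rules."""
    without_punctuation = name.translate(str.maketrans("", "", string.punctuation))
    return without_punctuation.lower().replace(" ", "-")[:max_length]


# Illustrative names; the real ones come from render_json['renderOptions']
for name in ["Natural color", "Color infrared (vegetation)", "Short-wave infrared"]:
    print(name, "->", render_option_id(name))
```

Note that hyphens count as punctuation, so a name that already contains one loses it before spaces are converted. Two names that differ only in punctuation would therefore map to the same ID.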

### Mosaic definitions

Similar to the render configuration discussed above, GeoCatalog's Explorer lets you specify one or more mosaic definitions for the collection. A mosaic definition tells the Explorer how to filter which items are displayed. For example, one basic mosaic definition (shown in the next cell) instructs GeoCatalog to display the most recent image for any given area. More advanced mosaic definitions can render different views, such as the least cloudy image for a given location captured in October 2023.


In [None]:
# Post mosaic definition using the SDK

pc_client.stac.add_mosaic(
    collection_id=collection_id,
    body=StacMosaic(
        id="mos1",
        name="Most recent available",
        description="Most recent available imagery in this collection",
        cql=[]
    )
)
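
The mosaic above uses an empty `cql` list, meaning no filtering. As a hedged sketch of what a filtered definition might look like, the dictionary below restricts a mosaic to low-cloud imagery. It assumes that the `cql` field accepts CQL2-JSON comparison expressions and that items carry the STAC EO extension property `eo:cloud_cover`. The ID, name, and 10% threshold are made up for illustration, so check the mosaic configuration documentation before relying on this shape.

```python
import json

# Hypothetical mosaic definition; field values are illustrative only
low_cloud_mosaic = {
    "id": "low-cloud",
    "name": "Low cloud cover",
    "description": "Imagery with at most 10% cloud cover",
    "cql": [
        # Assumed CQL2-JSON comparison: eo:cloud_cover <= 10
        {"op": "<=", "args": [{"property": "eo:cloud_cover"}, 10]},
    ],
}

print(json.dumps(low_cloud_mosaic, indent=2))
```

Restricting the mosaic to a month such as October 2023 would need an additional temporal expression, which this sketch omits because its exact syntax isn't covered in this tutorial.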

### Open GeoCatalog web interface

Congrats! You created a collection using the azure-planetarycomputer SDK, added STAC items and assets, and updated your collection to include the required configuration files so it can be viewed through the Explorer within the GeoCatalog web interface.

**Navigate back to the GeoCatalog Explorer in the web interface to view your collection!**

## Query collection via STAC API

Now that you've viewed your collection in the GeoCatalog Explorer, you'll walk through how to use the SDK's STAC search methods to search for and retrieve STAC items and assets for further analysis.

This process uses the SDK's `search` method to search your GeoCatalog's STAC API. Specifically, you'll search for imagery within your collection that falls within the original bounding box you used to extract imagery from the Planetary Computer.

As expected, this query returns all the STAC items you previously placed within your collection.

In [None]:
# Search for items using the SDK with signed URLs

search_params = StacSearchParameters(
    collections=[collection_id],
    bounding_box=bbox_aoi,
    sign=StacAssetUrlSigningMode.TRUE
)

search_result = pc_client.stac.search(body=search_params, params={"sign": "true"})

matching_items = search_result.features
print(len(matching_items))

In your prior query, you also provided another parameter: `sign=StacAssetUrlSigningMode.TRUE`. This instructs the SDK to return signed hrefs (item href + SAS token), which allow you to read the given assets from Azure Blob Storage.

In [None]:
# Download one of the asset bands, band 09
asset_href = matching_items[0].as_dict()['assets']['B09']['href']
print(asset_href)

response = requests.get(asset_href)
raise_for_status(response)
img = Image.open(BytesIO(response.content))
img
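
To make "item href + SAS token" concrete, the sketch below shows how a SAS token is appended to a blob URL as a query string. The SDK does this for you when signing is requested, so the code is purely illustrative: `append_sas_token`, the URL, and the token value are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def append_sas_token(href: str, token: str) -> str:
    """Append a SAS token to a URL, preserving any existing query string."""
    parts = urlsplit(href)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


# Hypothetical blob URL and token, for illustration only
href = "https://example.blob.core.windows.net/container/B09.tif"
token = "sv=2021-01-01&sig=abc123"
print(append_sas_token(href, token))
```

Because the token is part of the URL, treat signed hrefs like credentials and avoid sharing printed output that contains them.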

## Clean up resources
### Delete items

At this point, you have created a GeoCatalog Collection using the azure-planetarycomputer SDK, added items and assets to the collection, and retrieved those items and assets using the SDK's search methods. For the final phase of this tutorial, you're going to remove these items and delete your collection using the SDK.

In [None]:
# Delete all items using the SDK

for item in matching_items:
    pc_client.stac.begin_delete_item(
        collection_id=collection_id,
        item_id=item.id,
        polling=False
    )

You can confirm that all of your items were deleted by running the next command. Note that it may take a minute or two to fully delete the items and their associated assets.


In [None]:
# Confirm that all the items have been deleted
search_params = StacSearchParameters(
    collections=[collection_id],
    bounding_box=bbox_aoi,
    sign=StacAssetUrlSigningMode.TRUE
)

search_result = pc_client.stac.search(body=search_params)

matching_items = search_result.features
print(len(matching_items))
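
Because deletion can take a minute or two, rerunning the cell by hand until it prints `0` gets tedious. A generic polling loop is one way to automate the wait. In this sketch, `wait_until_empty` is a hypothetical helper that accepts any zero-argument function returning the current item count, and a stand-in function is used here so the example runs without a GeoCatalog.

```python
import time


def wait_until_empty(count_items, timeout_seconds=120.0, interval_seconds=5.0):
    """Poll count_items() until it returns 0; return False if the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if count_items() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)


# Stand-in for a real search: pretend one item disappears per call
fake_counts = iter([3, 2, 1, 0])
print(wait_until_empty(lambda: next(fake_counts), interval_seconds=0.01))
```

In the notebook, `count_items` could be `lambda: len(pc_client.stac.search(body=search_params).features)`, reusing the search from the cell above.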

### Delete collection

As a final step, you can fully delete your collection from your GeoCatalog instance.


In [None]:
# Delete the collection using the SDK
delete_collection_operation = pc_client.stac.begin_delete_collection(
    collection_id=collection_id,
    polling=True
)
delete_collection_operation.result()

print(f"STAC Collection deleted: {collection_id}")



## Related content
In this end-to-end tutorial, you walked through the process of creating a new STAC collection, ingesting Sentinel-2 images into the collection, and querying those images via GeoCatalog's APIs. If you would like to learn more about each of these topics, explore these other materials:

* [Create a GeoCatalog](https://learn.microsoft.com/en-us/azure/planetary-computer/deploy-geocatalog-resource.md)
* [Create a collection](https://learn.microsoft.com/en-us/azure/planetary-computer/create-stac-collection.md)
* [Ingest STAC items](https://learn.microsoft.com/en-us/azure/planetary-computer/ingestion-source.md)
* [Create a Render Configuration](https://learn.microsoft.com/en-us/azure/planetary-computer/render-configuration.md)
* [Configure collection Tile Settings](https://learn.microsoft.com/en-us/azure/planetary-computer/tile-settings.md)
* [Mosaic Configuration](https://learn.microsoft.com/en-us/azure/planetary-computer/mosaic-configurations-for-collections.md)
* [Queryables Configuration](https://learn.microsoft.com/en-us/azure/planetary-computer/queryables-for-explorer-custom-search-filter.md)