# Catalog Discovery with STAC API

This notebook demonstrates how to programmatically discover and explore datasets available on the Ocean Data Platform using the STAC (SpatioTemporal Asset Catalog) API.

**What you'll learn:**
- Query the STAC API to list available collections
- Search for datasets by spatial extent and keywords
- Retrieve dataset metadata before loading data
- Connect discovered datasets to the Python SDK

**Prerequisites:**
- Access to an ODP Workspace (auto-authenticated), or an API key
- `odp-sdk` installed (`pip install -U odp-sdk`)

## 1. Setup and Configuration

In [None]:
import requests
import json
from pprint import pprint

# STAC API base URL
STAC_BASE_URL = "https://api.hubocean.earth/api/stac"

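# Timeout in seconds so a stalled connection does not hang the notebook
REQUEST_TIMEOUT = 30

# Helper functions for STAC requests
def stac_get(endpoint):
    """Send a GET request to a STAC API endpoint and return the parsed JSON."""
    url = f"{STAC_BASE_URL}{endpoint}"
    response = requests.get(url, timeout=REQUEST_TIMEOUT)
    response.raise_for_status()
    return response.json()

def stac_post(endpoint, payload):
    """Send a POST request with a JSON body to a STAC API endpoint and return the parsed JSON."""
    url = f"{STAC_BASE_URL}{endpoint}"
    response = requests.post(url, json=payload, timeout=REQUEST_TIMEOUT)
    response.raise_for_status()
    return response.json()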

## 2. Explore the Root Catalog

The STAC root catalog provides links to collections and search endpoints.
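
To pick out one of those links by its relation type, a small lookup helper can be enough. The following is a minimal sketch: `find_link` is a hypothetical helper name, and `sample_catalog` is made-up data shaped like a STAC catalog, not an actual response from the ODP API. The `data` and `search` relation types are the ones STAC API uses for the collections and search endpoints.

```python
def find_link(catalog, rel):
    """Return the href of the first link with the given relation type, or None."""
    for link in catalog.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None

# Made-up sample shaped like a STAC catalog response (not real ODP output)
sample_catalog = {
    "id": "example-catalog",
    "links": [
        {"rel": "self", "href": "https://example.com/stac/"},
        {"rel": "data", "href": "https://example.com/stac/collections"},
        {"rel": "search", "href": "https://example.com/stac/search"},
    ],
}

collections_href = find_link(sample_catalog, "data")
search_href = find_link(sample_catalog, "search")
print("Collections endpoint:", collections_href)
print("Search endpoint:", search_href)
```

The same helper should work on the `root_catalog` dictionary fetched in the cell below, assuming the server follows the standard relation names.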

In [None]:
# Get the root catalog
root_catalog = stac_get("/")

print("Catalog ID:", root_catalog.get("id"))
print("Description:", root_catalog.get("description"))
print("\nAvailable links:")
for link in root_catalog.get("links", []):
    print(f"  - {link.get('rel')}: {link.get('href')}")

## 3. List All Collections

Collections represent datasets in the STAC model. Each collection has metadata describing its spatial/temporal extent and available assets.
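
Because this metadata is plain JSON, a keyword search can be done client-side once the collections are listed. The sketch below assumes a hypothetical helper, `filter_collections`, and uses made-up sample collections; it does a case-insensitive substring match over `title`, `description`, and `keywords`, and is not a feature of the STAC API itself.

```python
def filter_collections(collections, keyword):
    """Return collections whose title, description, or keywords mention `keyword`."""
    needle = keyword.lower()
    matches = []
    for coll in collections:
        # Treat missing or null fields as empty so the match never raises
        text_fields = [
            coll.get("title") or "",
            coll.get("description") or "",
            *(coll.get("keywords") or []),
        ]
        if any(needle in field.lower() for field in text_fields):
            matches.append(coll)
    return matches

# Made-up sample data shaped like STAC collections (not real ODP datasets)
sample_collections = [
    {"id": "a", "title": "Sea Surface Temperature", "keywords": ["ocean", "temperature"]},
    {"id": "b", "title": "Coral Observations", "description": "Reef survey records"},
    {"id": "c", "title": "Vessel Tracks", "keywords": None},
]

temperature_ids = [c["id"] for c in filter_collections(sample_collections, "temperature")]
print(temperature_ids)
```

Passing the `collections` list from the next cell instead of `sample_collections` would apply the same filter to the real catalog.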

In [None]:
# List all available collections
collections_response = stac_get("/collections")
collections = collections_response.get("collections", [])

print(f"Found {len(collections)} collections:\n")

for coll in collections:
    print(f"ID: {coll.get('id')}")
    print(f"  Title: {coll.get('title', 'N/A')}")
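    # Guard against a null description and only add "..." when text is actually cut
    description = coll.get("description") or "N/A"
    print(f"  Description: {description[:100]}{'...' if len(description) > 100 else ''}")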
    
    # Spatial extent
    extent = coll.get("extent", {})
    spatial = extent.get("spatial", {}).get("bbox", [])
    if spatial:
        print(f"  Bounding Box: {spatial[0]}")
    
    print()

## 4. Search by Spatial Extent

The STAC search endpoint allows filtering by:
- **bbox**: Bounding box `[west, south, east, north]`
- **intersects**: GeoJSON geometry
- **datetime**: ISO 8601 date/time range
- **collections**: List of collection IDs to search within
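
These filters can be combined in a single request body. The sketch below uses a hypothetical helper, `build_search_payload`, to assemble one. It relies on two rules from the STAC API item search specification: `bbox` and `intersects` are mutually exclusive, and an open-ended `datetime` range uses `..` for the missing side. The collection ID is a placeholder, and whether a given server honors every filter is an assumption to verify.

```python
def build_search_payload(bbox=None, intersects=None, start=None, end=None,
                         collections=None, limit=10):
    """Assemble a STAC search body from optional filters."""
    if bbox is not None and intersects is not None:
        raise ValueError("Specify either bbox or intersects, not both")

    payload = {"limit": limit}
    if bbox is not None:
        payload["bbox"] = list(bbox)
    if intersects is not None:
        payload["intersects"] = intersects
    if start or end:
        # ".." marks an open-ended side of the interval
        payload["datetime"] = f"{start or '..'}/{end or '..'}"
    if collections:
        payload["collections"] = list(collections)
    return payload

example_payload = build_search_payload(
    bbox=[-5, 55, 30, 75],
    start="2020-01-01T00:00:00Z",
    end="2020-12-31T23:59:59Z",
    collections=["example-collection-id"],  # placeholder ID, not a real dataset
)
print(example_payload)
```

The resulting dictionary can be passed to `stac_post("/search", ...)` in the same way as the payloads in the cells that follow.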

In [None]:
# Search for datasets covering Norwegian waters
# Approximate bounding box for Norwegian Sea
norwegian_sea_bbox = [-5, 55, 30, 75]  # [west, south, east, north]

search_payload = {
    "bbox": norwegian_sea_bbox,
    "limit": 10
}

search_results = stac_post("/search", search_payload)

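# The count is capped by "limit", so it may be lower than the total number of matches
print(f"Returned {len(search_results.get('features', []))} items (limit {search_payload['limit']}) in Norwegian waters:\n")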

for feature in search_results.get("features", []):
    props = feature.get("properties", {})
    print(f"ID: {feature.get('id')}")
    print(f"  Collection: {feature.get('collection')}")
    print(f"  Datetime: {props.get('datetime', 'N/A')}")
    print()

## 5. Search with GeoJSON Polygon

For more precise spatial queries, use a GeoJSON polygon with the `intersects` parameter.
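
A GeoJSON polygon is a list of linear rings, and each ring must end on the same position it starts with. As a minimal illustration, the hypothetical helper `bbox_to_polygon` below turns a rectangular extent into a valid closed ring, listing the corners counterclockwise as RFC 7946 recommends for exterior rings. Real queries would typically use a more detailed shape.

```python
def bbox_to_polygon(west, south, east, north):
    """Build a GeoJSON Polygon with one closed, counterclockwise exterior ring."""
    ring = [
        [west, south],  # SW corner
        [east, south],  # SE corner
        [east, north],  # NE corner
        [west, north],  # NW corner
        [west, south],  # repeat the first position to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}

example_polygon = bbox_to_polygon(-5, 51, 9, 62)
print(example_polygon)
```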

In [None]:
# Define a polygon around the North Sea
north_sea_polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-5, 51],   # SW corner
        [9, 51],    # SE corner  
        [9, 62],    # NE corner
        [-5, 62],   # NW corner
        [-5, 51]    # Close polygon
    ]]
}

search_payload = {
    "intersects": north_sea_polygon,
    "limit": 10
}

search_results = stac_post("/search", search_payload)

features = search_results.get("features", [])
print(f"Found {len(features)} items intersecting North Sea polygon:\n")

# Collect unique collection IDs from search results, preserving result order
# (a set would make the "first" collection vary between runs)
discovered_collections = []
for feature in features:
    collection_id = feature.get("collection")
    if collection_id and collection_id not in discovered_collections:
        discovered_collections.append(collection_id)
    print(f"  - {feature.get('id')} (collection: {collection_id})")

# Store the first discovered collection for use in subsequent steps
if discovered_collections:
    DISCOVERED_COLLECTION_ID = discovered_collections[0]
    print(f"\n>> Using collection '{DISCOVERED_COLLECTION_ID}' for next steps")

## 6. Get Detailed Collection Metadata

Using the collection ID discovered in the previous step, retrieve its full metadata.
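
In STAC collection metadata, the temporal extent is a list of `[start, end]` pairs, and either side may be `null` to indicate an open-ended range. The sketch below shows one way to render such a pair as readable text. `describe_interval` is a hypothetical helper and the sample intervals are made up.

```python
def describe_interval(interval):
    """Render a STAC [start, end] pair, treating None as an open-ended side."""
    start, end = interval
    return f"{start or 'open'} to {end or 'open'}"

# Made-up sample intervals: one closed, one still ongoing
closed_range = describe_interval(["2015-01-01T00:00:00Z", "2020-12-31T23:59:59Z"])
ongoing_range = describe_interval(["2015-01-01T00:00:00Z", None])
print(closed_range)
print(ongoing_range)
```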

In [None]:
# Get details for the collection discovered in step 5

if "DISCOVERED_COLLECTION_ID" not in globals():
    raise NameError(
        "No collection was found in step 5. "
        "Re-run the North Sea polygon search above, or try a different region."
    )

collection_id = DISCOVERED_COLLECTION_ID

try:
    collection_detail = stac_get(f"/collections/{collection_id}")
    
    print("Collection Details:")
    print(f"  ID: {collection_detail.get('id')}")
    print(f"  Title: {collection_detail.get('title')}")
    print(f"  Description: {collection_detail.get('description')}")
    print(f"  License: {collection_detail.get('license')}")
    
    # Temporal extent
    temporal = collection_detail.get("extent", {}).get("temporal", {}).get("interval", [])
    if temporal:
        print(f"  Temporal Range: {temporal[0]}")
    
    # Keywords/tags
    keywords = collection_detail.get("keywords", [])
    if keywords:
        print(f"  Keywords: {', '.join(keywords)}")
        
except requests.exceptions.HTTPError as e:
    print(f"Could not fetch collection '{collection_id}': {e}")
    print("\nThis may indicate that the item's 'collection' field does not match an ID listed under /collections.")
    print("Try using a collection ID from step 3 (List All Collections) instead.")

## 7. Connect to Python SDK

Once you've discovered a dataset via STAC, connect to it using the ODP Python SDK for data access.

In [None]:
from odp.client import Client

# Initialize client (auto-authenticated in ODP Workspace)
client = Client()

# Connect to the dataset discovered via STAC
# Use the collection ID from step 6
dataset = client.dataset(collection_id)

# Get schema to understand the data structure
schema = dataset.table.schema()
if schema:
    print(f"Dataset Schema for {collection_id}:")
    for field in schema:
        print(f"  {field.name}: {field.type}")
else:
    print("This dataset may be file-based rather than tabular.")

In [None]:
# Get table statistics
stats = dataset.table.stats()
if stats:
    print(f"Total rows: {stats.num_rows:,}")
    print(f"Size: {stats.size:,} bytes")

In [None]:
# Preview the first few rows
preview_df = dataset.table.select().all(max_rows=5).dataframe()
preview_df

## 8. Build a Dataset Inventory

Create a summary inventory of available datasets for reference.
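
If the inventory should cover only one region, each collection's bounding box can be compared against an area of interest before rows are built. The sketch below assumes 2D `[west, south, east, north]` boxes that do not cross the antimeridian. `bboxes_overlap` is a hypothetical helper and the sample boxes are made up.

```python
def bboxes_overlap(a, b):
    """Return True if two [west, south, east, north] boxes share any area or edge."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    return (
        a_west <= b_east and b_west <= a_east
        and a_south <= b_north and b_south <= a_north
    )

area_of_interest = [-5, 51, 9, 62]  # rough North Sea box

# Made-up collection extents for illustration
sample_extents = {
    "nearby": [0, 55, 10, 60],
    "far-away": [100, -10, 120, 10],
}

in_region = [name for name, bbox in sample_extents.items()
             if bboxes_overlap(bbox, area_of_interest)]
print(in_region)
```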

In [None]:
import pandas as pd

# Build inventory from collections
inventory = []

for coll in collections:
    # Treat missing or null fields as empty so one sparse collection cannot break the loop
    extent = coll.get("extent") or {}
    bboxes = (extent.get("spatial") or {}).get("bbox") or []
    intervals = (extent.get("temporal") or {}).get("interval") or []
    spatial = bboxes[0] if bboxes else None        # first bbox is the overall extent
    temporal = intervals[0] if intervals else None  # first interval is the overall extent
    description = coll.get("description") or "N/A"

    inventory.append({
        "id": coll.get("id"),
        "title": coll.get("title", "N/A"),
        "description": description[:100] + ("..." if len(description) > 100 else ""),
        "license": coll.get("license", "N/A"),
        "bbox": str(spatial) if spatial else "N/A",
        "temporal_start": temporal[0] if temporal else "N/A",
        "temporal_end": temporal[1] if temporal and len(temporal) > 1 else "N/A",
        "keywords": ", ".join(coll.get("keywords") or []),
    })

inventory_df = pd.DataFrame(inventory)
print(f"Dataset Inventory ({len(inventory_df)} collections):")
inventory_df

In [None]:
# Save inventory to CSV for reference
inventory_df.to_csv("odp_dataset_inventory.csv", index=False)
print("Inventory saved to odp_dataset_inventory.csv")

## Next Steps

Now that you've discovered available datasets, continue with:

- **02_geospatial_analysis.ipynb**: Query and visualize data using H3 hexagonal aggregation
- **03_data_pipeline.ipynb**: Ingest files and transform into tabular data
- **04_multi_dataset_join.ipynb**: Combine multiple datasets for analysis

## Resources

- [ODP Documentation](https://docs.hubocean.earth/)
- [STAC Specification](https://stacspec.org/)
- [Python SDK Reference](https://docs.hubocean.earth/python_sdk/intro/)
- [ODP Catalog (Web UI)](https://app.hubocean.earth/catalog)