![Course header](../assets/img/header.png)

# 04 – STAC Fundamentals
Search, filter, and preview satellite data using a real STAC API

This notebook introduces the **SpatioTemporal Asset Catalog (STAC)** standard.
You will use the Microsoft Planetary Computer STAC API to find Sentinel-2 scenes.

## Learning Objectives

This notebook serves as a **quick reference** for STAC discovery. If you're already comfortable with STAC, feel free to skim or skip ahead to Notebook 05.

By the end of this notebook, you will be able to:

- Explain what STAC is and why it matters for EO data discovery
- Connect to a STAC API and list available collections
- Search for satellite imagery by area, time, and cloud cover
- Inspect Item metadata and assets
- Build a results table with pandas
- Preview a band from a STAC Item
- Export results for reuse

Tooling in this notebook:
- pystac-client
- planetary-computer
- pandas
- matplotlib

⏱️ Estimated time: **1–1.5 hours**

We keep the AOI small and the number of Items low so that every cell runs quickly.

---

## How to use this notebook

1. Run cells in order.
2. Keep the bbox small.
3. If you get empty results, loosen the filters (wider time range or higher cloud threshold).
4. If something breaks, restart the kernel and run all cells.

---

## Table of contents

1. Setup
2. What is STAC?
3. Connect to a STAC API
4. Define a search area (bbox)
5. Search for Sentinel-2 images
6. Inspect Item metadata and assets
7. Build a results table with pandas
8. Preview a band from a STAC Item
9. Export results
10. Exercises
11. Recap

---

## 1) Setup

### Imports

In [None]:
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import pystac_client
import planetary_computer

### Paths
Outputs go to `../outputs/`.

In [None]:
OUT_DIR = Path('..') / 'outputs'
OUT_DIR.mkdir(parents=True, exist_ok=True)
OUT_DIR.resolve()

---

## 2) What is STAC?

**STAC** (SpatioTemporal Asset Catalog) is a specification for describing geospatial data.
It has become a widely adopted standard for organising and discovering satellite imagery in the cloud.

### STAC Components

| Component      | Description                          | Example                                |
|----------------|--------------------------------------|----------------------------------------|
| **Catalog**    | The root entry point                 | Planetary Computer, Earth Search       |
| **Collection** | A group of related Items             | Sentinel-2 L2A, Landsat 8             |
| **Item**       | A single observation (one scene)     | One Sentinel-2 granule on 2024-06-15  |
| **Asset**      | A file associated with an Item       | Red band GeoTIFF, thumbnail PNG       |
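
A STAC Item is a GeoJSON Feature. To make the table concrete, the sketch below writes one out as a Python dict. It is hand-written and heavily simplified: the ID, coordinates and URLs are placeholders, and a real Sentinel-2 Item carries many more properties and assets.

```python
# A hand-written, simplified STAC Item (a GeoJSON Feature).
# All values are placeholders for illustration; real Items carry many more fields.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "EXAMPLE_S2_ITEM",
    "collection": "sentinel-2-l2a",          # the Collection this Item belongs to
    "bbox": [9.9, 49.7, 10.1, 49.9],         # west, south, east, north
    "geometry": {
        "type": "Polygon",                   # GeoJSON order is [lon, lat]
        "coordinates": [[[9.9, 49.7], [10.1, 49.7], [10.1, 49.9], [9.9, 49.9], [9.9, 49.7]]],
    },
    "properties": {                          # metadata used for searching and filtering
        "datetime": "2024-06-15T10:30:00Z",
        "eo:cloud_cover": 5.0,
    },
    "assets": {                              # the actual files
        "B04": {"href": "https://example.com/B04.tif", "title": "Red band"},
        "thumbnail": {"href": "https://example.com/preview.png", "title": "Thumbnail"},
    },
    "links": [],
}

# Walk the hierarchy: Item -> properties (metadata) and assets (files)
print(f"Item {example_item['id']} belongs to collection {example_item['collection']}")
print(f"Acquired {example_item['properties']['datetime']}, "
      f"cloud cover {example_item['properties']['eo:cloud_cover']}%")
for key, asset in example_item["assets"].items():
    print(f"  asset {key}: {asset['href']}")
```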

### Why STAC?

Before STAC, most data providers had their own API and query syntax.
With STAC:

- **Standardised**: the same query works across providers
- **Cloud-native**: direct links to COGs (Cloud-Optimised GeoTIFFs), no bulk download needed
- **Rich metadata**: spatial extent, temporal coverage, cloud cover, band info, etc.

Think of STAC as a **library catalogue for satellite images**: you search the catalogue first, then load only the data you need.

---

## 3) Connect to a STAC API

We use **Microsoft Planetary Computer**. It hosts Sentinel-2, Landsat, and many other collections.

Reading data from Planetary Computer requires **signed URLs**. Passing `planetary_computer.sign_inplace` as the `modifier` signs the asset URLs of every returned object automatically.

In [None]:
STAC_API_URL = 'https://planetarycomputer.microsoft.com/api/stac/v1'

catalog = pystac_client.Client.open(
    STAC_API_URL,
    modifier=planetary_computer.sign_inplace,
)
print(f'Connected to: {catalog.title}')
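
`Client.open` begins by reading the API's landing page, which is itself a STAC Catalog whose `links` point to the other endpoints. The sketch below walks a hypothetical, trimmed-down landing page to find the collections endpoint (`rel="data"`) and the search endpoint (`rel="search"`). pystac-client handles this for you; the URLs here are placeholders, not Planetary Computer's.

```python
from typing import Optional

# Hypothetical, trimmed-down STAC API landing page (the JSON returned by the root URL).
# A real landing page also lists conformance classes and many more links.
landing_page = {
    "type": "Catalog",
    "id": "example-stac-api",
    "title": "Example STAC API",
    "links": [
        {"rel": "self", "href": "https://stac.example.com/"},
        {"rel": "data", "href": "https://stac.example.com/collections"},
        {"rel": "search", "href": "https://stac.example.com/search", "method": "POST"},
    ],
}


def find_link(doc: dict, rel: str) -> Optional[str]:
    """Return the href of the first link with the given relation type, or None."""
    for link in doc.get("links", []):
        if link.get("rel") == rel:
            return link["href"]
    return None


print("Collections endpoint:", find_link(landing_page, "data"))
print("Search endpoint:     ", find_link(landing_page, "search"))
```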

### 3.1 List available collections

A STAC Catalog can host many Collections. Let's see a selection.

In [None]:
# Print collections whose IDs contain common EO keywords
print('Selected Collections:')
print('-' * 60)
keywords = ['sentinel', 'landsat', 'modis', 'dem']
for collection in catalog.get_collections():
    if any(kw in collection.id.lower() for kw in keywords):
        print(f'  {collection.id}: {collection.title}')

---

## 4) Define a search area (bbox)

We define an **area of interest (AOI)** as a bounding box:

```
(west, south, east, north)
```

in geographic coordinates (longitude / latitude).

Keep it small: a few kilometres is enough for this exercise.

In [None]:
# AOI: small area near Würzburg, Germany
AOI_BBOX = (9.95, 49.78, 10.05, 49.83)

print(f'West:  {AOI_BBOX[0]}')
print(f'South: {AOI_BBOX[1]}')
print(f'East:  {AOI_BBOX[2]}')
print(f'North: {AOI_BBOX[3]}')
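
Degrees are easy to misjudge, so a quick size check helps keep the AOI small. This sketch estimates the box's width and height with a spherical-Earth approximation (an assumption that is good enough for sanity checks, not for surveying) and rejects boxes whose corners are out of order. For simplicity it also rejects boxes that cross the antimeridian, which STAC allows by letting `west` exceed `east`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def bbox_size_km(bbox):
    """Approximate (width, height) in km of a (west, south, east, north) bbox in degrees."""
    west, south, east, north = bbox
    if not (-180 <= west < east <= 180 and -90 <= south < north <= 90):
        raise ValueError(f"Invalid or antimeridian-crossing bbox: {bbox}")
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180   # ~111.2 km per degree of latitude
    mid_lat = math.radians((south + north) / 2)
    width = (east - west) * km_per_deg * math.cos(mid_lat)  # longitude degrees shrink with latitude
    height = (north - south) * km_per_deg
    return width, height


aoi_w_km, aoi_h_km = bbox_size_km((9.95, 49.78, 10.05, 49.83))
print(f"AOI is roughly {aoi_w_km:.1f} km x {aoi_h_km:.1f} km")
```

Pass `AOI_BBOX` instead of the literal tuple to check your own box; for the Würzburg AOI this comes out at roughly 7 km by 5.6 km.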

---

## 5) Search for Sentinel-2 images

We search the `sentinel-2-l2a` collection (Level-2A = surface reflectance) with filters:

- **bbox**: spatial extent
- **datetime**: time range
- **query**: cloud-cover threshold

In [None]:
# Search parameters
DATE_RANGE = '2024-06-01/2024-06-30'
MAX_CLOUD = 20  # percent

search = catalog.search(
    collections=['sentinel-2-l2a'],
    bbox=AOI_BBOX,
    datetime=DATE_RANGE,
    query={'eo:cloud_cover': {'lt': MAX_CLOUD}},
)

items = list(search.items())
print(f'Found {len(items)} items with cloud cover < {MAX_CLOUD}%')
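
Under the hood, `catalog.search` turns these arguments into a request to the API's `/search` endpoint. The dict below sketches roughly what such a POST body looks like for the search above; it is illustrative, not captured from pystac-client. The STAC API spec expects RFC 3339 date-times, so the helper expands the date range to whole days in UTC, similar to what pystac-client does for date-only strings. The `query` object follows the STAC API Query extension, and `limit` (an illustrative value here) sets the page size; further pages are fetched by following `next` links.

```python
import json
from datetime import date


def whole_day_interval(start: date, end: date) -> str:
    """RFC 3339 interval from the start of `start` to the end of `end`, in UTC."""
    return f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z"


# Illustrative POST /search body for the Sentinel-2 search above (not captured traffic)
search_body = {
    "collections": ["sentinel-2-l2a"],
    "bbox": [9.95, 49.78, 10.05, 49.83],       # west, south, east, north
    "datetime": whole_day_interval(date(2024, 6, 1), date(2024, 6, 30)),
    "query": {"eo:cloud_cover": {"lt": 20}},   # Query extension syntax
    "limit": 100,                              # page size (illustrative value)
}
print(json.dumps(search_body, indent=2))
```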

### 5.1 Quick list of results

In [None]:
print('Found Items:')
print('-' * 70)
for item in items:
    cloud = item.properties.get('eo:cloud_cover')  # None if the property is missing
    dt = item.properties.get('datetime', 'N/A')
    cloud_str = f'{cloud:.1f}%' if cloud is not None else 'N/A'
    print(f'  {item.id}')
    print(f'    Date: {dt}    Cloud cover: {cloud_str}')
    print()

> **💡 Tip:** If the search reports `Found 0 items`, widen the time range or raise `MAX_CLOUD`.
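
The API does not guarantee that results come back ordered by cloud cover, and a property can occasionally be missing. This sketch ranks property dicts by `eo:cloud_cover` and pushes entries without a numeric value to the end; the example dicts are made up. With real Items you would call `sorted(items, key=lambda it: cloud_sort_key(it.properties))`.

```python
import math


def cloud_sort_key(props: dict) -> tuple:
    """Sort key: numeric eo:cloud_cover values first (lowest first), missing values last."""
    cc = props.get("eo:cloud_cover")
    missing = not isinstance(cc, (int, float)) or math.isnan(cc)
    return (missing, 0.0 if missing else float(cc))


def fmt_cloud(value) -> str:
    """Format cloud cover for printing, falling back to 'N/A' for missing values."""
    return f"{value:.1f}%" if isinstance(value, (int, float)) else "N/A"


# Made-up property dicts standing in for item.properties
example_props = [
    {"datetime": "2024-06-03T10:30:00Z", "eo:cloud_cover": 12.4},
    {"datetime": "2024-06-08T10:30:00Z"},  # no cloud-cover value
    {"datetime": "2024-06-13T10:30:00Z", "eo:cloud_cover": 0.8},
]
ranked = sorted(example_props, key=cloud_sort_key)
for props in ranked:
    print(props["datetime"], fmt_cloud(props.get("eo:cloud_cover")))
```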

---

## 6) Inspect Item metadata and assets

Each STAC **Item** carries rich metadata. Let's examine the first result.

In [None]:
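# Stop with a clear message if the search in section 5 returned nothing
if not items:
    raise RuntimeError('No Items found: widen DATE_RANGE or raise MAX_CLOUD, then re-run section 5.')

item = items[0]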

print('Item Details')
print('=' * 50)
print(f'ID:       {item.id}')
print(f'Datetime: {item.datetime}')
print(f'Geometry: {item.geometry["type"]}')
print(f'Bbox:     {item.bbox}')

### 6.1 Key properties

Properties are the metadata dictionary of the Item.

In [None]:
print('Key Properties:')
print('-' * 50)

props_of_interest = [
    'eo:cloud_cover',
    'proj:epsg',
    's2:granule_id',
    'platform',
    'constellation',
]

for prop in props_of_interest:
    value = item.properties.get(prop, 'N/A')
    print(f'  {prop}: {value}')
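
Property names with a prefix such as `eo:` or `proj:` come from STAC extensions (Electro-Optical and Projection here), while unprefixed names like `datetime` and `platform` are core or common metadata. Grouping names by prefix is a quick way to see which extensions an Item uses. The sample dict below is made up; passing `item.properties` works the same way.

```python
from collections import defaultdict


def group_by_prefix(properties: dict) -> dict:
    """Group property names by namespace prefix ('eo', 'proj', 's2', ...)."""
    groups = defaultdict(list)
    for name in properties:
        prefix = name.split(":", 1)[0] if ":" in name else "(no prefix)"
        groups[prefix].append(name)
    return dict(groups)


# Made-up subset of Sentinel-2 Item properties, for illustration only
sample_properties = {
    "datetime": "2024-06-15T10:30:00Z",
    "platform": "Sentinel-2A",
    "eo:cloud_cover": 3.2,
    "proj:epsg": 32632,
    "s2:granule_id": "EXAMPLE_GRANULE_ID",
}
for prefix, names in sorted(group_by_prefix(sample_properties).items()):
    print(f"{prefix}: {names}")
```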

### 6.2 Available assets

Assets are the actual data files (GeoTIFFs, thumbnails, metadata) linked to an Item.

In [None]:
print('Available Assets:')
print('-' * 50)
for key, asset in item.assets.items():
    title = asset.title if asset.title else key
    print(f'  {key}: {title}')
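
Besides `href` and `title`, each Asset can declare a media `type` and a list of `roles`, which help separate pixel data from previews and metadata. The sketch filters asset dicts for COGs with the `data` role, using the media type STAC recommends for COGs. The example assets are made up; with pystac `Asset` objects you would read `asset.media_type` and `asset.roles or []` instead.

```python
COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def cog_data_assets(assets: dict) -> list:
    """Return keys of assets that have the 'data' role and the COG media type."""
    return [
        key
        for key, asset in assets.items()
        if "data" in asset.get("roles", []) and asset.get("type") == COG_MEDIA_TYPE
    ]


# Made-up assets for illustration
example_assets = {
    "B04": {"href": "https://example.com/B04.tif", "type": COG_MEDIA_TYPE, "roles": ["data"]},
    "B08": {"href": "https://example.com/B08.tif", "type": COG_MEDIA_TYPE, "roles": ["data"]},
    "rendered_preview": {"href": "https://example.com/preview.png",
                         "type": "image/png", "roles": ["overview"]},
    "tilejson": {"href": "https://example.com/tilejson.json",
                 "type": "application/json", "roles": ["tiles"]},
}
print(cog_data_assets(example_assets))  # ['B04', 'B08']
```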

### 6.3 Get URLs for specific bands

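Each spectral band is an Asset with a URL pointing to a Cloud-Optimised GeoTIFF (COG).
Because COGs support HTTP range requests, you can read just the pixels (or overview level) you need directly from these URLs, with no need to download the whole file first.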

In [None]:
print('Band URLs:')
print('-' * 50)

bands_to_check = ['B02', 'B03', 'B04', 'B08']  # blue, green, red, NIR
for band in bands_to_check:
    if band in item.assets:
        url = item.assets[band].href
        # Shorten for display
        short_url = '...' + url[-60:] if len(url) > 60 else url
        print(f'  {band}: {short_url}')
    else:
        print(f'  {band}: not found in assets')
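
After signing, each `href` ends with a query string holding a short-lived SAS token, so the last 60 characters shown above are often mostly token. One alternative for display is to show just the file name and whether a query string is attached. The URLs below are hypothetical placeholders.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def describe_href(href: str) -> str:
    """Compact display form of an asset URL: file name plus whether a query string is attached."""
    parts = urlsplit(href)
    name = PurePosixPath(parts.path).name
    status = "with query string" if parts.query else "no query string"
    return f"{name} ({status})"


# Hypothetical URLs; a signed href carries its token after the '?'
signed = "https://examplestorage.blob.core.windows.net/container/path/B04_10m.tif?token=EXAMPLE"
unsigned = "https://examplestorage.blob.core.windows.net/container/path/B04_10m.tif"
print(describe_href(signed))
print(describe_href(unsigned))
```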

---

## 7) Build a results table with pandas

For more than a handful of Items, a DataFrame is easier to work with than looping and printing.

In [None]:
rows = []
for it in items:
    props = it.properties
    rows.append({
        'id': it.id,
        'datetime': props.get('datetime'),
        'cloud_cover': props.get('eo:cloud_cover', np.nan),
        'platform': props.get('platform', 'N/A'),
        'epsg': props.get('proj:epsg', 'N/A'),
    })

df = pd.DataFrame(rows)
df['datetime'] = pd.to_datetime(df['datetime'], utc=True, errors='coerce')
df['cloud_cover'] = pd.to_numeric(df['cloud_cover'], errors='coerce')
df.sort_values('cloud_cover').head(10)
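
An AOI that straddles a Sentinel-2 tile boundary can return more than one Item for the same overpass, one per tile; whether that happens here depends on where the AOI falls in the tiling grid. One way to keep a single row per day is to sort by cloud cover and drop duplicate dates, sketched below on a made-up table. Be aware that this discards the other tile, which may cover a different part of the AOI.

```python
import pandas as pd

# Made-up results table: two Items share an acquisition date (e.g. overlapping tiles)
df_example = pd.DataFrame({
    "id": ["scene_a", "scene_b", "scene_c"],
    "datetime": pd.to_datetime(
        ["2024-06-03T10:30:00Z", "2024-06-03T10:30:00Z", "2024-06-08T10:30:00Z"], utc=True
    ),
    "cloud_cover": [12.0, 4.5, 30.1],
})

# Keep the lowest-cloud Item for each calendar date (UTC)
best_per_day = (
    df_example.assign(date=df_example["datetime"].dt.date)
    .sort_values("cloud_cover")
    .drop_duplicates(subset="date", keep="first")
    .sort_values("date")
    .reset_index(drop=True)
)
print(best_per_day[["date", "id", "cloud_cover"]])
```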

### 7.1 Quick stats

In [None]:
print(f'Total items:       {len(df)}')
print(f'Date range:        {df["datetime"].min()} to {df["datetime"].max()}')
print(f'Mean cloud cover:  {df["cloud_cover"].mean():.1f}%')
print(f'Platforms:         {df["platform"].unique().tolist()}')

### 7.2 Cloud-cover histogram

In [None]:
fig, ax = plt.subplots(figsize=(6, 3))
ax.hist(df['cloud_cover'].dropna(), bins=10, edgecolor='white')
ax.set_xlabel('Cloud cover (%)')
ax.set_ylabel('Number of scenes')
ax.set_title('Cloud-cover distribution of search results')
plt.tight_layout()
plt.show()

---

## 8) Preview a band from a STAC Item

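We can load a single band directly from its COG URL using `rioxarray`.
Passing `overview_level=3` reads a coarse level of the COG's overview pyramid instead of full resolution, so the download stays small. Note that this still covers the whole Sentinel-2 tile (roughly 110 km across), not just our AOI.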

> **Note:** `rioxarray` extends xarray with rasterio-backed I/O.
> Install with `pip install rioxarray` if needed.

In [None]:
import rioxarray

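# Pick the best (lowest-cloud) item; search results are not guaranteed to be sorted by cloud cover
best_item = min(items, key=lambda it: it.properties.get('eo:cloud_cover', float('inf')))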
red_url = best_item.assets['B04'].href

print(f'Loading red band (B04) from: {best_item.id}')
%time red_band = rioxarray.open_rasterio(red_url, overview_level=3)

print(f'Shape:  {red_band.shape}')
print(f'CRS:    {red_band.rio.crs}')
print(f'Bounds: {red_band.rio.bounds()}')
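
To see why `overview_level=3` is fast, this sketch estimates the overview size, assuming each overview level halves the previous one (level 0 is already half resolution, as with GDAL's `OVERVIEW_LEVEL` open option) and that a 10 m band is 10980 × 10980 pixels. Both are assumptions: the actual factors depend on how each COG was built, so compare with the `Shape` printed above.

```python
import math


def overview_shape(height: int, width: int, level: int) -> tuple:
    """Estimate (rows, cols, factor) at an overview level, assuming a factor-of-2 pyramid
    where level 0 is half resolution and sizes are rounded up."""
    factor = 2 ** (level + 1)
    return math.ceil(height / factor), math.ceil(width / factor), factor


# Assumption: a 10 m Sentinel-2 band covering one tile is 10980 x 10980 pixels
rows, cols, factor = overview_shape(10980, 10980, level=3)
print(f"overview_level=3 -> about {rows} x {cols} pixels at ~{10 * factor} m per pixel")
```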

In [None]:
fig, ax = plt.subplots(figsize=(8, 8))
# Pixel values are raw digital numbers; vmin/vmax is a display stretch, not physical units
red_band.squeeze().plot(ax=ax, cmap='Reds', vmin=0, vmax=3000)
ax.set_title(f'Red band (B04): {best_item.id}\n{best_item.datetime}')
ax.set_aspect('equal')
plt.tight_layout()
plt.show()
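
The plot shows raw digital numbers (DN). Sentinel-2 L2A stores reflectance scaled by 10,000, and products from processing baseline 04.00 onwards (early 2022) also add an offset of 1000, so for those scenes reflectance = (DN − 1000) / 10000. This sketch assumes the COG holds unmodified L2A DNs and that 0 marks nodata; check the product's processing baseline before relying on the offset. With the data loaded above you could call `dn_to_reflectance(red_band.values)`.

```python
import numpy as np


def dn_to_reflectance(dn, offset: float = -1000.0, scale: float = 10_000.0):
    """Convert Sentinel-2 L2A digital numbers to reflectance as (DN + offset) / scale.

    offset=-1000 assumes processing baseline 04.00 or later (use 0 for older products).
    DN 0 is treated as nodata and returned as NaN.
    """
    dn = np.asarray(dn, dtype="float64")
    reflectance = (dn + offset) / scale
    return np.where(dn == 0, np.nan, reflectance)


print(dn_to_reflectance([0, 1000, 1500, 4000]))
```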

### 8.1 Display a thumbnail (if available)

Many STAC Items include a pre-rendered thumbnail or preview image as an Asset.

In [None]:
# Check for a rendered preview or thumbnail asset
preview_key = 'rendered_preview' if 'rendered_preview' in best_item.assets else 'thumbnail'

if preview_key in best_item.assets:
    from IPython.display import Image, display
    thumb_url = best_item.assets[preview_key].href
    print(f'Displaying: {preview_key}')
    display(Image(url=thumb_url, width=400))
else:
    print('No thumbnail or rendered_preview asset available for this item.')

---

## 9) Export results

Save the results table to CSV so you can reuse it later (e.g., in Notebook 05).

In [None]:
out_csv = OUT_DIR / 'stac_search_results.csv'
df.to_csv(out_csv, index=False)
print(f'Saved {len(df)} rows to {out_csv.resolve()}')
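
CSV stores timestamps as plain text, so reading the file back does not restore the `datetime` column's type unless you ask for it. This sketch round-trips a made-up two-row table through an in-memory buffer (standing in for `stac_search_results.csv`) to show the difference `parse_dates` makes.

```python
import io

import pandas as pd

# Made-up two-row table standing in for `df`
sample_df = pd.DataFrame({
    "id": ["scene_a", "scene_b"],
    "datetime": pd.to_datetime(["2024-06-03T10:30:00Z", "2024-06-08T10:30:00Z"], utc=True),
    "cloud_cover": [4.5, 12.0],
})

buffer = io.StringIO()  # stands in for a CSV file on disk
sample_df.to_csv(buffer, index=False)

buffer.seek(0)
naive = pd.read_csv(buffer)  # timestamps come back as text
print("parsed without parse_dates:", pd.api.types.is_datetime64_any_dtype(naive["datetime"]))

buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["datetime"])  # timestamps parsed again
print("parsed with parse_dates:   ", pd.api.types.is_datetime64_any_dtype(restored["datetime"]))
```

In Notebook 05, `pd.read_csv(out_csv, parse_dates=['datetime'])` could restore the column the same way.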

---

## 10) Exercises

### ✅ Try it: Change the search parameters

1. Pick a **different AOI** ‚Äî try a place you know.
   Look up approximate coordinates on Google Maps or [bboxfinder.com](http://bboxfinder.com).
2. Change the **time range** to a different month.
3. Tighten `MAX_CLOUD` to `10` ‚Äî how many results do you get?

In [None]:
# TODO: fill in your own AOI and time range
my_bbox = (___,  ___,  ___,  ___)  # (west, south, east, north)
my_date_range = '____-__-__/____-__-__'
my_max_cloud = 10

my_search = catalog.search(
    collections=['sentinel-2-l2a'],
    bbox=my_bbox,
    datetime=my_date_range,
    query={'eo:cloud_cover': {'lt': my_max_cloud}},
)
my_items = list(my_search.items())
print(f'Found {len(my_items)} items')

In [None]:
# TODO: Print the dates and cloud cover of your first 5 results
for it in my_items[:5]:
    dt = it.properties.get('datetime', 'N/A')
    cc = it.properties.get('eo:cloud_cover', 'N/A')
    print(f'  {dt}  |  cloud: {cc}%')

### ✅ Try it: Explore a different collection

Search for **Landsat** imagery instead of Sentinel-2.
The collection ID on Planetary Computer is `landsat-c2-l2`.

In [None]:
# TODO: search for Landsat images over the same AOI

### 🧠 Checkpoint

**Q1.** What does a STAC "Collection" represent?

- A) A single satellite image
- B) A group of related Items (e.g., all Sentinel-2 L2A scenes)
- C) A file like a GeoTIFF

**Q2.** How do you get the download URL for the red band of a STAC Item?

- A) `item.red.url`
- B) `item.assets['B04'].href`
- C) `item.properties['red']`

**Q3.** What does the `bbox` parameter in a STAC search represent?

- A) The pixel dimensions of the image
- B) The geographic bounding box `(west, south, east, north)` of the area of interest
- C) The cloud cover threshold

**Q4.** Why do we use `planetary_computer.sign_inplace` as a modifier?

- A) It converts images to PNG
- B) It signs the asset URLs (appending short-lived SAS tokens) so we can read the data without managing credentials ourselves
- C) It compresses the search results

---

## 11) Recap

You now know how to:

| Skill | Tool / Code |
|-------|-------------|
| Connect to a STAC API | `pystac_client.Client.open(url)` |
| Search by area, time, cloud | `catalog.search(collections=..., bbox=..., datetime=..., query=...)` |
| Get Items from search | `list(search.items())` |
| Read Item metadata | `item.properties`, `item.datetime`, `item.bbox` |
| Access asset URLs | `item.assets['B04'].href` |
| Build a table from Items | Loop → list of dicts → `pd.DataFrame()` |
| Preview a band | `rioxarray.open_rasterio(url, overview_level=3)` |

### STAC hierarchy reminder

```
Catalog
 └── Collection  (e.g., sentinel-2-l2a)
      └── Item   (one scene on one date)
           └── Asset  (one file: B04.tif, thumbnail.png, …)
```

### Next steps

In **Notebook 05** you will stack multiple bands and scenes into an **xarray cube** using `stackstac`, compute NDVI, and export results.