<img align="right" src="https://github.com/eo2cube/eo2cube_book/blob/7880672deff906b41f993c856fe1a7eb38ed5b3a/images/banner_siegel.png?raw=true" style="width:1000px;">

# Data Lookup and Loading (Planetary Computer + STAC + odc-stac)

This notebook demonstrates how to:

- Connect to the Microsoft Planetary Computer STAC API using `pystac-client`
- Search for Sentinel-2 L2A items using `catalog.search(...)`
- Load pixels into an `xarray.Dataset` using `odc.stac.stac_load(...)`

We do this **without importing custom course utility functions**: every step is written out explicitly so that you can follow what happens.

## Load packages

We use:
- `pystac-client` to query a STAC API
- `planetary-computer` to sign asset URLs (required for access)
- `odc-stac` to load the selected items into an xarray `Dataset`

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

from pystac_client import Client
import planetary_computer as pc

from odc.stac import stac_load

pd.set_option("display.max_colwidth", 200)
pd.set_option("display.max_rows", None)

## Constants

We define a small set of constants used throughout the notebook.

In [None]:
STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"
COLLECTION = "sentinel-2-l2a"

# Würzburg, Bavaria (approx) in EPSG:4326
bbox = (9.88, 49.75, 10.00, 49.82)  # (min_lon, min_lat, max_lon, max_lat)

# Output grid
crs = "EPSG:32632"  # UTM zone 32N
resolution = 20
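
The output CRS above was chosen by hand. As a minimal sketch of where `EPSG:32632` comes from, the hypothetical helper `utm_epsg` below (not part of any library used in this notebook) derives the WGS 84 / UTM EPSG code from a longitude and latitude. It assumes the standard 6-degree zones and ignores the special cases around Norway and Svalbard.

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """Return the WGS 84 / UTM EPSG code for a lon/lat point (hypothetical helper).

    Standard 6-degree zones only; the Norway/Svalbard exceptions are ignored.
    """
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    base = 32600 if lat >= 0 else 32700  # 326xx = northern, 327xx = southern hemisphere
    return base + zone


# Centre of the Würzburg bounding box used in this notebook
example_bbox = (9.88, 49.75, 10.00, 49.82)
center_lon = (example_bbox[0] + example_bbox[2]) / 2
center_lat = (example_bbox[1] + example_bbox[3]) / 2
print(f"EPSG:{utm_epsg(center_lon, center_lat)}")
```

For an area that straddles two zones you would still pick a single CRS for the whole cube, so treat the result as a starting point rather than a rule.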

## Connect to the STAC API

`Client.open(...)` returns a STAC client which we use to list collections and run searches.

In [None]:
catalog = Client.open(STAC_URL)
catalog

## List collections (optional)

STAC catalogues organize data into **collections**.

In [None]:
collections = list(catalog.get_collections())

# Show the first 20 collections, sorted by ID
collection_table = pd.DataFrame([{"id": c.id, "title": c.title} for c in collections])
collection_table.sort_values("id").reset_index(drop=True).head(20)

## Search items

A STAC search filters by space/time and any additional metadata.

Typical filters for Sentinel-2 are:
- `datetime`: a string like `"2022-03"` or an interval like `"2022-03-01/2022-03-15"`
- `eo:cloud_cover`: e.g. less than 40

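You can also restrict the search to a single MGRS tile. The tile ID below is only an example; it must overlap the bounding box, otherwise the search returns no items: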

```python
query={"s2:mgrs_tile": dict(eq="32UPU")}
```
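
The filters described above can be assembled in one place. The sketch below uses a hypothetical helper, `build_search_filters` (our own name, not a `pystac-client` function), that builds the `datetime` interval string and the `query` dictionary from plain Python values. It runs offline and only produces the arguments; nothing is sent to the API.

```python
from datetime import date
from typing import Optional


def build_search_filters(
    start: date,
    end: date,
    cloud_cover_lt: float,
    tile: Optional[str] = None,
) -> dict:
    """Assemble `datetime` and `query` keyword arguments (hypothetical helper)."""
    query = {"eo:cloud_cover": {"lt": cloud_cover_lt}}
    if tile is not None:
        # Only add the tile filter when a tile ID was actually given
        query["s2:mgrs_tile"] = {"eq": tile}
    return {
        "datetime": f"{start.isoformat()}/{end.isoformat()}",
        "query": query,
    }


filters = build_search_filters(date(2022, 3, 1), date(2022, 3, 15), cloud_cover_lt=40)
print(filters)
```

The resulting dictionary could then be unpacked into the search call, for example `catalog.search(collections=[COLLECTION], bbox=bbox, **filters)`.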

In [None]:
datetime = "2022-03-01/2022-03-15"
cloud_cover_lt = 40

# Optional: set to an MGRS tile ID (e.g. "32UPU") to restrict the search to one tile.
# The tile must overlap `bbox`, otherwise the search returns no items.
tile = None

query = {"eo:cloud_cover": {"lt": cloud_cover_lt}}
if tile is not None:
    query["s2:mgrs_tile"] = {"eq": tile}

search = catalog.search(
    collections=[COLLECTION],
    bbox=bbox,
    datetime=datetime,
    query=query,
)

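# `items()` is the current pystac-client method; `get_items()` is deprecated
items = list(search.items())
if not items:
    raise ValueError("No items matched the search; relax the date range or cloud filter.")

len(items), items[0].id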
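
A search usually returns several scenes, and a common next step is to pick the least cloudy one. The sketch below shows one way to do that using the `eo:cloud_cover` property. So that it runs offline, it uses `SimpleNamespace` stand-ins with invented IDs and cloud values instead of real STAC items; `least_cloudy` is a hypothetical helper, not a library function.

```python
from types import SimpleNamespace


def least_cloudy(items):
    """Return the item with the lowest `eo:cloud_cover` (hypothetical helper)."""
    candidates = [it for it in items if "eo:cloud_cover" in it.properties]
    if not candidates:
        raise ValueError("No item carries an 'eo:cloud_cover' property.")
    return min(candidates, key=lambda it: it.properties["eo:cloud_cover"])


# Stand-in objects with invented IDs and values, mimicking `item.id` / `item.properties`
fake_items = [
    SimpleNamespace(id="scene-a", properties={"eo:cloud_cover": 31.2}),
    SimpleNamespace(id="scene-b", properties={"eo:cloud_cover": 4.7}),
    SimpleNamespace(id="scene-c", properties={"eo:cloud_cover": 18.0}),
]

best = least_cloudy(fake_items)
print(best.id, best.properties["eo:cloud_cover"])
```

Because the helper only relies on `.properties`, it should work unchanged on the `items` list returned by the search.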

## Inspect assets/bands

STAC items expose **assets** (often Cloud-Optimized GeoTIFFs). Here we show the asset keys of one item.

In [None]:
item = items[0]
sorted(item.assets.keys())[:30]

## Load pixels with `odc-stac`

We now load a small multi-band cube into an `xarray.Dataset`.

Key points:
- We request the **STAC asset keys** (e.g. `B02`, `B03`, `B04`, `B08`, `SCL`).
- We use `patch_url=pc.sign` so every asset URL gets signed automatically during load.
- We use `dtype="uint16"` and `nodata=0` for reading, then convert reflectance to ~0..1.

In [None]:
bands = ["B02", "B03", "B04", "B08", "SCL"]

# Resampling: categorical SCL uses nearest; reflectance uses bilinear
resampling = {"*": "bilinear", "SCL": "nearest"}

ds_raw = stac_load(
    items,
    bands=bands,
    bbox=bbox,  # clip to the area of interest instead of loading full tiles
    crs=crs,
    resolution=resolution,
    chunks={"x": 2048, "y": 2048},
    patch_url=pc.sign,
    dtype="uint16",
    nodata=0,
    groupby="solar_day",
    resampling=resampling,
)

ds_raw
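
The `groupby="solar_day"` option merges items acquired on the same local solar day into one time step, which is useful when adjacent scenes from one overpass carry slightly different timestamps. The sketch below illustrates the underlying idea only: it shifts a UTC timestamp by 4 minutes per degree of longitude and takes the date. It is a rough approximation written for this notebook, not the odc-stac implementation, and `solar_day` is a hypothetical helper.

```python
import datetime as dt


def solar_day(utc_time: dt.datetime, lon: float) -> dt.date:
    """Approximate the local solar date for a UTC timestamp (hypothetical helper).

    The sun moves 15 degrees of longitude per hour, so the shift is lon / 15 hours.
    """
    return (utc_time + dt.timedelta(hours=lon / 15.0)).date()


lon = 9.94  # roughly the centre of the Würzburg bounding box

# Two timestamps a few seconds apart (invented values) fall on the same solar day
t1 = dt.datetime(2022, 3, 5, 10, 30, 21, tzinfo=dt.timezone.utc)
t2 = dt.datetime(2022, 3, 5, 10, 30, 35, tzinfo=dt.timezone.utc)
print(solar_day(t1, lon), solar_day(t2, lon))

# Close to midnight UTC, the solar day can differ from the UTC date
t3 = dt.datetime(2022, 3, 5, 23, 50, 0, tzinfo=dt.timezone.utc)
print(solar_day(t3, lon))
```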

## Rename bands and scale reflectance

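Sentinel-2 L2A reflectance is stored as integer digital numbers (DN). Multiplying by $10^{-4}$ gives approximate surface reflectance in 0..1. Scenes processed with baseline 04.00 or later (roughly from late January 2022 onward) additionally carry a DN offset of 1000, so the plain scaling used here may overestimate reflectance by about 0.1 for such scenes; that is acceptable for a quick look but should be corrected before quantitative analysis.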

We also rename bands to short names used in later notebooks.

In [None]:
rename_map = {
    "B02": "blue",
    "B03": "green",
    "B04": "red",
    "B08": "nir",
    "SCL": "scl",
}

present = {k: v for k, v in rename_map.items() if k in ds_raw.data_vars}
ds = ds_raw.rename(present)

for name in list(ds.data_vars):
    if name == "scl":
        continue
    ds[name] = ds[name].astype("float32") * 1e-4

ds
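
The scaling step can be made more explicit about nodata and about the DN offset that newer Sentinel-2 processing baselines introduce. The sketch below works on a plain NumPy array with invented values. `dn_to_reflectance` is a hypothetical helper, and whether `offset=-1000` applies to a given scene is an assumption you should verify from its metadata.

```python
import numpy as np


def dn_to_reflectance(dn, scale=1e-4, offset=0.0, nodata=0):
    """Convert integer digital numbers to float32 reflectance (hypothetical helper).

    `offset` is added to the DN before scaling; pass -1000 for scenes that
    carry the baseline 04.00 offset (an assumption to verify per scene).
    Pixels equal to `nodata` become NaN.
    """
    dn = np.asarray(dn)
    reflectance = (dn.astype("float32") + np.float32(offset)) * np.float32(scale)
    return np.where(dn == nodata, np.float32(np.nan), reflectance)


sample_dn = np.array([0, 1000, 2000, 4000], dtype="uint16")  # invented values
plain = dn_to_reflectance(sample_dn)                    # scaling only
corrected = dn_to_reflectance(sample_dn, offset=-1000)  # scaling plus offset
print(plain)
print(corrected)
```

The same arithmetic applies to an `xarray.DataArray`: mask the nodata pixels with `.where(...)`, add the offset, and multiply by the scale.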

## Mask clouds using SCL (quick example)

SCL is a categorical layer. A common “keep” set is:
- 4 vegetation
- 5 not-vegetated
- 6 water
- 7 unclassified

In [None]:
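# `DataArray.isin` works on dask-backed (chunked) data, unlike a bare
# `xr.apply_ufunc(np.isin, ...)` call, which rejects dask arrays by default
keep = ds["scl"].isin([4, 5, 6, 7])

# Pixels outside the keep set become NaN in every variable (including `scl`)
ds_clear = ds.where(keep)

ds_clear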
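
Before relying on a masked scene it helps to know how much of it survived the mask. The sketch below computes the share of valid pixels that fall into the keep set, using a tiny invented SCL array so that it runs without any downloaded data. `clear_fraction` is a hypothetical helper.

```python
import numpy as np

KEEP_CLASSES = (4, 5, 6, 7)  # vegetation, not-vegetated, water, unclassified


def clear_fraction(scl) -> float:
    """Share of valid (non-zero) SCL pixels that are in KEEP_CLASSES (hypothetical helper)."""
    scl = np.asarray(scl)
    valid = scl != 0  # class 0 means "no data"
    if not valid.any():
        return float("nan")
    clear = np.isin(scl, KEEP_CLASSES)
    return float(clear[valid].sum() / valid.sum())


# Invented 3x3 SCL patch: classes 3, 8 and 9 are shadow/cloud, 0 is no data
scl_patch = np.array(
    [[4, 4, 9],
     [5, 8, 0],
     [6, 3, 7]],
    dtype="uint8",
)
print(clear_fraction(scl_patch))
```

Applied per time step, such a number could be used to skip dates where too little of the area of interest is clear.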

## Quick visual check (RGB)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

rgb = ds_clear.isel(time=0)[["red", "green", "blue"]]
rgb_plot = xr.concat([rgb.red, rgb.green, rgb.blue], dim="band").transpose("y", "x", "band")

plt.figure(figsize=(6, 6))
plt.imshow(np.clip(rgb_plot.values, 0, 0.3) / 0.3)
plt.title("RGB (masked)")
plt.axis("off")
plt.show()
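
The plot above uses a fixed stretch (clip at 0.3, then divide by 0.3). As an alternative, the sketch below stretches each band between two percentiles of its own values, which adapts to darker or brighter scenes and ignores NaN pixels left by the cloud mask. `percentile_stretch` is a hypothetical helper and the 2/98 percentiles are an arbitrary but common choice.

```python
import numpy as np


def percentile_stretch(band, lower=2.0, upper=98.0):
    """Rescale a band to 0..1 between two percentiles, ignoring NaN (hypothetical helper)."""
    band = np.asarray(band, dtype="float32")
    lo, hi = np.nanpercentile(band, [lower, upper])
    if not np.isfinite(lo) or hi <= lo:
        # Flat or all-NaN band: nothing to stretch
        return np.zeros_like(band)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)


# Invented reflectance ramp from 0 to 1 with one masked (NaN) pixel appended
ramp = np.append(np.linspace(0.0, 1.0, 101), np.nan)
stretched = percentile_stretch(ramp)
print(np.nanmin(stretched), np.nanmax(stretched))
```

NaN pixels stay NaN after the stretch, so masked areas remain distinguishable when plotting.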

***

## Additional information

<font size="2">This notebook is adapted from public EO teaching materials and updated for STAC/Planetary Computer access using `odc-stac`. Thanks!</font>

**Last modified:** 2026