# Build a Static STAC Catalog and Browse with STAC Browser

This notebook builds STAC items for three public icechunk stores, assembles
them into a static STAC catalog, saves it **locally**, uploads it to S3 with
fsspec, and produces a [STAC Browser](https://github.com/radiantearth/stac-browser)
URL for immediate browsing — no server required.

## Architecture

```
s3://BUCKET/PREFIX/
  catalog.json
  noaa-gfs-forecast-{snapshot}/noaa-gfs-forecast-{snapshot}.json
  noaa-hrrr-forecast-48-hour-{snapshot}/noaa-hrrr-forecast-48-hour-{snapshot}.json
  nldas-3-virtual-zarr-{snapshot}/nldas-3-virtual-zarr-{snapshot}.json
```

The catalog uses `SELF_CONTAINED` relative links so it can be moved between
buckets/prefixes without breaking.  STAC Browser is a client-side Vue.js app
hosted by Radiant Earth — it fetches and renders the catalog directly from S3
with no backend.
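
To illustrate why relative links make the catalog portable, the sketch below resolves one item `href` against two different catalog locations using only the standard library. The hostnames, bucket paths, and the item ID `noaa-gfs-forecast-abc` are placeholders chosen for illustration, not real locations.

```python
from urllib.parse import urljoin


def resolve_link(catalog_url: str, href: str) -> str:
    """Resolve a relative STAC link href against the URL of the file containing it."""
    return urljoin(catalog_url, href)


# A relative item link as it might appear in a self-contained catalog.json.
# (The item ID is a made-up placeholder.)
item_href = "./noaa-gfs-forecast-abc/noaa-gfs-forecast-abc.json"

# The same catalog.json served from two hypothetical locations.
url_a = resolve_link("https://example.com/bucket-a/stac/catalog.json", item_href)
url_b = resolve_link("https://example.org/bucket-b/moved/catalog.json", item_href)

print(url_a)
print(url_b)
```

The JSON is identical in both cases; only the URL it is served from changes, and the item link follows it.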

## Install dependencies
```
pip install icechunk xarray zarr pystac xstac xpystac rioxarray s3fs cloudify
```

In [1]:
import json
import tempfile
import warnings
from pathlib import Path

import icechunk
import pystac
import rioxarray
import s3fs
import xarray as xr

from cloudify.stac import build_stac_item_from_icechunk

warnings.filterwarnings(
    "ignore",
    message="Numcodecs codecs are not in the Zarr version 3 specification*",
    category=UserWarning,
)

## 1. Configure the output catalog location

`CATALOG_BUCKET` and `CATALOG_PREFIX` are set in **Step 6**, just before the upload,
so you can inspect the local JSON first. Step 6 reads the endpoint URL and credentials
from a named AWS profile, so make sure that profile exists in `~/.aws/config` or
`~/.aws/credentials` before running it.

## 2. Configure local staging directory

The catalog is first written to a local temp directory, then uploaded to S3.
This lets you inspect the JSON before it goes live and avoids partial writes
on the remote.

In [2]:
LOCAL_CATALOG_DIR = Path(tempfile.mkdtemp(prefix="stac-catalog-"))
print(f"Local staging directory: {LOCAL_CATALOG_DIR}")

Local staging directory: /tmp/stac-catalog-7ngilfxh


## 3. Build STAC items for each dataset

In [3]:
def open_icechunk(bucket, prefix, region, anonymous=True,
                  virtual_source=None, snapshot_id=None, branch="main"):
    """Open an icechunk repo and return (session, ds)."""
    storage = icechunk.s3_storage(
        bucket=bucket, prefix=prefix, region=region, anonymous=anonymous
    )
    config = icechunk.RepositoryConfig.default()
    repo_kwargs = {"storage": storage, "config": config}

    if virtual_source:
        config.set_virtual_chunk_container(
            icechunk.VirtualChunkContainer(
                virtual_source, icechunk.s3_store(region=region)
            )
        )
        repo_kwargs["authorize_virtual_chunk_access"] = icechunk.containers_credentials(
            {virtual_source: icechunk.s3_anonymous_credentials()}
        )

    repo = icechunk.Repository.open(**repo_kwargs)
    # Pin to an explicit snapshot when one is given; otherwise follow the branch tip.
    if snapshot_id:
        session = repo.readonly_session(snapshot_id=snapshot_id)
    else:
        session = repo.readonly_session(branch=branch)
    ds = xr.open_zarr(session.store, chunks=None, consolidated=False, zarr_format=3)
    return session, ds

In [4]:
print("Opening GFS...")
gfs_session, gfs_ds = open_icechunk(
    bucket="dynamical-noaa-gfs",
    prefix="noaa-gfs-forecast/v0.2.7.icechunk/",
    region="us-west-2",
)
print(f"  snapshot: {gfs_session.snapshot_id}  dims: {dict(gfs_ds.sizes)}")

print("Opening HRRR...")
hrrr_session, hrrr_ds = open_icechunk(
    bucket="dynamical-noaa-hrrr",
    prefix="noaa-hrrr-forecast-48-hour/v0.1.0.icechunk/",
    region="us-west-2",
)
print(f"  snapshot: {hrrr_session.snapshot_id}  dims: {dict(hrrr_ds.sizes)}")

NLDAS_SNAPSHOT = "YTNGFY4WY9189GEH1FNG"
NLDAS_VIRTUAL_SOURCE = "s3://nasa-waterinsight/NLDAS3/forcing/daily/"
print("Opening NLDAS-3...")
nldas_session, nldas_ds = open_icechunk(
    bucket="nasa-waterinsight",
    prefix="virtual-zarr-store/NLDAS-3-icechunk/",
    region="us-west-2",
    virtual_source=NLDAS_VIRTUAL_SOURCE,
    snapshot_id=NLDAS_SNAPSHOT,
)
print(f"  snapshot: {nldas_session.snapshot_id}  dims: {dict(nldas_ds.sizes)}")

Opening GFS...
  snapshot: SE71EVA7GS4NTKJTZ1S0  dims: {'init_time': 7031, 'lead_time': 209, 'latitude': 721, 'longitude': 1440}
Opening HRRR...
  snapshot: 1K76B6MAM3VRB2X31WZG  dims: {'init_time': 11122, 'lead_time': 49, 'y': 1059, 'x': 1799}
Opening NLDAS-3...
  snapshot: YTNGFY4WY9189GEH1FNG  dims: {'time': 8399, 'lat': 6500, 'lon': 11700}


In [5]:
gfs_snap = gfs_session.snapshot_id
hrrr_snap = hrrr_session.snapshot_id

gfs_item_dict = build_stac_item_from_icechunk(
    gfs_ds,
    item_id=f"noaa-gfs-forecast-{gfs_snap.lower()}",
    icechunk_href="s3://dynamical-noaa-gfs/noaa-gfs-forecast/v0.2.7.icechunk/",
    snapshot_id=gfs_snap,
    storage_schemes={"aws-s3-dynamical-noaa-gfs": {
        "type": "aws-s3", "bucket": "dynamical-noaa-gfs",
        "region": "us-west-2", "anonymous": True,
    }},
    title="NOAA GFS Forecast (dynamical.org)",
    providers=[pystac.Provider(name="dynamical.org", roles=["producer","processor","host"], url="https://dynamical.org")],
    virtual=False,
    temporal_dimension="init_time", x_dimension="longitude", y_dimension="latitude",
)

hrrr_crs = hrrr_ds.rio.crs
hrrr_item_dict = build_stac_item_from_icechunk(
    hrrr_ds,
    item_id=f"noaa-hrrr-forecast-48-hour-{hrrr_snap.lower()}",
    icechunk_href="s3://dynamical-noaa-hrrr/noaa-hrrr-forecast-48-hour/v0.1.0.icechunk/",
    snapshot_id=hrrr_snap,
    storage_schemes={"aws-s3-dynamical-noaa-hrrr": {
        "type": "aws-s3", "bucket": "dynamical-noaa-hrrr",
        "region": "us-west-2", "anonymous": True,
    }},
    title="NOAA HRRR 48-Hour Forecast (dynamical.org)",
    providers=[pystac.Provider(name="dynamical.org", roles=["producer","processor","host"], url="https://dynamical.org")],
    virtual=False,
    temporal_dimension="init_time", x_dimension="x", y_dimension="y",
    reference_system=hrrr_crs.to_epsg() or hrrr_crs.to_wkt(),
)

nldas_item_dict = build_stac_item_from_icechunk(
    nldas_ds,
    item_id=f"nldas-3-virtual-zarr-{NLDAS_SNAPSHOT.lower()}",
    icechunk_href="s3://nasa-waterinsight/virtual-zarr-store/NLDAS-3-icechunk/",
    snapshot_id=NLDAS_SNAPSHOT,
    storage_schemes={"aws-s3-nasa-waterinsight": {
        "type": "aws-s3", "bucket": "nasa-waterinsight",
        "region": "us-west-2", "anonymous": True,
    }},
    title="NLDAS-3 Virtual Zarr Store",
    providers=[pystac.Provider(name="NLDAS", roles=["producer","processor","licensor"], url="https://ldas.gsfc.nasa.gov/nldas")],
    virtual=True,
    virtual_hrefs=[NLDAS_VIRTUAL_SOURCE],
    temporal_dimension="time", x_dimension="lon", y_dimension="lat",
)

print("Items built:")
for d in [gfs_item_dict, hrrr_item_dict, nldas_item_dict]:
    print(f"  {d['id']}  bbox={d['bbox']}")

Items built:
  noaa-gfs-forecast-se71eva7gs4ntkjtz1s0  bbox=[-180.0, -90.125, 180.0, 90.0]
  noaa-hrrr-forecast-48-hour-1k76b6mam3vrb2x31wzg  bbox=[-134.12142793280145, 21.122192719272277, -60.891244531606546, 52.62870335266728]
  nldas-3-virtual-zarr-ytngfy4wy9189geh1fng  bbox=[-168.9949951171875, 7.005000114440918, -52.00499725341797, 71.9949951171875]
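
The GFS bbox printed above has a southern edge of -90.125, which lies outside the valid latitude range of -90 to 90 and may confuse some map clients. Whether `build_stac_item_from_icechunk` is meant to clamp this is not stated here, so the following is only a sketch of an optional post-processing step; `bbox_is_valid` and `clamp_bbox` are hypothetical helper names.

```python
def bbox_is_valid(bbox):
    """Return True if [west, south, east, north] lies within WGS84 lon/lat bounds."""
    west, south, east, north = bbox
    return (
        -180.0 <= west <= 180.0
        and -180.0 <= east <= 180.0
        and -90.0 <= south <= north <= 90.0
    )


def clamp_bbox(bbox):
    """Clamp each edge of [west, south, east, north] into the valid lon/lat range."""
    west, south, east, north = bbox
    return [
        max(-180.0, min(180.0, west)),
        max(-90.0, min(90.0, south)),
        max(-180.0, min(180.0, east)),
        max(-90.0, min(90.0, north)),
    ]


# The GFS bbox copied from the output above.
gfs_bbox = [-180.0, -90.125, 180.0, 90.0]
print(bbox_is_valid(gfs_bbox))
print(clamp_bbox(gfs_bbox))
```

If clamping is wanted, it could be applied to `d["bbox"]` on each item dict before the catalog is assembled; the item `geometry` would need the same treatment to stay consistent.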


## 4. Assemble the pystac Catalog

We use `SELF_CONTAINED` so all links are relative — the catalog can be
moved between buckets/prefixes without breaking.

In [6]:
catalog = pystac.Catalog(
    id="weather-forecast-icechunk",
    description=(
        "Public weather forecast datasets stored as Icechunk repositories on AWS S3. "
        "All items can be opened directly with xarray via xpystac."
    ),
    catalog_type=pystac.CatalogType.SELF_CONTAINED,
)

for item_dict in [gfs_item_dict, hrrr_item_dict, nldas_item_dict]:
    # from_dict preserves top-level extra_fields (e.g. storage:schemes)
    catalog.add_item(pystac.Item.from_dict(item_dict))

print(f"Catalog assembled with {len(list(catalog.get_items()))} items.")

Catalog assembled with 3 items.


## 5. Save catalog locally

Write all JSON files to the local staging directory for inspection before upload.

In [7]:
catalog.normalize_hrefs(str(LOCAL_CATALOG_DIR))
catalog.save()

print(f"Catalog saved locally to: {LOCAL_CATALOG_DIR}")
print()
for f in sorted(LOCAL_CATALOG_DIR.rglob("*.json")):
    rel = f.relative_to(LOCAL_CATALOG_DIR)
    size = f.stat().st_size
    print(f"  {rel}  ({size:,} bytes)")

Catalog saved locally to: /tmp/stac-catalog-7ngilfxh

  catalog.json  (900 bytes)
  nldas-3-virtual-zarr-ytngfy4wy9189geh1fng/nldas-3-virtual-zarr-ytngfy4wy9189geh1fng.json  (9,405 bytes)
  noaa-gfs-forecast-se71eva7gs4ntkjtz1s0/noaa-gfs-forecast-se71eva7gs4ntkjtz1s0.json  (19,048 bytes)
  noaa-hrrr-forecast-48-hour-1k76b6mam3vrb2x31wzg/noaa-hrrr-forecast-48-hour-1k76b6mam3vrb2x31wzg.json  (20,312 bytes)
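
Before uploading, the staged files can be checked mechanically as well as by eye. The sketch below parses every JSON file under a directory and reports structural links (`root`, `parent`, `child`, `item`) whose `href` is not relative, which would break portability. `find_absolute_links` is a hypothetical helper, and the demo runs on a throwaway directory with made-up content rather than on the real catalog.

```python
import json
import tempfile
from pathlib import Path


def find_absolute_links(staging_dir: Path) -> list[tuple[str, str]]:
    """Return (file, href) pairs for structural links that are not relative."""
    structural = {"root", "parent", "child", "item"}
    problems = []
    for path in sorted(staging_dir.rglob("*.json")):
        doc = json.loads(path.read_text())  # raises if the file is not valid JSON
        for link in doc.get("links", []):
            href = link.get("href", "")
            if link.get("rel") in structural and ("://" in href or href.startswith("/")):
                problems.append((path.relative_to(staging_dir).as_posix(), href))
    return problems


# Demo: a hypothetical two-file catalog in which one link is absolute.
demo_dir = Path(tempfile.mkdtemp(prefix="stac-check-"))
(demo_dir / "catalog.json").write_text(json.dumps({
    "type": "Catalog", "id": "demo",
    "links": [{"rel": "item", "href": "./item-a/item-a.json"}],
}))
(demo_dir / "item-a").mkdir()
(demo_dir / "item-a" / "item-a.json").write_text(json.dumps({
    "type": "Feature", "id": "item-a",
    "links": [{"rel": "root", "href": "https://example.com/catalog.json"}],
}))

print(find_absolute_links(demo_dir))
```

In the notebook, calling `find_absolute_links(LOCAL_CATALOG_DIR)` should return an empty list if the catalog was saved as `SELF_CONTAINED`.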


## 6. Upload to S3-compatible storage

Upload all local JSON files using an AWS profile that holds the endpoint URL
and credentials, so nothing sensitive is hardcoded in the notebook.

The endpoint URL is read from `~/.aws/config` or `~/.aws/credentials`. Example
`~/.aws/config` entry for an S3-compatible store such as Cloudflare R2:

```ini
[profile osc-r2]
endpoint_url = https://...
aws_access_key_id = ...
aws_secret_access_key = ...
```

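The public HTTPS catalog root is derived from the profile's `endpoint_url`.
This only works when the S3 API endpoint also serves anonymous, path-style
reads (e.g. Ceph with public access). Cloudflare R2 serves public reads from
a separate `r2.dev` or custom domain rather than from its
`r2.cloudflarestorage.com` API endpoint, so for R2 set `CATALOG_ROOT_HTTPS`
to that public URL by hand.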
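
The lookup depends on a naming detail: `~/.aws/config` prefixes named profiles with `profile `, while `~/.aws/credentials` uses the bare name. The sketch below shows that lookup on in-memory strings so it can be tried without touching real files. The endpoint `https://s3.example.com/` and the key values are placeholders, and `endpoint_from_text` is a hypothetical helper.

```python
import configparser

# Hypothetical file contents; endpoint and keys are placeholders.
CONFIG_TEXT = """
[profile osc-r2]
endpoint_url = https://s3.example.com/
"""
CREDENTIALS_TEXT = """
[osc-r2]
aws_access_key_id = PLACEHOLDER
aws_secret_access_key = PLACEHOLDER
"""


def endpoint_from_text(profile, config_text, credentials_text):
    """Return endpoint_url for a profile, trying config-style then credentials-style sections."""
    for text, section in [
        (config_text, f"profile {profile}"),  # ~/.aws/config prefixes named profiles
        (credentials_text, profile),          # ~/.aws/credentials uses the bare name
    ]:
        parser = configparser.ConfigParser()
        parser.read_string(text)
        if parser.has_option(section, "endpoint_url"):
            return parser.get(section, "endpoint_url").rstrip("/")
    raise KeyError(f"endpoint_url not found for profile {profile!r}")


endpoint = endpoint_from_text("osc-r2", CONFIG_TEXT, CREDENTIALS_TEXT)
public_root = f"{endpoint}/osc/stac/dynamical"
print(public_root)
```

The trailing slash is stripped before the bucket and prefix are appended, so the derived root never contains a double slash.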

In [8]:
CATALOG_BUCKET = "osc"
CATALOG_PREFIX = "stac/dynamical"
AWS_PROFILE    = "osc-r2"   # named AWS profile that supplies endpoint_url and credentials

# Derive the public HTTPS base URL from the profile's endpoint_url + bucket.
# Checks ~/.aws/config (section "profile NAME") then ~/.aws/credentials (section "NAME").
import configparser
import os as _os

def _get_endpoint(profile):
    for path, section in [
        ("~/.aws/config",      f"profile {profile}"),
        ("~/.aws/credentials", profile),
    ]:
        cfg = configparser.ConfigParser()
        cfg.read(_os.path.expanduser(path))
        if cfg.has_option(section, "endpoint_url"):
            return cfg.get(section, "endpoint_url").rstrip("/")
    raise KeyError(f"endpoint_url not found for profile '{profile}'")

_endpoint = _get_endpoint(AWS_PROFILE)
CATALOG_ROOT_HTTPS = f"{_endpoint}/{CATALOG_BUCKET}/{CATALOG_PREFIX}"
print(f"Public catalog root: {CATALOG_ROOT_HTTPS}")

# endpoint_url and credentials are read from the profile — never hardcoded here
fs = s3fs.S3FileSystem(profile=AWS_PROFILE)

print(f"\nUploading to s3://{CATALOG_BUCKET}/{CATALOG_PREFIX} (profile: {AWS_PROFILE}) ...")
for local_file in sorted(LOCAL_CATALOG_DIR.rglob("*.json")):
    rel = local_file.relative_to(LOCAL_CATALOG_DIR)
    # as_posix() keeps forward slashes in the object key on every platform
    s3_dest = f"{CATALOG_BUCKET}/{CATALOG_PREFIX}/{rel.as_posix()}"
    fs.put(str(local_file), s3_dest)
    print(f"  {rel}  →  s3://{s3_dest}")

print("\nUpload complete.")

Public catalog root: https://9cbdcb4884f86a6779032ae561e474a5.r2.cloudflarestorage.com/osc/stac/dynamical

Uploading to s3://osc/stac/dynamical (profile: osc-r2) ...
  catalog.json  →  s3://osc/stac/dynamical/catalog.json
  nldas-3-virtual-zarr-ytngfy4wy9189geh1fng/nldas-3-virtual-zarr-ytngfy4wy9189geh1fng.json  →  s3://osc/stac/dynamical/nldas-3-virtual-zarr-ytngfy4wy9189geh1fng/nldas-3-virtual-zarr-ytngfy4wy9189geh1fng.json
  noaa-gfs-forecast-se71eva7gs4ntkjtz1s0/noaa-gfs-forecast-se71eva7gs4ntkjtz1s0.json  →  s3://osc/stac/dynamical/noaa-gfs-forecast-se71eva7gs4ntkjtz1s0/noaa-gfs-forecast-se71eva7gs4ntkjtz1s0.json
  noaa-hrrr-forecast-48-hour-1k76b6mam3vrb2x31wzg/noaa-hrrr-forecast-48-hour-1k76b6mam3vrb2x31wzg.json  →  s3://osc/stac/dynamical/noaa-hrrr-forecast-48-hour-1k76b6mam3vrb2x31wzg/noaa-hrrr-forecast-48-hour-1k76b6mam3vrb2x31wzg.json

Upload complete.


## 7. Browse with STAC Browser

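The [Radiant Earth hosted STAC Browser](https://radiantearth.github.io/stac-browser/)
is a fully client-side app: paste any public STAC catalog URL and it renders
the catalog in the browser with no backend. Because the browser fetches the
JSON directly, the catalog must be anonymously readable over HTTPS and the
bucket must allow cross-origin (CORS) requests.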

In [9]:
catalog_url = f"{CATALOG_ROOT_HTTPS}/catalog.json"
browser_url = f"https://radiantearth.github.io/stac-browser/#/external/{catalog_url}"

print("Catalog URL:")
print(f"  {catalog_url}")
print()
print("STAC Browser URL:")
print(f"  {browser_url}")

from IPython.display import display, Markdown
display(Markdown(f"[Open in STAC Browser]({browser_url})"))

Catalog URL:
  https://9cbdcb4884f86a6779032ae561e474a5.r2.cloudflarestorage.com/osc/stac/dynamical/catalog.json

STAC Browser URL:
  https://radiantearth.github.io/stac-browser/#/external/https://9cbdcb4884f86a6779032ae561e474a5.r2.cloudflarestorage.com/osc/stac/dynamical/catalog.json


[Open in STAC Browser](https://radiantearth.github.io/stac-browser/#/external/https://9cbdcb4884f86a6779032ae561e474a5.r2.cloudflarestorage.com/osc/stac/dynamical/catalog.json)

## 8. Verify round-trip from catalog

Reload the catalog from S3 via its public HTTPS URL and confirm each item
can be opened with xpystac.

In [10]:
import xpystac  # noqa: F401 — registers xarray backend

loaded_catalog = pystac.Catalog.from_file(catalog_url)

for item in loaded_catalog.get_items():
    asset_key = next(k for k in item.assets if "@" in k)
    asset = item.assets[asset_key]
    ds = xr.open_dataset(asset)
    print(f"{item.id}")
    print(f"  dims:  {dict(ds.sizes)}")
    print(f"  bbox:  {item.bbox}")
    print()

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
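
The `JSONDecodeError` above means that some response was not valid JSON; the output does not show which request was responsible. A minimal diagnostic sketch, using only the standard library, is to fetch the URL anonymously and report the status, content type, and the start of the body. `describe_body` and `diagnose_url` are hypothetical helpers, and the bodies in the offline examples are made up.

```python
import json
import urllib.error
import urllib.request


def describe_body(body: bytes, content_type: str = "") -> str:
    """Summarize whether a response body parses as JSON, with a short preview if not."""
    if not body.strip():
        return "empty body (nothing to parse)"
    try:
        json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and undecodable bytes
        preview = body[:60].decode("utf-8", errors="replace")
        return f"not JSON (content-type={content_type!r}): {preview!r}"
    return "valid JSON"


def diagnose_url(url: str) -> str:
    """Fetch url anonymously and report the HTTP status plus a body summary."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return f"HTTP {resp.status}: " + describe_body(
                resp.read(), resp.headers.get("Content-Type", "")
            )
    except urllib.error.HTTPError as err:
        return f"HTTP {err.code}: " + describe_body(
            err.read(), err.headers.get("Content-Type", "")
        )


# Offline examples with made-up bodies (no network access needed).
print(describe_body(b'{"type": "Catalog"}'))
print(describe_body(b"<?xml version='1.0'?><Error>...</Error>", "application/xml"))
print(describe_body(b""))
```

Running `print(diagnose_url(catalog_url))` in the notebook would show whether the catalog URL itself is anonymously readable and returns JSON, which narrows down whether the failure comes from loading the catalog or from opening an asset.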