# Build a Static STAC Catalog and Browse with STAC Browser

This notebook builds STAC items for three public Icechunk stores, assembles
them into a static STAC catalog, saves it **locally**, uploads it to S3 with
s3fs (the fsspec filesystem for S3), and produces a
[STAC Browser](https://github.com/radiantearth/stac-browser) URL for immediate
browsing, with no server required.

## Architecture

```
s3://BUCKET/PREFIX/
  catalog.json
  noaa-gfs-forecast-{snapshot}/noaa-gfs-forecast-{snapshot}.json
  noaa-hrrr-forecast-48-hour-{snapshot}/noaa-hrrr-forecast-48-hour-{snapshot}.json
  nldas-3-virtual-zarr-{snapshot}/nldas-3-virtual-zarr-{snapshot}.json
```

The catalog uses `SELF_CONTAINED` relative links, so it can be moved between
buckets or prefixes without breaking. STAC Browser is a client-side Vue.js app
hosted by Radiant Earth: it fetches and renders the catalog directly from S3
in the user's browser, with no backend.
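To see why relative links survive a move, here is a minimal standard-library sketch: the same relative hrefs resolve correctly against two different hosting locations. The item id and bucket names are hypothetical placeholders, not the real catalog values.

```python
from urllib.parse import urljoin

# Relative links as they might appear in a SELF_CONTAINED catalog
# (the item id is a hypothetical placeholder):
#   catalog.json   -> item:    ./<item-id>/<item-id>.json
#   <item-id>.json -> catalog: ../catalog.json
ITEM_HREF = "./noaa-gfs-forecast-abc123/noaa-gfs-forecast-abc123.json"
PARENT_HREF = "../catalog.json"


def resolve(base_url: str, href: str) -> str:
    """Resolve a link href against the URL of the JSON document containing it."""
    return urljoin(base_url, href)


# The same files hosted at two hypothetical locations.
for catalog_url in (
    "https://bucket-a.s3.amazonaws.com/stac/v1/catalog.json",
    "https://bucket-b.s3.amazonaws.com/archive/stac/catalog.json",
):
    item_url = resolve(catalog_url, ITEM_HREF)
    back_to_catalog = resolve(item_url, PARENT_HREF)
    print(item_url)
    print("  parent ->", back_to_catalog, back_to_catalog == catalog_url)
```

Nothing in the JSON names the bucket, so copying the files to a new prefix is enough to relocate the catalog.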

## Install dependencies
```bash
pip install icechunk xarray zarr pystac xstac rioxarray s3fs cloudify xpystac
```

In [None]:
import json
import tempfile
import warnings
from pathlib import Path

import icechunk
import pystac
import rioxarray  # noqa: F401  (registers the .rio accessor used for the HRRR CRS)
import s3fs
import xarray as xr

from cloudify.stac import build_stac_item_from_icechunk

warnings.filterwarnings(
    "ignore",
    message="Numcodecs codecs are not in the Zarr version 3 specification*",
    category=UserWarning,
)

## 1. Configure the output catalog location

`CATALOG_BUCKET` and `CATALOG_PREFIX` are set in **Step 6**, just before the upload,
so you can inspect the local JSON first. Before running Step 6, make AWS credentials
with write access to that bucket available in the environment (environment variables,
`~/.aws/credentials`, or an instance profile).

## 2. Configure local staging directory

The catalog is first written to a local temp directory, then uploaded to S3.
This lets you inspect the JSON before it goes live and avoids partial writes
on the remote.
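Staging locally also gives you a place to sanity-check the files before they go live. The sketch below is a hypothetical helper (`check_staged_catalog` is not part of pystac or cloudify); it only checks that each file parses and carries the basic STAC fields `type`, `id` and `links`, which is far weaker than full schema validation.

```python
import json
from pathlib import Path


def check_staged_catalog(root: Path) -> list[str]:
    """Return problems found in staged STAC JSON files (empty list if none).

    Only checks that each file parses as JSON and has "type", "id" and
    "links"; this is not a STAC schema validation.
    """
    problems = []
    if not (root / "catalog.json").exists():
        problems.append("missing catalog.json at staging root")
    for path in sorted(root.rglob("*.json")):
        rel = path.relative_to(root).as_posix()
        try:
            doc = json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"{rel}: invalid JSON ({exc.msg})")
            continue
        for key in ("type", "id", "links"):
            if key not in doc:
                problems.append(f"{rel}: missing '{key}'")
    return problems
```

After Step 5 you could run `assert not check_staged_catalog(LOCAL_CATALOG_DIR)` before uploading. For real schema validation, pystac's `Catalog.validate_all()` (which needs the optional `jsonschema` dependency) is the more thorough option.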

In [None]:
LOCAL_CATALOG_DIR = Path(tempfile.mkdtemp(prefix="stac-catalog-"))
print(f"Local staging directory: {LOCAL_CATALOG_DIR}")

## 3. Build STAC items for each dataset

In [None]:
def open_icechunk(bucket, prefix, region, anonymous=True,
                  virtual_source=None, snapshot_id=None, branch="main"):
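    """Open an Icechunk repo read-only and return (session, ds).

    Opens `snapshot_id` if given, otherwise the tip of `branch`. For virtual
    stores, `virtual_source` is the URL prefix of the referenced chunks, which
    are read with anonymous S3 credentials.
    """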
    storage = icechunk.s3_storage(
        bucket=bucket, prefix=prefix, region=region, anonymous=anonymous
    )
    config = icechunk.RepositoryConfig.default()
    repo_kwargs = {"storage": storage, "config": config}

    if virtual_source:
        config.set_virtual_chunk_container(
            icechunk.VirtualChunkContainer(
                virtual_source, icechunk.s3_store(region=region)
            )
        )
        repo_kwargs["authorize_virtual_chunk_access"] = icechunk.containers_credentials(
            {virtual_source: icechunk.s3_anonymous_credentials()}
        )

    repo = icechunk.Repository.open(**repo_kwargs)
    # Pin to a specific snapshot when one is given; otherwise read the branch tip.
    if snapshot_id:
        session = repo.readonly_session(snapshot_id=snapshot_id)
    else:
        session = repo.readonly_session(branch=branch)
    ds = xr.open_zarr(session.store, chunks=None, consolidated=False, zarr_format=3)
    return session, ds

In [None]:
print("Opening GFS...")
gfs_session, gfs_ds = open_icechunk(
    bucket="dynamical-noaa-gfs",
    prefix="noaa-gfs-forecast/v0.2.7.icechunk/",
    region="us-west-2",
)
print(f"  snapshot: {gfs_session.snapshot_id}  dims: {dict(gfs_ds.sizes)}")

print("Opening HRRR...")
hrrr_session, hrrr_ds = open_icechunk(
    bucket="dynamical-noaa-hrrr",
    prefix="noaa-hrrr-forecast-48-hour/v0.1.0.icechunk/",
    region="us-west-2",
)
print(f"  snapshot: {hrrr_session.snapshot_id}  dims: {dict(hrrr_ds.sizes)}")

NLDAS_SNAPSHOT = "YTNGFY4WY9189GEH1FNG"  # pinned snapshot; GFS and HRRR read the latest "main"
NLDAS_VIRTUAL_SOURCE = "s3://nasa-waterinsight/NLDAS3/forcing/daily/"
print("Opening NLDAS-3...")
nldas_session, nldas_ds = open_icechunk(
    bucket="nasa-waterinsight",
    prefix="virtual-zarr-store/NLDAS-3-icechunk/",
    region="us-west-2",
    virtual_source=NLDAS_VIRTUAL_SOURCE,
    snapshot_id=NLDAS_SNAPSHOT,
)
print(f"  snapshot: {nldas_session.snapshot_id}  dims: {dict(nldas_ds.sizes)}")

In [None]:
gfs_snap = gfs_session.snapshot_id
hrrr_snap = hrrr_session.snapshot_id

gfs_item_dict = build_stac_item_from_icechunk(
    gfs_ds,
    item_id=f"noaa-gfs-forecast-{gfs_snap.lower()}",
    icechunk_href="s3://dynamical-noaa-gfs/noaa-gfs-forecast/v0.2.7.icechunk/",
    snapshot_id=gfs_snap,
    storage_schemes={"aws-s3-dynamical-noaa-gfs": {
        "type": "aws-s3", "bucket": "dynamical-noaa-gfs",
        "region": "us-west-2", "anonymous": True,
    }},
    title="NOAA GFS Forecast (dynamical.org)",
    providers=[pystac.Provider(name="dynamical.org", roles=["producer","processor","host"], url="https://dynamical.org")],
    virtual=False,
    temporal_dimension="init_time", x_dimension="longitude", y_dimension="latitude",
)

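hrrr_crs = hrrr_ds.rio.crs  # parsed by rioxarray from the grid-mapping metadata
if hrrr_crs is None:
    raise ValueError("rioxarray found no CRS on the HRRR dataset")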
hrrr_item_dict = build_stac_item_from_icechunk(
    hrrr_ds,
    item_id=f"noaa-hrrr-forecast-48-hour-{hrrr_snap.lower()}",
    icechunk_href="s3://dynamical-noaa-hrrr/noaa-hrrr-forecast-48-hour/v0.1.0.icechunk/",
    snapshot_id=hrrr_snap,
    storage_schemes={"aws-s3-dynamical-noaa-hrrr": {
        "type": "aws-s3", "bucket": "dynamical-noaa-hrrr",
        "region": "us-west-2", "anonymous": True,
    }},
    title="NOAA HRRR 48-Hour Forecast (dynamical.org)",
    providers=[pystac.Provider(name="dynamical.org", roles=["producer","processor","host"], url="https://dynamical.org")],
    virtual=False,
    temporal_dimension="init_time", x_dimension="x", y_dimension="y",
    reference_system=hrrr_crs.to_epsg() or hrrr_crs.to_wkt(),
)

nldas_item_dict = build_stac_item_from_icechunk(
    nldas_ds,
    item_id=f"nldas-3-virtual-zarr-{NLDAS_SNAPSHOT.lower()}",
    icechunk_href="s3://nasa-waterinsight/virtual-zarr-store/NLDAS-3-icechunk/",
    snapshot_id=NLDAS_SNAPSHOT,
    storage_schemes={"aws-s3-nasa-waterinsight": {
        "type": "aws-s3", "bucket": "nasa-waterinsight",
        "region": "us-west-2", "anonymous": True,
    }},
    title="NLDAS-3 Virtual Zarr Store",
    providers=[pystac.Provider(name="NLDAS", roles=["producer","processor","licensor"], url="https://ldas.gsfc.nasa.gov/nldas")],
    virtual=True,
    virtual_hrefs=[NLDAS_VIRTUAL_SOURCE],
    temporal_dimension="time", x_dimension="lon", y_dimension="lat",
)

print("Items built:")
for d in [gfs_item_dict, hrrr_item_dict, nldas_item_dict]:
    print(f"  {d['id']}  bbox={d['bbox']}")

## 4. Assemble the pystac Catalog

We use `SELF_CONTAINED`, so the catalog has no self links and every other link
is relative; the catalog can be moved between buckets or prefixes without breaking.
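pystac's default layout places each item in a folder named after its id, as in the Architecture diagram. The sketch below reproduces that path rule with `posixpath` to show which relative hrefs end up in the JSON; `item_layout` is a hypothetical illustration of the convention, not pystac's own code, and the ids in the usage are placeholders.

```python
import posixpath


def item_layout(root: str, item_ids: list[str]) -> dict[str, dict[str, str]]:
    """Sketch of the "<root>/<id>/<id>.json" layout and the relative hrefs it implies."""
    catalog_dir = posixpath.normpath(root)
    catalog_path = posixpath.join(catalog_dir, "catalog.json")
    layout = {}
    for item_id in item_ids:
        item_path = posixpath.join(catalog_dir, item_id, f"{item_id}.json")
        layout[item_id] = {
            "path": item_path,
            # href stored in catalog.json for the rel="item" link
            "href_from_catalog": "./" + posixpath.relpath(item_path, catalog_dir),
            # href stored in the item for its rel="parent" / rel="root" links
            "href_to_catalog": posixpath.relpath(
                catalog_path, posixpath.dirname(item_path)
            ),
        }
    return layout


for item_id, info in item_layout("/tmp/stac", ["gfs-abc", "hrrr-def"]).items():
    print(item_id, info)
```

Only paths relative to `catalog.json` appear in the links, which is what makes the saved tree portable.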

In [None]:
catalog = pystac.Catalog(
    id="weather-forecast-icechunk",
    description=(
        "Public weather forecast (GFS, HRRR) and land-surface forcing (NLDAS-3) "
        "datasets stored as Icechunk repositories on AWS S3. "
        "All items can be opened directly with xarray via xpystac."
    ),
    catalog_type=pystac.CatalogType.SELF_CONTAINED,
)

for item_dict in [gfs_item_dict, hrrr_item_dict, nldas_item_dict]:
    # from_dict preserves top-level extra_fields (e.g. storage:schemes)
    catalog.add_item(pystac.Item.from_dict(item_dict))

print(f"Catalog assembled with {len(list(catalog.get_items()))} items.")

## 5. Save catalog locally

Write all JSON files to the local staging directory for inspection before upload.
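Beyond listing file sizes, it can help to look inside each staged item before upload. This is a minimal sketch using only the standard library; `summarize_item` is a hypothetical helper that reads an item JSON and reports its id, bbox, asset hrefs and link relations.

```python
import json
from pathlib import Path


def summarize_item(path: Path) -> dict:
    """Summarize a staged STAC item: id, bbox, asset hrefs and link rels."""
    item = json.loads(path.read_text())
    return {
        "id": item["id"],
        "bbox": item.get("bbox"),
        "assets": {key: asset.get("href") for key, asset in item.get("assets", {}).items()},
        "link_rels": sorted({link["rel"] for link in item.get("links", [])}),
    }
```

Because items sit one folder below `catalog.json`, `for f in LOCAL_CATALOG_DIR.glob("*/*.json"): print(summarize_item(f))` would visit only the item files once Step 5 has run.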

In [None]:
catalog.normalize_hrefs(str(LOCAL_CATALOG_DIR))
catalog.save()

print(f"Catalog saved locally to: {LOCAL_CATALOG_DIR}")
print()
for f in sorted(LOCAL_CATALOG_DIR.rglob("*.json")):
    rel = f.relative_to(LOCAL_CATALOG_DIR)
    size = f.stat().st_size
    print(f"  {rel}  ({size:,} bytes)")

## 6. Upload to S3

Upload all local JSON files to S3, preserving the directory structure.
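Set `CATALOG_BUCKET` and `CATALOG_PREFIX` to an S3 location you can write to
and that is publicly readable. STAC Browser fetches the files directly from the
user's browser, so the bucket must allow anonymous `s3:GetObject` on the catalog
prefix and must return CORS headers for the STAC Browser origin.

AWS credentials are read from the environment (environment variables,
`~/.aws/credentials`, or an instance profile). Public access is configured on
the bucket, not by this upload. New S3 buckets have ACLs disabled and Block
Public Access enabled by default, so grant read access with a bucket policy
(after relaxing Block Public Access for that bucket) and add a CORS rule
separately.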
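One way to express that bucket configuration is sketched below as plain Python dicts. The function names are hypothetical, the bucket and prefix are the placeholders from the next cell, and the default origin assumes you use the Radiant Earth hosted STAC Browser; widen `origins` (or use `"*"`) for other deployments.

```python
import json


def public_read_policy(bucket: str, prefix: str) -> dict:
    """Bucket policy allowing anonymous s3:GetObject on every object under prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadStacCatalog",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
            }
        ],
    }


def stac_browser_cors_rules(origins=("https://radiantearth.github.io",)) -> dict:
    """CORS configuration letting pages served from `origins` read objects."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": list(origins),
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


policy = public_read_policy("my-public-bucket", "stac/weather-forecasts")
print(json.dumps(policy, indent=2))
# To apply (needs boto3 and permission to manage the bucket):
#   s3 = boto3.client("s3")
#   s3.put_bucket_policy(Bucket="my-public-bucket", Policy=json.dumps(policy))
#   s3.put_bucket_cors(Bucket="my-public-bucket",
#                      CORSConfiguration=stac_browser_cors_rules())
```

Scoping the policy to the catalog prefix keeps the rest of the bucket private.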

In [None]:
CATALOG_BUCKET = "my-public-bucket"       # ← change me
CATALOG_PREFIX = "stac/weather-forecasts"  # ← change me

CATALOG_ROOT_HTTPS = f"https://{CATALOG_BUCKET}.s3.amazonaws.com/{CATALOG_PREFIX}"

fs = s3fs.S3FileSystem()   # reads credentials from env / ~/.aws/credentials

print(f"Uploading to s3://{CATALOG_BUCKET}/{CATALOG_PREFIX} ...")
for local_file in sorted(LOCAL_CATALOG_DIR.rglob("*.json")):
    rel = local_file.relative_to(LOCAL_CATALOG_DIR)
    s3_dest = f"{CATALOG_BUCKET}/{CATALOG_PREFIX}/{rel}"
    fs.put(str(local_file), s3_dest)
    print(f"  {rel}  →  s3://{s3_dest}")

print("\nUpload complete.")

## 7. Browse with STAC Browser

The [Radiant Earth hosted STAC Browser](https://radiantearth.github.io/stac-browser/)
is a fully client-side app — paste any public STAC catalog URL and it
renders it in the browser with no backend required.
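Because the browser makes a cross-origin request, STAC Browser can only read the catalog if S3 answers with an `Access-Control-Allow-Origin` header matching its origin. The sketch below imitates that request with an `Origin` header; both function names are hypothetical, and the origin assumes the Radiant Earth hosted instance.

```python
from urllib.request import Request, urlopen

STAC_BROWSER_ORIGIN = "https://radiantearth.github.io"


def cors_allows(headers, origin: str = STAC_BROWSER_ORIGIN) -> bool:
    """True if the response headers let a page at `origin` read the body."""
    return headers.get("Access-Control-Allow-Origin") in ("*", origin)


def check_browser_access(url: str, origin: str = STAC_BROWSER_ORIGIN) -> bool:
    """Fetch `url` with an Origin header, as a browser would, and check CORS."""
    request = Request(url, headers={"Origin": origin})
    with urlopen(request, timeout=10) as response:
        return cors_allows(response.headers, origin)
```

Calling `check_browser_access(catalog_url)` after the next cell should return `True`. An `HTTPError` with status 403 usually means the public-read policy is missing, while `False` points at the CORS rule.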

In [None]:
catalog_url = f"{CATALOG_ROOT_HTTPS}/catalog.json"
browser_url = f"https://radiantearth.github.io/stac-browser/#/external/{catalog_url}"

print("Catalog URL:")
print(f"  {catalog_url}")
print()
print("STAC Browser URL:")
print(f"  {browser_url}")

from IPython.display import display, Markdown
display(Markdown(f"[Open in STAC Browser]({browser_url})"))

## 8. Verify round-trip from catalog

Reload the catalog from S3 via its public HTTPS URL and confirm each item
can be opened with xpystac.
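Before opening any datasets, a cheaper check is to confirm that what S3 serves matches what was staged. This is a standard-library sketch; `diff_remote` and `fetch_json` are hypothetical helpers, and `fetch` is injectable so the comparison can be tested without network access.

```python
import json
from pathlib import Path
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download and parse a JSON document over HTTPS."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def diff_remote(local_root: Path, remote_root: str, fetch=fetch_json) -> list[str]:
    """Return relative paths whose remote JSON differs from the staged copy."""
    mismatched = []
    for path in sorted(local_root.rglob("*.json")):
        rel = path.relative_to(local_root).as_posix()
        remote = fetch(f"{remote_root.rstrip('/')}/{rel}")
        if remote != json.loads(path.read_text()):
            mismatched.append(rel)
    return mismatched
```

With the notebook's variables, `diff_remote(LOCAL_CATALOG_DIR, CATALOG_ROOT_HTTPS)` should return an empty list once the upload has finished.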

In [None]:
import xpystac  # noqa: F401 — registers xarray backend

loaded_catalog = pystac.Catalog.from_file(catalog_url)

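for item in loaded_catalog.get_items():
    # This notebook assumes the Icechunk asset key contains "@"
    asset_key = next((k for k in item.assets if "@" in k), None)
    if asset_key is None:
        print(f"{item.id}: no Icechunk asset found, skipping")
        continue
    asset = item.assets[asset_key]
    ds = xr.open_dataset(asset, engine="stac")  # backend registered by xpystac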
    print(f"{item.id}")
    print(f"  dims:  {dict(ds.sizes)}")
    print(f"  bbox:  {item.bbox}")
    print()