# GeoZarr Quickstart: S3 Access & RGB Visualization

**Load cloud-optimized GeoZarr from S3, inspect embedded metadata, create RGB composites.**

**Setup:** `uv sync --extra notebooks` + AWS credentials  
**Dataset:** Sentinel-2 L2A tile (10m bands), pyramids 0-4, STAC-embedded

## 1. Setup

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Configure display settings
xr.set_options(display_style="text", display_width=100)

## 2. S3 Credentials (auto-detect from K8s secret or env vars)

In [None]:
import base64
import os
import shutil
import subprocess
from pathlib import Path

# Find kubectl: prefer PATH, then fall back to common install locations
kubectl_fallbacks = [
    "/opt/homebrew/bin/kubectl",  # Homebrew Apple Silicon
    "/usr/local/bin/kubectl",  # Homebrew Intel / Linux
    "/usr/bin/kubectl",  # System (Linux)
    str(Path.home() / ".local/bin/kubectl"),  # User install (Linux)
]
kubectl = shutil.which("kubectl") or next(
    (k for k in kubectl_fallbacks if Path(k).exists()), "kubectl"
)

# Auto-detect kubeconfig (relative to notebook location or environment)
kubeconfig_paths = [
    Path.cwd().parent / ".work/kubeconfig",  # Relative: ../.work/kubeconfig from notebooks/
    os.getenv("KUBECONFIG"),  # Environment variable (may be unset)
    Path.home() / ".kube/config",  # Default kubectl location
]
# Skip unset entries: Path("") resolves to the current directory, which always exists
kubeconfig = next((str(p) for p in kubeconfig_paths if p and Path(p).is_file()), None)

# Try to fetch S3 credentials from Kubernetes if missing
if (not os.getenv("AWS_SECRET_ACCESS_KEY") or not os.getenv("AWS_ACCESS_KEY_ID")) and kubeconfig:
    try:
        for key in ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]:
            result = subprocess.run(
                [
                    kubectl,
                    "get",
                    "secret",
                    "geozarr-s3-credentials",
                    "-n",
                    "devseed",
                    "-o",
                    f"jsonpath={{.data.{key}}}",
                ],
                env={**os.environ, "KUBECONFIG": kubeconfig},  # keep PATH/HOME, override KUBECONFIG
                capture_output=True,
                text=True,
                timeout=5,
            )
            if result.returncode == 0 and result.stdout:
                os.environ[key] = base64.b64decode(result.stdout).decode()
    except Exception:
        # Best effort: kubectl missing, timeout, or bad secret; the check below reports it
        pass

# Set default endpoint (matches pipeline configuration in augment_stac_item.py)
if not os.getenv("AWS_ENDPOINT_URL"):
    os.environ["AWS_ENDPOINT_URL"] = "https://s3.de.io.cloud.ovh.net"

# Verify credentials
required_env_vars = {
    "AWS_ACCESS_KEY_ID": os.getenv("AWS_ACCESS_KEY_ID"),
    "AWS_SECRET_ACCESS_KEY": os.getenv("AWS_SECRET_ACCESS_KEY"),
    "AWS_ENDPOINT_URL": os.getenv("AWS_ENDPOINT_URL"),
}

missing = [k for k, v in required_env_vars.items() if not v and k != "AWS_ENDPOINT_URL"]

if missing:
    print("\n❌ Missing AWS credentials!")
    print(f"   Required: {', '.join(missing)}\n")
    print("📖 Manual setup:")
    print("   export AWS_ACCESS_KEY_ID='your-key'")
    print("   export AWS_SECRET_ACCESS_KEY='your-secret'")
    print("\n📖 Or get from Kubernetes:")
    if kubeconfig:
        print(f"   export KUBECONFIG='{kubeconfig}'")
        print("   kubectl get secret geozarr-s3-credentials -n devseed -o json")
    print("\n   See notebooks/README.md for detailed setup instructions")
else:
    print(f"✅ AWS configured: {required_env_vars['AWS_ENDPOINT_URL']}")
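
The credential logic above reduces to two steps: base64-decode each value from the secret's `data` map, then report which variables are still unset. A minimal sketch of those two steps, assuming a payload shaped like the `data` field of `kubectl get secret ... -o json`; the helper names and the sample value are hypothetical and not part of the pipeline.

```python
import base64

REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def decode_secret_data(data):
    """Base64-decode every value of a Kubernetes secret's `data` mapping."""
    return {key: base64.b64decode(value).decode() for key, value in data.items()}


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in required if not env.get(key)]


# Hypothetical secret payload; real values come from the cluster.
sample_data = {"AWS_ACCESS_KEY_ID": base64.b64encode(b"example-key").decode()}

decoded = decode_secret_data(sample_data)
still_missing = missing_keys(decoded)
```

Keeping the decode and the check as separate pure functions makes them easy to exercise without a cluster.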

## 3. Load RGB bands (level 4 pyramid: 686×686px, ~3.6MB/band)

In [None]:
import dask.array as da
import s3fs
import zarr

# S3 dataset path
s3_base = "s3://esa-zarr-sentinel-explorer-fra/tests-output/sentinel-2-l2a/S2B_MSIL2A_20250921T100029_N0511_R122_T33TUG_20250921T135752.zarr"

# Open S3 filesystem
fs = s3fs.S3FileSystem(anon=False, client_kwargs={"endpoint_url": os.getenv("AWS_ENDPOINT_URL")})

# Load RGB bands at level 4 (overview) with Dask
bands = {}
level = 4
for band_name, band_id in [("Blue", "b02"), ("Green", "b03"), ("Red", "b04")]:
    band_path = f"{s3_base[5:]}/measurements/reflectance/r10m/{level}/{band_id}"
    store = s3fs.S3Map(root=band_path, s3=fs)
    z_array = zarr.open(store, mode="r")
    bands[band_name] = xr.DataArray(da.from_zarr(store), dims=["y", "x"], attrs=dict(z_array.attrs))

# Combine into dataset
ds = xr.Dataset(bands)
print(f"✓ Loaded {len(ds.data_vars)} bands from the r10m group (overview level {level})")
print(f"  Shape: {ds['Red'].shape}, Size: ~{ds['Red'].nbytes / 1024**2:.1f}MB per band")
ds
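
The figures in the heading (686×686 px, ~3.6 MB per band) can be checked by hand. A small sketch, assuming each pyramid level halves the side length of the 10980-pixel 10 m grid (rounding down) and that bands are held as 8-byte floats; both are assumptions that happen to match the numbers quoted above, not something read from the store.

```python
def overview_side(base_side, level):
    """Pixels per side after `level` successive 2x downsamplings (floor)."""
    return base_side // 2**level


def array_mib(side, bytes_per_pixel=8):
    """In-memory size in MiB of a square array (8 bytes assumes float64)."""
    return side * side * bytes_per_pixel / 1024**2


# 10980 px is the width of a Sentinel-2 tile on the 10 m grid
side = overview_side(10980, 4)
size_mib = array_mib(side)
```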

## 4. STAC metadata (embedded in .zattrs)

In [None]:
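# Access embedded STAC metadata.
# `ds` was assembled from individual band arrays, so `ds.attrs` is empty; fall back to
# the root group's attributes (assumption: the item is stored there under "stac_item").
root_attrs = dict(zarr.open_group(s3fs.S3Map(root=s3_base[5:], s3=fs), mode="r").attrs)
stac_item = ds.attrs.get("stac_item") or root_attrs.get("stac_item", {})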

print(f"📍 Item: {stac_item.get('id')}")
print(f"📦 Collection: {stac_item.get('collection')}")
print(f"🗓️  Datetime: {stac_item.get('properties', {}).get('datetime')}")
print(f"🌍 Bbox: {stac_item.get('bbox')}")
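
The lookups above rely on chained `.get()` calls so that a missing key prints `None` instead of raising. A minimal sketch of the same pattern as a reusable helper; `stac_summary` and the example item are hypothetical, and the values are placeholders rather than metadata read from the dataset.

```python
def stac_summary(item):
    """Collect the four fields printed above, tolerating missing keys."""
    props = item.get("properties") or {}
    return {
        "id": item.get("id"),
        "collection": item.get("collection"),
        "datetime": props.get("datetime"),
        "bbox": item.get("bbox"),
    }


# Hypothetical item with placeholder values
example_item = {
    "type": "Feature",
    "id": "example-item",
    "collection": "example-collection",
    "properties": {"datetime": "2025-01-01T00:00:00Z"},
    "bbox": [0.0, 0.0, 1.0, 1.0],
}

summary = stac_summary(example_item)
empty = stac_summary({})  # every field is None when nothing is embedded
```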

## 5. Geospatial properties (CRS, resolution, extent)

In [None]:
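# Geospatial properties
crs = ds.attrs.get("crs", "Unknown")
if "x" in ds.coords and "y" in ds.coords and len(ds.x) > 1 and len(ds.y) > 1:
    x_res = float((ds.x[1] - ds.x[0]).values)
    y_res = float((ds.y[1] - ds.y[0]).values)
else:
    # `ds` was built from bare arrays and has no x/y coordinates, so `ds.x` would be a
    # 0..N-1 pixel index. Assumption: each pyramid level halves the 10 m base resolution.
    x_res = y_res = 10.0 * 2**level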

print(f"🗺️  CRS: {crs}")
print(f"📏 Dimensions: {len(ds.y)}×{len(ds.x)} pixels")
print(f"🔍 Resolution: {abs(x_res):.1f}m × {abs(y_res):.1f}m")
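
When coordinate arrays are available, resolution and extent follow directly from them. A minimal sketch, assuming the coordinate values mark pixel centres (so the outer edge lies half a pixel beyond the first and last value); the helper names and the 160 m sample grid are hypothetical.

```python
def axis_resolution(coords):
    """Signed spacing between consecutive pixel centres (0.0 if fewer than two)."""
    return float(coords[1] - coords[0]) if len(coords) > 1 else 0.0


def axis_extent(coords):
    """Outer (min, max) edges of an axis whose values are pixel centres."""
    half = abs(axis_resolution(coords)) / 2
    return min(coords) - half, max(coords) + half


# Hypothetical 160 m grid in projected metres: x ascending, y descending (north-up)
x = [500080.0 + 160.0 * i for i in range(4)]
y = [4999920.0 - 160.0 * j for j in range(3)]

x_res, y_res = axis_resolution(x), axis_resolution(y)
x_min, x_max = axis_extent(x)
y_min, y_max = axis_extent(y)
```

The y spacing is negative for north-up rasters, which is why the notebook prints `abs(...)` of both values.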

## 6. RGB composite (2-98% percentile stretch)

In [None]:
# Extract RGB bands
red = ds["Red"].values
green = ds["Green"].values
blue = ds["Blue"].values


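# Normalize with percentile stretch
def normalize(band):
    band = band.astype("float64")
    valid = band[np.isfinite(band)]  # compute percentiles before NaNs are replaced
    if valid.size == 0:
        return np.zeros_like(band)
    p2, p98 = np.percentile(valid, [2, 98])
    if p98 <= p2:  # constant band: avoid division by zero
        return np.zeros_like(band)
    return np.clip(np.nan_to_num((band - p2) / (p98 - p2), nan=0.0), 0, 1)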


rgb = np.dstack([normalize(red), normalize(green), normalize(blue)])

# Plot
fig, ax = plt.subplots(figsize=(12, 10))
ax.imshow(rgb, aspect="auto")
ax.set_title("Sentinel-2 True Color RGB Composite (r10m, overview level 4)", fontsize=14, fontweight="bold")
ax.set_xlabel("X (pixels)", fontsize=11)
ax.set_ylabel("Y (pixels)", fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
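
To see what the 2-98% stretch does, it helps to run it on a tiny array where the percentiles are easy to read off. A minimal sketch on synthetic data; `percentile_stretch` is a hypothetical stand-alone variant of the notebook's `normalize`, not a library function.

```python
import numpy as np


def percentile_stretch(band, low=2, high=98):
    """Map the [low, high] percentile range of the finite pixels onto [0, 1]."""
    band = np.asarray(band, dtype="float64")
    valid = band[np.isfinite(band)]
    if valid.size == 0:
        return np.zeros_like(band)
    lo, hi = np.percentile(valid, [low, high])
    if hi <= lo:  # constant input: nothing to stretch
        return np.zeros_like(band)
    scaled = (band - lo) / (hi - lo)
    # NaN pixels become 0 (black); values outside the range are clipped
    return np.clip(np.nan_to_num(scaled, nan=0.0), 0.0, 1.0)


ramp = np.arange(101, dtype="float64")  # 0, 1, ..., 100
stretched = percentile_stretch(ramp)

# A NaN is ignored when computing percentiles and rendered as 0
gappy = percentile_stretch([np.nan, 0.0, 50.0, 100.0])
```

On the ramp, the lowest and highest 2% of values saturate to 0 and 1, and the midpoint lands at 0.5.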

## 7. Single band visualization + stats

In [None]:
# Plot single band
band_name = list(ds.data_vars)[0]
band_values = ds[band_name].values  # compute once; each .values call re-triggers the Dask read

fig, ax = plt.subplots(figsize=(12, 10))
im = ax.imshow(band_values, cmap="viridis", aspect="auto")
ax.set_title(f"Band: {band_name}", fontsize=14, fontweight="bold")
ax.set_xlabel("X (pixels)", fontsize=11)
ax.set_ylabel("Y (pixels)", fontsize=11)
plt.colorbar(im, ax=ax, label="Reflectance")
plt.tight_layout()
plt.show()

# Statistics
print(
    f"📊 {band_name}: min={np.nanmin(band_values):.3f}, max={np.nanmax(band_values):.3f}, mean={np.nanmean(band_values):.3f}"
)

## Summary

**Demonstrated:** Cloud-optimized S3 access, STAC metadata extraction, RGB visualization

**GeoZarr benefits:**
- Chunked storage → partial reads (no full download)
- Embedded STAC → metadata + data in one place
- Multi-resolution pyramids → fast tile serving
- TiTiler-ready → web map integration
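
The partial-read benefit comes from chunking: a reader only fetches the chunks that overlap the requested window. A minimal sketch of that bookkeeping, assuming a regular chunk grid; the function name and the 1024×1024 chunk size are hypothetical and not taken from this dataset.

```python
import math


def chunks_for_window(row_start, row_stop, col_start, col_stop, chunk_rows, chunk_cols):
    """Return (row, col) indices of the chunks overlapping a half-open pixel window."""
    row_chunks = range(row_start // chunk_rows, (row_stop - 1) // chunk_rows + 1)
    col_chunks = range(col_start // chunk_cols, (col_stop - 1) // chunk_cols + 1)
    return [(r, c) for r in row_chunks for c in col_chunks]


# Hypothetical layout: a 10980x10980 array stored in 1024x1024 chunks
needed = chunks_for_window(1000, 1100, 0, 512, 1024, 1024)
total_chunks = math.ceil(10980 / 1024) ** 2
```

In this example the window touches 2 of 121 chunks, so most of the array is never transferred.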

**Next:** `02_pyramid_performance.ipynb` (benchmarks), `03_multi_resolution.ipynb` (pyramid levels)

**Resources:** [STAC API](https://api.explorer.eopf.copernicus.eu/stac) | [Raster Viewer](https://api.explorer.eopf.copernicus.eu/raster/viewer) | [GitHub](https://github.com/EOPF-Explorer/data-pipeline)