# Converting a Cloud-Optimized GeoTIFF to Zarr

This notebook walks through an end-to-end workflow: opening a remote Cloud-Optimized GeoTIFF (COG), extracting its geospatial metadata, writing a multi-resolution Zarr V3 store that carries all three conventions (proj:, spatial:, multiscales), and validating the result.

We use the Sentinel-2 L2A TCI (true-color) band from the [async-geotiff example](https://github.com/developmentseed/async-geotiff#example).

**Prerequisites:** [The proj: Convention](proj-convention.ipynb) | [Composition](composition.ipynb)

## Step 1: Open the remote COG

We use [async-geotiff](https://github.com/developmentseed/async-geotiff) to open the Cloud-Optimized GeoTIFF directly from S3. The GeoTIFF object exposes the geospatial properties we need — `crs`, `transform`, `bounds`, and `shape` — without reading any pixel data.

In [1]:
import json

from geozarr_toolkit import (
    MultiscalesConventionMetadata,
    ProjConventionMetadata,
    SpatialConventionMetadata,
    create_multiscales_layout,
    create_proj_attrs,
    create_spatial_attrs,
    create_zarr_conventions,
)

In [2]:
# Set to True to write to S3, False to use a local store
USE_S3 = False

In [3]:
from async_geotiff import GeoTIFF
from obstore.store import S3Store

store = S3Store("sentinel-cogs", region="us-west-2", skip_signature=True)
path = "sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2B_12SUF_20220609_0_L2A/TCI.tif"

geotiff = await GeoTIFF.open(path, store=store)

print(f"CRS:       {geotiff.crs}")
print(f"Transform: {geotiff.transform}")
print(f"Shape:     {geotiff.shape}")
print(f"Bounds:    {geotiff.bounds}")
print(f"Bands:     {geotiff.count}")
print(f"Dtype:     {geotiff.dtype}")

CRS:       EPSG:32612
Transform: | 10.00, 0.00, 300000.00|
| 0.00,-10.00, 4100040.00|
| 0.00, 0.00, 1.00|
Shape:     (10980, 10980)
Bounds:    (300000.0, 3990240.0, 409800.0, 4100040.0)
Bands:     3
Dtype:     uint8
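
The printed bounds are not independent information: they follow from the transform and the shape. The sketch below recomputes them in plain Python from the values printed above. `bounds_from_transform` is a hypothetical helper written for this illustration, not part of async-geotiff, and it assumes a north-up image with no rotation terms.

```python
def bounds_from_transform(a, c, e, f, height, width):
    """Return (left, bottom, right, top) for a north-up affine transform.

    a, e: pixel width and (negative) pixel height.
    c, f: x and y coordinates of the upper-left corner.
    Assumes the rotation terms of the transform are zero.
    """
    left, top = c, f
    right = c + a * width
    bottom = f + e * height
    return (left, bottom, right, top)


# Coefficients and shape copied from the Step 1 output
bounds = bounds_from_transform(
    a=10.0, c=300000.0, e=-10.0, f=4100040.0, height=10980, width=10980
)
print(bounds)
```

The result equals the `Bounds` tuple reported by the GeoTIFF object, which is a quick sanity check that the transform, shape, and bounding box describe the same footprint.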


## Step 2: Build convention metadata from the COG

The GeoTIFF's properties map directly to convention attributes. The COG also contains internal **overviews** (reduced-resolution copies) which map naturally to the **multiscales** convention — each overview becomes a scale level.

- `geotiff.crs.to_epsg()` → `proj:code`
- `geotiff.transform` (Affine coefficients) → `spatial:transform`
- `geotiff.shape` → `spatial:shape`
- `geotiff.bounds` → `spatial:bbox`
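- `geotiff.overviews` → `multiscales` layout

Note that the code cell in this step passes only `dimensions` and `bbox` to `create_spatial_attrs`, so `spatial:transform` and `spatial:shape` do not appear in the printed attributes.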
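
To make the overview-to-scale-level mapping concrete, here is a rough sketch of the arithmetic in plain Python. It assumes that each overview halves the previous level's pixel count and rounds up, which is a common way to build overviews but is not confirmed by the source, and that the COG has the four overviews reported in this step's output. The base size and resolution are taken from the Step 1 output.

```python
import math

base_size = 10980   # pixels per side, from the Step 1 output
base_res = 10.0     # metres per pixel, from the Step 1 transform
extent = base_size * base_res  # ground distance covered by one side, in metres

levels = []
size = base_size
for level in range(1, 5):        # four overview levels
    size = math.ceil(size / 2)   # assumption: halve and round up
    res = extent / size          # effective pixel size at this level
    levels.append((level, size, res, res / base_res))

for level, size, res, scale in levels:
    print(f"level {level}: {size} px, {res:.1f} m/px, scale {scale:.3f}")
```

Because 10980 is not divisible by 16, the rounded-up pixel counts make the coarser levels slightly finer than a clean power of two. This is why the last overview is reported at 159.8 m/px rather than 160 m/px, and why computing `scale_factor` from the actual resolutions is safer than hard-coding 2, 4, 8, 16.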

In [4]:
# Build proj: and spatial: attributes from the GeoTIFF's properties
t = geotiff.transform

geozarr_attrs = create_proj_attrs(code=f"EPSG:{geotiff.crs.to_epsg()}")
geozarr_attrs.update(
    create_spatial_attrs(
        dimensions=["Y", "X"],
        bbox=list(geotiff.bounds),
    )
)

# Build multiscales layout from the COG's overviews
# The base (full-resolution) image is level 0; each overview is a coarser level.
base_res = t.a  # pixel width of the base level
levels = [
    {"asset": "0", "transform": {"scale": [1.0, 1.0], "translation": [0.0, 0.0]}},
]
for i, overview in enumerate(geotiff.overviews):
    ov_res = overview.transform.a
    scale_factor = ov_res / base_res
    levels.append(
        {
            "asset": str(i + 1),
            "derived_from": "0",
            "transform": {
                "scale": [scale_factor, scale_factor],
                "translation": [0.0, 0.0],
            },
        }
    )

geozarr_attrs.update(create_multiscales_layout(levels))
geozarr_attrs["zarr_conventions"] = create_zarr_conventions(
    MultiscalesConventionMetadata(),
    ProjConventionMetadata(),
    SpatialConventionMetadata(),
)

print(f"Base resolution: {base_res} m")
print(f"Overview levels: {len(geotiff.overviews)}")
for i, overview in enumerate(geotiff.overviews):
    print(
        f"  Overview {i+1}: {overview.width}x{overview.height} px, {overview.transform.a:.1f} m/px"
    )
print()
print(json.dumps(geozarr_attrs, indent=2))

Base resolution: 10.0 m
Overview levels: 4
  Overview 1: 5490x5490 px, 20.0 m/px
  Overview 2: 2745x2745 px, 40.0 m/px
  Overview 3: 1373x1373 px, 80.0 m/px
  Overview 4: 687x687 px, 159.8 m/px

{
  "proj:code": "EPSG:32612",
  "spatial:dimensions": [
    "Y",
    "X"
  ],
  "spatial:bbox": [
    300000.0,
    3990240.0,
    409800.0,
    4100040.0
  ],
  "spatial:transform_type": "affine",
  "spatial:registration": "pixel",
  "multiscales": {
    "layout": [
      {
        "asset": "0",
        "transform": {
          "scale": [
            1.0,
            1.0
          ],
          "translation": [
            0.0,
            0.0
          ]
        }
      },
      {
        "asset": "1",
        "derived_from": "0",
        "transform": {
          "scale": [
            2.0,
            2.0
          ],
          "translation": [
            0.0,
            0.0
          ]
        }
      },
      {
        "asset": "2",
        "derived_from": "0",
        "transform": {
   

## Step 3: Read and write to Zarr V3 with multiscales

We read the full-resolution image and each overview, writing them as separate child arrays in a Zarr V3 store. Set `USE_S3` above to control the output destination:

- **`USE_S3 = True`**: writes to a remote S3 bucket via obstore's `S3Store`
- **`USE_S3 = False`**: writes to a local directory via Zarr's `LocalStore`

In [5]:
import zarr
from zarr.storage import LocalStore, ObjectStore

bucket = "us-west-2.opendata.source.coop"
prefix = "pangeo/geozarr-examples/TCI.zarr"
local_path = "data/TCI.zarr"

if USE_S3:
    output_store = S3Store(bucket, prefix=prefix, region="us-west-2")
    zarr_store = ObjectStore(output_store)
else:
    zarr_store = LocalStore(local_path)

root: zarr.Group = zarr.open_group(zarr_store, mode="w", zarr_format=3)

# Set convention attributes on the group
root.attrs.update(geozarr_attrs)

# Write the full-resolution image as level "0"
base_array = await geotiff.read()
root.create_array("0", data=base_array.data, chunks=(3, 512, 512))
print(f"Level 0 (base): shape={base_array.data.shape}, dtype={base_array.data.dtype}")

# Write each overview as a separate level
for i, overview in enumerate(geotiff.overviews):
    ov_array = await overview.read()
    root.create_array(str(i + 1), data=ov_array.data, chunks=(3, 512, 512))
    print(f"Level {i+1} (overview): shape={ov_array.data.shape}")

location = f"s3://{bucket}/{prefix}" if USE_S3 else local_path
print(f"\nWrote Zarr V3 store to {location}")

Level 0 (base): shape=(3, 10980, 10980), dtype=uint8
Level 1 (overview): shape=(3, 5490, 5490)
Level 2 (overview): shape=(3, 2745, 2745)
Level 3 (overview): shape=(3, 1373, 1373)
Level 4 (overview): shape=(3, 687, 687)

Wrote Zarr V3 store to data/TCI.zarr
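
The `chunks=(3, 512, 512)` setting determines how many chunks each level is split into. The sketch below counts them in plain Python using the shapes printed above. `chunk_grid` is a hypothetical helper for illustration, not a zarr function, and the one-object-per-chunk reading assumes no sharding is configured.

```python
import math

chunks = (3, 512, 512)

# Array shapes copied from the Step 3 output
shapes = {
    "0": (3, 10980, 10980),
    "1": (3, 5490, 5490),
    "2": (3, 2745, 2745),
    "3": (3, 1373, 1373),
    "4": (3, 687, 687),
}


def chunk_grid(shape, chunks):
    """Number of chunks along each axis; edge chunks count as whole chunks."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


total = 0
for name, shape in shapes.items():
    grid = chunk_grid(shape, chunks)
    count = math.prod(grid)
    total += count
    print(f"level {name}: grid={grid}, chunks={count}")

print(f"total chunks: {total}")
```

Keeping all three bands in one chunk means a single read returns a full RGB tile, which suits true-color display. The coarsest levels fit in only a handful of chunks, so a viewer can fetch a whole-scene preview with very few requests.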


## Step 4: Validate the Zarr store

We reopen the store read-only, use `detect_conventions` to list the conventions found in the group attributes, and use `validate_group` to confirm that each one is correctly applied.

In [6]:
from geozarr_toolkit import detect_conventions, validate_group

# Reopen and validate
if USE_S3:
    read_store = S3Store(bucket, prefix=prefix, region="us-west-2", skip_signature=True)
    zarr_store = ObjectStore(read_store)
else:
    zarr_store = LocalStore(local_path)

root = zarr.open_group(zarr_store, mode="r")

detected = detect_conventions(dict(root.attrs))
print(f"Detected conventions: {detected}")

results = validate_group(root)
for conv, errors in results.items():
    status = "PASS" if not errors else "FAIL"
    print(f"  [{status}] {conv}")
    for err in errors:
        print(f"         {err}")

print("\nStore tree:")
root.tree()

Detected conventions: ['spatial', 'proj', 'multiscales']
  [PASS] spatial
  [PASS] proj
  [PASS] multiscales
  [PASS] zarr_conventions

Store tree:
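
Beyond schema validation, it can be useful to check that the multiscales layout and the arrays in the store agree with each other. The sketch below is one way to do that on plain dictionaries. `check_layout` is a hypothetical helper, not part of `geozarr_toolkit` and not a description of what `validate_group` does; the layout structure is taken from the attributes printed in Step 2.

```python
def check_layout(attrs, array_names):
    """Return a list of problems found in a multiscales layout (empty if none)."""
    problems = []
    layout = attrs.get("multiscales", {}).get("layout", [])
    assets = {entry["asset"] for entry in layout}
    for entry in layout:
        # Every asset named in the layout should exist as an array in the group
        if entry["asset"] not in array_names:
            problems.append(f"asset {entry['asset']!r} has no matching array")
        # Every derived_from reference should point at another layout entry
        source = entry.get("derived_from")
        if source is not None and source not in assets:
            problems.append(
                f"asset {entry['asset']!r} derives from unknown {source!r}"
            )
    return problems


attrs = {
    "multiscales": {
        "layout": [
            {"asset": "0"},
            {"asset": "1", "derived_from": "0"},
            {"asset": "2", "derived_from": "0"},
        ]
    }
}

complete = check_layout(attrs, {"0", "1", "2"})
missing = check_layout(attrs, {"0", "1"})
print(complete)
print(missing)
```

Against the store written in Step 3, the same function could presumably be called with `dict(root.attrs)` and the set of array names in the group, though that usage is untested here.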


## Summary

This notebook demonstrated the full workflow from COG to convention-compliant Zarr V3:

1. **Open** a remote COG with async-geotiff (no pixel data read)
2. **Extract** CRS, transform, bounds, and overview structure
3. **Map** these properties to proj:, spatial:, and multiscales convention attributes
4. **Write** the full image and all overview levels to a Zarr V3 store (a local directory by default, or S3 when `USE_S3 = True`)
5. **Validate** that the store conforms to all three conventions

The same pattern applies to any georeferenced raster: the convention attributes are derived from standard geospatial properties (CRS, affine transform, bounds, and shape) that a georeferenced GeoTIFF provides.