# The proj: Convention

The **proj:** convention encodes Coordinate Reference System (CRS) information for geospatial data stored in Zarr format. It answers the question *"what coordinate system is this data in?"* using one of three standard encodings.

This notebook covers:

1. The three CRS encoding methods: EPSG code, WKT2, and PROJJSON
2. Convention registration via `zarr_conventions`
3. Validation
4. Converting between CRS formats with pyproj

See also:
- [Inheritance](inheritance.ipynb) — how CRS metadata propagates from groups to arrays
- [Composition](composition.ipynb) — combining proj: with spatial: and multiscales
- [COG to Zarr](cog-to-zarr.ipynb) — end-to-end conversion of a Cloud-Optimized GeoTIFF

## Example Dataset

Throughout this notebook we use a [Sentinel-2 L2A scene](https://registry.opendata.aws/sentinel-2-l2a-cogs/) from the `sentinel-cogs` bucket on AWS as our running example (following the [async-geotiff demo](https://github.com/developmentseed/async-geotiff#example)).

The scene is tile **12/S/UF** acquired on 2022-06-09. Its key geospatial properties are:

| Property | Value |
|---|---|
| CRS | EPSG:32612 (WGS 84 / UTM zone 12N) |
| Pixel size | 10 m (TCI band) |
| Origin | (300000.0, 4100040.0) |
| Dimensions | 10980 rows x 10980 columns |
| Bounding box | 300000.0, 3990240.0, 409800.0, 4100040.0 |

Sentinel-2 is a good example because it has bands at three native resolutions (10 m, 20 m, 60 m) that all share the same CRS — a natural fit for group-level inheritance and multiscale composition.
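The bounding box in the table can be derived from the origin, pixel size, and dimensions. The sketch below assumes the origin is the upper-left corner of a north-up grid with square pixels; `bbox_from_origin` is a hypothetical helper written for this illustration, not part of any library used here.

```python
# Values copied from the property table above (units: metres in EPSG:32612)
origin_x, origin_y = 300000.0, 4100040.0
pixel_size = 10.0
rows, cols = 10980, 10980


def bbox_from_origin(origin_x, origin_y, pixel_size, rows, cols):
    """Return (xmin, ymin, xmax, ymax), assuming an upper-left origin and a north-up grid."""
    xmin = origin_x
    ymax = origin_y
    xmax = origin_x + cols * pixel_size  # x grows to the east
    ymin = origin_y - rows * pixel_size  # y shrinks to the south
    return (xmin, ymin, xmax, ymax)


bbox = bbox_from_origin(origin_x, origin_y, pixel_size, rows, cols)
print(bbox)  # (300000.0, 3990240.0, 409800.0, 4100040.0)
```

The result matches the bounding box row of the table, which is a quick consistency check on the scene's metadata.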

In [1]:
import json

from pyproj import CRS

# The CRS for our Sentinel-2 scene
crs = CRS.from_epsg(32612)
print(crs)

EPSG:32612


## Overview

The proj: convention defines three properties, all using the `proj:` namespace prefix:

| Property | Type | Description |
|---|---|---|
| `proj:code` | string | Authority:code identifier (e.g., `EPSG:4326`) |
| `proj:wkt2` | string | WKT2 (ISO 19162) CRS representation |
| `proj:projjson` | object | PROJJSON CRS representation |

**At least one** of these must be provided. The convention can be applied to both Zarr groups and arrays.

## Method 1: EPSG Code

The simplest way to specify a CRS is with an authority:code identifier. The `proj:code` string follows the pattern `AUTHORITY:CODE` and must match `^[A-Z]+:[0-9]+$`.

Known projection authorities include:

| Authority | Description |
|---|---|
| EPSG | European Petroleum Survey Group |
| IAU | International Astronomical Union (e.g., `IAU_2015:30100`) |
| OGC | Open Geospatial Consortium |
| ESRI | Esri spatial references |

This is the preferred method when a well-known code exists for the CRS, because it's compact and unambiguous.
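As a rough illustration of the `AUTHORITY:CODE` pattern quoted above, the following sketch tests a few candidate strings with the standard `re` module. `matches_proj_code` is a hypothetical helper, not part of geozarr_toolkit. Note that the IAU example from the table does not match the pattern as written, because its authority part contains an underscore and digits.

```python
import re

# Pattern as quoted above for proj:code
PROJ_CODE_PATTERN = re.compile(r"^[A-Z]+:[0-9]+$")


def matches_proj_code(value):
    """Return True if value is a string with the AUTHORITY:CODE shape quoted above."""
    return isinstance(value, str) and PROJ_CODE_PATTERN.fullmatch(value) is not None


candidates = ["EPSG:32612", "EPSG:4326", "epsg:32612", "EPSG:32612N", "IAU_2015:30100"]
for candidate in candidates:
    print(f"{candidate!r}: {matches_proj_code(candidate)}")
```

Only the first two candidates pass: the pattern requires an upper-case alphabetic authority and a purely numeric code.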

In [2]:
from geozarr_toolkit import create_proj_attrs

# Our Sentinel-2 scene uses UTM zone 12N
attrs = create_proj_attrs(code="EPSG:32612")
print(json.dumps(attrs, indent=2))

{
  "proj:code": "EPSG:32612"
}


## Method 2: WKT2

WKT2 ([ISO 19162](http://docs.opengeospatial.org/is/12-063r5/12-063r5.html)) provides a full textual CRS representation. It is useful when:

- No valid authority code exists for the CRS
- You need the full CRS definition to be self-contained in the metadata
- The CRS uses custom parameters not captured by a registered code

Here we use pyproj to obtain the WKT2 string for the same Sentinel-2 CRS.

In [3]:
# The same UTM zone 12N CRS, expressed as WKT2
wkt2_string = crs.to_wkt()

attrs = create_proj_attrs(wkt2=wkt2_string)
print(json.dumps(attrs, indent=2))

{
  "proj:wkt2": "PROJCRS[\"WGS 84 / UTM zone 12N\",BASEGEOGCRS[\"WGS 84\",ENSEMBLE[\"World Geodetic System 1984 ensemble\",MEMBER[\"World Geodetic System 1984 (Transit)\"],MEMBER[\"World Geodetic System 1984 (G730)\"],MEMBER[\"World Geodetic System 1984 (G873)\"],MEMBER[\"World Geodetic System 1984 (G1150)\"],MEMBER[\"World Geodetic System 1984 (G1674)\"],MEMBER[\"World Geodetic System 1984 (G1762)\"],MEMBER[\"World Geodetic System 1984 (G2139)\"],MEMBER[\"World Geodetic System 1984 (G2296)\"],ELLIPSOID[\"WGS 84\",6378137,298.257223563,LENGTHUNIT[\"metre\",1]],ENSEMBLEACCURACY[2.0]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"degree\",0.0174532925199433]],ID[\"EPSG\",4326]],CONVERSION[\"UTM zone 12N\",METHOD[\"Transverse Mercator\",ID[\"EPSG\",9807]],PARAMETER[\"Latitude of natural origin\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8801]],PARAMETER[\"Longitude of natural origin\",-111,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8802]],PARAMETER[\"Scale factor at natural o

## Method 3: PROJJSON

[PROJJSON](https://proj.org/specifications/projjson.html) is a JSON encoding of CRS definitions following the PROJ specification. Since it's a native JSON object, it integrates naturally with Zarr's JSON-based metadata and can be validated against the [PROJJSON schema](https://proj.org/schemas/v0.7/projjson.schema.json).
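To make the difference from WKT2 concrete, the sketch below serializes both forms with the standard `json` module. The two values are shortened fragments chosen for illustration, not complete CRS definitions.

```python
import json

# Shortened fragments for illustration only; neither is a complete CRS definition.
wkt2_fragment = 'PROJCRS["WGS 84 / UTM zone 12N"]'
projjson_fragment = {"type": "ProjectedCRS", "name": "WGS 84 / UTM zone 12N"}

as_wkt2 = json.dumps({"proj:wkt2": wkt2_fragment})
as_projjson = json.dumps({"proj:projjson": projjson_fragment})

# The WKT2 text is stored as one opaque string with escaped quotes
print(as_wkt2)
# The PROJJSON object stays a nested structure
print(as_projjson)

# Fields of the PROJJSON form can be read without a CRS parser
name = json.loads(as_projjson)["proj:projjson"]["name"]
print(name)
```

A reader that only needs the CRS name or type can pull it from the PROJJSON form with ordinary JSON access, whereas the WKT2 form has to be parsed first.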

In [4]:
# The same UTM zone 12N CRS, expressed as PROJJSON
projjson_obj = crs.to_json_dict()

attrs = create_proj_attrs(projjson=projjson_obj)
print(json.dumps(attrs, indent=2))

{
  "proj:projjson": {
    "$schema": "https://proj.org/schemas/v0.7/projjson.schema.json",
    "type": "ProjectedCRS",
    "name": "WGS 84 / UTM zone 12N",
    "base_crs": {
      "name": "WGS 84",
      "datum_ensemble": {
        "name": "World Geodetic System 1984 ensemble",
        "members": [
          {
            "name": "World Geodetic System 1984 (Transit)",
            "id": {
              "authority": "EPSG",
              "code": 1166
            }
          },
          {
            "name": "World Geodetic System 1984 (G730)",
            "id": {
              "authority": "EPSG",
              "code": 1152
            }
          },
          {
            "name": "World Geodetic System 1984 (G873)",
            "id": {
              "authority": "EPSG",
              "code": 1153
            }
          },
          {
            "name": "World Geodetic System 1984 (G1150)",
            "id": {
              "authority": "EPSG",
              "code": 1154
          

All three methods describe the same CRS — the choice depends on your use case:

| Method | When to use |
|---|---|
| `proj:code` | A well-known authority code exists (most common) |
| `proj:wkt2` | Self-contained text representation needed, or no authority code exists |
| `proj:projjson` | JSON-native representation preferred, or detailed CRS structure needed |

## Convention Registration

Every Zarr convention must be registered in the `zarr_conventions` array in the node's attributes. This array identifies which conventions are in use and provides links to their schemas and specifications.

A convention entry must include at least one of `uuid`, `schema_url`, or `spec_url` to be identifiable.
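The identifiability rule can be sketched in a few lines of plain Python. `is_identifiable` is a hypothetical helper for illustration; it treats a missing or empty field as absent, which is an assumption rather than something the convention states.

```python
IDENTIFYING_KEYS = ("uuid", "schema_url", "spec_url")


def is_identifiable(entry):
    """Return True if the entry has at least one non-empty identifying field."""
    return any(entry.get(key) for key in IDENTIFYING_KEYS)


# An entry with only a UUID is identifiable
print(is_identifiable({"uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f"}))
# An entry with only a name and description is not
print(is_identifiable({"name": "proj:", "description": "CRS information"}))
```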

In [5]:
from geozarr_toolkit import ProjConventionMetadata, create_zarr_conventions

conventions = create_zarr_conventions(ProjConventionMetadata())
print(json.dumps(conventions, indent=2))

[
  {
    "uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f",
    "schema_url": "https://raw.githubusercontent.com/zarr-experimental/geo-proj/refs/tags/v1/schema.json",
    "spec_url": "https://github.com/zarr-experimental/geo-proj/blob/v1/README.md",
    "name": "proj:",
    "description": "Coordinate reference system information for geospatial data"
  }
]


The convention entry contains:

- **uuid** (`f17cb550-...`): Permanent identifier for the proj: convention
- **schema_url**: Link to the JSON Schema used for machine validation
- **spec_url**: Link to the human-readable specification
- **name**: The namespace prefix (`proj:`)
- **description**: Brief summary of the convention's purpose

## Putting It Together

Here's what the complete Zarr V3 metadata looks like for a Sentinel-2 group using the proj: convention. This is the structure that would appear in the group's `zarr.json` file.

In [6]:
# Complete zarr.json metadata for the Sentinel-2 TCI group
full_attrs = create_proj_attrs(code="EPSG:32612")
full_attrs["zarr_conventions"] = create_zarr_conventions(ProjConventionMetadata())

zarr_metadata = {
    "zarr_format": 3,
    "node_type": "group",
    "attributes": full_attrs,
}

print(json.dumps(zarr_metadata, indent=2))

{
  "zarr_format": 3,
  "node_type": "group",
  "attributes": {
    "proj:code": "EPSG:32612",
    "zarr_conventions": [
      {
        "uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f",
        "schema_url": "https://raw.githubusercontent.com/zarr-experimental/geo-proj/refs/tags/v1/schema.json",
        "spec_url": "https://github.com/zarr-experimental/geo-proj/blob/v1/README.md",
        "name": "proj:",
        "description": "Coordinate reference system information for geospatial data"
      }
    ]
  }
}
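On the reading side, a consumer could use the `zarr_conventions` array to decide whether to look for `proj:` attributes at all. The sketch below works on a shortened copy of the metadata above using only the standard library. `find_proj_crs` is a hypothetical helper, and both the fallback to matching on `name` and the key lookup order are assumptions made for this example.

```python
import json

PROJ_UUID = "f17cb550-5864-4468-aeb7-f3180cfb622f"  # UUID from the entry shown above
CRS_KEYS = ("proj:code", "proj:wkt2", "proj:projjson")  # lookup order is an assumption


def find_proj_crs(metadata):
    """Return (key, value) for the CRS attribute, or None if proj: is not declared."""
    attrs = metadata.get("attributes", {})
    declared = any(
        entry.get("uuid") == PROJ_UUID or entry.get("name") == "proj:"
        for entry in attrs.get("zarr_conventions", [])
    )
    if not declared:
        return None
    for key in CRS_KEYS:
        if key in attrs:
            return key, attrs[key]
    return None


# Shortened copy of the zarr.json content printed above
zarr_json = json.loads("""
{
  "zarr_format": 3,
  "node_type": "group",
  "attributes": {
    "proj:code": "EPSG:32612",
    "zarr_conventions": [
      {"uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f", "name": "proj:"}
    ]
  }
}
""")

result = find_proj_crs(zarr_json)
print(result)  # ('proj:code', 'EPSG:32612')
```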


## Validation

The `validate_proj` helper checks that attributes conform to the convention. It returns a `(is_valid, errors)` tuple. The key rule, as the error message in the second example below shows, is that **at least one** of `proj:code`, `proj:wkt2`, or `proj:projjson` must be present.

In [7]:
from geozarr_toolkit import validate_proj

# Valid: our Sentinel-2 scene's CRS
is_valid, errors = validate_proj({"proj:code": "EPSG:32612"})
print(f"Valid: {is_valid}, Errors: {errors}")

Valid: True, Errors: []


In [8]:
# Invalid: no CRS encoding provided
is_valid, errors = validate_proj({})
print(f"Valid: {is_valid}")
for error in errors:
    print(f"  {error}")

Valid: False
  {'type': 'value_error', 'loc': (), 'msg': 'Value error, At least one of proj:code, proj:wkt2, or proj:projjson must be provided', 'input': {}, 'ctx': {'error': ValueError('At least one of proj:code, proj:wkt2, or proj:projjson must be provided')}, 'url': 'https://errors.pydantic.dev/2.12/v/value_error'}
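
For readers without geozarr_toolkit installed, the presence rule reported by the error message above can be approximated in plain Python. `check_crs_presence` is a hypothetical helper that returns the same `(is_valid, errors)` shape; it only looks at which keys are present and performs none of the type or pattern checks a full validator would need.

```python
CRS_KEYS = ("proj:code", "proj:wkt2", "proj:projjson")


def check_crs_presence(attrs):
    """Return (is_valid, errors) based only on which CRS keys are present."""
    present = [key for key in CRS_KEYS if key in attrs]
    errors = []
    if not present:
        errors.append(
            "At least one of proj:code, proj:wkt2, or proj:projjson must be provided"
        )
    return (not errors, errors)


print(check_crs_presence({"proj:code": "EPSG:32612"}))
print(check_crs_presence({}))
```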


## Converting Between CRS Formats with pyproj

In practice, [pyproj](https://pyproj4.github.io/pyproj/) can parse a CRS from any of these representations and emit whichever format the proj: convention calls for.

In [9]:
# All three representations of the Sentinel-2 scene's CRS
print("proj:code")
print(f"  EPSG:{crs.to_epsg()}")
print()
print("proj:wkt2 (truncated)")
print(f"  {crs.to_wkt()[:80]}...")
print()
print("proj:projjson (summary)")
pj = crs.to_json_dict()
print(f"  type: {pj['type']}")
print(f"  name: {pj['name']}")
print(f"  keys: {list(pj.keys())}")

proj:code
  EPSG:32612

proj:wkt2 (truncated)
  PROJCRS["WGS 84 / UTM zone 12N",BASEGEOGCRS["WGS 84",ENSEMBLE["World Geodetic Sy...

proj:projjson (summary)
  type: ProjectedCRS
  name: WGS 84 / UTM zone 12N
  keys: ['$schema', 'type', 'name', 'base_crs', 'conversion', 'coordinate_system', 'scope', 'area', 'bbox', 'id']
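
The cell above starts from an EPSG code. Going the other way, the sketch below rebuilds a `CRS` object from each of the three encodings and recovers the authority code. It assumes pyproj is installed and that the round-tripped definitions still identify as EPSG:32612, which is the expected behaviour for a registered CRS but may not hold for custom definitions.

```python
from pyproj import CRS

crs = CRS.from_epsg(32612)

# Rebuild the CRS from each encoding the proj: convention allows
from_code = CRS.from_user_input("EPSG:32612")           # proj:code
from_wkt = CRS.from_wkt(crs.to_wkt())                   # proj:wkt2
from_projjson = CRS.from_json_dict(crs.to_json_dict())  # proj:projjson

for label, candidate in [
    ("proj:code", from_code),
    ("proj:wkt2", from_wkt),
    ("proj:projjson", from_projjson),
]:
    epsg = candidate.to_epsg()  # None when no EPSG match is found
    code = f"EPSG:{epsg}" if epsg is not None else None
    print(f"{label}: {code}")
```

Because `to_epsg()` can return `None`, code that writes `proj:code` should fall back to `proj:wkt2` or `proj:projjson` when no authority code is found.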


## Summary

The proj: convention provides three methods for encoding CRS information in Zarr:

| Method | When to use |
|---|---|
| `proj:code` | A well-known authority code exists (most common) |
| `proj:wkt2` | Self-contained text representation needed |
| `proj:projjson` | JSON-native representation preferred |

The convention itself is registered in `zarr_conventions`, here with its UUID, schema URL, and spec URL, although an entry needs only one of the three to be identifiable.

Next: [Inheritance](inheritance.ipynb) | [Composition](composition.ipynb) | [COG to Zarr](cog-to-zarr.ipynb)

For the full specification, see the [proj: convention README](https://github.com/zarr-experimental/geo-proj).