feat: Pydantic-zarr V3 model for Sentinel-1 GRD γ0T RTC stores#138

Merged
emmanuelmathot merged 9 commits into s1-rtc from feat/s1-rtc-pydantic-model
Mar 26, 2026

Conversation

@emmanuelmathot
Contributor

@emmanuelmathot emmanuelmathot commented Mar 23, 2026

What

Pydantic-zarr V3 schema for Sentinel-1 GRD γ0T RTC time-series stores on the MGRS grid. This follows the same pattern as the existing S2 model.

Why

We're building an S1 GRD RTC pipeline that ingests S1Tiling GeoTIFFs into GeoZarr V3 stores. This model defines the expected store structure so we can validate outputs at write time and in CI.

Store hierarchy

s1-grd-rtc-{tile}.zarr/
├── ascending/                  # orbit-direction group
│   ├── r10m/                   # native resolution (10 980 × 10 980)
│   │   ├── vv, vh              # (time, Y, X) float32 — sharded
│   │   ├── border_mask         # (time, Y, X) uint8
│   │   ├── time, absolute_orbit, relative_orbit, platform
│   ├── r20m … r720m            # overview levels (vv, vh, border_mask only)
│   └── conditions/
│       └── gamma_area_{orbit}  # (Y, X) float32
└── descending/                 # same structure

Key design choices (looking for feedback on)

  1. pyz.v3 GroupSpec/ArraySpec — mirrors the pyz.v2 pattern used by S2 but wraps pydantic_zarr.v3. TypedDict members with closed=True, total=False enforce allowed keys while keeping optional groups flexible.

  2. zarr_conventions UUIDs — orbit-direction groups carry multiscales, geo_proj, and spatial convention UUIDs via zarr_cm. Validated with a model_validator.

  3. Sharding codecs — native arrays use sharding_indexed with inner chunks of 366 (≈ 1 year of acquisitions along the time axis). The model validates that the codec structure is present but doesn't constrain inner chunk sizes.

  4. Conditions as a sub-group — conditions/gamma_area_{orbit} arrays are (Y, X) float32 at native resolution, one per relative orbit. These are static geometric metadata, not time-varying.

  5. Overview levels — r20m through r720m carry only the data arrays (vv, vh, border_mask), not coordinate arrays.
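
Design choice 3 describes a check that the sharding codec is structurally present while leaving inner chunk sizes unconstrained. A minimal sketch of that kind of check over Zarr V3 array metadata could look like the following. The helper name `has_sharding_codec` and the Y/X chunk sizes are illustrative assumptions, not code or values from s1_rtc.py; the `sharding_indexed` codec name and its `chunk_shape` and `codecs` configuration keys come from the Zarr V3 sharding codec specification.

```python
from typing import Any


def has_sharding_codec(codecs: list[dict[str, Any]]) -> bool:
    """Return True if a structurally complete sharding_indexed codec is present.

    Only the structure is checked; inner chunk sizes are left unconstrained.
    """
    for codec in codecs:
        if codec.get("name") != "sharding_indexed":
            continue
        config = codec.get("configuration", {})
        # Both keys are required by the Zarr V3 sharding codec specification.
        if "chunk_shape" in config and "codecs" in config:
            return True
    return False


# Hypothetical codec list for a (time, Y, X) array.
# 366 along time is from the PR description; 512 x 512 is an assumption.
example_codecs = [
    {
        "name": "sharding_indexed",
        "configuration": {
            "chunk_shape": [366, 512, 512],
            "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
            "index_codecs": [
                {"name": "bytes", "configuration": {"endian": "little"}},
                {"name": "crc32c"},
            ],
        },
    }
]

print(has_sharding_codec(example_codecs))
print(has_sharding_codec([{"name": "bytes"}]))
```

A check of this shape accepts any inner chunk size, which matches the stated intent of validating presence rather than layout.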

Files

| File | Description |
| --- | --- |
| src/eopf_geozarr/data_api/s1_rtc.py | Model classes (316 lines) |
| tests/_test_data/s1_rtc_examples/s1-grd-rtc-31TCH.json | Realistic JSON fixture |
| tests/test_data_api/test_s1_rtc.py | 11 tests (round-trip, structure, 5 negative cases) |
| tests/conftest.py | Added fixture wiring |

How to review

Start with s1_rtc.py — the docstring at the top shows the full hierarchy. The JSON fixture is machine-generated but representative of real S1Tiling output over MGRS tile 31TCH.

All existing S2 tests still pass.

@emmanuelmathot emmanuelmathot requested a review from d-v-b March 23, 2026 14:50
@emmanuelmathot emmanuelmathot marked this pull request as ready for review March 24, 2026 07:03
@emmanuelmathot emmanuelmathot force-pushed the feat/s1-rtc-pydantic-model branch from bc5fc0c to 8cc7df1 Compare March 24, 2026 16:11
@emmanuelmathot
Contributor Author

Addressed reviewer feedback:

  • Replaced dict[str, Any] with typed zcm.Multiscales: S1RtcOrbitGroupAttrs.multiscales now uses from eopf_geozarr.data_api.geozarr.multiscales.zcm import Multiscales instead of inline models or an untyped dict
  • Removed inline MultiscalesTransform, MultiscalesScaleLevel, Multiscales classes (no longer needed)
  • Updated test assertions to use Pydantic model attribute access (attrs.multiscales.layout instead of attrs.multiscales['layout'])
  • Rebased on main (now requires Python 3.12, incorporating PR #141, "Set the minimum supported python version to 3.12")
  • Kept extra='allow' as discussed

All 11 S1 RTC tests pass. Pre-commit clean (only pre-existing mypy issues in analysis/ scripts).

@emmanuelmathot emmanuelmathot requested review from d-v-b and removed request for d-v-b March 24, 2026 16:21
Comment thread src/eopf_geozarr/data_api/s1_rtc.py Outdated
model_config = {"extra": "allow", "populate_by_name": True, "serialize_by_alias": True}

@model_validator(mode="after")
def validate_shape(self) -> Self:
Contributor

if you annotate spatial_shape as tuple[int, int] instead of list[int], then pydantic will check the length automatically

Comment thread src/eopf_geozarr/data_api/s1_rtc.py Outdated
return self

@model_validator(mode="after")
def validate_transform(self) -> Self:
Contributor

same as above -- you can use the type annotation tuple[float, float, float, float, float, float] to declare that it must have length 6
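
The two suggestions above rely on Pydantic checking the length of fixed-size tuple annotations. A small sketch of that behaviour, assuming Pydantic v2: the class `SpatialAttrs` is a hypothetical stand-in for the attrs model under review, and the transform values are made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class SpatialAttrs(BaseModel):
    """Hypothetical stand-in for the attrs model discussed in the review."""

    spatial_shape: tuple[int, int]
    spatial_transform: tuple[float, float, float, float, float, float]


# Lists of the right length are coerced to tuples in Pydantic's default mode.
ok = SpatialAttrs(
    spatial_shape=[10980, 10980],
    spatial_transform=[10.0, 0.0, 300000.0, 0.0, -10.0, 4800000.0],
)
print(ok.spatial_shape)

# A wrong length fails validation with no custom validator involved.
try:
    SpatialAttrs(spatial_shape=[10980], spatial_transform=[10.0] * 6)
except ValidationError as exc:
    print("rejected:", exc.error_count(), "error(s)")
```

With these annotations the length constraint lives in the type itself, so the `validate_shape` and `validate_transform` methods would have nothing left to check.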

Comment thread src/eopf_geozarr/data_api/s1_rtc.py Outdated
"""One orbit direction (ascending or descending) with multiscale layout."""

@model_validator(mode="after")
def validate_r10m_present(self) -> Self:
Contributor

if r10m is required, but all the other ones are not, you can use the NotRequired type annotation for the non-required fields of S1RtcOrbitGroupMembers, and remove the total=False from the TypedDict definition

Contributor

if you did that, this validator would not be necessary
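
A sketch of the NotRequired approach suggested here, assuming Pydantic v2 and typing_extensions. `OrbitGroupMembers` is a hypothetical simplification of S1RtcOrbitGroupMembers: the real members are group specs, while plain dicts stand in for them below.

```python
from typing import Any

from pydantic import TypeAdapter, ValidationError
from typing_extensions import NotRequired, TypedDict


class OrbitGroupMembers(TypedDict):
    """Hypothetical simplification of S1RtcOrbitGroupMembers."""

    # Required: the native-resolution group must always be present.
    r10m: dict[str, Any]
    # Optional: an overview level and the conditions sub-group.
    r20m: NotRequired[dict[str, Any]]
    conditions: NotRequired[dict[str, Any]]


adapter = TypeAdapter(OrbitGroupMembers)

# Only the required key is present: accepted.
print(adapter.validate_python({"r10m": {}}))

# The required key is missing: rejected by the type alone.
try:
    adapter.validate_python({"r20m": {}})
except ValidationError as exc:
    print("rejected:", exc.error_count(), "error(s)")
```

Because the TypedDict is total by default, only the keys wrapped in NotRequired may be omitted, which is what makes a separate `validate_r10m_present` check redundant.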

def conditions(self) -> S1RtcConditionsGroup | None:
    return self.members.get("conditions")

def get_resolution(self, level: ResolutionLevel) -> GroupSpec[Any, Any] | None:
Contributor

you probably also want a method for listing available resolution levels
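
One way such a listing helper could work is sketched below as a standalone function over a members mapping. This is an assumption about a possible shape, not the method that was eventually added to S1RtcOrbitGroup; the `r<N>m` naming pattern is taken from the store hierarchy in the PR description.

```python
import re
from collections.abc import Mapping

# Member names such as "r10m" or "r720m"; the number is the resolution in metres.
_LEVEL_RE = re.compile(r"^r(\d+)m$")


def resolution_levels(members: Mapping[str, object]) -> list[str]:
    """Return the resolution-level member names, finest resolution first."""
    found: list[tuple[int, str]] = []
    for name in members:
        match = _LEVEL_RE.match(name)
        if match:
            found.append((int(match.group(1)), name))
    # Sort numerically so that "r20m" comes before "r720m".
    return [name for _, name in sorted(found)]


members = {"r720m": {}, "conditions": {}, "r10m": {}, "r20m": {}}
print(resolution_levels(members))  # ['r10m', 'r20m', 'r720m']
```

Sorting on the parsed integer rather than the raw string avoids lexicographic surprises once levels with different digit counts are present.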

Comment thread src/eopf_geozarr/data_api/s1_rtc.py Outdated
Comment on lines +88 to +94
@model_validator(mode="after")
def validate_spatial_dimensions(self) -> Self:
    if self.spatial_dimensions != ["y", "x"]:
        raise ValueError(
            f"spatial:dimensions must be ['y', 'x'], got {self.spatial_dimensions}"
        )
    return self
Contributor

@d-v-b d-v-b Mar 26, 2026

you can remove this validator if you make spatial_dimensions: Literal[("y", "x")]

Contributor

@d-v-b d-v-b Mar 26, 2026

nvm, Literal[("a", "b")] simplifies to Literal["a", "b"] which is not what we want here. Instead we want tuple[Literal["y"], Literal["x"]]
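
The distinction drawn in this correction can be checked directly. The sketch below assumes Pydantic v2; the alias `Dims` is an illustrative name, not one from s1_rtc.py.

```python
from typing import Literal, get_args

from pydantic import TypeAdapter, ValidationError

# Subscripting with a parenthesised tuple is the same call as subscripting
# with two arguments, so this means "either 'y' or 'x'", not the pair.
print(get_args(Literal[("y", "x")]))

# A fixed-length tuple of single-value Literals pins both order and length.
Dims = tuple[Literal["y"], Literal["x"]]
adapter = TypeAdapter(Dims)

print(adapter.validate_python(["y", "x"]))

# Swapped order is rejected, as is any other length.
try:
    adapter.validate_python(["x", "y"])
except ValidationError:
    print("rejected")
```

This is why the tuple-of-Literals annotation can replace the `validate_spatial_dimensions` validator while the plain Literal form cannot.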

@emmanuelmathot emmanuelmathot force-pushed the feat/s1-rtc-pydantic-model branch from f466006 to a5bd1e1 Compare March 26, 2026 11:11
@emmanuelmathot emmanuelmathot changed the base branch from main to s1-rtc March 26, 2026 14:35
@emmanuelmathot emmanuelmathot merged commit c675b6d into s1-rtc Mar 26, 2026
4 checks passed
emmanuelmathot added a commit that referenced this pull request Apr 2, 2026
* phase 1: S1 RTC Pydantic models aligned with S2 pattern

- Add src/eopf_geozarr/data_api/s1_rtc.py — Zarr V3 Pydantic models for
  S1 GRD γ0T RTC GeoZarr stores, using pyz.v3 GroupSpec/ArraySpec with
  TypedDict members (same pattern as s2.py uses pyz.v2)
- Models: S1RtcRoot, S1RtcOrbitGroup, S1RtcNativeResolutionDataset,
  S1RtcOverviewResolutionDataset, S1RtcConditionsGroup
- Validation: convention UUIDs, spatial:dimensions, multiscales layout,
  required data arrays (vv/vh/border_mask), gamma_area presence
- Add tests/_test_data/s1_rtc_examples/s1-grd-rtc-31TCH.json — realistic
  fixture with 3 timesteps, 6 overview levels, 3 gamma_area conditions
- Add tests/test_data_api/test_s1_rtc.py — 11 tests: round-trip, structure
  validation, negative cases (missing orbit, r10m, UUIDs, etc.)
- Add conftest fixture s1_rtc_json_example parametrized over all fixtures

* refactor: improve Pydantic model definitions and streamline imports in S1 RTC module

* fix: standardize spatial dimensions to lowercase in S1 RTC models and test cases

* refactor: use zcm.Multiscales typed model per reviewer feedback

- Replace dict[str, Any] multiscales field with zcm.Multiscales import
- Remove inline MultiscalesTransform/ScaleLevel/Multiscales classes
- Update test assertions for Pydantic model attribute access

* fix: configure ruff TC001 for Pydantic runtime-evaluated base classes

- Add [tool.ruff.lint.flake8-type-checking] runtime-evaluated-base-classes
  for pydantic.BaseModel so Pydantic field type imports aren't flagged
- Remove 4 stale noqa comments auto-fixed by ruff

* refactor: replace validators with precise type annotations per review

- spatial_dimensions: tuple[Literal['y'], Literal['x']] (removes validator)
- spatial_bbox: tuple[float, float, float, float] (removes validator)
- spatial_shape: tuple[int, int] (removes validator)
- spatial_transform: tuple[float, ...] x6 (removes validator)
- S1RtcOrbitGroupMembers: r10m required, others NotRequired (removes validator)
- Add resolution_levels() method to S1RtcOrbitGroup
- Apply same tuple types to S1RtcConditionsAttrs

* ci: temporarily disable pre-commit job in ci.yml

Comment out pre-commit job in CI workflow

* ci: enable pre-commit checks in CI workflow

* ci: transitive actions/cache@v4 dependency

---------

Co-authored-by: Loïc Houpert <10154151+lhoupert@users.noreply.github.com>