
Conversation

0xSwego (Collaborator) commented Nov 26, 2025

Summary by CodeRabbit

  • New Features

    • Added configurable retry with exponential backoff for transient shard load failures.
  • Bug Fixes

    • Improved filtering of coordinate/metadata keys to avoid misclassification.
    • Resize now only triggers for valid primary-array shape changes; metadata lacking shape no longer forces resize.
    • Metadata deletion made idempotent (repeated deletes do not error).
  • Tests

    • Added tests for forecast coordinate handling, idempotent metadata deletion, and non-resize when shape is absent.
  • Chores

    • Version bumped to 3.3.1.


0xSwego requested a review from TheGreatAlgo on November 26, 2025 15:22
coderabbitai bot (Contributor) commented Nov 26, 2025

Walkthrough

Expanded excluded coordinate keys, made metadata-triggered resize conservative and thread-safe (delegating to resize_store), added retry with exponential backoff to full-shard fetch, made metadata deletion idempotent, added tests for forecast/step coords and idempotent delete, and bumped version to 3.3.1.
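As a rough illustration of the retry behavior summarized here, a minimal sketch (the parameter names max_retries and retry_delay and the handled exception types come from the changes below; the body is simplified, not the actual implementation):

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def load_with_backoff(
    load: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    retry_delay: float = 1.0,
) -> T:
    for attempt in range(max_retries):
        try:
            return await load()
        except (ConnectionError, TimeoutError) as e:
            if attempt == max_retries - 1:
                raise RuntimeError(
                    f"failed after {max_retries} attempts: {e}"
                ) from e
            # Delays grow as retry_delay * 1, 2, 4, ... between attempts.
            await asyncio.sleep(retry_delay * (2**attempt))
    raise RuntimeError("max_retries must be >= 1")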

Changes

Cohort / File(s) and change summary:

  • Core store enhancements (py_hamt/sharded_zarr_store.py): Added forecast_reference_time and step to the _parse_chunk_key exclusions; tightened the set resize trigger so it only runs for array-level zarr.json updates (keys ending in zarr.json, but not the root zarr.json itself), extracts the shape, verifies dimensionality, uses a lock with a double-check, then calls resize_store(new_shape=...); metadata deletion is now idempotent (no KeyError if missing; marks dirty only when the item existed); _fetch_and_cache_full_shard(...) now accepts max_retries and retry_delay, implements retry with exponential backoff, and raises RuntimeError after exhaustion; minor comment/formatting tweaks.
  • Tests — additions (tests/test_sharded_zarr_store.py): Added async tests: test_sharded_zarr_store_forecast_step_coordinates (4D dataset with forecast_reference_time & step coords), test_coordinate_metadata_delete_idempotent (repeated metadata delete is idempotent), test_metadata_without_shape_does_not_resize (metadata missing shape does not trigger resize). Added json and math imports.
  • Tests — removals / expectations changed (tests/test_sharded_store_resizing.py, tests/test_sharded_zarr_store_coverage.py, tests/test_sharded_store_deleting.py): Removed/updated assertions that previously expected errors when metadata lacked a shape or when deleting nonexistent metadata; tests now accept no-resize and idempotent-delete behavior.
  • Version bump (pyproject.toml): Project version bumped from 3.3.0 to 3.3.1.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus areas:
    • _fetch_and_cache_full_shard: retry/backoff correctness, exception handling, and behavior when retries are exhausted.
    • set resize path: correct shape extraction, dimensionality checks, lock + double-check semantics, and resize_store(new_shape=...) invocation.
    • delete metadata path: idempotence, _dirty signaling, and compatibility with other delete paths.
    • Updated/added async tests: ensure assertions reflect new behavior and are robust.

Possibly related PRs

  • feat: Sharding #68 — Overlaps sharded_zarr_store.py changes (parsing, set/delete, shard fetching); likely semantic overlap or merge conflicts.
  • Bug fixes #40 — Similar change making metadata deletion idempotent by swallowing KeyError when target is missing.

Suggested reviewers

  • TheGreatAlgo

Poem

🐰
I nibbled keys through shard and byte,
Backoff hums beneath the night,
Steps and forecasts tucked in rows,
Deletes that shrug and softly close,
A tiny hop for zarr delight. 🥕

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'fix: add support for forecast datasets' directly addresses the main purpose of the changeset, which adds support for forecast datasets by filtering forecast_reference_time and step coordinates, adding comprehensive tests, and ensuring metadata handling works correctly.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
py_hamt/sharded_zarr_store.py (2)

299-337: Critical: Event cleanup missing on retry exhaustion.

The retry logic is well-designed with exponential backoff, but there's a critical flaw: if all retries fail, the _pending_shard_loads event is never cleaned up (lines 323-325 only execute on success). This leaves waiting coroutines blocked indefinitely on line 426.

Apply this diff to ensure cleanup on all paths:

     ) -> None:
         """
         Fetch a shard from CAS and cache it, with retry logic for transient errors.
 
         Args:
             shard_idx: The index of the shard to fetch.
             shard_cid: The CID of the shard.
             max_retries: Maximum number of retry attempts for transient errors.
             retry_delay: Delay between retry attempts in seconds.
         """
+        try:
-        for attempt in range(max_retries):
-            try:
-                shard_data_bytes = await self.cas.load(shard_cid)
-                decoded_shard = dag_cbor.decode(shard_data_bytes)
-                if not isinstance(decoded_shard, list):
-                    raise TypeError(f"Shard {shard_idx} did not decode to a list.")
-                await self._shard_data_cache.put(shard_idx, decoded_shard)
-                # Always set the Event to unblock waiting coroutines
-                if shard_idx in self._pending_shard_loads:
-                    self._pending_shard_loads[shard_idx].set()
-                    del self._pending_shard_loads[shard_idx]
-                return  # Success
-            except (ConnectionError, TimeoutError) as e:
-                # Handle transient errors (e.g., network issues)
-                if attempt < max_retries - 1:
-                    await asyncio.sleep(
-                        retry_delay * (2**attempt)
-                    )  # Exponential backoff
-                    continue
-                else:
-                    raise RuntimeError(
-                        f"Failed to fetch shard {shard_idx} after {max_retries} attempts: {e}"
-                    )
+            for attempt in range(max_retries):
+                try:
+                    shard_data_bytes = await self.cas.load(shard_cid)
+                    decoded_shard = dag_cbor.decode(shard_data_bytes)
+                    if not isinstance(decoded_shard, list):
+                        raise TypeError(f"Shard {shard_idx} did not decode to a list.")
+                    await self._shard_data_cache.put(shard_idx, decoded_shard)
+                    return  # Success
+                except (ConnectionError, TimeoutError) as e:
+                    # Handle transient errors (e.g., network issues)
+                    if attempt < max_retries - 1:
+                        await asyncio.sleep(
+                            retry_delay * (2**attempt)
+                        )  # Exponential backoff
+                        continue
+                    else:
+                        raise RuntimeError(
+                            f"Failed to fetch shard {shard_idx} after {max_retries} attempts: {e}"
+                        )
+        finally:
+            # Always clean up pending load, whether success or failure
+            if shard_idx in self._pending_shard_loads:
+                self._pending_shard_loads[shard_idx].set()
+                del self._pending_shard_loads[shard_idx]

1-1: Run ruff format to fix formatting issues.

The pipeline reports that ruff-format reformatted 2 files. Run pre-commit run --all-files locally to apply formatting changes and re-commit.

#!/bin/bash
# Apply ruff formatting to fix the pipeline failure
ruff format py_hamt/sharded_zarr_store.py tests/test_sharded_zarr_store.py

Based on pipeline failure logs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d7b6e9 and a4f6a49.

📒 Files selected for processing (2)
  • py_hamt/sharded_zarr_store.py (3 hunks)
  • tests/test_sharded_zarr_store.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
{py_hamt,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{py_hamt,tests}/**/*.py: Use type hints throughout the codebase (mypy enforced)
Python code must pass ruff linting (ruff check)

Files:

  • tests/test_sharded_zarr_store.py
  • py_hamt/sharded_zarr_store.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use Hypothesis property-based testing in the test suite
Gate IPFS-dependent tests behind the --ipfs pytest flag so they can be skipped when IPFS isn’t available

Place all tests within the /tests directory

Files:

  • tests/test_sharded_zarr_store.py
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Ensure all Python functions are fully typed and mypy-compliant
Prefer simple, functional, easy-to-understand functions
Use meaningful variable and function names
Add comments for complex logic to aid readability and maintenance

Files:

  • tests/test_sharded_zarr_store.py
  • py_hamt/sharded_zarr_store.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: dClimate/py-hamt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-15T21:16:14.768Z
Learning: Applies to py_hamt/zarr_hamt_store.py : ZarrHAMTStore must implement the zarr.abc.store.Store interface
📚 Learning: 2025-09-15T21:16:14.768Z
Learnt from: CR
Repo: dClimate/py-hamt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-15T21:16:14.768Z
Learning: Applies to py_hamt/{store,hamt,zarr_hamt_store,encryption_hamt_store}.py : Make all storage-related operations async to accommodate IPFS/network calls

Applied to files:

  • tests/test_sharded_zarr_store.py
📚 Learning: 2025-09-15T21:16:14.768Z
Learnt from: CR
Repo: dClimate/py-hamt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-15T21:16:14.768Z
Learning: Applies to py_hamt/zarr_hamt_store.py : ZarrHAMTStore must implement the zarr.abc.store.Store interface

Applied to files:

  • py_hamt/sharded_zarr_store.py
🪛 GitHub Actions: Triggered on push from 0xSwego to branch/tag forecast-ds-support
tests/test_sharded_zarr_store.py

[error] 1-1: pre-commit: ruff-format reformatted 2 files. The hook reported a formatting change and the commit/tests may fail until changes are staged. Run 'pre-commit run --all-files' locally to apply and re-commit.

py_hamt/sharded_zarr_store.py

[error] 1-1: pre-commit: ruff-format reformatted 2 files. The hook reported a formatting change and the commit/tests may fail until changes are staged. Run 'pre-commit run --all-files' locally to apply and re-commit.

🪛 Ruff (0.14.5)
tests/test_sharded_zarr_store.py

138-138: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🔇 Additional comments (4)
tests/test_sharded_zarr_store.py (2)

2-2: LGTM!

The math import is used appropriately at line 138 for the math.ceil calculation.


80-151: Good test coverage for forecast coordinate handling.

The test properly exercises the new forecast coordinate support:

  • Creates a dataset with forecast_reference_time and step coordinates
  • Verifies that the primary array geometry remains unchanged after writing (lines 136-139)
  • Confirms round-trip data integrity
  • Validates that coordinate arrays are stored and retrievable

This addresses the PR objective of adding support for forecast datasets.

py_hamt/sharded_zarr_store.py (2)

343-351: Appropriate expansion of coordinate exclusion list.

Adding forecast_reference_time and step to the excluded array prefixes correctly treats them as coordinate/metadata arrays rather than primary chunk data, enabling proper handling of forecast datasets.


630-656: Well-designed resize guard with proper TOCTOU protection.

The tightened resize logic correctly:

  • Filters out root zarr.json from resize consideration (line 632)
  • Validates dimensional compatibility before attempting resize (lines 639-641, 645-648)
  • Double-checks conditions after lock acquisition to prevent race conditions
  • Ensures _resize_complete event is always set via try/finally (lines 651-655)

The dual validation (before and after lock acquisition) properly handles concurrent resize attempts.
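For readers unfamiliar with the pattern, a minimal self-contained sketch of the lock-plus-double-check flow (it assumes an asyncio.Lock named _resize_lock as described in the walkthrough; the final assignment stands in for the real resize_store(new_shape=...) call):

import asyncio

class _ResizeGuard:
    """Illustration only: mirrors the lock + double-check flow."""

    def __init__(self, array_shape: tuple[int, ...]) -> None:
        self._array_shape = array_shape
        self._resize_lock = asyncio.Lock()

    async def maybe_resize(self, new_shape: tuple[int, ...]) -> None:
        def needs_resize() -> bool:
            # Same rank, but a genuinely different shape.
            return (
                len(new_shape) == len(self._array_shape)
                and new_shape != self._array_shape
            )

        if not needs_resize():  # cheap check before taking the lock
            return
        async with self._resize_lock:
            # Re-check under the lock: another coroutine may have resized
            # between the first check and lock acquisition (TOCTOU guard).
            if needs_resize():
                self._array_shape = new_shape  # stand-in for resize_store(new_shape=...)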

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_sharded_zarr_store.py (1)

135-137: Add strict=True to zip() for safety.

This was flagged in a previous review. The zip() call should use strict=True to catch dimension mismatches between array_shape_tuple and chunk_shape_tuple.

Apply this diff:

         assert store_write._chunks_per_dim == tuple(
-            math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple)
+            math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple, strict=True)
         )

Based on static analysis hints from Ruff (B905).
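For context, a tiny standalone example (illustrative shapes, not from the test; strict= requires Python 3.10+) of the truncation that strict=True prevents:

import math

array_shape = (10, 20, 30)
chunk_shape = (5, 5)  # accidentally missing one dimension

list(zip(array_shape, chunk_shape))  # [(10, 5), (20, 5)] -- the 30 is silently dropped
# list(zip(array_shape, chunk_shape, strict=True))  # raises ValueError instead

chunks_per_dim = tuple(
    math.ceil(a / c) for a, c in zip((10, 20, 30), (5, 5, 15), strict=True)
)
assert chunks_per_dim == (2, 4, 2)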

🧹 Nitpick comments (1)
py_hamt/sharded_zarr_store.py (1)

630-653: Resize logic improvements look solid.

The tightened conditions and dimension validation prevent inappropriate resizes and race conditions. The double-check pattern after lock acquisition is a good defensive practice.

For consistency, consider using != instead of not ==:

-        if key.endswith("zarr.json") and not key == "zarr.json":
+        if key.endswith("zarr.json") and key != "zarr.json":
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4f6a49 and bf44cae.

📒 Files selected for processing (2)
  • py_hamt/sharded_zarr_store.py (3 hunks)
  • tests/test_sharded_zarr_store.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_sharded_zarr_store.py (2)
py_hamt/sharded_zarr_store.py (2)
  • flush (523-566)
  • exists (687-698)
py_hamt/zarr_hamt_store.py (2)
  • read_only (108-111)
  • exists (193-199)
🪛 Ruff (0.14.5)
tests/test_sharded_zarr_store.py

136-136: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

py_hamt/sharded_zarr_store.py

634-634: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Create project environment, run all checks (3.12)
🔇 Additional comments (3)
py_hamt/sharded_zarr_store.py (1)

343-351: LGTM! Forecast coordinate support added correctly.

The addition of forecast_reference_time and step to the exclusion list ensures these coordinate arrays are properly treated as metadata rather than chunked data, which aligns with the PR objective to support forecast datasets.

tests/test_sharded_zarr_store.py (2)

2-2: LGTM! Import necessary for test assertion.

The math import is appropriately used in the new test for calculating expected chunks per dimension.


80-149: Comprehensive test for forecast dataset support.

This test effectively validates the PR objective by:

  • Creating a dataset with forecast-specific coordinates (forecast_reference_time and step)
  • Verifying data integrity through write/read cycles
  • Confirming coordinate arrays are properly stored and retrievable

The test structure is clear and the assertions are appropriate.

codecov-commenter commented Nov 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (5d7b6e9) to head (c8dadf6).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #84   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            7         7           
  Lines         1310      1308    -2     
=========================================
- Hits          1310      1308    -2     

☔ View full report in Codecov by Sentry.

coderabbitai bot (Contributor) left a comment
Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_sharded_zarr_store.py (1)

80-149: Use zip(..., strict=True) in the chunk-geometry assertion to catch shape mismatches.

The new forecast/step test looks good and nicely validates that writing a 4D forecast dataset doesn’t perturb the primary array geometry. To tighten the assertion and satisfy Ruff’s B905 recommendation, make the zip explicit about dimensional agreement:

-        assert store_write._array_shape == array_shape_tuple
-        assert store_write._chunks_per_dim == tuple(
-            math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple)
-        )
+        assert store_write._array_shape == array_shape_tuple
+        assert store_write._chunks_per_dim == tuple(
+            math.ceil(a / c)
+            for a, c in zip(array_shape_tuple, chunk_shape_tuple, strict=True)
+        )

That way, any accidental mismatch in the length of array_shape_tuple and chunk_shape_tuple will raise instead of silently truncating.

#!/bin/bash
# Quick grep to ensure all zip(array_shape_tuple, chunk_shape_tuple) calls
# in the repo use strict=True.
rg -n "zip\(array_shape_tuple,\s*chunk_shape_tuple" --type=py -C2
🧹 Nitpick comments (3)
py_hamt/sharded_zarr_store.py (3)

299-337: Clean up _pending_shard_loads on terminal fetch failure to avoid unnecessary timeouts.

The retry/backoff around cas.load is good, but when the final ConnectionError/TimeoutError occurs you raise RuntimeError without clearing _pending_shard_loads[shard_idx]. Callers coming through _load_or_initialize_shard_cache will then block up to 60 seconds on an event that is never set before hitting the timeout path that finally cleans it up.

You can unblock waiters immediately and avoid relying on the timeout branch by clearing the pending entry before raising on the last attempt:

@@
-        for attempt in range(max_retries):
+        for attempt in range(max_retries):
             try:
@@
-                return  # Success
+                return  # Success
             except (ConnectionError, TimeoutError) as e:
                 # Handle transient errors (e.g., network issues)
-                if attempt < max_retries - 1:
+                if attempt < max_retries - 1:
                     await asyncio.sleep(
                         retry_delay * (2**attempt)
                     )  # Exponential backoff
                     continue
-                else:
-                    raise RuntimeError(
-                        f"Failed to fetch shard {shard_idx} after {max_retries} attempts: {e}"
-                    )
+                else:
+                    # Ensure waiters on this shard index are unblocked on terminal failure.
+                    if shard_idx in self._pending_shard_loads:
+                        self._pending_shard_loads[shard_idx].set()
+                        del self._pending_shard_loads[shard_idx]
+                    raise RuntimeError(
+                        f"Failed to fetch shard {shard_idx} after {max_retries} attempts: {e}"
+                    )

343-373: Expanded excluded_array_prefixes correctly captures forecast coordinate arrays.

Including "forecast_reference_time" and "step" (plus "latitude"/"longitude") in excluded_array_prefixes matches the forecast/step coordinate behavior exercised in the new tests and keeps those coordinate arrays in the metadata namespace rather than the main chunk grid, which is what exists()/delete() expect.

If you ever profile this hot, you could consider hoisting excluded_array_prefixes to a module- or class-level constant instead of rebuilding the set on every _parse_chunk_key call, but that’s purely a micro-optimisation.
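A sketch of that hoisting (the membership shown here is partial and assumed; only latitude, longitude, forecast_reference_time, and step are confirmed by this review, and _is_excluded is a hypothetical helper name):

EXCLUDED_ARRAY_PREFIXES: frozenset[str] = frozenset(
    {"latitude", "longitude", "forecast_reference_time", "step"}
)

def _is_excluded(key: str) -> bool:
    # The first path component names the array, e.g. "step/c/0" -> "step".
    return key.split("/", 1)[0] in EXCLUDED_ARRAY_PREFIXES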


630-653: Resize trigger narrowing in set() looks correct; consider DRYing the shape check.

Restricting the resize path to keys ending with "zarr.json" but not the root "zarr.json", and only when the new shape has the same rank as _array_shape and actually differs, is a good way to prevent coordinate metadata (e.g., 1D coords) from resharding the main index. The double-check inside _resize_lock is also a nice concurrency guard.

To reduce duplication and make the intent a bit clearer, you could factor the repeated condition into a small helper or local needs_resize boolean:

-            if not new_array_shape:
+            if not new_array_shape:
                 raise ValueError("Shape not found in metadata.")
-            # Only resize when the metadata shape represents the primary array.
-            if (
-                len(new_array_shape) == len(self._array_shape)
-                and tuple(new_array_shape) != self._array_shape
-            ):
+            # Only resize when the metadata shape represents the primary array.
+            needs_resize = (
+                len(new_array_shape) == len(self._array_shape)
+                and tuple(new_array_shape) != self._array_shape
+            )
+            if needs_resize:
                 async with self._resize_lock:
@@
-                    if (
-                        len(new_array_shape) == len(self._array_shape)
-                        and tuple(new_array_shape) != self._array_shape
-                    ):
+                    if needs_resize:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13ffbda and 970ed9b.

📒 Files selected for processing (2)
  • py_hamt/sharded_zarr_store.py (4 hunks)
  • tests/test_sharded_zarr_store.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_sharded_zarr_store.py (1)
py_hamt/sharded_zarr_store.py (5)
  • ShardedZarrStore (142-874)
  • open (210-241)
  • flush (523-566)
  • exists (687-698)
  • delete (712-731)
🪛 GitHub Actions: Triggered on push from 0xSwego to branch/tag forecast-ds-support
tests/test_sharded_zarr_store.py

[error] 165-178: ruff-format reformatted the file during pre-commit and the hook failed. Run 'pre-commit run --all-files' to re-check formatting.

🪛 Ruff (0.14.6)
py_hamt/sharded_zarr_store.py

634-634: Avoid specifying long messages outside the exception class

(TRY003)

tests/test_sharded_zarr_store.py

136-136: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🔇 Additional comments (3)
py_hamt/sharded_zarr_store.py (1)

716-721: Idempotent metadata/coordinate delete behavior is a good fit for callers.

Making the metadata branch of delete() use pop(key, None) and only toggling _dirty_root when something was actually removed gives you idempotent semantics for coordinate/metadata deletions, which matches how the new tests exercise repeated deletes without errors and is in line with common store APIs.
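In skeleton form, the semantics being endorsed look roughly like this (a sketch only; the dict-backed _metadata map and the method name are assumed for illustration, while pop(key, None) and _dirty_root come from the review above):

class _MetadataStore:
    """Illustration only: idempotent delete over a dict-backed map."""

    def __init__(self) -> None:
        self._metadata: dict[str, bytes] = {}
        self._dirty_root = False

    def delete_metadata(self, key: str) -> None:
        # pop with a default never raises KeyError, so repeat deletes are no-ops.
        existed = self._metadata.pop(key, None) is not None
        if existed:
            # Mark dirty only when something was actually removed.
            self._dirty_root = True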

tests/test_sharded_zarr_store.py (2)

151-197: Good targeted test for coordinate-metadata delete idempotence.

test_coordinate_metadata_delete_idempotent nicely exercises repeated deletes for "forecast_reference_time/c/0" and "step/c/0", matching the new _parse_chunk_key exclusions and idempotent metadata delete logic in the store. This should guard against regressions where coordinate deletions start raising again.


1-10: The file is already properly formatted with no uncommitted changes required.

The file tests/test_sharded_zarr_store.py currently has proper formatting that complies with ruff-format standards:

  • Imports are correctly organized (stdlib → third-party → local with blank line separators)
  • No trailing whitespace or formatting issues detected
  • Working tree is clean with no uncommitted changes

The review comment's suggestion to run pre-commit and commit diffs is unnecessary, as the file already meets all formatting requirements.

coderabbitai bot (Contributor) left a comment
Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_sharded_zarr_store.py (1)

81-138: Tighten zip() with strict=True in chunk geometry assertion

This test is a solid coverage of 4D forecast datasets and correctly keeps the store geometry aligned with the Dataset. The only concern is the zip() call in the _chunks_per_dim assertion:

assert store_write._chunks_per_dim == tuple(
    math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple)
)

To satisfy Ruff B905 and also surface any future dimension mismatches between array_shape_tuple and chunk_shape_tuple, make the zip strict:

-        assert store_write._chunks_per_dim == tuple(
-            math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple)
-        )
+        assert store_write._chunks_per_dim == tuple(
+            math.ceil(a / c)
+            for a, c in zip(array_shape_tuple, chunk_shape_tuple, strict=True)
+        )

This keeps the test aligned with the intended invariants and avoids silent truncation if the tuples ever diverge. The async usage here also matches the fully async storage APIs for IPFS-backed stores. Based on learnings, ...

🧹 Nitpick comments (1)
tests/test_sharded_zarr_store.py (1)

152-197: Idempotent delete test is good; optional: assert post-conditions

The test correctly exercises coordinate delete idempotence (double-deleting the same keys without raising), which matches the store’s intended behavior for metadata/chunk deletions.

If you want to make this stronger, you could optionally assert that the coordinate chunks are actually gone after the first delete (e.g., via await store.exists("forecast_reference_time/c/0") is False) before re-deleting, but that’s not strictly necessary for catching regressions in idempotence itself.
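Concretely, the strengthened check might look like this inside the existing async test (a sketch; fixture setup omitted, key names taken from the test described above):

await store.delete("forecast_reference_time/c/0")
assert await store.exists("forecast_reference_time/c/0") is False
# The second delete must be a silent no-op rather than a KeyError.
await store.delete("forecast_reference_time/c/0")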

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 970ed9b and 1927b60.

📒 Files selected for processing (4)
  • py_hamt/sharded_zarr_store.py (4 hunks)
  • tests/test_sharded_store_resizing.py (0 hunks)
  • tests/test_sharded_zarr_store.py (2 hunks)
  • tests/test_sharded_zarr_store_coverage.py (1 hunks)
💤 Files with no reviewable changes (1)
  • tests/test_sharded_store_resizing.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • py_hamt/sharded_zarr_store.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_sharded_zarr_store.py (2)
py_hamt/sharded_zarr_store.py (3)
  • flush (523-566)
  • exists (689-700)
  • delete (714-733)
py_hamt/zarr_hamt_store.py (3)
  • read_only (108-111)
  • exists (193-199)
  • delete (236-248)
🪛 Ruff (0.14.6)
tests/test_sharded_zarr_store.py

137-137: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Create project environment, run all checks (3.12)
🔇 Additional comments (3)
tests/test_sharded_zarr_store_coverage.py (1)

208-211: Coverage for metadata-without-shape path looks good

Exercising store.set() with shape-less metadata here is enough for this coverage test, since the geometry invariants are asserted in test_metadata_without_shape_does_not_resize.

tests/test_sharded_zarr_store.py (2)

2-3: New imports are appropriate and used

json and math are both exercised in the new tests below (metadata encoding and chunk-geometry assertions), so these additions are correct and won’t trip unused-import linting.


198-220: Metadata-without-shape resize guard is well covered

This test cleanly verifies the new behavior: writing a zarr.json metadata blob without a shape field no longer triggers resizing, and _array_shape / _chunks_per_dim stay unchanged. Using JSON encoding via json.dumps and the buffer prototype matches how other metadata paths operate, so this test is aligned with the implementation.
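For reference, the shape-less metadata write might look roughly like this (a sketch, not the actual test body; the default_buffer_prototype import path is assumed from zarr-python 3 and may differ across versions, and the group key is hypothetical):

import json
from zarr.core.buffer import default_buffer_prototype

meta_bytes = json.dumps({"zarr_format": 3, "node_type": "group"}).encode()
buf = default_buffer_prototype().buffer.from_bytes(meta_bytes)
# No "shape" key in this metadata, so the store must not attempt a resize.
await store.set("some_group/zarr.json", buf)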

0xSwego merged commit 9de8375 into main on December 1, 2025 (2 of 3 checks passed).
0xSwego deleted the forecast-ds-support branch on December 1, 2025 at 13:27.
coderabbitai bot (Contributor) left a comment
Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/test_sharded_zarr_store.py (3)

81-152: Forecast/step coordinate test covers the new dataset shape & metadata behavior well

This test does a good job exercising a realistic 4‑D forecast layout, verifying both round‑trip integrity and that forecast_reference_time/step coordinate chunks are persisted as metadata. The geometry assertions using _array_shape and _chunks_per_dim (with zip(..., strict=True)) give strong protection against regressions in chunk layout.

If you expect the internal representation of _array_shape / _chunks_per_dim to evolve, consider adding a small public helper or factory in ShardedZarrStore for these invariants so tests depend less on private attributes. As per coding guidelines, this keeps tests resilient to internal refactors.
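For instance, a small read-only property would keep the assertions public-facing (the name chunk_grid is hypothetical, not part of the current API):

@property
def chunk_grid(self) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Expose (array_shape, chunks_per_dim) without leaking private attrs."""
    return self._array_shape, self._chunks_per_dim

Tests could then assert store.chunk_grid == (array_shape_tuple, expected_chunks) without reaching into private state.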


153-197: Idempotent coordinate metadata delete behavior is validated; consider asserting final state

The test correctly verifies that repeated deletes of forecast_reference_time and step coordinate metadata complete without error, matching the intended idempotent delete implementation.

For a slightly stronger check, you could also assert that await store.exists("forecast_reference_time/c/0") and await store.exists("step/c/0") are False after the deletions (optionally after a flush() and reopen) to confirm state as well as behavior.


199-221: Metadata-without-shape regression is covered; note reliance on private internals

This test cleanly captures the regression condition that metadata JSON lacking a shape must not trigger a resize or error, and the assertions on _array_shape and _chunks_per_dim make the intent explicit.

If _array_shape / _chunks_per_dim are likely to change or be encapsulated further, consider asserting via a small public accessor or helper instead of private attributes so the test remains stable across internal refactors.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 052442e and c8dadf6.

📒 Files selected for processing (1)
  • tests/test_sharded_zarr_store.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_sharded_zarr_store.py (1)
py_hamt/sharded_zarr_store.py (5)
  • ShardedZarrStore (142-876)
  • open (210-241)
  • flush (523-566)
  • exists (689-700)
  • delete (714-733)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Create project environment, run all checks (3.12)
🔇 Additional comments (1)
tests/test_sharded_zarr_store.py (1)

2-3: New stdlib imports are appropriate and used correctly

json and math are both used in the new tests and placed correctly with other standard-library imports; no issues here.
