
Conversation

0xSwego (Collaborator) commented Nov 26, 2025

Summary by CodeRabbit

  • New Features

    • Added configurable retry with exponential backoff for transient shard load failures.
  • Bug Fixes

    • Improved filtering of coordinate/metadata keys to avoid misclassification.
    • Resize now only triggers for valid primary-array shape changes; metadata lacking shape no longer forces resize.
    • Metadata deletion made idempotent (repeated deletes do not error).
  • Tests

    • Added tests for forecast coordinate handling, idempotent metadata deletion, and non-resize when shape is absent.
  • Chores

    • Version bumped to 3.3.1.


0xSwego requested a review from TheGreatAlgo on November 26, 2025 15:22
coderabbitai bot (Contributor) commented Nov 26, 2025

Walkthrough

Expanded excluded coordinate keys, made metadata-triggered resize conservative and thread-safe (delegating to resize_store), added retry with exponential backoff to full-shard fetch, made metadata deletion idempotent, added tests for forecast/step coords and idempotent delete, and bumped version to 3.3.1.
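As a rough illustration of the retry behavior summarized here, a minimal sketch (the parameter names max_retries and retry_delay and the handled exception types come from the changes below; the body is simplified, not the actual implementation):

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def load_with_backoff(
    load: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    retry_delay: float = 1.0,
) -> T:
    for attempt in range(max_retries):
        try:
            return await load()
        except (ConnectionError, TimeoutError) as e:
            if attempt == max_retries - 1:
                raise RuntimeError(
                    f"failed after {max_retries} attempts: {e}"
                ) from e
            # Delays grow as retry_delay * 1, 2, 4, ... between attempts.
            await asyncio.sleep(retry_delay * (2**attempt))
    raise RuntimeError("max_retries must be >= 1")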

Changes

Cohort / File(s) and change summary:

  • Core store enhancements (py_hamt/sharded_zarr_store.py): Added forecast_reference_time and step to the _parse_chunk_key exclusions; tightened the set resize trigger so it only runs for array-level zarr.json updates (keys ending in zarr.json, but not the root zarr.json itself), extracts the shape, verifies dimensionality, uses a lock with a double-check, then calls resize_store(new_shape=...); metadata deletion is now idempotent (no KeyError if missing; marks dirty only when the item existed); _fetch_and_cache_full_shard(...) now accepts max_retries and retry_delay, implements retry with exponential backoff, and raises RuntimeError after exhaustion; minor comment/formatting tweaks.
  • Tests — additions (tests/test_sharded_zarr_store.py): Added async tests: test_sharded_zarr_store_forecast_step_coordinates (4D dataset with forecast_reference_time & step coords), test_coordinate_metadata_delete_idempotent (repeated metadata delete is idempotent), test_metadata_without_shape_does_not_resize (metadata missing shape does not trigger resize). Added json and math imports.
  • Tests — removals / expectations changed (tests/test_sharded_store_resizing.py, tests/test_sharded_zarr_store_coverage.py, tests/test_sharded_store_deleting.py): Removed/updated assertions that previously expected errors when metadata lacked a shape or when deleting nonexistent metadata; tests now accept no-resize and idempotent-delete behavior.
  • Version bump (pyproject.toml): Project version bumped from 3.3.0 to 3.3.1.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus areas:
    • _fetch_and_cache_full_shard: retry/backoff correctness, exception handling, and behavior when retries are exhausted.
    • set resize path: correct shape extraction, dimensionality checks, lock + double-check semantics, and resize_store(new_shape=...) invocation.
    • delete metadata path: idempotence, _dirty signaling, and compatibility with other delete paths.
    • Updated/added async tests: ensure assertions reflect new behavior and are robust.

Possibly related PRs

  • feat: Sharding #68 — Overlaps sharded_zarr_store.py changes (parsing, set/delete, shard fetching); likely semantic overlap or merge conflicts.
  • Bug fixes #40 — Similar change making metadata deletion idempotent by swallowing KeyError when target is missing.

Suggested reviewers

  • TheGreatAlgo

Poem

🐰
I nibbled keys through shard and byte,
Backoff hums beneath the night,
Steps and forecasts tucked in rows,
Deletes that shrug and softly close,
A tiny hop for zarr delight. 🥕

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'fix: add support for forecast datasets' directly addresses the main purpose of the changeset, which adds support for forecast datasets by filtering forecast_reference_time and step coordinates, adding comprehensive tests, and ensuring metadata handling works correctly.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
py_hamt/sharded_zarr_store.py (2)

299-337: Critical: Event cleanup missing on retry exhaustion.

The retry logic is well-designed with exponential backoff, but there's a critical flaw: if all retries fail, the _pending_shard_loads event is never cleaned up (lines 323-325 only execute on success). This leaves waiting coroutines blocked indefinitely on line 426.

Apply this diff to ensure cleanup on all paths:

     ) -> None:
         """
         Fetch a shard from CAS and cache it, with retry logic for transient errors.
 
         Args:
             shard_idx: The index of the shard to fetch.
             shard_cid: The CID of the shard.
             max_retries: Maximum number of retry attempts for transient errors.
             retry_delay: Delay between retry attempts in seconds.
         """
+        try:
-        for attempt in range(max_retries):
-            try:
-                shard_data_bytes = await self.cas.load(shard_cid)
-                decoded_shard = dag_cbor.decode(shard_data_bytes)
-                if not isinstance(decoded_shard, list):
-                    raise TypeError(f"Shard {shard_idx} did not decode to a list.")
-                await self._shard_data_cache.put(shard_idx, decoded_shard)
-                # Always set the Event to unblock waiting coroutines
-                if shard_idx in self._pending_shard_loads:
-                    self._pending_shard_loads[shard_idx].set()
-                    del self._pending_shard_loads[shard_idx]
-                return  # Success
-            except (ConnectionError, TimeoutError) as e:
-                # Handle transient errors (e.g., network issues)
-                if attempt < max_retries - 1:
-                    await asyncio.sleep(
-                        retry_delay * (2**attempt)
-                    )  # Exponential backoff
-                    continue
-                else:
-                    raise RuntimeError(
-                        f"Failed to fetch shard {shard_idx} after {max_retries} attempts: {e}"
-                    )
+            for attempt in range(max_retries):
+                try:
+                    shard_data_bytes = await self.cas.load(shard_cid)
+                    decoded_shard = dag_cbor.decode(shard_data_bytes)
+                    if not isinstance(decoded_shard, list):
+                        raise TypeError(f"Shard {shard_idx} did not decode to a list.")
+                    await self._shard_data_cache.put(shard_idx, decoded_shard)
+                    return  # Success
+                except (ConnectionError, TimeoutError) as e:
+                    # Handle transient errors (e.g., network issues)
+                    if attempt < max_retries - 1:
+                        await asyncio.sleep(
+                            retry_delay * (2**attempt)
+                        )  # Exponential backoff
+                        continue
+                    else:
+                        raise RuntimeError(
+                            f"Failed to fetch shard {shard_idx} after {max_retries} attempts: {e}"
+                        )
+        finally:
+            # Always clean up pending load, whether success or failure
+            if shard_idx in self._pending_shard_loads:
+                self._pending_shard_loads[shard_idx].set()
+                del self._pending_shard_loads[shard_idx]

1-1: Run ruff format to fix formatting issues.

The pipeline reports that ruff-format reformatted 2 files. Run pre-commit run --all-files locally to apply formatting changes and re-commit.

#!/bin/bash
# Apply ruff formatting to fix the pipeline failure
ruff format py_hamt/sharded_zarr_store.py tests/test_sharded_zarr_store.py

Based on pipeline failure logs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d7b6e9 and a4f6a49.

📒 Files selected for processing (2)
  • py_hamt/sharded_zarr_store.py (3 hunks)
  • tests/test_sharded_zarr_store.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
{py_hamt,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{py_hamt,tests}/**/*.py: Use type hints throughout the codebase (mypy enforced)
Python code must pass ruff linting (ruff check)

Files:

  • tests/test_sharded_zarr_store.py
  • py_hamt/sharded_zarr_store.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use Hypothesis property-based testing in the test suite
Gate IPFS-dependent tests behind the --ipfs pytest flag so they can be skipped when IPFS isn’t available

Place all tests within the /tests directory

Files:

  • tests/test_sharded_zarr_store.py
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Ensure all Python functions are fully typed and mypy-compliant
Prefer simple, functional, easy-to-understand functions
Use meaningful variable and function names
Add comments for complex logic to aid readability and maintenance

Files:

  • tests/test_sharded_zarr_store.py
  • py_hamt/sharded_zarr_store.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: dClimate/py-hamt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-15T21:16:14.768Z
Learning: Applies to py_hamt/zarr_hamt_store.py : ZarrHAMTStore must implement the zarr.abc.store.Store interface
📚 Learning: 2025-09-15T21:16:14.768Z
Learnt from: CR
Repo: dClimate/py-hamt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-15T21:16:14.768Z
Learning: Applies to py_hamt/{store,hamt,zarr_hamt_store,encryption_hamt_store}.py : Make all storage-related operations async to accommodate IPFS/network calls

Applied to files:

  • tests/test_sharded_zarr_store.py
📚 Learning: 2025-09-15T21:16:14.768Z
Learnt from: CR
Repo: dClimate/py-hamt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-15T21:16:14.768Z
Learning: Applies to py_hamt/zarr_hamt_store.py : ZarrHAMTStore must implement the zarr.abc.store.Store interface

Applied to files:

  • py_hamt/sharded_zarr_store.py
🪛 GitHub Actions: Triggered on push from 0xSwego to branch/tag forecast-ds-support
tests/test_sharded_zarr_store.py

[error] 1-1: pre-commit: ruff-format reformatted 2 files. The hook reported a formatting change and the commit/tests may fail until changes are staged. Run 'pre-commit run --all-files' locally to apply and re-commit.

py_hamt/sharded_zarr_store.py

[error] 1-1: pre-commit: ruff-format reformatted 2 files. The hook reported a formatting change and the commit/tests may fail until changes are staged. Run 'pre-commit run --all-files' locally to apply and re-commit.

🪛 Ruff (0.14.5)
tests/test_sharded_zarr_store.py

138-138: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🔇 Additional comments (4)
tests/test_sharded_zarr_store.py (2)

2-2: LGTM!

The math import is used appropriately at line 138 for the math.ceil calculation.


80-151: Good test coverage for forecast coordinate handling.

The test properly exercises the new forecast coordinate support:

  • Creates a dataset with forecast_reference_time and step coordinates
  • Verifies that the primary array geometry remains unchanged after writing (lines 136-139)
  • Confirms round-trip data integrity
  • Validates that coordinate arrays are stored and retrievable

This addresses the PR objective of adding support for forecast datasets.

py_hamt/sharded_zarr_store.py (2)

343-351: Appropriate expansion of coordinate exclusion list.

Adding forecast_reference_time and step to the excluded array prefixes correctly treats them as coordinate/metadata arrays rather than primary chunk data, enabling proper handling of forecast datasets.


630-656: Well-designed resize guard with proper TOCTOU protection.

The tightened resize logic correctly:

  • Filters out root zarr.json from resize consideration (line 632)
  • Validates dimensional compatibility before attempting resize (lines 639-641, 645-648)
  • Double-checks conditions after lock acquisition to prevent race conditions
  • Ensures _resize_complete event is always set via try/finally (lines 651-655)

The dual validation (before and after lock acquisition) properly handles concurrent resize attempts.
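For readers unfamiliar with the pattern, a minimal self-contained sketch of the lock-plus-double-check flow (it assumes an asyncio.Lock named _resize_lock as described in the walkthrough; the final assignment stands in for the real resize_store(new_shape=...) call):

import asyncio

class _ResizeGuard:
    """Illustration only: mirrors the lock + double-check flow."""

    def __init__(self, array_shape: tuple[int, ...]) -> None:
        self._array_shape = array_shape
        self._resize_lock = asyncio.Lock()

    async def maybe_resize(self, new_shape: tuple[int, ...]) -> None:
        def needs_resize() -> bool:
            # Same rank, but a genuinely different shape.
            return (
                len(new_shape) == len(self._array_shape)
                and new_shape != self._array_shape
            )

        if not needs_resize():  # cheap check before taking the lock
            return
        async with self._resize_lock:
            # Re-check under the lock: another coroutine may have resized
            # between the first check and lock acquisition (TOCTOU guard).
            if needs_resize():
                self._array_shape = new_shape  # stand-in for resize_store(new_shape=...)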

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_sharded_zarr_store.py (1)

135-137: Add strict=True to zip() for safety.

This was flagged in a previous review. The zip() call should use strict=True to catch dimension mismatches between array_shape_tuple and chunk_shape_tuple.

Apply this diff:

         assert store_write._chunks_per_dim == tuple(
-            math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple)
+            math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple, strict=True)
         )

Based on static analysis hints from Ruff (B905).
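For context, a tiny standalone example (illustrative shapes, not from the test; strict= requires Python 3.10+) of the truncation that strict=True prevents:

import math

array_shape = (10, 20, 30)
chunk_shape = (5, 5)  # accidentally missing one dimension

list(zip(array_shape, chunk_shape))  # [(10, 5), (20, 5)] -- the 30 is silently dropped
# list(zip(array_shape, chunk_shape, strict=True))  # raises ValueError instead

chunks_per_dim = tuple(
    math.ceil(a / c) for a, c in zip((10, 20, 30), (5, 5, 15), strict=True)
)
assert chunks_per_dim == (2, 4, 2)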

🧹 Nitpick comments (1)
py_hamt/sharded_zarr_store.py (1)

630-653: Resize logic improvements look solid.

The tightened conditions and dimension validation prevent inappropriate resizes and race conditions. The double-check pattern after lock acquisition is a good defensive practice.

For consistency, consider using != instead of not ==:

-        if key.endswith("zarr.json") and not key == "zarr.json":
+        if key.endswith("zarr.json") and key != "zarr.json":
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4f6a49 and bf44cae.

📒 Files selected for processing (2)
  • py_hamt/sharded_zarr_store.py (3 hunks)
  • tests/test_sharded_zarr_store.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_sharded_zarr_store.py (2)
py_hamt/sharded_zarr_store.py (2)
  • flush (523-566)
  • exists (687-698)
py_hamt/zarr_hamt_store.py (2)
  • read_only (108-111)
  • exists (193-199)
🪛 Ruff (0.14.5)
tests/test_sharded_zarr_store.py

136-136: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

py_hamt/sharded_zarr_store.py

634-634: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Create project environment, run all checks (3.12)
🔇 Additional comments (3)
py_hamt/sharded_zarr_store.py (1)

343-351: LGTM! Forecast coordinate support added correctly.

The addition of forecast_reference_time and step to the exclusion list ensures these coordinate arrays are properly treated as metadata rather than chunked data, which aligns with the PR objective to support forecast datasets.

tests/test_sharded_zarr_store.py (2)

2-2: LGTM! Import necessary for test assertion.

The math import is appropriately used in the new test for calculating expected chunks per dimension.


80-149: Comprehensive test for forecast dataset support.

This test effectively validates the PR objective by:

  • Creating a dataset with forecast-specific coordinates (forecast_reference_time and step)
  • Verifying data integrity through write/read cycles
  • Confirming coordinate arrays are properly stored and retrievable

The test structure is clear and the assertions are appropriate.

codecov-commenter commented Nov 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (5d7b6e9) to head (c8dadf6).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #84   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            7         7           
  Lines         1310      1308    -2     
=========================================
- Hits          1310      1308    -2     

☔ View full report in Codecov by Sentry.

coderabbitai bot (Contributor) left a comment
Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_sharded_zarr_store.py (1)

80-149: Use zip(..., strict=True) in the chunk-geometry assertion to catch shape mismatches.

The new forecast/step test looks good and nicely validates that writing a 4D forecast dataset doesn’t perturb the primary array geometry. To tighten the assertion and satisfy Ruff’s B905 recommendation, make the zip explicit about dimensional agreement:

-        assert store_write._array_shape == array_shape_tuple
-        assert store_write._chunks_per_dim == tuple(
-            math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple)
-        )
+        assert store_write._array_shape == array_shape_tuple
+        assert store_write._chunks_per_dim == tuple(
+            math.ceil(a / c)
+            for a, c in zip(array_shape_tuple, chunk_shape_tuple, strict=True)
+        )

That way, any accidental mismatch in the length of array_shape_tuple and chunk_shape_tuple will raise instead of silently truncating.

#!/bin/bash
# Quick grep to ensure all zip(array_shape_tuple, chunk_shape_tuple) calls
# in the repo use strict=True.
rg -n "zip\(array_shape_tuple,\s*chunk_shape_tuple" --type=py -C2
🧹 Nitpick comments (3)
py_hamt/sharded_zarr_store.py (3)

299-337: Clean up _pending_shard_loads on terminal fetch failure to avoid unnecessary timeouts.

The retry/backoff around cas.load is good, but when the final ConnectionError/TimeoutError occurs you raise RuntimeError without clearing _pending_shard_loads[shard_idx]. Callers coming through _load_or_initialize_shard_cache will then block up to 60 seconds on an event that is never set before hitting the timeout path that finally cleans it up.

You can unblock waiters immediately and avoid relying on the timeout branch by clearing the pending entry before raising on the last attempt:

@@
-        for attempt in range(max_retries):
+        for attempt in range(max_retries):
             try:
@@
-                return  # Success
+                return  # Success
             except (ConnectionError, TimeoutError) as e:
                 # Handle transient errors (e.g., network issues)
-                if attempt < max_retries - 1:
+                if attempt < max_retries - 1:
                     await asyncio.sleep(
                         retry_delay * (2**attempt)
                     )  # Exponential backoff
                     continue
-                else:
-                    raise RuntimeError(
-                        f"Failed to fetch shard {shard_idx} after {max_retries} attempts: {e}"
-                    )
+                else:
+                    # Ensure waiters on this shard index are unblocked on terminal failure.
+                    if shard_idx in self._pending_shard_loads:
+                        self._pending_shard_loads[shard_idx].set()
+                        del self._pending_shard_loads[shard_idx]
+                    raise RuntimeError(
+                        f"Failed to fetch shard {shard_idx} after {max_retries} attempts: {e}"
+                    )

343-373: Expanded excluded_array_prefixes correctly captures forecast coordinate arrays.

Including "forecast_reference_time" and "step" (plus "latitude"/"longitude") in excluded_array_prefixes matches the forecast/step coordinate behavior exercised in the new tests and keeps those coordinate arrays in the metadata namespace rather than the main chunk grid, which is what exists()/delete() expect.

If you ever profile this hot, you could consider hoisting excluded_array_prefixes to a module- or class-level constant instead of rebuilding the set on every _parse_chunk_key call, but that’s purely a micro-optimisation.
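A sketch of that hoisting (the membership shown here is partial and assumed; only latitude, longitude, forecast_reference_time, and step are confirmed by this review, and _is_excluded is a hypothetical helper name):

EXCLUDED_ARRAY_PREFIXES: frozenset[str] = frozenset(
    {"latitude", "longitude", "forecast_reference_time", "step"}
)

def _is_excluded(key: str) -> bool:
    # The first path component names the array, e.g. "step/c/0" -> "step".
    return key.split("/", 1)[0] in EXCLUDED_ARRAY_PREFIXES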


630-653: Resize trigger narrowing in set() looks correct; consider DRYing the shape check.

Restricting the resize path to keys ending with "zarr.json" but not the root "zarr.json", and only when the new shape has the same rank as _array_shape and actually differs, is a good way to prevent coordinate metadata (e.g., 1D coords) from resharding the main index. The double-check inside _resize_lock is also a nice concurrency guard.

To reduce duplication and make the intent a bit clearer, you could factor the repeated condition into a small helper or local needs_resize boolean:

-            if not new_array_shape:
+            if not new_array_shape:
                 raise ValueError("Shape not found in metadata.")
-            # Only resize when the metadata shape represents the primary array.
-            if (
-                len(new_array_shape) == len(self._array_shape)
-                and tuple(new_array_shape) != self._array_shape
-            ):
+            # Only resize when the metadata shape represents the primary array.
+            needs_resize = (
+                len(new_array_shape) == len(self._array_shape)
+                and tuple(new_array_shape) != self._array_shape
+            )
+            if needs_resize:
                 async with self._resize_lock:
@@
-                    if (
-                        len(new_array_shape) == len(self._array_shape)
-                        and tuple(new_array_shape) != self._array_shape
-                    ):
+                    if needs_resize:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13ffbda and 970ed9b.

📒 Files selected for processing (2)
  • py_hamt/sharded_zarr_store.py (4 hunks)
  • tests/test_sharded_zarr_store.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_sharded_zarr_store.py (1)
py_hamt/sharded_zarr_store.py (5)
  • ShardedZarrStore (142-874)
  • open (210-241)
  • flush (523-566)
  • exists (687-698)
  • delete (712-731)
🪛 GitHub Actions: Triggered on push from 0xSwego to branch/tag forecast-ds-support
tests/test_sharded_zarr_store.py

[error] 165-178: ruff-format reformatted the file during pre-commit and the hook failed. Run 'pre-commit run --all-files' to re-check formatting.

🪛 Ruff (0.14.6)
py_hamt/sharded_zarr_store.py

634-634: Avoid specifying long messages outside the exception class

(TRY003)

tests/test_sharded_zarr_store.py

136-136: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🔇 Additional comments (3)
py_hamt/sharded_zarr_store.py (1)

716-721: Idempotent metadata/coordinate delete behavior is a good fit for callers.

Making the metadata branch of delete() use pop(key, None) and only toggling _dirty_root when something was actually removed gives you idempotent semantics for coordinate/metadata deletions, which matches how the new tests exercise repeated deletes without errors and is in line with common store APIs.
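In skeleton form, the semantics being endorsed look roughly like this (a sketch only; the dict-backed _metadata map and the method name are assumed for illustration, while pop(key, None) and _dirty_root come from the review above):

class _MetadataStore:
    """Illustration only: idempotent delete over a dict-backed map."""

    def __init__(self) -> None:
        self._metadata: dict[str, bytes] = {}
        self._dirty_root = False

    def delete_metadata(self, key: str) -> None:
        # pop with a default never raises KeyError, so repeat deletes are no-ops.
        existed = self._metadata.pop(key, None) is not None
        if existed:
            # Mark dirty only when something was actually removed.
            self._dirty_root = True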

tests/test_sharded_zarr_store.py (2)

151-197: Good targeted test for coordinate-metadata delete idempotence.

test_coordinate_metadata_delete_idempotent nicely exercises repeated deletes for "forecast_reference_time/c/0" and "step/c/0", matching the new _parse_chunk_key exclusions and idempotent metadata delete logic in the store. This should guard against regressions where coordinate deletions start raising again.


1-10: The file is already properly formatted with no uncommitted changes required.

The file tests/test_sharded_zarr_store.py currently has proper formatting that complies with ruff-format standards:

  • Imports are correctly organized (stdlib → third-party → local with blank line separators)
  • No trailing whitespace or formatting issues detected
  • Working tree is clean with no uncommitted changes

The review comment's suggestion to run pre-commit and commit diffs is unnecessary, as the file already meets all formatting requirements.

coderabbitai bot (Contributor) left a comment
Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/test_sharded_zarr_store.py (1)

81-138: Tighten zip() with strict=True in chunk geometry assertion

This test is a solid coverage of 4D forecast datasets and correctly keeps the store geometry aligned with the Dataset. The only concern is the zip() call in the _chunks_per_dim assertion:

assert store_write._chunks_per_dim == tuple(
    math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple)
)

To satisfy Ruff B905 and also surface any future dimension mismatches between array_shape_tuple and chunk_shape_tuple, make the zip strict:

-        assert store_write._chunks_per_dim == tuple(
-            math.ceil(a / c) for a, c in zip(array_shape_tuple, chunk_shape_tuple)
-        )
+        assert store_write._chunks_per_dim == tuple(
+            math.ceil(a / c)
+            for a, c in zip(array_shape_tuple, chunk_shape_tuple, strict=True)
+        )

This keeps the test aligned with the intended invariants and avoids silent truncation if the tuples ever diverge. The async usage here also matches the fully async storage APIs for IPFS-backed stores. Based on learnings, ...

🧹 Nitpick comments (1)
tests/test_sharded_zarr_store.py (1)

152-197: Idempotent delete test is good; optional: assert post-conditions

The test correctly exercises coordinate delete idempotence (double-deleting the same keys without raising), which matches the store’s intended behavior for metadata/chunk deletions.

If you want to make this stronger, you could optionally assert that the coordinate chunks are actually gone after the first delete (e.g., via await store.exists("forecast_reference_time/c/0") is False) before re-deleting, but that’s not strictly necessary for catching regressions in idempotence itself.
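Concretely, the strengthened check might look like this inside the existing async test (a sketch; fixture setup omitted, key names taken from the test described above):

await store.delete("forecast_reference_time/c/0")
assert await store.exists("forecast_reference_time/c/0") is False
# The second delete must be a silent no-op rather than a KeyError.
await store.delete("forecast_reference_time/c/0")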

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 970ed9b and 1927b60.

📒 Files selected for processing (4)
  • py_hamt/sharded_zarr_store.py (4 hunks)
  • tests/test_sharded_store_resizing.py (0 hunks)
  • tests/test_sharded_zarr_store.py (2 hunks)
  • tests/test_sharded_zarr_store_coverage.py (1 hunks)
💤 Files with no reviewable changes (1)
  • tests/test_sharded_store_resizing.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • py_hamt/sharded_zarr_store.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_sharded_zarr_store.py (2)
py_hamt/sharded_zarr_store.py (3)
  • flush (523-566)
  • exists (689-700)
  • delete (714-733)
py_hamt/zarr_hamt_store.py (3)
  • read_only (108-111)
  • exists (193-199)
  • delete (236-248)
🪛 Ruff (0.14.6)
tests/test_sharded_zarr_store.py

137-137: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Create project environment, run all checks (3.12)
🔇 Additional comments (3)
tests/test_sharded_zarr_store_coverage.py (1)

208-211: Coverage for metadata-without-shape path looks good

Exercising store.set() with shape-less metadata here is enough for this coverage test, since the geometry invariants are asserted in test_metadata_without_shape_does_not_resize.

tests/test_sharded_zarr_store.py (2)

2-3: New imports are appropriate and used

json and math are both exercised in the new tests below (metadata encoding and chunk-geometry assertions), so these additions are correct and won’t trip unused-import linting.


198-220: Metadata-without-shape resize guard is well covered

This test cleanly verifies the new behavior: writing a zarr.json metadata blob without a shape field no longer triggers resizing, and _array_shape / _chunks_per_dim stay unchanged. Using JSON encoding via json.dumps and the buffer prototype matches how other metadata paths operate, so this test is aligned with the implementation.
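For reference, the shape-less metadata write might look roughly like this (a sketch, not the actual test body; the default_buffer_prototype import path is assumed from zarr-python 3 and may differ across versions, and the group key is hypothetical):

import json
from zarr.core.buffer import default_buffer_prototype

meta_bytes = json.dumps({"zarr_format": 3, "node_type": "group"}).encode()
buf = default_buffer_prototype().buffer.from_bytes(meta_bytes)
# No "shape" key in this metadata, so the store must not attempt a resize.
await store.set("some_group/zarr.json", buf)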

0xSwego merged commit 9de8375 into main on December 1, 2025 (2 of 3 checks passed).
0xSwego deleted the forecast-ds-support branch on December 1, 2025 at 13:27.
coderabbitai bot (Contributor) left a comment
Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/test_sharded_zarr_store.py (3)

81-152: Forecast/step coordinate test covers the new dataset shape & metadata behavior well

This test does a good job exercising a realistic 4‑D forecast layout, verifying both round‑trip integrity and that forecast_reference_time/step coordinate chunks are persisted as metadata. The geometry assertions using _array_shape and _chunks_per_dim (with zip(..., strict=True)) give strong protection against regressions in chunk layout.

If you expect the internal representation of _array_shape / _chunks_per_dim to evolve, consider adding a small public helper or factory in ShardedZarrStore for these invariants so tests depend less on private attributes. As per coding guidelines, this keeps tests resilient to internal refactors.
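For instance, a small read-only property would keep the assertions public-facing (the name chunk_grid is hypothetical, not part of the current API):

@property
def chunk_grid(self) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Expose (array_shape, chunks_per_dim) without leaking private attrs."""
    return self._array_shape, self._chunks_per_dim

Tests could then assert store.chunk_grid == (array_shape_tuple, expected_chunks) without reaching into private state.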


153-197: Idempotent coordinate metadata delete behavior is validated; consider asserting final state

The test correctly verifies that repeated deletes of forecast_reference_time and step coordinate metadata complete without error, matching the intended idempotent delete implementation.

For a slightly stronger check, you could also assert that await store.exists("forecast_reference_time/c/0") and await store.exists("step/c/0") are False after the deletions (optionally after a flush() and reopen) to confirm state as well as behavior.


199-221: Metadata-without-shape regression is covered; note reliance on private internals

This test cleanly captures the regression condition that metadata JSON lacking a shape must not trigger a resize or error, and the assertions on _array_shape and _chunks_per_dim make the intent explicit.

If _array_shape / _chunks_per_dim are likely to change or be encapsulated further, consider asserting via a small public accessor or helper instead of private attributes so the test remains stable across internal refactors.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 052442e and c8dadf6.

📒 Files selected for processing (1)
  • tests/test_sharded_zarr_store.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_sharded_zarr_store.py (1)
py_hamt/sharded_zarr_store.py (5)
  • ShardedZarrStore (142-876)
  • open (210-241)
  • flush (523-566)
  • exists (689-700)
  • delete (714-733)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Create project environment, run all checks (3.12)
🔇 Additional comments (1)
tests/test_sharded_zarr_store.py (1)

2-3: New stdlib imports are appropriate and used correctly

json and math are both used in the new tests and placed correctly with other standard-library imports; no issues here.
