Skip to content

Conversation

@abidsikder
Copy link
Contributor

@abidsikder abidsikder commented Mar 24, 2025

Add Zarr v3 by reimplementing the Zarr abstract store for normal zarr write as well as encryption

Summary by CodeRabbit

  • New Features

    • Introduced the IPFSZarr3 class for enhanced compatibility with Zarr v3 storage, supporting operations such as reading, writing, appending, and deleting data.
    • Updated metadata handling to specify that "zarr.json" files are ignored during the encryption process.
  • Tests

    • Revamped asynchronous tests to validate data integrity and accurately measure performance improvements during data operations.
  • Chores

    • Updated dependencies to improve overall compatibility and support for asynchronous testing.
    • Revised documentation for clarity and updated usage examples in the README.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 24, 2025

Walkthrough

This pull request introduces a new asynchronous class, IPFSZarr3, that integrates a HAMT-based data storage system into the module. The public interface is updated to export IPFSZarr3. The changes also update transformation logic for zarr metadata files, adjust dependency requirements in pyproject.toml, and replace a synchronous test with an asynchronous version to validate write, append, and read operations.

Changes

Files Change Summary
py_hamt/__init__.py, py_hamt/ipfszarr3.py Added new class IPFSZarr3 (extending zarr.abc.store.Store) with asynchronous methods for data operations using a HAMT, and updated exports.
py_hamt/zarr_encryption_transformers.py Removed the zarr v2 metadata file list and updated logic to ignore "zarr.json" metadata for zarr v3 during encryption.
pyproject.toml Updated dependencies: added "zarr" and "pytest-asyncio>=0.25.3", changed "xarray" version from a fixed version to a minimum version, and removed fixed "zarr" version.
tests/test_zarr_ipfs.py Replaced synchronous test (test_upload_then_read) with an asynchronous test_write_read using IPFSZarr3, updated time measurement, and enhanced assertions.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Function (test_write_read)
    participant Store as IPFSZarr3 Instance
    participant HAMT as HAMT Structure

    Test->>Store: Initiate write operation (set/set_if_not_exists)
    Store->>HAMT: Asynchronously store value
    HAMT-->>Store: Confirm storage
    Store-->>Test: Acknowledge write completion
    Test->>Store: Initiate read operation (get)
    Store->>HAMT: Asynchronously retrieve value
    HAMT-->>Store: Return stored value
    Store-->>Test: Deliver retrieved value
Loading

Possibly related PRs

  • Add authenticated IPFS RPC connections to IPFSStore #35: The changes in the main PR, which involve adding the IPFSZarr3 class and its import in __init__.py, are related to the retrieved PR as both involve modifications to the IPFSStore class and its interaction with IPFS, specifically in the context of handling data storage and retrieval.
  • Add network stats to ipfs store #31: The changes in the main PR, which involve adding the IPFSZarr3 class and its import in __init__.py, are related to the modifications in the retrieved PR that also involve the IPFSStore class, as both are part of the same module and contribute to the overall functionality of the IPFS storage system.
  • Version 2 #7: The changes in the main PR, which involve adding the IPFSZarr3 class and its import in __init__.py, are directly related to the modifications in the retrieved PR that also introduce the IPFSStore class, as both pertain to the storage mechanisms for handling data in the context of IPFS.

Suggested reviewers

  • TheGreatAlgo

Poem

Oh, I’m a hopping coder rabbit, so spry and smart,
With asynchronous leaps, I play my part.
IPFSZarr3 joins my burrow with a joyful boast,
In a game of keys and values, I love to hop and toast!
Carrots of code crunch as I celebrate this new art.
🥕🐇 Happy coding from my heart!

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
pyproject.toml (1)

13-13: Consider pinning or specifying a minimum version for Zarr.
Allowing a free-floating version may cause unexpected issues if breaking changes are introduced.

-    "zarr",
+    "zarr>=2.18.0",
py_hamt/ipfszarr3.py (2)

54-60: Consider clarifying or removing this unimplemented method.
If partial reads are not on the roadmap, removing or adding a clear docstring can avoid confusion.


79-83: Same consideration as for partial reads—clarify or remove if not planned.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f5d0c9 and 577e368.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • py_hamt/__init__.py (2 hunks)
  • py_hamt/ipfszarr3.py (1 hunks)
  • py_hamt/zarr_encryption_transformers.py (1 hunks)
  • pyproject.toml (2 hunks)
  • tests/test_zarr_ipfs.py (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
py_hamt/__init__.py (1)
py_hamt/ipfszarr3.py (1)
  • IPFSZarr3 (9-113)
🪛 Gitleaks (8.21.2)
tests/test_zarr_ipfs.py

128-128: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


133-133: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


138-138: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

210-210: pytest.raises(Exception) should be considered evil

(B017)

🪛 GitHub Actions: Triggered on push from abidsikder to branch/tag zarr-v3
tests/test_zarr_ipfs.py

[error] 231-231: AssertionError: assert False in test_authenticated_gateway. The write_and_check function returned False.


[warning] 203-203: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.


[warning] 207-207: PytestDeprecationWarning: The configuration option 'asyncio_default_fixture_loop_scope' is unset. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope.

🔇 Additional comments (32)
py_hamt/__init__.py (2)

4-4: New import looks good.
No issues with importing IPFSZarr3.


13-13: Exporting IPFSZarr3 in __all__ is appropriate.
This clearly indicates that IPFSZarr3 is part of the public API.

pyproject.toml (2)

31-31: Ensure the upgraded xarray version is fully compatible.
Raising the lower bound to >=2025.1.2 may introduce unexpected incompatibilities if your users rely on older xarray versions.

Would you like me to generate a script to check for xarray usage throughout the codebase and confirm compatibility?


32-32: Validate the pytest-asyncio version for potential deprecation warnings.
Pytest is warning about the asyncio_default_fixture_loop_scope option. Make sure your configuration is updated to avoid breakage in future releases of pytest-asyncio.

py_hamt/zarr_encryption_transformers.py (3)

22-25: Doc changes supporting Zarr v3 are consistent with ignoring zarr.json.
This approach allows reading metadata without requiring decryption keys.


33-35: Corner case check for empty path.
If key is ever an empty string or malformed, p.parts[0] could raise an IndexError. Ensure the calling code always provides a non-empty path.


38-39: Safely ignoring zarr.json is correct for Zarr v3.
This effectively maintains unencrypted metadata for structural introspection.

tests/test_zarr_ipfs.py (10)

4-4: New imports appear essential for async testing and buffer usage.
No immediate issues.

Also applies to: 10-10, 12-12


67-69: Transition to async for the test method is coherent.
Since IPFSZarr3 provides asynchronous operations, this ensures matching async usage. Verify that all relevant I/O is properly awaited or handled if future async additions are needed.


70-107: Writing and reading the dataset via IPFSZarr3 looks good.
The logic to append data along a dimension and then compare halves is solid. Synchronous Xarray calls against an async store are acceptable as long as Xarray calls remain synchronous.


115-123: Coverage enhancements for metadata checks are useful.
Ensures exists, equality checks, and read-only flags are tested.


124-139: Iterating over keys for listing verification.
No issues identified. The async iteration and the set comparison look correct for validating store consistency.

🧰 Tools
🪛 Gitleaks (8.21.2)

128-128: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


133-133: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


138-138: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


140-147: Partial set operations are properly guarded with NotImplementedError.
This keeps the API consistent until partial writes are supported.


148-168: Removing and re-inserting zarr.json tests store mutability well.
Confirms that removing a critical metadata file changes the HAMT key set, then set it back again.


169-175: Re-checking zarr.json after set-if-not-exists.
The logic ensures metadata remains consistent.


177-211: Partial encryption test for precipitation variable.

  1. Excluding coordinate vars allows reading them even without a decryption key.
  2. Consider tightening the exception type in pytest.raises(Exception).

Would you like to specify a more precise exception (e.g., ValueError, CryptoError, etc.) to avoid broad exception catches?

🧰 Tools
🪛 Ruff (0.8.2)

210-210: pytest.raises(Exception) should be considered evil

(B017)

🪛 GitHub Actions: Triggered on push from abidsikder to branch/tag zarr-v3

[warning] 203-203: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.


[warning] 207-207: PytestDeprecationWarning: The configuration option 'asyncio_default_fixture_loop_scope' is unset. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope.


215-216: Authenticated gateway test is a solid approach.
However, the pipeline indicates a failure in assert write_and_check(api_key_store). Ensure local environment or test configuration is correct for successful authentication.

py_hamt/ipfszarr3.py (15)

1-5: Imports look correct and necessary.


9-19: Validate the compatibility of the read_only argument in the base class constructor.
If zarr.abc.store.Store.__init__ does not accept this parameter, this could cause runtime errors.

Please confirm whether the base class indeed supports read_only or remove this argument if not supported.


24-28: Equality check is straightforward and clear.


29-40: Asynchronous get method looks correct.
Partial read logic is clearly documented as unsupported. No further concerns here.


61-63: Simple existence check is appropriate and aligns with store interface.


64-67: Read/write property logic is correct and mirrors the HAMT read-only state.


68-71: Partial writes are not supported, which is explicitly stated.


72-74: Storing bytes in the HAMT is straightforward.


75-78: Beware of potential race conditions.
If multiple tasks call set_if_not_exists concurrently for the same key, both might set the data. Confirm or implement synchronization if needed.


84-87: Delete support is properly tied to read-only mode.


88-90: Delete logic is straightforward.


91-94: Listing support is explicitly indicated.


95-98: Key iteration is correct for the base listing functionality.


99-103: Prefix-based iteration is clear and meets expectations.


104-114: Directory-style listing logic appears correct, but verify duplicate handling.
If multiple keys share the same leading segment, the yielded values may repeat. Confirm the desired behavior.

@codecov-commenter
Copy link

codecov-commenter commented Mar 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (3f5d0c9) to head (8f02031).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #37   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         5    +1     
  Lines          461       528   +67     
=========================================
+ Hits           461       528   +67     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_zarr_ipfs.py (1)

209-210: Use a specific exception type instead of generic Exception.

Using pytest.raises(Exception) is considered poor practice as it might catch unintended exceptions. Specify the expected exception type for more accurate testing.

- with pytest.raises(Exception):
+ with pytest.raises((ValueError, TypeError, zarr.errors.ArrayNotFoundError)):
      ds.precip.sum()
🧰 Tools
🪛 Ruff (0.8.2)

209-209: pytest.raises(Exception) should be considered evil

(B017)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 577e368 and 3e783e6.

📒 Files selected for processing (1)
  • tests/test_zarr_ipfs.py (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
tests/test_zarr_ipfs.py (3)
py_hamt/store.py (1)
  • IPFSStore (64-205)
py_hamt/ipfszarr3.py (6)
  • IPFSZarr3 (9-113)
  • supports_writes (65-66)
  • read_only (21-22)
  • exists (61-62)
  • list_prefix (99-102)
  • delete (88-89)
py_hamt/zarr_encryption_transformers.py (3)
  • create_zarr_encryption_transformers (10-65)
  • encrypt (44-53)
  • decrypt (55-63)
🪛 Gitleaks (8.21.2)
tests/test_zarr_ipfs.py

127-127: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


132-132: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


137-137: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

209-209: pytest.raises(Exception) should be considered evil

(B017)

🔇 Additional comments (6)
tests/test_zarr_ipfs.py (6)

4-12: The updated imports look good.

The addition of time for performance measurement, zarr.core.buffer for buffer operations, and IPFSZarr3 from py_hamt are all appropriate for the new test implementation.


67-91: Good implementation of async Zarr v3 write operations.

The new test function properly sets up the HAMT-based Zarr v3 storage, performs both initial write and append operations, and captures performance metrics using a more accurate time.perf_counter() instead of time.time().

I appreciate the detailed performance tracking and the assertion to verify write capabilities before proceeding.


114-174: Comprehensive test coverage for IPFSZarr3 API.

This test section thoroughly verifies the IPFSZarr3 API behavior including:

  • Existence checks for metadata files
  • Equality operations
  • Feature support flags
  • Key listing and iteration
  • Error handling for unimplemented features
  • Metadata file manipulation

The tests are well-structured and provide excellent coverage of the new functionality.

🧰 Tools
🪛 Gitleaks (8.21.2)

127-127: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


132-132: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


137-137: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


127-137: False positive: These are not actual API keys.

The static analysis tool flagged these lines as containing API keys, but this is a false positive. These lines are simply gathering keys from the HAMT data structure as part of testing.

🧰 Tools
🪛 Gitleaks (8.21.2)

127-127: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


132-132: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


137-137: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


183-196: Good update to the encryption test.

The encryption test has been properly updated to work with IPFSZarr3, including:

  • Expanded exclude_vars list to protect more variables from encryption
  • Correct implementation of the Zarr encryption transformers
  • Proper verification of encrypted and non-encrypted data

The test effectively verifies that encryption works as expected with the new Zarr v3 implementation.


213-224: Successfully adapted authentication tests to IPFSZarr3.

The authenticated gateway test has been properly updated to use the new IPFSZarr3 class while maintaining the same authentication validation logic for API keys, bearer tokens, and basic auth.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/test_zarr_ipfs.py (2)

130-138: Remove duplicate test code.

There are duplicate blocks testing the same functionality. The key listing test at lines 130-133 is identical to the one at lines 124-128, and the list_prefix test at lines 135-138 tests the same functionality.

-    ipfszarr3_keys: set[str] = set()
-    async for k in ipfszarr3.list():
-        ipfszarr3_keys.add(k)
-    assert hamt_keys == ipfszarr3_keys
-
-    ipfszarr3_keys: set[str] = set()
-    async for k in ipfszarr3.list_prefix(""):
-        ipfszarr3_keys.add(k)
-    assert hamt_keys == ipfszarr3_keys
+    # Test list_prefix with empty prefix (should match list())
+    ipfszarr3_keys_prefix: set[str] = set()
+    async for k in ipfszarr3.list_prefix(""):
+        ipfszarr3_keys_prefix.add(k)
+    assert hamt_keys == ipfszarr3_keys_prefix
🧰 Tools
🪛 Gitleaks (8.21.2)

133-133: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


138-138: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


210-211: Use a more specific exception type in the test.

Using a generic Exception in pytest.raises() is considered a bad practice as it could catch unintended exceptions, making the test pass when it should fail.

-    with pytest.raises(Exception):
+    with pytest.raises(ValueError, match="Unable to decode encrypted data"):
         ds.precip.sum()
🧰 Tools
🪛 Ruff (0.8.2)

210-210: pytest.raises(Exception) should be considered evil

(B017)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e783e6 and 9ce6840.

📒 Files selected for processing (1)
  • tests/test_zarr_ipfs.py (2 hunks)
🧰 Additional context used
🪛 Gitleaks (8.21.2)
tests/test_zarr_ipfs.py

128-128: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


133-133: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


138-138: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py

210-210: pytest.raises(Exception) should be considered evil

(B017)

🔇 Additional comments (10)
tests/test_zarr_ipfs.py (10)

4-4: Import changes look good.

The added imports for time, zarr.core.buffer, and IPFSZarr3 support the new async test functionality and performance measurements in the updated test.

Also applies to: 10-10, 12-12


67-80: Well-structured asynchronous test setup.

The test has been properly converted to use async/await pattern with the correct pytest decorator. The setup is clear with appropriate performance measurement using time.perf_counter() which is more precise than other timing methods.

Good job on verifying write support before performing operations and adding detailed print statements for debugging.


80-82: Good testing of both initial write and append operations.

Testing both initial write and append operations provides better coverage of real-world usage patterns. The append operation using mode="a" and append_dim="time" is an excellent addition to verify that the Zarr v3 implementation handles dataset growth correctly.


102-106: Thorough data verification approach.

Splitting the dataset and verifying both halves is an excellent way to test the integrity of the appended data. This ensures that both the initial write and append operations are working correctly.


109-111: Elapsed time calculation is now correct.

Thanks for fixing the elapsed time calculation for the read operation by properly using the read operation's start and end times.


115-128: Good test coverage for metadata and store operation fundamentals.

These assertions verify key functionality of the IPFSZarr3 class including metadata existence, equality comparison, operation support flags, and key listing. This increases confidence in the implementation's correctness.

🧰 Tools
🪛 Gitleaks (8.21.2)

128-128: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


140-146: Good test coverage for unsupported operations.

Testing that unsupported operations correctly raise NotImplementedError is important for ensuring the class properly communicates its capabilities.


148-174: Comprehensive testing of metadata file operations.

The test thoroughly covers getting and setting metadata files, including edge cases like setting a file that already exists and deleting and recreating metadata files. This ensures the Zarr v3 store maintains data integrity.


177-198: Encryption test updated correctly.

The encryption test has been properly updated to use the new IPFSZarr3 class and now excludes more variables from encryption, providing a more comprehensive test of selective encryption.


214-253: Authentication tests updated correctly.

The authenticated gateway tests have been successfully updated to use the new IPFSZarr3 class while maintaining the original test logic for various authentication methods.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
README.md (4)

75-75: Fix missing backtick in class name reference.

There's a missing backtick in the class name reference.

-For an example on how to read and write Zarr v3, check the API documentation and look at the ``IPFSZarr3` class.
+For an example on how to read and write Zarr v3, check the API documentation and look at the `IPFSZarr3` class.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~75-~75: The preposition “of” seems more likely in this position.
Context: ...hamt key/values ... ``` For an example on how to read and write Zarr v3, check th...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)


21-21: Add article "an" before "example."

The sentence is missing the article "an" before "example."

-Below are some examples, but for more information see the [API documentation](https://dclimate.github.io/py-hamt/py_hamt.html). Each major item has example code. You can also see this library used in [Jupyter notebooks for data analysis](https://github.com/dClimate/jupyter-notebooks).
+Below are some examples, but for more information see the [API documentation](https://dclimate.github.io/py-hamt/py_hamt.html). Each major item has an example code. You can also see this library used in [Jupyter notebooks for data analysis](https://github.com/dClimate/jupyter-notebooks).
🧰 Tools
🪛 LanguageTool

[uncategorized] ~21-~21: You might be missing the article “an” here.
Context: ...hamt/py_hamt.html). Each major item has example code. You can also see this library use...

(AI_EN_LECTOR_MISSING_DETERMINER_AN)


75-75: Consider using preposition "of" instead of "on."

The preposition "of" seems more appropriate in this context.

-For an example on how to read and write Zarr v3, check the API documentation and look at the ``IPFSZarr3` class.
+For an example of how to read and write Zarr v3, check the API documentation and look at the ``IPFSZarr3` class.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~75-~75: The preposition “of” seems more likely in this position.
Context: ...hamt key/values ... ``` For an example on how to read and write Zarr v3, check th...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)


123-123: Add a comma after "For more information."

For better readability, add a comma after "For more information".

-Use `uv add` and `uv remove`, e.g. `uv add numpy` or `uv add pytest --group dev`. For more information please see the [uv documentation](https://docs.astral.sh/uv/guides/projects/).
+Use `uv add` and `uv remove`, e.g. `uv add numpy` or `uv add pytest --group dev`. For more information, please see the [uv documentation](https://docs.astral.sh/uv/guides/projects/).
🧰 Tools
🪛 LanguageTool

[typographical] ~123-~123: Consider adding a comma here.
Context: ...ytest --group dev`. For more information please see the [uv documentation](https://docs...

(PLEASE_COMMA)

py_hamt/ipfszarr3.py (4)

83-88: Add documentation for not implemented method.

The get_partial_values method raises NotImplementedError but doesn't explain why it's not implemented or when it might be available.

    async def get_partial_values(
        self,
        prototype: zarr.core.buffer.BufferPrototype,
        key_ranges: Iterable[tuple[str, zarr.abc.store.ByteRequest | None]],
    ) -> list[zarr.core.buffer.Buffer | None]:
-        raise NotImplementedError
+        raise NotImplementedError("Partial value reads are not currently supported in IPFSZarr3")

108-111: Add documentation for not implemented method.

Similar to the get_partial_values method, the set_partial_values method should provide context for why it's not implemented.

    async def set_partial_values(
        self, key_start_values: Iterable[tuple[str, int, BytesLike]]
    ) -> None:
-        raise NotImplementedError
+        raise NotImplementedError("Partial value writes are not currently supported in IPFSZarr3")

128-132: Add specific return type to the list_prefix method.

The return type annotation for list_prefix lacks the specific type parameter.

-    async def list_prefix(self, prefix: str) -> AsyncIterator:
+    async def list_prefix(self, prefix: str) -> AsyncIterator[str]:
         for key in self.hamt:
             if key.startswith(prefix):
                 yield key

133-142: Add specific return type to the list_dir method.

The return type annotation for list_dir lacks the specific type parameter.

-    async def list_dir(self, prefix: str) -> AsyncIterator:
+    async def list_dir(self, prefix: str) -> AsyncIterator[str]:
         for key in self.hamt:
             if key.startswith(prefix):
                 suffix = key[len(prefix) :]
                 first_slash = suffix.find("/")
                 if first_slash == -1:
                     yield suffix
                 else:
                     name = suffix[0:first_slash]
                     yield name
py_hamt/zarr_encryption_transformers.py (1)

31-31: Use a more secure example for the encryption key.

While the code includes a comment to change the encryption key before using, it might be better to use a more secure example that doesn't potentially encourage insecure practices.

-    encryption_key = bytes(32) # change before using, only for demonstration purposes!
+    encryption_key = get_random_bytes(32) # Generate a secure random key for production use
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ce6840 and 8f02031.

📒 Files selected for processing (3)
  • README.md (4 hunks)
  • py_hamt/ipfszarr3.py (1 hunks)
  • py_hamt/zarr_encryption_transformers.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~21-~21: You might be missing the article “an” here.
Context: ...hamt/py_hamt.html). Each major item has example code. You can also see this library use...

(AI_EN_LECTOR_MISSING_DETERMINER_AN)


[uncategorized] ~75-~75: The preposition “of” seems more likely in this position.
Context: ...hamt key/values ... ``` For an example on how to read and write Zarr v3, check th...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)


[typographical] ~123-~123: Consider adding a comma here.
Context: ...ytest --group dev`. For more information please see the [uv documentation](https://docs...

(PLEASE_COMMA)

🔇 Additional comments (12)
README.md (3)

10-13: LGTM! Great clarification of the project's purpose.

The updated description clearly explains the relationship to the original IAMap project and the purpose of the library for storing Zarr data on IPFS.


26-29: LGTM! Import updated to reflect new storage mechanism.

The change from DictStore to IPFSStore properly aligns with the new Zarr v3 functionality being introduced.


75-77: LGTM! Good documentation addition for Zarr v3 functionality.

The references to the API documentation for Zarr v3 and encrypted Zarrs functionality provide clear direction for users.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~75-~75: The preposition “of” seems more likely in this position.
Context: ...hamt key/values ... ``` For an example on how to read and write Zarr v3, check th...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)

py_hamt/ipfszarr3.py (6)

9-48: LGTM! Well-documented class with clear examples.

The IPFSZarr3 class is well-documented with comprehensive examples for both reading and writing Zarr data. The initialization logic correctly handles the read-only mode setting.


58-88: LGTM! Good implementation of the get method with partial read comments.

The get method correctly implements the Zarr v3 interface for retrieving values. The commented-out code for partial reads serves as documentation for potential future implementation.


93-95: LGTM! Clear implementation of the property.

The supports_writes property correctly checks if writes are supported based on the read-only state of the HAMT.


101-106: LGTM! Well-implemented set methods.

The set and set_if_not_exists methods are correctly implemented to handle storing values in the HAMT.


113-118: LGTM! Correct support for deletion.

The supports_deletes property and delete method are properly implemented to handle key removal when not in read-only mode.


120-142: LGTM! Good implementation of listing methods.

The supports_listing property and the three listing methods (list, list_prefix, and list_dir) are well-implemented to provide different key enumeration options.

py_hamt/zarr_encryption_transformers.py (3)

22-23: LGTM! Updated documentation for Zarr v3 metadata handling.

The comment clearly explains that zarr.json metadata files in Zarr v3 are ignored during encryption, allowing structure calculation without the encryption key.


24-50: LGTM! Excellent documentation with example code.

The expanded documentation provides clear guidance on using encryption transformers, particularly for creating partially encrypted Zarrs. The example code is comprehensive and illustrates the functionality well.


56-66: LGTM! Updated logic for Zarr v3 directory structure.

The updated _should_transform function correctly handles the Zarr v3 directory structure by checking the first part of the path for exclusions and specifically looking for "zarr.json" metadata files.

@abidsikder abidsikder merged commit d8ea6e1 into main Mar 25, 2025
2 checks passed
@abidsikder abidsikder deleted the zarr-v3 branch March 25, 2025 21:10
@coderabbitai coderabbitai bot mentioned this pull request Jun 12, 2025
@coderabbitai coderabbitai bot mentioned this pull request Aug 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants