Skip to content

Conversation

@Faolain
Copy link
Contributor

@Faolain Faolain commented Jun 2, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a fully asynchronous HAMT implementation with improved modularity, concurrency, and async APIs for all operations.
    • Added ZarrHAMTStore for native Zarr v3 storage using HAMT, supporting async key-value operations and metadata caching.
    • Added SimpleEncryptedZarrHAMTStore for end-to-end encrypted Zarr v3 storage with authenticated encryption.
    • Refactored IPFS integration with a new async KuboCAS content-addressed store using aiohttp for efficient concurrent access.
  • Bug Fixes

    • Improved error handling and validation in content-addressed stores and encryption key management.
  • Documentation

    • Condensed and clarified the README, focusing on external documentation and streamlined usage instructions.
  • Tests

    • Refactored and expanded the test suite for full async support, including new tests for encryption, caching, performance, and IPFS integration.
    • Added utilities and fixtures for robust IPFS and async testing environments.
  • Chores

    • Updated dependencies and project metadata, including support for Python async features and removal of deprecated modules.
    • Improved workflow automation and configuration for CI, pre-commit hooks, and test environments.
  • Refactor

    • Major internal redesign of HAMT, content-addressed store interfaces, and Zarr storage adapters for clarity, modularity, and async operation.
    • Removed deprecated or redundant code and legacy interfaces.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jun 2, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update is a major refactor and feature release. It replaces the synchronous HAMT and IPFS store implementations with fully asynchronous, modular versions, introduces new encrypted and plain Zarr v3 store backends using HAMT, and overhauls the test suite for async operation. Several legacy files and partial encryption support are removed, and new performance and coverage tests are added. Project metadata and CI workflows are updated accordingly.

Changes

File(s) Change Summary
.github/workflows/pages-main.yaml, .github/workflows/run-checks.yaml Updated CI workflows: explicit permissions, job split, action version bumps, removed Nginx setup, separated validation and testing, removed linting/formatting checks.
.gitignore Added .DS_Store to ignore list.
.pre-commit-config.yaml, pytest.ini Added pre-commit and pytest async configuration files.
README.md Condensed, removed usage examples, focused on API docs, updated dev/testing instructions.
pyproject.toml Version bump to 3.0.0, updated description, dependencies (add aiohttp, upgrade zarr, remove requests).
py_hamt/init.py Updated exports: removed IPFSZarr3 and encryption transformers, added ZarrHAMTStore and SimpleEncryptedZarrHAMTStore, updated imports.
py_hamt/hamt.py Full async refactor: new node structure, async APIs, modular node storage, improved concurrency, removed sync code.
py_hamt/store.py Refactored: Store → ContentAddressedStore (async), DictStore → InMemoryCAS, IPFSStore → KuboCAS (async, aiohttp-based), removed sync methods.
py_hamt/zarr_hamt_store.py, py_hamt/encryption_hamt_store.py New: ZarrHAMTStore and SimpleEncryptedZarrHAMTStore classes for plain/encrypted Zarr v3 storage on HAMT.
performance_tests/cache_with_ipfsstore.py, py_hamt/ipfszarr3.py, py_hamt/zarr_encryption_transformers.py Deleted: legacy performance test, IPFSZarr3, and partial encryption transformers.
tests/conftest.py, tests/testing_utils.py Added: global aiohttp client session fixture, IPFS test helpers, Hypothesis strategies, ephemeral IPFS daemon fixture.
tests/performance_tests.py New async performance test for large HAMT insertions.
tests/test_branch_anchors.py, tests/test_extract_bits.py, tests/test_hamt.py Added/updated: async branch/caching tests, type annotations, full async HAMT test suite.
tests/test_kubo_cas.py, tests/test_kubocas_auth.py, tests/test_kubocas_session.py Added: async tests for KuboCAS, session/auth handling, error cases.
tests/test_zarr_ipfs.py, tests/test_zarr_ipfs_encrypted.py Refactored/added: async Zarr-on-IPFS tests using new stores, encryption coverage, removed old test infra.

Sequence Diagram(s)

ZarrHAMTStore: Key Operations

sequenceDiagram
    participant User
    participant ZarrHAMTStore
    participant HAMT
    participant CAS

    User->>ZarrHAMTStore: set(key, value)
    ZarrHAMTStore->>HAMT: set(key, value)
    HAMT->>CAS: save(node/bytes)
    CAS-->>HAMT: CID
    HAMT-->>ZarrHAMTStore: (ack)
    ZarrHAMTStore-->>User: (ack)

    User->>ZarrHAMTStore: get(key)
    ZarrHAMTStore->>HAMT: get(key)
    HAMT->>CAS: load(CID)
    CAS-->>HAMT: bytes
    HAMT-->>ZarrHAMTStore: value
    ZarrHAMTStore-->>User: value
Loading

SimpleEncryptedZarrHAMTStore: Encrypted Read/Write

sequenceDiagram
    participant User
    participant SimpleEncryptedZarrHAMTStore
    participant HAMT
    participant CAS

    User->>SimpleEncryptedZarrHAMTStore: set(key, value)
    SimpleEncryptedZarrHAMTStore->>SimpleEncryptedZarrHAMTStore: encrypt(value)
    SimpleEncryptedZarrHAMTStore->>HAMT: set(key, encrypted_value)
    HAMT->>CAS: save(node/bytes)
    CAS-->>HAMT: CID
    HAMT-->>SimpleEncryptedZarrHAMTStore: (ack)
    SimpleEncryptedZarrHAMTStore-->>User: (ack)

    User->>SimpleEncryptedZarrHAMTStore: get(key)
    SimpleEncryptedZarrHAMTStore->>HAMT: get(key)
    HAMT->>CAS: load(CID)
    CAS-->>HAMT: encrypted_bytes
    HAMT-->>SimpleEncryptedZarrHAMTStore: encrypted_bytes
    SimpleEncryptedZarrHAMTStore->>SimpleEncryptedZarrHAMTStore: decrypt(encrypted_bytes)
    SimpleEncryptedZarrHAMTStore-->>User: value
Loading

Possibly related PRs

  • dClimate/py-hamt#34: Removes create_zarr_encryption_transformers, which is also removed here; both PRs are directly related to encryption transformer support.
  • dClimate/py-hamt#37: Adds IPFSZarr3, which is removed and replaced by ZarrHAMTStore in this PR; the changes are directly connected.
  • dClimate/py-hamt#42: Introduces ephemeral IPFS daemon and legacy test infra, which are removed and replaced by async test infra in this PR; both PRs are closely related in test infrastructure.

Suggested reviewers

  • TheGreatAlgo

Poem

In burrows deep, with bytes and keys,
The HAMT now hops with async ease!
No more old tunnels, just shiny new stores—
Zarr and IPFS, with encryption galore.
Tests now scamper, performance in tow,
This rabbit’s code garden continues to grow!
🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Faolain Faolain changed the base branch from main to refactor June 2, 2025 06:12
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue with the default authentication session handling in KuboCAS. Key changes include:

  • Adding tests in tests/test_kubocas_auth.py to validate behavior with both a user-supplied and an internally created session.
  • Updating KuboCAS in py_hamt/store.py to properly store and use default headers and authentication when creating an internal session.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
tests/test_kubocas_auth.py Adds tests to ensure that user-supplied session and default headers/auth are used correctly.
py_hamt/store.py Updates KuboCAS to correctly handle default auth and header parameters when creating an internal session.

@Faolain
Copy link
Contributor Author

Faolain commented Jun 2, 2025

@abidsikder looks like twhen this ran it has a test failure seemingly unrelated to the code that was added. Would you be able to check to see if this is related to any timing issues or something deeper?

=========================== short test summary info ============================
FAILED tests/test_kubo_cas.py::test_kubo_cas - hypothesis.errors.FlakyFailure: Hypothesis test_kubo_cas(create_ipfs=('http://127.0.0.1:40017/', 'http://127.0.0.1:41317/'), data=1.1148856933881808e-147) produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)
Falsifying example: test_kubo_cas(
    create_ipfs=('http://127.0.0.1:40017/', 'http://127.0.0.1:41317/'),
    data=1.1148856933881808e-147,
)
Unreliable test timings! On an initial run, this test took 739.10ms, which exceeded the deadline of 500.00ms, but on a subsequent run it took 13.46 ms, which did not. If you expect this sort of variability in your test timings, consider turning deadlines off for this test by setting deadline=None.

You can reproduce this example by temporarily adding @reproduce_failure('6.131.30', b'AEEDKCFsgumntIxl') as a decorator on your test case
================== 1 failed, 18 passed, 3 warnings in 43.01s ===================

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 10

🔭 Outside diff range comments (1)
tests/test_zarr_ipfs_encrypted.py (1)

76-206: 🛠️ Refactor suggestion

Split this large test into smaller, focused test functions.

This test function is doing too much (64 statements, 24 local variables). Consider splitting it into separate test functions for better maintainability and clarity:

  • test_encrypted_write_read_success
  • test_encrypted_read_with_wrong_key
  • test_encrypted_read_without_decryption
  • test_encrypted_store_capabilities
  • test_encrypted_store_unsupported_operations
🧰 Tools
🪛 Ruff (0.11.9)

201-201: Do not assert blind exception: Exception

(B017)


204-204: Do not assert blind exception: Exception

(B017)

🪛 Pylint (3.3.7)

[refactor] 76-76: Too many local variables (24/15)

(R0914)


[warning] 78-78: Redefining name 'random_zarr_dataset' from outer scope (line 14)

(W0621)


[refactor] 106-106: Redundant comparison - ezhs_write == ezhs_write

(R0124)


[refactor] 177-177: Redundant comparison - ezhs_read_ok == ezhs_read_ok

(R0124)


[refactor] 76-76: Too many statements (64/50)

(R0915)

🧹 Nitpick comments (25)
tests/conftest.py (1)

1-6: Consider adding module docstring for clarity.

While the re-exports are correctly marked with noqa and serve their purpose, adding a brief module docstring would help clarify this file's role in the test infrastructure.

+"""Shared test fixtures and utilities for the py_hamt test suite."""
+
 import aiohttp
 import pytest
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)


[error] 1-1: Unable to import 'aiohttp'

(E0401)


[warning] 5-5: Unused ipld_strategy imported from testing_utils

(W0611)


[warning] 5-5: Unused create_ipfs imported from testing_utils

(W0611)

tests/performance_tests.py (2)

7-12: Fix import order and line length issues.

Standard library imports should come before third-party imports, and some lines exceed the 100-character limit.

Apply this diff to fix the import order:

-import pytest
-import time
-import asyncio
+import asyncio
+import time

+import pytest
🧰 Tools
🪛 Pylint (3.3.7)

[error] 11-11: Unable to import 'py_hamt'

(E0401)


[error] 12-12: Unable to import 'py_hamt.store'

(E0401)


[convention] 8-8: standard import "time" should be placed before third party import "pytest"

(C0411)


[convention] 9-9: standard import "asyncio" should be placed before third party import "pytest"

(C0411)


20-20: Consider using snake_case for variable naming.

The variable LARGE_KV_SET_SIZE doesn't follow Python's snake_case convention for local variables. Constants at module level typically use UPPER_CASE, but local variables should use snake_case.

-    LARGE_KV_SET_SIZE: int = 1_000_000
+    large_kv_set_size: int = 1_000_000

And update the references accordingly:

-        *[hamt.set(str(k_int), k_int) for k_int in range(LARGE_KV_SET_SIZE)]
+        *[hamt.set(str(k_int), k_int) for k_int in range(large_kv_set_size)]
-        == LARGE_KV_SET_SIZE
+        == large_kv_set_size
-    for k_int in range(LARGE_KV_SET_SIZE):
+    for k_int in range(large_kv_set_size):
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 20-20: Variable name "LARGE_KV_SET_SIZE" doesn't conform to snake_case naming style

(C0103)

README.md (2)

43-43: Fix grammar issue.

The phrase "both...as well as" is grammatically incorrect. Use "both...and" instead.

-We use `pytest` with 100% code coverage, and with test inputs that are both handwritten as well as generated by `hypothesis`.
+We use `pytest` with 100% code coverage, and with test inputs that are both handwritten and generated by `hypothesis`.
🧰 Tools
🪛 LanguageTool

[grammar] ~43-~43: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...

(BOTH_AS_WELL_AS)


45-50: Fix blockquote formatting.

Remove the blank line inside the blockquote to fix the markdown formatting issue.

> [!NOTE]
> Due to the randomized test inputs, it is possible sometimes to get 99% or lower test coverage by pure chance. Rerun the tests to get back complete code coverage. If this happens on a GitHub action, try rerunning the action.
-
> [!NOTE]
> Due to the restricted performance on GitHub actions runners, you may also sometimes see hypothesis tests running with errors because they exceeded test deadlines. Rerun the action if this happens.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

47-47: Blank line inside blockquote
null

(MD028, no-blanks-blockquote)

tests/test_kubocas_auth.py (1)

8-9: Consider using asyncio.iscoroutine for better specificity.

The _maybe_await helper function works but could be more specific about what it's checking for.

-async def _maybe_await(x):
-    return await x if inspect.isawaitable(x) else x
+import asyncio
+
+async def _maybe_await(x):
+    return await x if asyncio.iscoroutine(x) else x
tests/test_kubo_cas.py (1)

111-132: Remove commented out code.

This large block of commented code should be removed to keep the codebase clean.

-# @given(data=ipld_strategy())
-# @settings(deadline=1000)
-# @pytest.mark.asyncio
-# async def test_kubo_default_urls(data: IPLDKind):
-#     try:
-#         async with KuboCAS() as kubo_cas:
-#             for codec in ("raw", "dag-cbor"):
-#                 cid = await kubo_cas.save(dag_cbor.encode(data), codec=codec)
-#                 result = dag_cbor.decode(await kubo_cas.load(cid))
-#                 assert data == result
-
-#         async with KuboCAS(gateway_base_url=None, rpc_base_url=None) as kubo_cas:
-#             for codec in ("raw", "dag-cbor"):
-#                 cid = await kubo_cas.save(dag_cbor.encode(data), codec=codec)
-#                 result = dag_cbor.decode(await kubo_cas.load(cid))
-#                 assert data == result
-#     finally:
-#         # if Hypothesis cancels early, make sure every open CAS is closed
-#         for obj in list(globals().values()):
-#             if isinstance(obj, KuboCAS):
-#                 await obj.aclose()
tests/test_kubocas_session.py (3)

1-8: Add module docstring to describe the test module's purpose.

The module is missing documentation that explains what it tests.

+"""Tests for KuboCAS session management across different asyncio event loops."""
 import asyncio
 import inspect
 from threading import Thread, Event
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)


[error] 7-7: Unable to import 'py_hamt.store'

(E0401)


26-27: Consider the implications of testing protected members.

The tests are accessing protected members (_loop_session, _session_per_loop) of the KuboCAS class. While this provides thorough testing of internal behavior, it creates tight coupling between tests and implementation details. Consider whether these should be exposed as public API or if the tests should focus on observable behavior through public interfaces.

Also applies to: 49-49, 68-68, 90-90

🧰 Tools
🪛 Pylint (3.3.7)

[warning] 26-26: Access to a protected member _loop_session of a client class

(W0212)


[warning] 27-27: Access to a protected member _loop_session of a client class

(W0212)


56-79: Consider improving thread management and error handling.

The helper function works correctly but could benefit from more robust error handling and cleanup.

 async def _session_in_new_loop(kubo: KuboCAS):
     """Spawn a *separate* thread with its own event loop, obtain a session for
     that loop and hand both the session and the thread‑local loop object
     back to the caller."""
     ready = Event()
-    container: list = []  # will receive (session, loop)
+    result = []
+    error = []
 
     def _worker():
         new_loop = asyncio.new_event_loop()
         asyncio.set_event_loop(new_loop)
 
         async def _get():
             return await _maybe_await(kubo._loop_session())
 
         try:
             session = new_loop.run_until_complete(_get())
-            container.append((session, new_loop))
+            result.append((session, new_loop))
+        except Exception as e:
+            error.append(e)
         finally:
             ready.set()
 
     t = Thread(target=_worker, daemon=True)
     t.start()
     ready.wait()
-    return container[0]
+    
+    if error:
+        raise error[0]
+    return result[0]
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 68-68: Access to a protected member _loop_session of a client class

(W0212)

tests/test_zarr_ipfs_encrypted.py (3)

1-11: Add module docstring.

+"""Tests for SimpleEncryptedZarrHAMTStore functionality."""
 import json
 import numpy as np
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)


[error] 2-2: Unable to import 'numpy'

(E0401)


[error] 3-3: Unable to import 'pandas'

(E0401)


[error] 4-4: Unable to import 'xarray'

(E0401)


[error] 6-6: Unable to import 'zarr'

(E0401)


[error] 7-7: Unable to import 'zarr.core.buffer'

(E0401)


[error] 8-8: Unable to import 'Crypto.Random'

(E0401)


[error] 9-9: Unable to import 'py_hamt'

(E0401)


[error] 10-10: Unable to import 'py_hamt.zarr_hamt_store'

(E0401)


41-41: Remove print statements from tests.

Print statements in tests can clutter test output. Consider removing them or using proper logging if debugging information is needed.

-            print("\nTesting with a 31-byte key...")
             SimpleEncryptedZarrHAMTStore(hamt_write, False, short_key, header)
-            print("Failed as expected.")

         # --- Test with a key that is too long (33 bytes) ---
         long_key = get_random_bytes(33)
         with pytest.raises(
             ValueError, match="Encryption key must be exactly 32 bytes long."
         ):
-            print("Testing with a 33-byte key...")
             SimpleEncryptedZarrHAMTStore(hamt_write, False, long_key, header)
-            print("Failed as expected.")

         # --- Test with an empty key ---
         empty_key = b""
         with pytest.raises(
             ValueError, match="Encryption key must be exactly 32 bytes long."
         ):
-            print("Testing with an empty key...")
             SimpleEncryptedZarrHAMTStore(hamt_write, False, empty_key, header)
-            print("Failed as expected.")

         # --- Test with a correct key (should NOT raise) ---
         correct_key = get_random_bytes(32)
         try:
-            print("Testing with a 32-byte key...")
             SimpleEncryptedZarrHAMTStore(hamt_write, False, correct_key, header)
-            print("Initialized successfully (as expected).")
         except ValueError as e:
             pytest.fail(f"Should NOT raise ValueError for a 32-byte key, but got: {e}")

Also applies to: 43-43, 50-50, 52-52, 59-59, 61-61, 66-66, 68-68


99-99: Remove print statements from tests.

Consider using pytest's -s flag or proper logging instead of print statements for debugging.

Also applies to: 114-114, 121-121, 130-130, 132-132, 149-149, 172-172

tests/testing_utils.py (4)

1-16: Add module docstring.

+"""Testing utilities for IPFS-related tests including Hypothesis strategies and IPFS daemon fixtures."""
 import json
 import os
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)


[error] 13-13: Unable to import 'hypothesis'

(E0401)


[error] 14-14: Unable to import 'hypothesis.strategies'

(E0401)


[error] 15-15: Unable to import 'multiformats'

(E0401)


33-44: Add function docstring.

 def ipld_strategy() -> SearchStrategy:
+    """Generate random IPLD data structures for testing."""
     return st.one_of(
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 33-33: Missing function or method docstring

(C0116)


57-60: Add function docstring.

 def find_free_port() -> int:
+    """Find and return an available TCP port on the host."""
     with socket.socket() as s:
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 57-57: Missing function or method docstring

(C0116)


63-64: Add fixture docstring.

 @pytest.fixture(scope="module")
 def create_ipfs() -> Generator[Tuple[str, str], None, None]:
+    """Create a temporary IPFS daemon instance for testing.
+    
+    Yields:
+        Tuple of (rpc_base_url, gateway_base_url) for the IPFS daemon.
+    """
     # Create temporary directory, set it as the IPFS Path
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 64-64: Missing function or method docstring

(C0116)

tests/test_hamt.py (1)

194-199: Minor style improvement: avoid Yoda conditions.

Rewrite the conditions in a more natural order for better readability.

-    assert b"bar" == await hamt.get("foo")
+    assert await hamt.get("foo") == b"bar"
     assert (await hamt.len()) == 1

     await hamt.set("foo", bytes("something else", "utf-8"))
     assert (await hamt.len()) == 1
-    assert b"something else" == (await hamt.get("foo"))
+    assert (await hamt.get("foo")) == b"something else"
🧰 Tools
🪛 Ruff (0.11.9)

194-194: Yoda condition detected

Rewrite as await hamt.get("foo") == b"bar"

(SIM300)


199-199: Yoda condition detected

Rewrite as (await hamt.get("foo")) == b"something else"

(SIM300)

py_hamt/encryption_hamt_store.py (1)

1-7: Consider reorganizing imports according to PEP 8 convention.

The imports should be ordered as: standard library, third-party, then local imports.

+from typing import cast
+
 import zarr.abc.store
 import zarr.core.buffer
-from py_hamt.hamt import HAMT
 from Crypto.Cipher import ChaCha20_Poly1305
 from Crypto.Random import get_random_bytes
+
+from py_hamt.hamt import HAMT
 from py_hamt.zarr_hamt_store import ZarrHAMTStore
-from typing import cast
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)


[error] 1-1: Unable to import 'zarr.abc.store'

(E0401)


[error] 2-2: Unable to import 'zarr.core.buffer'

(E0401)


[error] 4-4: Unable to import 'Crypto.Cipher'

(E0401)


[error] 5-5: Unable to import 'Crypto.Random'

(E0401)


[convention] 4-4: third party import "Crypto.Cipher.ChaCha20_Poly1305" should be placed before first party import "py_hamt.hamt.HAMT"

(C0411)


[convention] 5-5: third party import "Crypto.Random.get_random_bytes" should be placed before first party import "py_hamt.hamt.HAMT"

(C0411)


[convention] 7-7: standard import "typing.cast" should be placed before third party imports "zarr.abc.store", "zarr.core.buffer", "Crypto.Cipher.ChaCha20_Poly1305", "Crypto.Random.get_random_bytes" and first party imports "py_hamt.hamt.HAMT", "py_hamt.zarr_hamt_store.ZarrHAMTStore"

(C0411)


[convention] 6-6: Imports from package py_hamt are not grouped

(C0412)

py_hamt/zarr_hamt_store.py (1)

153-153: Remove unnecessary parentheses.

-if not (await self.exists(key)):
+if not await self.exists(key):
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 153-153: Unnecessary parens after 'not' keyword

(C0325)

py_hamt/store.py (1)

257-260: Combine nested context managers for cleaner code.

-async with self._sem:  # throttle gateway
-    async with self._loop_session().get(url) as resp:
-        resp.raise_for_status()
-        return await resp.read()
+async with self._sem, self._loop_session().get(url) as resp:  # throttle gateway
+    resp.raise_for_status()
+    return await resp.read()
🧰 Tools
🪛 Ruff (0.11.9)

257-258: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

py_hamt/hamt.py (4)

156-156: Use a more specific exception type.

Consider using a more specific exception like RuntimeError or creating a custom exception class for better error handling.

Apply this diff:

-        raise Exception("Node was attempted to be written to the read cache")
+        raise RuntimeError("Node was attempted to be written to the read cache")
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 156-156: Raising too general exception: Exception

(W0719)


495-500: Simplify conditional assignment with ternary operator.

The if-else block can be simplified for better readability.

Apply this diff:

-        data: bytes
-        if self.values_are_bytes:
-            data = cast(
-                bytes, val
-            )  # let users get an exception if they pass in a non bytes when they want to skip encoding
-        else:
-            data = dag_cbor.encode(val)
+        data: bytes = cast(bytes, val) if self.values_are_bytes else dag_cbor.encode(val)
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 498-498: Line too long (103/100)

(C0301)

🪛 Ruff (0.11.9)

495-500: Use ternary operator data = cast(bytes, val) if self.values_are_bytes else dag_cbor.encode(val) instead of if-else-block

(SIM108)


556-557: Use a more specific exception type for consistency.

Similar to the previous comment, use a more specific exception type.

Apply this diff:

-        if self.read_only:
-            raise Exception("Cannot call delete on a read only HAMT")
+        if self.read_only:
+            raise RuntimeError("Cannot call delete on a read only HAMT")
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 557-557: Raising too general exception: Exception

(W0719)


491-492: Use a more specific exception type for consistency.

Use RuntimeError instead of generic Exception for better error handling.

Apply this diff:

-        if self.read_only:
-            raise Exception("Cannot call set on a read only HAMT")
+        if self.read_only:
+            raise RuntimeError("Cannot call set on a read only HAMT")
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 492-492: Raising too general exception: Exception

(W0719)

🛑 Comments failed to post (10)
py_hamt/__init__.py (1)

2-4: 💡 Verification agent

🧩 Analysis chain

Verify syntax error in store module.

The static analysis indicates a syntax error in py_hamt.store at line 23, which would prevent these imports from working correctly.


🏁 Script executed:

#!/bin/bash
# Description: Check the syntax of the store module and identify the specific error at line 23

# Check if the store.py file exists and examine line 23
if fd -t f -e py store.py py_hamt/; then
    echo "Found store.py files:"
    fd -t f -e py store.py py_hamt/ --exec echo "File: {}"
    
    # Show content around line 23 to identify syntax error
    fd -t f -e py store.py py_hamt/ --exec sed -n '20,26p' {} \; --exec echo "--- Line 23 context for: {} ---"
fi

# Try to validate Python syntax
fd -t f -e py store.py py_hamt/ --exec python -m py_compile {} \;

Length of output: 2035


py_hamt/store.py: fix invalid type-alias syntax

The import refactor is correct, but there’s a Python syntax error at line 23 in py_hamt/store.py. Python doesn’t use the type keyword for aliases—this prevents the module from compiling.

• In py_hamt/store.py (around line 23), change:

-    type CodecInput = Literal["raw", "dag-cbor"]
+    CodecInput = Literal["raw", "dag-cbor"]

• Ensure you have from typing import Literal (or, if you wish to use a TypeAlias, also import TypeAlias and write

CodecInput: TypeAlias = Literal["raw", "dag-cbor"]
```)
• Re-run the static checker (`python -m py_compile`) to confirm no further syntax errors.

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 Pylint (3.3.7)</summary>

[error] 2-2: Cannot import 'store' due to 'invalid syntax (py_hamt.store, line 23)'

(E0001)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

In py_hamt/store.py around line 23, there is a syntax error due to incorrect use
of the 'type' keyword for type aliasing. Replace the invalid syntax by importing
'Literal' from 'typing' and define the alias as 'CodecInput = Literal["raw",
"dag-cbor"]'. Alternatively, import 'TypeAlias' from 'typing' and define it as
'CodecInput: TypeAlias = Literal["raw", "dag-cbor"]'. After making this change,
run 'python -m py_compile' to ensure the syntax error is resolved.


</details>

</blockquote></details>
<details>
<summary>pyproject.toml (1)</summary><blockquote>

8-8: _💡 Verification agent_

<details>
<summary>🧩 Analysis chain</summary>

**Verify the aiohttp version requirement.**

The specific version requirement `aiohttp>=3.11.18` should be verified to ensure it exists and doesn't have known security vulnerabilities.

---

🌐 Web query:

```web
What is the latest stable version of aiohttp and are there any known security issues with version 3.11.18?

💡 Result:

As of June 2, 2025, the latest stable version of aiohttp is 3.11.14, released on March 16, 2025. (docs.aiohttp.org)

Regarding version 3.11.18, it does not appear in the available release history. The most recent release in the 3.11.x series is 3.11.14. (docs.aiohttp.org)

Additionally, according to the Safety DB, the latest version of aiohttp with no known security vulnerabilities is 4.0.0a1. (data.safetycli.com)

Therefore, if you're using version 3.11.18, it would be advisable to upgrade to the latest stable release, 3.11.14, or consider upgrading to version 4.0.0a1 for improved security.

Citations:


Update invalid aiohttp version requirement

The specifier aiohttp>=3.11.18 references a non-existent release. As of June 2025, the latest stable 3.11.x is 3.11.14 (no known security issues), so please adjust accordingly:

• File: pyproject.toml
• Line: 8

- aiohttp>=3.11.18
+ aiohttp>=3.11.14
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    "aiohttp>=3.11.14",
🤖 Prompt for AI Agents
In pyproject.toml at line 8, the aiohttp version specifier "aiohttp>=3.11.18"
references a non-existent release. Update this to "aiohttp>=3.11.14" to reflect
the latest stable 3.11.x version available as of June 2025, ensuring
compatibility and avoiding invalid version requirements.
tests/test_zarr_ipfs_encrypted.py (1)

201-205: 🛠️ Refactor suggestion

Avoid catching bare Exception - be more specific.

Catching bare Exception can hide unexpected errors. Be more specific about what exceptions you expect.

-        with pytest.raises(Exception):
+        with pytest.raises(NotImplementedError):
             await ezhs_read_ok.set("new_key", np.array([b"a"], dtype=np.bytes_))
 
-        with pytest.raises(Exception):
+        with pytest.raises(NotImplementedError):
             await ezhs_read_ok.delete("zarr.json")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        with pytest.raises(NotImplementedError):
            await ezhs_read_ok.set("new_key", np.array([b"a"], dtype=np.bytes_))

        with pytest.raises(NotImplementedError):
            await ezhs_read_ok.delete("zarr.json")
🧰 Tools
🪛 Ruff (0.11.9)

201-201: Do not assert blind exception: Exception

(B017)


204-204: Do not assert blind exception: Exception

(B017)

🤖 Prompt for AI Agents
In tests/test_zarr_ipfs_encrypted.py around lines 201 to 205, the code catches a
bare Exception which can obscure unexpected errors. Identify the specific
exception types that the set and delete operations are expected to raise and
replace the generic Exception in pytest.raises with those specific exceptions to
improve error handling clarity.
tests/testing_utils.py (2)

82-83: 🛠️ Refactor suggestion

Specify encoding when opening files.

Always specify encoding to avoid platform-dependent behavior.

     config_path: Path = temp_dir / "config"
     config: dict[str, Any]
-    with open(config_path, "r") as f:
+    with open(config_path, "r", encoding="utf-8") as f:
         config = json.load(f)

     # ... other code ...

-    with open(config_path, "w") as f:
+    with open(config_path, "w", encoding="utf-8") as f:
         json.dump(config, f, indent=2)

Also applies to: 99-100

🧰 Tools
🪛 Pylint (3.3.7)

[warning] 82-82: Using open without explicitly specifying an encoding

(W1514)

🤖 Prompt for AI Agents
In tests/testing_utils.py at lines 82-83 and also lines 99-100, the open() calls
do not specify the file encoding, which can lead to platform-dependent behavior.
Fix this by adding the encoding='utf-8' parameter to all open() calls in these
lines to ensure consistent file reading across different environments.

106-115: 💡 Verification agent

🧩 Analysis chain

Improve process management and startup detection.

The current implementation has two issues:

  1. subprocess.Popen should use a context manager or proper cleanup
  2. Fixed time.sleep(5) is fragile and may be insufficient on slow systems
-    ipfs_process: subprocess.Popen[bytes] = subprocess.Popen(
-        ["ipfs", "daemon"], env=custom_env
-    )
-    # Should be enough time for the ipfs daemon process to start up
-    time.sleep(5)
+    ipfs_process: subprocess.Popen[bytes] = subprocess.Popen(
+        ["ipfs", "daemon"], env=custom_env
+    )
+    
+    # Wait for daemon to be ready by polling the API
+    max_attempts = 30
+    for i in range(max_attempts):
+        try:
+            # Check if API is responsive
+            import urllib.request
+            with urllib.request.urlopen(f"{rpc_uri_stem}/api/v0/version", timeout=1):
+                break
+        except Exception:
+            if i == max_attempts - 1:
+                ipfs_process.kill()
+                raise RuntimeError("IPFS daemon failed to start within 30 seconds")
+            time.sleep(1)

Would you like me to provide a more robust implementation using subprocess context managers and proper health checking?


🏁 Script executed:

#!/bin/bash
set -e
rg --context 3 "rpc_uri_stem" -n tests/testing_utils.py

Length of output: 543


🏁 Script executed:

#!/bin/bash
set -e
sed -n '1,200p' tests/testing_utils.py

Length of output: 3573


Enhance IPFS daemon startup detection and ensure proper cleanup

The current fixture in tests/testing_utils.py (around lines 106–115) uses a fixed time.sleep(5) and doesn’t wrap the daemon in a context or guarantee termination on errors. Both of these can lead to flakes or orphaned processes.

Please update as follows:

  • Wrap the daemon in a context (or a try/finally) so it’s always terminated.
  • Replace the fixed sleep with a health-check loop against {rpc_uri_stem}/api/v0/version.

Suggested diff:

-    ipfs_process: subprocess.Popen[bytes] = subprocess.Popen(
-        ["ipfs", "daemon"], env=custom_env
-    )
-    # Should be enough time for the ipfs daemon process to start up
-    time.sleep(5)
-
-    yield rpc_uri_stem, gateway_uri_stem
-
-    # Close the daemon
-    ipfs_process.kill()
+    # Start IPFS daemon and ensure cleanup
+    with subprocess.Popen(["ipfs", "daemon"], env=custom_env) as ipfs_process:
+        # Poll the API until it responds or timeout
+        import urllib.request
+        max_attempts = 30
+        for attempt in range(max_attempts):
+            try:
+                with urllib.request.urlopen(f"{rpc_uri_stem}/api/v0/version", timeout=1):
+                    break
+            except Exception:
+                if attempt == max_attempts - 1:
+                    ipfs_process.kill()
+                    raise RuntimeError("IPFS daemon failed to start within 30 seconds")
+                time.sleep(1)
+
+        yield rpc_uri_stem, gateway_uri_stem
+    # Exiting the context will terminate the daemon

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 106-108: Consider using 'with' for resource-allocating operations

(R1732)

🤖 Prompt for AI Agents
In tests/testing_utils.py around lines 106 to 115, replace the fixed
time.sleep(5) with a loop that repeatedly checks the IPFS daemon's health by
querying {rpc_uri_stem}/api/v0/version until it responds or a timeout occurs.
Also, wrap the subprocess.Popen call in a try/finally block or use a context
manager to ensure the IPFS daemon process is always properly terminated, even if
errors occur during startup or test execution.
.github/workflows/run-checks.yaml (1)

32-34: 💡 Verification agent

🧩 Analysis chain

Verify the pinned uv version is stable and secure.

The action is pinned to a specific commit hash, which is good for security. However, ensure this commit represents a stable release.


🌐 Web query:

What is the latest stable version of astral-sh/setup-uv GitHub Action and are there any security advisories?

💡 Result:

The latest stable version of the astral-sh/setup-uv GitHub Action is v5.3.0, released on June 1, 2025. This release introduces support for MUSL-based systems, including Alpine Linux, and adds compatibility for s390x and powerpc architectures. Additionally, it enhances debugging capabilities with more detailed logs. (newreleases.io)

As of now, there are no published security advisories for this action. The project's security overview indicates that no security policy has been set up, and there are no published security advisories. (github.com)

For more details on the latest release, you can refer to the GitHub repository's releases page. (github.com)

Citations:


Ensure you’re using a stable release of setup-uv

The workflow currently pins astral-sh/setup-uv to a commit on the unreleased “v6” branch, but the latest stable release is v5.3.0 (no security advisories exist for that version). Please either:

  • Update the workflow to:
        uses: astral-sh/setup-uv@v5.3.0
        with:
          version: "latest"
  • Or add a comment justifying the use of the v6 commit (e.g., necessary features or patches) and confirm its stability with the project maintainers.

• File: .github/workflows/run-checks.yaml (lines 32–34)

🤖 Prompt for AI Agents
In .github/workflows/run-checks.yaml around lines 32 to 34, the action
astral-sh/setup-uv is pinned to a commit on the unreleased v6 branch, which may
not be stable. Update the workflow to use the latest stable release by changing
the uses line to "astral-sh/setup-uv@v5.3.0" while keeping the version input as
"latest". Alternatively, if v6 is required, add a comment explaining why and
confirm its stability with the maintainers.
tests/test_zarr_ipfs.py (1)

114-115: ⚠️ Potential issue

Remove redundant self-comparison.

The comparison zhs == zhs is redundant and doesn't test meaningful equality.

-        assert zhs == zhs
         assert zhs != hamt
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        assert zhs != hamt
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 114-114: Redundant comparison - zhs == zhs

(R0124)

🤖 Prompt for AI Agents
In tests/test_zarr_ipfs.py at lines 114 to 115, remove the redundant assertion
`assert zhs == zhs` as it compares the variable to itself and does not provide
any meaningful test. Keep the assertion `assert zhs != hamt` which tests useful
inequality.
py_hamt/encryption_hamt_store.py (1)

172-173: 🛠️ Refactor suggestion

Use a more specific exception type.

Raising a generic Exception is discouraged. Consider using a more specific exception type.

 if self.read_only:
-    raise Exception("Cannot write to a read-only store.")
+    raise ValueError("Cannot write to a read-only store.")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        if self.read_only:
            raise ValueError("Cannot write to a read-only store.")
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 173-173: Raising too general exception: Exception

(W0719)

🤖 Prompt for AI Agents
In py_hamt/encryption_hamt_store.py at lines 172-173, replace the generic
Exception raised when attempting to write to a read-only store with a more
specific exception type, such as RuntimeError or a custom exception class that
clearly indicates the operation is not allowed due to read-only status. This
improves error clarity and handling.
py_hamt/store.py (1)

23-23: ⚠️ Potential issue

Fix type alias syntax for Python compatibility.

The type alias syntax is causing a parsing error. Use the compatible syntax:

-type CodecInput = Literal["raw", "dag-cbor"]
+CodecInput = Literal["raw", "dag-cbor"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    CodecInput = Literal["raw", "dag-cbor"]
🧰 Tools
🪛 Pylint (3.3.7)

[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'

(E0001)

🤖 Prompt for AI Agents
In py_hamt/store.py at line 23, the type alias syntax using '=' is causing a
parsing error in Python. Replace the current type alias declaration with the
compatible syntax using 'TypeAlias' from the typing module, ensuring the alias
is defined as 'CodecInput: TypeAlias = Literal["raw", "dag-cbor"]' to fix the
syntax issue.
py_hamt/hamt.py (1)

122-124: ⚠️ Potential issue

Chain the exception for better error context.

When re-raising exceptions, use from exc to preserve the original exception chain for better debugging.

Apply this diff:

-        except:  # noqa: E722
-            raise Exception(
-                "Invalid dag-cbor encoded data from the store was attempted to be decoded"
-            )
+        except Exception as exc:
+            raise Exception(
+                "Invalid dag-cbor encoded data from the store was attempted to be decoded"
+            ) from exc
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        except Exception as exc:
            raise Exception(
                "Invalid dag-cbor encoded data from the store was attempted to be decoded"
            ) from exc
🧰 Tools
🪛 Pylint (3.3.7)

[warning] 122-124: Consider explicitly re-raising using 'except Exception as exc' and 'raise Exception('Invalid dag-cbor encoded data from the store was attempted to be decoded') from exc'

(W0707)


[warning] 122-124: Raising too general exception: Exception

(W0719)

🪛 Ruff (0.11.9)

122-124: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In py_hamt/hamt.py around lines 122 to 124, the exception is raised without
chaining the original exception, which loses the error context. Modify the raise
statement to use 'raise Exception(...) from exc' where 'exc' is the caught
exception variable, so the original exception is preserved and debugging is
improved.

@codecov-commenter
Copy link

codecov-commenter commented Jun 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (abcedaf) to head (c11324f).

Additional details and impacted files
@@            Coverage Diff             @@
##           refactor       #55   +/-   ##
==========================================
  Coverage    100.00%   100.00%           
==========================================
  Files             5         5           
  Lines           591       593    +2     
==========================================
+ Hits            591       593    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@abidsikder abidsikder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea that interops with the current aiohttp session management 👍

@Faolain Faolain merged commit b843127 into refactor Jun 2, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants