refactor: HAMT to use async and other performance optimizations #44

abidsikder · 2025-05-19T22:25:08Z

This includes many breaking changes, hence the major version update to 3.0.0. The high level overview is that this library was rewritten for better performance for large key-value mapping sets. Major changes to note include:

smaller internal HAMT node size and simpler Node structure
async everywhere, which allows for concurrent writes of large data blocks before actually modifying the tree
an in memory HAMT that can be flushed at the end of many writes
removing zarr encryption transformers in favor of letting users setting unserialized bytes directly. This lets users handle any of their own transformations and encryption if they want
kept the read cache
zarr store interface with the HAMT that takes advantage of all of these optimizations

Addresses #12

Summary by CodeRabbit

New Features
- Introduced asynchronous HAMT implementation with improved concurrency, memory management, and simplified internal structure.
- Added ZarrHAMTStore and SimpleEncryptedZarrHAMTStore for async Zarr v3 storage backed by HAMT, including full encryption support.
- Added async content-addressed store interfaces: ContentAddressedStore, KuboCAS (IPFS-backed), and InMemoryCAS.
- Enhanced GitHub Actions workflows for deployment and CI with improved permissions and job structuring.
Bug Fixes
- Improved error handling and removed legacy/deprecated features.
Refactor
- Major API changes: replaced and renamed several public classes and methods for clarity and async support.
- Internal codebase modernized to async/await paradigm; removed synchronous interfaces and legacy transformer hooks.
- Simplified README to focus on high-level descriptions and external documentation.
Chores
- Updated dependencies: added aiohttp, updated zarr and xarray specs, added mypy and pandas-stubs.
- Added .DS_Store to .gitignore.
- Added pre-commit configuration for code quality enforcement.
- Improved test setup with external IPFS fixture and shared HTTP session.
Tests
- Converted all tests to async, added new performance and integration tests including encrypted Zarr storage.
- Improved test coverage for concurrency, error cases, and store implementations.
- Added utilities for generating IPLD test data and managing IPFS daemon lifecycle.
- Added tests for session management and authentication handling in IPFS client.
- Removed manual IPFS daemon management in tests, relying on Docker or existing daemons.
Revert
- Removed encryption utilities and old Zarr integration classes (IPFSZarr3, encryption transformers).

…ory tree, incomplete

…ed nodes

…th testing

… store

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (3)

py_hamt/store.py (1)
23-23: Fix the type alias syntax to resolve parsing error.

The current type alias syntax is causing a Pylint parsing error. Use the standard typing syntax instead.
-    type CodecInput = Literal["raw", "dag-cbor"]
+    CodecInput = Literal["raw", "dag-cbor"]
🧰 Tools

🪛 Pylint (3.3.7)

[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'

(E0001)
tests/test_zarr_ipfs.py (1)

125-138: Consider consolidating duplicate test logic.

The test logic for verifying list() functionality is duplicated across lines 125-133. While testing robustness is valuable, this duplication could be streamlined.

The pattern of collecting keys from zhs.list() multiple times is redundant. Consider consolidating or adding comments explaining why multiple identical assertions are necessary.
py_hamt/zarr_hamt_store.py (1)
91-118: ⚠️ Potential issue

The byte_range parameter is ignored – prevents efficient partial reads.

Zarr may request slices of a chunk via byte_range. Returning the full buffer forces callers to load and allocate the entire blob, negating the benefit of the HAMT backing store for very large chunks.

At minimum, honor the range when values are bytes:
-            return prototype.buffer.from_bytes(val)
+            if byte_range:
+                start, stop = byte_range
+                val = val[start:stop]
+            return prototype.buffer.from_bytes(val)
If partial reads are intentionally unsupported, expose supports_partial_reads = False (mirroring how supports_partial_writes is declared).

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 100-100: Line too long (138/100)

(C0301)

[convention] 110-110: Line too long (103/100)

(C0301)

[convention] 116-116: Line too long (102/100)

(C0301)

[warning] 95-95: Unused argument 'byte_range'

(W0613)

🧹 Nitpick comments (4)

py_hamt/store.py (1)
266-267: Consider combining nested context managers.

The static analysis suggests combining the nested async with statements for cleaner code.
-        async with self._sem:  # throttle gateway
-            async with self._loop_session().get(url) as resp:
+        async with self._sem, self._loop_session().get(url) as resp:  # throttle gateway
🧰 Tools

🪛 Ruff (0.11.9)

266-267: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)
README.md (1)
50-50: Fix grammar: use "and" instead of "as well as" after "both".
-We use `pytest` with 100% code coverage, and with test inputs that are both handwritten as well as generated by `hypothesis`.
+We use `pytest` with 100% code coverage, and with test inputs that are both handwritten and generated by `hypothesis`.
🧰 Tools

🪛 LanguageTool

[grammar] ~50-~50: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...

(BOTH_AS_WELL_AS)
tests/test_zarr_ipfs.py (1)
61-181: Consider breaking down the test function for better maintainability.

This test function contains 80+ statements and 23+ local variables, making it difficult to maintain and debug. Consider splitting it into focused test functions:
@pytest.mark.asyncio(loop_scope="session")
async def test_zarr_write_to_ipfs(create_ipfs, random_zarr_dataset):
    """Test writing zarr dataset to IPFS and getting CID."""
    # Write logic here
    
@pytest.mark.asyncio(loop_scope="session") 
async def test_zarr_read_from_ipfs(create_ipfs, random_zarr_dataset):
    """Test reading zarr dataset from IPFS using CID."""
    # Read logic here
    
@pytest.mark.asyncio(loop_scope="session")
async def test_zarr_store_capabilities(create_ipfs):
    """Test store capability flags and listing operations."""
    # Store capability tests here
This would improve test isolation, debugging, and maintenance.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 62-62: Missing function or method docstring

(C0116)

[refactor] 62-62: Too many local variables (23/15)

(R0914)

[warning] 64-64: Redefining name 'random_zarr_dataset' from outer scope (line 18)

(W0621)

[refactor] 115-115: Redundant comparison - zhs == zhs

(R0124)

[refactor] 62-62: Too many statements (80/50)

(R0915)
py_hamt/zarr_hamt_store.py (1)
153-153: Remove unnecessary parentheses.
-        if not (await self.exists(key)):
+        if not await self.exists(key):
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 153-153: Unnecessary parens after 'not' keyword

(C0325)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abcedaf and 43fa124.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

README.md (2 hunks)
py_hamt/store.py (1 hunks)
py_hamt/zarr_hamt_store.py (1 hunks)
pyproject.toml (2 hunks)
run-checks.sh (1 hunks)
tests/test_kubocas_auth.py (1 hunks)
tests/test_zarr_ipfs.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

pyproject.toml

🧰 Additional context used

🧠 Learnings (1)

py_hamt/zarr_hamt_store.py (1)

Learnt from: abidsikder
PR: dClimate/py-hamt#44
File: py_hamt/zarr_hamt_store.py:180-192
Timestamp: 2025-05-30T15:30:59.357Z
Learning: In the ZarrHAMTStore.list_dir method in py_hamt/zarr_hamt_store.py, there's a potential for duplicate directory names to be yielded when multiple files share the same subdirectory. While this could be fixed with a set to track yielded names, this approach would consume significant memory for large datasets (potentially gigabytes for millions of keys like ERA5 zarr). The memory usage concern outweighs the fix unless it's confirmed that zarr actually calls list_dir on data variable directories in practice, rather than using assumed chunk keys from zarr.json metadata.

🧬 Code Graph Analysis (3)

tests/test_zarr_ipfs.py (4)

py_hamt/hamt.py (7)

HAMT (287-696)

build (385-394)

make_read_only (397-408)

keys (664-678)

get (592-599)

enable_write (410-417)

delete (552-590)

py_hamt/store.py (2)

KuboCAS (74-269)

InMemoryCAS (37-71)

py_hamt/zarr_hamt_store.py (8)

ZarrHAMTStore (10-251)

read_only (81-83)

exists (127-133)

list_prefix (189-193)

get (91-117)

set_if_not_exists (151-154)

delete (167-177)

list_dir (195-251)

tests/testing_utils.py (1)

create_ipfs (64-118)

tests/test_kubocas_auth.py (1)

py_hamt/store.py (3)

KuboCAS (74-269)

_loop_session (214-226)

aclose (231-235)

py_hamt/zarr_hamt_store.py (2)

py_hamt/hamt.py (4)

HAMT (287-696)

get (592-599)

len (686-696)

keys (664-678)

py_hamt/encryption_hamt_store.py (1)

get (144-168)

🪛 Pylint (3.3.7)

tests/test_zarr_ipfs.py

[error] 7-7: Unable to import 'zarr'

(E0401)

[error] 8-8: Unable to import 'zarr.core.buffer'

(E0401)

[error] 9-9: Unable to import 'dag_cbor.ipld'

(E0401)

[error] 11-11: Unable to import 'py_hamt'

(E0401)

[error] 13-13: Unable to import 'py_hamt.zarr_hamt_store'

(E0401)

[error] 14-14: Unable to import 'py_hamt.store'

(E0401)

[convention] 62-62: Missing function or method docstring

(C0116)

[refactor] 62-62: Too many local variables (23/15)

(R0914)

[warning] 64-64: Redefining name 'random_zarr_dataset' from outer scope (line 18)

(W0621)

[refactor] 115-115: Redundant comparison - zhs == zhs

(R0124)

[refactor] 62-62: Too many statements (80/50)

(R0915)

[convention] 184-184: Missing function or method docstring

(C0116)

tests/test_kubocas_auth.py

[convention] 1-1: Missing module docstring

(C0114)

[error] 2-2: Unable to import 'aiohttp'

(E0401)

[error] 5-5: Unable to import 'py_hamt.store'

(E0401)

[convention] 13-13: Missing function or method docstring

(C0116)

[warning] 16-16: Access to a protected member _loop_session of a client class

(W0212)

[convention] 25-25: Missing function or method docstring

(C0116)

[warning] 27-27: Access to a protected member _loop_session of a client class

(W0212)

[warning] 44-44: Access to a protected member _loop_session of a client class

(W0212)

[warning] 67-67: Access to a protected member _loop_session of a client class

(W0212)

[convention] 4-4: standard import "inspect" should be placed before third party imports "aiohttp", "pytest"

(C0411)

py_hamt/zarr_hamt_store.py

[convention] 20-20: Line too long (195/100)

(C0301)

[convention] 25-25: Line too long (249/100)

(C0301)

[convention] 58-58: Line too long (251/100)

(C0301)

[convention] 59-59: Line too long (106/100)

(C0301)

[convention] 62-62: Line too long (110/100)

(C0301)

[convention] 65-65: Line too long (434/100)

(C0301)

[convention] 74-74: Line too long (113/100)

(C0301)

[convention] 100-100: Line too long (138/100)

(C0301)

[convention] 110-110: Line too long (103/100)

(C0301)

[convention] 116-116: Line too long (102/100)

(C0301)

[convention] 153-153: Unnecessary parens after 'not' keyword

(C0325)

[convention] 171-171: Line too long (142/100)

(C0301)

[convention] 175-175: Line too long (142/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Unable to import 'zarr.abc.store'

(E0401)

[error] 4-4: Unable to import 'zarr.core.buffer'

(E0401)

[error] 5-5: Unable to import 'zarr.core.common'

(E0401)

[warning] 95-95: Unused argument 'byte_range'

(W0613)

py_hamt/store.py

[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'

(E0001)

🪛 Ruff (0.11.9)

py_hamt/store.py

266-267: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

🪛 LanguageTool

README.md

[grammar] ~50-~50: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...

(BOTH_AS_WELL_AS)

🪛 markdownlint-cli2 (0.17.2)

README.md

54-54: Blank line inside blockquote
null

(MD028, no-blanks-blockquote)

🔇 Additional comments (17)

run-checks.sh (1)

9-10: LGTM! Good consolidation of code quality checks.

Replacing separate ruff check and ruff format commands with pre-commit provides better consistency between local development and CI workflows.
py_hamt/store.py (5)

1-9: LGTM! Proper async imports added.

The imports correctly reflect the shift to async programming with asyncio and aiohttp replacing the synchronous requests library.

37-71: Excellent async conversion with robust error handling.

The InMemoryCAS implementation demonstrates good practices:

Proper async method signatures

Defensive type checking with clear error messages

Good use of cast with detailed explanation for Liskov substitution

Improved exception handling with chained exceptions

74-210: Well-designed session management for async operations.

The KuboCAS class successfully addresses the previous blocking requests issue by:

Using aiohttp.ClientSession for true async HTTP operations

Implementing per-event-loop session management to handle zarr's separate thread execution

Providing flexible authentication via user-supplied sessions or constructor parameters

Including comprehensive documentation with clear usage examples

246-269: Excellent async HTTP implementation with proper throttling.

The save and load methods demonstrate best practices:

Proper use of aiohttp for non-blocking operations

Semaphore-based concurrency throttling

Correct form data handling for IPFS uploads

Error handling with raise_for_status()

The implementation successfully resolves the previous issue of blocking requests in async methods.

🧰 Tools

🪛 Ruff (0.11.9)

266-267: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

231-241:
✅ Verification successful

Verify async context manager test coverage.

The async context manager methods (aclose, __aenter__, __aexit__) provide proper session lifecycle management. However, past review comments indicated these methods lacked test coverage.

🏁 Script executed:
#!/bin/bash
# Description: Check if async context manager methods are now covered by tests
# Expected: Find test coverage for these methods in the new test files

echo "Searching for async context manager tests..."
rg -A 5 -B 5 "async with.*KuboCAS|aclose|__aenter__|__aexit__" tests/

echo -e "\nSearching for session cleanup tests..."
rg -A 5 -B 5 "_owns_session|_session_per_loop" tests/
Length of output: 15874
Async context manager methods are now fully covered by tests

I’ve confirmed that the __aenter__, __aexit__, and aclose() behaviors are exercised across multiple test modules:

tests/test_kubocas_session.py

tests/test_kubocas_auth.py

tests/test_zarr_ipfs.py

tests/test_zarr_ipfs_encrypted.py

tests/test_kubo_cas.py

No further changes are needed here.
README.md (2)

33-41: Good addition of pre-commit setup instructions.

The explicit commands for activating the virtual environment and installing pre-commit hooks align well with the consolidated approach in run-checks.sh.

52-56: Good use of blockquotes for important notes.

The highlighted notes about test coverage variability and GitHub Actions timeouts provide helpful context for developers and CI troubleshooting.

🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

54-54: Blank line inside blockquote
null

(MD028, no-blanks-blockquote)

tests/test_kubocas_auth.py (3)

8-9: Smart helper function for handling async variations.

The _maybe_await utility elegantly handles both awaitable and non-awaitable returns, making the tests more robust.

13-31: Excellent test coverage for session management scenarios.

These tests thoroughly verify the critical authentication and session lifecycle behaviors:

User-supplied sessions are reused without modification

External sessions are not closed by aclose()

Internal sessions are properly configured and closed

This addresses the test coverage gaps mentioned in previous reviews.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 13-13: Missing function or method docstring

(C0116)

[warning] 16-16: Access to a protected member _loop_session of a client class

(W0212)

[convention] 25-25: Missing function or method docstring

(C0116)

[warning] 27-27: Access to a protected member _loop_session of a client class

(W0212)

34-73: Comprehensive authentication testing.

The test suite covers both aiohttp.BasicAuth scenarios:

External sessions preserve authentication unchanged

Internal sessions correctly apply constructor-provided auth

Proper session ownership and cleanup behavior

This provides confidence in the authentication handling refactor.

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 44-44: Access to a protected member _loop_session of a client class

(W0212)

[warning] 67-67: Access to a protected member _loop_session of a client class

(W0212)

tests/test_zarr_ipfs.py (2)

71-89: Excellent async refactoring with proper resource management.

The migration to async context management with KuboCAS and the new HAMT.build() pattern is well implemented. The resource cleanup via context manager and proper session handling addresses the session management challenges mentioned in the PR objectives.

183-192: Good addition of deduplication test.

The new test_list_dir_dedup function properly tests the directory listing deduplication functionality, which was a known concern from previous reviews.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 184-184: Missing function or method docstring

(C0116)

py_hamt/zarr_hamt_store.py (4)

10-51: Excellent documentation and API design.

The class documentation is comprehensive with clear usage examples and important notes about read-only mode handling. The sample code snippet effectively demonstrates the write/read workflow.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 20-20: Line too long (195/100)

(C0301)

[convention] 25-25: Line too long (249/100)

(C0301)

53-78: Good validation in constructor with helpful documentation.

The assertions ensuring HAMT read-only state matches the store state and values_are_bytes=True prevent common configuration errors. The metadata cache explanation is valuable for understanding performance optimizations.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 58-58: Line too long (251/100)

(C0301)

[convention] 59-59: Line too long (106/100)

(C0301)

[convention] 62-62: Line too long (110/100)

(C0301)

[convention] 65-65: Line too long (434/100)

(C0301)

[convention] 74-74: Line too long (113/100)

(C0301)

195-251: list_dir deduplication implemented with memory considerations.

Based on retrieved learnings, this implementation correctly addresses duplicate directory name issues but may consume significant memory for large datasets (potentially gigabytes for millions of keys like ERA5 zarr).

The current approach using a set for deduplication is appropriate unless confirmed that zarr calls list_dir on data variable directories in practice rather than using assumed chunk keys from zarr.json metadata. The thorough documentation explains the behavior clearly.

145-149: Efficient metadata cache management in write operations.

The implementation correctly updates the metadata cache when writing to metadata files, maintaining consistency between the cache and HAMT storage.

…kip integration tests

chore: add sort to ruff to standardize import order

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (9)

tests/test_kubo_cas.py (1)
46-109: ⚠️ Potential issue

Test still requires external IPFS daemon - use isolated fixture instead.

This test has the same brittleness issue identified in the previous review. It assumes a Kubo daemon is running on default ports 127.0.0.1:5001/8080, which will cause failures in CI or developer environments without a running daemon.

Use the existing create_ipfs fixture to ensure an isolated daemon:
 @pytest.mark.asyncio(loop_scope="session")
 @given(data=ipld_strategy())
 @settings(
     deadline=1000, print_blob=True
 )  # Increased deadline, print_blob for debugging
-async def test_kubo_default_urls(
-    global_client_session, data: IPLDKind
-):  # Inject the session fixture
+async def test_kubo_default_urls(
+    create_ipfs, global_client_session, data: IPLDKind
+):  # Inject both fixtures
     """
-    Tests KuboCAS using its default URLs and when None is passed for URLs,
+    Tests KuboCAS using isolated IPFS daemon URLs and when None is passed for URLs,
     leveraging a globally managed aiohttp.ClientSession.
     """
+    rpc_base_url, gateway_base_url = create_ipfs
     # Test Case 1: KuboCAS instantiated without explicit URLs (should use its defaults)
     # We pass the managed global_client_session to it.
-    # KuboCAS itself is responsible for having default URLs if none are provided.
-    async with KuboCAS(session=global_client_session) as kubo_cas_default:
+    async with KuboCAS(
+        rpc_base_url=rpc_base_url, 
+        gateway_base_url=gateway_base_url,
+        session=global_client_session
+    ) as kubo_cas_default:
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 62-62: Line too long (127/100)

(C0301)

[convention] 92-92: Line too long (128/100)

(C0301)

[warning] 82-82: Catching too general exception Exception

(W0718)

[warning] 107-107: Catching too general exception Exception

(W0718)
tests/testing_utils.py (1)
56-60: ⚠️ Potential issue

Critical TOCTOU race condition needs fixing.

This function has a Time-of-Check-Time-of-Use race where the port can be taken by another process between when the socket closes and when IPFS daemon starts.

Apply this fix to eliminate the race condition:
-def find_free_port() -> int:
-    with socket.socket() as s:
-        s.bind(("", 0))  # Bind to a free port provided by the host.
-        return int(s.getsockname()[1])  # Return the port number assigned.
+def find_free_port() -> tuple[int, socket.socket]:
+    """Reserve a free TCP port and keep the socket open until the caller
+    explicitly closes it, eliminating the TOCTOU window."""
+    s = socket.socket()
+    s.bind(("", 0))
+    port = s.getsockname()[1]
+    return port, s
You'll need to update the callers to handle the returned socket and close it after the IPFS daemon starts.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 56-56: Missing function or method docstring

(C0116)
tests/test_hamt.py (1)
52-83: Remove misleading comment about lock acquisition.

The comment about callers handling lock acquisition is incorrect for this context since the test operates in read-only mode where no locking is needed.
-    # Make sure all pointers actually exist in the store, this should not raise any exceptions
+    # Verify that all pointers exist in the store (no locking needed in read-only mode)
     for k, _ in kvs:
         pointer: bytes = cast(
             bytes, await hamt.get_pointer(k)
         )  # we know that InMemoryCAS only returns bytes for its IDs
         await cas.load(pointer)
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 52-52: Line too long (170/100)

(C0301)

[convention] 55-55: Line too long (116/100)

(C0301)
py_hamt/encryption_hamt_store.py (2)

32-32: Documentation inconsistency: metadata caching is implemented.

The documentation states "There is no metadata caching", but the implementation actually includes metadata caching (see lines 111, 156, 167, 179).

77-79: Sample code uses incorrect HAMT instantiation.

The sample shows direct HAMT instantiation, but based on the async architecture, it should use await HAMT.build().

py_hamt/zarr_hamt_store.py (1)

92-119: byte_range parameter is ignored – prevents efficient partial reads

Zarr may request slices of a chunk via byte_range. Returning the full buffer forces callers to load and allocate the entire blob, negating the benefit of the HAMT backing store for very large chunks.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 101-101: Line too long (138/100)

(C0301)

[convention] 111-111: Line too long (103/100)

(C0301)

[convention] 117-117: Line too long (102/100)

(C0301)

[warning] 96-96: Unused argument 'byte_range'

(W0613)

py_hamt/store.py (2)

23-23: Fix the type alias syntax.

The type alias syntax on line 23 is causing a parsing error. Use the traditional type alias syntax for compatibility.

🧰 Tools

🪛 Pylint (3.3.7)

[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'

(E0001)

231-241: Add test coverage for async context manager methods.

The async context manager functionality needs proper test coverage to ensure session cleanup works correctly.
py_hamt/hamt.py (1)
121-124: Fix exception handling with explicit chaining.

This issue was flagged in past reviews and needs to be addressed. The bare except clause masks the original error and hampers debugging.

Apply the previously suggested fix:
-        except:  # noqa: E722
-            raise Exception(
-                "Invalid dag-cbor encoded data from the store was attempted to be decoded"
-            )
+        except Exception as err:
+            raise ValueError(
+                "Invalid DAG-CBOR data returned by CAS (unable to decode Node)"
+            ) from err
🧰 Tools

🪛 Pylint (3.3.7)

[warning] 122-124: Consider explicitly re-raising using 'except Exception as exc' and 'raise Exception('Invalid dag-cbor encoded data from the store was attempted to be decoded') from exc'

(W0707)

[warning] 122-124: Raising too general exception: Exception

(W0719)

🧹 Nitpick comments (15)

py_hamt/__init__.py (1)
6-13: Consider adding a module docstring.

Given this is a major version update with breaking changes, a module docstring explaining the new API structure would be helpful for users upgrading from v2.x.
+"""
+py-hamt v3.0.0 - Async Hash Array Mapped Trie with Content-Addressed Storage
+
+Major changes in v3.0.0:
+- Full async/await support throughout the library
+- Content-addressed storage interface
+- Performance optimizations for large datasets
+- Breaking changes from v2.x synchronous API
+"""
+
 from .encryption_hamt_store import SimpleEncryptedZarrHAMTStore
tests/conftest.py (1)
1-1: Consider adding a module docstring.

A docstring explaining the purpose of this test configuration module would be helpful.
+"""
+Test configuration and shared fixtures for py-hamt test suite.
+
+Provides async HTTP session management and re-exports testing utilities
+for backward compatibility with existing test imports.
+"""
+
 import aiohttp
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 1-1: Unable to import 'aiohttp'

(E0401)
tests/performance_tests.py (1)
16-40: Strong performance test implementation with minor style improvements needed.

The test effectively validates the performance goals mentioned in the PR objectives. The async design using asyncio.gather should demonstrate the concurrency benefits of the new implementation.

However, please address these style issues:
-    LARGE_KV_SET_SIZE: int = 1_000_000
+    large_kv_set_size: int = 1_000_000

-    """This test is meant for finding whether the HAMT performance scales linearly with increasing set size, an issue with HAMT v2.
-    Feel free to tune and run the LARGE_KV_SET_SIZE variable as needed for gathering the different timepoints.
+    """Test HAMT performance scaling with large datasets.
+    
+    Validates that HAMT performance scales linearly with increasing set size,
+    addressing performance issues from HAMT v2. Tune large_kv_set_size as needed.
     """

-        *[hamt.set(str(k_int), k_int) for k_int in range(LARGE_KV_SET_SIZE)]
+        *[hamt.set(str(k_int), k_int) for k_int in range(large_kv_set_size)]

-        == LARGE_KV_SET_SIZE
+        == large_kv_set_size

-    for k_int in range(LARGE_KV_SET_SIZE):
+    for k_int in range(large_kv_set_size):
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 18-18: Line too long (131/100)

(C0301)

[convention] 19-19: Line too long (110/100)

(C0301)

[convention] 21-21: Variable name "LARGE_KV_SET_SIZE" doesn't conform to snake_case naming style

(C0103)
tests/test_zarr_ipfs_encrypted.py (1)
75-118: Good test structure but consider reducing complexity.

The test correctly sets up encrypted write operations and validates basic store properties. However, the function is quite complex.

Consider extracting setup logic into helper functions:
+async def setup_encrypted_store(kubo_cas, encryption_key, header, read_only=False):
+    """Helper to create encrypted store."""
+    hamt = await HAMT.build(cas=kubo_cas, values_are_bytes=True)
+    if read_only:
+        # Assuming we have a root_cid from write phase
+        pass
+    store = SimpleEncryptedZarrHAMTStore(hamt, read_only, encryption_key, header)
+    return hamt, store

 async def test_encrypted_write_read(
     create_ipfs,
     random_zarr_dataset: xr.Dataset,
 ):
     # ... existing code ...
-    hamt_write = await HAMT.build(cas=kubo_cas, values_are_bytes=True)
-    ezhs_write = SimpleEncryptedZarrHAMTStore(
-        hamt_write, False, correct_key, header
-    )
+    hamt_write, ezhs_write = await setup_encrypted_store(
+        kubo_cas, correct_key, header
+    )
🧰 Tools

🪛 Pylint (3.3.7)

[refactor] 78-78: Too many local variables (24/15)

(R0914)

[warning] 80-80: Redefining name 'random_zarr_dataset' from outer scope (line 16)

(W0621)

[refactor] 108-108: Redundant comparison - ezhs_write == ezhs_write

(R0124)

[refactor] 78-78: Too many statements (64/50)

(R0915)
py_hamt/encryption_hamt_store.py (1)
174-176: Use a more specific exception type.

Consider using a more specific exception instead of the generic Exception for better error handling:
-        if self.read_only:
-            raise Exception("Cannot write to a read-only store.")
+        if self.read_only:
+            raise ValueError("Cannot write to a read-only store.")
Alternatively, you could define a custom ReadOnlyError exception for even clearer semantics.

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 175-175: Raising too general exception: Exception

(W0719)
py_hamt/store.py (1)

194-227: Clever solution to the event loop session management issue.

The per-event-loop session management elegantly solves the problem mentioned in the PR comments about zarr running its own event loop in a separate thread. This prevents the "session used in different event loops" errors while maintaining connection pooling benefits.
py_hamt/hamt.py (9)
74-76: Use enumerate for better readability.

The static analysis correctly suggests using enumerate here for better Pythonic code.
-    def iter_link_indices(self) -> Iterator[int]:
-        for i in range(len(self.data)):
-            if isinstance(self.data[i], list):
-                yield i
+    def iter_link_indices(self) -> Iterator[int]:
+        for i, item in enumerate(self.data):
+            if isinstance(item, list):
+                yield i
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 74-76: Consider using enumerate instead of iterating with range and len

(C0200)

105-108: Use enumerate for better readability.

Similar to the previous suggestion, using enumerate would be more Pythonic here.
-    def is_empty(self) -> bool:
-        for i in range(len(self.data)):
-            item: IPLDKind = self.data[i]
-            if (isinstance(item, dict) and len(item) > 0) or isinstance(item, list):
-                return False
+    def is_empty(self) -> bool:
+        for item in self.data:
+            if (isinstance(item, dict) and len(item) > 0) or isinstance(item, list):
+                return False
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 105-108: Consider using enumerate instead of iterating with range and len

(C0200)

133-138: Add meaningful docstrings to abstract methods.

The abstract methods have empty docstrings. These should describe the contract and behavior expected from implementations.
    @abstractmethod
    async def save(self, original_id: IPLDKind, node: Node) -> IPLDKind:
-        """"""
+        """Save a node to storage and return its ID. If original_id is not None, this may be an update."""

    @abstractmethod
    async def load(self, id: IPLDKind) -> Node:
-        """"""
+        """Load a node from storage by its ID."""
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 133-133: Empty method docstring

(C0112)

[convention] 137-137: Empty method docstring

(C0112)

[warning] 137-137: Redefining built-in 'id'

(W0622)

156-156: Use more specific exception types.

Using generic Exception makes error handling more difficult for callers.
-        raise Exception("Node was attempted to be written to the read cache")
+        raise RuntimeError("Cannot write to read-only cache")
🧰 Tools

🪛 Pylint (3.3.7)

[warning] 156-156: Raising too general exception: Exception

(W0719)

171-173: Use .items() for dictionary iteration.

Static analysis correctly suggests using .items() when you need both key and value.
    def size(self) -> int:
        total: int = 0
-        for k in self.cache:
-            node: Node = self.cache[k]
+        for k, node in self.cache.items():
            total += len(node.serialize())
        return total
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 171-173: Consider iterating with .items()

(C0206)

316-324: Consider parameter object pattern for constructor.

The constructor has 7 parameters, which exceeds typical limits. Consider using a configuration object or builder pattern.
@dataclass
class HAMTConfig:
    hash_fn: Callable[[bytes], bytes] = blake3_hashfn
    root_node_id: IPLDKind | None = None
    read_only: bool = False
    max_bucket_size: int = 4
    values_are_bytes: bool = False

def __init__(self, cas: ContentAddressedStore, config: HAMTConfig = None):
    config = config or HAMTConfig()
    # ... use config attributes
🧰 Tools

🪛 Pylint (3.3.7)

[refactor] 316-316: Too many arguments (7/5)

(R0913)

[refactor] 316-316: Too many positional arguments (7/5)

(R0917)

492-492: Use more specific exception types.

Generic Exception should be replaced with more specific types for better error handling.
        if self.read_only:
-            raise Exception("Cannot call set on a read only HAMT")
+            raise RuntimeError("Cannot modify read-only HAMT")

        if self.read_only:
-            raise Exception("Cannot call delete on a read only HAMT")
+            raise RuntimeError("Cannot modify read-only HAMT")
Also applies to: 557-557

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 492-492: Raising too general exception: Exception

(W0719)

572-582: Remove unnecessary elif after break.

Static analysis correctly identifies that the elif is unnecessary after a break statement.
                if isinstance(item, dict):
                    bucket = item
                    if key in bucket:
                        del bucket[key]
                        created_change = True
                    # Break out since whether or not the key is in the bucket, it should have been here so either now reserialize or raise a KeyError
                    break
-                elif isinstance(item, list):
+                if isinstance(item, list):
                    link: IPLDKind = item[0]
                    next_node: Node = await self.node_store.load(link)
                    node_stack.append((link, next_node))
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 577-577: Line too long (149/100)

(C0301)

[refactor] 572-582: Unnecessary "elif" after "break", remove the leading "el" from "elif"

(R1723)

596-599: Remove unnecessary else after return.

Static analysis correctly identifies that the else is unnecessary after a return statement.
        data: bytes = await self.cas.load(pointer)
        if self.values_are_bytes:
            return data
-        else:
-            return dag_cbor.decode(data)
+        return dag_cbor.decode(data)
🧰 Tools

🪛 Pylint (3.3.7)

[refactor] 596-599: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43fa124 and 40a6061.

📒 Files selected for processing (17)

py_hamt/__init__.py (1 hunks)
py_hamt/encryption_hamt_store.py (1 hunks)
py_hamt/hamt.py (1 hunks)
py_hamt/store.py (1 hunks)
py_hamt/zarr_hamt_store.py (1 hunks)
pyproject.toml (2 hunks)
tests/conftest.py (1 hunks)
tests/performance_tests.py (1 hunks)
tests/test_branch_anchors.py (1 hunks)
tests/test_extract_bits.py (1 hunks)
tests/test_hamt.py (1 hunks)
tests/test_kubo_cas.py (1 hunks)
tests/test_kubocas_auth.py (1 hunks)
tests/test_kubocas_session.py (1 hunks)
tests/test_zarr_ipfs.py (2 hunks)
tests/test_zarr_ipfs_encrypted.py (1 hunks)
tests/testing_utils.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

pyproject.toml

🧰 Additional context used

🧠 Learnings (1)

py_hamt/zarr_hamt_store.py (1)

Learnt from: abidsikder
PR: dClimate/py-hamt#44
File: py_hamt/zarr_hamt_store.py:180-192
Timestamp: 2025-05-30T15:30:59.357Z
Learning: In the ZarrHAMTStore.list_dir method in py_hamt/zarr_hamt_store.py, there's a potential for duplicate directory names to be yielded when multiple files share the same subdirectory. While this could be fixed with a set to track yielded names, this approach would consume significant memory for large datasets (potentially gigabytes for millions of keys like ERA5 zarr). The memory usage concern outweighs the fix unless it's confirmed that zarr actually calls list_dir on data variable directories in practice, rather than using assumed chunk keys from zarr.json metadata.

🧬 Code Graph Analysis (5)

tests/conftest.py (1)

tests/testing_utils.py (2)

create_ipfs (63-117)

ipld_strategy (32-43)

tests/test_branch_anchors.py (2)

py_hamt/hamt.py (7)

HAMT (287-696)

blake3_hashfn (51-59)

keys (664-678)

cache_size (419-432)

cache_vacate (434-446)

make_read_only (397-408)

get (592-599)

py_hamt/store.py (1)

InMemoryCAS (37-71)

tests/test_extract_bits.py (1)

py_hamt/hamt.py (1)

extract_bits (21-45)

tests/test_kubo_cas.py (2)

py_hamt/store.py (7)

InMemoryCAS (37-71)

load (33-34)

load (52-71)

load (262-269)

save (26-30)

save (47-50)

save (246-260)

tests/conftest.py (1)

global_client_session (9-13)

tests/testing_utils.py (1)

py_hamt/store.py (3)

load (33-34)

load (52-71)

load (262-269)

🪛 Pylint (3.3.7)

tests/test_zarr_ipfs_encrypted.py

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Unable to import 'numpy'

(E0401)

[error] 4-4: Unable to import 'pandas'

(E0401)

[error] 6-6: Unable to import 'xarray'

(E0401)

[error] 7-7: Unable to import 'zarr'

(E0401)

[error] 8-8: Unable to import 'zarr.core.buffer'

(E0401)

[error] 9-9: Unable to import 'Crypto.Random'

(E0401)

[error] 11-11: Unable to import 'py_hamt'

(E0401)

[error] 12-12: Unable to import 'py_hamt.zarr_hamt_store'

(E0401)

[refactor] 78-78: Too many local variables (24/15)

(R0914)

[warning] 80-80: Redefining name 'random_zarr_dataset' from outer scope (line 16)

(W0621)

[refactor] 108-108: Redundant comparison - ezhs_write == ezhs_write

(R0124)

[refactor] 179-179: Redundant comparison - ezhs_read_ok == ezhs_read_ok

(R0124)

[refactor] 78-78: Too many statements (64/50)

(R0915)

tests/test_zarr_ipfs.py

[error] 6-6: Unable to import 'xarray'

(E0401)

[error] 7-7: Unable to import 'zarr'

(E0401)

[error] 8-8: Unable to import 'zarr.core.buffer'

(E0401)

[error] 9-9: Unable to import 'dag_cbor.ipld'

(E0401)

[error] 11-11: Unable to import 'py_hamt'

(E0401)

[error] 12-12: Unable to import 'py_hamt.store'

(E0401)

[error] 13-13: Unable to import 'py_hamt.zarr_hamt_store'

(E0401)

[convention] 61-61: Missing function or method docstring

(C0116)

[refactor] 61-61: Too many local variables (23/15)

(R0914)

[warning] 63-63: Redefining name 'random_zarr_dataset' from outer scope (line 17)

(W0621)

[refactor] 114-114: Redundant comparison - zhs == zhs

(R0124)

[refactor] 61-61: Too many statements (80/50)

(R0915)

[convention] 183-183: Missing function or method docstring

(C0116)

tests/test_hamt.py

[convention] 25-25: Line too long (120/100)

(C0301)

[convention] 52-52: Line too long (170/100)

(C0301)

[convention] 55-55: Line too long (116/100)

(C0301)

[convention] 95-95: Line too long (103/100)

(C0301)

[convention] 96-96: Line too long (124/100)

(C0301)

[convention] 164-164: Line too long (116/100)

(C0301)

[convention] 204-204: Line too long (128/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 6-6: Unable to import 'dag_cbor'

(E0401)

[error] 7-7: Unable to import 'hypothesis'

(E0401)

[error] 8-8: Unable to import 'hypothesis'

(E0401)

[error] 9-9: Unable to import 'multiformats'

(E0401)

[error] 12-12: Unable to import 'py_hamt.hamt'

(E0401)

[error] 13-13: Unable to import 'py_hamt.store'

(E0401)

[convention] 21-21: Missing function or method docstring

(C0116)

[refactor] 21-21: Too many local variables (21/15)

(R0914)

[refactor] 21-21: Too many statements (68/50)

(R0915)

[convention] 167-167: Missing function or method docstring

(C0116)

[convention] 181-181: Missing function or method docstring

(C0116)

[convention] 190-190: Missing function or method docstring

(C0116)

[convention] 206-206: Missing function or method docstring

(C0116)

py_hamt/zarr_hamt_store.py

[convention] 21-21: Line too long (195/100)

(C0301)

[convention] 26-26: Line too long (249/100)

(C0301)

[convention] 59-59: Line too long (251/100)

(C0301)

[convention] 60-60: Line too long (106/100)

(C0301)

[convention] 63-63: Line too long (110/100)

(C0301)

[convention] 66-66: Line too long (434/100)

(C0301)

[convention] 75-75: Line too long (113/100)

(C0301)

[convention] 101-101: Line too long (138/100)

(C0301)

[convention] 111-111: Line too long (103/100)

(C0301)

[convention] 117-117: Line too long (102/100)

(C0301)

[convention] 154-154: Unnecessary parens after 'not' keyword

(C0325)

[convention] 172-172: Line too long (142/100)

(C0301)

[convention] 176-176: Line too long (142/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 4-4: Unable to import 'zarr.abc.store'

(E0401)

[error] 5-5: Unable to import 'zarr.core.buffer'

(E0401)

[error] 6-6: Unable to import 'zarr.core.common'

(E0401)

[warning] 96-96: Unused argument 'byte_range'

(W0613)

py_hamt/store.py

[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'

(E0001)

py_hamt/hamt.py

[convention] 23-23: Line too long (131/100)

(C0301)

[convention] 31-31: Line too long (106/100)

(C0301)

[convention] 56-56: Line too long (117/100)

(C0301)

[convention] 91-91: Line too long (151/100)

(C0301)

[convention] 142-142: Line too long (189/100)

(C0301)

[convention] 183-183: Line too long (101/100)

(C0301)

[convention] 200-200: Line too long (137/100)

(C0301)

[convention] 201-201: Line too long (191/100)

(C0301)

[convention] 213-213: Line too long (106/100)

(C0301)

[convention] 214-214: Line too long (210/100)

(C0301)

[convention] 218-218: Line too long (108/100)

(C0301)

[convention] 241-241: Line too long (121/100)

(C0301)

[convention] 255-255: Line too long (141/100)

(C0301)

[convention] 256-256: Line too long (190/100)

(C0301)

[convention] 261-261: Line too long (243/100)

(C0301)

[convention] 271-271: Line too long (122/100)

(C0301)

[convention] 289-289: Line too long (148/100)

(C0301)

[convention] 293-293: Line too long (119/100)

(C0301)

[convention] 298-298: Line too long (145/100)

(C0301)

[convention] 300-300: Line too long (158/100)

(C0301)

[convention] 302-302: Line too long (255/100)

(C0301)

[convention] 326-326: Line too long (233/100)

(C0301)

[convention] 336-336: Line too long (125/100)

(C0301)

[convention] 338-338: Line too long (179/100)

(C0301)

[convention] 340-340: Line too long (208/100)

(C0301)

[convention] 347-347: Line too long (177/100)

(C0301)

[convention] 349-349: Line too long (144/100)

(C0301)

[convention] 356-356: Line too long (193/100)

(C0301)

[convention] 358-358: Line too long (459/100)

(C0301)

[convention] 375-375: Line too long (382/100)

(C0301)

[convention] 387-387: Line too long (272/100)

(C0301)

[convention] 389-389: Line too long (255/100)

(C0301)

[convention] 396-396: Line too long (168/100)

(C0301)

[convention] 399-399: Line too long (155/100)

(C0301)

[convention] 401-401: Line too long (155/100)

(C0301)

[convention] 412-412: Line too long (252/100)

(C0301)

[convention] 415-415: Line too long (109/100)

(C0301)

[convention] 423-423: Line too long (134/100)

(C0301)

[convention] 452-452: Line too long (151/100)

(C0301)

[convention] 453-453: Line too long (153/100)

(C0301)

[convention] 454-454: Line too long (137/100)

(C0301)

[convention] 458-458: Line too long (116/100)

(C0301)

[convention] 462-462: Line too long (109/100)

(C0301)

[convention] 467-467: Line too long (193/100)

(C0301)

[convention] 483-483: Line too long (140/100)

(C0301)

[convention] 498-498: Line too long (103/100)

(C0301)

[convention] 531-531: Line too long (144/100)

(C0301)

[convention] 537-537: Line too long (137/100)

(C0301)

[convention] 542-542: Line too long (104/100)

(C0301)

[convention] 577-577: Line too long (149/100)

(C0301)

[convention] 605-605: Line too long (285/100)

(C0301)

[convention] 624-624: Line too long (141/100)

(C0301)

[convention] 668-668: Line too long (146/100)

(C0301)

[convention] 670-670: Line too long (101/100)

(C0301)

[convention] 690-690: Line too long (264/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 14-14: Unable to import 'dag_cbor'

(E0401)

[error] 15-15: Unable to import 'dag_cbor.ipld'

(E0401)

[error] 16-16: Unable to import 'multiformats'

(E0401)

[error] 18-18: Cannot import 'store' due to 'invalid syntax (py_hamt.store, line 23)'

(E0001)

[convention] 62-62: Missing class docstring

(C0115)

[convention] 68-68: Missing function or method docstring

(C0116)

[convention] 73-73: Missing function or method docstring

(C0116)

[convention] 74-76: Consider using enumerate instead of iterating with range and len

(C0200)

[convention] 78-78: Missing function or method docstring

(C0116)

[convention] 83-83: Missing function or method docstring

(C0116)

[convention] 87-87: Missing function or method docstring

(C0116)

[convention] 92-92: Missing function or method docstring

(C0116)

[convention] 104-104: Missing function or method docstring

(C0116)

[convention] 105-108: Consider using enumerate instead of iterating with range and len

(C0200)

[convention] 111-111: Missing function or method docstring

(C0116)

[convention] 115-115: Missing function or method docstring

(C0116)

[warning] 122-124: Consider explicitly re-raising using 'except Exception as exc' and 'raise Exception('Invalid dag-cbor encoded data from the store was attempted to be decoded') from exc'

(W0707)

[warning] 122-124: Raising too general exception: Exception

(W0719)

[convention] 131-131: Missing class docstring

(C0115)

[convention] 133-133: Empty method docstring

(C0112)

[convention] 137-137: Empty method docstring

(C0112)

[warning] 137-137: Redefining built-in 'id'

(W0622)

[convention] 150-150: Missing class docstring

(C0115)

[warning] 156-156: Raising too general exception: Exception

(W0719)

[warning] 158-158: Redefining built-in 'id'

(W0622)

[convention] 171-173: Consider iterating with .items()

(C0206)

[convention] 180-180: Missing class docstring

(C0115)

[convention] 186-186: Missing function or method docstring

(C0116)

[warning] 186-186: Redefining built-in 'id'

(W0622)

[convention] 189-189: Missing function or method docstring

(C0116)

[convention] 203-209: Consider iterating with .items()

(C0206)

[convention] 260-260: Missing function or method docstring

(C0116)

[warning] 275-275: Redefining built-in 'id'

(W0622)

[refactor] 287-287: Too many instance attributes (8/7)

(R0902)

[refactor] 316-316: Too many arguments (7/5)

(R0913)

[refactor] 316-316: Too many positional arguments (7/5)

(R0917)

[warning] 492-492: Raising too general exception: Exception

(W0719)

[refactor] 505-505: Too many local variables (20/15)

(R0914)

[warning] 557-557: Raising too general exception: Exception

(W0719)

[refactor] 572-582: Unnecessary "elif" after "break", remove the leading "el" from "elif"

(R1723)

[refactor] 596-599: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

py_hamt/__init__.py

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Cannot import 'store' due to 'invalid syntax (py_hamt.store, line 23)'

(E0001)

py_hamt/encryption_hamt_store.py

[convention] 21-21: Line too long (241/100)

(C0301)

[convention] 164-164: Line too long (103/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Unable to import 'zarr.abc.store'

(E0401)

[error] 4-4: Unable to import 'zarr.core.buffer'

(E0401)

[error] 5-5: Unable to import 'Crypto.Cipher'

(E0401)

[error] 6-6: Unable to import 'Crypto.Random'

(E0401)

[warning] 12-12: Method 'get_partial_values' is abstract in class 'ZarrHAMTStore' but is not overridden in child class 'SimpleEncryptedZarrHAMTStore'

(W0223)

[warning] 12-12: Method 'set_partial_values' is abstract in class 'ZarrHAMTStore' but is not overridden in child class 'SimpleEncryptedZarrHAMTStore'

(W0223)

[warning] 175-175: Raising too general exception: Exception

(W0719)

tests/conftest.py

[convention] 1-1: Missing module docstring

(C0114)

[error] 1-1: Unable to import 'aiohttp'

(E0401)

[warning] 5-5: Unused create_ipfs imported from testing_utils

(W0611)

[warning] 5-5: Unused ipld_strategy imported from testing_utils

(W0611)

tests/performance_tests.py

[convention] 2-2: Line too long (192/100)

(C0301)

[convention] 18-18: Line too long (131/100)

(C0301)

[convention] 19-19: Line too long (110/100)

(C0301)

[error] 12-12: Unable to import 'py_hamt'

(E0401)

[error] 13-13: Unable to import 'py_hamt.store'

(E0401)

[convention] 21-21: Variable name "LARGE_KV_SET_SIZE" doesn't conform to snake_case naming style

(C0103)

tests/test_branch_anchors.py

[convention] 1-1: Missing module docstring

(C0114)

[error] 4-4: Unable to import 'py_hamt'

(E0401)

[error] 5-5: Unable to import 'py_hamt.store'

(E0401)

[convention] 25-25: Missing function or method docstring

(C0116)

tests/test_extract_bits.py

[error] 3-3: Unable to import 'py_hamt.hamt'

(E0401)

[convention] 7-7: Missing function or method docstring

(C0116)

tests/test_kubo_cas.py

[convention] 62-62: Line too long (127/100)

(C0301)

[convention] 92-92: Line too long (128/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Unable to import 'aiohttp'

(E0401)

[error] 4-4: Unable to import 'dag_cbor'

(E0401)

[error] 6-6: Unable to import 'dag_cbor'

(E0401)

[error] 7-7: Unable to import 'hypothesis'

(E0401)

[error] 10-10: Unable to import 'py_hamt'

(E0401)

[error] 11-11: Unable to import 'py_hamt.store'

(E0401)

[convention] 16-16: Missing function or method docstring

(C0116)

[warning] 82-82: Catching too general exception Exception

(W0718)

[warning] 107-107: Catching too general exception Exception

(W0718)

[convention] 139-139: Missing function or method docstring

(C0116)

[warning] 139-139: Redefining name 'create_ipfs' from outer scope (line 8)

(W0621)

[warning] 8-8: Unused create_ipfs imported from testing_utils

(W0611)

tests/test_kubocas_auth.py

[convention] 1-1: Missing module docstring

(C0114)

[error] 4-4: Unable to import 'aiohttp'

(E0401)

[error] 7-7: Unable to import 'py_hamt.store'

(E0401)

[convention] 15-15: Missing function or method docstring

(C0116)

[warning] 18-18: Access to a protected member _loop_session of a client class

(W0212)

[convention] 27-27: Missing function or method docstring

(C0116)

[warning] 29-29: Access to a protected member _loop_session of a client class

(W0212)

[warning] 46-46: Access to a protected member _loop_session of a client class

(W0212)

[warning] 69-69: Access to a protected member _loop_session of a client class

(W0212)

tests/test_kubocas_session.py

[convention] 1-1: Missing module docstring

(C0114)

[error] 7-7: Unable to import 'py_hamt.store'

(E0401)

[warning] 26-26: Access to a protected member _loop_session of a client class

(W0212)

[warning] 27-27: Access to a protected member _loop_session of a client class

(W0212)

[warning] 49-49: Access to a protected member _loop_session of a client class

(W0212)

[warning] 68-68: Access to a protected member _loop_session of a client class

(W0212)

[warning] 90-90: Access to a protected member _loop_session of a client class

(W0212)

[warning] 95-95: Access to a protected member _session_per_loop of a client class

(W0212)

[warning] 96-96: Access to a protected member _session_per_loop of a client class

(W0212)

[warning] 97-97: Access to a protected member _session_per_loop of a client class

(W0212)

tests/testing_utils.py

[convention] 52-52: Line too long (112/100)

(C0301)

[convention] 74-74: Line too long (141/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 12-12: Unable to import 'hypothesis'

(E0401)

[error] 13-13: Unable to import 'hypothesis.strategies'

(E0401)

[error] 14-14: Unable to import 'multiformats'

(E0401)

[convention] 32-32: Missing function or method docstring

(C0116)

[convention] 56-56: Missing function or method docstring

(C0116)

[convention] 63-63: Missing function or method docstring

(C0116)

[warning] 81-81: Using open without explicitly specifying an encoding

(W1514)

[warning] 98-98: Using open without explicitly specifying an encoding

(W1514)

[refactor] 105-107: Consider using 'with' for resource-allocating operations

(R1732)

🔇 Additional comments (47)

tests/test_extract_bits.py (2)

3-3: Good import organization.

Moving the import statement below the standard library imports follows Python import conventions.

🧰 Tools

🪛 Pylint (3.3.7)

[error] 3-3: Unable to import 'py_hamt.hamt'

(E0401)

7-7: Excellent addition of return type annotation.

Adding the -> None return type annotation improves type safety and code documentation, which is especially valuable in a major refactor introducing async patterns.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 7-7: Missing function or method docstring

(C0116)

py_hamt/__init__.py (1)

1-4: Major API refactor looks well-organized.

The replacement of synchronous interfaces (Store, IPFSStore) with async content-addressed storage interfaces (ContentAddressedStore, KuboCAS) aligns perfectly with the stated PR objectives of adopting async programming throughout the library.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Cannot import 'store' due to 'invalid syntax (py_hamt.store, line 23)'

(E0001)

tests/conftest.py (2)

8-13: Excellent async fixture implementation.

The session-scoped aiohttp.ClientSession fixture is well-designed for the async refactor. Using an async context manager ensures proper cleanup, and session scope is appropriate for sharing across tests while avoiding session management issues mentioned in the PR discussion.

4-5: Good backward compatibility approach.

Re-exporting the testing utilities maintains compatibility with existing test imports while centralizing test configuration.

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 5-5: Unused create_ipfs imported from testing_utils

(W0611)

[warning] 5-5: Unused ipld_strategy imported from testing_utils

(W0611)
tests/performance_tests.py (2)

1-5: Excellent documentation and purpose clarity.

The module docstring clearly explains that this is for manual performance testing, which helps prevent accidentally including long-running tests in CI.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 2-2: Line too long (192/100)

(C0301)

26-29:
✅ Verification successful

Verify concurrent write safety.

The test uses asyncio.gather to perform concurrent writes, which should demonstrate the performance benefits. However, ensure that the HAMT implementation properly handles concurrent writes to avoid race conditions.

🏁 Script executed:
#!/bin/bash
# Description: Check if HAMT set operations are properly synchronized for concurrent writes
# Expected: Find synchronization mechanisms or documentation about thread safety

# Search for async locks, semaphores, or synchronization mechanisms in HAMT implementation
rg -A 5 -B 5 "asyncio\.(Lock|Semaphore|Event|Condition)" 

# Look for thread safety documentation or comments
rg -i "thread.?safe|concurrent|race.?condition|synchroniz" --type py
Length of output: 2753
Concurrent writes are already synchronized via asyncio.Lock
The HAMT’s set method acquires an asyncio.Lock (see py_hamt/hamt.py), ensuring that multiple tasks in the same event loop cannot corrupt internal state. Since your performance test runs all writes in one thread/event loop, await asyncio.gather(...) is safe and no additional synchronization is needed.
tests/test_branch_anchors.py (2)

8-21: LGTM! Well-designed helper function for collision testing.

The logic correctly finds decimal string keys that share the same first byte of their Blake3 hash, which is perfect for testing HAMT branching behavior with max_bucket_size=1.

24-52: Excellent comprehensive test coverage for cache management.

This test effectively covers the cache management features in both write and read-only modes:

Cache operations during writes (lines 32-37)

Cache vacate in write mode (line 42)

Transition to read-only mode and cache operations (lines 48-51)

The test structure ensures all code paths in the cache management are exercised.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 25-25: Missing function or method docstring

(C0116)

tests/test_kubocas_session.py (3)

10-14: Useful helper for handling mixed async/sync returns.

This helper elegantly handles the case where _loop_session() might return either an awaitable or a direct session object, providing flexibility in the implementation.

56-79: Well-designed cross-loop testing mechanism.

The threading approach to test session isolation across different asyncio event loops is appropriate and thorough. The use of Event for synchronization ensures reliable coordination between the main thread and worker thread.

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 68-68: Access to a protected member _loop_session of a client class

(W0212)

82-103: Comprehensive session isolation verification.

This test properly verifies that different event loops receive distinct sessions and that the internal _session_per_loop mapping tracks them correctly. The cleanup logic ensures proper resource management.

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 90-90: Access to a protected member _loop_session of a client class

(W0212)

[warning] 95-95: Access to a protected member _session_per_loop of a client class

(W0212)

[warning] 96-96: Access to a protected member _session_per_loop of a client class

(W0212)

[warning] 97-97: Access to a protected member _session_per_loop of a client class

(W0212)

tests/test_kubocas_auth.py (3)

14-24: Good coverage of external session header preservation.

This test correctly verifies that KuboCAS preserves custom headers from externally-provided sessions and doesn't close them inappropriately.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 15-15: Missing function or method docstring

(C0116)

[warning] 18-18: Access to a protected member _loop_session of a client class

(W0212)

35-57: Thorough authentication testing with clear documentation.

The docstring clearly explains the expected behavior, and the test correctly verifies that external sessions with authentication are preserved unchanged and not closed by KuboCAS.

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 46-46: Access to a protected member _loop_session of a client class

(W0212)

59-76: Comprehensive internal session authentication testing.

This test properly verifies that KuboCAS creates internal sessions with the provided authentication and correctly manages their lifecycle by closing them on aclose().

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 69-69: Access to a protected member _loop_session of a client class

(W0212)

tests/test_kubo_cas.py (3)

15-20: Good coverage of InMemoryCAS error condition.

This test correctly verifies that InMemoryCAS.load() raises KeyError for non-existent keys (empty bytes).

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 16-16: Missing function or method docstring

(C0116)

22-44: Excellent comprehensive type safety testing.

This test thoroughly verifies that InMemoryCAS.load() properly validates input types and raises appropriate TypeError messages for invalid key types. The comprehensive list of invalid types ensures robust error handling.

134-157: Properly isolated test implementation.

This test correctly uses the create_ipfs fixture to ensure an isolated IPFS daemon, avoiding dependency on external services. The property-based testing with different codecs provides good coverage.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 139-139: Missing function or method docstring

(C0116)

[warning] 139-139: Redefining name 'create_ipfs' from outer scope (line 8)

(W0621)

tests/testing_utils.py (4)

1-15: LGTM! Clean imports and setup.

The imports are well-organized and appropriate for the testing utilities. The module provides good separation of concerns for IPFS-related testing.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 12-12: Unable to import 'hypothesis'

(E0401)

[error] 13-13: Unable to import 'hypothesis.strategies'

(E0401)

[error] 14-14: Unable to import 'multiformats'

(E0401)

17-29: Solid CID generation strategy.

The CID strategy correctly generates CIDv1 with sha2-256 hash and dag-cbor codec. The hardcoded hash prefix (0x1220) is appropriate for 32-byte SHA-256 hashes.

32-43: Comprehensive IPLD strategy covers all primitive types.

The strategy includes all IPLD scalar types (null, bool, int, float, text, binary) plus CIDs. The integer range matches IPLD specifications and float constraints properly exclude infinity/NaN.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 32-32: Missing function or method docstring

(C0116)

46-53: Good unique key constraint for HAMT testing.

The unique_by parameter ensures no duplicate keys, which is essential for accurate HAMT length and size checks. The size limits are reasonable for testing.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 52-52: Line too long (112/100)

(C0301)

tests/test_zarr_ipfs_encrypted.py (5)

1-13: Clean imports for encrypted testing.

The imports are appropriate for testing encrypted Zarr stores with IPFS backend. Good use of Crypto.Random for secure key generation.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Unable to import 'numpy'

(E0401)

[error] 4-4: Unable to import 'pandas'

(E0401)

[error] 6-6: Unable to import 'xarray'

(E0401)

[error] 7-7: Unable to import 'zarr'

(E0401)

[error] 8-8: Unable to import 'zarr.core.buffer'

(E0401)

[error] 9-9: Unable to import 'Crypto.Random'

(E0401)

[error] 11-11: Unable to import 'py_hamt'

(E0401)

[error] 12-12: Unable to import 'py_hamt.zarr_hamt_store'

(E0401)

15-28: Well-designed test dataset fixture.

The fixture creates a realistic test dataset with proper dimensions and metadata. The reduced size is good for test performance.

30-73: Thorough encryption key validation testing.

The test properly validates all edge cases for encryption key length: too short, too long, empty, and correct length. Good use of parametric testing approach.

119-175: Comprehensive read testing with proper error validation.

The test correctly validates reading with correct key, incorrect key, and no decryption. The error handling checks are appropriate and cover the expected failure modes.

176-207: Good coverage of store interface methods.

The test properly exercises the async store interface including exists, list operations, and error conditions for read-only operations.

🧰 Tools

🪛 Pylint (3.3.7)

[refactor] 179-179: Redundant comparison - ezhs_read_ok == ezhs_read_ok

(R0124)
tests/test_zarr_ipfs.py (6)

1-56: Good refactoring to async imports and fixture.

The imports are updated appropriately for the new async HAMT implementation. The dataset fixture creates realistic test data with proper structure.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Unable to import 'numpy'

(E0401)

[error] 4-4: Unable to import 'pandas'

(E0401)

[error] 6-6: Unable to import 'xarray'

(E0401)

[error] 7-7: Unable to import 'zarr'

(E0401)

[error] 8-8: Unable to import 'zarr.core.buffer'

(E0401)

[error] 9-9: Unable to import 'dag_cbor.ipld'

(E0401)

[error] 11-11: Unable to import 'py_hamt'

(E0401)

[error] 12-12: Unable to import 'py_hamt.store'

(E0401)

[error] 13-13: Unable to import 'py_hamt.zarr_hamt_store'

(E0401)

59-89: Solid async write implementation.

The test correctly uses async context managers for KuboCAS and properly handles the write-append pattern. The performance timing is useful for optimization tracking.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 59-59: Line too long (130/100)

(C0301)

[convention] 61-61: Missing function or method docstring

(C0116)

[refactor] 61-61: Too many local variables (23/15)

(R0914)

[warning] 63-63: Redefining name 'random_zarr_dataset' from outer scope (line 17)

(W0621)

[refactor] 61-61: Too many statements (80/50)

(R0915)

90-117: Effective read verification with data integrity checks.

The read implementation correctly rebuilds the HAMT from the root CID and validates data integrity by comparing dataset halves. Good async pattern usage.

🧰 Tools

🪛 Pylint (3.3.7)

[refactor] 114-114: Redundant comparison - zhs == zhs

(R0124)

139-180: Comprehensive store interface testing.

The test properly exercises partial value operations, metadata operations, and deletion. The enable_write/read_only mode switching is well tested.

182-192: Good test for directory listing deduplication.

This test specifically validates that list_dir properly deduplicates directory names, which is important for the Zarr interface.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 183-183: Missing function or method docstring

(C0116)

119-138:
❓ Verification inconclusive

Check for potential code duplication.

Looking at the key collection logic, there appear to be multiple similar blocks collecting keys from different sources.

🏁 Script executed:
#!/bin/bash
# Check for duplicate key collection patterns in this test file
rg -A 4 -B 1 "async for.*in.*list" tests/test_zarr_ipfs.py
Length of output: 875
Reduce duplicated key-collection logic in tests/test_zarr_ipfs.py

There are four nearly identical blocks (lines 119–138) that build a set by iterating over zhs.list(), zhs.list_prefix(""), etc. This repetition makes the test harder to maintain. Consider one of the following refactorings:

• Extract a helper coroutine:
async def collect_keys(coro):
    keys = set()
    async for k in coro():
        keys.add(k)
    return keys

hamt_keys = await collect_keys(zhs.hamt.keys)
for fn in (zhs.list, zhs.list, lambda: zhs.list_prefix("")):
    assert hamt_keys == await collect_keys(fn)
• Use pytest parametrization:
@pytest.mark.parametrize("method", [zhs.list, zhs.list, lambda: zhs.list_prefix("")])
async def test_list_variants_match_hamt(method):
    hamt_keys = {k async for k in zhs.hamt.keys()}
    assert hamt_keys == {k async for k in method()}
Locations to update:

tests/test_zarr_ipfs.py, lines ~119–138: replace duplicated loops with a shared helper or a parametrized test.
[fixed_required]
tests/test_hamt.py (5)

1-24: Excellent async test setup with comprehensive imports.

The conversion to async testing is well done. Good use of increased deadline for CI stability and proper Hypothesis integration.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 6-6: Unable to import 'dag_cbor'

(E0401)

[error] 7-7: Unable to import 'hypothesis'

(E0401)

[error] 8-8: Unable to import 'hypothesis'

(E0401)

[error] 9-9: Unable to import 'multiformats'

(E0401)

[error] 12-12: Unable to import 'py_hamt.hamt'

(E0401)

[error] 13-13: Unable to import 'py_hamt.store'

(E0401)

[convention] 21-21: Missing function or method docstring

(C0116)

[refactor] 21-21: Too many local variables (21/15)

(R0914)

[refactor] 21-21: Too many statements (68/50)

(R0915)

25-50: Solid concurrent testing with dynamic bucket sizing.

The test effectively exercises both sequential and concurrent operations while randomly changing bucket size. This is great for testing HAMT robustness under varying conditions.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 25-25: Line too long (120/100)

(C0301)

84-123: Comprehensive error handling and HAMT rebuilding tests.

The tests properly validate read-only mode enforcement, HAMT reconstruction from root node ID, and error conditions. The cache write protection test is particularly good for coverage.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 95-95: Line too long (103/100)

(C0301)

[convention] 96-96: Line too long (124/100)

(C0301)

124-153: Excellent cache management testing.

The concurrent cache size management tests are well-designed and exercise important code paths for memory management. The separation of read cache and write cache testing is good.

155-217: Good additional test coverage.

The tests for byte values, invalid node deserialization, key rewriting, and link following provide good coverage of edge cases and core functionality.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 164-164: Line too long (116/100)

(C0301)

[convention] 204-204: Line too long (128/100)

(C0301)

[convention] 167-167: Missing function or method docstring

(C0116)

[convention] 181-181: Missing function or method docstring

(C0116)

[convention] 190-190: Missing function or method docstring

(C0116)

[convention] 206-206: Missing function or method docstring

(C0116)

py_hamt/encryption_hamt_store.py (3)

92-111: Constructor implementation looks good.

The validation of the 32-byte encryption key requirement and metadata cache initialization are properly implemented.

113-135: Encryption/decryption implementation is secure and correct.

The methods properly implement ChaCha20-Poly1305 with:

24-byte nonce for XChaCha20 variant

Header as associated authenticated data

Correct byte layout (nonce + tag + ciphertext)

Appropriate error handling

146-171: Get method implementation with metadata caching is well-designed.

The method efficiently caches decrypted metadata files (ending with "zarr.json") to optimize repeated reads, which is crucial for Zarr performance.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 164-164: Line too long (103/100)

(C0301)

py_hamt/zarr_hamt_store.py (1)

70-72: Good defensive programming with HAMT validation.

The assertions ensure consistency between the HAMT and store configurations, preventing subtle bugs from mismatched settings.

py_hamt/store.py (1)

246-270: Excellent async implementation with proper concurrency control.

The save and load methods correctly use:

Async HTTP operations via aiohttp

Semaphore-based rate limiting

Proper error handling with raise_for_status()

py_hamt/hamt.py (4)

21-45: LGTM: Bit extraction logic is correct.

The extract_bits function correctly implements bit manipulation for HAMT depth-based indexing. The bounds checking and bit masking operations are properly implemented.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 23-23: Line too long (131/100)

(C0301)

[convention] 31-31: Line too long (106/100)

(C0301)

51-59: LGTM: Blake3 hash function implementation is correct.

The hash function properly uses the multihash library to generate 32-byte Blake3 digests and unwraps them for raw bytes.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 56-56: Line too long (117/100)

(C0301)

63-66: LGTM: Simplified node structure improves performance.

The fixed-size 256-element list structure is a good optimization that aligns with the 8-bit indexing used in extract_bits. Using empty dicts for buckets and single-element lists for links is a clean design.

664-684: LGTM: Async iteration with proper locking.

The async iteration implementation correctly handles locking for write mode vs. concurrent access in read-only mode. The pattern is consistent across keys() and other iteration methods.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 668-668: Line too long (146/100)

(C0301)

[convention] 670-670: Line too long (101/100)

(C0301)

coderabbitai · 2025-06-02T22:46:16Z

tests/testing_utils.py

+@pytest.fixture(scope="module")
+def create_ipfs() -> Generator[Tuple[str, str], None, None]:
+    # Create temporary directory, set it as the IPFS Path
+    temp_dir: Path = Path(tempfile.mkdtemp())
+    custom_env: dict[str, str] = os.environ.copy()
+    custom_env["IPFS_PATH"] = str(temp_dir)
+
+    # IPFS init
+    subprocess.run(
+        ["ipfs", "init", "--profile", "pebbleds"], check=True, env=custom_env
+    )
+
+    # Edit the config file so that it serves on randomly selected and available ports to not conflict with any currently running ipfs daemons
+    swarm_port: int = find_free_port()
+    rpc_port: int = find_free_port()
+    gateway_port: int = find_free_port()
+
+    config_path: Path = temp_dir / "config"
+    config: dict[str, Any]
+    with open(config_path, "r") as f:
+        config = json.load(f)
+
+    swarm_addrs: list[str] = config["Addresses"]["Swarm"]
+    new_port_swarm_addrs: list[str] = [
+        s.replace("4001", str(swarm_port)) for s in swarm_addrs
+    ]
+    config["Addresses"]["Swarm"] = new_port_swarm_addrs
+
+    rpc_multiaddr: str = config["Addresses"]["API"]
+    gateway_multiaddr: str = config["Addresses"]["Gateway"]
+
+    config["Addresses"]["API"] = rpc_multiaddr.replace("5001", str(rpc_port))
+    config["Addresses"]["Gateway"] = gateway_multiaddr.replace(
+        "8080", str(gateway_port)
+    )
+
+    with open(config_path, "w") as f:
+        json.dump(config, f, indent=2)
+
+    # Start the daemon
+    rpc_uri_stem: str = f"http://127.0.0.1:{rpc_port}"
+    gateway_uri_stem: str = f"http://127.0.0.1:{gateway_port}"
+
+    ipfs_process: subprocess.Popen[bytes] = subprocess.Popen(
+        ["ipfs", "daemon"], env=custom_env
+    )
+    # Should be enough time for the ipfs daemon process to start up
+    time.sleep(5)
+
+    yield rpc_uri_stem, gateway_uri_stem
+
+    # Close the daemon
+    ipfs_process.kill()
+
+    # Delete the temporary directory
+    shutil.rmtree(temp_dir)


💡 Verification agent

🧩 Analysis chain

IPFS fixture works but needs improvements.

The fixture correctly sets up an isolated IPFS environment but has several issues that should be addressed:

Uses the vulnerable find_free_port() function

Missing proper error handling for daemon startup failures

Hard-coded sleep instead of proper readiness check

Missing encoding specification for file operations

@pytest.fixture(scope="module") def create_ipfs() -> Generator[Tuple[str, str], None, None]: # Create temporary directory, set it as the IPFS Path temp_dir: Path = Path(tempfile.mkdtemp()) custom_env: dict[str, str] = os.environ.copy() custom_env["IPFS_PATH"] = str(temp_dir) + + # Reserve ports and keep sockets open + swarm_port, swarm_socket = find_free_port() + rpc_port, rpc_socket = find_free_port() + gateway_port, gateway_socket = find_free_port() - # Edit the config file so that it serves on randomly selected and available ports to not conflict with any currently running ipfs daemons - swarm_port: int = find_free_port() - rpc_port: int = find_free_port() - gateway_port: int = find_free_port() config_path: Path = temp_dir / "config" config: dict[str, Any] - with open(config_path, "r") as f: + with open(config_path, "r", encoding="utf-8") as f: config = json.load(f) # ... port configuration code ... - with open(config_path, "w") as f: + with open(config_path, "w", encoding="utf-8") as f: json.dump(config, f, indent=2) + # Close the reserved sockets now that config is written + swarm_socket.close() + rpc_socket.close() + gateway_socket.close() + # Start the daemon rpc_uri_stem: str = f"http://127.0.0.1:{rpc_port}" gateway_uri_stem: str = f"http://127.0.0.1:{gateway_port}" - ipfs_process: subprocess.Popen[bytes] = subprocess.Popen( - ["ipfs", "daemon"], env=custom_env - ) - # Should be enough time for the ipfs daemon process to start up - time.sleep(5) + try: + ipfs_process: subprocess.Popen[bytes] = subprocess.Popen( + ["ipfs", "daemon"], env=custom_env, + stdout=subprocess.PIPE, stderr=subprocess.PIPE + ) + # Wait for daemon to be ready instead of fixed sleep + for _ in range(30): # 30 second timeout + time.sleep(1) + if ipfs_process.poll() is None: # Still running + try: + import requests + requests.get(f"{rpc_uri_stem}/api/v0/version", timeout=1) + break + except: + continue + else: + raise RuntimeError("IPFS daemon failed to start within 30 seconds") + except Exception: + shutil.rmtree(temp_dir) + raise

🏁 Script executed:

#!/bin/bash # Locate the implementation of find_free_port to verify its behavior rg -n "def find_free_port" -A5

Length of output: 394

🏁 Script executed:

#!/bin/bash # Display the create_ipfs fixture implementation to confirm context sed -n '1,200p' tests/testing_utils.py

Length of output: 3563

Improve IPFS fixture reliability and safety

This fixture correctly isolates an IPFS daemon but should be hardened to avoid port-allocation races, ensure proper cleanup on failures, and reliably detect when the daemon is ready.

Please address the following in tests/testing_utils.py (create_ipfs fixture, starting at line 62):

Reserve the ports by holding open sockets until after writing the config to prevent race conditions in find_free_port().

Specify encoding="utf-8" when opening the config file for read/write.

Replace the fixed time.sleep(5) with a loop that polls the daemon’s /api/v0/version endpoint (with a timeout) to confirm readiness.

Wrap daemon startup in a try/except to clean up the temp directory on failure and capture stdout/stderr.

Suggested diff:

@pytest.fixture(scope="module") def create_ipfs() -> Generator[Tuple[str, str], None, None]: # ... - swarm_port: int = find_free_port() - rpc_port: int = find_free_port() - gateway_port: int = find_free_port() + # Reserve ports by keeping sockets open until config is written + swarm_port, swarm_sock = find_free_port() + rpc_port, rpc_sock = find_free_port() + gateway_port, gateway_sock = find_free_port() config_path: Path = temp_dir / "config" config: dict[str, Any] - with open(config_path, "r") as f: + with open(config_path, "r", encoding="utf-8") as f: config = json.load(f) # … update Addresses … - with open(config_path, "w") as f: + with open(config_path, "w", encoding="utf-8") as f: json.dump(config, f, indent=2) + # Release reserved ports + swarm_sock.close() + rpc_sock.close() + gateway_sock.close() - ipfs_process: subprocess.Popen[bytes] = subprocess.Popen( - ["ipfs", "daemon"], env=custom_env - ) - time.sleep(5) + try: + ipfs_process = subprocess.Popen( + ["ipfs", "daemon"], + env=custom_env, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + ) + # Wait for up to 30s for the daemon to become responsive + for _ in range(30): + time.sleep(1) + try: + import requests + requests.get(f"http://127.0.0.1:{rpc_port}/api/v0/version", timeout=1) + break + except: + continue + else: + raise RuntimeError("IPFS daemon did not start within 30 seconds") + except Exception: + ipfs_process.kill() + shutil.rmtree(temp_dir) + raise

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 74-74: Line too long (141/100)

(C0301)

[convention] 63-63: Missing function or method docstring

(C0116)

[warning] 81-81: Using open without explicitly specifying an encoding

(W1514)

[warning] 98-98: Using open without explicitly specifying an encoding

(W1514)

[refactor] 105-107: Consider using 'with' for resource-allocating operations

(R1732)

🤖 Prompt for AI Agents

In tests/testing_utils.py around lines 62 to 117, improve the create_ipfs fixture by reserving the ports using open sockets before writing the config to avoid race conditions from find_free_port(). Specify encoding="utf-8" when opening the config file for reading and writing. Replace the fixed time.sleep(5) with a loop that polls the IPFS daemon's /api/v0/version endpoint with a timeout to confirm readiness. Wrap the daemon startup in a try/except block to ensure proper cleanup of the temporary directory on failure and capture stdout/stderr for debugging.

coderabbitai · 2025-06-02T22:46:16Z

py_hamt/hamt.py

+    async def _set_pointer(self, key: str, val_ptr: IPLDKind) -> None:
+        async with self.lock:
+            node_stack: list[tuple[IPLDKind, Node]] = []
+            root_node: Node = await self.node_store.load(self.root_node_id)
+            node_stack.append((self.root_node_id, root_node))
+
+            # FIFO queue to keep track of all the KVs we need to insert
+            # This is needed if any buckets overflow and so we need to reinsert all those KVs
+            kvs_queue: list[tuple[str, IPLDKind]] = []
+            kvs_queue.append((key, val_ptr))
+
+            while len(kvs_queue) > 0:
+                _, top_node = node_stack[-1]
+                curr_key, curr_val_ptr = kvs_queue[0]
+
+                raw_hash: bytes = self.hash_fn(curr_key.encode())
+                map_key: int = extract_bits(raw_hash, len(node_stack) - 1, 8)
+
+                item = top_node.data[map_key]
+                if isinstance(item, list):
+                    next_node_id: IPLDKind = item[0]
+                    next_node: Node = await self.node_store.load(next_node_id)
+                    node_stack.append((next_node_id, next_node))
+                elif isinstance(item, dict):
+                    bucket: dict[str, IPLDKind] = item
+
+                    # If this bucket already has this same key, or has space, just rewrite the value and then go work on the others in the queue
+                    if curr_key in bucket or len(bucket) < self.max_bucket_size:
+                        bucket[curr_key] = curr_val_ptr
+                        kvs_queue.pop(0)
+                        continue

-    def __getitem__(self, key: str) -> IPLDKind:
-        if not self.read_only:
-            self.lock.acquire(blocking=True)
+                    # The current key is not in the bucket and the bucket is too full, so empty KVs from the bucket and restart insertion
+                    for k in bucket:
+                        v_ptr = bucket[k]
+                        kvs_queue.append((k, v_ptr))

-        raw_hash = self.hash_fn(key.encode())
+                    # Create a new link to a new node so that we can reflow these KVs into a new subtree
+                    new_node = Node()
+                    new_node_id: IPLDKind = await self.node_store.save(None, new_node)
+                    link: list[IPLDKind] = [new_node_id]
+                    top_node.data[map_key] = link

-        node_id_stack: list[Link] = []
-        node_id_stack.append(self.root_node_id)
+            # Finally, reserialize and fix all links, deleting empty nodes as needed
+            await self._reserialize_and_link(node_stack)
+            self.root_node_id = node_stack[0][0]


🛠️ Refactor suggestion

Consider breaking down the complex _set_pointer method.

This method has 20+ local variables and implements complex bucket overflow logic. Consider extracting helper methods for better maintainability.

The method could be broken down into:

_handle_bucket_overflow()

_insert_into_bucket()

_traverse_to_insertion_point()

This would improve readability and testability of the insertion logic.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 531-531: Line too long (144/100)

(C0301)

[convention] 537-537: Line too long (137/100)

(C0301)

[convention] 542-542: Line too long (104/100)

(C0301)

[refactor] 505-505: Too many local variables (20/15)

(R0914)

🤖 Prompt for AI Agents

In py_hamt/hamt.py around lines 505 to 550, the _set_pointer method is too complex with many local variables and intricate bucket overflow logic. Refactor by extracting helper methods such as _handle_bucket_overflow to manage bucket splitting and reinsertion, _insert_into_bucket to handle inserting or updating keys in buckets, and _traverse_to_insertion_point to navigate the node stack to the correct insertion location. This modularization will improve readability, maintainability, and testability of the insertion logic.

coderabbitai · 2025-06-02T22:46:16Z

py_hamt/hamt.py

+        # node stack is a list of tuples that look like (parent_id, self_id, node)
+        node_stack: list[tuple[int | None, int, Node]] = []
+        # The root node may not be in the buffer, e.g. this is HAMT initialized with a specific root node id
+        if self.is_buffer_id(self.hamt.root_node_id):
+            root_node: Node = self.buffer[cast(int, self.hamt.root_node_id)]
+            node_stack.append((None, cast(int, self.hamt.root_node_id), root_node))
+
+        while len(node_stack) > 0:
+            parent_buffer_id, top_buffer_id, top_node = node_stack[-1]
+            new_nodes_on_stack: list[tuple[int, int, Node]] = []
+            for child_buffer_id in self.children_in_memory(top_node):
+                child_node: Node = self.buffer[child_buffer_id]
+                new_nodes_on_stack.append((top_buffer_id, child_buffer_id, child_node))
+
+            no_children_in_memory: bool = len(new_nodes_on_stack) == 0
+            # Flush this node out and relink the rest of the tree
+            if no_children_in_memory:
+                is_root: bool = parent_buffer_id is None
+                old_id: int = top_buffer_id
+                new_id: IPLDKind = await self.hamt.cas.save(
+                    top_node.serialize(), codec="dag-cbor"
+                )
+                del self.buffer[old_id]
+                node_stack.pop()

-    ## `__len__`
-    Total number of keys. Note that this will have to scan the entire tree in order to count, which can take a while depending on the speed of the store retrieval.
+                # If it's the root, we need to set the hamt's root node id once this is done sending to the backing store
+                if is_root:
+                    self.hamt.root_node_id = new_id
+                # Edit and properly relink the parent if this is not the root
+                else:
+                    # parent_buffer_id is never None in this branch
+                    assert parent_buffer_id is not None
+                    parent_node: Node = self.buffer[parent_buffer_id]
+                    parent_node.replace_link(old_id, new_id)
+            # Continue recursing down the tree
+            else:
+                node_stack.extend(new_nodes_on_stack)



💡 Verification agent

🧩 Analysis chain

Verify the complex vacate algorithm correctness.

The vacate method implements a complex dependency-aware flushing algorithm. The logic appears sound, but the complexity warrants verification.

The algorithm uses a DFS traversal to flush nodes from leaves to root, ensuring dependencies are satisfied. Let me verify that the stack manipulation and parent relinking logic is correct:

🏁 Script executed:

#!/bin/bash # Description: Search for tests that verify the InMemoryTreeStore.vacate() method behavior # Look for tests specifically testing the vacate functionality rg -A 10 -B 5 "vacate.*test|test.*vacate" --type py # Look for tests of InMemoryTreeStore rg -A 10 -B 5 "InMemoryTreeStore" --type py

Length of output: 5286

Add dedicated tests for InMemoryTreeStore.vacate

The vacate method contains a non-trivial DFS-based flush algorithm but currently has no direct test coverage. Please add unit tests to ensure its correctness:

• py_hamt/hamt.py (lines 215–252): the core vacate implementation
• Create tests covering:
– Single‐node (root only) scenario
– Multi‐level tree: verify leaves are flushed before parents and links are relinked correctly
– Root ID update when vacating the root node
– Mixed buffer state (some children already flushed vs. still in memory)
– Empty buffer / no-op case

Tests can live under a new file, e.g. tests/test_inmemory_tree_store_vacate.py, and should drive HAMT.make_read_only() (which calls vacate) to observe side effects on buffer and root_node_id.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 218-218: Line too long (108/100)

(C0301)

[convention] 241-241: Line too long (121/100)

(C0301)

🤖 Prompt for AI Agents

In py_hamt/hamt.py around lines 215 to 252, the vacate method lacks direct unit test coverage despite its complex DFS-based flushing logic. Add dedicated tests in a new file like tests/test_inmemory_tree_store_vacate.py that cover single-node scenarios, multi-level trees verifying leaf-first flushing and correct parent relinking, root ID updates after vacating, mixed buffer states with some children flushed, and empty buffer no-op cases. Use HAMT.make_read_only() to trigger vacate and assert expected changes in buffer and root_node_id to ensure correctness.

…ynamically

Tests/add kubo if not

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

tests/testing_utils.py (1)
80-83: TOCTOU race condition in port allocation.

The _free_port() function has the same race condition issue mentioned in past reviews. Although used in Docker context, the port can still be taken between socket close and Docker container startup.

Consider returning both the port and an open socket, or use Docker's automatic port allocation:
-def _free_port() -> int:
-    with socket.socket() as s:
-        s.bind(("", 0))
-        return s.getsockname()[1]
+def _free_port() -> int:
+    """Get a free port. Note: Subject to race conditions in concurrent environments."""
+    with socket.socket() as s:
+        s.bind(("", 0))
+        return s.getsockname()[1]
Alternatively, let Docker handle port allocation automatically by using "5001/tcp": None in the ports mapping.
tests/test_kubo_cas.py (1)
162-211: 🛠️ Refactor suggestion

Inconsistent IPFS availability handling.

This test manually checks for IPFS daemon availability instead of using the create_ipfs fixture consistently. This creates brittleness and inconsistency with other tests.

Consider using the create_ipfs fixture consistently, or if testing default URLs specifically is important, use a separate test that's properly isolated:
 @pytest.mark.ipfs
 @pytest.mark.asyncio(loop_scope="session")
 @given(data=ipld_strategy())
 @settings(deadline=1000, print_blob=True)
-async def test_kubo_default_urls(global_client_session, data: IPLDKind):
+async def test_kubo_default_urls(create_ipfs, global_client_session, data: IPLDKind):
     """
-    Tests KuboCAS using its default URLs and when None is passed for URLs.
-    Requires a local IPFS daemon on default ports.
+    Tests KuboCAS default URL behavior with fixture-provided daemon.
     """
-    # Check if local IPFS daemon is available on default ports
-    import http.client
-
-    try:
-        conn = http.client.HTTPConnection("127.0.0.1", 5001, timeout=1)
-        conn.request("POST", "/api/v0/version")
-        response = conn.getresponse()
-        if response.status != 200:
-            pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)")
-    except Exception:
-        pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)")
+    rpc_url, gateway_url = create_ipfs
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 172-172: Import outside toplevel (http.client)

(C0415)

[warning] 180-180: Catching too general exception Exception

(W0718)

[warning] 193-193: Catching too general exception Exception

(W0718)

[warning] 209-209: Catching too general exception Exception

(W0718)

🧹 Nitpick comments (6)

tests/testing_utils.py (3)
1-11: Add module docstring for better documentation.

Consider adding a module-level docstring to describe the purpose of this testing utilities module.
+"""
+Testing utilities for py-hamt IPFS integration tests.
+
+Provides Hypothesis strategies for generating test data, helper functions for IPFS
+daemon detection, Docker client management, and pytest fixtures for IPFS testing.
+"""
 import http.client
 import os
 import socket
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 8-8: Unable to import 'hypothesis'

(E0401)

[error] 9-9: Unable to import 'hypothesis.strategies'

(E0401)

[error] 10-10: Unable to import 'multiformats'

(E0401)

28-39: Add docstring for ipld_strategy function.

The function generates IPLD-compatible data structures but lacks documentation explaining its purpose and return type.
 def ipld_strategy() -> SearchStrategy:
+    """
+    Generate Hypothesis strategy for IPLD-compatible data structures.
+    
+    Returns a strategy that produces values compatible with IPLD (InterPlanetary
+    Linked Data) format, including primitives, binary data, and CIDs.
+    """
     return st.one_of(
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 28-28: Missing function or method docstring

(C0116)

63-63: Replace broad exception handling with specific exceptions.

Multiple functions use broad except Exception: blocks which can mask unexpected errors. Consider catching specific exceptions where possible.

For HTTP connection functions, catch specific connection-related exceptions:
     try:
         conn = http.client.HTTPConnection(p.hostname, p.port, timeout=1)
         conn.request("POST", "/api/v0/version")
         return conn.getresponse().status == 200
-    except Exception:
+    except (OSError, http.client.HTTPException):
         return False
Also applies to: 76-76, 116-116, 133-133, 141-141

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 63-63: Catching too general exception Exception

(W0718)
tests/conftest.py (1)
1-5: Add module docstring and ignore false positive import warnings.

The re-exports are intentional for backward compatibility, but the module should have documentation.
+"""
+Pytest configuration and shared fixtures for py-hamt tests.
+
+Provides shared aiohttp session, IPFS test control via --ipfs flag,
+and re-exports testing utilities for backward compatibility.
+"""
 import aiohttp
 import pytest
 
 # re-export helpers so existing imports keep working (optional)
-from testing_utils import create_ipfs, ipld_strategy  # noqa: F401
+from testing_utils import create_ipfs, ipld_strategy  # noqa: F401,W0611
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 1-1: Unable to import 'aiohttp'

(E0401)

[warning] 5-5: Unused create_ipfs imported from testing_utils

(W0611)

[warning] 5-5: Unused ipld_strategy imported from testing_utils

(W0611)
README.md (1)
50-50: Fix grammar issue with "both...as well as" construction.
-We use `pytest` with 100% code coverage, and with test inputs that are both handwritten as well as generated by `hypothesis`.
+We use `pytest` with 100% code coverage, with test inputs that are both handwritten and generated by `hypothesis`.
🧰 Tools

🪛 LanguageTool

[grammar] ~50-~50: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...

(BOTH_AS_WELL_AS)
tests/test_kubo_cas.py (1)
1-11: Add module docstring for test file.
+"""
+Tests for KuboCAS and InMemoryCAS implementations.
+
+Tests async content-addressed storage with IPFS integration,
+property-based testing with Hypothesis, and error handling.
+"""
 from typing import Literal, cast
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Unable to import 'aiohttp'

(E0401)

[error] 4-4: Unable to import 'dag_cbor'

(E0401)

[error] 6-6: Unable to import 'dag_cbor'

(E0401)

[error] 7-7: Unable to import 'hypothesis'

(E0401)

[error] 10-10: Unable to import 'py_hamt'

(E0401)

[error] 11-11: Unable to import 'py_hamt.store'

(E0401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40a6061 and 5c4b043.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

.github/workflows/run-checks.yaml (1 hunks)
README.md (2 hunks)
pyproject.toml (2 hunks)
run-checks.sh (1 hunks)
tests/conftest.py (1 hunks)
tests/test_kubo_cas.py (1 hunks)
tests/testing_utils.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

run-checks.sh
pyproject.toml
.github/workflows/run-checks.yaml

🧰 Additional context used

🧬 Code Graph Analysis (2)

tests/conftest.py (1)

tests/testing_utils.py (2)

create_ipfs (151-197)

ipld_strategy (28-39)

tests/test_kubo_cas.py (3)

tests/testing_utils.py (2)

ipld_strategy (28-39)

create_ipfs (151-197)

py_hamt/store.py (8)

KuboCAS (74-269)

InMemoryCAS (37-71)

load (33-34)

load (52-71)

load (262-269)

save (26-30)

save (47-50)

save (246-260)

tests/conftest.py (1)

global_client_session (9-13)

🪛 Pylint (3.3.7)

tests/conftest.py

[convention] 1-1: Missing module docstring

(C0114)

[error] 1-1: Unable to import 'aiohttp'

(E0401)

[convention] 16-16: Missing function or method docstring

(C0116)

[convention] 25-25: Missing function or method docstring

(C0116)

[convention] 29-29: Missing function or method docstring

(C0116)

[warning] 5-5: Unused create_ipfs imported from testing_utils

(W0611)

[warning] 5-5: Unused ipld_strategy imported from testing_utils

(W0611)

tests/testing_utils.py

[convention] 48-48: Line too long (112/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 8-8: Unable to import 'hypothesis'

(E0401)

[error] 9-9: Unable to import 'hypothesis.strategies'

(E0401)

[error] 10-10: Unable to import 'multiformats'

(E0401)

[convention] 28-28: Missing function or method docstring

(C0116)

[warning] 63-63: Catching too general exception Exception

(W0718)

[warning] 76-76: Catching too general exception Exception

(W0718)

[warning] 116-116: Catching too general exception Exception

(W0718)

[warning] 133-133: Catching too general exception Exception

(W0718)

[warning] 141-141: Catching too general exception Exception

(W0718)

tests/test_kubo_cas.py

[convention] 63-63: Line too long (129/100)

(C0301)

[convention] 93-93: Line too long (130/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Unable to import 'aiohttp'

(E0401)

[error] 4-4: Unable to import 'dag_cbor'

(E0401)

[error] 6-6: Unable to import 'dag_cbor'

(E0401)

[error] 7-7: Unable to import 'hypothesis'

(E0401)

[error] 10-10: Unable to import 'py_hamt'

(E0401)

[error] 11-11: Unable to import 'py_hamt.store'

(E0401)

[convention] 16-16: Missing function or method docstring

(C0116)

[convention] 172-172: Import outside toplevel (http.client)

(C0415)

[warning] 180-180: Catching too general exception Exception

(W0718)

[warning] 193-193: Catching too general exception Exception

(W0718)

[warning] 209-209: Catching too general exception Exception

(W0718)

[convention] 218-218: Missing function or method docstring

(C0116)

🪛 LanguageTool

README.md

[uncategorized] ~48-~48: Possible missing comma found.
Context: ...check formatting and linting. Under the hood it will be using the pre-commit comma...

(AI_HYDRA_LEO_MISSING_COMMA)

[grammar] ~50-~50: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...

(BOTH_AS_WELL_AS)

[style] ~60-~60: Consider a more concise word here.
Context: ...on IPFS in order to be able to run all integration tests wh...

(IN_ORDER_TO_PREMIUM)

[uncategorized] ~60-~60: Possible missing comma found.
Context: .../run-checks.yamluses thesetup-ipfs` step which ensures that a local ipfs daemon ...

(AI_HYDRA_LEO_MISSING_COMMA)

[uncategorized] ~60-~60: Possible missing comma found.
Context: ...Docker](https://www.docker.com/) is not installed then tests will simply run the unit tes...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.17.2)

README.md

54-54: Blank line inside blockquote
null

(MD028, no-blanks-blockquote)

🔇 Additional comments (9)

tests/testing_utils.py (1)

150-197: IPFS fixture implementation is well-designed.

The fixture provides good fallback logic with proper resource cleanup. The Docker-based approach with readiness checking is robust and handles different deployment scenarios well.

tests/conftest.py (2)

8-13: Excellent async session management.

The session-scoped async fixture provides efficient resource sharing across tests while ensuring proper cleanup. This is a best practice for aiohttp testing.

16-35: Well-implemented conditional test execution.

The --ipfs flag approach with automatic test skipping provides good control over IPFS-dependent tests. This is particularly useful for CI environments and local development.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 16-16: Missing function or method docstring

(C0116)

[convention] 25-25: Missing function or method docstring

(C0116)

[convention] 29-29: Missing function or method docstring

(C0116)

README.md (3)

13-15: Clear description update reflects architectural changes.

The updated description properly emphasizes the content-addressed storage focus and IPFS/IPLD targeting, which aligns with the async refactoring objectives.

27-41: Comprehensive development setup instructions.

The addition of explicit setup commands using uv, virtual environment creation, and pre-commit hook installation provides clear guidance for new contributors.

58-82: Excellent clarification of IPFS testing scenarios.

The detailed breakdown of different testing scenarios (GitHub Actions, local with Docker, local with daemon, quick local) provides valuable guidance for developers in various environments.

🧰 Tools

🪛 LanguageTool

[style] ~60-~60: Consider a more concise word here.
Context: ...on IPFS in order to be able to run all integration tests wh...

(IN_ORDER_TO_PREMIUM)

[uncategorized] ~60-~60: Possible missing comma found.
Context: .../run-checks.yamluses thesetup-ipfs` step which ensures that a local ipfs daemon ...

(AI_HYDRA_LEO_MISSING_COMMA)

[uncategorized] ~60-~60: Possible missing comma found.
Context: ...Docker](https://www.docker.com/) is not installed then tests will simply run the unit tes...

(AI_HYDRA_LEO_MISSING_COMMA)

tests/test_kubo_cas.py (3)

15-43: Comprehensive error handling tests for InMemoryCAS.

The tests properly validate both KeyError for missing keys and TypeError for invalid key types with appropriate error messages. Good defensive testing.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 16-16: Missing function or method docstring

(C0116)

135-160: Well-structured test with proper fixture usage.

The test correctly uses the create_ipfs fixture and global_client_session for reliable IPFS testing across different environments (Docker/local daemon).

213-235: Good test coverage with proper session management.

The test demonstrates proper async session management by creating its own aiohttp.ClientSession and using proper type safety with codec literals.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 218-218: Missing function or method docstring

(C0116)

tests/test_kubo_cas.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

py_hamt/store.py (2)
23-23: Fix the type alias syntax to resolve parsing error.

The type alias syntax is causing a parsing error with Pylint. Use the traditional syntax that's compatible with older Python versions.
-    type CodecInput = Literal["raw", "dag-cbor"]
+    CodecInput = Literal["raw", "dag-cbor"]
🧰 Tools

🪛 Pylint (3.3.7)

[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'

(E0001)

231-241: Add test coverage for async context manager methods.

The async context manager methods (aclose, __aenter__, __aexit__) need test coverage to ensure session cleanup works correctly, as identified in previous reviews.

🧹 Nitpick comments (2)

py_hamt/store.py (2)
140-140: Add validation for the concurrency parameter.

Consider adding validation to ensure the concurrency parameter is a positive integer to prevent runtime issues with the semaphore.
+        if concurrency <= 0:
+            raise ValueError("concurrency must be a positive integer")
258-260: Add comment explaining DAG_PB special handling.

The special case for DAG_PB_MARKER could benefit from a comment explaining why this codec is not overridden (likely because it's UnixFS sharded data as mentioned in the AI summary).
+        # Don't override DAG_PB codec as it indicates UnixFS sharded data
         if cid.codec.code != self.DAG_PB_MARKER:
             cid = cid.set(codec=codec)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14385ee and 7b95f1c.

📒 Files selected for processing (1)

py_hamt/store.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

py_hamt/store.py (3)

py_hamt/hamt.py (7)

save (133-134)

save (155-156)

save (267-273)

load (137-138)

load (158-167)

load (275-284)

get (592-599)

py_hamt/encryption_hamt_store.py (1)

get (146-170)

py_hamt/zarr_hamt_store.py (1)

get (92-118)

🪛 Pylint (3.3.7)

py_hamt/store.py

[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'

(E0001)

🔇 Additional comments (2)

py_hamt/store.py (2)

52-71: Excellent defensive programming and error handling.

The type checking with detailed explanation of the Liskov substitution principle issue is well-documented. The improved error handling using KeyError with chained exceptions is much better than generic exceptions.

262-269: Clean async implementation with proper throttling.

The load method effectively uses aiohttp with semaphore throttling and appropriate error handling. This addresses the previous blocking requests issue mentioned in past reviews.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

py_hamt/store.py (2)
23-23: Fix the type alias syntax error.

The type alias syntax is still causing parsing errors. Based on the past review comment, this needs to be corrected.
-    CodecInput = Literal["raw", "dag-cbor"]
+CodecInput = Literal["raw", "dag-cbor"]
Move the type alias outside the class to resolve the parsing issue.

231-242: Add test coverage for async context manager methods.

The async context manager methods (aclose, __aenter__, __aexit__) still need test coverage as identified in the past review comments. The pipeline failure indicates these lines aren't covered by tests.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 231-231: Missing function or method docstring

(C0116)

🧹 Nitpick comments (3)

py_hamt/store.py (3)
44-44: Use dict literal for better performance.
-        self.store = dict()
+        self.store = {}
🧰 Tools

🪛 Pylint (3.3.7)

[refactor] 44-44: Consider using '{}' instead of a call to 'dict'.

(R1735)

47-51: Consider adding codec validation.

The codec parameter is not used in the implementation, but the method signature suggests it should affect how data is stored or identified.
 async def save(self, data: bytes, codec: ContentAddressedStore.CodecInput) -> bytes:
+    # TODO: Consider incorporating codec into hash calculation for different content types
     hash: bytes = self.hash_alg.digest(data, size=32)
     self.store[hash] = data
     return hash
🧰 Tools

🪛 Pylint (3.3.7)

[warning] 48-48: Redefining built-in 'hash'

(W0622)

1-8: Add module docstring.
+"""
+Content-addressed storage implementations for HAMT.
+
+Provides abstract base class and concrete implementations for storing 
+and retrieving data in content-addressed storage systems like IPFS.
+"""
 import asyncio
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

[error] 5-5: Unable to import 'aiohttp'

(E0401)

[error] 6-6: Unable to import 'dag_cbor.ipld'

(E0401)

[error] 7-7: Unable to import 'multiformats'

(E0401)

[error] 8-8: Unable to import 'multiformats.multihash'

(E0401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b95f1c and 87e3451.

📒 Files selected for processing (1)

py_hamt/store.py (1 hunks)

🧰 Additional context used

🪛 Pylint (3.3.7)

py_hamt/store.py

[convention] 13-13: Line too long (104/100)

(C0301)

[convention] 15-15: Line too long (145/100)

(C0301)

[convention] 18-18: Line too long (194/100)

(C0301)

[convention] 20-20: Line too long (119/100)

(C0301)

[convention] 29-29: Line too long (116/100)

(C0301)

[convention] 38-38: Line too long (261/100)

(C0301)

[convention] 57-57: Line too long (103/100)

(C0301)

[convention] 58-58: Line too long (109/100)

(C0301)

[convention] 60-60: Line too long (134/100)

(C0301)

[convention] 83-83: Line too long (168/100)

(C0301)

[convention] 133-133: Line too long (190/100)

(C0301)

[convention] 146-146: Line too long (188/100)

(C0301)

[convention] 149-149: Line too long (177/100)

(C0301)

[convention] 150-150: Line too long (106/100)

(C0301)

[convention] 152-152: Line too long (170/100)

(C0301)

[convention] 164-164: Line too long (183/100)

(C0301)

[convention] 173-173: Line too long (205/100)

(C0301)

[convention] 174-174: Line too long (174/100)

(C0301)

[convention] 178-178: Line too long (136/100)

(C0301)

[convention] 182-182: Line too long (178/100)

(C0301)

[convention] 1-1: Missing module docstring

(C0114)

[error] 5-5: Unable to import 'aiohttp'

(E0401)

[error] 6-6: Unable to import 'dag_cbor.ipld'

(E0401)

[error] 7-7: Unable to import 'multiformats'

(E0401)

[error] 8-8: Unable to import 'multiformats.multihash'

(E0401)

[warning] 33-33: Redefining built-in 'id'

(W0622)

[refactor] 44-44: Consider using '{}' instead of a call to 'dict'.

(R1735)

[warning] 48-48: Redefining built-in 'hash'

(W0622)

[warning] 52-52: Redefining built-in 'id'

(W0622)

[refactor] 74-74: Too many instance attributes (8/7)

(R0902)

[refactor] 134-134: Too many arguments (8/5)

(R0913)

[refactor] 134-134: Too many positional arguments (6/5)

(R0917)

[convention] 231-231: Missing function or method docstring

(C0116)

[warning] 262-262: Redefining built-in 'id'

(W0622)

🔇 Additional comments (3)

py_hamt/store.py (3)

189-192: Verify URL construction and pin parameter.

The URL construction uses pin=false which aligns with the past discussion about not wanting to pin content on saves to avoid repinning entire datasets.

The implementation correctly avoids pinning as discussed in previous comments, preventing potential performance issues when dealing with large HAMTs.

194-227: Well-designed session management per event loop.

This is an excellent solution to the async session management challenges mentioned in the PR objectives. The per-loop session management elegantly handles the issue where zarr runs its own event loop in a separate thread.

The implementation properly addresses the session management complexities across different event loops that were discussed in the PR comments.

246-270: Excellent migration to async HTTP client.

The implementation successfully addresses the past review comment about blocking requests in async methods. The use of aiohttp with proper session management and semaphore-based concurrency control is well-executed.

Key improvements:

Non-blocking HTTP requests with aiohttp

Proper semaphore usage for rate limiting

Session reuse across requests

Appropriate error handling with raise_for_status()

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 262-262: Redefining built-in 'id'

(W0622)

…are staged

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

tests/test_kubo_cas.py (3)
1-1: Add module docstring.

The module is missing a docstring that should describe its purpose and functionality.
+"""
+Tests for KuboCAS and InMemoryCAS content-addressed storage implementations.
+
+This module contains async tests for the refactored storage layer, including
+property-based tests for IPLD data serialization/deserialization and error
+handling scenarios.
+"""
+
 from typing import Literal, cast
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)

15-19: Add function docstring for consistency.

This function lacks a docstring explaining its purpose.
 @pytest.mark.asyncio
 async def test_memory_store_exception():
+    """Test that InMemoryCAS.load raises KeyError for empty bytes key."""
     s = InMemoryCAS()
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 16-16: Missing function or method docstring

(C0116)

124-129: Add function docstring.

This function lacks a docstring explaining its purpose and how it differs from other KuboCAS tests.
 @pytest.mark.asyncio
 @given(data=ipld_strategy())
 @settings(
     deadline=500
 )  # this sometimes takes longer than the default 250 ms in GitHub CI
 async def test_kubo_cas(create_ipfs, data: IPLDKind):  # noqa
+    """
+    Test KuboCAS with a custom aiohttp session for complete code coverage.
+    
+    This test creates its own ClientSession instead of using the global one
+    to ensure all code paths are tested.
+    """
     rpc_base_url, gateway_base_url = create_ipfs
🧰 Tools

🪛 Pylint (3.3.7)

[convention] 129-129: Missing function or method docstring

(C0116)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1632dca and 7498c28.

📒 Files selected for processing (1)

tests/test_kubo_cas.py (1 hunks)

🧰 Additional context used

🪛 Pylint (3.3.7)

tests/test_kubo_cas.py

[convention] 1-1: Missing module docstring

(C0114)

[error] 3-3: Unable to import 'aiohttp'

(E0401)

[error] 4-4: Unable to import 'dag_cbor'

(E0401)

[error] 6-6: Unable to import 'dag_cbor'

(E0401)

[error] 7-7: Unable to import 'hypothesis'

(E0401)

[error] 10-10: Unable to import 'py_hamt'

(E0401)

[error] 11-11: Unable to import 'py_hamt.store'

(E0401)

[convention] 16-16: Missing function or method docstring

(C0116)

[convention] 83-83: Import outside toplevel (http.client)

(C0415)

[warning] 91-91: Catching too general exception Exception

(W0718)

[warning] 104-104: Catching too general exception Exception

(W0718)

[warning] 120-120: Catching too general exception Exception

(W0718)

[convention] 129-129: Missing function or method docstring

(C0116)

🔇 Additional comments (1)

tests/test_kubo_cas.py (1)

82-93: Good implementation of daemon availability check.

The addition of the daemon availability check properly addresses the previous review concern about test brittleness when no IPFS daemon is running on default ports. The timeout and graceful skip behavior make the test more robust.

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 83-83: Import outside toplevel (http.client)

(C0415)

[warning] 91-91: Catching too general exception Exception

(W0718)

coderabbitai · 2025-06-04T04:51:50Z

tests/test_kubo_cas.py

+                loaded_encoded_data = await kubo_cas_default.load(cid)
+                result = dag_cbor.decode(loaded_encoded_data)
+                assert data == result
+            except Exception as e:


🛠️ Refactor suggestion

Use more specific exception handling.

Catching the general Exception in these test scenarios is too broad and may mask unexpected errors. Consider catching specific exceptions related to IPFS operations or let unexpected exceptions propagate to help with debugging.

- except Exception as e: + except (aiohttp.ClientError, TimeoutError) as e: pytest.fail( f"Error during KuboCAS default URL test (codec: {codec}): {e}" )

And similarly for the second occurrence:

- except Exception as e: + except (aiohttp.ClientError, TimeoutError) as e: pytest.fail(f"Error during KuboCAS None URL test (codec: {codec}): {e}")

Also applies to: 120-120

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 104-104: Catching too general exception Exception

(W0718)

🤖 Prompt for AI Agents

In tests/test_kubo_cas.py at lines 104 and 120, replace the broad 'except Exception' clauses with more specific exception types related to IPFS operations. Identify the particular exceptions that the IPFS client or related code can raise and catch only those. This will prevent masking unexpected errors and improve test reliability and debugging.

coderabbitai · 2025-06-04T04:51:50Z

tests/test_kubo_cas.py

+        response = conn.getresponse()
+        if response.status != 200:
+            pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)")
+    except Exception:


🛠️ Refactor suggestion

Use more specific exception handling.

Catching the general Exception is too broad. Consider catching specific connection-related exceptions.

- except Exception: + except (ConnectionError, OSError, TimeoutError): pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

except Exception:

except (ConnectionError, OSError, TimeoutError):

pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)")

🧰 Tools

🪛 Pylint (3.3.7)

[warning] 91-91: Catching too general exception Exception

(W0718)

🤖 Prompt for AI Agents

In tests/test_kubo_cas.py at line 91, replace the broad `except Exception:` clause with more specific exception handling for connection-related errors. Identify the exact exceptions raised during connection failures (such as connection errors or timeouts) and catch only those to improve error handling precision.

coderabbitai · 2025-06-04T04:51:50Z

tests/test_kubo_cas.py

+    Requires a local IPFS daemon on default ports.
+    """
+    # Check if local IPFS daemon is available on default ports
+    import http.client


🛠️ Refactor suggestion

Move import to module level.

The http.client import should be moved to the top of the file with other imports for better code organization.

+import http.client from typing import Literal, cast import aiohttp

Then remove the import from inside the function:

""" # Check if local IPFS daemon is available on default ports - import http.client try:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import http.client

# At the top of tests/test_kubo_cas.py, group all imports together

import http.client

from typing import Literal, cast

import aiohttp

# …other existing imports…

Suggested change

import http.client

# In the test_kubo_default_urls function, remove the inline import

"""

# Check if local IPFS daemon is available on default ports

try:

# …rest of the test…

🧰 Tools

🪛 Pylint (3.3.7)

[convention] 83-83: Import outside toplevel (http.client)

(C0415)

🤖 Prompt for AI Agents

In tests/test_kubo_cas.py at line 83, the import statement for http.client is inside a function. Move this import to the top of the file with the other import statements to improve code organization and readability. Then remove the import statement from inside the function to avoid redundant imports.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

README.md (3)
45-52: Refine comma usage and conjunction

Add a comma after “Under the hood,”

Change “both handwritten as well as generated” to “both handwritten and generated”
Apply this diff:
- This will run tests with code coverage, and check formatting and linting. Under the hood it will be using the `pre-commit` command ...
+ This will run tests with code coverage, and check formatting and linting. Under the hood, it will be using the `pre-commit` command ...
 
- We use `pytest` with 100% code coverage, and with test inputs that are both handwritten as well as generated by `hypothesis`.
+ We use `pytest` with 100% code coverage and with test inputs that are both handwritten and generated by `hypothesis`.
🧰 Tools

🪛 LanguageTool

[uncategorized] ~50-~50: Possible missing comma found.
Context: ...check formatting and linting. Under the hood it will be using the pre-commit comma...

(AI_HYDRA_LEO_MISSING_COMMA)

[grammar] ~52-~52: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...

(BOTH_AS_WELL_AS)

54-59: Remove blank line inside blockquote
The extra blank line between blockquote markers breaks markdownlint. Remove the empty line at line 56.

🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

56-56: Blank line inside blockquote
null

(MD028, no-blanks-blockquote)

62-62: Simplify phrasing and add comma

Replace “in order to be able to run” with “to run”

Insert a comma after “installed,”
- Due to the dependency on [IPFS](...) in order to be able to run all integration tests which use IPFS a local ipfs daemon is required.
+ Due to the dependency on [IPFS](...), to run all integration tests which use IPFS, a local ipfs daemon is required.
🧰 Tools

🪛 LanguageTool

[style] ~62-~62: Consider a more concise word here.
Context: ...on IPFS in order to be able to run all integration tests wh...

(IN_ORDER_TO_PREMIUM)

[uncategorized] ~62-~62: Possible missing comma found.
Context: ...Docker](https://www.docker.com/) is not installed then tests will simply run the unit tes...

(AI_HYDRA_LEO_MISSING_COMMA)
AGENTS.md (2)
19-23: Fix typo in bullet
“accesed” → “accessed”
- `hamt.py`- where the HAMT data structure is constructed and accesed
+ `hamt.py` - where the HAMT data structure is constructed and accessed
🧰 Tools

🪛 LanguageTool

[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation - /py_hamt: Source code of the py-hamt library ...

(UNLIKELY_OPENING_PUNCTUATION)

111-111: Add comma for clarity
Insert a comma before the coordinating conjunction to separate clauses.
- If there is an error with formatting for ruff and it can be autofixed you can normally run
+ If there is an error with formatting for ruff, and it can be autofixed, you can normally run
🧰 Tools

🪛 LanguageTool

[uncategorized] ~111-~111: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...

(COMMA_COMPOUND_SENTENCE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7498c28 and c4ba8ea.

📒 Files selected for processing (2)

AGENTS.md (1 hunks)
README.md (2 hunks)

🧰 Additional context used

🪛 LanguageTool

AGENTS.md

[grammar] ~2-~2: The singular determiner ‘this’ may not agree with the plural noun ‘agents’. Did you mean “these”?
Context: # Project Agents.md Guide for py-hamt This AGENTS.md file provides comprehensive g...

(THIS_NNS)

[uncategorized] ~15-~15: Use a comma before “or” if it connects two independent clauses (unless they are closely connected and short).
Context: ... IPFS daemon running locally can be used or a remote one can be accessed via gatewa...

(COMMA_COMPOUND_SENTENCE_2)

[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation - /py_hamt: Source code of the py-hamt library ...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~24-~24: Loose punctuation mark.
Context: ... KuboCAS located in store.py. - /tests: Test files that should be maintained an...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~111-~111: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...

(COMMA_COMPOUND_SENTENCE)

README.md

[uncategorized] ~50-~50: Possible missing comma found.
Context: ...check formatting and linting. Under the hood it will be using the pre-commit comma...

(AI_HYDRA_LEO_MISSING_COMMA)

[grammar] ~52-~52: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...

(BOTH_AS_WELL_AS)

[style] ~62-~62: Consider a more concise word here.
Context: ...on IPFS in order to be able to run all integration tests wh...

(IN_ORDER_TO_PREMIUM)

[uncategorized] ~62-~62: Possible missing comma found.
Context: ...Docker](https://www.docker.com/) is not installed then tests will simply run the unit tes...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.17.2)

README.md

56-56: Blank line inside blockquote
null

(MD028, no-blanks-blockquote)

🔇 Additional comments (11)

README.md (10)

9-9: Badge looks good
The Codecov badge URL is valid and up to date.

13-15: Updated project description is clear
The new summary accurately reflects the HAMT’s purpose and removed outdated examples.

23-26: Usage references updated
Linking to the API docs and external repos is appropriate and keeps the README concise.

28-36: Dev setup with uv is well documented
Steps to install uv, sync, activate the venv, and install hooks read clearly.

38-44: Pre-commit instructions are clear
Including run-checks.sh usage here aligns with the updated CI and local workflow.

64-69: Test matrix summary is helpful
The examples for GitHub Actions and local scenarios are clear and comprehensive.

88-88: CPU/memory profiling section intact
Instructions for cProfile, snakeviz, and memray are accurate.

105-108: Documentation generation instructions are correct
Using uv run pdoc py_hamt is consistent with the rest of the dev workflow.

110-113: LLMs section added appropriately
Pointing AI users to AGENTS.md is clear and purposeful.

114-116: Dependency management is well covered
uv add/uv remove guidance is consistent with the project's use of uv.

🧰 Tools

🪛 LanguageTool

[typographical] ~115-~115: Consider adding a comma here.
Context: ...ytest --group dev`. For more information please see the [uv documentation](https://docs...

(PLEASE_COMMA)

AGENTS.md (1)

1-120: New AI agent guide looks comprehensive
The file concisely covers architecture, conventions, setup, testing, and PR guidelines for AI agents.

🧰 Tools

🪛 LanguageTool

[grammar] ~2-~2: The singular determiner ‘this’ may not agree with the plural noun ‘agents’. Did you mean “these”?
Context: # Project Agents.md Guide for py-hamt This AGENTS.md file provides comprehensive g...

(THIS_NNS)

[uncategorized] ~15-~15: Use a comma before “or” if it connects two independent clauses (unless they are closely connected and short).
Context: ... IPFS daemon running locally can be used or a remote one can be accessed via gatewa...

(COMMA_COMPOUND_SENTENCE_2)

[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation - /py_hamt: Source code of the py-hamt library ...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~24-~24: Loose punctuation mark.
Context: ... KuboCAS located in store.py. - /tests: Test files that should be maintained an...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~111-~111: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...

(COMMA_COMPOUND_SENTENCE)

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (9)

AGENTS.md (9)
7-7: Capitalize “Python” in project description
Standardize language name by capitalizing “Python”.

Apply this diff:
- This library is a python implementation of a HAMT, inspired by [rvagg's IAMap project written in JavaScript](https://github.com/rvagg/iamap).
+ This library is a Python implementation of a HAMT, inspired by [rvagg's IAMap project written in JavaScript](https://github.com/rvagg/iamap).
11-11: Standardize “Zarr” casing
Use proper casing for the Zarr project name.
- dClimate primarily created this for storing large [zarrs](https://zarr.dev/) on IPFS.
+ dClimate primarily created this for storing large [Zarr](https://zarr.dev/) datasets on IPFS.
15-15: Add comma and standardize “RPC” in architecture description
Improve readability and consistency.
- An IPFS daemon running locally can be used or a remote one can be accessed via gateway(reads) & rpc endpoints(writes) on their IPFS defaults.
+ An IPFS daemon running locally can be used, or a remote one can be accessed via gateway (reads) & RPC endpoints (writes) on their IPFS defaults.
🧰 Tools

🪛 LanguageTool

[uncategorized] ~15-~15: Use a comma before “or” if it connects two independent clauses (unless they are closely connected and short).
Context: ... IPFS daemon running locally can be used or a remote one can be accessed via gatewa...

(COMMA_COMPOUND_SENTENCE_2)

19-25: Fix list indentation and correct typos under “Project Structure”
Align nested list items with two spaces and fix spelling.
- - `/py_hamt`: Source code of the py-hamt library
-     - `encryption_hamt_store.py` - Example using total encryption
-     - `hamt.py`- where the HAMT data structure is constructed and accesed
-     - `store.py`- Where various stores live that writes the data. Primarily used KuboCAS
-     - `zarr_hamt_store.py` - ZarrHAMTStore class used to directly write or read zarrs leveraging the HAMT data structure onto a Store usually a Content Addressed store like KuboCAS located in store.py.
- - `/tests`: Test files that should be maintained and extended where possible.
+  - `/py_hamt`: Source code of the py-hamt library
+    - `encryption_hamt_store.py` - Example using total encryption
+    - `hamt.py` - where the HAMT data structure is constructed and accessed
+    - `store.py` - where various stores live that write the data (primarily KuboCAS)
+    - `zarr_hamt_store.py` - ZarrHAMTStore class for reading/writing Zarr leveraging the HAMT data structure on a Content Addressed store (e.g., KuboCAS)
+  - `/tests`: Test files that should be maintained and extended where possible.
🧰 Tools

🪛 LanguageTool

[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation - /py_hamt: Source code of the py-hamt library ...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~24-~24: Loose punctuation mark.
Context: ... KuboCAS located in store.py. - /tests: Test files that should be maintained an...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)

20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

23-23: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

36-38: Refine wording for readability
Consider hyphenation for clarity.
- Use functional easy to understand functions wherever possible.
+ Use functional, easy-to-understand functions wherever possible.
57-59: Specify language for code fence
Add a language identifier to fenced code blocks for proper syntax highlighting.
- ```
- pre-commit install
- ```
+ ```bash
+ pre-commit install
+ ```
🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

57-57: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

63-63: Fix spacing in inline code span
Add a space before the inline code for readability.
- run all tests with` pytest --ipfs`
+ run all tests with `pytest --ipfs`
🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

63-63: Spaces inside code span elements
null

(MD038, no-space-in-code)

106-108: Add language specifiers to fenced code blocks
Ensure proper syntax highlighting by specifying the language for these fences.
- ```
- pytest --cov
- ```
+ ```bash
+ pytest --cov
+ ```

- ```
- uv run pre-commit run --all-files --show-diff-on-failure
- ```
+ ```bash
+ uv run pre-commit run --all-files --show-diff-on-failure
+ ```

- ```
- uv run ruff check --fix
- ```
+ ```bash
+ uv run ruff check --fix
+ ```
Also applies to: 111-113, 121-123

🧰 Tools

🪛 markdownlint-cli2 (0.17.2)

106-106: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

119-119: Add comma for compound sentence clarity
Insert a comma before “and” to separate independent clauses.
- If there is an error with formatting for ruff and it can be autofixed you can normally run
+ If there is an error with formatting for ruff, and it can be autofixed, you can normally run
🧰 Tools

🪛 LanguageTool

[uncategorized] ~119-~119: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...

(COMMA_COMPOUND_SENTENCE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4ba8ea and 48800e1.

📒 Files selected for processing (1)

AGENTS.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

AGENTS.md

[grammar] ~2-~2: The singular determiner ‘this’ may not agree with the plural noun ‘agents’. Did you mean “these”?
Context: # Project Agents.md Guide for py-hamt This AGENTS.md file provides comprehensive g...

(THIS_NNS)

[uncategorized] ~15-~15: Use a comma before “or” if it connects two independent clauses (unless they are closely connected and short).
Context: ... IPFS daemon running locally can be used or a remote one can be accessed via gatewa...

(COMMA_COMPOUND_SENTENCE_2)

[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation - /py_hamt: Source code of the py-hamt library ...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~24-~24: Loose punctuation mark.
Context: ... KuboCAS located in store.py. - /tests: Test files that should be maintained an...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~119-~119: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...

(COMMA_COMPOUND_SENTENCE)

🪛 markdownlint-cli2 (0.17.2)

AGENTS.md

20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

23-23: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

57-57: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

63-63: Spaces inside code span elements
null

(MD038, no-space-in-code)

106-106: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

112-112: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

121-121: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

🔇 Additional comments (2)

AGENTS.md (2)

2-3: Skip false positive: correct singular usage
The determiner “this” correctly refers to the singular file “AGENTS.md”, not its plural agents.

🧰 Tools

🪛 LanguageTool

[grammar] ~2-~2: The singular determiner ‘this’ may not agree with the plural noun ‘agents’. Did you mean “these”?
Context: # Project Agents.md Guide for py-hamt This AGENTS.md file provides comprehensive g...

(THIS_NNS)

26-34: Docs conventions section is clear
The coding conventions and guidelines are well structured and aligned with project requirements.

abidsikder added 30 commits May 5, 2025 17:45

remove requests, add aiohttp dependency

2b9bb7d

make store async

cc68e21

address aiohttp deprecation warning

ce75866

move common testing utils into single file

6d73469

pass fuzz tested ipfsstore

916df3c

move key_value_list into common testing utilities

7dcc852

compact and simplify Node's dag-cbor representation

7e0a162

remove encryption transformers

ae0a308

even more efficient node implementation

d72dde7

remove node test, rely on hamt integration instead

0ceede9

add abstract method decorator, remove unnecessary line

655f1af

add type definition to store codec input

149bae2

ignore .DS_Store

7dd2b9c

async alternatives to all methods, VirtualStore with cache and in mem…

1a0324e

…ory tree, incomplete

finish reserialization and linking changes

9646367

convert all read node uses to use virtual store

bfb2a2d

remove transformers

a42bdaf

detach from MutableMapping interface to allow for full async creation

f3cfd29

first round of passing tests with the new interface

0fea571

node class convenience functions

e1a69bd

add encode/decode, bug fixes

1aefd24

more bug fixes, large portion of test coverage

4659356

Fix bug with flushing in memory tree when there are internally unlink…

2e3fe03

…ed nodes

misc design and documentation changes, no regressions demonstrated wi…

cd2334a

…th testing

100% test coverage for stores

6607848

remove need to constantly initialize with the root node for in memory…

cda896b

… store

fix bug with not setting root node after reserializing in set_pointer

0ac1d3a

simplify cache eviction calling and in memory flushing algorithm

a030c3a

remove unused method

adeff28

first 100% code coverage on core files

d956a5c

coderabbitai bot reviewed Jun 2, 2025

View reviewed changes

Faolain added 4 commits June 2, 2025 18:30

test: use local daemon then fallback to docker if present otherwise s…

52464ea

…kip integration tests

Merge branch 'refactor' into chore-add-import-sorting

8669f9d

lint(ruff): fix import order

e1d9334

Merge pull request #53 from dClimate/chore-add-import-sorting

40a6061

chore: add sort to ruff to standardize import order

Faolain changed the title ~~Refactor HAMT to use async and other performance optimizations~~ refactor: HAMT to use async and other performance optimizations Jun 2, 2025

coderabbitai bot reviewed Jun 2, 2025

View reviewed changes

Faolain added 3 commits June 3, 2025 01:55

fix: ensure that all tests can run depending on environment present d…

17c1ba0

…ynamically

fix: ensure that all tests can run depending on environment present d…

a3616ba

…ynamically

Merge pull request #61 from dClimate/tests/add-kubo-if-not

5c4b043

Tests/add kubo if not

coderabbitai bot reviewed Jun 3, 2025

View reviewed changes

tests/test_kubo_cas.py Outdated Show resolved Hide resolved

TheGreatAlgo and others added 2 commits June 3, 2025 14:23

fix: remove duplicated

14385ee

fix: readd parameter guard

7b95f1c

coderabbitai bot reviewed Jun 4, 2025

View reviewed changes

refactor: switch from old type alias to newer type syntax

87e3451

coderabbitai bot reviewed Jun 4, 2025

View reviewed changes

Faolain added 2 commits June 4, 2025 00:30

refactor: ensure run-checks will check all files not just those that …

1632dca

…are staged

chore: remove commented out test code

7498c28

coderabbitai bot reviewed Jun 4, 2025

View reviewed changes

Faolain added 2 commits June 4, 2025 01:45

docs: add AGENTS.md file

a300c24

docs: add AGENTS.md file

c4ba8ea

coderabbitai bot reviewed Jun 4, 2025

View reviewed changes

docs: add more setup information to agents.md

48800e1

coderabbitai bot reviewed Jun 4, 2025

View reviewed changes

TheGreatAlgo self-requested a review June 9, 2025 12:13

TheGreatAlgo approved these changes Jun 9, 2025

View reviewed changes

Faolain approved these changes Jun 9, 2025

View reviewed changes

Faolain merged commit 342e545 into main Jun 9, 2025
5 checks passed

This was referenced Jun 12, 2025

refactor: switch from aiohttp to httpx #71

Merged

feat: Sharding #68

Merged

	except Exception:
	except (ConnectionError, OSError, TimeoutError):
	pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)")

-    import http.client
+# At the top of tests/test_kubo_cas.py, group all imports together
+import http.client
+from typing import Literal, cast
+import aiohttp
+# …other existing imports…

refactor: HAMT to use async and other performance optimizations #44

refactor: HAMT to use async and other performance optimizations #44

Uh oh!

Conversation

abidsikder commented May 19, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 2, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 2, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 2, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 4, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

abidsikder commented May 19, 2025 •

edited by coderabbitai bot

Loading