-
Notifications
You must be signed in to change notification settings - Fork 0
refactor: HAMT to use async and other performance optimizations #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ory tree, incomplete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
py_hamt/store.py (1)
23-23: Fix the type alias syntax to resolve parsing error.The current type alias syntax is causing a Pylint parsing error. Use the standard typing syntax instead.
- type CodecInput = Literal["raw", "dag-cbor"] + CodecInput = Literal["raw", "dag-cbor"]🧰 Tools
🪛 Pylint (3.3.7)
[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'
(E0001)
tests/test_zarr_ipfs.py (1)
125-138: Consider consolidating duplicate test logic.The test logic for verifying
list()functionality is duplicated across lines 125-133. While testing robustness is valuable, this duplication could be streamlined.The pattern of collecting keys from
zhs.list()multiple times is redundant. Consider consolidating or adding comments explaining why multiple identical assertions are necessary.py_hamt/zarr_hamt_store.py (1)
91-118:⚠️ Potential issueThe
byte_rangeparameter is ignored – prevents efficient partial reads.Zarr may request slices of a chunk via
byte_range. Returning the full buffer forces callers to load and allocate the entire blob, negating the benefit of the HAMT backing store for very large chunks.At minimum, honor the range when values are bytes:
- return prototype.buffer.from_bytes(val) + if byte_range: + start, stop = byte_range + val = val[start:stop] + return prototype.buffer.from_bytes(val)If partial reads are intentionally unsupported, expose
supports_partial_reads = False(mirroring howsupports_partial_writesis declared).🧰 Tools
🪛 Pylint (3.3.7)
[convention] 100-100: Line too long (138/100)
(C0301)
[convention] 110-110: Line too long (103/100)
(C0301)
[convention] 116-116: Line too long (102/100)
(C0301)
[warning] 95-95: Unused argument 'byte_range'
(W0613)
🧹 Nitpick comments (4)
py_hamt/store.py (1)
266-267: Consider combining nested context managers.The static analysis suggests combining the nested
async withstatements for cleaner code.- async with self._sem: # throttle gateway - async with self._loop_session().get(url) as resp: + async with self._sem, self._loop_session().get(url) as resp: # throttle gateway🧰 Tools
🪛 Ruff (0.11.9)
266-267: Use a single
withstatement with multiple contexts instead of nestedwithstatements(SIM117)
README.md (1)
50-50: Fix grammar: use "and" instead of "as well as" after "both".-We use `pytest` with 100% code coverage, and with test inputs that are both handwritten as well as generated by `hypothesis`. +We use `pytest` with 100% code coverage, and with test inputs that are both handwritten and generated by `hypothesis`.🧰 Tools
🪛 LanguageTool
[grammar] ~50-~50: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated byhypothesis. This allows ...(BOTH_AS_WELL_AS)
tests/test_zarr_ipfs.py (1)
61-181: Consider breaking down the test function for better maintainability.This test function contains 80+ statements and 23+ local variables, making it difficult to maintain and debug. Consider splitting it into focused test functions:
@pytest.mark.asyncio(loop_scope="session") async def test_zarr_write_to_ipfs(create_ipfs, random_zarr_dataset): """Test writing zarr dataset to IPFS and getting CID.""" # Write logic here @pytest.mark.asyncio(loop_scope="session") async def test_zarr_read_from_ipfs(create_ipfs, random_zarr_dataset): """Test reading zarr dataset from IPFS using CID.""" # Read logic here @pytest.mark.asyncio(loop_scope="session") async def test_zarr_store_capabilities(create_ipfs): """Test store capability flags and listing operations.""" # Store capability tests hereThis would improve test isolation, debugging, and maintenance.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 62-62: Missing function or method docstring
(C0116)
[refactor] 62-62: Too many local variables (23/15)
(R0914)
[warning] 64-64: Redefining name 'random_zarr_dataset' from outer scope (line 18)
(W0621)
[refactor] 115-115: Redundant comparison - zhs == zhs
(R0124)
[refactor] 62-62: Too many statements (80/50)
(R0915)
py_hamt/zarr_hamt_store.py (1)
153-153: Remove unnecessary parentheses.- if not (await self.exists(key)): + if not await self.exists(key):🧰 Tools
🪛 Pylint (3.3.7)
[convention] 153-153: Unnecessary parens after 'not' keyword
(C0325)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
README.md(2 hunks)py_hamt/store.py(1 hunks)py_hamt/zarr_hamt_store.py(1 hunks)pyproject.toml(2 hunks)run-checks.sh(1 hunks)tests/test_kubocas_auth.py(1 hunks)tests/test_zarr_ipfs.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
🧰 Additional context used
🧠 Learnings (1)
py_hamt/zarr_hamt_store.py (1)
Learnt from: abidsikder
PR: dClimate/py-hamt#44
File: py_hamt/zarr_hamt_store.py:180-192
Timestamp: 2025-05-30T15:30:59.357Z
Learning: In the ZarrHAMTStore.list_dir method in py_hamt/zarr_hamt_store.py, there's a potential for duplicate directory names to be yielded when multiple files share the same subdirectory. While this could be fixed with a set to track yielded names, this approach would consume significant memory for large datasets (potentially gigabytes for millions of keys like ERA5 zarr). The memory usage concern outweighs the fix unless it's confirmed that zarr actually calls list_dir on data variable directories in practice, rather than using assumed chunk keys from zarr.json metadata.
🧬 Code Graph Analysis (3)
tests/test_zarr_ipfs.py (4)
py_hamt/hamt.py (7)
HAMT(287-696)build(385-394)make_read_only(397-408)keys(664-678)get(592-599)enable_write(410-417)delete(552-590)py_hamt/store.py (2)
KuboCAS(74-269)InMemoryCAS(37-71)py_hamt/zarr_hamt_store.py (8)
ZarrHAMTStore(10-251)read_only(81-83)exists(127-133)list_prefix(189-193)get(91-117)set_if_not_exists(151-154)delete(167-177)list_dir(195-251)tests/testing_utils.py (1)
create_ipfs(64-118)
tests/test_kubocas_auth.py (1)
py_hamt/store.py (3)
KuboCAS(74-269)_loop_session(214-226)aclose(231-235)
py_hamt/zarr_hamt_store.py (2)
py_hamt/hamt.py (4)
HAMT(287-696)get(592-599)len(686-696)keys(664-678)py_hamt/encryption_hamt_store.py (1)
get(144-168)
🪛 Pylint (3.3.7)
tests/test_zarr_ipfs.py
[error] 7-7: Unable to import 'zarr'
(E0401)
[error] 8-8: Unable to import 'zarr.core.buffer'
(E0401)
[error] 9-9: Unable to import 'dag_cbor.ipld'
(E0401)
[error] 11-11: Unable to import 'py_hamt'
(E0401)
[error] 13-13: Unable to import 'py_hamt.zarr_hamt_store'
(E0401)
[error] 14-14: Unable to import 'py_hamt.store'
(E0401)
[convention] 62-62: Missing function or method docstring
(C0116)
[refactor] 62-62: Too many local variables (23/15)
(R0914)
[warning] 64-64: Redefining name 'random_zarr_dataset' from outer scope (line 18)
(W0621)
[refactor] 115-115: Redundant comparison - zhs == zhs
(R0124)
[refactor] 62-62: Too many statements (80/50)
(R0915)
[convention] 184-184: Missing function or method docstring
(C0116)
tests/test_kubocas_auth.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 2-2: Unable to import 'aiohttp'
(E0401)
[error] 5-5: Unable to import 'py_hamt.store'
(E0401)
[convention] 13-13: Missing function or method docstring
(C0116)
[warning] 16-16: Access to a protected member _loop_session of a client class
(W0212)
[convention] 25-25: Missing function or method docstring
(C0116)
[warning] 27-27: Access to a protected member _loop_session of a client class
(W0212)
[warning] 44-44: Access to a protected member _loop_session of a client class
(W0212)
[warning] 67-67: Access to a protected member _loop_session of a client class
(W0212)
[convention] 4-4: standard import "inspect" should be placed before third party imports "aiohttp", "pytest"
(C0411)
py_hamt/zarr_hamt_store.py
[convention] 20-20: Line too long (195/100)
(C0301)
[convention] 25-25: Line too long (249/100)
(C0301)
[convention] 58-58: Line too long (251/100)
(C0301)
[convention] 59-59: Line too long (106/100)
(C0301)
[convention] 62-62: Line too long (110/100)
(C0301)
[convention] 65-65: Line too long (434/100)
(C0301)
[convention] 74-74: Line too long (113/100)
(C0301)
[convention] 100-100: Line too long (138/100)
(C0301)
[convention] 110-110: Line too long (103/100)
(C0301)
[convention] 116-116: Line too long (102/100)
(C0301)
[convention] 153-153: Unnecessary parens after 'not' keyword
(C0325)
[convention] 171-171: Line too long (142/100)
(C0301)
[convention] 175-175: Line too long (142/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'zarr.abc.store'
(E0401)
[error] 4-4: Unable to import 'zarr.core.buffer'
(E0401)
[error] 5-5: Unable to import 'zarr.core.common'
(E0401)
[warning] 95-95: Unused argument 'byte_range'
(W0613)
py_hamt/store.py
[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'
(E0001)
🪛 Ruff (0.11.9)
py_hamt/store.py
266-267: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
🪛 LanguageTool
README.md
[grammar] ~50-~50: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...
(BOTH_AS_WELL_AS)
🪛 markdownlint-cli2 (0.17.2)
README.md
54-54: Blank line inside blockquote
null
(MD028, no-blanks-blockquote)
🔇 Additional comments (17)
run-checks.sh (1)
9-10: LGTM! Good consolidation of code quality checks.Replacing separate
ruff checkandruff formatcommands withpre-commitprovides better consistency between local development and CI workflows.py_hamt/store.py (5)
1-9: LGTM! Proper async imports added.The imports correctly reflect the shift to async programming with
asyncioandaiohttpreplacing the synchronousrequestslibrary.
37-71: Excellent async conversion with robust error handling.The
InMemoryCASimplementation demonstrates good practices:
- Proper async method signatures
- Defensive type checking with clear error messages
- Good use of
castwith detailed explanation for Liskov substitution- Improved exception handling with chained exceptions
74-210: Well-designed session management for async operations.The
KuboCASclass successfully addresses the previous blockingrequestsissue by:
- Using
aiohttp.ClientSessionfor true async HTTP operations- Implementing per-event-loop session management to handle zarr's separate thread execution
- Providing flexible authentication via user-supplied sessions or constructor parameters
- Including comprehensive documentation with clear usage examples
246-269: Excellent async HTTP implementation with proper throttling.The
saveandloadmethods demonstrate best practices:
- Proper use of
aiohttpfor non-blocking operations- Semaphore-based concurrency throttling
- Correct form data handling for IPFS uploads
- Error handling with
raise_for_status()The implementation successfully resolves the previous issue of blocking requests in async methods.
🧰 Tools
🪛 Ruff (0.11.9)
266-267: Use a single
withstatement with multiple contexts instead of nestedwithstatements(SIM117)
231-241:✅ Verification successful
Verify async context manager test coverage.
The async context manager methods (
aclose,__aenter__,__aexit__) provide proper session lifecycle management. However, past review comments indicated these methods lacked test coverage.
🏁 Script executed:
#!/bin/bash # Description: Check if async context manager methods are now covered by tests # Expected: Find test coverage for these methods in the new test files echo "Searching for async context manager tests..." rg -A 5 -B 5 "async with.*KuboCAS|aclose|__aenter__|__aexit__" tests/ echo -e "\nSearching for session cleanup tests..." rg -A 5 -B 5 "_owns_session|_session_per_loop" tests/Length of output: 15874
Async context manager methods are now fully covered by tests
I’ve confirmed that the
__aenter__,__aexit__, andaclose()behaviors are exercised across multiple test modules:
- tests/test_kubocas_session.py
- tests/test_kubocas_auth.py
- tests/test_zarr_ipfs.py
- tests/test_zarr_ipfs_encrypted.py
- tests/test_kubo_cas.py
No further changes are needed here.
README.md (2)
33-41: Good addition of pre-commit setup instructions.The explicit commands for activating the virtual environment and installing pre-commit hooks align well with the consolidated approach in
run-checks.sh.
52-56: Good use of blockquotes for important notes.The highlighted notes about test coverage variability and GitHub Actions timeouts provide helpful context for developers and CI troubleshooting.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
54-54: Blank line inside blockquote
null(MD028, no-blanks-blockquote)
tests/test_kubocas_auth.py (3)
8-9: Smart helper function for handling async variations.The
_maybe_awaitutility elegantly handles both awaitable and non-awaitable returns, making the tests more robust.
13-31: Excellent test coverage for session management scenarios.These tests thoroughly verify the critical authentication and session lifecycle behaviors:
- User-supplied sessions are reused without modification
- External sessions are not closed by
aclose()- Internal sessions are properly configured and closed
This addresses the test coverage gaps mentioned in previous reviews.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 13-13: Missing function or method docstring
(C0116)
[warning] 16-16: Access to a protected member _loop_session of a client class
(W0212)
[convention] 25-25: Missing function or method docstring
(C0116)
[warning] 27-27: Access to a protected member _loop_session of a client class
(W0212)
34-73: Comprehensive authentication testing.The test suite covers both
aiohttp.BasicAuthscenarios:
- External sessions preserve authentication unchanged
- Internal sessions correctly apply constructor-provided auth
- Proper session ownership and cleanup behavior
This provides confidence in the authentication handling refactor.
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 44-44: Access to a protected member _loop_session of a client class
(W0212)
[warning] 67-67: Access to a protected member _loop_session of a client class
(W0212)
tests/test_zarr_ipfs.py (2)
71-89: Excellent async refactoring with proper resource management.The migration to async context management with
KuboCASand the newHAMT.build()pattern is well implemented. The resource cleanup via context manager and proper session handling addresses the session management challenges mentioned in the PR objectives.
183-192: Good addition of deduplication test.The new
test_list_dir_dedupfunction properly tests the directory listing deduplication functionality, which was a known concern from previous reviews.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 184-184: Missing function or method docstring
(C0116)
py_hamt/zarr_hamt_store.py (4)
10-51: Excellent documentation and API design.The class documentation is comprehensive with clear usage examples and important notes about read-only mode handling. The sample code snippet effectively demonstrates the write/read workflow.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 20-20: Line too long (195/100)
(C0301)
[convention] 25-25: Line too long (249/100)
(C0301)
53-78: Good validation in constructor with helpful documentation.The assertions ensuring HAMT read-only state matches the store state and
values_are_bytes=Trueprevent common configuration errors. The metadata cache explanation is valuable for understanding performance optimizations.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 58-58: Line too long (251/100)
(C0301)
[convention] 59-59: Line too long (106/100)
(C0301)
[convention] 62-62: Line too long (110/100)
(C0301)
[convention] 65-65: Line too long (434/100)
(C0301)
[convention] 74-74: Line too long (113/100)
(C0301)
195-251:list_dirdeduplication implemented with memory considerations.Based on retrieved learnings, this implementation correctly addresses duplicate directory name issues but may consume significant memory for large datasets (potentially gigabytes for millions of keys like ERA5 zarr).
The current approach using a
setfor deduplication is appropriate unless confirmed that zarr callslist_diron data variable directories in practice rather than using assumed chunk keys from zarr.json metadata. The thorough documentation explains the behavior clearly.
145-149: Efficient metadata cache management in write operations.The implementation correctly updates the metadata cache when writing to metadata files, maintaining consistency between the cache and HAMT storage.
…kip integration tests
chore: add sort to ruff to standardize import order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (9)
tests/test_kubo_cas.py (1)
46-109:⚠️ Potential issueTest still requires external IPFS daemon - use isolated fixture instead.
This test has the same brittleness issue identified in the previous review. It assumes a Kubo daemon is running on default ports
127.0.0.1:5001/8080, which will cause failures in CI or developer environments without a running daemon.Use the existing
create_ipfsfixture to ensure an isolated daemon:@pytest.mark.asyncio(loop_scope="session") @given(data=ipld_strategy()) @settings( deadline=1000, print_blob=True ) # Increased deadline, print_blob for debugging -async def test_kubo_default_urls( - global_client_session, data: IPLDKind -): # Inject the session fixture +async def test_kubo_default_urls( + create_ipfs, global_client_session, data: IPLDKind +): # Inject both fixtures """ - Tests KuboCAS using its default URLs and when None is passed for URLs, + Tests KuboCAS using isolated IPFS daemon URLs and when None is passed for URLs, leveraging a globally managed aiohttp.ClientSession. """ + rpc_base_url, gateway_base_url = create_ipfs # Test Case 1: KuboCAS instantiated without explicit URLs (should use its defaults) # We pass the managed global_client_session to it. - # KuboCAS itself is responsible for having default URLs if none are provided. - async with KuboCAS(session=global_client_session) as kubo_cas_default: + async with KuboCAS( + rpc_base_url=rpc_base_url, + gateway_base_url=gateway_base_url, + session=global_client_session + ) as kubo_cas_default:🧰 Tools
🪛 Pylint (3.3.7)
[convention] 62-62: Line too long (127/100)
(C0301)
[convention] 92-92: Line too long (128/100)
(C0301)
[warning] 82-82: Catching too general exception Exception
(W0718)
[warning] 107-107: Catching too general exception Exception
(W0718)
tests/testing_utils.py (1)
56-60:⚠️ Potential issueCritical TOCTOU race condition needs fixing.
This function has a Time-of-Check-Time-of-Use race where the port can be taken by another process between when the socket closes and when IPFS daemon starts.
Apply this fix to eliminate the race condition:
-def find_free_port() -> int: - with socket.socket() as s: - s.bind(("", 0)) # Bind to a free port provided by the host. - return int(s.getsockname()[1]) # Return the port number assigned. +def find_free_port() -> tuple[int, socket.socket]: + """Reserve a free TCP port and keep the socket open until the caller + explicitly closes it, eliminating the TOCTOU window.""" + s = socket.socket() + s.bind(("", 0)) + port = s.getsockname()[1] + return port, sYou'll need to update the callers to handle the returned socket and close it after the IPFS daemon starts.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 56-56: Missing function or method docstring
(C0116)
tests/test_hamt.py (1)
52-83: Remove misleading comment about lock acquisition.The comment about callers handling lock acquisition is incorrect for this context since the test operates in read-only mode where no locking is needed.
- # Make sure all pointers actually exist in the store, this should not raise any exceptions + # Verify that all pointers exist in the store (no locking needed in read-only mode) for k, _ in kvs: pointer: bytes = cast( bytes, await hamt.get_pointer(k) ) # we know that InMemoryCAS only returns bytes for its IDs await cas.load(pointer)🧰 Tools
🪛 Pylint (3.3.7)
[convention] 52-52: Line too long (170/100)
(C0301)
[convention] 55-55: Line too long (116/100)
(C0301)
py_hamt/encryption_hamt_store.py (2)
32-32: Documentation inconsistency: metadata caching is implemented.The documentation states "There is no metadata caching", but the implementation actually includes metadata caching (see lines 111, 156, 167, 179).
77-79: Sample code uses incorrect HAMT instantiation.The sample shows direct HAMT instantiation, but based on the async architecture, it should use
await HAMT.build().py_hamt/zarr_hamt_store.py (1)
92-119:byte_rangeparameter is ignored – prevents efficient partial readsZarr may request slices of a chunk via
byte_range. Returning the full buffer forces callers to load and allocate the entire blob, negating the benefit of the HAMT backing store for very large chunks.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 101-101: Line too long (138/100)
(C0301)
[convention] 111-111: Line too long (103/100)
(C0301)
[convention] 117-117: Line too long (102/100)
(C0301)
[warning] 96-96: Unused argument 'byte_range'
(W0613)
py_hamt/store.py (2)
23-23: Fix the type alias syntax.The type alias syntax on line 23 is causing a parsing error. Use the traditional type alias syntax for compatibility.
🧰 Tools
🪛 Pylint (3.3.7)
[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'
(E0001)
231-241: Add test coverage for async context manager methods.The async context manager functionality needs proper test coverage to ensure session cleanup works correctly.
py_hamt/hamt.py (1)
121-124: Fix exception handling with explicit chaining.This issue was flagged in past reviews and needs to be addressed. The bare
exceptclause masks the original error and hampers debugging.Apply the previously suggested fix:
- except: # noqa: E722 - raise Exception( - "Invalid dag-cbor encoded data from the store was attempted to be decoded" - ) + except Exception as err: + raise ValueError( + "Invalid DAG-CBOR data returned by CAS (unable to decode Node)" + ) from err🧰 Tools
🪛 Pylint (3.3.7)
[warning] 122-124: Consider explicitly re-raising using 'except Exception as exc' and 'raise Exception('Invalid dag-cbor encoded data from the store was attempted to be decoded') from exc'
(W0707)
[warning] 122-124: Raising too general exception: Exception
(W0719)
🧹 Nitpick comments (15)
py_hamt/__init__.py (1)
6-13: Consider adding a module docstring.Given this is a major version update with breaking changes, a module docstring explaining the new API structure would be helpful for users upgrading from v2.x.
+""" +py-hamt v3.0.0 - Async Hash Array Mapped Trie with Content-Addressed Storage + +Major changes in v3.0.0: +- Full async/await support throughout the library +- Content-addressed storage interface +- Performance optimizations for large datasets +- Breaking changes from v2.x synchronous API +""" + from .encryption_hamt_store import SimpleEncryptedZarrHAMTStoretests/conftest.py (1)
1-1: Consider adding a module docstring.A docstring explaining the purpose of this test configuration module would be helpful.
+""" +Test configuration and shared fixtures for py-hamt test suite. + +Provides async HTTP session management and re-exports testing utilities +for backward compatibility with existing test imports. +""" + import aiohttp🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 1-1: Unable to import 'aiohttp'
(E0401)
tests/performance_tests.py (1)
16-40: Strong performance test implementation with minor style improvements needed.The test effectively validates the performance goals mentioned in the PR objectives. The async design using
asyncio.gathershould demonstrate the concurrency benefits of the new implementation.However, please address these style issues:
- LARGE_KV_SET_SIZE: int = 1_000_000 + large_kv_set_size: int = 1_000_000 - """This test is meant for finding whether the HAMT performance scales linearly with increasing set size, an issue with HAMT v2. - Feel free to tune and run the LARGE_KV_SET_SIZE variable as needed for gathering the different timepoints. + """Test HAMT performance scaling with large datasets. + + Validates that HAMT performance scales linearly with increasing set size, + addressing performance issues from HAMT v2. Tune large_kv_set_size as needed. """ - *[hamt.set(str(k_int), k_int) for k_int in range(LARGE_KV_SET_SIZE)] + *[hamt.set(str(k_int), k_int) for k_int in range(large_kv_set_size)] - == LARGE_KV_SET_SIZE + == large_kv_set_size - for k_int in range(LARGE_KV_SET_SIZE): + for k_int in range(large_kv_set_size):🧰 Tools
🪛 Pylint (3.3.7)
[convention] 18-18: Line too long (131/100)
(C0301)
[convention] 19-19: Line too long (110/100)
(C0301)
[convention] 21-21: Variable name "LARGE_KV_SET_SIZE" doesn't conform to snake_case naming style
(C0103)
tests/test_zarr_ipfs_encrypted.py (1)
75-118: Good test structure but consider reducing complexity.The test correctly sets up encrypted write operations and validates basic store properties. However, the function is quite complex.
Consider extracting setup logic into helper functions:
+async def setup_encrypted_store(kubo_cas, encryption_key, header, read_only=False): + """Helper to create encrypted store.""" + hamt = await HAMT.build(cas=kubo_cas, values_are_bytes=True) + if read_only: + # Assuming we have a root_cid from write phase + pass + store = SimpleEncryptedZarrHAMTStore(hamt, read_only, encryption_key, header) + return hamt, store async def test_encrypted_write_read( create_ipfs, random_zarr_dataset: xr.Dataset, ): # ... existing code ... - hamt_write = await HAMT.build(cas=kubo_cas, values_are_bytes=True) - ezhs_write = SimpleEncryptedZarrHAMTStore( - hamt_write, False, correct_key, header - ) + hamt_write, ezhs_write = await setup_encrypted_store( + kubo_cas, correct_key, header + )🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 78-78: Too many local variables (24/15)
(R0914)
[warning] 80-80: Redefining name 'random_zarr_dataset' from outer scope (line 16)
(W0621)
[refactor] 108-108: Redundant comparison - ezhs_write == ezhs_write
(R0124)
[refactor] 78-78: Too many statements (64/50)
(R0915)
py_hamt/encryption_hamt_store.py (1)
174-176: Use a more specific exception type.Consider using a more specific exception instead of the generic
Exceptionfor better error handling:- if self.read_only: - raise Exception("Cannot write to a read-only store.") + if self.read_only: + raise ValueError("Cannot write to a read-only store.")Alternatively, you could define a custom
ReadOnlyErrorexception for even clearer semantics.🧰 Tools
🪛 Pylint (3.3.7)
[warning] 175-175: Raising too general exception: Exception
(W0719)
py_hamt/store.py (1)
194-227: Clever solution to the event loop session management issue.The per-event-loop session management elegantly solves the problem mentioned in the PR comments about zarr running its own event loop in a separate thread. This prevents the "session used in different event loops" errors while maintaining connection pooling benefits.
py_hamt/hamt.py (9)
74-76: Use enumerate for better readability.The static analysis correctly suggests using
enumeratehere for better Pythonic code.- def iter_link_indices(self) -> Iterator[int]: - for i in range(len(self.data)): - if isinstance(self.data[i], list): - yield i + def iter_link_indices(self) -> Iterator[int]: + for i, item in enumerate(self.data): + if isinstance(item, list): + yield i🧰 Tools
🪛 Pylint (3.3.7)
[convention] 74-76: Consider using enumerate instead of iterating with range and len
(C0200)
105-108: Use enumerate for better readability.Similar to the previous suggestion, using
enumeratewould be more Pythonic here.- def is_empty(self) -> bool: - for i in range(len(self.data)): - item: IPLDKind = self.data[i] - if (isinstance(item, dict) and len(item) > 0) or isinstance(item, list): - return False + def is_empty(self) -> bool: + for item in self.data: + if (isinstance(item, dict) and len(item) > 0) or isinstance(item, list): + return False🧰 Tools
🪛 Pylint (3.3.7)
[convention] 105-108: Consider using enumerate instead of iterating with range and len
(C0200)
133-138: Add meaningful docstrings to abstract methods.The abstract methods have empty docstrings. These should describe the contract and behavior expected from implementations.
@abstractmethod async def save(self, original_id: IPLDKind, node: Node) -> IPLDKind: - """""" + """Save a node to storage and return its ID. If original_id is not None, this may be an update.""" @abstractmethod async def load(self, id: IPLDKind) -> Node: - """""" + """Load a node from storage by its ID."""🧰 Tools
🪛 Pylint (3.3.7)
[convention] 133-133: Empty method docstring
(C0112)
[convention] 137-137: Empty method docstring
(C0112)
[warning] 137-137: Redefining built-in 'id'
(W0622)
156-156: Use more specific exception types.Using generic
Exceptionmakes error handling more difficult for callers.- raise Exception("Node was attempted to be written to the read cache") + raise RuntimeError("Cannot write to read-only cache")🧰 Tools
🪛 Pylint (3.3.7)
[warning] 156-156: Raising too general exception: Exception
(W0719)
171-173: Use .items() for dictionary iteration.Static analysis correctly suggests using
.items()when you need both key and value.def size(self) -> int: total: int = 0 - for k in self.cache: - node: Node = self.cache[k] + for k, node in self.cache.items(): total += len(node.serialize()) return total🧰 Tools
🪛 Pylint (3.3.7)
[convention] 171-173: Consider iterating with .items()
(C0206)
316-324: Consider parameter object pattern for constructor.The constructor has 7 parameters, which exceeds typical limits. Consider using a configuration object or builder pattern.
@dataclass class HAMTConfig: hash_fn: Callable[[bytes], bytes] = blake3_hashfn root_node_id: IPLDKind | None = None read_only: bool = False max_bucket_size: int = 4 values_are_bytes: bool = False def __init__(self, cas: ContentAddressedStore, config: HAMTConfig = None): config = config or HAMTConfig() # ... use config attributes🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 316-316: Too many arguments (7/5)
(R0913)
[refactor] 316-316: Too many positional arguments (7/5)
(R0917)
492-492: Use more specific exception types.Generic
Exceptionshould be replaced with more specific types for better error handling.if self.read_only: - raise Exception("Cannot call set on a read only HAMT") + raise RuntimeError("Cannot modify read-only HAMT") if self.read_only: - raise Exception("Cannot call delete on a read only HAMT") + raise RuntimeError("Cannot modify read-only HAMT")Also applies to: 557-557
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 492-492: Raising too general exception: Exception
(W0719)
572-582: Remove unnecessary elif after break.Static analysis correctly identifies that the
elifis unnecessary after abreakstatement.if isinstance(item, dict): bucket = item if key in bucket: del bucket[key] created_change = True # Break out since whether or not the key is in the bucket, it should have been here so either now reserialize or raise a KeyError break - elif isinstance(item, list): + if isinstance(item, list): link: IPLDKind = item[0] next_node: Node = await self.node_store.load(link) node_stack.append((link, next_node))🧰 Tools
🪛 Pylint (3.3.7)
[convention] 577-577: Line too long (149/100)
(C0301)
[refactor] 572-582: Unnecessary "elif" after "break", remove the leading "el" from "elif"
(R1723)
596-599: Remove unnecessary else after return.Static analysis correctly identifies that the
elseis unnecessary after areturnstatement.data: bytes = await self.cas.load(pointer) if self.values_are_bytes: return data - else: - return dag_cbor.decode(data) + return dag_cbor.decode(data)🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 596-599: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
py_hamt/__init__.py(1 hunks)py_hamt/encryption_hamt_store.py(1 hunks)py_hamt/hamt.py(1 hunks)py_hamt/store.py(1 hunks)py_hamt/zarr_hamt_store.py(1 hunks)pyproject.toml(2 hunks)tests/conftest.py(1 hunks)tests/performance_tests.py(1 hunks)tests/test_branch_anchors.py(1 hunks)tests/test_extract_bits.py(1 hunks)tests/test_hamt.py(1 hunks)tests/test_kubo_cas.py(1 hunks)tests/test_kubocas_auth.py(1 hunks)tests/test_kubocas_session.py(1 hunks)tests/test_zarr_ipfs.py(2 hunks)tests/test_zarr_ipfs_encrypted.py(1 hunks)tests/testing_utils.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
🧰 Additional context used
🧠 Learnings (1)
py_hamt/zarr_hamt_store.py (1)
Learnt from: abidsikder
PR: dClimate/py-hamt#44
File: py_hamt/zarr_hamt_store.py:180-192
Timestamp: 2025-05-30T15:30:59.357Z
Learning: In the ZarrHAMTStore.list_dir method in py_hamt/zarr_hamt_store.py, there's a potential for duplicate directory names to be yielded when multiple files share the same subdirectory. While this could be fixed with a set to track yielded names, this approach would consume significant memory for large datasets (potentially gigabytes for millions of keys like ERA5 zarr). The memory usage concern outweighs the fix unless it's confirmed that zarr actually calls list_dir on data variable directories in practice, rather than using assumed chunk keys from zarr.json metadata.
🧬 Code Graph Analysis (5)
tests/conftest.py (1)
tests/testing_utils.py (2)
create_ipfs(63-117)ipld_strategy(32-43)
tests/test_branch_anchors.py (2)
py_hamt/hamt.py (7)
HAMT(287-696)blake3_hashfn(51-59)keys(664-678)cache_size(419-432)cache_vacate(434-446)make_read_only(397-408)get(592-599)py_hamt/store.py (1)
InMemoryCAS(37-71)
tests/test_extract_bits.py (1)
py_hamt/hamt.py (1)
extract_bits(21-45)
tests/test_kubo_cas.py (2)
py_hamt/store.py (7)
InMemoryCAS(37-71)load(33-34)load(52-71)load(262-269)save(26-30)save(47-50)save(246-260)tests/conftest.py (1)
global_client_session(9-13)
tests/testing_utils.py (1)
py_hamt/store.py (3)
load(33-34)load(52-71)load(262-269)
🪛 Pylint (3.3.7)
tests/test_zarr_ipfs_encrypted.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'numpy'
(E0401)
[error] 4-4: Unable to import 'pandas'
(E0401)
[error] 6-6: Unable to import 'xarray'
(E0401)
[error] 7-7: Unable to import 'zarr'
(E0401)
[error] 8-8: Unable to import 'zarr.core.buffer'
(E0401)
[error] 9-9: Unable to import 'Crypto.Random'
(E0401)
[error] 11-11: Unable to import 'py_hamt'
(E0401)
[error] 12-12: Unable to import 'py_hamt.zarr_hamt_store'
(E0401)
[refactor] 78-78: Too many local variables (24/15)
(R0914)
[warning] 80-80: Redefining name 'random_zarr_dataset' from outer scope (line 16)
(W0621)
[refactor] 108-108: Redundant comparison - ezhs_write == ezhs_write
(R0124)
[refactor] 179-179: Redundant comparison - ezhs_read_ok == ezhs_read_ok
(R0124)
[refactor] 78-78: Too many statements (64/50)
(R0915)
tests/test_zarr_ipfs.py
[error] 6-6: Unable to import 'xarray'
(E0401)
[error] 7-7: Unable to import 'zarr'
(E0401)
[error] 8-8: Unable to import 'zarr.core.buffer'
(E0401)
[error] 9-9: Unable to import 'dag_cbor.ipld'
(E0401)
[error] 11-11: Unable to import 'py_hamt'
(E0401)
[error] 12-12: Unable to import 'py_hamt.store'
(E0401)
[error] 13-13: Unable to import 'py_hamt.zarr_hamt_store'
(E0401)
[convention] 61-61: Missing function or method docstring
(C0116)
[refactor] 61-61: Too many local variables (23/15)
(R0914)
[warning] 63-63: Redefining name 'random_zarr_dataset' from outer scope (line 17)
(W0621)
[refactor] 114-114: Redundant comparison - zhs == zhs
(R0124)
[refactor] 61-61: Too many statements (80/50)
(R0915)
[convention] 183-183: Missing function or method docstring
(C0116)
tests/test_hamt.py
[convention] 25-25: Line too long (120/100)
(C0301)
[convention] 52-52: Line too long (170/100)
(C0301)
[convention] 55-55: Line too long (116/100)
(C0301)
[convention] 95-95: Line too long (103/100)
(C0301)
[convention] 96-96: Line too long (124/100)
(C0301)
[convention] 164-164: Line too long (116/100)
(C0301)
[convention] 204-204: Line too long (128/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 6-6: Unable to import 'dag_cbor'
(E0401)
[error] 7-7: Unable to import 'hypothesis'
(E0401)
[error] 8-8: Unable to import 'hypothesis'
(E0401)
[error] 9-9: Unable to import 'multiformats'
(E0401)
[error] 12-12: Unable to import 'py_hamt.hamt'
(E0401)
[error] 13-13: Unable to import 'py_hamt.store'
(E0401)
[convention] 21-21: Missing function or method docstring
(C0116)
[refactor] 21-21: Too many local variables (21/15)
(R0914)
[refactor] 21-21: Too many statements (68/50)
(R0915)
[convention] 167-167: Missing function or method docstring
(C0116)
[convention] 181-181: Missing function or method docstring
(C0116)
[convention] 190-190: Missing function or method docstring
(C0116)
[convention] 206-206: Missing function or method docstring
(C0116)
py_hamt/zarr_hamt_store.py
[convention] 21-21: Line too long (195/100)
(C0301)
[convention] 26-26: Line too long (249/100)
(C0301)
[convention] 59-59: Line too long (251/100)
(C0301)
[convention] 60-60: Line too long (106/100)
(C0301)
[convention] 63-63: Line too long (110/100)
(C0301)
[convention] 66-66: Line too long (434/100)
(C0301)
[convention] 75-75: Line too long (113/100)
(C0301)
[convention] 101-101: Line too long (138/100)
(C0301)
[convention] 111-111: Line too long (103/100)
(C0301)
[convention] 117-117: Line too long (102/100)
(C0301)
[convention] 154-154: Unnecessary parens after 'not' keyword
(C0325)
[convention] 172-172: Line too long (142/100)
(C0301)
[convention] 176-176: Line too long (142/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 4-4: Unable to import 'zarr.abc.store'
(E0401)
[error] 5-5: Unable to import 'zarr.core.buffer'
(E0401)
[error] 6-6: Unable to import 'zarr.core.common'
(E0401)
[warning] 96-96: Unused argument 'byte_range'
(W0613)
py_hamt/store.py
[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'
(E0001)
py_hamt/hamt.py
[convention] 23-23: Line too long (131/100)
(C0301)
[convention] 31-31: Line too long (106/100)
(C0301)
[convention] 56-56: Line too long (117/100)
(C0301)
[convention] 91-91: Line too long (151/100)
(C0301)
[convention] 142-142: Line too long (189/100)
(C0301)
[convention] 183-183: Line too long (101/100)
(C0301)
[convention] 200-200: Line too long (137/100)
(C0301)
[convention] 201-201: Line too long (191/100)
(C0301)
[convention] 213-213: Line too long (106/100)
(C0301)
[convention] 214-214: Line too long (210/100)
(C0301)
[convention] 218-218: Line too long (108/100)
(C0301)
[convention] 241-241: Line too long (121/100)
(C0301)
[convention] 255-255: Line too long (141/100)
(C0301)
[convention] 256-256: Line too long (190/100)
(C0301)
[convention] 261-261: Line too long (243/100)
(C0301)
[convention] 271-271: Line too long (122/100)
(C0301)
[convention] 289-289: Line too long (148/100)
(C0301)
[convention] 293-293: Line too long (119/100)
(C0301)
[convention] 298-298: Line too long (145/100)
(C0301)
[convention] 300-300: Line too long (158/100)
(C0301)
[convention] 302-302: Line too long (255/100)
(C0301)
[convention] 326-326: Line too long (233/100)
(C0301)
[convention] 336-336: Line too long (125/100)
(C0301)
[convention] 338-338: Line too long (179/100)
(C0301)
[convention] 340-340: Line too long (208/100)
(C0301)
[convention] 347-347: Line too long (177/100)
(C0301)
[convention] 349-349: Line too long (144/100)
(C0301)
[convention] 356-356: Line too long (193/100)
(C0301)
[convention] 358-358: Line too long (459/100)
(C0301)
[convention] 375-375: Line too long (382/100)
(C0301)
[convention] 387-387: Line too long (272/100)
(C0301)
[convention] 389-389: Line too long (255/100)
(C0301)
[convention] 396-396: Line too long (168/100)
(C0301)
[convention] 399-399: Line too long (155/100)
(C0301)
[convention] 401-401: Line too long (155/100)
(C0301)
[convention] 412-412: Line too long (252/100)
(C0301)
[convention] 415-415: Line too long (109/100)
(C0301)
[convention] 423-423: Line too long (134/100)
(C0301)
[convention] 452-452: Line too long (151/100)
(C0301)
[convention] 453-453: Line too long (153/100)
(C0301)
[convention] 454-454: Line too long (137/100)
(C0301)
[convention] 458-458: Line too long (116/100)
(C0301)
[convention] 462-462: Line too long (109/100)
(C0301)
[convention] 467-467: Line too long (193/100)
(C0301)
[convention] 483-483: Line too long (140/100)
(C0301)
[convention] 498-498: Line too long (103/100)
(C0301)
[convention] 531-531: Line too long (144/100)
(C0301)
[convention] 537-537: Line too long (137/100)
(C0301)
[convention] 542-542: Line too long (104/100)
(C0301)
[convention] 577-577: Line too long (149/100)
(C0301)
[convention] 605-605: Line too long (285/100)
(C0301)
[convention] 624-624: Line too long (141/100)
(C0301)
[convention] 668-668: Line too long (146/100)
(C0301)
[convention] 670-670: Line too long (101/100)
(C0301)
[convention] 690-690: Line too long (264/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 14-14: Unable to import 'dag_cbor'
(E0401)
[error] 15-15: Unable to import 'dag_cbor.ipld'
(E0401)
[error] 16-16: Unable to import 'multiformats'
(E0401)
[error] 18-18: Cannot import 'store' due to 'invalid syntax (py_hamt.store, line 23)'
(E0001)
[convention] 62-62: Missing class docstring
(C0115)
[convention] 68-68: Missing function or method docstring
(C0116)
[convention] 73-73: Missing function or method docstring
(C0116)
[convention] 74-76: Consider using enumerate instead of iterating with range and len
(C0200)
[convention] 78-78: Missing function or method docstring
(C0116)
[convention] 83-83: Missing function or method docstring
(C0116)
[convention] 87-87: Missing function or method docstring
(C0116)
[convention] 92-92: Missing function or method docstring
(C0116)
[convention] 104-104: Missing function or method docstring
(C0116)
[convention] 105-108: Consider using enumerate instead of iterating with range and len
(C0200)
[convention] 111-111: Missing function or method docstring
(C0116)
[convention] 115-115: Missing function or method docstring
(C0116)
[warning] 122-124: Consider explicitly re-raising using 'except Exception as exc' and 'raise Exception('Invalid dag-cbor encoded data from the store was attempted to be decoded') from exc'
(W0707)
[warning] 122-124: Raising too general exception: Exception
(W0719)
[convention] 131-131: Missing class docstring
(C0115)
[convention] 133-133: Empty method docstring
(C0112)
[convention] 137-137: Empty method docstring
(C0112)
[warning] 137-137: Redefining built-in 'id'
(W0622)
[convention] 150-150: Missing class docstring
(C0115)
[warning] 156-156: Raising too general exception: Exception
(W0719)
[warning] 158-158: Redefining built-in 'id'
(W0622)
[convention] 171-173: Consider iterating with .items()
(C0206)
[convention] 180-180: Missing class docstring
(C0115)
[convention] 186-186: Missing function or method docstring
(C0116)
[warning] 186-186: Redefining built-in 'id'
(W0622)
[convention] 189-189: Missing function or method docstring
(C0116)
[convention] 203-209: Consider iterating with .items()
(C0206)
[convention] 260-260: Missing function or method docstring
(C0116)
[warning] 275-275: Redefining built-in 'id'
(W0622)
[refactor] 287-287: Too many instance attributes (8/7)
(R0902)
[refactor] 316-316: Too many arguments (7/5)
(R0913)
[refactor] 316-316: Too many positional arguments (7/5)
(R0917)
[warning] 492-492: Raising too general exception: Exception
(W0719)
[refactor] 505-505: Too many local variables (20/15)
(R0914)
[warning] 557-557: Raising too general exception: Exception
(W0719)
[refactor] 572-582: Unnecessary "elif" after "break", remove the leading "el" from "elif"
(R1723)
[refactor] 596-599: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
py_hamt/__init__.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Cannot import 'store' due to 'invalid syntax (py_hamt.store, line 23)'
(E0001)
py_hamt/encryption_hamt_store.py
[convention] 21-21: Line too long (241/100)
(C0301)
[convention] 164-164: Line too long (103/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'zarr.abc.store'
(E0401)
[error] 4-4: Unable to import 'zarr.core.buffer'
(E0401)
[error] 5-5: Unable to import 'Crypto.Cipher'
(E0401)
[error] 6-6: Unable to import 'Crypto.Random'
(E0401)
[warning] 12-12: Method 'get_partial_values' is abstract in class 'ZarrHAMTStore' but is not overridden in child class 'SimpleEncryptedZarrHAMTStore'
(W0223)
[warning] 12-12: Method 'set_partial_values' is abstract in class 'ZarrHAMTStore' but is not overridden in child class 'SimpleEncryptedZarrHAMTStore'
(W0223)
[warning] 175-175: Raising too general exception: Exception
(W0719)
tests/conftest.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 1-1: Unable to import 'aiohttp'
(E0401)
[warning] 5-5: Unused create_ipfs imported from testing_utils
(W0611)
[warning] 5-5: Unused ipld_strategy imported from testing_utils
(W0611)
tests/performance_tests.py
[convention] 2-2: Line too long (192/100)
(C0301)
[convention] 18-18: Line too long (131/100)
(C0301)
[convention] 19-19: Line too long (110/100)
(C0301)
[error] 12-12: Unable to import 'py_hamt'
(E0401)
[error] 13-13: Unable to import 'py_hamt.store'
(E0401)
[convention] 21-21: Variable name "LARGE_KV_SET_SIZE" doesn't conform to snake_case naming style
(C0103)
tests/test_branch_anchors.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 4-4: Unable to import 'py_hamt'
(E0401)
[error] 5-5: Unable to import 'py_hamt.store'
(E0401)
[convention] 25-25: Missing function or method docstring
(C0116)
tests/test_extract_bits.py
[error] 3-3: Unable to import 'py_hamt.hamt'
(E0401)
[convention] 7-7: Missing function or method docstring
(C0116)
tests/test_kubo_cas.py
[convention] 62-62: Line too long (127/100)
(C0301)
[convention] 92-92: Line too long (128/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'aiohttp'
(E0401)
[error] 4-4: Unable to import 'dag_cbor'
(E0401)
[error] 6-6: Unable to import 'dag_cbor'
(E0401)
[error] 7-7: Unable to import 'hypothesis'
(E0401)
[error] 10-10: Unable to import 'py_hamt'
(E0401)
[error] 11-11: Unable to import 'py_hamt.store'
(E0401)
[convention] 16-16: Missing function or method docstring
(C0116)
[warning] 82-82: Catching too general exception Exception
(W0718)
[warning] 107-107: Catching too general exception Exception
(W0718)
[convention] 139-139: Missing function or method docstring
(C0116)
[warning] 139-139: Redefining name 'create_ipfs' from outer scope (line 8)
(W0621)
[warning] 8-8: Unused create_ipfs imported from testing_utils
(W0611)
tests/test_kubocas_auth.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 4-4: Unable to import 'aiohttp'
(E0401)
[error] 7-7: Unable to import 'py_hamt.store'
(E0401)
[convention] 15-15: Missing function or method docstring
(C0116)
[warning] 18-18: Access to a protected member _loop_session of a client class
(W0212)
[convention] 27-27: Missing function or method docstring
(C0116)
[warning] 29-29: Access to a protected member _loop_session of a client class
(W0212)
[warning] 46-46: Access to a protected member _loop_session of a client class
(W0212)
[warning] 69-69: Access to a protected member _loop_session of a client class
(W0212)
tests/test_kubocas_session.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 7-7: Unable to import 'py_hamt.store'
(E0401)
[warning] 26-26: Access to a protected member _loop_session of a client class
(W0212)
[warning] 27-27: Access to a protected member _loop_session of a client class
(W0212)
[warning] 49-49: Access to a protected member _loop_session of a client class
(W0212)
[warning] 68-68: Access to a protected member _loop_session of a client class
(W0212)
[warning] 90-90: Access to a protected member _loop_session of a client class
(W0212)
[warning] 95-95: Access to a protected member _session_per_loop of a client class
(W0212)
[warning] 96-96: Access to a protected member _session_per_loop of a client class
(W0212)
[warning] 97-97: Access to a protected member _session_per_loop of a client class
(W0212)
tests/testing_utils.py
[convention] 52-52: Line too long (112/100)
(C0301)
[convention] 74-74: Line too long (141/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 12-12: Unable to import 'hypothesis'
(E0401)
[error] 13-13: Unable to import 'hypothesis.strategies'
(E0401)
[error] 14-14: Unable to import 'multiformats'
(E0401)
[convention] 32-32: Missing function or method docstring
(C0116)
[convention] 56-56: Missing function or method docstring
(C0116)
[convention] 63-63: Missing function or method docstring
(C0116)
[warning] 81-81: Using open without explicitly specifying an encoding
(W1514)
[warning] 98-98: Using open without explicitly specifying an encoding
(W1514)
[refactor] 105-107: Consider using 'with' for resource-allocating operations
(R1732)
🔇 Additional comments (47)
tests/test_extract_bits.py (2)
3-3: Good import organization.Moving the import statement below the standard library imports follows Python import conventions.
🧰 Tools
🪛 Pylint (3.3.7)
[error] 3-3: Unable to import 'py_hamt.hamt'
(E0401)
7-7: Excellent addition of return type annotation.Adding the
-> Nonereturn type annotation improves type safety and code documentation, which is especially valuable in a major refactor introducing async patterns.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 7-7: Missing function or method docstring
(C0116)
py_hamt/__init__.py (1)
1-4: Major API refactor looks well-organized.The replacement of synchronous interfaces (
Store,IPFSStore) with async content-addressed storage interfaces (ContentAddressedStore,KuboCAS) aligns perfectly with the stated PR objectives of adopting async programming throughout the library.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Cannot import 'store' due to 'invalid syntax (py_hamt.store, line 23)'
(E0001)
tests/conftest.py (2)
8-13: Excellent async fixture implementation.The session-scoped
aiohttp.ClientSessionfixture is well-designed for the async refactor. Using an async context manager ensures proper cleanup, and session scope is appropriate for sharing across tests while avoiding session management issues mentioned in the PR discussion.
4-5: Good backward compatibility approach.Re-exporting the testing utilities maintains compatibility with existing test imports while centralizing test configuration.
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 5-5: Unused create_ipfs imported from testing_utils
(W0611)
[warning] 5-5: Unused ipld_strategy imported from testing_utils
(W0611)
tests/performance_tests.py (2)
1-5: Excellent documentation and purpose clarity.The module docstring clearly explains that this is for manual performance testing, which helps prevent accidentally including long-running tests in CI.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 2-2: Line too long (192/100)
(C0301)
26-29:✅ Verification successful
Verify concurrent write safety.
The test uses
asyncio.gatherto perform concurrent writes, which should demonstrate the performance benefits. However, ensure that the HAMT implementation properly handles concurrent writes to avoid race conditions.
🏁 Script executed:
#!/bin/bash # Description: Check if HAMT set operations are properly synchronized for concurrent writes # Expected: Find synchronization mechanisms or documentation about thread safety # Search for async locks, semaphores, or synchronization mechanisms in HAMT implementation rg -A 5 -B 5 "asyncio\.(Lock|Semaphore|Event|Condition)" # Look for thread safety documentation or comments rg -i "thread.?safe|concurrent|race.?condition|synchroniz" --type pyLength of output: 2753
Concurrent writes are already synchronized via asyncio.Lock
The HAMT’ssetmethod acquires anasyncio.Lock(seepy_hamt/hamt.py), ensuring that multiple tasks in the same event loop cannot corrupt internal state. Since your performance test runs all writes in one thread/event loop,await asyncio.gather(...)is safe and no additional synchronization is needed.tests/test_branch_anchors.py (2)
8-21: LGTM! Well-designed helper function for collision testing.The logic correctly finds decimal string keys that share the same first byte of their Blake3 hash, which is perfect for testing HAMT branching behavior with
max_bucket_size=1.
24-52: Excellent comprehensive test coverage for cache management.This test effectively covers the cache management features in both write and read-only modes:
- Cache operations during writes (lines 32-37)
- Cache vacate in write mode (line 42)
- Transition to read-only mode and cache operations (lines 48-51)
The test structure ensures all code paths in the cache management are exercised.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 25-25: Missing function or method docstring
(C0116)
tests/test_kubocas_session.py (3)
10-14: Useful helper for handling mixed async/sync returns.This helper elegantly handles the case where
_loop_session()might return either an awaitable or a direct session object, providing flexibility in the implementation.
56-79: Well-designed cross-loop testing mechanism.The threading approach to test session isolation across different asyncio event loops is appropriate and thorough. The use of
Eventfor synchronization ensures reliable coordination between the main thread and worker thread.🧰 Tools
🪛 Pylint (3.3.7)
[warning] 68-68: Access to a protected member _loop_session of a client class
(W0212)
82-103: Comprehensive session isolation verification.This test properly verifies that different event loops receive distinct sessions and that the internal
_session_per_loopmapping tracks them correctly. The cleanup logic ensures proper resource management.🧰 Tools
🪛 Pylint (3.3.7)
[warning] 90-90: Access to a protected member _loop_session of a client class
(W0212)
[warning] 95-95: Access to a protected member _session_per_loop of a client class
(W0212)
[warning] 96-96: Access to a protected member _session_per_loop of a client class
(W0212)
[warning] 97-97: Access to a protected member _session_per_loop of a client class
(W0212)
tests/test_kubocas_auth.py (3)
14-24: Good coverage of external session header preservation.This test correctly verifies that KuboCAS preserves custom headers from externally-provided sessions and doesn't close them inappropriately.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 15-15: Missing function or method docstring
(C0116)
[warning] 18-18: Access to a protected member _loop_session of a client class
(W0212)
35-57: Thorough authentication testing with clear documentation.The docstring clearly explains the expected behavior, and the test correctly verifies that external sessions with authentication are preserved unchanged and not closed by KuboCAS.
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 46-46: Access to a protected member _loop_session of a client class
(W0212)
59-76: Comprehensive internal session authentication testing.This test properly verifies that KuboCAS creates internal sessions with the provided authentication and correctly manages their lifecycle by closing them on
aclose().🧰 Tools
🪛 Pylint (3.3.7)
[warning] 69-69: Access to a protected member _loop_session of a client class
(W0212)
tests/test_kubo_cas.py (3)
15-20: Good coverage of InMemoryCAS error condition.This test correctly verifies that
InMemoryCAS.load()raisesKeyErrorfor non-existent keys (empty bytes).🧰 Tools
🪛 Pylint (3.3.7)
[convention] 16-16: Missing function or method docstring
(C0116)
22-44: Excellent comprehensive type safety testing.This test thoroughly verifies that
InMemoryCAS.load()properly validates input types and raises appropriateTypeErrormessages for invalid key types. The comprehensive list of invalid types ensures robust error handling.
134-157: Properly isolated test implementation.This test correctly uses the
create_ipfsfixture to ensure an isolated IPFS daemon, avoiding dependency on external services. The property-based testing with different codecs provides good coverage.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 139-139: Missing function or method docstring
(C0116)
[warning] 139-139: Redefining name 'create_ipfs' from outer scope (line 8)
(W0621)
tests/testing_utils.py (4)
1-15: LGTM! Clean imports and setup.The imports are well-organized and appropriate for the testing utilities. The module provides good separation of concerns for IPFS-related testing.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 12-12: Unable to import 'hypothesis'
(E0401)
[error] 13-13: Unable to import 'hypothesis.strategies'
(E0401)
[error] 14-14: Unable to import 'multiformats'
(E0401)
17-29: Solid CID generation strategy.The CID strategy correctly generates CIDv1 with sha2-256 hash and dag-cbor codec. The hardcoded hash prefix (0x1220) is appropriate for 32-byte SHA-256 hashes.
32-43: Comprehensive IPLD strategy covers all primitive types.The strategy includes all IPLD scalar types (null, bool, int, float, text, binary) plus CIDs. The integer range matches IPLD specifications and float constraints properly exclude infinity/NaN.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 32-32: Missing function or method docstring
(C0116)
46-53: Good unique key constraint for HAMT testing.The unique_by parameter ensures no duplicate keys, which is essential for accurate HAMT length and size checks. The size limits are reasonable for testing.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 52-52: Line too long (112/100)
(C0301)
tests/test_zarr_ipfs_encrypted.py (5)
1-13: Clean imports for encrypted testing.The imports are appropriate for testing encrypted Zarr stores with IPFS backend. Good use of Crypto.Random for secure key generation.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'numpy'
(E0401)
[error] 4-4: Unable to import 'pandas'
(E0401)
[error] 6-6: Unable to import 'xarray'
(E0401)
[error] 7-7: Unable to import 'zarr'
(E0401)
[error] 8-8: Unable to import 'zarr.core.buffer'
(E0401)
[error] 9-9: Unable to import 'Crypto.Random'
(E0401)
[error] 11-11: Unable to import 'py_hamt'
(E0401)
[error] 12-12: Unable to import 'py_hamt.zarr_hamt_store'
(E0401)
15-28: Well-designed test dataset fixture.The fixture creates a realistic test dataset with proper dimensions and metadata. The reduced size is good for test performance.
30-73: Thorough encryption key validation testing.The test properly validates all edge cases for encryption key length: too short, too long, empty, and correct length. Good use of parametric testing approach.
119-175: Comprehensive read testing with proper error validation.The test correctly validates reading with correct key, incorrect key, and no decryption. The error handling checks are appropriate and cover the expected failure modes.
176-207: Good coverage of store interface methods.The test properly exercises the async store interface including exists, list operations, and error conditions for read-only operations.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 179-179: Redundant comparison - ezhs_read_ok == ezhs_read_ok
(R0124)
tests/test_zarr_ipfs.py (6)
1-56: Good refactoring to async imports and fixture.The imports are updated appropriately for the new async HAMT implementation. The dataset fixture creates realistic test data with proper structure.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'numpy'
(E0401)
[error] 4-4: Unable to import 'pandas'
(E0401)
[error] 6-6: Unable to import 'xarray'
(E0401)
[error] 7-7: Unable to import 'zarr'
(E0401)
[error] 8-8: Unable to import 'zarr.core.buffer'
(E0401)
[error] 9-9: Unable to import 'dag_cbor.ipld'
(E0401)
[error] 11-11: Unable to import 'py_hamt'
(E0401)
[error] 12-12: Unable to import 'py_hamt.store'
(E0401)
[error] 13-13: Unable to import 'py_hamt.zarr_hamt_store'
(E0401)
59-89: Solid async write implementation.The test correctly uses async context managers for KuboCAS and properly handles the write-append pattern. The performance timing is useful for optimization tracking.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 59-59: Line too long (130/100)
(C0301)
[convention] 61-61: Missing function or method docstring
(C0116)
[refactor] 61-61: Too many local variables (23/15)
(R0914)
[warning] 63-63: Redefining name 'random_zarr_dataset' from outer scope (line 17)
(W0621)
[refactor] 61-61: Too many statements (80/50)
(R0915)
90-117: Effective read verification with data integrity checks.The read implementation correctly rebuilds the HAMT from the root CID and validates data integrity by comparing dataset halves. Good async pattern usage.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 114-114: Redundant comparison - zhs == zhs
(R0124)
139-180: Comprehensive store interface testing.The test properly exercises partial value operations, metadata operations, and deletion. The enable_write/read_only mode switching is well tested.
182-192: Good test for directory listing deduplication.This test specifically validates that
list_dirproperly deduplicates directory names, which is important for the Zarr interface.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 183-183: Missing function or method docstring
(C0116)
119-138:❓ Verification inconclusive
Check for potential code duplication.
Looking at the key collection logic, there appear to be multiple similar blocks collecting keys from different sources.
🏁 Script executed:
#!/bin/bash # Check for duplicate key collection patterns in this test file rg -A 4 -B 1 "async for.*in.*list" tests/test_zarr_ipfs.pyLength of output: 875
Reduce duplicated key-collection logic in tests/test_zarr_ipfs.py
There are four nearly identical blocks (lines 119–138) that build a
setby iterating overzhs.list(),zhs.list_prefix(""), etc. This repetition makes the test harder to maintain. Consider one of the following refactorings:• Extract a helper coroutine:
async def collect_keys(coro): keys = set() async for k in coro(): keys.add(k) return keys hamt_keys = await collect_keys(zhs.hamt.keys) for fn in (zhs.list, zhs.list, lambda: zhs.list_prefix("")): assert hamt_keys == await collect_keys(fn)• Use pytest parametrization:
@pytest.mark.parametrize("method", [zhs.list, zhs.list, lambda: zhs.list_prefix("")]) async def test_list_variants_match_hamt(method): hamt_keys = {k async for k in zhs.hamt.keys()} assert hamt_keys == {k async for k in method()}Locations to update:
- tests/test_zarr_ipfs.py, lines ~119–138: replace duplicated loops with a shared helper or a parametrized test.
[fixed_required]tests/test_hamt.py (5)
1-24: Excellent async test setup with comprehensive imports.The conversion to async testing is well done. Good use of increased deadline for CI stability and proper Hypothesis integration.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 6-6: Unable to import 'dag_cbor'
(E0401)
[error] 7-7: Unable to import 'hypothesis'
(E0401)
[error] 8-8: Unable to import 'hypothesis'
(E0401)
[error] 9-9: Unable to import 'multiformats'
(E0401)
[error] 12-12: Unable to import 'py_hamt.hamt'
(E0401)
[error] 13-13: Unable to import 'py_hamt.store'
(E0401)
[convention] 21-21: Missing function or method docstring
(C0116)
[refactor] 21-21: Too many local variables (21/15)
(R0914)
[refactor] 21-21: Too many statements (68/50)
(R0915)
25-50: Solid concurrent testing with dynamic bucket sizing.The test effectively exercises both sequential and concurrent operations while randomly changing bucket size. This is great for testing HAMT robustness under varying conditions.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 25-25: Line too long (120/100)
(C0301)
84-123: Comprehensive error handling and HAMT rebuilding tests.The tests properly validate read-only mode enforcement, HAMT reconstruction from root node ID, and error conditions. The cache write protection test is particularly good for coverage.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 95-95: Line too long (103/100)
(C0301)
[convention] 96-96: Line too long (124/100)
(C0301)
124-153: Excellent cache management testing.The concurrent cache size management tests are well-designed and exercise important code paths for memory management. The separation of read cache and write cache testing is good.
155-217: Good additional test coverage.The tests for byte values, invalid node deserialization, key rewriting, and link following provide good coverage of edge cases and core functionality.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 164-164: Line too long (116/100)
(C0301)
[convention] 204-204: Line too long (128/100)
(C0301)
[convention] 167-167: Missing function or method docstring
(C0116)
[convention] 181-181: Missing function or method docstring
(C0116)
[convention] 190-190: Missing function or method docstring
(C0116)
[convention] 206-206: Missing function or method docstring
(C0116)
py_hamt/encryption_hamt_store.py (3)
92-111: Constructor implementation looks good.The validation of the 32-byte encryption key requirement and metadata cache initialization are properly implemented.
113-135: Encryption/decryption implementation is secure and correct.The methods properly implement ChaCha20-Poly1305 with:
- 24-byte nonce for XChaCha20 variant
- Header as associated authenticated data
- Correct byte layout (nonce + tag + ciphertext)
- Appropriate error handling
146-171: Get method implementation with metadata caching is well-designed.The method efficiently caches decrypted metadata files (ending with "zarr.json") to optimize repeated reads, which is crucial for Zarr performance.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 164-164: Line too long (103/100)
(C0301)
py_hamt/zarr_hamt_store.py (1)
70-72: Good defensive programming with HAMT validation.The assertions ensure consistency between the HAMT and store configurations, preventing subtle bugs from mismatched settings.
py_hamt/store.py (1)
246-270: Excellent async implementation with proper concurrency control.The save and load methods correctly use:
- Async HTTP operations via aiohttp
- Semaphore-based rate limiting
- Proper error handling with raise_for_status()
py_hamt/hamt.py (4)
21-45: LGTM: Bit extraction logic is correct.The
extract_bitsfunction correctly implements bit manipulation for HAMT depth-based indexing. The bounds checking and bit masking operations are properly implemented.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 23-23: Line too long (131/100)
(C0301)
[convention] 31-31: Line too long (106/100)
(C0301)
51-59: LGTM: Blake3 hash function implementation is correct.The hash function properly uses the multihash library to generate 32-byte Blake3 digests and unwraps them for raw bytes.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 56-56: Line too long (117/100)
(C0301)
63-66: LGTM: Simplified node structure improves performance.The fixed-size 256-element list structure is a good optimization that aligns with the 8-bit indexing used in
extract_bits. Using empty dicts for buckets and single-element lists for links is a clean design.
664-684: LGTM: Async iteration with proper locking.The async iteration implementation correctly handles locking for write mode vs. concurrent access in read-only mode. The pattern is consistent across
keys()and other iteration methods.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 668-668: Line too long (146/100)
(C0301)
[convention] 670-670: Line too long (101/100)
(C0301)
tests/testing_utils.py
Outdated
| @pytest.fixture(scope="module") | ||
| def create_ipfs() -> Generator[Tuple[str, str], None, None]: | ||
| # Create temporary directory, set it as the IPFS Path | ||
| temp_dir: Path = Path(tempfile.mkdtemp()) | ||
| custom_env: dict[str, str] = os.environ.copy() | ||
| custom_env["IPFS_PATH"] = str(temp_dir) | ||
|
|
||
| # IPFS init | ||
| subprocess.run( | ||
| ["ipfs", "init", "--profile", "pebbleds"], check=True, env=custom_env | ||
| ) | ||
|
|
||
| # Edit the config file so that it serves on randomly selected and available ports to not conflict with any currently running ipfs daemons | ||
| swarm_port: int = find_free_port() | ||
| rpc_port: int = find_free_port() | ||
| gateway_port: int = find_free_port() | ||
|
|
||
| config_path: Path = temp_dir / "config" | ||
| config: dict[str, Any] | ||
| with open(config_path, "r") as f: | ||
| config = json.load(f) | ||
|
|
||
| swarm_addrs: list[str] = config["Addresses"]["Swarm"] | ||
| new_port_swarm_addrs: list[str] = [ | ||
| s.replace("4001", str(swarm_port)) for s in swarm_addrs | ||
| ] | ||
| config["Addresses"]["Swarm"] = new_port_swarm_addrs | ||
|
|
||
| rpc_multiaddr: str = config["Addresses"]["API"] | ||
| gateway_multiaddr: str = config["Addresses"]["Gateway"] | ||
|
|
||
| config["Addresses"]["API"] = rpc_multiaddr.replace("5001", str(rpc_port)) | ||
| config["Addresses"]["Gateway"] = gateway_multiaddr.replace( | ||
| "8080", str(gateway_port) | ||
| ) | ||
|
|
||
| with open(config_path, "w") as f: | ||
| json.dump(config, f, indent=2) | ||
|
|
||
| # Start the daemon | ||
| rpc_uri_stem: str = f"http://127.0.0.1:{rpc_port}" | ||
| gateway_uri_stem: str = f"http://127.0.0.1:{gateway_port}" | ||
|
|
||
| ipfs_process: subprocess.Popen[bytes] = subprocess.Popen( | ||
| ["ipfs", "daemon"], env=custom_env | ||
| ) | ||
| # Should be enough time for the ipfs daemon process to start up | ||
| time.sleep(5) | ||
|
|
||
| yield rpc_uri_stem, gateway_uri_stem | ||
|
|
||
| # Close the daemon | ||
| ipfs_process.kill() | ||
|
|
||
| # Delete the temporary directory | ||
| shutil.rmtree(temp_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
IPFS fixture works but needs improvements.
The fixture correctly sets up an isolated IPFS environment but has several issues that should be addressed:
- Uses the vulnerable
find_free_port()function - Missing proper error handling for daemon startup failures
- Hard-coded sleep instead of proper readiness check
- Missing encoding specification for file operations
@pytest.fixture(scope="module")
def create_ipfs() -> Generator[Tuple[str, str], None, None]:
# Create temporary directory, set it as the IPFS Path
temp_dir: Path = Path(tempfile.mkdtemp())
custom_env: dict[str, str] = os.environ.copy()
custom_env["IPFS_PATH"] = str(temp_dir)
+
+ # Reserve ports and keep sockets open
+ swarm_port, swarm_socket = find_free_port()
+ rpc_port, rpc_socket = find_free_port()
+ gateway_port, gateway_socket = find_free_port()
- # Edit the config file so that it serves on randomly selected and available ports to not conflict with any currently running ipfs daemons
- swarm_port: int = find_free_port()
- rpc_port: int = find_free_port()
- gateway_port: int = find_free_port()
config_path: Path = temp_dir / "config"
config: dict[str, Any]
- with open(config_path, "r") as f:
+ with open(config_path, "r", encoding="utf-8") as f:
config = json.load(f)
# ... port configuration code ...
- with open(config_path, "w") as f:
+ with open(config_path, "w", encoding="utf-8") as f:
json.dump(config, f, indent=2)
+ # Close the reserved sockets now that config is written
+ swarm_socket.close()
+ rpc_socket.close()
+ gateway_socket.close()
+
# Start the daemon
rpc_uri_stem: str = f"http://127.0.0.1:{rpc_port}"
gateway_uri_stem: str = f"http://127.0.0.1:{gateway_port}"
- ipfs_process: subprocess.Popen[bytes] = subprocess.Popen(
- ["ipfs", "daemon"], env=custom_env
- )
- # Should be enough time for the ipfs daemon process to start up
- time.sleep(5)
+ try:
+ ipfs_process: subprocess.Popen[bytes] = subprocess.Popen(
+ ["ipfs", "daemon"], env=custom_env,
+ stdout=subprocess.PIPE, stderr=subprocess.PIPE
+ )
+ # Wait for daemon to be ready instead of fixed sleep
+ for _ in range(30): # 30 second timeout
+ time.sleep(1)
+ if ipfs_process.poll() is None: # Still running
+ try:
+ import requests
+ requests.get(f"{rpc_uri_stem}/api/v0/version", timeout=1)
+ break
+ except:
+ continue
+ else:
+ raise RuntimeError("IPFS daemon failed to start within 30 seconds")
+ except Exception:
+ shutil.rmtree(temp_dir)
+ raise🏁 Script executed:
#!/bin/bash
# Locate the implementation of find_free_port to verify its behavior
rg -n "def find_free_port" -A5Length of output: 394
🏁 Script executed:
#!/bin/bash
# Display the create_ipfs fixture implementation to confirm context
sed -n '1,200p' tests/testing_utils.pyLength of output: 3563
Improve IPFS fixture reliability and safety
This fixture correctly isolates an IPFS daemon but should be hardened to avoid port-allocation races, ensure proper cleanup on failures, and reliably detect when the daemon is ready.
Please address the following in tests/testing_utils.py (create_ipfs fixture, starting at line 62):
- Reserve the ports by holding open sockets until after writing the config to prevent race conditions in find_free_port().
- Specify
encoding="utf-8"when opening the config file for read/write. - Replace the fixed
time.sleep(5)with a loop that polls the daemon’s/api/v0/versionendpoint (with a timeout) to confirm readiness. - Wrap daemon startup in a
try/exceptto clean up the temp directory on failure and capturestdout/stderr.
Suggested diff:
@pytest.fixture(scope="module")
def create_ipfs() -> Generator[Tuple[str, str], None, None]:
# ...
- swarm_port: int = find_free_port()
- rpc_port: int = find_free_port()
- gateway_port: int = find_free_port()
+ # Reserve ports by keeping sockets open until config is written
+ swarm_port, swarm_sock = find_free_port()
+ rpc_port, rpc_sock = find_free_port()
+ gateway_port, gateway_sock = find_free_port()
config_path: Path = temp_dir / "config"
config: dict[str, Any]
- with open(config_path, "r") as f:
+ with open(config_path, "r", encoding="utf-8") as f:
config = json.load(f)
# … update Addresses …
- with open(config_path, "w") as f:
+ with open(config_path, "w", encoding="utf-8") as f:
json.dump(config, f, indent=2)
+ # Release reserved ports
+ swarm_sock.close()
+ rpc_sock.close()
+ gateway_sock.close()
- ipfs_process: subprocess.Popen[bytes] = subprocess.Popen(
- ["ipfs", "daemon"], env=custom_env
- )
- time.sleep(5)
+ try:
+ ipfs_process = subprocess.Popen(
+ ["ipfs", "daemon"],
+ env=custom_env,
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ )
+ # Wait for up to 30s for the daemon to become responsive
+ for _ in range(30):
+ time.sleep(1)
+ try:
+ import requests
+ requests.get(f"http://127.0.0.1:{rpc_port}/api/v0/version", timeout=1)
+ break
+ except:
+ continue
+ else:
+ raise RuntimeError("IPFS daemon did not start within 30 seconds")
+ except Exception:
+ ipfs_process.kill()
+ shutil.rmtree(temp_dir)
+ raiseCommittable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 74-74: Line too long (141/100)
(C0301)
[convention] 63-63: Missing function or method docstring
(C0116)
[warning] 81-81: Using open without explicitly specifying an encoding
(W1514)
[warning] 98-98: Using open without explicitly specifying an encoding
(W1514)
[refactor] 105-107: Consider using 'with' for resource-allocating operations
(R1732)
🤖 Prompt for AI Agents
In tests/testing_utils.py around lines 62 to 117, improve the create_ipfs
fixture by reserving the ports using open sockets before writing the config to
avoid race conditions from find_free_port(). Specify encoding="utf-8" when
opening the config file for reading and writing. Replace the fixed time.sleep(5)
with a loop that polls the IPFS daemon's /api/v0/version endpoint with a timeout
to confirm readiness. Wrap the daemon startup in a try/except block to ensure
proper cleanup of the temporary directory on failure and capture stdout/stderr
for debugging.
| async def _set_pointer(self, key: str, val_ptr: IPLDKind) -> None: | ||
| async with self.lock: | ||
| node_stack: list[tuple[IPLDKind, Node]] = [] | ||
| root_node: Node = await self.node_store.load(self.root_node_id) | ||
| node_stack.append((self.root_node_id, root_node)) | ||
|
|
||
| # FIFO queue to keep track of all the KVs we need to insert | ||
| # This is needed if any buckets overflow and so we need to reinsert all those KVs | ||
| kvs_queue: list[tuple[str, IPLDKind]] = [] | ||
| kvs_queue.append((key, val_ptr)) | ||
|
|
||
| while len(kvs_queue) > 0: | ||
| _, top_node = node_stack[-1] | ||
| curr_key, curr_val_ptr = kvs_queue[0] | ||
|
|
||
| raw_hash: bytes = self.hash_fn(curr_key.encode()) | ||
| map_key: int = extract_bits(raw_hash, len(node_stack) - 1, 8) | ||
|
|
||
| item = top_node.data[map_key] | ||
| if isinstance(item, list): | ||
| next_node_id: IPLDKind = item[0] | ||
| next_node: Node = await self.node_store.load(next_node_id) | ||
| node_stack.append((next_node_id, next_node)) | ||
| elif isinstance(item, dict): | ||
| bucket: dict[str, IPLDKind] = item | ||
|
|
||
| # If this bucket already has this same key, or has space, just rewrite the value and then go work on the others in the queue | ||
| if curr_key in bucket or len(bucket) < self.max_bucket_size: | ||
| bucket[curr_key] = curr_val_ptr | ||
| kvs_queue.pop(0) | ||
| continue | ||
|
|
||
| def __getitem__(self, key: str) -> IPLDKind: | ||
| if not self.read_only: | ||
| self.lock.acquire(blocking=True) | ||
| # The current key is not in the bucket and the bucket is too full, so empty KVs from the bucket and restart insertion | ||
| for k in bucket: | ||
| v_ptr = bucket[k] | ||
| kvs_queue.append((k, v_ptr)) | ||
|
|
||
| raw_hash = self.hash_fn(key.encode()) | ||
| # Create a new link to a new node so that we can reflow these KVs into a new subtree | ||
| new_node = Node() | ||
| new_node_id: IPLDKind = await self.node_store.save(None, new_node) | ||
| link: list[IPLDKind] = [new_node_id] | ||
| top_node.data[map_key] = link | ||
|
|
||
| node_id_stack: list[Link] = [] | ||
| node_id_stack.append(self.root_node_id) | ||
| # Finally, reserialize and fix all links, deleting empty nodes as needed | ||
| await self._reserialize_and_link(node_stack) | ||
| self.root_node_id = node_stack[0][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider breaking down the complex _set_pointer method.
This method has 20+ local variables and implements complex bucket overflow logic. Consider extracting helper methods for better maintainability.
The method could be broken down into:
_handle_bucket_overflow()_insert_into_bucket()_traverse_to_insertion_point()
This would improve readability and testability of the insertion logic.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 531-531: Line too long (144/100)
(C0301)
[convention] 537-537: Line too long (137/100)
(C0301)
[convention] 542-542: Line too long (104/100)
(C0301)
[refactor] 505-505: Too many local variables (20/15)
(R0914)
🤖 Prompt for AI Agents
In py_hamt/hamt.py around lines 505 to 550, the _set_pointer method is too
complex with many local variables and intricate bucket overflow logic. Refactor
by extracting helper methods such as _handle_bucket_overflow to manage bucket
splitting and reinsertion, _insert_into_bucket to handle inserting or updating
keys in buckets, and _traverse_to_insertion_point to navigate the node stack to
the correct insertion location. This modularization will improve readability,
maintainability, and testability of the insertion logic.
| # node stack is a list of tuples that look like (parent_id, self_id, node) | ||
| node_stack: list[tuple[int | None, int, Node]] = [] | ||
| # The root node may not be in the buffer, e.g. this is HAMT initialized with a specific root node id | ||
| if self.is_buffer_id(self.hamt.root_node_id): | ||
| root_node: Node = self.buffer[cast(int, self.hamt.root_node_id)] | ||
| node_stack.append((None, cast(int, self.hamt.root_node_id), root_node)) | ||
|
|
||
| while len(node_stack) > 0: | ||
| parent_buffer_id, top_buffer_id, top_node = node_stack[-1] | ||
| new_nodes_on_stack: list[tuple[int, int, Node]] = [] | ||
| for child_buffer_id in self.children_in_memory(top_node): | ||
| child_node: Node = self.buffer[child_buffer_id] | ||
| new_nodes_on_stack.append((top_buffer_id, child_buffer_id, child_node)) | ||
|
|
||
| no_children_in_memory: bool = len(new_nodes_on_stack) == 0 | ||
| # Flush this node out and relink the rest of the tree | ||
| if no_children_in_memory: | ||
| is_root: bool = parent_buffer_id is None | ||
| old_id: int = top_buffer_id | ||
| new_id: IPLDKind = await self.hamt.cas.save( | ||
| top_node.serialize(), codec="dag-cbor" | ||
| ) | ||
| del self.buffer[old_id] | ||
| node_stack.pop() | ||
|
|
||
| ## `__len__` | ||
| Total number of keys. Note that this will have to scan the entire tree in order to count, which can take a while depending on the speed of the store retrieval. | ||
| # If it's the root, we need to set the hamt's root node id once this is done sending to the backing store | ||
| if is_root: | ||
| self.hamt.root_node_id = new_id | ||
| # Edit and properly relink the parent if this is not the root | ||
| else: | ||
| # parent_buffer_id is never None in this branch | ||
| assert parent_buffer_id is not None | ||
| parent_node: Node = self.buffer[parent_buffer_id] | ||
| parent_node.replace_link(old_id, new_id) | ||
| # Continue recursing down the tree | ||
| else: | ||
| node_stack.extend(new_nodes_on_stack) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify the complex vacate algorithm correctness.
The vacate method implements a complex dependency-aware flushing algorithm. The logic appears sound, but the complexity warrants verification.
The algorithm uses a DFS traversal to flush nodes from leaves to root, ensuring dependencies are satisfied. Let me verify that the stack manipulation and parent relinking logic is correct:
🏁 Script executed:
#!/bin/bash
# Description: Search for tests that verify the InMemoryTreeStore.vacate() method behavior
# Look for tests specifically testing the vacate functionality
rg -A 10 -B 5 "vacate.*test|test.*vacate" --type py
# Look for tests of InMemoryTreeStore
rg -A 10 -B 5 "InMemoryTreeStore" --type pyLength of output: 5286
Add dedicated tests for InMemoryTreeStore.vacate
The vacate method contains a non-trivial DFS-based flush algorithm but currently has no direct test coverage. Please add unit tests to ensure its correctness:
• py_hamt/hamt.py (lines 215–252): the core vacate implementation
• Create tests covering:
– Single‐node (root only) scenario
– Multi‐level tree: verify leaves are flushed before parents and links are relinked correctly
– Root ID update when vacating the root node
– Mixed buffer state (some children already flushed vs. still in memory)
– Empty buffer / no-op case
Tests can live under a new file, e.g. tests/test_inmemory_tree_store_vacate.py, and should drive HAMT.make_read_only() (which calls vacate) to observe side effects on buffer and root_node_id.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 218-218: Line too long (108/100)
(C0301)
[convention] 241-241: Line too long (121/100)
(C0301)
🤖 Prompt for AI Agents
In py_hamt/hamt.py around lines 215 to 252, the vacate method lacks direct unit
test coverage despite its complex DFS-based flushing logic. Add dedicated tests
in a new file like tests/test_inmemory_tree_store_vacate.py that cover
single-node scenarios, multi-level trees verifying leaf-first flushing and
correct parent relinking, root ID updates after vacating, mixed buffer states
with some children flushed, and empty buffer no-op cases. Use
HAMT.make_read_only() to trigger vacate and assert expected changes in buffer
and root_node_id to ensure correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/testing_utils.py (1)
80-83: TOCTOU race condition in port allocation.The
_free_port()function has the same race condition issue mentioned in past reviews. Although used in Docker context, the port can still be taken between socket close and Docker container startup.Consider returning both the port and an open socket, or use Docker's automatic port allocation:
-def _free_port() -> int: - with socket.socket() as s: - s.bind(("", 0)) - return s.getsockname()[1] +def _free_port() -> int: + """Get a free port. Note: Subject to race conditions in concurrent environments.""" + with socket.socket() as s: + s.bind(("", 0)) + return s.getsockname()[1]Alternatively, let Docker handle port allocation automatically by using
"5001/tcp": Nonein the ports mapping.tests/test_kubo_cas.py (1)
162-211: 🛠️ Refactor suggestionInconsistent IPFS availability handling.
This test manually checks for IPFS daemon availability instead of using the
create_ipfsfixture consistently. This creates brittleness and inconsistency with other tests.Consider using the
create_ipfsfixture consistently, or if testing default URLs specifically is important, use a separate test that's properly isolated:@pytest.mark.ipfs @pytest.mark.asyncio(loop_scope="session") @given(data=ipld_strategy()) @settings(deadline=1000, print_blob=True) -async def test_kubo_default_urls(global_client_session, data: IPLDKind): +async def test_kubo_default_urls(create_ipfs, global_client_session, data: IPLDKind): """ - Tests KuboCAS using its default URLs and when None is passed for URLs. - Requires a local IPFS daemon on default ports. + Tests KuboCAS default URL behavior with fixture-provided daemon. """ - # Check if local IPFS daemon is available on default ports - import http.client - - try: - conn = http.client.HTTPConnection("127.0.0.1", 5001, timeout=1) - conn.request("POST", "/api/v0/version") - response = conn.getresponse() - if response.status != 200: - pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)") - except Exception: - pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)") + rpc_url, gateway_url = create_ipfs🧰 Tools
🪛 Pylint (3.3.7)
[convention] 172-172: Import outside toplevel (http.client)
(C0415)
[warning] 180-180: Catching too general exception Exception
(W0718)
[warning] 193-193: Catching too general exception Exception
(W0718)
[warning] 209-209: Catching too general exception Exception
(W0718)
🧹 Nitpick comments (6)
tests/testing_utils.py (3)
1-11: Add module docstring for better documentation.Consider adding a module-level docstring to describe the purpose of this testing utilities module.
+""" +Testing utilities for py-hamt IPFS integration tests. + +Provides Hypothesis strategies for generating test data, helper functions for IPFS +daemon detection, Docker client management, and pytest fixtures for IPFS testing. +""" import http.client import os import socket🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 8-8: Unable to import 'hypothesis'
(E0401)
[error] 9-9: Unable to import 'hypothesis.strategies'
(E0401)
[error] 10-10: Unable to import 'multiformats'
(E0401)
28-39: Add docstring for ipld_strategy function.The function generates IPLD-compatible data structures but lacks documentation explaining its purpose and return type.
def ipld_strategy() -> SearchStrategy: + """ + Generate Hypothesis strategy for IPLD-compatible data structures. + + Returns a strategy that produces values compatible with IPLD (InterPlanetary + Linked Data) format, including primitives, binary data, and CIDs. + """ return st.one_of(🧰 Tools
🪛 Pylint (3.3.7)
[convention] 28-28: Missing function or method docstring
(C0116)
63-63: Replace broad exception handling with specific exceptions.Multiple functions use broad
except Exception:blocks which can mask unexpected errors. Consider catching specific exceptions where possible.For HTTP connection functions, catch specific connection-related exceptions:
try: conn = http.client.HTTPConnection(p.hostname, p.port, timeout=1) conn.request("POST", "/api/v0/version") return conn.getresponse().status == 200 - except Exception: + except (OSError, http.client.HTTPException): return FalseAlso applies to: 76-76, 116-116, 133-133, 141-141
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 63-63: Catching too general exception Exception
(W0718)
tests/conftest.py (1)
1-5: Add module docstring and ignore false positive import warnings.The re-exports are intentional for backward compatibility, but the module should have documentation.
+""" +Pytest configuration and shared fixtures for py-hamt tests. + +Provides shared aiohttp session, IPFS test control via --ipfs flag, +and re-exports testing utilities for backward compatibility. +""" import aiohttp import pytest # re-export helpers so existing imports keep working (optional) -from testing_utils import create_ipfs, ipld_strategy # noqa: F401 +from testing_utils import create_ipfs, ipld_strategy # noqa: F401,W0611🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 1-1: Unable to import 'aiohttp'
(E0401)
[warning] 5-5: Unused create_ipfs imported from testing_utils
(W0611)
[warning] 5-5: Unused ipld_strategy imported from testing_utils
(W0611)
README.md (1)
50-50: Fix grammar issue with "both...as well as" construction.-We use `pytest` with 100% code coverage, and with test inputs that are both handwritten as well as generated by `hypothesis`. +We use `pytest` with 100% code coverage, with test inputs that are both handwritten and generated by `hypothesis`.🧰 Tools
🪛 LanguageTool
[grammar] ~50-~50: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated byhypothesis. This allows ...(BOTH_AS_WELL_AS)
tests/test_kubo_cas.py (1)
1-11: Add module docstring for test file.+""" +Tests for KuboCAS and InMemoryCAS implementations. + +Tests async content-addressed storage with IPFS integration, +property-based testing with Hypothesis, and error handling. +""" from typing import Literal, cast🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'aiohttp'
(E0401)
[error] 4-4: Unable to import 'dag_cbor'
(E0401)
[error] 6-6: Unable to import 'dag_cbor'
(E0401)
[error] 7-7: Unable to import 'hypothesis'
(E0401)
[error] 10-10: Unable to import 'py_hamt'
(E0401)
[error] 11-11: Unable to import 'py_hamt.store'
(E0401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
.github/workflows/run-checks.yaml(1 hunks)README.md(2 hunks)pyproject.toml(2 hunks)run-checks.sh(1 hunks)tests/conftest.py(1 hunks)tests/test_kubo_cas.py(1 hunks)tests/testing_utils.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- run-checks.sh
- pyproject.toml
- .github/workflows/run-checks.yaml
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/conftest.py (1)
tests/testing_utils.py (2)
create_ipfs(151-197)ipld_strategy(28-39)
tests/test_kubo_cas.py (3)
tests/testing_utils.py (2)
ipld_strategy(28-39)create_ipfs(151-197)py_hamt/store.py (8)
KuboCAS(74-269)InMemoryCAS(37-71)load(33-34)load(52-71)load(262-269)save(26-30)save(47-50)save(246-260)tests/conftest.py (1)
global_client_session(9-13)
🪛 Pylint (3.3.7)
tests/conftest.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 1-1: Unable to import 'aiohttp'
(E0401)
[convention] 16-16: Missing function or method docstring
(C0116)
[convention] 25-25: Missing function or method docstring
(C0116)
[convention] 29-29: Missing function or method docstring
(C0116)
[warning] 5-5: Unused create_ipfs imported from testing_utils
(W0611)
[warning] 5-5: Unused ipld_strategy imported from testing_utils
(W0611)
tests/testing_utils.py
[convention] 48-48: Line too long (112/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 8-8: Unable to import 'hypothesis'
(E0401)
[error] 9-9: Unable to import 'hypothesis.strategies'
(E0401)
[error] 10-10: Unable to import 'multiformats'
(E0401)
[convention] 28-28: Missing function or method docstring
(C0116)
[warning] 63-63: Catching too general exception Exception
(W0718)
[warning] 76-76: Catching too general exception Exception
(W0718)
[warning] 116-116: Catching too general exception Exception
(W0718)
[warning] 133-133: Catching too general exception Exception
(W0718)
[warning] 141-141: Catching too general exception Exception
(W0718)
tests/test_kubo_cas.py
[convention] 63-63: Line too long (129/100)
(C0301)
[convention] 93-93: Line too long (130/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'aiohttp'
(E0401)
[error] 4-4: Unable to import 'dag_cbor'
(E0401)
[error] 6-6: Unable to import 'dag_cbor'
(E0401)
[error] 7-7: Unable to import 'hypothesis'
(E0401)
[error] 10-10: Unable to import 'py_hamt'
(E0401)
[error] 11-11: Unable to import 'py_hamt.store'
(E0401)
[convention] 16-16: Missing function or method docstring
(C0116)
[convention] 172-172: Import outside toplevel (http.client)
(C0415)
[warning] 180-180: Catching too general exception Exception
(W0718)
[warning] 193-193: Catching too general exception Exception
(W0718)
[warning] 209-209: Catching too general exception Exception
(W0718)
[convention] 218-218: Missing function or method docstring
(C0116)
🪛 LanguageTool
README.md
[uncategorized] ~48-~48: Possible missing comma found.
Context: ...check formatting and linting. Under the hood it will be using the pre-commit comma...
(AI_HYDRA_LEO_MISSING_COMMA)
[grammar] ~50-~50: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...
(BOTH_AS_WELL_AS)
[style] ~60-~60: Consider a more concise word here.
Context: ...on IPFS in order to be able to run all integration tests wh...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~60-~60: Possible missing comma found.
Context: .../run-checks.yamluses thesetup-ipfs` step which ensures that a local ipfs daemon ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~60-~60: Possible missing comma found.
Context: ...Docker](https://www.docker.com/) is not installed then tests will simply run the unit tes...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.17.2)
README.md
54-54: Blank line inside blockquote
null
(MD028, no-blanks-blockquote)
🔇 Additional comments (9)
tests/testing_utils.py (1)
150-197: IPFS fixture implementation is well-designed.The fixture provides good fallback logic with proper resource cleanup. The Docker-based approach with readiness checking is robust and handles different deployment scenarios well.
tests/conftest.py (2)
8-13: Excellent async session management.The session-scoped async fixture provides efficient resource sharing across tests while ensuring proper cleanup. This is a best practice for aiohttp testing.
16-35: Well-implemented conditional test execution.The
--ipfsflag approach with automatic test skipping provides good control over IPFS-dependent tests. This is particularly useful for CI environments and local development.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 16-16: Missing function or method docstring
(C0116)
[convention] 25-25: Missing function or method docstring
(C0116)
[convention] 29-29: Missing function or method docstring
(C0116)
README.md (3)
13-15: Clear description update reflects architectural changes.The updated description properly emphasizes the content-addressed storage focus and IPFS/IPLD targeting, which aligns with the async refactoring objectives.
27-41: Comprehensive development setup instructions.The addition of explicit setup commands using
uv, virtual environment creation, and pre-commit hook installation provides clear guidance for new contributors.
58-82: Excellent clarification of IPFS testing scenarios.The detailed breakdown of different testing scenarios (GitHub Actions, local with Docker, local with daemon, quick local) provides valuable guidance for developers in various environments.
🧰 Tools
🪛 LanguageTool
[style] ~60-~60: Consider a more concise word here.
Context: ...on IPFS in order to be able to run all integration tests wh...(IN_ORDER_TO_PREMIUM)
[uncategorized] ~60-~60: Possible missing comma found.
Context: .../run-checks.yamluses thesetup-ipfs` step which ensures that a local ipfs daemon ...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~60-~60: Possible missing comma found.
Context: ...Docker](https://www.docker.com/) is not installed then tests will simply run the unit tes...(AI_HYDRA_LEO_MISSING_COMMA)
tests/test_kubo_cas.py (3)
15-43: Comprehensive error handling tests for InMemoryCAS.The tests properly validate both KeyError for missing keys and TypeError for invalid key types with appropriate error messages. Good defensive testing.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 16-16: Missing function or method docstring
(C0116)
135-160: Well-structured test with proper fixture usage.The test correctly uses the
create_ipfsfixture andglobal_client_sessionfor reliable IPFS testing across different environments (Docker/local daemon).
213-235: Good test coverage with proper session management.The test demonstrates proper async session management by creating its own
aiohttp.ClientSessionand using proper type safety with codec literals.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 218-218: Missing function or method docstring
(C0116)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
py_hamt/store.py (2)
23-23: Fix the type alias syntax to resolve parsing error.The type alias syntax is causing a parsing error with Pylint. Use the traditional syntax that's compatible with older Python versions.
- type CodecInput = Literal["raw", "dag-cbor"] + CodecInput = Literal["raw", "dag-cbor"]🧰 Tools
🪛 Pylint (3.3.7)
[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'
(E0001)
231-241: Add test coverage for async context manager methods.The async context manager methods (
aclose,__aenter__,__aexit__) need test coverage to ensure session cleanup works correctly, as identified in previous reviews.
🧹 Nitpick comments (2)
py_hamt/store.py (2)
140-140: Add validation for the concurrency parameter.Consider adding validation to ensure the concurrency parameter is a positive integer to prevent runtime issues with the semaphore.
+ if concurrency <= 0: + raise ValueError("concurrency must be a positive integer")
258-260: Add comment explaining DAG_PB special handling.The special case for
DAG_PB_MARKERcould benefit from a comment explaining why this codec is not overridden (likely because it's UnixFS sharded data as mentioned in the AI summary).+ # Don't override DAG_PB codec as it indicates UnixFS sharded data if cid.codec.code != self.DAG_PB_MARKER: cid = cid.set(codec=codec)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
py_hamt/store.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
py_hamt/store.py (3)
py_hamt/hamt.py (7)
save(133-134)save(155-156)save(267-273)load(137-138)load(158-167)load(275-284)get(592-599)py_hamt/encryption_hamt_store.py (1)
get(146-170)py_hamt/zarr_hamt_store.py (1)
get(92-118)
🪛 Pylint (3.3.7)
py_hamt/store.py
[error] 23-23: Parsing failed: 'invalid syntax (py_hamt.store, line 23)'
(E0001)
🔇 Additional comments (2)
py_hamt/store.py (2)
52-71: Excellent defensive programming and error handling.The type checking with detailed explanation of the Liskov substitution principle issue is well-documented. The improved error handling using KeyError with chained exceptions is much better than generic exceptions.
262-269: Clean async implementation with proper throttling.The load method effectively uses aiohttp with semaphore throttling and appropriate error handling. This addresses the previous blocking requests issue mentioned in past reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
py_hamt/store.py (2)
23-23: Fix the type alias syntax error.The type alias syntax is still causing parsing errors. Based on the past review comment, this needs to be corrected.
- CodecInput = Literal["raw", "dag-cbor"] +CodecInput = Literal["raw", "dag-cbor"]Move the type alias outside the class to resolve the parsing issue.
231-242: Add test coverage for async context manager methods.The async context manager methods (
aclose,__aenter__,__aexit__) still need test coverage as identified in the past review comments. The pipeline failure indicates these lines aren't covered by tests.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 231-231: Missing function or method docstring
(C0116)
🧹 Nitpick comments (3)
py_hamt/store.py (3)
44-44: Use dict literal for better performance.- self.store = dict() + self.store = {}🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 44-44: Consider using '{}' instead of a call to 'dict'.
(R1735)
47-51: Consider adding codec validation.The
codecparameter is not used in the implementation, but the method signature suggests it should affect how data is stored or identified.async def save(self, data: bytes, codec: ContentAddressedStore.CodecInput) -> bytes: + # TODO: Consider incorporating codec into hash calculation for different content types hash: bytes = self.hash_alg.digest(data, size=32) self.store[hash] = data return hash🧰 Tools
🪛 Pylint (3.3.7)
[warning] 48-48: Redefining built-in 'hash'
(W0622)
1-8: Add module docstring.+""" +Content-addressed storage implementations for HAMT. + +Provides abstract base class and concrete implementations for storing +and retrieving data in content-addressed storage systems like IPFS. +""" import asyncio🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 5-5: Unable to import 'aiohttp'
(E0401)
[error] 6-6: Unable to import 'dag_cbor.ipld'
(E0401)
[error] 7-7: Unable to import 'multiformats'
(E0401)
[error] 8-8: Unable to import 'multiformats.multihash'
(E0401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
py_hamt/store.py(1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
py_hamt/store.py
[convention] 13-13: Line too long (104/100)
(C0301)
[convention] 15-15: Line too long (145/100)
(C0301)
[convention] 18-18: Line too long (194/100)
(C0301)
[convention] 20-20: Line too long (119/100)
(C0301)
[convention] 29-29: Line too long (116/100)
(C0301)
[convention] 38-38: Line too long (261/100)
(C0301)
[convention] 57-57: Line too long (103/100)
(C0301)
[convention] 58-58: Line too long (109/100)
(C0301)
[convention] 60-60: Line too long (134/100)
(C0301)
[convention] 83-83: Line too long (168/100)
(C0301)
[convention] 133-133: Line too long (190/100)
(C0301)
[convention] 146-146: Line too long (188/100)
(C0301)
[convention] 149-149: Line too long (177/100)
(C0301)
[convention] 150-150: Line too long (106/100)
(C0301)
[convention] 152-152: Line too long (170/100)
(C0301)
[convention] 164-164: Line too long (183/100)
(C0301)
[convention] 173-173: Line too long (205/100)
(C0301)
[convention] 174-174: Line too long (174/100)
(C0301)
[convention] 178-178: Line too long (136/100)
(C0301)
[convention] 182-182: Line too long (178/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 5-5: Unable to import 'aiohttp'
(E0401)
[error] 6-6: Unable to import 'dag_cbor.ipld'
(E0401)
[error] 7-7: Unable to import 'multiformats'
(E0401)
[error] 8-8: Unable to import 'multiformats.multihash'
(E0401)
[warning] 33-33: Redefining built-in 'id'
(W0622)
[refactor] 44-44: Consider using '{}' instead of a call to 'dict'.
(R1735)
[warning] 48-48: Redefining built-in 'hash'
(W0622)
[warning] 52-52: Redefining built-in 'id'
(W0622)
[refactor] 74-74: Too many instance attributes (8/7)
(R0902)
[refactor] 134-134: Too many arguments (8/5)
(R0913)
[refactor] 134-134: Too many positional arguments (6/5)
(R0917)
[convention] 231-231: Missing function or method docstring
(C0116)
[warning] 262-262: Redefining built-in 'id'
(W0622)
🔇 Additional comments (3)
py_hamt/store.py (3)
189-192: Verify URL construction and pin parameter.The URL construction uses
pin=falsewhich aligns with the past discussion about not wanting to pin content on saves to avoid repinning entire datasets.The implementation correctly avoids pinning as discussed in previous comments, preventing potential performance issues when dealing with large HAMTs.
194-227: Well-designed session management per event loop.This is an excellent solution to the async session management challenges mentioned in the PR objectives. The per-loop session management elegantly handles the issue where zarr runs its own event loop in a separate thread.
The implementation properly addresses the session management complexities across different event loops that were discussed in the PR comments.
246-270: Excellent migration to async HTTP client.The implementation successfully addresses the past review comment about blocking requests in async methods. The use of
aiohttpwith proper session management and semaphore-based concurrency control is well-executed.Key improvements:
- Non-blocking HTTP requests with
aiohttp- Proper semaphore usage for rate limiting
- Session reuse across requests
- Appropriate error handling with
raise_for_status()🧰 Tools
🪛 Pylint (3.3.7)
[warning] 262-262: Redefining built-in 'id'
(W0622)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
tests/test_kubo_cas.py (3)
1-1: Add module docstring.The module is missing a docstring that should describe its purpose and functionality.
+""" +Tests for KuboCAS and InMemoryCAS content-addressed storage implementations. + +This module contains async tests for the refactored storage layer, including +property-based tests for IPLD data serialization/deserialization and error +handling scenarios. +""" + from typing import Literal, cast🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
15-19: Add function docstring for consistency.This function lacks a docstring explaining its purpose.
@pytest.mark.asyncio async def test_memory_store_exception(): + """Test that InMemoryCAS.load raises KeyError for empty bytes key.""" s = InMemoryCAS()🧰 Tools
🪛 Pylint (3.3.7)
[convention] 16-16: Missing function or method docstring
(C0116)
124-129: Add function docstring.This function lacks a docstring explaining its purpose and how it differs from other KuboCAS tests.
@pytest.mark.asyncio @given(data=ipld_strategy()) @settings( deadline=500 ) # this sometimes takes longer than the default 250 ms in GitHub CI async def test_kubo_cas(create_ipfs, data: IPLDKind): # noqa + """ + Test KuboCAS with a custom aiohttp session for complete code coverage. + + This test creates its own ClientSession instead of using the global one + to ensure all code paths are tested. + """ rpc_base_url, gateway_base_url = create_ipfs🧰 Tools
🪛 Pylint (3.3.7)
[convention] 129-129: Missing function or method docstring
(C0116)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_kubo_cas.py(1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/test_kubo_cas.py
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'aiohttp'
(E0401)
[error] 4-4: Unable to import 'dag_cbor'
(E0401)
[error] 6-6: Unable to import 'dag_cbor'
(E0401)
[error] 7-7: Unable to import 'hypothesis'
(E0401)
[error] 10-10: Unable to import 'py_hamt'
(E0401)
[error] 11-11: Unable to import 'py_hamt.store'
(E0401)
[convention] 16-16: Missing function or method docstring
(C0116)
[convention] 83-83: Import outside toplevel (http.client)
(C0415)
[warning] 91-91: Catching too general exception Exception
(W0718)
[warning] 104-104: Catching too general exception Exception
(W0718)
[warning] 120-120: Catching too general exception Exception
(W0718)
[convention] 129-129: Missing function or method docstring
(C0116)
🔇 Additional comments (1)
tests/test_kubo_cas.py (1)
82-93: Good implementation of daemon availability check.The addition of the daemon availability check properly addresses the previous review concern about test brittleness when no IPFS daemon is running on default ports. The timeout and graceful skip behavior make the test more robust.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 83-83: Import outside toplevel (http.client)
(C0415)
[warning] 91-91: Catching too general exception Exception
(W0718)
| loaded_encoded_data = await kubo_cas_default.load(cid) | ||
| result = dag_cbor.decode(loaded_encoded_data) | ||
| assert data == result | ||
| except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use more specific exception handling.
Catching the general Exception in these test scenarios is too broad and may mask unexpected errors. Consider catching specific exceptions related to IPFS operations or let unexpected exceptions propagate to help with debugging.
- except Exception as e:
+ except (aiohttp.ClientError, TimeoutError) as e:
pytest.fail(
f"Error during KuboCAS default URL test (codec: {codec}): {e}"
)And similarly for the second occurrence:
- except Exception as e:
+ except (aiohttp.ClientError, TimeoutError) as e:
pytest.fail(f"Error during KuboCAS None URL test (codec: {codec}): {e}")Also applies to: 120-120
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 104-104: Catching too general exception Exception
(W0718)
🤖 Prompt for AI Agents
In tests/test_kubo_cas.py at lines 104 and 120, replace the broad 'except
Exception' clauses with more specific exception types related to IPFS
operations. Identify the particular exceptions that the IPFS client or related
code can raise and catch only those. This will prevent masking unexpected errors
and improve test reliability and debugging.
| response = conn.getresponse() | ||
| if response.status != 200: | ||
| pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)") | ||
| except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use more specific exception handling.
Catching the general Exception is too broad. Consider catching specific connection-related exceptions.
- except Exception:
+ except (ConnectionError, OSError, TimeoutError):
pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception: | |
| except (ConnectionError, OSError, TimeoutError): | |
| pytest.skip("No IPFS daemon running on default ports (127.0.0.1:5001)") |
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 91-91: Catching too general exception Exception
(W0718)
🤖 Prompt for AI Agents
In tests/test_kubo_cas.py at line 91, replace the broad `except Exception:`
clause with more specific exception handling for connection-related errors.
Identify the exact exceptions raised during connection failures (such as
connection errors or timeouts) and catch only those to improve error handling
precision.
| Requires a local IPFS daemon on default ports. | ||
| """ | ||
| # Check if local IPFS daemon is available on default ports | ||
| import http.client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Move import to module level.
The http.client import should be moved to the top of the file with other imports for better code organization.
+import http.client
from typing import Literal, cast
import aiohttpThen remove the import from inside the function:
"""
# Check if local IPFS daemon is available on default ports
- import http.client
try:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import http.client | |
| # At the top of tests/test_kubo_cas.py, group all imports together | |
| import http.client | |
| from typing import Literal, cast | |
| import aiohttp | |
| # …other existing imports… |
| import http.client | |
| # In the test_kubo_default_urls function, remove the inline import | |
| """ | |
| # Check if local IPFS daemon is available on default ports | |
| try: | |
| # …rest of the test… |
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 83-83: Import outside toplevel (http.client)
(C0415)
🤖 Prompt for AI Agents
In tests/test_kubo_cas.py at line 83, the import statement for http.client is
inside a function. Move this import to the top of the file with the other import
statements to improve code organization and readability. Then remove the import
statement from inside the function to avoid redundant imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
README.md (3)
45-52: Refine comma usage and conjunction
- Add a comma after “Under the hood,”
- Change “both handwritten as well as generated” to “both handwritten and generated”
Apply this diff:- This will run tests with code coverage, and check formatting and linting. Under the hood it will be using the `pre-commit` command ... + This will run tests with code coverage, and check formatting and linting. Under the hood, it will be using the `pre-commit` command ... - We use `pytest` with 100% code coverage, and with test inputs that are both handwritten as well as generated by `hypothesis`. + We use `pytest` with 100% code coverage and with test inputs that are both handwritten and generated by `hypothesis`.🧰 Tools
🪛 LanguageTool
[uncategorized] ~50-~50: Possible missing comma found.
Context: ...check formatting and linting. Under the hood it will be using thepre-commitcomma...(AI_HYDRA_LEO_MISSING_COMMA)
[grammar] ~52-~52: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated byhypothesis. This allows ...(BOTH_AS_WELL_AS)
54-59: Remove blank line inside blockquote
The extra blank line between blockquote markers breaks markdownlint. Remove the empty line at line 56.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
56-56: Blank line inside blockquote
null(MD028, no-blanks-blockquote)
62-62: Simplify phrasing and add comma
- Replace “in order to be able to run” with “to run”
- Insert a comma after “installed,”
- Due to the dependency on [IPFS](...) in order to be able to run all integration tests which use IPFS a local ipfs daemon is required. + Due to the dependency on [IPFS](...), to run all integration tests which use IPFS, a local ipfs daemon is required.🧰 Tools
🪛 LanguageTool
[style] ~62-~62: Consider a more concise word here.
Context: ...on IPFS in order to be able to run all integration tests wh...(IN_ORDER_TO_PREMIUM)
[uncategorized] ~62-~62: Possible missing comma found.
Context: ...Docker](https://www.docker.com/) is not installed then tests will simply run the unit tes...(AI_HYDRA_LEO_MISSING_COMMA)
AGENTS.md (2)
19-23: Fix typo in bullet
“accesed” → “accessed”- `hamt.py`- where the HAMT data structure is constructed and accesed + `hamt.py` - where the HAMT data structure is constructed and accessed🧰 Tools
🪛 LanguageTool
[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation -/py_hamt: Source code of the py-hamt library ...(UNLIKELY_OPENING_PUNCTUATION)
111-111: Add comma for clarity
Insert a comma before the coordinating conjunction to separate clauses.- If there is an error with formatting for ruff and it can be autofixed you can normally run + If there is an error with formatting for ruff, and it can be autofixed, you can normally run🧰 Tools
🪛 LanguageTool
[uncategorized] ~111-~111: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...(COMMA_COMPOUND_SENTENCE)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
AGENTS.md(1 hunks)README.md(2 hunks)
🧰 Additional context used
🪛 LanguageTool
AGENTS.md
[grammar] ~2-~2: The singular determiner ‘this’ may not agree with the plural noun ‘agents’. Did you mean “these”?
Context: # Project Agents.md Guide for py-hamt This AGENTS.md file provides comprehensive g...
(THIS_NNS)
[uncategorized] ~15-~15: Use a comma before “or” if it connects two independent clauses (unless they are closely connected and short).
Context: ... IPFS daemon running locally can be used or a remote one can be accessed via gatewa...
(COMMA_COMPOUND_SENTENCE_2)
[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation - /py_hamt: Source code of the py-hamt library ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~24-~24: Loose punctuation mark.
Context: ... KuboCAS located in store.py. - /tests: Test files that should be maintained an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~111-~111: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...
(COMMA_COMPOUND_SENTENCE)
README.md
[uncategorized] ~50-~50: Possible missing comma found.
Context: ...check formatting and linting. Under the hood it will be using the pre-commit comma...
(AI_HYDRA_LEO_MISSING_COMMA)
[grammar] ~52-~52: Probable usage error. Use “and” after ‘both’.
Context: ...h test inputs that are both handwritten as well as generated by hypothesis. This allows ...
(BOTH_AS_WELL_AS)
[style] ~62-~62: Consider a more concise word here.
Context: ...on IPFS in order to be able to run all integration tests wh...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~62-~62: Possible missing comma found.
Context: ...Docker](https://www.docker.com/) is not installed then tests will simply run the unit tes...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.17.2)
README.md
56-56: Blank line inside blockquote
null
(MD028, no-blanks-blockquote)
🔇 Additional comments (11)
README.md (10)
9-9: Badge looks good
The Codecov badge URL is valid and up to date.
13-15: Updated project description is clear
The new summary accurately reflects the HAMT’s purpose and removed outdated examples.
23-26: Usage references updated
Linking to the API docs and external repos is appropriate and keeps the README concise.
28-36: Dev setup with uv is well documented
Steps to installuv, sync, activate the venv, and install hooks read clearly.
38-44: Pre-commit instructions are clear
Includingrun-checks.shusage here aligns with the updated CI and local workflow.
64-69: Test matrix summary is helpful
The examples for GitHub Actions and local scenarios are clear and comprehensive.
88-88: CPU/memory profiling section intact
Instructions forcProfile,snakeviz, andmemrayare accurate.
105-108: Documentation generation instructions are correct
Usinguv run pdoc py_hamtis consistent with the rest of the dev workflow.
110-113: LLMs section added appropriately
Pointing AI users toAGENTS.mdis clear and purposeful.
114-116: Dependency management is well covered
uv add/uv removeguidance is consistent with the project's use ofuv.🧰 Tools
🪛 LanguageTool
[typographical] ~115-~115: Consider adding a comma here.
Context: ...ytest --group dev`. For more information please see the [uv documentation](https://docs...(PLEASE_COMMA)
AGENTS.md (1)
1-120: New AI agent guide looks comprehensive
The file concisely covers architecture, conventions, setup, testing, and PR guidelines for AI agents.🧰 Tools
🪛 LanguageTool
[grammar] ~2-~2: The singular determiner ‘this’ may not agree with the plural noun ‘agents’. Did you mean “these”?
Context: # Project Agents.md Guide for py-hamt This AGENTS.md file provides comprehensive g...(THIS_NNS)
[uncategorized] ~15-~15: Use a comma before “or” if it connects two independent clauses (unless they are closely connected and short).
Context: ... IPFS daemon running locally can be used or a remote one can be accessed via gatewa...(COMMA_COMPOUND_SENTENCE_2)
[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation -/py_hamt: Source code of the py-hamt library ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~24-~24: Loose punctuation mark.
Context: ... KuboCAS located in store.py. -/tests: Test files that should be maintained an...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~111-~111: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...(COMMA_COMPOUND_SENTENCE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
AGENTS.md (9)
7-7: Capitalize “Python” in project description
Standardize language name by capitalizing “Python”.Apply this diff:
- This library is a python implementation of a HAMT, inspired by [rvagg's IAMap project written in JavaScript](https://github.com/rvagg/iamap). + This library is a Python implementation of a HAMT, inspired by [rvagg's IAMap project written in JavaScript](https://github.com/rvagg/iamap).
11-11: Standardize “Zarr” casing
Use proper casing for the Zarr project name.- dClimate primarily created this for storing large [zarrs](https://zarr.dev/) on IPFS. + dClimate primarily created this for storing large [Zarr](https://zarr.dev/) datasets on IPFS.
15-15: Add comma and standardize “RPC” in architecture description
Improve readability and consistency.- An IPFS daemon running locally can be used or a remote one can be accessed via gateway(reads) & rpc endpoints(writes) on their IPFS defaults. + An IPFS daemon running locally can be used, or a remote one can be accessed via gateway (reads) & RPC endpoints (writes) on their IPFS defaults.🧰 Tools
🪛 LanguageTool
[uncategorized] ~15-~15: Use a comma before “or” if it connects two independent clauses (unless they are closely connected and short).
Context: ... IPFS daemon running locally can be used or a remote one can be accessed via gatewa...(COMMA_COMPOUND_SENTENCE_2)
19-25: Fix list indentation and correct typos under “Project Structure”
Align nested list items with two spaces and fix spelling.- - `/py_hamt`: Source code of the py-hamt library - - `encryption_hamt_store.py` - Example using total encryption - - `hamt.py`- where the HAMT data structure is constructed and accesed - - `store.py`- Where various stores live that writes the data. Primarily used KuboCAS - - `zarr_hamt_store.py` - ZarrHAMTStore class used to directly write or read zarrs leveraging the HAMT data structure onto a Store usually a Content Addressed store like KuboCAS located in store.py. - - `/tests`: Test files that should be maintained and extended where possible. + - `/py_hamt`: Source code of the py-hamt library + - `encryption_hamt_store.py` - Example using total encryption + - `hamt.py` - where the HAMT data structure is constructed and accessed + - `store.py` - where various stores live that write the data (primarily KuboCAS) + - `zarr_hamt_store.py` - ZarrHAMTStore class for reading/writing Zarr leveraging the HAMT data structure on a Content Addressed store (e.g., KuboCAS) + - `/tests`: Test files that should be maintained and extended where possible.🧰 Tools
🪛 LanguageTool
[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation -/py_hamt: Source code of the py-hamt library ...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~24-~24: Loose punctuation mark.
Context: ... KuboCAS located in store.py. -/tests: Test files that should be maintained an...(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
20-20: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
23-23: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
36-38: Refine wording for readability
Consider hyphenation for clarity.- Use functional easy to understand functions wherever possible. + Use functional, easy-to-understand functions wherever possible.
57-59: Specify language for code fence
Add a language identifier to fenced code blocks for proper syntax highlighting.- ``` - pre-commit install - ``` + ```bash + pre-commit install + ```🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
57-57: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
63-63: Fix spacing in inline code span
Add a space before the inline code for readability.- run all tests with` pytest --ipfs` + run all tests with `pytest --ipfs`🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
63-63: Spaces inside code span elements
null(MD038, no-space-in-code)
106-108: Add language specifiers to fenced code blocks
Ensure proper syntax highlighting by specifying the language for these fences.- ``` - pytest --cov - ``` + ```bash + pytest --cov + ``` - ``` - uv run pre-commit run --all-files --show-diff-on-failure - ``` + ```bash + uv run pre-commit run --all-files --show-diff-on-failure + ``` - ``` - uv run ruff check --fix - ``` + ```bash + uv run ruff check --fix + ```Also applies to: 111-113, 121-123
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
106-106: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
119-119: Add comma for compound sentence clarity
Insert a comma before “and” to separate independent clauses.- If there is an error with formatting for ruff and it can be autofixed you can normally run + If there is an error with formatting for ruff, and it can be autofixed, you can normally run🧰 Tools
🪛 LanguageTool
[uncategorized] ~119-~119: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...(COMMA_COMPOUND_SENTENCE)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
AGENTS.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
AGENTS.md
[grammar] ~2-~2: The singular determiner ‘this’ may not agree with the plural noun ‘agents’. Did you mean “these”?
Context: # Project Agents.md Guide for py-hamt This AGENTS.md file provides comprehensive g...
(THIS_NNS)
[uncategorized] ~15-~15: Use a comma before “or” if it connects two independent clauses (unless they are closely connected and short).
Context: ... IPFS daemon running locally can be used or a remote one can be accessed via gatewa...
(COMMA_COMPOUND_SENTENCE_2)
[uncategorized] ~19-~19: Loose punctuation mark.
Context: ...or OpenAI Codex Navigation - /py_hamt: Source code of the py-hamt library ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~24-~24: Loose punctuation mark.
Context: ... KuboCAS located in store.py. - /tests: Test files that should be maintained an...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~119-~119: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...ere is an error with formatting for ruff and it can be autofixed you can normally ru...
(COMMA_COMPOUND_SENTENCE)
🪛 markdownlint-cli2 (0.17.2)
AGENTS.md
20-20: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
23-23: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
57-57: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
63-63: Spaces inside code span elements
null
(MD038, no-space-in-code)
106-106: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
112-112: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
121-121: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
🔇 Additional comments (2)
AGENTS.md (2)
2-3: Skip false positive: correct singular usage
The determiner “this” correctly refers to the singular file “AGENTS.md”, not its plural agents.🧰 Tools
🪛 LanguageTool
[grammar] ~2-~2: The singular determiner ‘this’ may not agree with the plural noun ‘agents’. Did you mean “these”?
Context: # Project Agents.md Guide for py-hamt This AGENTS.md file provides comprehensive g...(THIS_NNS)
26-34: Docs conventions section is clear
The coding conventions and guidelines are well structured and aligned with project requirements.
This includes many breaking changes, hence the major version update to 3.0.0. The high level overview is that this library was rewritten for better performance for large key-value mapping sets. Major changes to note include:
Addresses #12
Summary by CodeRabbit
ZarrHAMTStoreandSimpleEncryptedZarrHAMTStorefor async Zarr v3 storage backed by HAMT, including full encryption support.ContentAddressedStore,KuboCAS(IPFS-backed), andInMemoryCAS.aiohttp, updatedzarrandxarrayspecs, addedmypyandpandas-stubs..DS_Storeto.gitignore.IPFSZarr3, encryption transformers).