Skip to content

Conversation

@Faolain
Copy link
Contributor

@Faolain Faolain commented Jun 12, 2025

From Claude Code:

 Perfect! We've successfully migrated the code from aiohttp to httpx. Let's summarize the
  changes we made:

  1. Created a new implementation in store_httpx.py to replace the aiohttp-based store.py
  2. Updated imports in init.py to use classes from store_httpx.py
  3. Updated imports in other modules (hamt.py, zarr_hamt_store.py, encryption_hamt_store.py)
   to use ContentAddressedStore from the main import
  4. Updated all tests to use httpx APIs (like _loop_client() instead of _loop_session())
  5. Fixed method names (aclose() instead of close()) and properties (is_closed instead of
  closed)
  6. Updated the BasicAuth assertions in tests to use isinstance checks
  7. Added new tests to achieve 100% code coverage
  8. Removed the old store.py file which was no longer used
  9. Updated the fsgs.py script to use httpx instead of aiohttp

  These changes have completely replaced aiohttp with httpx, which should resolve the
  original issue with "Task was destroyed but it is pending" warnings while maintaining good
  performance.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced a script for asynchronously accessing and querying Zarr datasets stored on IPFS via a local Kubo node.
  • Refactor

    • Replaced the HTTP client library from aiohttp to httpx for improved performance and compatibility.
    • Updated internal and public interfaces to use httpx.AsyncClient.
    • Improved client cleanup and resource management.
  • Bug Fixes

    • Enhanced reliability of client resource cleanup during object destruction.
  • Tests

    • Updated and expanded test coverage to reflect the switch to httpx and to verify improved cleanup behavior.
    • Added performance benchmark test for reading from HAMT-backed Zarr datasets.
    • Added tests to ensure proper client closure on object deletion and robustness against cleanup errors.
    • Streamlined import paths in tests for consistency.
  • Chores

    • Updated dependencies and project version.
    • Streamlined import paths for easier maintenance.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jun 12, 2025

Walkthrough

This update migrates the codebase from using aiohttp to httpx for asynchronous HTTP client functionality, primarily affecting the KuboCAS class and its usage throughout the project and tests. It also updates import paths for ContentAddressedStore and InMemoryCAS, introduces a new script for Zarr dataset querying, and increments the package version.

Changes

File(s) Change Summary
py_hamt/store_httpx.py Migrated all HTTP client logic from aiohttp to httpx in KuboCAS; updated client management, save/load, cleanup, and docstrings. Added __del__ for cleanup.
py_hamt/__init__.py, py_hamt/hamt.py Changed imports to use .store_httpx for ContentAddressedStore, KuboCAS, and InMemoryCAS. Added print statement and updated __all__.
pyproject.toml Bumped version to 3.1.0; replaced aiohttp dependency with httpx.
fsgs.py Added new script for async Zarr dataset querying via IPFS using KuboCAS and ZarrHAMTStore.
tests/conftest.py Changed test fixture to use httpx.AsyncClient instead of aiohttp.ClientSession.
tests/test_kubocas_session.py Updated tests to use httpx.AsyncClient; renamed functions/variables from "session" to "client"; added test_del_closes_client.
tests/test_async.py Updated to use client-based API; replaced session checks with client checks; added tests for cleanup edge cases.
tests/test_kubo_cas.py, tests/test_kubocas_auth.py Switched tests to use httpx clients and updated related assertions and setup.
tests/test_hamt.py, tests/test_branch_anchors.py, tests/performance_tests.py, tests/test_zarr_ipfs.py Consolidated imports to use InMemoryCAS from py_hamt and streamlined import lines.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant fsgs.py
    participant KuboCAS
    participant httpx.AsyncClient
    participant ZarrHAMTStore
    participant xarray

    User->>fsgs.py: Run script
    fsgs.py->>KuboCAS: Create instance (async, uses httpx)
    KuboCAS->>httpx.AsyncClient: Connect to IPFS node
    fsgs.py->>ZarrHAMTStore: Wrap HAMT store
    fsgs.py->>xarray: Open Zarr dataset via store
    xarray-->>fsgs.py: Provide dataset variables
    fsgs.py->>fsgs.py: Query time coordinate, select precipitation
    fsgs.py-->>User: Print results
Loading

Possibly related PRs

  • dClimate/py-hamt#66: Related to session/client management and cleanup in lazily loaded xarray datasets backed by KuboCAS.
  • dClimate/py-hamt#7: Major rewrite of HAMT implementation and storage abstraction that this PR builds upon.
  • dClimate/py-hamt#44: Major async refactor of HAMT core data structures and logic, closely related to this PR's async and client migration.

Suggested reviewers

  • abidsikder
  • Faolain

Poem

🐇
With aiohttp now in the past,
httpx clients run so fast!
HAMTs and stores, imports anew,
Async Zarr queries shining through.
Tests are fresh, the code is clean,
Version bumps to 3.1.
Hopping forward, what a scene!

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate Unit Tests
  • Create PR with Unit Tests
  • Commit Unit Tests in branch fix/codex-async-2
  • Post Copyable Unit Tests in Comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
py_hamt/store_httpx.py (1)

214-227: Consider connection limits for runaway concurrency

httpx.AsyncClient defaults to unlimited simultaneous connections. Because
KuboCAS is often used inside data pipelines, a bad loop could open hundreds
of sockets despite the semaphore on requests.

client = httpx.AsyncClient(
    timeout=60.0,
    limits=httpx.Limits(max_connections=64, max_keepalive_connections=32),
    headers=self._default_headers,
    auth=self._default_auth,
)

Nice-to-have, but protects the daemon and keeps file-descriptor usage sane.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b0e528 and cb97a34.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • fsgs.py (1 hunks)
  • py_hamt/__init__.py (2 hunks)
  • py_hamt/encryption_hamt_store.py (1 hunks)
  • py_hamt/store.py (7 hunks)
  • py_hamt/store_httpx.py (1 hunks)
  • py_hamt/zarr_hamt_store.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_kubocas_session.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
py_hamt/zarr_hamt_store.py (1)
py_hamt/store_httpx.py (1)
  • ContentAddressedStore (11-34)
py_hamt/store.py (1)
py_hamt/store_httpx.py (1)
  • aclose (231-246)
py_hamt/store_httpx.py (3)
py_hamt/store.py (10)
  • ContentAddressedStore (11-34)
  • save (26-30)
  • save (47-50)
  • save (313-327)
  • load (33-34)
  • load (52-71)
  • load (329-336)
  • InMemoryCAS (37-71)
  • KuboCAS (74-336)
  • aclose (246-268)
py_hamt/encryption_hamt_store.py (1)
  • get (147-171)
py_hamt/zarr_hamt_store.py (1)
  • get (93-119)
🪛 Ruff (0.11.9)
py_hamt/zarr_hamt_store.py

9-9: py_hamt.store_httpx.ContentAddressedStore imported but unused

Remove unused import: py_hamt.store_httpx.ContentAddressedStore

(F401)

py_hamt/encryption_hamt_store.py

9-9: py_hamt.store_httpx.ContentAddressedStore imported but unused

Remove unused import: py_hamt.store_httpx.ContentAddressedStore

(F401)

py_hamt/store.py

3-3: typing.Dict imported but unused

Remove unused import

(F401)


3-3: typing.Optional imported but unused

Remove unused import

(F401)


5-5: httpx imported but unused

Remove unused import: httpx

(F401)


223-223: Undefined name aiohttp

(F821)

py_hamt/store_httpx.py

3-3: typing.Optional imported but unused

Remove unused import

(F401)


3-3: typing.Union imported but unused

Remove unused import

(F401)

🪛 GitHub Actions: Triggered on push from Faolain to branch/tag fix/codex-async-2
py_hamt/zarr_hamt_store.py

[error] 9-9: Ruff F401: py_hamt.store_httpx.ContentAddressedStore imported but unused.

py_hamt/encryption_hamt_store.py

[error] 9-9: Ruff F401: py_hamt.store_httpx.ContentAddressedStore imported but unused.

py_hamt/store.py

[error] 3-3: Ruff F401: typing.Dict imported but unused.


[error] 3-3: Ruff F401: typing.Optional imported but unused.


[error] 5-5: Ruff F401: httpx imported but unused.


[error] 137-137: Ruff F821: Undefined name aiohttp.


[error] 143-143: Ruff F821: Undefined name aiohttp.


[error] 195-195: Ruff F821: Undefined name aiohttp.


[error] 215-215: Ruff F821: Undefined name aiohttp.


[error] 223-223: Ruff F821: Undefined name aiohttp.


[error] 230-230: Ruff F821: Undefined name aiohttp.


[error] 231-231: Ruff F821: Undefined name aiohttp.


[error] 315-315: Ruff F821: Undefined name aiohttp.


[error] 137-315: Mypy errors: Name "aiohttp" is not defined at multiple lines (137, 143, 194, 215, 223, 230, 231, 315).

py_hamt/store_httpx.py

[error] 3-3: Ruff F401: typing.Optional imported but unused.


[error] 3-3: Ruff F401: typing.Union imported but unused.

🪛 Pylint (3.3.7)
py_hamt/store.py

[error] 223-223: Undefined variable 'aiohttp'

(E0602)

py_hamt/store_httpx.py

[refactor] 44-44: Consider using '{}' instead of a call to 'dict'.

(R1735)


[refactor] 74-74: Too many instance attributes (9/7)

(R0902)


[refactor] 134-134: Too many arguments (8/5)

(R0913)


[refactor] 134-134: Too many positional arguments (6/5)

(R0917)

🔇 Additional comments (7)
pyproject.toml (1)

3-3: Version bump and dependency migration look good.

The version increment from 3.0.1 to 3.1.0 appropriately reflects a minor feature update. The dependency change from aiohttp to httpx aligns with the PR objectives.

Also applies to: 8-8

py_hamt/store.py (2)

88-110: Documentation doesn't match implementation.

The documentation examples show httpx.AsyncClient usage, but the actual implementation still uses aiohttp.ClientSession. This will confuse users trying to follow the examples.

Either complete the migration to httpx or revert the documentation to match the current aiohttp implementation.

Likely an incorrect or invalid review comment.


278-309: Good addition of cleanup logic in del.

The destructor provides proper cleanup of internally created sessions, handling both running and non-running event loops appropriately. The suppression of exceptions during interpreter shutdown is a good practice.

py_hamt/zarr_hamt_store.py (1)

9-9: Import is used in docstring examples - static analysis false positive.

The ContentAddressedStore import is referenced in the docstring example on line 33. This import change correctly aligns with the migration to the store_httpx module.

🧰 Tools
🪛 Ruff (0.11.9)

9-9: py_hamt.store_httpx.ContentAddressedStore imported but unused

Remove unused import: py_hamt.store_httpx.ContentAddressedStore

(F401)

🪛 GitHub Actions: Triggered on push from Faolain to branch/tag fix/codex-async-2

[error] 9-9: Ruff F401: py_hamt.store_httpx.ContentAddressedStore imported but unused.

py_hamt/encryption_hamt_store.py (1)

9-9: Import is used in docstring examples - static analysis false positive.

The ContentAddressedStore import is referenced in the docstring example on line 47. This import change correctly aligns with the migration to the store_httpx module.

🧰 Tools
🪛 Ruff (0.11.9)

9-9: py_hamt.store_httpx.ContentAddressedStore imported but unused

Remove unused import: py_hamt.store_httpx.ContentAddressedStore

(F401)

🪛 GitHub Actions: Triggered on push from Faolain to branch/tag fix/codex-async-2

[error] 9-9: Ruff F401: py_hamt.store_httpx.ContentAddressedStore imported but unused.

py_hamt/__init__.py (1)

3-3: Import change aligns with the migration.

The import from .store_httpx is consistent with the module restructuring.

py_hamt/store_httpx.py (1)

256-274: Destructor may raise RuntimeError under a running loop

Calling asyncio.run() inside __del__ when another event loop is already
running triggers RuntimeError: asyncio.run() cannot be called from a running event loop on some shutdown paths (e.g. pytest).
Guard the call explicitly:

-            if loop is None or not loop.is_running():
-                asyncio.run(self.aclose())
+            if loop is None or not loop.is_running():
+                # use a fresh event loop to avoid RuntimeError
+                new_loop = asyncio.new_event_loop()
+                new_loop.run_until_complete(self.aclose())
+                new_loop.close()

(or simply contextlib.suppress(RuntimeError) around asyncio.run).

"SimpleEncryptedZarrHAMTStore",
]

print("Running py-hamt from source!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove the print statement from module initialization.

Print statements in module initialization will execute every time the module is imported, which is inappropriate for a library and will pollute user applications' stdout.

Apply this diff to remove the print statement:

-
-print("Running py-hamt from source!")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print("Running py-hamt from source!")
🤖 Prompt for AI Agents
In py_hamt/__init__.py at line 15, remove the print statement "print('Running
py-hamt from source!')" to prevent unwanted output during module import,
ensuring the library does not produce side effects like printing to stdout when
imported.

fsgs.py Outdated
Comment on lines 5 to 16
import aiohttp
import xarray as xr
from xarray import Dataset

from py_hamt import HAMT, KuboCAS, ZarrHAMTStore

print(f"Python version: {sys.version}")
print(f"Python path: {os.path.dirname(sys.executable)}")


print(f"aiohttp version: {aiohttp.__version__}")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Left-over aiohttp dependency contradicts the httpx migration

aiohttp is imported and its version printed, yet the codebase has migrated to httpx.
This:

import aiohttp
...
print(f"aiohttp version: {aiohttp.__version__}")

is now dead weight and re-introduces an unnecessary dependency.

-import aiohttp
...
-print(f"aiohttp version: {aiohttp.__version__}")
+import httpx
+print(f"httpx version: {httpx.__version__}")

Remove the aiohttp import from pyproject.toml/requirements.txt as well if it is no longer required elsewhere.

🤖 Prompt for AI Agents
In fsgs.py around lines 5 to 16, the aiohttp import and its version print
statement are no longer needed due to migration to httpx. Remove the line
importing aiohttp and the print statement referencing aiohttp.__version__.
Additionally, check the project's dependency files like pyproject.toml or
requirements.txt and remove aiohttp from there if it is not used elsewhere.

Comment on lines 281 to 288
# Create multipart form data
files = {"file": data}

# Send the POST request
client = self._loop_client()
response = await client.post(self.rpc_url, files=files)
response.raise_for_status()
cid_str: str = response.json()["Hash"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Provide filename & content-type when uploading to Kubo

IPFS RPC /api/v0/add expects a filename; omitting it works sometimes but is
undocumented and brittle. Pass the full 3-tuple so httpx builds the correct
multipart part.

-            files = {"file": data}
+            files = {
+                "file": ("blob", data, "application/octet-stream")
+            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Create multipart form data
files = {"file": data}
# Send the POST request
client = self._loop_client()
response = await client.post(self.rpc_url, files=files)
response.raise_for_status()
cid_str: str = response.json()["Hash"]
# Create multipart form data
- files = {"file": data}
+ files = {
+ "file": ("blob", data, "application/octet-stream")
+ }
# Send the POST request
client = self._loop_client()
response = await client.post(self.rpc_url, files=files)
response.raise_for_status()
cid_str: str = response.json()["Hash"]
🤖 Prompt for AI Agents
In py_hamt/store_httpx.py around lines 281 to 288, the multipart form data for
uploading to Kubo is missing the filename and content-type, which can cause
unreliable behavior. Update the 'files' dictionary to pass a 3-tuple containing
the filename, the data, and the content-type string so that httpx constructs the
multipart form correctly and meets the IPFS RPC expectations.

@codecov-commenter
Copy link

codecov-commenter commented Jun 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (0b0e528) to head (a06eb49).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #71   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines          603       619   +16     
=========================================
+ Hits           603       619   +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
py_hamt/store_httpx.py (1)

281-283: Multipart form should include filename & content-type [duplicate of prior feedback]

IPFS /api/v0/add expects a filename; passing raw bytes is brittle. Use the 3-tuple form:

-files = {"file": data}
+files = {"file": ("blob", data, "application/octet-stream")}
🧹 Nitpick comments (4)
tests/conftest.py (1)

10-13: Add a reasonable timeout to the shared httpx.AsyncClient

Without an explicit timeout the global client can hang indefinitely if a daemon stops responding, stalling the whole test session.

-    async with httpx.AsyncClient() as client:
+    async with httpx.AsyncClient(timeout=httpx.Timeout(10.0)) as client:
tests/test_kubo_cas.py (1)

131-137: Explicit per-test client is fine but optional

Creating a one-off httpx.AsyncClient() solely for coverage is acceptable, though re-using the session fixture would shave a bit of overhead.

tests/test_kubocas_auth.py (1)

49-52: Avoid relying on private AsyncClient._auth.

_auth is an internal attribute and not part of httpx’ public API; future versions may rename or remove it.
Use the documented client.auth property instead:

-assert isinstance(client._auth, httpx.BasicAuth)
+assert isinstance(client.auth, httpx.BasicAuth)
tests/test_async.py (1)

43-46: Tests couple to private internals.

Assertions on cas._client_per_loop bind the test suite to an internal implementation detail, making refactors risky.
Prefer public-API checks (e.g., cas._closed after aclose()) or expose a helper on the class for testability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb97a34 and d741ece.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • fsgs.py (1 hunks)
  • py_hamt/__init__.py (1 hunks)
  • py_hamt/hamt.py (1 hunks)
  • py_hamt/store_httpx.py (7 hunks)
  • tests/conftest.py (2 hunks)
  • tests/performance_tests.py (1 hunks)
  • tests/test_async.py (13 hunks)
  • tests/test_branch_anchors.py (1 hunks)
  • tests/test_hamt.py (1 hunks)
  • tests/test_kubo_cas.py (5 hunks)
  • tests/test_kubocas_auth.py (1 hunks)
  • tests/test_kubocas_session.py (3 hunks)
  • tests/test_zarr_ipfs.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • py_hamt/hamt.py
  • tests/performance_tests.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • py_hamt/init.py
  • fsgs.py
  • tests/test_kubocas_session.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
tests/test_hamt.py (1)
py_hamt/store_httpx.py (1)
  • InMemoryCAS (37-71)
tests/test_kubocas_auth.py (2)
py_hamt/store_httpx.py (3)
  • KuboCAS (74-304)
  • _loop_client (213-226)
  • aclose (231-246)
tests/test_kubocas_session.py (1)
  • _maybe_await (10-14)
tests/test_kubo_cas.py (2)
py_hamt/store_httpx.py (1)
  • KuboCAS (74-304)
tests/conftest.py (1)
  • global_client_session (9-13)
🪛 Pylint (3.3.7)
py_hamt/store_httpx.py

[refactor] 134-134: Too many arguments (8/5)

(R0913)


[refactor] 134-134: Too many positional arguments (6/5)

(R0917)

🔇 Additional comments (4)
tests/test_hamt.py (1)

12-12:

✅ Verification successful

Ensure InMemoryCAS is re-exported from the package root

The test now relies on py_hamt.__init__ re-exporting InMemoryCAS. If the symbol is not included in __all__, this import will fail outside the test environment.


🏁 Script executed:

#!/bin/bash
# Confirm that InMemoryCAS is re-exported at package root
rg -n $'^class InMemoryCAS' -g 'py_hamt/__init__.py'
rg -n $'InMemoryCAS' py_hamt/__init__.py

Length of output: 184


InMemoryCAS is correctly re-exported from the package root

The py_hamt/__init__.py file includes both the import and an entry for "InMemoryCAS" in its __all__, so from py_hamt import InMemoryCAS will work as expected.

  • Evidence:
    • Line 3: from .store_httpx import ContentAddressedStore, InMemoryCAS, KuboCAS
    • Line 10: "InMemoryCAS", in the __all__ list
tests/test_branch_anchors.py (1)

4-4:

✅ Verification successful

Export check for blake3_hashfn and friends

blake3_hashfn, HAMT, and InMemoryCAS must all be re-exported by py_hamt/__init__.py. Double-check that the public surface matches the new import to avoid ImportErrors.

Same script as above can be extended:


🏁 Script executed:

#!/bin/bash
rg -n $'blake3_hashfn' py_hamt/__init__.py

Length of output: 104


🏁 Script executed:

#!/bin/bash
rg -n 'HAMT' py_hamt/__init__.py
rg -n 'InMemoryCAS' py_hamt/__init__.py

Length of output: 393


Public API exports are correct

Verified that py_hamt/__init__.py both imports and includes HAMT, blake3_hashfn, and InMemoryCAS in its __all__. No missing exports—no changes needed.

tests/test_zarr_ipfs.py (1)

11-11: Verify that KuboCAS is part of the top-level public API

The test imports KuboCAS from the package root. Make sure __init__.py re-exports it; otherwise users following the tests as examples will hit an import failure.

tests/test_kubo_cas.py (1)

57-63: LGTM – correct use of shared client

Passing the session-scoped global_client_session as client= ensures KuboCAS won’t attempt to close it prematurely. No concerns here.

Comment on lines +196 to +201
if client is not None:
# user supplied → bind it to *their* current loop
self._session_per_loop[asyncio.get_running_loop()] = session
self._owns_session: bool = False
self._client_per_loop[asyncio.get_running_loop()] = client
self._owns_client: bool = False
else:
self._owns_session = True # well create sessions lazily
self._owns_client = True # we'll create clients lazily
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

get_running_loop() may raise when called outside an event loop.

If a user supplies an AsyncClient in synchronous code, asyncio.get_running_loop() raises RuntimeError, breaking __init__.

-if client is not None:
-    self._client_per_loop[asyncio.get_running_loop()] = client
+if client is not None:
+    try:
+        loop = asyncio.get_running_loop()
+    except RuntimeError:
+        # Defer binding until first use
+        loop = None
+    if loop is not None:
+        self._client_per_loop[loop] = client
+    self._user_client = client

Then have _loop_client() return self._user_client when present.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In py_hamt/store_httpx.py around lines 196 to 201, calling
asyncio.get_running_loop() when a user supplies an AsyncClient in synchronous
code can raise RuntimeError. To fix this, wrap the get_running_loop() call in a
try-except block to catch RuntimeError and handle it gracefully. Also, modify
the _loop_client() method to return self._user_client directly if it is present,
avoiding reliance on the event loop in that case.

@Faolain Faolain changed the title Fix/codex async 2 refactor: switch from aiohttp to httpx Jun 12, 2025
@Faolain Faolain mentioned this pull request Jun 12, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/performance_tests.py (3)

57-58: Remove needless f-strings

There are no interpolations, so the f prefix only triggers F541.

-rpc_base_url = f"https://ipfs-gateway.dclimate.net"
-gateway_base_url = f"https://ipfs-gateway.dclimate.net"
+rpc_base_url = "https://ipfs-gateway.dclimate.net"
+gateway_base_url = "https://ipfs-gateway.dclimate.net"
🧰 Tools
🪛 Ruff (0.11.9)

57-57: f-string without any placeholders

Remove extraneous f prefix

(F541)


58-58: f-string without any placeholders

Remove extraneous f prefix

(F541)


97-97: Plain string is enough

Same F541 issue as above.

-print(f"\n--- [HAMT] Read Stats ---")
+print("\n--- [HAMT] Read Stats ---")
🧰 Tools
🪛 Ruff (0.11.9)

97-97: f-string without any placeholders

Remove extraneous f prefix

(F541)


50-56: Mark long-running, external benchmark so CI can skip it

test_benchmark_hamt_store pulls a multi-MB dataset from a public gateway; this can easily time-out or rate-limit CI.
Consider guarding it behind a custom marker or env-flag:

@pytest.mark.asyncio(loop_scope="session")
@pytest.mark.performance  # or pytest.skipif(...)
async def test_benchmark_hamt_store():
    ...

and exclude -m "not performance" in the default test command.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d741ece and 2716c60.

📒 Files selected for processing (1)
  • tests/performance_tests.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
tests/performance_tests.py

13-13: numpy imported but unused

Remove unused import: numpy

(F401)


14-14: pandas imported but unused

Remove unused import: pandas

(F401)


16-16: dag_cbor.ipld.IPLDKind imported but unused

Remove unused import: dag_cbor.ipld.IPLDKind

(F401)


17-17: multiformats.CID imported but unused

Remove unused import: multiformats.CID

(F401)


57-57: f-string without any placeholders

Remove extraneous f prefix

(F541)


58-58: f-string without any placeholders

Remove extraneous f prefix

(F541)


74-74: Local variable start is assigned to but never used

Remove assignment to unused variable start

(F841)


97-97: f-string without any placeholders

Remove extraneous f prefix

(F541)

🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag fix/codex-async-2
tests/performance_tests.py

[error] 1-1: Pre-commit hook 'end-of-file-fixer' failed and modified the file to fix end of file issues.


[error] 7-98: Ruff linting errors: Import block is un-sorted or un-formatted (I001); unused imports: numpy (F401), pandas (F401), dag_cbor.ipld.IPLDKind (F401), multiformats.CID (F401); f-string without placeholders (F541); local variable 'start' assigned but never used (F841). 8 errors fixable with '--fix' option.


[error] 1-1: Ruff format hook reformatted the file to fix style issues.

Comment on lines 71 to 78
hamt = await HAMT.build(
cas=kubo_cas, root_node_id=root_cid, values_are_bytes=True, read_only=True
)
start = time.perf_counter()
ipfs_ds: xr.Dataset
zhs = ZarrHAMTStore(hamt, read_only=True)
ipfs_ds = xr.open_zarr(store=zhs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Delete dead timing block & unused start variable

The first open/load sequence is repeated a few lines later with proper timing (start_read).
Keeping the earlier one only adds network overhead and triggers F841.

-        start = time.perf_counter()
-        ipfs_ds: xr.Dataset
-        zhs = ZarrHAMTStore(hamt, read_only=True)
-        ipfs_ds = xr.open_zarr(store=zhs)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
hamt = await HAMT.build(
cas=kubo_cas, root_node_id=root_cid, values_are_bytes=True, read_only=True
)
start = time.perf_counter()
ipfs_ds: xr.Dataset
zhs = ZarrHAMTStore(hamt, read_only=True)
ipfs_ds = xr.open_zarr(store=zhs)
hamt = await HAMT.build(
cas=kubo_cas, root_node_id=root_cid, values_are_bytes=True, read_only=True
)
🧰 Tools
🪛 Ruff (0.11.9)

74-74: Local variable start is assigned to but never used

Remove assignment to unused variable start

(F841)

🤖 Prompt for AI Agents
In tests/performance_tests.py around lines 71 to 78, remove the initial timing
block that assigns the variable 'start' and the associated open/load sequence of
the dataset, as it is redundant and unused. This will eliminate the dead code
and the unused variable warning (F841), ensuring only the properly timed
open/load sequence later in the code remains.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
py_hamt/store_httpx.py (2)

284-290: Multipart upload still omits filename / content-type

Past review pointed out that /api/v0/add expects a 3-tuple (filename, bytes, mime).
files = {"file": data} is still present, so the request may break on some Kubo versions.

-files = {"file": data}
+files = {"file": ("blob", data, "application/octet-stream")}

196-201: ⚠️ Potential issue

Guard asyncio.get_running_loop() to support sync-context construction

If a caller passes an already-initialised httpx.AsyncClient from regular (non-async) code, asyncio.get_running_loop() raises RuntimeError, breaking __init__.
Wrap the call in a try/except and defer the binding until the first async use.

-if client is not None:
-    # user supplied → bind it to *their* current loop
-    self._client_per_loop[asyncio.get_running_loop()] = client
-    self._owns_client: bool = False
+if client is not None:
+    # User-supplied client; try to associate with current loop,
+    # but don’t fail when called outside an event loop.
+    try:
+        loop = asyncio.get_running_loop()
+        self._client_per_loop[loop] = client
+    except RuntimeError:
+        # Defer binding; _loop_client() will fall back to this attr.
+        pass
+    self._user_client = client
+    self._owns_client = False

Make _loop_client() return self._user_client when present so the client is reused irrespective of the loop.

🧹 Nitpick comments (1)
py_hamt/store_httpx.py (1)

235-249: Leak-prone early return in aclose()

When self._owns_client is False we skip both closing and clearing _client_per_loop.
The user may still close their client, yet the lingering strong reference in the dict prevents GC and keeps the mapping alive across event loops.

Consider always purging _client_per_loop to avoid retaining foreign resources:

-        if not self._owns_client:
-            # User supplied the client; they are responsible for closing it.
-            return
+        if not self._owns_client:
+            # User supplied the client; just drop our references.
+            self._client_per_loop.clear()
+            return
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2716c60 and 1703652.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • py_hamt/store_httpx.py (7 hunks)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
🧰 Additional context used
🪛 Pylint (3.3.7)
py_hamt/store_httpx.py

[refactor] 134-134: Too many arguments (8/5)

(R0913)


[refactor] 134-134: Too many positional arguments (6/5)

(R0917)

🔇 Additional comments (1)
py_hamt/store_httpx.py (1)

258-276: Destructor may schedule I/O on a closed loop – verify behaviour

__del__ tries to loop.create_task(self.aclose()) when the interpreter is shutting down.
If the loop is already in the process of closing, this can raise RuntimeError: Event loop is closed.

Double-check with real teardown scenarios; a conservative pattern is:

if loop and loop.is_running() and not loop.is_closed():
    loop.call_soon_threadsafe(asyncio.create_task, self.aclose())

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
py_hamt/store_httpx.py (2)

196-201: ⚠️ Potential issue

asyncio.get_running_loop() still unguarded – can crash in sync context
The call is unchanged from the previous review and will raise RuntimeError if a user passes a client while no event-loop is running (e.g. library initialisation in synchronous code).

-if client is not None:
-    # user supplied → bind it to *their* current loop
-    self._client_per_loop[asyncio.get_running_loop()] = client
+if client is not None:
+    # user supplied → bind it to *their* current loop (if any)
+    try:
+        loop = asyncio.get_running_loop()
+    except RuntimeError:
+        # defer binding until first use
+        loop = None
+    if loop is not None:
+        self._client_per_loop[loop] = client
+    self._user_client = client            # remember for _loop_client()

303-305: ⚠️ Potential issue

Multipart upload still omits filename / content-type
The earlier remark has not been applied; Kubo’s /api/v0/add expects a 3-tuple (filename, data, mime). Current code relies on undocumented behaviour and may mis-detect binary blobs.

-            files = {"file": data}
+            files = {"file": ("blob", data, "application/octet-stream")}
🧹 Nitpick comments (1)
py_hamt/store_httpx.py (1)

221-223: Use logging instead of print() for library diagnostics
print() writes to stdout and cannot be suppressed by callers; a standard logging.getLogger(__name__) call honours user log-level configuration.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1703652 and ef588fe.

📒 Files selected for processing (2)
  • py_hamt/store_httpx.py (7 hunks)
  • tests/performance_tests.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/performance_tests.py
🧰 Additional context used
🪛 Pylint (3.3.7)
py_hamt/store_httpx.py

[refactor] 134-134: Too many arguments (8/5)

(R0913)


[refactor] 134-134: Too many positional arguments (6/5)

(R0917)

🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag fix/codex-async-2
py_hamt/store_httpx.py

[error] 220-227: Coverage failure: 4 statements missing coverage between lines 220-227. Total coverage 99% is less than required 100%.

Comment on lines 221 to 227
f"\n--- [KuboCAS] Request limit of {self._request_limit_per_client} reached. "
"Retiring HTTP client and creating new one for subsequent requests. ---\n"
)
loop = asyncio.get_running_loop()
self._client_per_loop.pop(loop, None) # graceful retirement.
# Reset counter for the new client that will be created.
self._request_counter = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Retired clients are popped but never closed → file-descriptor leak
_cycle_client_if_needed() removes the current loop’s client from _client_per_loop yet does not aclose() it. After ~900 requests those sockets remain open until process shutdown, defeating the whole point of cycling.

-            loop = asyncio.get_running_loop()
-            self._client_per_loop.pop(loop, None)  # graceful retirement.
+            loop = asyncio.get_running_loop()
+            old_client = self._client_per_loop.pop(loop, None)
+            if old_client is not None and not old_client.is_closed:
+                await old_client.aclose()           # proactive cleanup
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
f"\n--- [KuboCAS] Request limit of {self._request_limit_per_client} reached. "
"Retiring HTTP client and creating new one for subsequent requests. ---\n"
)
loop = asyncio.get_running_loop()
self._client_per_loop.pop(loop, None) # graceful retirement.
# Reset counter for the new client that will be created.
self._request_counter = 0
f"\n--- [KuboCAS] Request limit of {self._request_limit_per_client} reached. "
"Retiring HTTP client and creating new one for subsequent requests. ---\n"
)
loop = asyncio.get_running_loop()
old_client = self._client_per_loop.pop(loop, None)
if old_client is not None and not old_client.is_closed:
await old_client.aclose() # proactive cleanup
# Reset counter for the new client that will be created.
self._request_counter = 0
🧰 Tools
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag fix/codex-async-2

[error] 220-227: Coverage failure: 4 statements missing coverage between lines 220-227. Total coverage 99% is less than required 100%.

🤖 Prompt for AI Agents
In py_hamt/store_httpx.py around lines 221 to 227, the code removes the retired
HTTP client from the _client_per_loop dictionary but does not properly close it,
causing a file descriptor leak. To fix this, retrieve the client instance before
popping it from _client_per_loop and call its aclose() method asynchronously to
close the client and release resources before resetting the request counter.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
py_hamt/store_httpx.py (2)

221-228: ⚠️ Potential issue

Retired HTTP clients are dropped but never closed → FD leak
_cycle_client_if_needed() removes the exhausted client from _client_per_loop yet never aclose()s it. Sockets survive until process shutdown, leaking descriptors after ~900 requests.

-            loop = asyncio.get_running_loop()
-            self._client_per_loop.pop(loop, None)  # graceful retirement.
+            loop = asyncio.get_running_loop()
+            old_client = self._client_per_loop.pop(loop, None)
+            if old_client is not None and not old_client.is_closed:
+                await old_client.aclose()   # proactive cleanup

Close before forgetting to keep resource usage bounded.


197-203: 🛠️ Refactor suggestion

asyncio.get_running_loop() may explode outside an event-loop – make construction robust
__init__ calls asyncio.get_running_loop() unguarded. When KuboCAS is instantiated from sync code this raises RuntimeError, preventing object creation even if the user passes a perfectly valid AsyncClient.

-            self._client_per_loop[asyncio.get_running_loop()] = client
-            self._owns_client: bool = False
+            try:
+                loop = asyncio.get_running_loop()
+            except RuntimeError:
+                # No loop yet – delay binding until the first async use
+                loop = None
+
+            if loop is not None:
+                self._client_per_loop[loop] = client
+
+            self._user_client = client     # remember for later
+            self._owns_client = False

Correspondingly, have _loop_client() short-circuit when a user-supplied client exists:

-        try:
+        if not self._owns_client and getattr(self, "_user_client", None):
+            return self._user_client
+
+        try:
             return self._client_per_loop[loop]

Prevents hard-to-trace init failures.

🧹 Nitpick comments (2)
py_hamt/store_httpx.py (2)

304-306: Send filename & content-type when uploading – avoids flaky Kubo behaviour
IPFS /api/v0/add unofficially tolerates raw bytes, but the documented contract is a 3-tuple (filename, data, mimetype). Passing only bytes can intermittently 400‐out on some gateway builds.

-            files = {"file": data}
+            files = {"file": ("blob", data, "application/octet-stream")}

One-line change, zero behavioural risk.


254-269: aclose() skips user-supplied client – warn callers
When _owns_client is False, aclose() returns early, leaving the caller’s AsyncClient open. That’s fine by contract, but a docstring note (“user must close their client”) would prevent misuse.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef588fe and 711347b.

📒 Files selected for processing (2)
  • py_hamt/store_httpx.py (8 hunks)
  • tests/test_zarr_ipfs.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_zarr_ipfs.py
🧰 Additional context used
🪛 Pylint (3.3.7)
py_hamt/store_httpx.py

[refactor] 134-134: Too many arguments (9/5)

(R0913)


[refactor] 134-134: Too many positional arguments (6/5)

(R0917)

🔇 Additional comments (1)
py_hamt/store_httpx.py (1)

278-296: Destructor runs asyncio.run() during shutdown – may raise
Calling asyncio.run() from __del__ while another loop is closing frequently triggers RuntimeError: Event loop is closed. Consider guarding with if not asyncio.get_event_loop_policy().get_event_loop().is_closed() or simply skip cleanup at interpreter teardown (Python already closes sockets).

@Faolain Faolain merged commit 89b8510 into main Jun 16, 2025
2 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Aug 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants