-
Notifications
You must be signed in to change notification settings - Fork 0
refactor: switch from aiohttp to httpx #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update migrates the codebase from using Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant fsgs.py
participant KuboCAS
participant httpx.AsyncClient
participant ZarrHAMTStore
participant xarray
User->>fsgs.py: Run script
fsgs.py->>KuboCAS: Create instance (async, uses httpx)
KuboCAS->>httpx.AsyncClient: Connect to IPFS node
fsgs.py->>ZarrHAMTStore: Wrap HAMT store
fsgs.py->>xarray: Open Zarr dataset via store
xarray-->>fsgs.py: Provide dataset variables
fsgs.py->>fsgs.py: Query time coordinate, select precipitation
fsgs.py-->>User: Print results
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
🧪 Generate Unit Tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (1)
py_hamt/store_httpx.py (1)
214-227: Consider connection limits for runaway concurrency
httpx.AsyncClientdefaults to unlimited simultaneous connections. Because
KuboCASis often used inside data pipelines, a bad loop could open hundreds
of sockets despite the semaphore on requests.client = httpx.AsyncClient( timeout=60.0, limits=httpx.Limits(max_connections=64, max_keepalive_connections=32), headers=self._default_headers, auth=self._default_auth, )Nice-to-have, but protects the daemon and keeps file-descriptor usage sane.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
fsgs.py(1 hunks)py_hamt/__init__.py(2 hunks)py_hamt/encryption_hamt_store.py(1 hunks)py_hamt/store.py(7 hunks)py_hamt/store_httpx.py(1 hunks)py_hamt/zarr_hamt_store.py(1 hunks)pyproject.toml(1 hunks)tests/test_kubocas_session.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
py_hamt/zarr_hamt_store.py (1)
py_hamt/store_httpx.py (1)
ContentAddressedStore(11-34)
py_hamt/store.py (1)
py_hamt/store_httpx.py (1)
aclose(231-246)
py_hamt/store_httpx.py (3)
py_hamt/store.py (10)
ContentAddressedStore(11-34)save(26-30)save(47-50)save(313-327)load(33-34)load(52-71)load(329-336)InMemoryCAS(37-71)KuboCAS(74-336)aclose(246-268)py_hamt/encryption_hamt_store.py (1)
get(147-171)py_hamt/zarr_hamt_store.py (1)
get(93-119)
🪛 Ruff (0.11.9)
py_hamt/zarr_hamt_store.py
9-9: py_hamt.store_httpx.ContentAddressedStore imported but unused
Remove unused import: py_hamt.store_httpx.ContentAddressedStore
(F401)
py_hamt/encryption_hamt_store.py
9-9: py_hamt.store_httpx.ContentAddressedStore imported but unused
Remove unused import: py_hamt.store_httpx.ContentAddressedStore
(F401)
py_hamt/store.py
3-3: typing.Dict imported but unused
Remove unused import
(F401)
3-3: typing.Optional imported but unused
Remove unused import
(F401)
5-5: httpx imported but unused
Remove unused import: httpx
(F401)
223-223: Undefined name aiohttp
(F821)
py_hamt/store_httpx.py
3-3: typing.Optional imported but unused
Remove unused import
(F401)
3-3: typing.Union imported but unused
Remove unused import
(F401)
🪛 GitHub Actions: Triggered on push from Faolain to branch/tag fix/codex-async-2
py_hamt/zarr_hamt_store.py
[error] 9-9: Ruff F401: py_hamt.store_httpx.ContentAddressedStore imported but unused.
py_hamt/encryption_hamt_store.py
[error] 9-9: Ruff F401: py_hamt.store_httpx.ContentAddressedStore imported but unused.
py_hamt/store.py
[error] 3-3: Ruff F401: typing.Dict imported but unused.
[error] 3-3: Ruff F401: typing.Optional imported but unused.
[error] 5-5: Ruff F401: httpx imported but unused.
[error] 137-137: Ruff F821: Undefined name aiohttp.
[error] 143-143: Ruff F821: Undefined name aiohttp.
[error] 195-195: Ruff F821: Undefined name aiohttp.
[error] 215-215: Ruff F821: Undefined name aiohttp.
[error] 223-223: Ruff F821: Undefined name aiohttp.
[error] 230-230: Ruff F821: Undefined name aiohttp.
[error] 231-231: Ruff F821: Undefined name aiohttp.
[error] 315-315: Ruff F821: Undefined name aiohttp.
[error] 137-315: Mypy errors: Name "aiohttp" is not defined at multiple lines (137, 143, 194, 215, 223, 230, 231, 315).
py_hamt/store_httpx.py
[error] 3-3: Ruff F401: typing.Optional imported but unused.
[error] 3-3: Ruff F401: typing.Union imported but unused.
🪛 Pylint (3.3.7)
py_hamt/store.py
[error] 223-223: Undefined variable 'aiohttp'
(E0602)
py_hamt/store_httpx.py
[refactor] 44-44: Consider using '{}' instead of a call to 'dict'.
(R1735)
[refactor] 74-74: Too many instance attributes (9/7)
(R0902)
[refactor] 134-134: Too many arguments (8/5)
(R0913)
[refactor] 134-134: Too many positional arguments (6/5)
(R0917)
🔇 Additional comments (7)
pyproject.toml (1)
3-3: Version bump and dependency migration look good.The version increment from 3.0.1 to 3.1.0 appropriately reflects a minor feature update. The dependency change from aiohttp to httpx aligns with the PR objectives.
Also applies to: 8-8
py_hamt/store.py (2)
88-110: Documentation doesn't match implementation.The documentation examples show
httpx.AsyncClientusage, but the actual implementation still usesaiohttp.ClientSession. This will confuse users trying to follow the examples.Either complete the migration to httpx or revert the documentation to match the current aiohttp implementation.
Likely an incorrect or invalid review comment.
278-309: Good addition of cleanup logic in del.The destructor provides proper cleanup of internally created sessions, handling both running and non-running event loops appropriately. The suppression of exceptions during interpreter shutdown is a good practice.
py_hamt/zarr_hamt_store.py (1)
9-9: Import is used in docstring examples - static analysis false positive.The
ContentAddressedStoreimport is referenced in the docstring example on line 33. This import change correctly aligns with the migration to thestore_httpxmodule.🧰 Tools
🪛 Ruff (0.11.9)
9-9:
py_hamt.store_httpx.ContentAddressedStoreimported but unusedRemove unused import:
py_hamt.store_httpx.ContentAddressedStore(F401)
🪛 GitHub Actions: Triggered on push from Faolain to branch/tag fix/codex-async-2
[error] 9-9: Ruff F401:
py_hamt.store_httpx.ContentAddressedStoreimported but unused.py_hamt/encryption_hamt_store.py (1)
9-9: Import is used in docstring examples - static analysis false positive.The
ContentAddressedStoreimport is referenced in the docstring example on line 47. This import change correctly aligns with the migration to thestore_httpxmodule.🧰 Tools
🪛 Ruff (0.11.9)
9-9:
py_hamt.store_httpx.ContentAddressedStoreimported but unusedRemove unused import:
py_hamt.store_httpx.ContentAddressedStore(F401)
🪛 GitHub Actions: Triggered on push from Faolain to branch/tag fix/codex-async-2
[error] 9-9: Ruff F401:
py_hamt.store_httpx.ContentAddressedStoreimported but unused.py_hamt/__init__.py (1)
3-3: Import change aligns with the migration.The import from
.store_httpxis consistent with the module restructuring.py_hamt/store_httpx.py (1)
256-274: Destructor may raiseRuntimeErrorunder a running loopCalling
asyncio.run()inside__del__when another event loop is already
running triggersRuntimeError: asyncio.run() cannot be called from a running event loopon some shutdown paths (e.g. pytest).
Guard the call explicitly:- if loop is None or not loop.is_running(): - asyncio.run(self.aclose()) + if loop is None or not loop.is_running(): + # use a fresh event loop to avoid RuntimeError + new_loop = asyncio.new_event_loop() + new_loop.run_until_complete(self.aclose()) + new_loop.close()(or simply
contextlib.suppress(RuntimeError)aroundasyncio.run).
| "SimpleEncryptedZarrHAMTStore", | ||
| ] | ||
|
|
||
| print("Running py-hamt from source!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the print statement from module initialization.
Print statements in module initialization will execute every time the module is imported, which is inappropriate for a library and will pollute user applications' stdout.
Apply this diff to remove the print statement:
-
-print("Running py-hamt from source!")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print("Running py-hamt from source!") |
🤖 Prompt for AI Agents
In py_hamt/__init__.py at line 15, remove the print statement "print('Running
py-hamt from source!')" to prevent unwanted output during module import,
ensuring the library does not produce side effects like printing to stdout when
imported.
fsgs.py
Outdated
| import aiohttp | ||
| import xarray as xr | ||
| from xarray import Dataset | ||
|
|
||
| from py_hamt import HAMT, KuboCAS, ZarrHAMTStore | ||
|
|
||
| print(f"Python version: {sys.version}") | ||
| print(f"Python path: {os.path.dirname(sys.executable)}") | ||
|
|
||
|
|
||
| print(f"aiohttp version: {aiohttp.__version__}") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Left-over aiohttp dependency contradicts the httpx migration
aiohttp is imported and its version printed, yet the codebase has migrated to httpx.
This:
import aiohttp
...
print(f"aiohttp version: {aiohttp.__version__}")is now dead weight and re-introduces an unnecessary dependency.
-import aiohttp
...
-print(f"aiohttp version: {aiohttp.__version__}")
+import httpx
+print(f"httpx version: {httpx.__version__}")Remove the aiohttp import from pyproject.toml/requirements.txt as well if it is no longer required elsewhere.
🤖 Prompt for AI Agents
In fsgs.py around lines 5 to 16, the aiohttp import and its version print
statement are no longer needed due to migration to httpx. Remove the line
importing aiohttp and the print statement referencing aiohttp.__version__.
Additionally, check the project's dependency files like pyproject.toml or
requirements.txt and remove aiohttp from there if it is not used elsewhere.
| # Create multipart form data | ||
| files = {"file": data} | ||
|
|
||
| # Send the POST request | ||
| client = self._loop_client() | ||
| response = await client.post(self.rpc_url, files=files) | ||
| response.raise_for_status() | ||
| cid_str: str = response.json()["Hash"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Provide filename & content-type when uploading to Kubo
IPFS RPC /api/v0/add expects a filename; omitting it works sometimes but is
undocumented and brittle. Pass the full 3-tuple so httpx builds the correct
multipart part.
- files = {"file": data}
+ files = {
+ "file": ("blob", data, "application/octet-stream")
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Create multipart form data | |
| files = {"file": data} | |
| # Send the POST request | |
| client = self._loop_client() | |
| response = await client.post(self.rpc_url, files=files) | |
| response.raise_for_status() | |
| cid_str: str = response.json()["Hash"] | |
| # Create multipart form data | |
| - files = {"file": data} | |
| + files = { | |
| + "file": ("blob", data, "application/octet-stream") | |
| + } | |
| # Send the POST request | |
| client = self._loop_client() | |
| response = await client.post(self.rpc_url, files=files) | |
| response.raise_for_status() | |
| cid_str: str = response.json()["Hash"] |
🤖 Prompt for AI Agents
In py_hamt/store_httpx.py around lines 281 to 288, the multipart form data for
uploading to Kubo is missing the filename and content-type, which can cause
unreliable behavior. Update the 'files' dictionary to pass a 3-tuple containing
the filename, the data, and the content-type string so that httpx constructs the
multipart form correctly and meets the IPFS RPC expectations.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #71 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 5 5
Lines 603 619 +16
=========================================
+ Hits 603 619 +16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
py_hamt/store_httpx.py (1)
281-283: Multipart form should include filename & content-type [duplicate of prior feedback]IPFS
/api/v0/addexpects a filename; passing raw bytes is brittle. Use the 3-tuple form:-files = {"file": data} +files = {"file": ("blob", data, "application/octet-stream")}
🧹 Nitpick comments (4)
tests/conftest.py (1)
10-13: Add a reasonable timeout to the sharedhttpx.AsyncClientWithout an explicit timeout the global client can hang indefinitely if a daemon stops responding, stalling the whole test session.
- async with httpx.AsyncClient() as client: + async with httpx.AsyncClient(timeout=httpx.Timeout(10.0)) as client:tests/test_kubo_cas.py (1)
131-137: Explicit per-test client is fine but optionalCreating a one-off
httpx.AsyncClient()solely for coverage is acceptable, though re-using the session fixture would shave a bit of overhead.tests/test_kubocas_auth.py (1)
49-52: Avoid relying on privateAsyncClient._auth.
_authis an internal attribute and not part of httpx’ public API; future versions may rename or remove it.
Use the documentedclient.authproperty instead:-assert isinstance(client._auth, httpx.BasicAuth) +assert isinstance(client.auth, httpx.BasicAuth)tests/test_async.py (1)
43-46: Tests couple to private internals.Assertions on
cas._client_per_loopbind the test suite to an internal implementation detail, making refactors risky.
Prefer public-API checks (e.g.,cas._closedafteraclose()) or expose a helper on the class for testability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
fsgs.py(1 hunks)py_hamt/__init__.py(1 hunks)py_hamt/hamt.py(1 hunks)py_hamt/store_httpx.py(7 hunks)tests/conftest.py(2 hunks)tests/performance_tests.py(1 hunks)tests/test_async.py(13 hunks)tests/test_branch_anchors.py(1 hunks)tests/test_hamt.py(1 hunks)tests/test_kubo_cas.py(5 hunks)tests/test_kubocas_auth.py(1 hunks)tests/test_kubocas_session.py(3 hunks)tests/test_zarr_ipfs.py(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- py_hamt/hamt.py
- tests/performance_tests.py
🚧 Files skipped from review as they are similar to previous changes (3)
- py_hamt/init.py
- fsgs.py
- tests/test_kubocas_session.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
tests/test_hamt.py (1)
py_hamt/store_httpx.py (1)
InMemoryCAS(37-71)
tests/test_kubocas_auth.py (2)
py_hamt/store_httpx.py (3)
KuboCAS(74-304)_loop_client(213-226)aclose(231-246)tests/test_kubocas_session.py (1)
_maybe_await(10-14)
tests/test_kubo_cas.py (2)
py_hamt/store_httpx.py (1)
KuboCAS(74-304)tests/conftest.py (1)
global_client_session(9-13)
🪛 Pylint (3.3.7)
py_hamt/store_httpx.py
[refactor] 134-134: Too many arguments (8/5)
(R0913)
[refactor] 134-134: Too many positional arguments (6/5)
(R0917)
🔇 Additional comments (4)
tests/test_hamt.py (1)
12-12:✅ Verification successful
Ensure
InMemoryCASis re-exported from the package rootThe test now relies on
py_hamt.__init__re-exportingInMemoryCAS. If the symbol is not included in__all__, this import will fail outside the test environment.
🏁 Script executed:
#!/bin/bash # Confirm that InMemoryCAS is re-exported at package root rg -n $'^class InMemoryCAS' -g 'py_hamt/__init__.py' rg -n $'InMemoryCAS' py_hamt/__init__.pyLength of output: 184
InMemoryCAS is correctly re-exported from the package root
The
py_hamt/__init__.pyfile includes both the import and an entry for"InMemoryCAS"in its__all__, sofrom py_hamt import InMemoryCASwill work as expected.
- Evidence:
- Line 3:
from .store_httpx import ContentAddressedStore, InMemoryCAS, KuboCAS- Line 10:
"InMemoryCAS",in the__all__listtests/test_branch_anchors.py (1)
4-4:✅ Verification successful
Export check for
blake3_hashfnand friends
blake3_hashfn,HAMT, andInMemoryCASmust all be re-exported bypy_hamt/__init__.py. Double-check that the public surface matches the new import to avoidImportErrors.Same script as above can be extended:
🏁 Script executed:
#!/bin/bash rg -n $'blake3_hashfn' py_hamt/__init__.pyLength of output: 104
🏁 Script executed:
#!/bin/bash rg -n 'HAMT' py_hamt/__init__.py rg -n 'InMemoryCAS' py_hamt/__init__.pyLength of output: 393
Public API exports are correct
Verified that
py_hamt/__init__.pyboth imports and includesHAMT,blake3_hashfn, andInMemoryCASin its__all__. No missing exports—no changes needed.tests/test_zarr_ipfs.py (1)
11-11: Verify thatKuboCASis part of the top-level public APIThe test imports
KuboCASfrom the package root. Make sure__init__.pyre-exports it; otherwise users following the tests as examples will hit an import failure.tests/test_kubo_cas.py (1)
57-63: LGTM – correct use of shared clientPassing the session-scoped
global_client_sessionasclient=ensuresKuboCASwon’t attempt to close it prematurely. No concerns here.
| if client is not None: | ||
| # user supplied → bind it to *their* current loop | ||
| self._session_per_loop[asyncio.get_running_loop()] = session | ||
| self._owns_session: bool = False | ||
| self._client_per_loop[asyncio.get_running_loop()] = client | ||
| self._owns_client: bool = False | ||
| else: | ||
| self._owns_session = True # we’ll create sessions lazily | ||
| self._owns_client = True # we'll create clients lazily |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_running_loop() may raise when called outside an event loop.
If a user supplies an AsyncClient in synchronous code, asyncio.get_running_loop() raises RuntimeError, breaking __init__.
-if client is not None:
- self._client_per_loop[asyncio.get_running_loop()] = client
+if client is not None:
+ try:
+ loop = asyncio.get_running_loop()
+ except RuntimeError:
+ # Defer binding until first use
+ loop = None
+ if loop is not None:
+ self._client_per_loop[loop] = client
+ self._user_client = clientThen have _loop_client() return self._user_client when present.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In py_hamt/store_httpx.py around lines 196 to 201, calling
asyncio.get_running_loop() when a user supplies an AsyncClient in synchronous
code can raise RuntimeError. To fix this, wrap the get_running_loop() call in a
try-except block to catch RuntimeError and handle it gracefully. Also, modify
the _loop_client() method to return self._user_client directly if it is present,
avoiding reliance on the event loop in that case.
…e connections as aiohttp
… into fix/codex-async-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
tests/performance_tests.py (3)
57-58: Remove needless f-stringsThere are no interpolations, so the
fprefix only triggersF541.-rpc_base_url = f"https://ipfs-gateway.dclimate.net" -gateway_base_url = f"https://ipfs-gateway.dclimate.net" +rpc_base_url = "https://ipfs-gateway.dclimate.net" +gateway_base_url = "https://ipfs-gateway.dclimate.net"🧰 Tools
🪛 Ruff (0.11.9)
57-57: f-string without any placeholders
Remove extraneous
fprefix(F541)
58-58: f-string without any placeholders
Remove extraneous
fprefix(F541)
97-97: Plain string is enoughSame
F541issue as above.-print(f"\n--- [HAMT] Read Stats ---") +print("\n--- [HAMT] Read Stats ---")🧰 Tools
🪛 Ruff (0.11.9)
97-97: f-string without any placeholders
Remove extraneous
fprefix(F541)
50-56: Mark long-running, external benchmark so CI can skip it
test_benchmark_hamt_storepulls a multi-MB dataset from a public gateway; this can easily time-out or rate-limit CI.
Consider guarding it behind a custom marker or env-flag:@pytest.mark.asyncio(loop_scope="session") @pytest.mark.performance # or pytest.skipif(...) async def test_benchmark_hamt_store(): ...and exclude
-m "not performance"in the default test command.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/performance_tests.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
tests/performance_tests.py
13-13: numpy imported but unused
Remove unused import: numpy
(F401)
14-14: pandas imported but unused
Remove unused import: pandas
(F401)
16-16: dag_cbor.ipld.IPLDKind imported but unused
Remove unused import: dag_cbor.ipld.IPLDKind
(F401)
17-17: multiformats.CID imported but unused
Remove unused import: multiformats.CID
(F401)
57-57: f-string without any placeholders
Remove extraneous f prefix
(F541)
58-58: f-string without any placeholders
Remove extraneous f prefix
(F541)
74-74: Local variable start is assigned to but never used
Remove assignment to unused variable start
(F841)
97-97: f-string without any placeholders
Remove extraneous f prefix
(F541)
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag fix/codex-async-2
tests/performance_tests.py
[error] 1-1: Pre-commit hook 'end-of-file-fixer' failed and modified the file to fix end of file issues.
[error] 7-98: Ruff linting errors: Import block is un-sorted or un-formatted (I001); unused imports: numpy (F401), pandas (F401), dag_cbor.ipld.IPLDKind (F401), multiformats.CID (F401); f-string without placeholders (F541); local variable 'start' assigned but never used (F841). 8 errors fixable with '--fix' option.
[error] 1-1: Ruff format hook reformatted the file to fix style issues.
tests/performance_tests.py
Outdated
| hamt = await HAMT.build( | ||
| cas=kubo_cas, root_node_id=root_cid, values_are_bytes=True, read_only=True | ||
| ) | ||
| start = time.perf_counter() | ||
| ipfs_ds: xr.Dataset | ||
| zhs = ZarrHAMTStore(hamt, read_only=True) | ||
| ipfs_ds = xr.open_zarr(store=zhs) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Delete dead timing block & unused start variable
The first open/load sequence is repeated a few lines later with proper timing (start_read).
Keeping the earlier one only adds network overhead and triggers F841.
- start = time.perf_counter()
- ipfs_ds: xr.Dataset
- zhs = ZarrHAMTStore(hamt, read_only=True)
- ipfs_ds = xr.open_zarr(store=zhs)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| hamt = await HAMT.build( | |
| cas=kubo_cas, root_node_id=root_cid, values_are_bytes=True, read_only=True | |
| ) | |
| start = time.perf_counter() | |
| ipfs_ds: xr.Dataset | |
| zhs = ZarrHAMTStore(hamt, read_only=True) | |
| ipfs_ds = xr.open_zarr(store=zhs) | |
| hamt = await HAMT.build( | |
| cas=kubo_cas, root_node_id=root_cid, values_are_bytes=True, read_only=True | |
| ) |
🧰 Tools
🪛 Ruff (0.11.9)
74-74: Local variable start is assigned to but never used
Remove assignment to unused variable start
(F841)
🤖 Prompt for AI Agents
In tests/performance_tests.py around lines 71 to 78, remove the initial timing
block that assigns the variable 'start' and the associated open/load sequence of
the dataset, as it is redundant and unused. This will eliminate the dead code
and the unused variable warning (F841), ensuring only the properly timed
open/load sequence later in the code remains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
py_hamt/store_httpx.py (2)
284-290: Multipart upload still omits filename / content-typePast review pointed out that
/api/v0/addexpects a 3-tuple(filename, bytes, mime).
files = {"file": data}is still present, so the request may break on some Kubo versions.-files = {"file": data} +files = {"file": ("blob", data, "application/octet-stream")}
196-201:⚠️ Potential issueGuard
asyncio.get_running_loop()to support sync-context constructionIf a caller passes an already-initialised
httpx.AsyncClientfrom regular (non-async) code,asyncio.get_running_loop()raisesRuntimeError, breaking__init__.
Wrap the call in atry/exceptand defer the binding until the first async use.-if client is not None: - # user supplied → bind it to *their* current loop - self._client_per_loop[asyncio.get_running_loop()] = client - self._owns_client: bool = False +if client is not None: + # User-supplied client; try to associate with current loop, + # but don’t fail when called outside an event loop. + try: + loop = asyncio.get_running_loop() + self._client_per_loop[loop] = client + except RuntimeError: + # Defer binding; _loop_client() will fall back to this attr. + pass + self._user_client = client + self._owns_client = FalseMake
_loop_client()returnself._user_clientwhen present so the client is reused irrespective of the loop.
🧹 Nitpick comments (1)
py_hamt/store_httpx.py (1)
235-249: Leak-prone early return inaclose()When
self._owns_clientisFalsewe skip both closing and clearing_client_per_loop.
The user may still close their client, yet the lingering strong reference in the dict prevents GC and keeps the mapping alive across event loops.Consider always purging
_client_per_loopto avoid retaining foreign resources:- if not self._owns_client: - # User supplied the client; they are responsible for closing it. - return + if not self._owns_client: + # User supplied the client; just drop our references. + self._client_per_loop.clear() + return
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
py_hamt/store_httpx.py(7 hunks)pyproject.toml(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
🧰 Additional context used
🪛 Pylint (3.3.7)
py_hamt/store_httpx.py
[refactor] 134-134: Too many arguments (8/5)
(R0913)
[refactor] 134-134: Too many positional arguments (6/5)
(R0917)
🔇 Additional comments (1)
py_hamt/store_httpx.py (1)
258-276: Destructor may schedule I/O on a closed loop – verify behaviour
__del__tries toloop.create_task(self.aclose())when the interpreter is shutting down.
If the loop is already in the process of closing, this can raiseRuntimeError: Event loop is closed.Double-check with real teardown scenarios; a conservative pattern is:
if loop and loop.is_running() and not loop.is_closed(): loop.call_soon_threadsafe(asyncio.create_task, self.aclose())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
py_hamt/store_httpx.py (2)
196-201:⚠️ Potential issue
asyncio.get_running_loop()still unguarded – can crash in sync context
The call is unchanged from the previous review and will raiseRuntimeErrorif a user passes a client while no event-loop is running (e.g. library initialisation in synchronous code).-if client is not None: - # user supplied → bind it to *their* current loop - self._client_per_loop[asyncio.get_running_loop()] = client +if client is not None: + # user supplied → bind it to *their* current loop (if any) + try: + loop = asyncio.get_running_loop() + except RuntimeError: + # defer binding until first use + loop = None + if loop is not None: + self._client_per_loop[loop] = client + self._user_client = client # remember for _loop_client()
303-305:⚠️ Potential issueMultipart upload still omits filename / content-type
The earlier remark has not been applied; Kubo’s/api/v0/addexpects a 3-tuple(filename, data, mime). Current code relies on undocumented behaviour and may mis-detect binary blobs.- files = {"file": data} + files = {"file": ("blob", data, "application/octet-stream")}
🧹 Nitpick comments (1)
py_hamt/store_httpx.py (1)
221-223: Use logging instead ofprint()for library diagnostics
print()writes to stdout and cannot be suppressed by callers; a standardlogging.getLogger(__name__)call honours user log-level configuration.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
py_hamt/store_httpx.py(7 hunks)tests/performance_tests.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/performance_tests.py
🧰 Additional context used
🪛 Pylint (3.3.7)
py_hamt/store_httpx.py
[refactor] 134-134: Too many arguments (8/5)
(R0913)
[refactor] 134-134: Too many positional arguments (6/5)
(R0917)
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag fix/codex-async-2
py_hamt/store_httpx.py
[error] 220-227: Coverage failure: 4 statements missing coverage between lines 220-227. Total coverage 99% is less than required 100%.
py_hamt/store_httpx.py
Outdated
| f"\n--- [KuboCAS] Request limit of {self._request_limit_per_client} reached. " | ||
| "Retiring HTTP client and creating new one for subsequent requests. ---\n" | ||
| ) | ||
| loop = asyncio.get_running_loop() | ||
| self._client_per_loop.pop(loop, None) # graceful retirement. | ||
| # Reset counter for the new client that will be created. | ||
| self._request_counter = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retired clients are popped but never closed → file-descriptor leak
_cycle_client_if_needed() removes the current loop’s client from _client_per_loop yet does not aclose() it. After ~900 requests those sockets remain open until process shutdown, defeating the whole point of cycling.
- loop = asyncio.get_running_loop()
- self._client_per_loop.pop(loop, None) # graceful retirement.
+ loop = asyncio.get_running_loop()
+ old_client = self._client_per_loop.pop(loop, None)
+ if old_client is not None and not old_client.is_closed:
+ await old_client.aclose() # proactive cleanup📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| f"\n--- [KuboCAS] Request limit of {self._request_limit_per_client} reached. " | |
| "Retiring HTTP client and creating new one for subsequent requests. ---\n" | |
| ) | |
| loop = asyncio.get_running_loop() | |
| self._client_per_loop.pop(loop, None) # graceful retirement. | |
| # Reset counter for the new client that will be created. | |
| self._request_counter = 0 | |
| f"\n--- [KuboCAS] Request limit of {self._request_limit_per_client} reached. " | |
| "Retiring HTTP client and creating new one for subsequent requests. ---\n" | |
| ) | |
| loop = asyncio.get_running_loop() | |
| old_client = self._client_per_loop.pop(loop, None) | |
| if old_client is not None and not old_client.is_closed: | |
| await old_client.aclose() # proactive cleanup | |
| # Reset counter for the new client that will be created. | |
| self._request_counter = 0 |
🧰 Tools
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag fix/codex-async-2
[error] 220-227: Coverage failure: 4 statements missing coverage between lines 220-227. Total coverage 99% is less than required 100%.
🤖 Prompt for AI Agents
In py_hamt/store_httpx.py around lines 221 to 227, the code removes the retired
HTTP client from the _client_per_loop dictionary but does not properly close it,
causing a file descriptor leak. To fix this, retrieve the client instance before
popping it from _client_per_loop and call its aclose() method asynchronously to
close the client and release resources before resetting the request counter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
py_hamt/store_httpx.py (2)
221-228:⚠️ Potential issueRetired HTTP clients are dropped but never closed → FD leak
_cycle_client_if_needed()removes the exhausted client from_client_per_loopyet neveraclose()s it. Sockets survive until process shutdown, leaking descriptors after ~900 requests.- loop = asyncio.get_running_loop() - self._client_per_loop.pop(loop, None) # graceful retirement. + loop = asyncio.get_running_loop() + old_client = self._client_per_loop.pop(loop, None) + if old_client is not None and not old_client.is_closed: + await old_client.aclose() # proactive cleanupClose before forgetting to keep resource usage bounded.
197-203: 🛠️ Refactor suggestion
asyncio.get_running_loop()may explode outside an event-loop – make construction robust
__init__callsasyncio.get_running_loop()unguarded. WhenKuboCASis instantiated from sync code this raisesRuntimeError, preventing object creation even if the user passes a perfectly validAsyncClient.- self._client_per_loop[asyncio.get_running_loop()] = client - self._owns_client: bool = False + try: + loop = asyncio.get_running_loop() + except RuntimeError: + # No loop yet – delay binding until the first async use + loop = None + + if loop is not None: + self._client_per_loop[loop] = client + + self._user_client = client # remember for later + self._owns_client = FalseCorrespondingly, have
_loop_client()short-circuit when a user-supplied client exists:- try: + if not self._owns_client and getattr(self, "_user_client", None): + return self._user_client + + try: return self._client_per_loop[loop]Prevents hard-to-trace init failures.
🧹 Nitpick comments (2)
py_hamt/store_httpx.py (2)
304-306: Send filename & content-type when uploading – avoids flaky Kubo behaviour
IPFS/api/v0/addunofficially tolerates raw bytes, but the documented contract is a 3-tuple(filename, data, mimetype). Passing only bytes can intermittently 400‐out on some gateway builds.- files = {"file": data} + files = {"file": ("blob", data, "application/octet-stream")}One-line change, zero behavioural risk.
254-269:aclose()skips user-supplied client – warn callers
When_owns_clientisFalse,aclose()returns early, leaving the caller’sAsyncClientopen. That’s fine by contract, but a docstring note (“user must close their client”) would prevent misuse.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
py_hamt/store_httpx.py(8 hunks)tests/test_zarr_ipfs.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_zarr_ipfs.py
🧰 Additional context used
🪛 Pylint (3.3.7)
py_hamt/store_httpx.py
[refactor] 134-134: Too many arguments (9/5)
(R0913)
[refactor] 134-134: Too many positional arguments (6/5)
(R0917)
🔇 Additional comments (1)
py_hamt/store_httpx.py (1)
278-296: Destructor runsasyncio.run()during shutdown – may raise
Callingasyncio.run()from__del__while another loop is closing frequently triggersRuntimeError: Event loop is closed. Consider guarding withif not asyncio.get_event_loop_policy().get_event_loop().is_closed()or simply skip cleanup at interpreter teardown (Python already closes sockets).
From Claude Code:
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests
Chores