-
Notifications
You must be signed in to change notification settings - Fork 0
fix: Crypto #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Crypto #26
Conversation
WalkthroughThis update enhances the Changes
Sequence Diagram(s)sequenceDiagram
participant C as Client Code
participant H as HAMT.__new__
participant TH as TransformedHamt
participant N as HAMT
C->>H: Instantiate HAMT (with transformer?)
alt Transformer provided
H-->>TH: Return TransformedHamt instance
else
H-->>N: Return HAMT instance
end
sequenceDiagram
participant T as Test Case
participant S as HAMT Store
participant E as Encryption Transformer
T->>S: __setitem__(key, value)
S->>E: Apply transformer_encode on value
E-->>S: Return encoded bytes
T->>S: __getitem__(key)
S->>E: Apply transformer_decode on retrieved bytes
E-->>S: Return original value
T-->>S: Validate retrieved data
Possibly related PRs
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/test_zarr_encryption_ipfs.py (2)
155-159: Consider storing “crypto” attribute in the Zarr metadata.
Currently, you’re adding the encryption attribute directly to the xarray objects in memory, but the TODO note suggests that you aim to have this stored in Zarr metadata. You may explore xarray’s “encoding” dictionary when calling .to_zarr() or upon creating variables, which can persist custom metadata to Zarr.
234-235: Use more specific exception handling and valid attribute access.
Static analysis highlights two concerns:
• Calling pytest.raises(Exception) is considered bad practice. Prefer a more specific exception type.
• Accessing loaded_failure["temp"].values purely to trigger an exception can be seen as “useless” by linter rules. It may help to capture that value or clarify its usage.Below is a suggested fix, specifying a more targeted exception and assigning the attribute to a dummy variable:
- with pytest.raises(Exception): - loaded_failure["temp"].values + with pytest.raises(ValueError): # or any relevant specific exception + _ = loaded_failure["temp"].values🧰 Tools
🪛 Ruff (0.8.2)
234-234:
pytest.raises(Exception)should be considered evil(B017)
235-235: Found useless attribute access. Either assign it to a variable or remove it.
(B018)
py_hamt/hamt.py (2)
126-135: Remove or clarify unused code.
This block of code has been commented out. If it’s no longer needed, consider removing it entirely to keep the codebase clear and maintainable. If you plan to revive it, please add a TODO comment stating the next steps.
562-565: Consider a separate decoding utility for clarity.
The inline decryption logic is functional, but factoring the decode operation into a helper method may improve testability and readability, especially if additional decoding steps become necessary in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
py_hamt/hamt.py(7 hunks)pyproject.toml(1 hunks)tests/test_zarr_encryption_ipfs.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
234-234: pytest.raises(Exception) should be considered evil
(B017)
235-235: Found useless attribute access. Either assign it to a variable or remove it.
(B018)
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto-mod
tests/test_zarr_encryption_ipfs.py
[error] 174-174: TypeError: Unsupported type for store_like: 'HAMT'
[warning] 197-197: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
🔇 Additional comments (4)
py_hamt/hamt.py (3)
217-220: Ensure transformers meet concurrency expectations.
The new fields (transformer_encode and transformer_decode) enable custom transformations on inserted/retrieved values. If these transformers hold any state, be cautious of potential threading issues since the HAMT can be accessed concurrently.
268-269: Validate default transformer parameters.
Providing None by default for transformers is clear, but ensure the upstream usage checks for and gracefully handles the case when no transformation is required.
363-367: Confirm the usage of raw storing.
You’re encoding values with dag-cbor, transforming them, then calling store.save_raw(). This is valid if your store supports raw bytes, but verify that you truly want to skip the DAG-CBOR layer for retrieval.pyproject.toml (1)
33-37: Dev-dependency on pycryptodome looks good.
This addition aligns well with your new encryption functionality. No issues found.
| transformer_encode=encode, | ||
| transformer_decode=decode, | ||
| ) | ||
| test_ds.to_zarr(store=hamt1, mode="w") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address “Unsupported type for store_like” pipeline error.
According to the pipeline failure logs, xarray raises a “TypeError: Unsupported type for store_like: 'HAMT'” on this line. You may need to provide a store implementation that xarray recognizes, or adapt the HAMT object to fulfill xarray’s requirements for a Zarr store-like.
🧰 Tools
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto-mod
[error] 174-174: TypeError: Unsupported type for store_like: 'HAMT'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (2)
py_hamt/zarr_encryption_transformer.py (2)
28-42: Docstring mismatch with function signature.
The docstring references parameters like “key_hash” that don’t exist in the actual function signature. This can cause confusion for future maintainers.You might consider updating the docstring to reflect the correct parameters and usage for “create_zarr_encryption_transformers”.
58-69: Consider large-object processing.
You are loading the entire byte array into memory (via io.BytesIO), which may severely impact memory usage when encrypting large datasets. Chunk-based processing or streaming encryption could be more efficient.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
py_hamt/__init__.py(1 hunks)py_hamt/hamt.py(6 hunks)py_hamt/zarr_encryption_transformer.py(1 hunks)pyproject.toml(1 hunks)tests/test_zarr_encryption_ipfs.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- py_hamt/init.py
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
153-153: pytest.raises(Exception) should be considered evil
(B017)
154-154: Found useless attribute access. Either assign it to a variable or remove it.
(B018)
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto-mod
tests/test_zarr_encryption_ipfs.py
[error] 93-93: TypeError: Unsupported type for store_like: 'HAMT'
🔇 Additional comments (5)
py_hamt/hamt.py (2)
205-208: LGTM! Well-documented transformer attributes.The transformer function attributes are well-typed and clearly documented.
205-553: Verify transformer function signatures in the constructor.It's important to validate that the provided transformer functions match the expected signature to catch potential issues early.
Apply this diff to add signature validation:
def __init__( self, store: Store, hash_fn: Callable[[bytes], bytes] = blake3_hashfn, read_only: bool = False, root_node_id: IPLDKind = None, max_cache_size_bytes=10_000_000, # default to 10 megabytes transformer_encode: None | Callable[[str, bytes], bytes] = None, transformer_decode: None | Callable[[str, bytes], bytes] = None, ): + def validate_transformer(func: Callable | None, name: str) -> None: + if func is not None: + import inspect + sig = inspect.signature(func) + if list(sig.parameters.keys()) != ['key', 'value']: + raise ValueError( + f"{name} must have exactly two parameters: 'key' and 'value'" + ) + if sig.return_annotation != bytes and sig.return_annotation != inspect.Signature.empty: + raise ValueError( + f"{name} must return bytes" + ) + + validate_transformer(transformer_encode, 'transformer_encode') + validate_transformer(transformer_decode, 'transformer_decode') self.store = store self.hash_fn = hash_fnpy_hamt/zarr_encryption_transformer.py (2)
48-53: Validate the key path logic.
The helper function _should_transform_key only checks the first path segment against encrypted_vars. This may lead to edge cases if the data is organized in deeper paths or multiple grouping levels.Would you like to run a repository-wide check to see whether keys can have multiple nested segments? If needed, we can refine the logic to match the relevant portion of the path. Here is a script that searches for Zarr path usage:
✅ Verification successful
🏁 Script executed:
#!/bin/bash # Recursively search for usage of Zarr paths and print relevant context. rg -A 5 '[.]to_zarr|open_zarr'Length of output: 4765
Key path logic verified against repository usage.
The repository-wide search (using the rg command) did not reveal any cases where Zarr keys contain multiple nested segments. The tests (in files like tests/test_zarr_ipfs.py and tests/test_zarr_encryption_ipfs.py) consistently use top-level keys (e.g. "temp", "precip"), which aligns with the current _should_transform_key implementation.
- The function correctly excludes paths based on the file name and index pattern before checking the first segment.
- No evidence was found in the repository to suggest that keys with deeper nesting are used or required.
If future requirements arise that involve multi-level nested keys, the logic may need to be revisited.
64-78: Verify 24-byte nonce support.
ChaCha20-Poly1305 typically uses a 96-bit (12-byte) nonce. You appear to be doing XChaCha20, which can use a 192-bit (24-byte) nonce. Some libraries or versions might not fully support XChaCha20.✅ Verification successful
🌐 Web query:
Is PyCryptodome’s ChaCha20_Poly1305 implementation documented to support 24-byte nonces (XChaCha20)?💡 Result:
Yes, PyCryptodome's `ChaCha20_Poly1305` implementation explicitly supports 24-byte nonces for XChaCha20-Poly1305 according to its documentation[1][7]. The library distinguishes between three variants based on nonce length: | Nonce Length | Variant | Standardization Status | |--------------|-----------------------|---------------------------| | 8 bytes | Original ChaCha20 | Bernstein's specification | | 12 bytes | RFC7539 TLS version | Standardized | | **24 bytes** | **XChaCha20-Poly1305**| Draft specification[1][7] | The 24-byte nonce implementation follows the extended nonce construction described in security literature[3][5], despite the IETF standardization draft having expired[3][8]. When using this variant: - Requires passing a **24-byte nonce** to `new()`[1][4] - Uses HChaCha20 internally to derive subkeys from extended nonces[3] - Maintains compatibility with other implementations like libsodium[3][8] The documentation explicitly states this support in both API references[1][7] and code examples[4], though notes it's still technically a draft specification[1][7]. Security analysis suggests this construction reduces nonce reuse risks compared to shorter variants[3][8]. Citations: - 1: https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20_poly1305.html - 2: https://docs.rs/chacha20poly1305/latest/chacha20poly1305/ - 3: https://www.scottbrady.io/c-sharp/xchacha20-poly1305-dotnet - 4: https://binref.github.io/units/crypto/cipher/chacha.html - 5: https://en.wikipedia.org/wiki/ChaCha20-Poly1305 - 6: https://github.com/Legrandin/pycryptodome/issues/811 - 7: https://github.com/Legrandin/pycryptodome/blob/master/Doc/src/cipher/chacha20_poly1305.rst - 8: https://github.com/openbao/openbao/issues/235 - 9: https://binref.github.io - 10: https://groups.google.com/g/pycryptodome/c/4OSIWNzjq_w
Verification: 24-byte nonce support confirmed.
The PyCryptodome documentation confirms that its ChaCha20_Poly1305 implementation explicitly supports a 24-byte nonce (XChaCha20-Poly1305). The snippet is correct and uses the 24-byte nonce appropriately.tests/test_zarr_encryption_ipfs.py (1)
75-78: Retain consistent indexing of encryption attributes.
You manually set "crypto" attributes on "temp" and "precip" variables, which might need consistent handling across all relevant variables. If future expansions add more variables, you may risk missing encryption.Would you like to check if any other variables or coordinates require encryption attributes? This script can help identify all variable names and see if we missed any:
✅ Verification successful
🏁 Script executed:
#!/bin/bash rg -A 5 'xr.Dataset|xr.DataArray'Length of output: 1667
I'm going to dig deeper into the repository to check if any other variables or coordinates receive encryption attributes—or if there’s any inconsistency in setting them apart from "precip" and "temp".
🏁 Script executed:
#!/bin/bash # Search for all occurrences where the "crypto" attribute is set in the codebase. rg '\.attrs\["crypto"\]' -n .Length of output: 233
Encryption attribute handling is consistent.
After checking the codebase, only "precip" and "temp" in tests/test_zarr_encryption_ipfs.py have been assigned crypto attributes. No other variables or coordinates set these attributes, confirming that the manual handling is currently applied consistently.
| transformer_encode: None | Callable[[str, bytes], bytes] = None, | ||
| transformer_decode: None | Callable[[str, bytes], bytes] = None, | ||
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update __deepcopy__ to include transformer functions.
The __deepcopy__ method (lines 281-295) needs to be updated to include the transformer functions when creating a copy of the HAMT instance.
Apply this diff to fix the issue:
def __deepcopy__(self, memo) -> "HAMT":
if not self.read_only:
self.lock.acquire(blocking=True)
copy_hamt = HAMT(
store=self.store,
hash_fn=self.hash_fn,
read_only=self.read_only,
root_node_id=self.root_node_id,
+ transformer_encode=self.transformer_encode,
+ transformer_decode=self.transformer_decode,
)
if not self.read_only:
self.lock.release()
return copy_hamtAlso applies to: 269-271
| def __init__(self, encryption_key: str): | ||
| self.encryption_key = encryption_key | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Remove unused constructor.
The init method is unused and likely dead code. This could create confusion, as the transcription suggests it might be a leftover.
Apply this diff to remove the unused constructor:
- def __init__(self, encryption_key: str):
- self.encryption_key = encryption_key📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def __init__(self, encryption_key: str): | |
| self.encryption_key = encryption_key |
| with pytest.raises(Exception): | ||
| loaded_failure["temp"].values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use a more specific exception type.
Using pytest.raises(Exception) is overly broad, and static analysis flags the attribute access as “useless.” A better practice is to specify the exact exception type expected when accessing encrypted data without the transformer.
Example fix:
- with pytest.raises(Exception):
- loaded_failure["temp"].values
+ with pytest.raises(ValueError):
+ _ = loaded_failure["temp"].values📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| with pytest.raises(Exception): | |
| loaded_failure["temp"].values | |
| with pytest.raises(ValueError): | |
| _ = loaded_failure["temp"].values |
🧰 Tools
🪛 Ruff (0.8.2)
153-153: pytest.raises(Exception) should be considered evil
(B017)
154-154: Found useless attribute access. Either assign it to a variable or remove it.
(B018)
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #26 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 3 4 +1
Lines 391 436 +45
=========================================
+ Hits 391 436 +45 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
py_hamt/zarr_encryption_transformer.py (1)
54-56:⚠️ Potential issueRemove unused constructor.
The
__init__method is unused and likely dead code. This could create confusion, as it suggests it might be a leftover.Apply this diff to remove the unused constructor:
- def __init__(self, encryption_key: str): - self.encryption_key = encryption_keytests/test_zarr_encryption_ipfs.py (1)
160-161: 🛠️ Refactor suggestionUse a more specific exception type.
Using
pytest.raises(Exception)is overly broad, and the attribute access is flagged as useless.- with pytest.raises(Exception): - loaded_failure["temp"].values + with pytest.raises(ValueError): + _ = loaded_failure["temp"].values🧰 Tools
🪛 Ruff (0.8.2)
160-160:
pytest.raises(Exception)should be considered evil(B017)
161-161: Found useless attribute access. Either assign it to a variable or remove it.
(B018)
🧹 Nitpick comments (2)
py_hamt/zarr_encryption_transformer.py (1)
78-82: Remove commented out code.The commented out code block is unused and should be removed to maintain code cleanliness.
- # if out is not None: - # outbuf = io.BytesIO(plaintext) - # outbuf.readinto(out) - # return outtests/test_zarr_encryption_ipfs.py (1)
74-77: Document the workaround for zarray configuration.The TODO comment indicates a limitation with xarray. Consider documenting this workaround in the code comments or README to help future maintainers.
- # TODO: THIS SHOULD BE DONE IN THE ZARRAY but it doesn't appear like xarray allows this + # Note: Ideally, this configuration should be in the zarray metadata. + # However, xarray currently doesn't provide an API to modify zarray metadata directly. + # This is a workaround by setting the crypto attribute on the variables themselves.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
py_hamt/hamt.py(5 hunks)py_hamt/zarr_encryption_transformer.py(1 hunks)pyproject.toml(2 hunks)tests/test_zarr_encryption_ipfs.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pyproject.toml
- py_hamt/hamt.py
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
160-160: pytest.raises(Exception) should be considered evil
(B017)
161-161: Found useless attribute access. Either assign it to a variable or remove it.
(B018)
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto-mod
py_hamt/zarr_encryption_transformer.py
[error] 51-55: Coverage failure: total of 99 is less than fail-under=100. Lines 51 and 55 are missing coverage.
🔇 Additional comments (4)
py_hamt/zarr_encryption_transformer.py (3)
1-22: LGTM!The imports are appropriate, and the exclusion list is well-documented with clear explanations for each excluded file type.
24-30: LGTM!The type definition and function signature are well-structured with appropriate type hints.
57-68: LGTM!The encryption implementation is secure, using appropriate nonce size and authenticated encryption with ChaCha20-Poly1305.
tests/test_zarr_encryption_ipfs.py (1)
18-68: LGTM!The fixture is well-structured with appropriate test data generation and cleanup. The docstring clearly explains the return values.
| def _should_transform_key(key: str) -> bool: | ||
| if Path(key).name in encryption_exclude_files: | ||
| return False | ||
| if INDEX_KEY_PATTERN.fullmatch(key): | ||
| return False | ||
| return key.split("/")[0] in encrypted_vars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Add test coverage for _should_transform_key function.
The pipeline indicates missing coverage for the _should_transform_key function. Add test cases to cover both the index key pattern match and variable name check.
🏁 Script executed:
#!/bin/bash
# Description: Verify test coverage for _should_transform_key function
# Test: Search for existing test cases
rg -A 5 "_should_transform_key"Length of output: 1487
Action: Add dedicated test cases for _should_transform_key function
The search confirms that _should_transform_key is only referenced within the transformer file and isn’t directly covered by any unit tests. To ensure proper behavior, please add tests that validate:
- The function returns False when Path(key).name is in encryption_exclude_files.
- The function returns False when INDEX_KEY_PATTERN.fullmatch(key) is true.
- The function returns the expected boolean from key.split("/")[0] in encrypted_vars for all other cases.
| """An encryption filter for ZARR data. | ||
| This class is serialized and stored along with the Zarr it is used with, so instead | ||
| of storing the encryption key, we store the hash of the encryption key, so it can be | ||
| looked up in the key registry at run time as needed. | ||
| Parameters | ||
| ---------- | ||
| key_hash: str | ||
| The hex digest of an encryption key. A key may be generated using | ||
| :func:`generate_encryption_key`. The hex digest is obtained by registering the | ||
| key using :func:`register_encryption_key`. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Update docstring to match function parameters.
The docstring refers to a key_hash parameter, but the function accepts encryption_key. Update the docstring to reflect the actual parameters.
- Parameters
- ----------
- key_hash: str
- The hex digest of an encryption key. A key may be generated using
- :func:`generate_encryption_key`. The hex digest is obtained by registering the
- key using :func:`register_encryption_key`.
+ Parameters
+ ----------
+ encryption_key: bytes
+ The encryption key used for ChaCha20-Poly1305 cipher.
+ encrypted_vars: [str]
+ List of variable names to be encrypted.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """An encryption filter for ZARR data. | |
| This class is serialized and stored along with the Zarr it is used with, so instead | |
| of storing the encryption key, we store the hash of the encryption key, so it can be | |
| looked up in the key registry at run time as needed. | |
| Parameters | |
| ---------- | |
| key_hash: str | |
| The hex digest of an encryption key. A key may be generated using | |
| :func:`generate_encryption_key`. The hex digest is obtained by registering the | |
| key using :func:`register_encryption_key`. | |
| """ | |
| """An encryption filter for ZARR data. | |
| This class is serialized and stored along with the Zarr it is used with, so instead | |
| of storing the encryption key, we store the hash of the encryption key, so it can be | |
| looked up in the key registry at run time as needed. | |
| Parameters | |
| ---------- | |
| encryption_key: bytes | |
| The encryption key used for ChaCha20-Poly1305 cipher. | |
| encrypted_vars: [str] | |
| List of variable names to be encrypted. | |
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
py_hamt/zarr_encryption_transformer.py(1 hunks)
🔇 Additional comments (4)
py_hamt/zarr_encryption_transformer.py (4)
1-21: LGTM! Well-structured imports and constants.The imports are properly organized, and the encryption exclusion list is well-documented with clear explanations for each excluded file.
23-23: LGTM! Clear type definition.The TransformerFN type alias clearly defines the expected signature for transformer functions.
30-40: Update docstring to match function parameters.The docstring still refers to incorrect parameters and needs to be updated.
45-48: Add test coverage for _should_transform_key function.The function still lacks test coverage as previously noted.
| def decode(key: str, val: bytes): | ||
| if not _should_transform_key(key): | ||
| return val | ||
|
|
||
| nonce, tag, ciphertext = val[:24], val[24:40], val[40:] | ||
| cipher = ChaCha20_Poly1305.new(key=encryption_key, nonce=nonce) | ||
| cipher.update(header) | ||
| plaintext = cipher.decrypt_and_verify(ciphertext, tag) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for malformed input in decode function.
The decode function should handle potential errors when processing malformed input:
- Input shorter than expected (< 40 bytes)
- Invalid tag during verification
def decode(key: str, val: bytes):
if not _should_transform_key(key):
return val
+ if len(val) < 40:
+ raise ValueError("Input too short: encrypted data must be at least 40 bytes")
+
nonce, tag, ciphertext = val[:24], val[24:40], val[40:]
cipher = ChaCha20_Poly1305.new(key=encryption_key, nonce=nonce)
cipher.update(header)
- plaintext = cipher.decrypt_and_verify(ciphertext, tag)
+ try:
+ plaintext = cipher.decrypt_and_verify(ciphertext, tag)
+ except ValueError as e:
+ raise ValueError(f"Decryption failed: {str(e)}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def decode(key: str, val: bytes): | |
| if not _should_transform_key(key): | |
| return val | |
| nonce, tag, ciphertext = val[:24], val[24:40], val[40:] | |
| cipher = ChaCha20_Poly1305.new(key=encryption_key, nonce=nonce) | |
| cipher.update(header) | |
| plaintext = cipher.decrypt_and_verify(ciphertext, tag) | |
| def decode(key: str, val: bytes): | |
| if not _should_transform_key(key): | |
| return val | |
| if len(val) < 40: | |
| raise ValueError("Input too short: encrypted data must be at least 40 bytes") | |
| nonce, tag, ciphertext = val[:24], val[24:40], val[40:] | |
| cipher = ChaCha20_Poly1305.new(key=encryption_key, nonce=nonce) | |
| cipher.update(header) | |
| try: | |
| plaintext = cipher.decrypt_and_verify(ciphertext, tag) | |
| except ValueError as e: | |
| raise ValueError(f"Decryption failed: {str(e)}") |
| # if out is not None: | ||
| # outbuf = io.BytesIO(plaintext) | ||
| # outbuf.readinto(out) | ||
| # return out | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Remove commented out code.
The commented out code should be removed as it's not being used and could create confusion.
- # if out is not None:
- # outbuf = io.BytesIO(plaintext)
- # outbuf.readinto(out)
- # return out📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # if out is not None: | |
| # outbuf = io.BytesIO(plaintext) | |
| # outbuf.readinto(out) | |
| # return out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/test_zarr_encryption_ipfs.py (1)
160-161:⚠️ Potential issueUse a more specific exception type.
The static analysis tool correctly flags the use of a broad Exception type. This was previously noted in past review comments and should be addressed.
Apply this fix:
-with pytest.raises(Exception): - loaded_failure["temp"].values +with pytest.raises(ValueError, match="Encryption transformer required"): + _ = loaded_failure["temp"].values🧰 Tools
🪛 Ruff (0.8.2)
160-160:
pytest.raises(Exception)should be considered evil(B017)
161-161: Found useless attribute access. Either assign it to a variable or remove it.
(B018)
🧹 Nitpick comments (3)
tests/test_zarr_encryption_ipfs.py (3)
18-68: Add type hints and data validation.The fixture is well-structured, but consider these improvements:
- Add type hints to the function parameters and variables.
- Add validation for realistic data ranges (e.g., temperature and precipitation values).
Example improvements:
@pytest.fixture -def random_zarr_dataset(): +def random_zarr_dataset() -> tuple[str, xr.Dataset]: """Creates a random xarray Dataset and saves it to a temporary zarr store. Returns: tuple: (dataset_path, expected_data) - dataset_path: Path to the zarr store - expected_data: The original xarray Dataset for comparison """ # Create temporary directory for zarr store temp_dir = tempfile.mkdtemp() zarr_path = os.path.join(temp_dir, "test.zarr") # Coordinates of the random data times = pd.date_range("2024-01-01", periods=100) lats = np.linspace(-90, 90, 18) lons = np.linspace(-180, 180, 36) # Create random variables with different shapes - temp = np.random.randn(len(times), len(lats), len(lons)) + # Generate temperatures between -50°C and 50°C + temp = np.random.uniform(-50, 50, size=(len(times), len(lats), len(lons))) - precip = np.random.gamma(2, 0.5, size=(len(times), len(lats), len(lons))) + # Generate non-negative precipitation values with realistic distribution + precip = np.clip(np.random.gamma(2, 0.5, size=(len(times), len(lats), len(lons))), 0, None)
70-70: Add docstring to test function.Add a descriptive docstring explaining the test's purpose, parameters, and expected behavior.
Example:
def test_upload_then_read(random_zarr_dataset: tuple[str, xr.Dataset]): + """Test ZARR dataset upload and read with and without encryption. + + Tests the following scenarios: + 1. Upload and read with encryption + 2. Upload and read without encryption + 3. Attempt to read encrypted data without transformer + + Args: + random_zarr_dataset (tuple[str, xr.Dataset]): Fixture providing zarr path and dataset + """
81-106: Improve performance measurement structure.The performance measurement code could be more structured and reusable.
Consider extracting the timing logic into a helper function:
+def measure_operation(operation_name: str, func: callable, *args, **kwargs) -> float: + """Measure the execution time of an operation. + + Args: + operation_name (str): Name of the operation being measured + func (callable): Function to measure + *args, **kwargs: Arguments to pass to the function + + Returns: + float: Execution time in seconds + """ + start_time = time.time() + result = func(*args, **kwargs) + total_time = time.time() - start_time + print(f"{operation_name} took {total_time:.2f} seconds") + return result + start_time = time.time() encryption_key = get_random_bytes(32) encode, decode = create_zarr_encryption_transformers( encryption_key, ["temp", "precip"] ) -hamt1 = HAMT( - store=IPFSStore(pin_on_add=False), - transformer_encode=encode, - transformer_decode=decode, -) -test_ds.to_zarr(store=hamt1, mode="w") - -end_time = time.time() -total_time = end_time - start_time -print(f"Adding with encryption took {total_time:.2f} seconds") +hamt1 = measure_operation( + "Adding with encryption", + lambda: HAMT( + store=IPFSStore(pin_on_add=False), + transformer_encode=encode, + transformer_decode=decode, + ) +) +test_ds.to_zarr(store=hamt1, mode="w")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_zarr_encryption_ipfs.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
160-160: pytest.raises(Exception) should be considered evil
(B017)
161-161: Found useless attribute access. Either assign it to a variable or remove it.
(B018)
🔇 Additional comments (2)
tests/test_zarr_encryption_ipfs.py (2)
1-16: LGTM!The imports are well-organized and appropriate for testing ZARR data encryption/decryption functionality.
74-77: Address TODO comment about crypto attributes.The TODO comment indicates a limitation in xarray's functionality. Consider:
- Opening an issue to track this limitation.
- Adding a comment explaining why this workaround is necessary.
Would you like me to help create an issue to track this limitation or suggest alternative approaches?
Summary by CodeRabbit
New Features
• Enhanced core storage functionality to support customizable value transformations during data write and retrieval.
• Integrated secure encryption/decryption support for dataset handling.
Tests
• Added comprehensive tests to validate secure handling, upload, and retrieval flows, ensuring proper error management when encryption prerequisites aren’t met.
Chores
• Updated the development configuration with an additional cryptographic dependency.
• Expanded the public API to include new encryption transformer functionality.