-
Notifications
You must be signed in to change notification settings - Fork 0
fix: py-hamt crypto #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request refactors the HAMT implementation by introducing a factory method in the Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant HAMT
participant HAMTOriginal
participant TransformedHamt
Client->>HAMT: Instantiate HAMT(store, transformer?)
alt Transformer Provided
HAMT-->>TransformedHamt: Return instance with transformation logic
else
HAMT-->>HAMTOriginal: Return standard instance
end
sequenceDiagram
participant Test as Testing Framework
participant EncryptionFilter
participant IPFS
Test->>EncryptionFilter: Upload dataset (with encryption)
EncryptionFilter->>IPFS: Send encrypted data
Test->>EncryptionFilter: Download dataset (expect decryption)
EncryptionFilter->>IPFS: Request encrypted data
alt Correct Key
EncryptionFilter-->>Test: Return decrypted dataset
else Incorrect Key
EncryptionFilter-->>Test: Raise decryption exception
end
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip 🌐 Web search-backed reviews and chat
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
py_hamt/hamt.py (1)
615-638: Add type hints and improve error handling in TransformedHamt.The implementation needs improvements:
- Add type hints for the transformer parameter.
- Add validation for the transformer object.
Apply this diff to improve the implementation:
- def __init__(self, *args, transformer=None, **kwargs): + def __init__(self, *args, transformer: object = None, **kwargs): + if transformer is not None and not hasattr(transformer, 'encode') or not hasattr(transformer, 'decode'): + raise ValueError("transformer must implement encode and decode methods") super().__init__(*args, **kwargs) self.transformer = transformer
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
py_hamt/__init__.py(1 hunks)py_hamt/hamt.py(2 hunks)pyproject.toml(1 hunks)tests/test_zarr_encryption_ipfs.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
178-178: pytest.raises(Exception) should be considered evil
(B017)
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto
tests/test_zarr_encryption_ipfs.py
[error] 128-128: TypeError: Unsupported type for store_like: 'TransformedHamt'
[warning] 197-197: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
🔇 Additional comments (4)
py_hamt/__init__.py (1)
1-11: LGTM!The changes correctly expose the new
HamtFactoryclass through the module's public interface.tests/test_zarr_encryption_ipfs.py (2)
63-113: LGTM!The fixture correctly generates random test data and handles cleanup.
115-192: Fix TypeError with TransformedHamt and improve test coverage.The test has the following issues:
- Pipeline failure indicates TypeError with TransformedHamt.
- The test uses a bare
Exceptionin pytest.raises.- Missing test cases for concurrent access and large datasets.
Apply this diff to fix the issues:
- test_ds.to_zarr(store=hamt1, mode="w") + test_ds.to_zarr(store=dict(hamt1), mode="w") - with pytest.raises(Exception): + with pytest.raises((ValueError, TypeError), match="(Decryption failed|Invalid key)"): xr.open_zarr(store=hamt1_read_bad)✅ Verification successful
🏁 Script executed:
#!/bin/bash # Description: Check if the error is reproducible with the latest version # Test: Search for similar issues or related code rg -A 5 'TypeError.*TransformedHamt' rg -A 5 'store_like.*TransformedHamt'Length of output: 78
I'll generate a new shell script to search the repository for any usage or definition of
TransformedHamtto gather more context on the TypeError being reported. This should help verify if the proposed changes are addressing an existing issue.
🏁 Script executed:
#!/bin/bash # Search for the occurrence of 'TransformedHamt' in the repository rg -n "TransformedHamt" .Length of output: 306
Fix TypeError with TransformedHamt usage and update exception handling in tests.
The evidence confirms that the codebase instantiates and uses a
TransformedHamt(seepy_hamt/hamt.py), and the integration test still callstest_ds.to_zarr(store=hamt1, mode="w"). This likely causes a TypeError when the transformed HAMT isn’t correctly converted to a dict-like object for the Zarr write operation. Additionally, the bare exception in the test does not pinpoint the decryption or key error.
- Call
test_ds.to_zarrwithstore=dict(hamt1)to prevent the TypeError.- Replace
pytest.raises(Exception)with a more specific tuple—e.g., using(ValueError, TypeError)and an appropriate match string like"(Decryption failed|Invalid key)".- Consider adding tests for concurrent access and handling very large datasets.
These changes address both the pipeline failure and improve test precision.
🧰 Tools
🪛 Ruff (0.8.2)
178-178:
pytest.raises(Exception)should be considered evil(B017)
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto
[error] 128-128: TypeError: Unsupported type for store_like: 'TransformedHamt'
py_hamt/hamt.py (1)
126-135: LGTM!The factory pattern is well-implemented and provides a clean interface for creating HAMT instances.
| class EncryptionFilter: | ||
| """An encryption filter for ZARR data. | ||
| This class is serialized and stored along with the Zarr it is used with, so instead | ||
| of storing the encryption key, we store the hash of the encryption key, so it can be | ||
| looked up in the key registry at run time as needed. | ||
| Parameters | ||
| ---------- | ||
| key_hash: str | ||
| The hex digest of an encryption key. A key may be generated using | ||
| :func:`generate_encryption_key`. The hex digest is obtained by registering the | ||
| key using :func:`register_encryption_key`. | ||
| """ | ||
|
|
||
| codec_id = "xchacha20poly1305" | ||
| header = b"dClimate-Zarr" | ||
|
|
||
| def __init__(self, encryption_key: str): | ||
| self.encryption_key = encryption_key | ||
|
|
||
| def encode(self, buf): | ||
| raw = io.BytesIO() | ||
| raw.write(buf) | ||
| nonce = get_random_bytes(24) # XChaCha uses 24 byte (192 bit) nonce | ||
| cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) | ||
| cipher.update(self.header) | ||
| ciphertext, tag = cipher.encrypt_and_digest(raw.getbuffer()) | ||
|
|
||
| return nonce + tag + ciphertext | ||
|
|
||
| def decode(self, buf, out=None): | ||
| nonce, tag, ciphertext = buf[:24], buf[24:40], buf[40:] | ||
| cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) | ||
| cipher.update(self.header) | ||
| plaintext = cipher.decrypt_and_verify(ciphertext, tag) | ||
|
|
||
| if out is not None: | ||
| outbuf = io.BytesIO(plaintext) | ||
| outbuf.readinto(out) | ||
| return out | ||
|
|
||
| return plaintext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance security by adding key validation and error handling.
The EncryptionFilter class needs improvements:
- Add validation for the encryption key length (should be 32 bytes for ChaCha20-Poly1305).
- Add error handling for encryption/decryption failures.
Apply this diff to improve the implementation:
def __init__(self, encryption_key: str):
+ if not isinstance(encryption_key, bytes) or len(encryption_key) != 32:
+ raise ValueError("encryption_key must be 32 bytes")
self.encryption_key = encryption_key
def encode(self, buf):
+ try:
raw = io.BytesIO()
raw.write(buf)
nonce = get_random_bytes(24) # XChaCha uses 24 byte (192 bit) nonce
cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce)
cipher.update(self.header)
ciphertext, tag = cipher.encrypt_and_digest(raw.getbuffer())
return nonce + tag + ciphertext
+ except Exception as e:
+ raise ValueError(f"Encryption failed: {str(e)}")
def decode(self, buf, out=None):
+ try:
nonce, tag, ciphertext = buf[:24], buf[24:40], buf[40:]
cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce)
cipher.update(self.header)
plaintext = cipher.decrypt_and_verify(ciphertext, tag)
+ except ValueError:
+ raise ValueError("Decryption failed: Invalid key or corrupted data")
+ except Exception as e:
+ raise ValueError(f"Decryption failed: {str(e)}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| class EncryptionFilter: | |
| """An encryption filter for ZARR data. | |
| This class is serialized and stored along with the Zarr it is used with, so instead | |
| of storing the encryption key, we store the hash of the encryption key, so it can be | |
| looked up in the key registry at run time as needed. | |
| Parameters | |
| ---------- | |
| key_hash: str | |
| The hex digest of an encryption key. A key may be generated using | |
| :func:`generate_encryption_key`. The hex digest is obtained by registering the | |
| key using :func:`register_encryption_key`. | |
| """ | |
| codec_id = "xchacha20poly1305" | |
| header = b"dClimate-Zarr" | |
| def __init__(self, encryption_key: str): | |
| self.encryption_key = encryption_key | |
| def encode(self, buf): | |
| raw = io.BytesIO() | |
| raw.write(buf) | |
| nonce = get_random_bytes(24) # XChaCha uses 24 byte (192 bit) nonce | |
| cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) | |
| cipher.update(self.header) | |
| ciphertext, tag = cipher.encrypt_and_digest(raw.getbuffer()) | |
| return nonce + tag + ciphertext | |
| def decode(self, buf, out=None): | |
| nonce, tag, ciphertext = buf[:24], buf[24:40], buf[40:] | |
| cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) | |
| cipher.update(self.header) | |
| plaintext = cipher.decrypt_and_verify(ciphertext, tag) | |
| if out is not None: | |
| outbuf = io.BytesIO(plaintext) | |
| outbuf.readinto(out) | |
| return out | |
| return plaintext | |
| class EncryptionFilter: | |
| """An encryption filter for ZARR data. | |
| This class is serialized and stored along with the Zarr it is used with, so instead | |
| of storing the encryption key, we store the hash of the encryption key, so it can be | |
| looked up in the key registry at run time as needed. | |
| Parameters | |
| ---------- | |
| key_hash: str | |
| The hex digest of an encryption key. A key may be generated using | |
| :func:`generate_encryption_key`. The hex digest is obtained by registering the | |
| key using :func:`register_encryption_key`. | |
| """ | |
| codec_id = "xchacha20poly1305" | |
| header = b"dClimate-Zarr" | |
| def __init__(self, encryption_key: str): | |
| if not isinstance(encryption_key, bytes) or len(encryption_key) != 32: | |
| raise ValueError("encryption_key must be 32 bytes") | |
| self.encryption_key = encryption_key | |
| def encode(self, buf): | |
| try: | |
| raw = io.BytesIO() | |
| raw.write(buf) | |
| nonce = get_random_bytes(24) # XChaCha uses 24 byte (192 bit) nonce | |
| cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) | |
| cipher.update(self.header) | |
| ciphertext, tag = cipher.encrypt_and_digest(raw.getbuffer()) | |
| return nonce + tag + ciphertext | |
| except Exception as e: | |
| raise ValueError(f"Encryption failed: {str(e)}") | |
| def decode(self, buf, out=None): | |
| try: | |
| nonce, tag, ciphertext = buf[:24], buf[24:40], buf[40:] | |
| cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) | |
| cipher.update(self.header) | |
| plaintext = cipher.decrypt_and_verify(ciphertext, tag) | |
| except ValueError: | |
| raise ValueError("Decryption failed: Invalid key or corrupted data") | |
| except Exception as e: | |
| raise ValueError(f"Decryption failed: {str(e)}") | |
| if out is not None: | |
| outbuf = io.BytesIO(plaintext) | |
| outbuf.readinto(out) | |
| return out | |
| return plaintext |
| [tool.uv] | ||
| dev-dependencies = [ | ||
| "pycryptodome", | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move pycryptodome to runtime dependencies.
Since the encryption functionality is part of the core feature set and not just for development/testing, pycryptodome should be moved to the runtime dependencies section.
Apply this diff to fix the dependency:
-[tool.uv]
-dev-dependencies = [
- "pycryptodome",
-]
+[project]
+dependencies = [
"dag-cbor>=0.3.3",
"msgspec>=0.18.6",
"multiformats[full]>=0.3.1.post4",
"requests>=2.32.3",
+ "pycryptodome>=3.20.0",
]Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
py_hamt/hamt.py (1)
615-638: Document special keys and transformer interface requirements.The implementation would benefit from:
- Documentation explaining why
.zarrayand.attributeskeys are special.- Type hints or interface definition for the transformer object.
class TransformedHamt(HAMTOriginal): """ A wrapper around the HAMT class that applies a transformation function when setting or getting items, except for specific keys like ".attributes". + + Special keys: + - ".zarray": Skipped during encoding to preserve Zarr metadata + - ".attributes": Skipped during decoding to preserve Zarr attributes + + The transformer object must implement: + - encode(value: bytes) -> bytes + - decode(value: bytes) -> bytes """ def __init__(self, *args, transformer=None, **kwargs): + """ + Args: + transformer: An object implementing encode/decode methods for value transformation + """ super().__init__(*args, **kwargs) self.transformer = transformer
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
py_hamt/hamt.py(2 hunks)tests/test_zarr_encryption_ipfs.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_encryption_ipfs.py
178-178: pytest.raises(Exception) should be considered evil
(B017)
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto
tests/test_zarr_encryption_ipfs.py
[error] 128-128: TypeError: Unsupported type for store_like: 'TransformedHamt'
[warning] UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
🔇 Additional comments (4)
tests/test_zarr_encryption_ipfs.py (3)
20-60: Enhance security by adding key validation and error handling.The
EncryptionFilterclass needs improvements:
- Add validation for the encryption key length (should be 32 bytes for ChaCha20-Poly1305).
- Add error handling for encryption/decryption failures.
🧰 Tools
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto
[warning] UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
63-113: LGTM! Well-structured test fixture.The fixture correctly generates test data and handles cleanup using the yield pattern.
🧰 Tools
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto
[warning] UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
128-129:⚠️ Potential issueFix TypeError with TransformedHamt store.
The pipeline failure indicates that xarray doesn't recognize TransformedHamt as a valid store type. This needs to be addressed to ensure compatibility.
Let's check xarray's documentation for supported store types:
🧰 Tools
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto
[error] 128-128: TypeError: Unsupported type for store_like: 'TransformedHamt'
[warning] UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
py_hamt/hamt.py (1)
126-135: LGTM! Clean factory pattern implementation.The factory method clearly handles the instantiation logic for both HAMT variants.
tests/test_zarr_encryption_ipfs.py
Outdated
| with pytest.raises(Exception): | ||
| xr.open_zarr(store=hamt1_read_bad) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use specific exception types instead of broad Exception catch.
Replace the broad Exception catch with the specific exception type that should be raised when using an incorrect encryption key.
- with pytest.raises(Exception):
+ with pytest.raises(ValueError, match="Decryption failed:"):
xr.open_zarr(store=hamt1_read_bad)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| with pytest.raises(Exception): | |
| xr.open_zarr(store=hamt1_read_bad) | |
| with pytest.raises(ValueError, match="Decryption failed:"): | |
| xr.open_zarr(store=hamt1_read_bad) |
🧰 Tools
🪛 Ruff (0.8.2)
178-178: pytest.raises(Exception) should be considered evil
(B017)
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto
[warning] UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
tests/test_zarr_encryption_ipfs.py (2)
20-60:⚠️ Potential issueEnhance security by adding key validation and error handling.
The
EncryptionFilterclass needs improvements:
- Add validation for the encryption key length (should be 32 bytes for ChaCha20-Poly1305).
- Add error handling for encryption/decryption failures.
Additionally, consider:
- Making the header configurable or using a unique value per instance to prevent potential replay attacks.
- Adding a version field to support future algorithm changes.
Apply this diff to improve the implementation:
+ def __init__(self, encryption_key: bytes, header: bytes = b"dClimate-Zarr"): + if not isinstance(encryption_key, bytes) or len(encryption_key) != 32: + raise ValueError("encryption_key must be 32 bytes") + self.encryption_key = encryption_key + self.header = header + self.version = 1 # For future algorithm changes def encode(self, buf): + try: raw = io.BytesIO() raw.write(buf) nonce = get_random_bytes(24) # XChaCha uses 24 byte (192 bit) nonce cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) cipher.update(self.header) ciphertext, tag = cipher.encrypt_and_digest(raw.getbuffer()) return nonce + tag + ciphertext + except Exception as e: + raise ValueError(f"Encryption failed: {str(e)}") def decode(self, buf, out=None): + try: nonce, tag, ciphertext = buf[:24], buf[24:40], buf[40:] cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) cipher.update(self.header) plaintext = cipher.decrypt_and_verify(ciphertext, tag) + except ValueError: + raise ValueError("Decryption failed: Invalid key or corrupted data") + except Exception as e: + raise ValueError(f"Decryption failed: {str(e)}")
173-178:⚠️ Potential issueUse specific exception types instead of broad Exception catch.
Replace the broad
Exceptioncatch with the specific exception type that should be raised when using an incorrect encryption key.Apply this diff to improve error handling:
- with pytest.raises(Exception, match="Codec ID not found in transformer"): + with pytest.raises(ValueError, match="Codec ID not found in transformer"): HAMT( store=IPFSStore(), root_node_id=hamt1_root, read_only=True, )
🧹 Nitpick comments (3)
py_hamt/hamt.py (3)
285-309: Improve error handling and performance in check_crypto.The
check_cryptomethod has several areas for improvement:
- Cache the codec_id check to avoid repeated attribute lookups.
- Use early returns to simplify the logic.
- Improve error messages.
Apply this diff to improve the implementation:
def check_crypto(self): - has_codec_id = hasattr(self, "codec_id") - found_filter = False + codec_id = getattr(self, "codec_id", None) + found_metadata = False for key in self: if key.endswith("/.zarray"): + found_metadata = True metadata_bytes = self[key] metadata = json.loads(metadata_bytes.decode("utf-8")) crypto = metadata.get("crypto", None) if crypto: - found_filter = True - if not has_codec_id: - raise Exception("Codec ID not found in transformer") - if crypto[0]["id"] != self.codec_id: - raise Exception("Codec ID does not match transformer") - if not found_filter and has_codec_id: - raise Exception("Codec ID found in transformer but no crypto found in metadata") + if not codec_id: + raise ValueError("Encrypted data found but no transformer provided") + if crypto[0]["id"] != codec_id: + raise ValueError(f"Expected codec ID '{codec_id}' but found '{crypto[0]['id']}'") + return + if codec_id and not found_metadata: + raise ValueError("Transformer provided but no encrypted data found")
667-674: Optimize dictionary key check.Using
dict.keys()is unnecessary when checking for key existence.Apply this diff to improve performance:
- if "crypto" not in metadata.keys(): + if "crypto" not in metadata: metadata['crypto'] = [] # Append codec filter if not already present if codec_filter not in metadata['crypto']: metadata['crypto'].append(codec_filter) # Encode back to JSON bytes return json.dumps(metadata).encode('utf-8')🧰 Tools
🪛 Ruff (0.8.2)
667-667: Use
key not in dictinstead ofkey not in dict.keys()Remove
.keys()(SIM118)
680-691: Simplify conditional logic.The conditional logic can be simplified by returning the negated condition directly.
Apply this diff to improve readability:
def _should_transform_key(self, key): - if any(meta_key in key for meta_key in self.ZARR_METADATA_KEYS): - return False # Don't transform metadata keys - - if self.INDEX_KEY_PATTERN.fullmatch(key): - return False # Don't transform single index keys like lat/0 - - return True # Transform everything else (e.g., chunked data like precip/1.0.0) + # Don't transform metadata keys or single index keys + return not ( + any(meta_key in key for meta_key in self.ZARR_METADATA_KEYS) + or self.INDEX_KEY_PATTERN.fullmatch(key) + )🧰 Tools
🪛 Ruff (0.8.2)
685-690: Return the negated condition directly
(SIM103)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
py_hamt/hamt.py(4 hunks)tests/test_zarr_encryption_ipfs.py(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto
tests/test_zarr_encryption_ipfs.py
[error] 129-129: TypeError: Unsupported type for store_like: 'TransformedHamt'
🪛 Ruff (0.8.2)
py_hamt/hamt.py
652-652: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
667-667: Use key not in dict instead of key not in dict.keys()
Remove .keys()
(SIM118)
685-690: Return the negated condition directly
(SIM103)
🔇 Additional comments (2)
tests/test_zarr_encryption_ipfs.py (1)
63-112: LGTM! Well-structured test fixture.The fixture is well-implemented with:
- Proper cleanup using
yieldandshutil.rmtree.- Realistic test data with appropriate shapes and coordinates.
- Clear documentation of return values.
py_hamt/hamt.py (1)
128-137: LGTM! Well-implemented factory pattern.The factory pattern is correctly implemented to create appropriate HAMT instances based on the presence of a transformer.
| transformer=EncryptionFilter(encryption_key=encryption_key), | ||
| encrypt_vars=["temp", "precip"], | ||
| ) | ||
| test_ds.to_zarr(store=hamt1, mode="w") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Fix TypeError: Unsupported type for store_like.
The pipeline failure indicates that test_ds.to_zarr(store=hamt1, mode="w") is failing because TransformedHamt is not a supported store type.
Run the following script to check if TransformedHamt implements all required methods for a zarr store:
🏁 Script executed:
#!/bin/bash
# Description: Check if TransformedHamt implements all required methods for a zarr store.
# Test: Search for required zarr store methods in TransformedHamt.
ast-grep --pattern $'class TransformedHamt {
$$$
__getitem__($$$) {
$$$
}
__setitem__($$$) {
$$$
}
__delitem__($$$) {
$$$
}
__iter__($$$) {
$$$
}
__len__($$$) {
$$$
}
$$$
}'Length of output: 218
I'll generate a new shell script to locate the definition of TransformedHamt and inspect its implementation to verify if it includes the required methods for a Zarr store.
Please run this script and share the output so we can verify whether TransformedHamt implements the required methods for a Zarr store.
🏁 Script executed:
#!/bin/bash
# 1. Find the file containing the definition of TransformedHamt.
FILE=$(rg -l "class TransformedHamt" . | head -n 1)
if [ -z "$FILE" ]; then
echo "No file found containing 'class TransformedHamt'"
exit 1
fi
echo "TransformedHamt class found in file: $FILE"
# 2. Print context of the class definition.
echo "Extracting class definition (20 lines context):"
rg -n "class TransformedHamt" -A 20 "$FILE"
# 3. Check for the required methods in that file.
echo "Searching for required methods in $FILE:"
echo "---- __getitem__ ----"
rg -n "def __getitem__" "$FILE"
echo "---- __setitem__ ----"
rg -n "def __setitem__" "$FILE"
echo "---- __delitem__ ----"
rg -n "def __delitem__" "$FILE"
echo "---- __iter__ ----"
rg -n "def __iter__" "$FILE"
echo "---- __len__ ----"
rg -n "def __len__" "$FILE"Length of output: 2274
TransformedHamt Implements the Required Methods but Lacks Proper Interface Inheritance
Our investigation confirms that the TransformedHamt class (in py_hamt/hamt.py) contains implementations for the standard mapping methods (__getitem__, __setitem__, __delitem__, __iter__, and __len__). However, despite this, the pipeline still fails with a TypeError ("Unsupported type for store_like") when passing an instance of this class to to_zarr.
• Though the methods exist, Zarr likely verifies the store by checking for a valid mutable mapping type—typically done via an inheritance from (or registration with) collections.abc.MutableMapping.
• It appears that TransformedHamt (or its base class HAMTOriginal) is not declared as a subclass of MutableMapping, so even with the required methods, it isn’t recognized as a valid Zarr store.
To resolve this issue, please review and modify the class inheritance hierarchy so that TransformedHamt (and/or its parent) explicitly extends collections.abc.MutableMapping. This should meet the interface expectations required by Zarr.
🧰 Tools
🪛 GitHub Actions: Triggered on push from TheGreatAlgo to branch/tag crypto
[error] 129-129: TypeError: Unsupported type for store_like: 'TransformedHamt'
| def __init__(self, *args, transformer=None, encrypt_vars=[], **kwargs): | ||
| self.transformer = transformer | ||
| self.codec_id = transformer.codec_id | ||
| self.encrypt_vars = encrypt_vars | ||
| super().__init__(*args, **kwargs) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix mutable default argument.
Using mutable default arguments can lead to unexpected behavior.
Apply this diff to fix the issue:
- def __init__(self, *args, transformer=None, encrypt_vars=[], **kwargs):
+ def __init__(self, *args, transformer=None, encrypt_vars=None, **kwargs):
self.transformer = transformer
self.codec_id = transformer.codec_id
- self.encrypt_vars = encrypt_vars
+ self.encrypt_vars = encrypt_vars if encrypt_vars is not None else []
super().__init__(*args, **kwargs)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def __init__(self, *args, transformer=None, encrypt_vars=[], **kwargs): | |
| self.transformer = transformer | |
| self.codec_id = transformer.codec_id | |
| self.encrypt_vars = encrypt_vars | |
| super().__init__(*args, **kwargs) | |
| def __init__(self, *args, transformer=None, encrypt_vars=None, **kwargs): | |
| self.transformer = transformer | |
| self.codec_id = transformer.codec_id | |
| self.encrypt_vars = encrypt_vars if encrypt_vars is not None else [] | |
| super().__init__(*args, **kwargs) |
🧰 Tools
🪛 Ruff (0.8.2)
652-652: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
py_hamt/hamt.py (1)
652-657:⚠️ Potential issueFix mutable default argument.
Using mutable default arguments can lead to unexpected behavior when the same list is shared across instances.
Apply this diff to fix the issue:
- def __init__(self, *args, transformer=None, encrypt_vars=[], **kwargs): + def __init__(self, *args, transformer=None, encrypt_vars=None, **kwargs): self.transformer = transformer self.codec_id = transformer.codec_id - self.encrypt_vars = encrypt_vars + self.encrypt_vars = encrypt_vars if encrypt_vars is not None else [] super().__init__(*args, **kwargs)🧰 Tools
🪛 Ruff (0.8.2)
652-652: Do not use mutable data structures for argument defaults
Replace with
None; initialize within function(B006)
🧹 Nitpick comments (3)
py_hamt/hamt.py (3)
285-309: Enhance error messages and optimize metadata check.The crypto validation is thorough, but consider these improvements:
- Error messages could be more descriptive by including the actual vs expected codec IDs
- The metadata check could be optimized by using
key not in metadatainstead ofkey not in metadata.keys()Apply this diff to improve the implementation:
- if "crypto" not in metadata.keys(): + if "crypto" not in metadata: metadata['crypto'] = [] if crypto: found_filter = True if not has_codec_id: - raise Exception("Codec ID not found in transformer") + raise Exception(f"Transformer required: crypto filter found in {key}") if crypto[0]["id"] != self.codec_id: - raise Exception("Codec ID does not match transformer") + raise Exception(f"Codec ID mismatch in {key}: expected {self.codec_id}, got {crypto[0]['id']}")
680-699: Simplify transformation logic and improve return statement.The transformation logic can be simplified and the return statement can be more direct.
Apply this diff to improve the implementation:
def _should_transform_key(self, key): """Determines whether the key should be transformed.""" if any(meta_key in key for meta_key in self.ZARR_METADATA_KEYS): return False # Don't transform metadata keys if self.INDEX_KEY_PATTERN.fullmatch(key): return False # Don't transform single index keys parts = key.split('/') metadata = self[parts[0] + "/.zarray"] if not metadata: return False metadata = json.loads(metadata.decode('utf-8')) crypto = metadata.get("crypto", []) - # If no crypto then don't transform data - if not crypto: - return False - return True # Transform everything else + return bool(crypto) # Transform only if crypto filters exist🧰 Tools
🪛 Ruff (0.8.2)
697-699: Return the negated condition directly
(SIM103)
658-674: Enhance metadata update robustness.The metadata update logic could be more robust by validating the crypto filter structure.
Apply this diff to improve the implementation:
def _update_metadata_crypto(self, value): """Update the metadata to include the codec in the crypto list.""" try: metadata = json.loads(value.decode('utf-8')) + if not isinstance(metadata, dict): + return value except (json.JSONDecodeError, UnicodeDecodeError, AttributeError): return value # Return original if not decodable codec_filter = {'id': self.codec_id} - if "crypto" not in metadata.keys(): + if "crypto" not in metadata: metadata['crypto'] = [] + elif not isinstance(metadata['crypto'], list): + metadata['crypto'] = [] if codec_filter not in metadata['crypto']: metadata['crypto'].append(codec_filter) return json.dumps(metadata).encode('utf-8')🧰 Tools
🪛 Ruff (0.8.2)
667-667: Use
key not in dictinstead ofkey not in dict.keys()Remove
.keys()(SIM118)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
py_hamt/hamt.py(4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
py_hamt/hamt.py
652-652: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
667-667: Use key not in dict instead of key not in dict.keys()
Remove .keys()
(SIM118)
697-699: Return the negated condition directly
(SIM103)
🔇 Additional comments (1)
py_hamt/hamt.py (1)
128-137: LGTM! Well-structured factory pattern implementation.The factory pattern implementation cleanly separates the HAMT variants and provides a good abstraction for client code.
| class HAMT: | ||
| def __new__(cls, store, transformer=None, **kwargs): | ||
| """ | ||
| Instantiating HAMT directly returns an instance of HAMT or TransformedHamt, | ||
| depending on whether a transformer is provided. | ||
| """ | ||
| if transformer: | ||
| return TransformedHamt(store=store, transformer=transformer, **kwargs) | ||
| return HAMTOriginal(store=store, **kwargs) | ||
|
|
||
|
|
||
| class HAMTOriginal(MutableMapping): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having to differentiate and add this layer of indirection, we can take in the transformer functions in the HAMT class initiation and then apply them right before values are either sent or retrieved form the store.
| [tool.uv] | ||
| dev-dependencies = [ | ||
| "pycryptodome", | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coderabbit also pointed this out, but yes this should be in the project dependencies rather than dev dependencies.
| class EncryptionFilter: | ||
| """An encryption filter for ZARR data. | ||
| This class is serialized and stored along with the Zarr it is used with, so instead | ||
| of storing the encryption key, we store the hash of the encryption key, so it can be | ||
| looked up in the key registry at run time as needed. | ||
| Parameters | ||
| ---------- | ||
| key_hash: str | ||
| The hex digest of an encryption key. A key may be generated using | ||
| :func:`generate_encryption_key`. The hex digest is obtained by registering the | ||
| key using :func:`register_encryption_key`. | ||
| """ | ||
|
|
||
| codec_id = "xchacha20poly1305" | ||
| header = b"dClimate-Zarr" | ||
|
|
||
| def __init__(self, encryption_key: str): | ||
| self.encryption_key = encryption_key | ||
|
|
||
| def encode(self, buf): | ||
| raw = io.BytesIO() | ||
| raw.write(buf) | ||
| nonce = get_random_bytes(24) # XChaCha uses 24 byte (192 bit) nonce | ||
| cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) | ||
| cipher.update(self.header) | ||
| ciphertext, tag = cipher.encrypt_and_digest(raw.getbuffer()) | ||
|
|
||
| return nonce + tag + ciphertext | ||
|
|
||
| def decode(self, buf, out=None): | ||
| nonce, tag, ciphertext = buf[:24], buf[24:40], buf[40:] | ||
| cipher = ChaCha20_Poly1305.new(key=self.encryption_key, nonce=nonce) | ||
| cipher.update(self.header) | ||
| plaintext = cipher.decrypt_and_verify(ciphertext, tag) | ||
|
|
||
| if out is not None: | ||
| outbuf = io.BytesIO(plaintext) | ||
| outbuf.readinto(out) | ||
| return out | ||
|
|
||
| return plaintext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than use a class, we should use a factory function that returns closures. This allows the transformers input on the HAMT class to be more flexible and accept a wider array of generic functions, rather than limit to either just this class type, or force clients to inherit from this class type.
Things to do:
It encrypts the entire thing right now. I have it working because I know what decryption is expected but for others its impossible. So i need to add filtering to ensure only the data keys are encrypted and not the metadata
I also need to force add the metadata for the encryption used so zarr.js on the frontend can understand
Summary by CodeRabbit
New Features
Chores
Tests