-
Notifications
You must be signed in to change notification settings - Fork 0
Auto coordinate and variable exclusion in encryption transformers for zarr v3 #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request introduces a new parameter, Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Transformer
participant DecryptionHandler
Caller->>Transformer: Call create_zarr_encryption_transformers(encryption_key, header, exclude_vars, detect_exclude)
Transformer->>Transformer: Adjust exclude_vars using detect_exclude parameter
alt Encryption succeeds
Transformer-->>Caller: Return configured transformer
else Decryption error occurs
Transformer->>DecryptionHandler: Handle error and check detect_exclude mode
DecryptionHandler-->>Transformer: Add unencrypted variable to exclusion list (if "auto-from-read")
Transformer-->>Caller: Return error response or updated transformer
end
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #41 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 5 5
Lines 531 561 +30
=========================================
+ Hits 531 561 +30 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/test_zarr_ipfs.py (2)
206-211: Empty header is acceptable but consider broader test coverage.
Defining a deliberately empty header for testing is reasonable, but you may expand test coverage by using various header values to demonstrate diverse encryption scenarios.
229-247: Use a more specific exception instead ofException.
Raising or catchingExceptionis discouraged (B017). Prefer a more precise exception type (e.g.ValueError,OSError, or a custom class) for clearer error handling and fewer false positives.- with pytest.raises(Exception): + with pytest.raises(ValueError):🧰 Tools
🪛 Ruff (0.8.2)
234-234:
pytest.raises(Exception)should be considered evil(B017)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
py_hamt/zarr_encryption_transformers.py(5 hunks)tests/test_zarr_ipfs.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_zarr_ipfs.py
234-234: pytest.raises(Exception) should be considered evil
(B017)
🔇 Additional comments (4)
tests/test_zarr_ipfs.py (2)
190-192: Usedetect_excludeeffectively for coordinate variables.
By providingtest_dstodetect_exclude, all coordinate variables intest_dswill be auto-excluded from encryption. This aligns well with the partial encryption strategy.
199-199: Good approach to ensure read-only usage.
Re-instantiatingIPFSZarr3withread_only=Trueis a clean way to verify the dataset can be read correctly from immutable storage.py_hamt/zarr_encryption_transformers.py (2)
74-78: Auto-adding coordinates is correct and convenient.
Including all coordinates fromdetect_excludedatasets inexclude_var_setensures typical dimension variables remain unencrypted. This maintains partial encryption consistency.
105-152: Robust partial-encryption handling.
The logic correctly handles unencrypted metadata, gracefully falls back for partially encrypted datasets when in"auto-from-read"mode, and re-raises otherwise. This approach helps maintain functionality even if some variables are left unencrypted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
py_hamt/zarr_encryption_transformers.py (1)
137-152: Good resilience for mixed encrypted/unencrypted datasets.The error handling for decryption failures when
detect_exclude = "auto-from-read"allows for graceful recovery when encountering unencrypted variables in a dataset. This makes the solution more robust for real-world scenarios where partial encryption is used.However, consider catching specific encryption-related exceptions rather than all exceptions to avoid masking unrelated errors.
- except Exception as e: + except (ValueError, KeyError) as e: # Or specific ChaCha20_Poly1305 exceptions
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
py_hamt/zarr_encryption_transformers.py(5 hunks)
🔇 Additional comments (7)
py_hamt/zarr_encryption_transformers.py (7)
16-18: Fix mismatch between docstring and parameter type.The docstring (line 33) states "set
detect_excludeto None" to disable auto detection, but the type definition only allowsFalse. Update either the parameter type to includeNoneor update the docstring to maintain consistency.
74-78: Well implemented automatic coordinate detection from Dataset.The implementation correctly extracts coordinate variables from the provided dataset and adds them to the exclusion set. This provides a convenient way for users to automatically exclude coordinate variables without having to list them manually.
79-92: Good approach for variable identification in zarr v3 paths.The implementation correctly extracts the variable name from the zarr v3 path structure by finding the first directory name. This is a robust way to identify which variables should be excluded from encryption.
🧰 Tools
🪛 Ruff (0.8.2)
89-92: Return the condition
not key[-9:] == "zarr.json"directlyReplace with
return not key[-9:] == "zarr.json"(SIM103)
105-133: Great feature for coordinate auto-detection from metadata.The implementation intelligently extracts coordinate variables from zarr metadata files, including both consolidated metadata and individual variable metadata. This provides a fallback mechanism for detecting coordinates when a dataset isn't provided upfront.
31-34: Comprehensive documentation of the new feature.The documentation clearly explains the purpose and behavior of the
detect_excludeparameter, including its three possible values. This helps users understand and utilize this powerful feature appropriately.
37-60: Helpful example demonstrating auto-detection features.The example code effectively demonstrates how to use both the explicit coordinate exclusion feature with a dataset and the automatic detection feature when reading with incorrect encryption parameters. This gives users a clear pattern to follow.
105-106:❓ Verification inconclusive
Be cautious about thread safety.
The
seen_metadataset is a mutable state shared between multiple decrypt calls. This could cause issues in multi-threaded environments.Consider if this function might be used in multi-threaded contexts and if so, document this limitation or implement a thread-safe approach.
🏁 Script executed:
#!/bin/bash # Search for any threading or concurrent uses in the codebase rg -i "thread|concurrent|asyncio|parallel" --type pyLength of output: 1105
THREAD SAFETY REVIEW: Revisit Concurrency Considerations in
py_hamt/zarr_encryption_transformers.py
- The mutable state (
seen_metadata: set[str] = set()) is shared across decrypt calls. This could lead to race conditions if these calls are executed concurrently.- While the rest of the codebase (e.g., in
py_hamt/hamt.py) explicitly addresses thread safety—using locks and clear documentation—this file does not currently implement any synchronization.- Action Required:
- Determine whether these decryption routines may be invoked from multiple threads in your use cases.
- If concurrent access is possible:
- Consider adding synchronization (e.g., wrapping access to
seen_metadatawith aLock), or- Clearly document the thread-safety limitation to set proper expectations.
Summary by CodeRabbit
New Features
Documentation