New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++][Parquet] Enable external material and rotation for encryption keys #25986
Comments
Todd Farmer / @toddfarmer: |
@adamreeve is working on this. |
@adamreeve could you comment "take" on this issue? Once you do, this issue should be assigned to you and I can merge the PR. (Also I'm looking into improving this process.) |
take |
…keys (#34181) This PR is a replacement for #10491 which appears to be abandoned. I've fixed the issues pointed out in review comments as well as a few more things I noticed. I've also made one significant change to the API; rather than `CryptoFactory::RotateMasterKeys` operating on a whole directory, it rotates keys for a single file, as this is much more flexible and allows users to decide what files to rotate keys for, and there didn't appear to be a good reason for this to work at the whole directory level. This does mean the API diverges slightly from the parquet-mr API though. ### Rationale for this change Use of external key material allows rotating master encryption keys without having to rewrite Parquet file data. See https://docs.google.com/document/d/1bEu903840yb95k9q2X-BlsYKuXoygE4VnMDl9xz_zhk/edit?usp=sharing for more details. ### What changes are included in this PR? Adds support for writing and reading external key material for Parquet files from C++, as well as a new `CryptoFactory::RotateMasterKeys` function that allows re-encrypting key encryption keys or data encryption keys with latest versions of master keys. ### Are these changes tested? Yes, unit tests are included. I've added an additional test that reads a file generated with parquet-mr (used by Spark) from the parquet-testing repository. This requires merging the PR at apache/parquet-testing#36 and updating the parquet-testing submodule before the new test will pass. ### Are there any user-facing changes? Yes, the existing `internal_key_material` option in `parquet::encryption::EncryptionConfiguration` will now work and use external key material. This requires using two new parameters (`file_path` and `file_system`) in `CryptoFactory::GetFileEncryptionProperties` and `CryptoFactory::GetFileDecryptionProperties`, which are needed so that we know where to write/read the external key material. Note that this means external key material won't work from Python until the new parameters are exposed in Python too. This changes the `CryptoFactory` ABI but the API is still source compatible. Is this okay or should new overloads be added that take the new parameters? The `CryptoFactory::RotateMasterKeys` function is also a new public facing API. * Closes: #25986 Lead-authored-by: Adam Reeve <adreeve@gmail.com> Co-authored-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Maya Anderson <mayaa@il.ibm.com> Signed-off-by: Will Jones <willjones127@gmail.com>
Encryption key material keeps the information required to recover the keys by Parquet readers. Currently, only "internal storage mode" is supported, where the key material is stored inside Parquet file footers, and therefore can't be rotated (re-wrapped with refreshed master keys) in immutable files. The new mechanism in this Jira adds the advanced "external storage mode" for the key material, where the key material is stored in separate small files, that enable the key rotation. This mechanism will also implement the key rotation functionality.
Reporter: Gidon Gershinsky / @ggershinsky
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-9960. Please see the migration documentation for further details.
The text was updated successfully, but these errors were encountered: