Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Parquet] Enable external material and rotation for encryption keys #25986

Closed
asfimport opened this issue Sep 10, 2020 · 4 comments · Fixed by #34181
Closed

[C++][Parquet] Enable external material and rotation for encryption keys #25986

asfimport opened this issue Sep 10, 2020 · 4 comments · Fixed by #34181

Comments

@asfimport
Copy link

asfimport commented Sep 10, 2020

Encryption key material keeps the information required to recover the keys by Parquet readers. Currently, only "internal storage mode" is supported, where the key material is stored inside Parquet file footers, and therefore can't be rotated (re-wrapped with refreshed master keys) in immutable files. The new mechanism in this Jira adds the advanced "external storage mode" for the key material, where the key material is stored in separate small files, that enable the key rotation. This mechanism will also implement the key rotation functionality.

Reporter: Gidon Gershinsky / @ggershinsky

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-9960. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Todd Farmer / @toddfarmer:
This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned per project policy. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

@wjones127
Copy link
Member

@adamreeve is working on this.

@wjones127
Copy link
Member

@adamreeve could you comment "take" on this issue? Once you do, this issue should be assigned to you and I can merge the PR. (Also I'm looking into improving this process.)

@adamreeve
Copy link
Contributor

take

wjones127 pushed a commit that referenced this issue Feb 27, 2023
…keys (#34181)

This PR is a replacement for #10491 which appears to be abandoned. I've fixed the issues pointed out in review comments as well as a few more things I noticed. I've also made one significant change to the API; rather than `CryptoFactory::RotateMasterKeys` operating on a whole directory, it rotates keys for a single file, as this is much more flexible and allows users to decide what files to rotate keys for, and there didn't appear to be a good reason for this to work at the whole directory level. This does mean the API diverges slightly from the parquet-mr API though.

### Rationale for this change

Use of external key material allows rotating master encryption keys without having to rewrite Parquet file data. See https://docs.google.com/document/d/1bEu903840yb95k9q2X-BlsYKuXoygE4VnMDl9xz_zhk/edit?usp=sharing for more details.

### What changes are included in this PR?

Adds support for writing and reading external key material for Parquet files from C++, as well as a new `CryptoFactory::RotateMasterKeys` function that allows re-encrypting key encryption keys or data encryption keys with latest versions of master keys.

### Are these changes tested?

Yes, unit tests are included. I've added an additional test that reads a file generated with parquet-mr (used by Spark) from the parquet-testing repository. This requires merging the PR at apache/parquet-testing#36 and updating the parquet-testing submodule before the new test will pass.

### Are there any user-facing changes?

Yes, the existing `internal_key_material` option in `parquet::encryption::EncryptionConfiguration` will now work and use external key material. This requires using two new parameters (`file_path` and `file_system`) in `CryptoFactory::GetFileEncryptionProperties` and `CryptoFactory::GetFileDecryptionProperties`, which are needed so that we know where to write/read the external key material. Note that this means external key material won't work from Python until the new parameters are exposed in Python too.

This changes the `CryptoFactory` ABI but the API is still source compatible. Is this okay or should new overloads be added that take the new parameters?

The `CryptoFactory::RotateMasterKeys` function is also a new public facing API.
* Closes: #25986

Lead-authored-by: Adam Reeve <adreeve@gmail.com>
Co-authored-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Maya Anderson <mayaa@il.ibm.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
@wjones127 wjones127 added this to the 12.0.0 milestone Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants