Manifest list encryption #7770

Open
wants to merge 26 commits into
base: main
from

ggershinsky
Contributor

No description provided.

@ggershinsky ggershinsky marked this pull request as draft June 5, 2023 06:10
@@ -162,6 +162,15 @@ default Iterable<DeleteFile> removedDeleteFiles(FileIO io) {
*/
String manifestListLocation();

/**
* Return the size of this snapshot's manifest list. For encrypted tables, a verified plaintext
Contributor Author

fix comment

@ggershinsky ggershinsky marked this pull request as ready for review March 27, 2024 07:07
@@ -162,6 +162,25 @@ default Iterable<DeleteFile> removedDeleteFiles(FileIO io) {
*/
String manifestListLocation();

/**
* Return the size of this snapshot's manifest list file. Must be a verified value, taken from a
Member

I'm confused here: the base Snapshot also sets a default of -1, which seems to be allowed as an "unset" value. Should we mention that here, or is the size always required?

Contributor Author

We can define this field to be required only for encrypted tables. It will not be set in the snapshot file for unencrypted tables, where this method can return 0 (or -1; I'll make it consistent across all implementation classes).

if (manifestListKeyMetadata != null) { // encrypted manifest list file
Preconditions.checkArgument(
fileIO instanceof EncryptingFileIO,
"No encryption in FileIO class " + fileIO.getClass());
Member

Cannot read manifest list (%s) because it is encrypted but the configured FileIO (%s) does not implement EncryptingFileIO

EncryptingFileIO encryptingFileIO = (EncryptingFileIO) fileIO;
Preconditions.checkArgument(
encryptingFileIO.encryptionManager() instanceof StandardEncryptionManager,
"Encryption manager for encrypted manifest list files can currently only be an instance of "
Member

@RussellSpitzer RussellSpitzer Mar 28, 2024

Cannot decrypt manifest list (%s) because the encryption manager (%s) does not implement StandardEncryptionManager
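
The two suggested messages follow a "Cannot X because Y" pattern. A minimal sketch of that pattern in plain Java is below. It makes several assumptions: `checkArgument` is a hypothetical stand-in for the relocated Guava `Preconditions.checkArgument`, `EncryptingIO` is a placeholder marker interface rather than the real `EncryptingFileIO`, and the location string is illustrative.

```java
final class CannotBecauseSketch {
  /** Placeholder marker for an IO that supports encryption; not the real Iceberg interface. */
  interface EncryptingIO {}

  // Hypothetical stand-in for Preconditions.checkArgument(boolean, String, Object...).
  static void checkArgument(boolean condition, String template, Object... args) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(template, args));
    }
  }

  // Fails with a "Cannot X because Y" message when the IO cannot handle encrypted files.
  static EncryptingIO requireEncryptingIO(Object fileIO, String manifestListLocation) {
    checkArgument(
        fileIO instanceof EncryptingIO,
        "Cannot read manifest list (%s) because it is encrypted but the configured FileIO (%s) "
            + "does not implement EncryptingFileIO",
        manifestListLocation,
        fileIO.getClass().getName());
    return (EncryptingIO) fileIO;
  }
}
```

The message names the failed action first and the cause second, so a user reading the stack trace sees both the affected file and the misconfigured class.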

generator.writeStringField(MANIFEST_LIST_KEY_METADATA, snapshot.manifestListKeyMetadata());
}

// TODO discuss: do we need to sign the size value? Or sign the whole snapshot?
Member

How would this attack work? Wouldn't the user also need the key to encrypt the replacement files? I thought we were storing the metadata.json key in the catalog so an attacker could replace everything but still not be able to trick a client using the catalog.

Contributor Author

This is the question. Some thoughts on the scenarios and protection options:

  • Currently, we don't have a metadata.json key. We have only a key for the snapshot's manifest list file. Besides using it for encrypting the manifest list file, we can also use this key for signing the snapshot's sensitive parts, like the manifest list size field, or for signing the whole metadata.json file (should be possible with some effort). Then we also protect the integrity of e.g. the table properties (like the table key id).
  • The snapshot (metadata.json file) doesn't keep secret values, so encrypting it might not be required. The signatures mentioned above would be kept in added snapshot fields, which is sufficient for detecting file modification attacks.
  • These protection techniques are not required with the REST catalog, because we trust the catalog service (we don't trust the storage service). Since the whole snapshot is stored in the REST catalog, we don't need to sign anything.
  • The manifest list key is not stored in the catalog. Instead, it is wrapped in a KMS with the table master key and stored in the snapshot MANIFEST_LIST_KEY_METADATA field. Only the users/processes that are KMS-authorized for the table key will be able to get the manifest list key.
  • In catalogs other than REST, the signatures provide only partial protection, because the metadata.json is kept in the untrusted storage. With the signatures, it can't be modified undetected. But the whole folder can be replaced (e.g. a replay attack, where all table files are removed and replaced with files of an older version of the table). To prevent this attack in non-REST catalogs, we will have to update the catalog for each table snapshot (setting e.g. the latest table version/sequence number, or a random AAD prefix).
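
The signing option from the list above could look roughly like the sketch below. It is an illustration only, not code from this PR. The class name `ManifestListSizeSigner`, the choice of HMAC-SHA256, and the decision to bind the location together with the size are all assumptions. As noted above, such a signature detects modification of the signed fields but does not stop a full folder replacement.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class ManifestListSizeSigner {
  private static final String ALGORITHM = "HmacSHA256"; // assumed algorithm

  // Computes an HMAC over the manifest list location and size, keyed with the manifest list key.
  static byte[] sign(byte[] manifestListKey, String location, long size) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(new SecretKeySpec(manifestListKey, ALGORITHM));
      mac.update(location.getBytes(StandardCharsets.UTF_8));
      mac.update((byte) 0); // separator between the location and the size
      mac.update(ByteBuffer.allocate(Long.BYTES).putLong(size).array());
      return mac.doFinal();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Cannot sign manifest list size", e);
    }
  }

  // Recomputes the signature and compares it with the stored one.
  static boolean verify(byte[] manifestListKey, String location, long size, byte[] signature) {
    return MessageDigest.isEqual(sign(manifestListKey, location, size), signature);
  }
}
```

A reader that holds the unwrapped manifest list key would call `verify` with the size read from the snapshot and reject the snapshot if it returns false.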

&& encryptedManifestList.keyMetadata().buffer() != null) {
Preconditions.checkArgument(
encryptionManager instanceof StandardEncryptionManager,
"Encryption manager for encrypted manifest list files can currently only be an instance of "
Member

similar comment as above. "Cannot X because Y"

@@ -85,7 +85,7 @@ public class TestManifestEncryption {

private static final DataFile DATA_FILE =
new GenericDataFile(
0,
SPEC.specId(),
Member

?

Contributor Author

completing the previous patch #8252 (comment)

private static final long EXISTING_ROWS = 857273L;
private static final int DELETED_FILES = 1;
private static final long DELETED_ROWS = 22910L;
private static final List<ManifestFile.PartitionFieldSummary> PARTITION_SUMMARIES =
Member

We probably need a test example which has a non-empty list of partition field summaries


@Test
public void testV2Write() throws IOException {
ManifestFile manifest = writeAndReadManifestList();
Member

nit: writeAndReadEncryptedManifestList

public void testV2Write() throws IOException {
ManifestFile manifest = writeAndReadManifestList();

// all v2 fields should be read correctly
Member

AssertJ has a helper for this, though I'm not sure it is the right fit here:

assertThat(actual).usingRecursiveComparison().isEqualTo(expected);

Contributor Author


// TODO discuss: do we need to sign the size value? Or sign the whole snapshot?
// Or rely on REST catalog? - the only option that prevents "full folder replacement" attack.
if (snapshot.manifestListSize() >= 0) {
Member

Another small question here: we are essentially doing a transform, where manifest list sizes < 0 become 0.

Also, nit: we ignore 0s that get passed through, although a missing value will be read back as 0.

Just wondering what the intent here is. I think it may be better to have a defined missing value? Not sure

Contributor Author

Yep, part of this thread #7770 (comment)
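
One way to get the "defined missing value" raised in this thread is sketched below. It is a hypothetical illustration, not the PR's implementation: the field name, the `UNSET` constant, and the use of a `Map` as a stand-in for the JSON generator and parser are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ManifestListSizeField {
  static final String FIELD = "manifest-list-size"; // hypothetical field name
  static final long UNSET = -1L; // single sentinel shared by the writer and the reader

  // Writes the field only when a size is set, so unencrypted tables omit it.
  static void write(Map<String, Object> snapshotJson, long size) {
    if (size == UNSET) {
      return;
    }
    if (size < 0) {
      throw new IllegalArgumentException("Cannot write negative manifest list size: " + size);
    }
    snapshotJson.put(FIELD, size);
  }

  // Returns UNSET when the field is missing, so a real size of 0 is never confused with "absent".
  static long read(Map<String, Object> snapshotJson) {
    Object value = snapshotJson.get(FIELD);
    return value == null ? UNSET : ((Number) value).longValue();
  }

  // Round trip through a fresh map, as a stand-in for serializing and parsing a snapshot.
  static long roundTrip(long size) {
    Map<String, Object> json = new LinkedHashMap<>();
    write(json, size);
    return read(json);
  }
}
```

With one sentinel on both sides, a value written as unset is read back as unset, and a written 0 stays 0.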

Member

@RussellSpitzer RussellSpitzer left a comment

Is it important that we store the manifest list size? Won't the encryption be enough to prove the file is the right one?

@ggershinsky
Contributor Author

Yep, this is due to https://github.com/apache/iceberg/blob/main/format/gcm-stream-spec.md#file-length . Table modification attacks become possible if this field is not stored safely.
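
To illustrate why encryption alone may not be enough, here is a simplified sketch of a block-wise AES-GCM stream. It is not the format from the linked spec: it omits the file header and the per-block AAD, and the class and method names are assumptions. Each block authenticates on its own, so dropping whole trailing blocks goes unnoticed unless the reader also compares against a length taken from a trusted source.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class TruncationSketch {
  private static final int NONCE_LENGTH = 12;
  private static final int TAG_BITS = 128;
  private static final SecureRandom RANDOM = new SecureRandom();

  // Encrypts one block independently; the result is nonce || ciphertext || tag.
  static byte[] encryptBlock(byte[] key, byte[] plaintext) {
    try {
      byte[] nonce = new byte[NONCE_LENGTH];
      RANDOM.nextBytes(nonce);
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(
          Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TAG_BITS, nonce));
      byte[] ciphertext = cipher.doFinal(plaintext);
      byte[] block = new byte[NONCE_LENGTH + ciphertext.length];
      System.arraycopy(nonce, 0, block, 0, NONCE_LENGTH);
      System.arraycopy(ciphertext, 0, block, NONCE_LENGTH, ciphertext.length);
      return block;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Cannot encrypt block", e);
    }
  }

  // Decrypts every block and returns the total plaintext length; each block verifies alone.
  static long decryptedLength(byte[] key, List<byte[]> blocks) {
    long total = 0;
    try {
      for (byte[] block : blocks) {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(
            Cipher.DECRYPT_MODE,
            new SecretKeySpec(key, "AES"),
            new GCMParameterSpec(TAG_BITS, block, 0, NONCE_LENGTH));
        total += cipher.doFinal(block, NONCE_LENGTH, block.length - NONCE_LENGTH).length;
      }
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Cannot decrypt block", e);
    }
    return total;
  }

  // The extra check that needs a trusted length: detects missing trailing blocks.
  static void checkLength(long trustedLength, long actualLength) {
    if (trustedLength != actualLength) {
      throw new IllegalStateException(
          String.format(
              "Cannot read manifest list because its length (%d) does not match the trusted length (%d)",
              actualLength, trustedLength));
    }
  }
}
```

If an attacker removes the last block, `decryptedLength` still succeeds on the remaining blocks; only `checkLength` with a trusted value exposes the truncation.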

this.v1ManifestLocations = v1ManifestLocations;
this.manifestListKeyMetadata = null;
Member

Same as comment above (group manifest vars)

* In encrypted tables, return the size of this snapshot's manifest list file. Must be a verified
* value, taken from a trusted source. In unencrypted tables, can return 0.
*/
default long manifestListSize() {
Contributor

Rather than adding new methods for each piece of new information we want to pass, what about adding a ManifestList object that contains the location, size, and key metadata?
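
A rough sketch of such a grouping object follows. The class name `ManifestListInfo` and its accessors are hypothetical and are not the interface that this PR ends up defining; the sketch only shows the idea of passing location, size, and key metadata as one immutable value.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

final class ManifestListInfo {
  private final String location;
  private final long size;
  private final ByteBuffer keyMetadata; // null for unencrypted manifest lists

  ManifestListInfo(String location, long size, ByteBuffer keyMetadata) {
    this.location = Objects.requireNonNull(location, "location");
    this.size = size;
    if (keyMetadata == null) {
      this.keyMetadata = null;
    } else {
      // Copy the bytes so later changes to the caller's buffer do not leak in.
      ByteBuffer source = keyMetadata.duplicate();
      byte[] bytes = new byte[source.remaining()];
      source.get(bytes);
      this.keyMetadata = ByteBuffer.wrap(bytes).asReadOnlyBuffer();
    }
  }

  String location() {
    return location;
  }

  long size() {
    return size;
  }

  // Returns an independent read-only view, so callers cannot move the stored position.
  ByteBuffer keyMetadata() {
    return keyMetadata == null ? null : keyMetadata.duplicate();
  }

  boolean isEncrypted() {
    return keyMetadata != null;
  }
}
```

With one object, a new piece of manifest list information becomes a new accessor on this type rather than another method on `Snapshot`.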


import java.nio.ByteBuffer;

public interface ManifestListFile {
Contributor

What happened to length or size?

Contributor Author

It is now kept in the key metadata.
Also, we need it in one place only - when decrypting the manifest list file (stream).

import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.apache.iceberg.util.ByteBuffers;

public class BaseManifestListFile implements ManifestListFile, Serializable {
Contributor

This shouldn't be public. Is there a reason why the implementation must be accessible other than to SnapshotParser?

Contributor Author

Per a comment below, this class is now used in EncryptionUtil (a different package).

private final ByteBuffer encryptedKeyMetadata;
private ByteBuffer keyMetadata;

public BaseManifestListFile(
Contributor

I think this should be package private as well.

Contributor Author

@ggershinsky ggershinsky Aug 4, 2024

Per a comment below, this constructor is now called from EncryptionUtil (in a different package).

@@ -53,6 +55,7 @@ class BaseSnapshot implements Snapshot {
private transient List<DeleteFile> addedDeleteFiles = null;
private transient List<DeleteFile> removedDeleteFiles = null;

/** Tests only */
Contributor

I think there is an annotation that should be used instead, VisibleForTesting.

+ "FileIO (%s) does not implement EncryptingFileIO",
manifestListFile.location(),
fileIO.getClass());
EncryptingFileIO encryptingFileIO = (EncryptingFileIO) fileIO;
Contributor

@rdblue rdblue Aug 1, 2024

I think that this should add a method to FileIO to read a ManifestList, just like the pattern that we introduced for manifests. Here's my previous comment: #7770 (comment)

Adding a method avoids needing to check and cast a FileIO here.

Contributor Author

EncryptingFileIO is in the api module. Do we want to move it to the core module, or create an extending class there? So we'll have access to the functionality required for manifest list encryption (KEKs/cache, and file length in key metadata).
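
The "add a method instead of check-and-cast" idea from this thread can be sketched with a default interface method. Everything below is hypothetical: `SimpleFileIO`, `EncryptingIO`, and `openManifestList` are placeholder names, and strings stand in for input streams. It does not address the api/core module question raised above.

```java
final class ManifestListReadSketch {
  /** Hypothetical slice of a FileIO-like interface. */
  interface SimpleFileIO {
    // Default behavior: plain reads work, encrypted reads fail with a clear message.
    default String openManifestList(String location, boolean encrypted) {
      if (encrypted) {
        throw new UnsupportedOperationException(
            String.format(
                "Cannot read manifest list (%s) because it is encrypted but the configured FileIO (%s) "
                    + "does not support encryption",
                location, getClass().getName()));
      }
      return "plain:" + location;
    }
  }

  /** Plain implementation that inherits the default method. */
  static final class PlainIO implements SimpleFileIO {}

  /** Encrypting implementation that overrides the method and handles both cases itself. */
  static final class EncryptingIO implements SimpleFileIO {
    @Override
    public String openManifestList(String location, boolean encrypted) {
      return (encrypted ? "decrypted:" : "plain:") + location;
    }
  }

  // The caller never checks the concrete type and never casts.
  static String read(SimpleFileIO io, String location, boolean encrypted) {
    return io.openManifestList(location, encrypted);
  }
}
```

The caller's `instanceof` check and cast disappear because the decision moves into the implementation that knows how to decrypt.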

manifestListFile.location(),
encryptingFileIO.encryptionManager().getClass());
StandardEncryptionManager standardEncryptionManager =
(StandardEncryptionManager) encryptingFileIO.encryptionManager();
Contributor

Similar to the comment just above, if this is done in EncryptingFileIO then there is no need to cast.

Contributor Author

EncryptingFileIO is in the api module. Do we want to move it to the core module, or create an extending class there? So we'll have access to the functionality required for manifest list encryption (KEKs/cache, and file length in key metadata).

ByteBuffer.wrap(
keyDecryptor.decrypt(
ByteBuffers.toByteArray(manifestListFile.encryptedKeyMetadata()), null));
((BaseManifestListFile) manifestListFile).setDecryptedKeyMetadata(manifestListKeyMetadata);
Contributor

What is the purpose of keeping the encryption key in the ManifestListFile after it has been decrypted?

I don't think this should be casting the instance.

Contributor Author

SGTM, we can drop this.

return keyID;
}

public byte[] key() {
Contributor

I don't think this should hold the key. The EncryptionManager can provide the key multiple times by caching results from unwrapKey. Callers should not hold on to the key any longer than needed.

Contributor Author

This class is a KEK cache entry. The cache is kept in the EncryptionManager.

Contributor

In that case, why is this class public? I would expect it to be a private static class with the EncryptionManager implementation.

*/
public StandardEncryptionManager(
String tableKeyId, int dataKeyLength, KeyManagementClient kmsClient) {
String tableKeyId, int dataKeyLength, KeyManagementClient kmsClient, long kekCacheTimeout) {
Contributor

Do we need a new constructor? Seems like we're just going to use a default value and almost never change it.

Contributor Author

Different organizations can set the kekCacheTimeout differently, to address their particular security requirements. But that's probably an advanced feature. We can start with a default value; and make it configurable later - if needed (if we get a request from the community). What do you think?

Contributor Author

If Flink produces a new snapshot every 2 minutes, we will have by default 144 KEKs a day, growing to a very large number over days and weeks. Having all of them unwrapped by the KMS in readers can be very expensive. We might want to keep this parameter configurable.
(It is now renamed to "writer KEK timeout".)

Contributor Author

Actually, we have an explicit NIST recommendation for this in https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57pt1r5.pdf :
"The recommended originator-usage period for a symmetric key-wrapping key that is used to wrap very large numbers of keys over a short period of time is on the order of a day or a week. If a relatively small number of keys are to be wrapped under a key-wrapping key, the originator-usage period of the key-wrapping key could be up to two years."

@@ -51,6 +69,8 @@ public StandardEncryptionManager(
this.tableKeyId = tableKeyId;
this.kmsClient = kmsClient;
this.dataKeyLength = dataKeyLength;
this.kekCacheTimeout = kekCacheTimeout;
this.kekCache = Maps.newHashMap();
Contributor

Should this use a Cache that has a time-based eviction policy? (See the Caffeine javadocs)

   LoadingCache<String, ByteBuffer> kekCache = Caffeine.newBuilder()
       .expireAfterWrite(kekCacheTimeout, TimeUnit.MINUTES)
       .build(key -> unwrapKey(key));

Contributor Author

Probably kekCacheTimeout is not an accurate name - the parameter basically means how long we can use a KEK in the writer, before we need to generate a new one. I'll rename it.
We need previous keys in the cache, so that readers don't call the KMS again.
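
The distinction described here, a timeout that limits how long writers reuse a KEK rather than how long a KEK stays cached, could be sketched as follows. This is an illustration under assumptions, not the PR's code: the class and member names are hypothetical, KEKs are generated locally instead of being wrapped by a KMS, and the clock is injected so the behavior can be checked.

```java
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

final class WriterKekRotationSketch {
  /** Hypothetical cache entry: KEK ID, key bytes, and creation time. */
  static final class Kek {
    final String id;
    final byte[] key;
    final long createdAtMillis;

    Kek(String id, byte[] key, long createdAtMillis) {
      this.id = id;
      this.key = key;
      this.createdAtMillis = createdAtMillis;
    }
  }

  private final Map<String, Kek> keks = new LinkedHashMap<>(); // insertion order, latest is last
  private final long writerKekTimeoutMillis;
  private final LongSupplier clock;
  private final SecureRandom random = new SecureRandom();
  private Kek current;

  WriterKekRotationSketch(long writerKekTimeoutMillis, LongSupplier clock) {
    this.writerKekTimeoutMillis = writerKekTimeoutMillis;
    this.clock = clock;
  }

  // Returns the KEK writers should use, generating a new one once the current KEK is too old.
  synchronized Kek currentKek() {
    long now = clock.getAsLong();
    if (current == null || now - current.createdAtMillis >= writerKekTimeoutMillis) {
      byte[] key = new byte[16];
      random.nextBytes(key);
      current = new Kek("kek-" + keks.size(), key, now);
      keks.put(current.id, current);
    }
    return current;
  }

  // Readers look up older KEKs by ID; this sketch never evicts entries by time.
  synchronized Kek kek(String id) {
    return keks.get(id);
  }
}
```

A cache with time-based eviction would drop the older entries that readers still need, which is why the timeout here only triggers rotation for writers.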

}
}

public KeyEncryptionKey currentKEK() {
Contributor

I don't think this should be handled by the encryption manager. The current key is determined by the table not the encryption implementation.

Contributor Author

This is the KEK that is currently used by the writers (to encrypt the metadata of manifest list files). This KEK is the latest in the table KEK cache (kept in the encryption manager object).

return result;
}

public Map<String, KeyEncryptionKey> kekCache() {
Contributor

This should not be exposed. I don't think that this class needs any new public methods.

Contributor Author

OK, I'll move this to EncryptionUtil.

tableKeyId, dataKeyLength, kmsClient, CatalogProperties.KEK_CACHE_TIMEOUT_MS_DEFAULT);
}

public static EncryptionManager createEncryptionManager(
Contributor

Now that this is passing extra options, it seems like this should just use the original method signature that extracts the key length and other settings (like cache timeout).

Contributor Author

The original signature took the table properties. The new parameter (kekCacheTimeout, to be renamed to something like writerKekTimeout) is taken from the catalog properties. But if we use a default value (per the other comments), then no need to pass it here for now.

return encryptedKeyMetadata;
}

public void setDecryptedKeyMetadata(ByteBuffer decryptedKeyMetadata) {
Contributor

I don't think this class should track the unencrypted key metadata. It can be decrypted when used.

Contributor Author

SGTM

}

static ManifestListFile create(
String location, EncryptionManager em, EncryptionKeyMetadata keyMetadata, long length) {
Contributor

I think this is a method that should live in EncryptionUtil rather than here. This also needs the current key encryption key to be passed in.

Contributor Author

@ggershinsky ggershinsky Aug 4, 2024

> this is a method that should live in EncryptionUtil rather than here

SGTM

> This also needs the current key encryption key to be passed in.

Per the other comments, the current KEK is the latest entry in the KEK cache, kept in the StandardEncryptionManager (accessible by the EncryptionUtil, but not by the SnapshotProducer that calls this "create manifest list file" method).

@ggershinsky ggershinsky closed this Aug 4, 2024
@ggershinsky ggershinsky reopened this Aug 4, 2024
@@ -143,7 +172,13 @@ private void cacheManifests(FileIO fileIO) {

if (allManifests == null) {
// if manifests isn't set, then the snapshotFile is set and should be read to get the list
this.allManifests = ManifestLists.read(fileIO.newInputFile(manifestListLocation));
InputFile manifestListInputFile = fileIO.newInputFile(manifestListFile.location());
Contributor Author

I'll move this under the if/else, so it is called only for unencrypted manifest lists.

String metadataEncryptionKeyID();

/** Returns the manifest list key metadata, encrypted with its KEK. */
ByteBuffer encryptedKeyMetadata();
Contributor

I don't think that these should expose both encrypted and unencrypted versions of key metadata. If this object stores the encrypted key metadata, then it should only expose the encrypted version. Similarly, other classes that store the unencrypted metadata should not be responsible for encrypting. These are simple classes and should stay that way rather than being responsible for encryption or carrying multiple versions of the same thing.

@ggershinsky
Contributor Author

@rdblue Thanks for the patch, I've merged it into this PR. We still need to sync on caching the unwrapped keys. I've added a commit that implements one way of doing this and would appreciate your review.

4 participants