Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-1477: Thrift crypto updates #124

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
59 changes: 36 additions & 23 deletions src/main/thrift/parquet.thrift
Original file line number Diff line number Diff line change
Expand Up @@ -670,8 +670,8 @@ struct EncryptionWithColumnKey {
/** Column path in schema **/
1: required list<string> path_in_schema

/** Retrieval metadata of the column-specific key **/
2: optional binary column_key_metadata
/** Retrieval metadata of column encryption key **/
2: optional binary key_metadata
}

union ColumnCryptoMetaData {
Expand Down Expand Up @@ -707,7 +707,10 @@ struct ColumnChunk {
7: optional i32 column_index_length

/** Crypto metadata of encrypted columns **/
8: optional ColumnCryptoMetaData crypto_meta_data
8: optional ColumnCryptoMetaData crypto_metadata

/** Encrypted column metadata for this chunk **/
9: optional binary encrypted_column_metadata
}

struct RowGroup {
Expand All @@ -734,6 +737,9 @@ struct RowGroup {
/** Total byte size of all compressed (and potentially encrypted) column data
* in this row group **/
6: optional i64 total_compressed_size

/** Row group ordinal in the file **/
7: optional i16 ordinal
}

/** Empty struct to signal the order defined by the physical or logical type */
Expand Down Expand Up @@ -863,23 +869,27 @@ struct ColumnIndex {
}

struct AesGcmV1 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AesGcmV1 and AesGcmCtrV1 seems identical. If they are same, can we just use one structure instead of duplicating?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • we use them to specify two different algorithms. since we dont use enums, its a union with two structs
  • the design is signed off. we shouldn't change it, lets only make sure the Thrift pr conforms to the merged design.

/** Retrieval metadata of AAD used for encryption of pages and structures **/
1: optional binary aad_metadata
/** AAD prefix **/
1: optional binary aad_prefix

/** If file IVs are comprised of a fixed part, and variable parts
* (e.g. counter), keep the fixed part here **/
2: optional binary iv_prefix
/** Unique file identifier part of AAD suffix **/
2: optional binary aad_file_unique

/** In files encrypted with AAD prefix without storing it,
* readers must supply the prefix **/
3: optional bool supply_aad_prefix
}

struct AesGcmCtrV1 {
/** Retrieval metadata of AAD used for encryption of structures **/
1: optional binary aad_metadata

/** If file IVs are comprised of a fixed part, and variable parts
* (e.g. counter), keep the fixed part here **/
2: optional binary gcm_iv_prefix
/** AAD prefix **/
1: optional binary aad_prefix

3: optional binary ctr_iv_prefix
/** Unique file identifier part of AAD suffix **/
2: optional binary aad_file_unique

/** In files encrypted with AAD prefix without storing it,
* readers must supply the prefix **/
3: optional bool supply_aad_prefix
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are 'supply_aad_prefix' and 'aad_prefix'/'aad_file_unique' mutual exclusive? If yes, should we make it as union?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are not.

}

union EncryptionAlgorithm {
Expand Down Expand Up @@ -932,27 +942,30 @@ struct FileMetaData {
7: optional list<ColumnOrder> column_orders

/**
* Encryption algorithm. Note that this field is only used for files
* with plaintext footer. Files with encrypted footer store the algorithm id
* Encryption algorithm. This field is set only in encrypted files
* with plaintext footer. Files with encrypted footer store algorithm id
* in FileCryptoMetaData structure.
*/
8: optional EncryptionAlgorithm encryption_algorithm

/**
* Retrieval metadata of key used for signing the footer.
* Used only in encrypted files with plaintext footer.
*/
9: optional binary footer_signing_key_metadata

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not mentioned in the diagram 'Plaintext footer mode' of the design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

/** Crypto metadata for files with encrypted footer **/
struct FileCryptoMetaData {
/**
* Encryption algorithm. Note that this field is only used for files
* with encrypted footer. Files with plaintext footer store the algorithm id
* Encryption algorithm. This field is only used for files
* with encrypted footer. Files with plaintext footer store algorithm id
* inside footer (FileMetaData structure).
*/
1: required EncryptionAlgorithm encryption_algorithm

/** Retrieval metadata of key used for encryption of footer,
* and (possibly) columns **/
2: optional binary footer_key_metadata

/** Offset of encrypted Parquet footer **/
3: required i64 footer_offset

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'footer_offset' field still exists in the diagram at the section 'Encrypted footer mode' of the design document (https://github.com/ggershinsky/parquet-format/blob/p1232-encryption-docs/Encryption.md).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an outdated link. the current doc is at
https://github.com/apache/parquet-format/blob/encryption/Encryption.md

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the Parquet-1178(https://docs.google.com/document/d/1T89G7xR0zHFV1f2pjTO28jtfVm8qoNVGEJQ70Rsk-bY) to let it point to the current doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

2: optional binary key_metadata
}