PARQUET-1450: Detailed crypto specification #114

ggershinsky · 2018-10-25T10:58:54Z

No description provided.

Encryption.md

zivanfi · 2018-10-25T13:49:25Z

Encryption.md

-
-Parquet-mr/-cpp implementations use the RBG-based IV construction as defined in the NIST 
-SP 800-38D document for the GCM ciphers (section 8.2.2).
+The initialization vectors (IVs) in AES GCM and CTR ciphers must be unique for each encryption key, 


I may be misunderstanding the NIST SP 800-38D document, but in my interpretation IVs do not have to be unique for each encryption key, but for each set of inputs instead. One can naturally use the same encrpytion key for multiple inputs, and it would be problematic to use the same IV for them, since identical inputs would lead to identical ciphertexts in this case.

Okay, so I just realized that since the IV is stored in the EncryptionAlgorithm and the EncryptionAlgorithm is stored in the FileCryptoMetaData, it means that there is only a single IV for a whole Parquet file. This seems problematic to me, because:

Either we chain all blocks together, in which case one IV is enough, but we can not decrypt parts of the file without decryipting every block that preceeds it;

Or we reuse the same IV multiple times, in which case we can selectively decrypt parts of the file, but identical data will lead to identical ciphertexts, weakening the encryption (e.g., if two pages contain the same data).

Do I misunderstand something?

Hi Zoltan, have to run now, so will address this comment only, and the others tomorrow morning. We don't keep IVs in the EncryptionAlgorithm. We used to have a so-called iv_prefix for 8.2.1 - but we don't anymore, as per our discussion. Please check the Thrift structures in the doc - iv_prefix'es dont show up there. Once the doc is signed off, I'll send a pr removing these fields from parquet.thrift.
In any case - we have a fresh IV/nonce for each ciphertext (structure or page) in a file.

Great, thanks for the clarification. Just noticed the sentence "The IVs are written before each ciphertext.", sorry for not noticing it earler. Maybe you could make this a bit more explicit by saying that a new, unique IV is generated for every ciphertext and maybe also mention exactly how they are generated (using random number generation).

You could also add some diagrams to the corresponding sections like so:

IV (12 bytes) length (4 bytes) ciphertext (length bytes) authentication tag (16 bytes)

and so:

nonce (16 bytes) length (4 bytes) ciphertext (length bytes)

Zoltan is right. This states that the IV is unique for each key: "The initialization vectors (IVs) in AES GCM and CTR ciphers must be unique for each encryption key, since a single re-use of an IV breaks the encryption."

An IV must be unique for each encrypted stream. The key stream for each IV is identical, so I can take two encrypted streams with the same IV and XOR them to get the plain-text XORed together. If I can control or predict one of the plain-text streams, then I can decrypt without the key.

zivanfi · 2018-10-25T14:05:19Z

Encryption.md

+struct ColumnChunk {
+...
+  /** Crypto metadata of encrypted columns **/
+  8: optional ColumnCryptoMetaData crypto_meta_data


(nit) Just noticed that this field is named *_meta_data, while other structs use *_metadata. You may consider renaming this field here and in the proto itself.

this is named after the original "meta_data" field in the same structure. Lets indeed rename the other *_metadata fields to *_meta_data.

just noticed that the current FileMetaData structure has a field named key_value_metadata. So there are already metadata and meta_data fields..

I prefer _metadata to _meta_data. There is only one meta_data field name, for ColumnMetaData in ColumnChunk, and there are multiple _metadata fields. Let's standardize on _metadata here and from now on.

This change still needs to be made.

zivanfi · 2018-10-25T14:06:53Z

Encryption.md

+the file - if constructed with a timestamp or version number, the AAD will allow readers to make sure they
+work with a correct file version. If organization works with a number of different 
+AAD construction methods, the `aad_metadata` can contain the method ID. In any case, the AAD string itself
+must not be kept in this field - a reader reliance on this AAD will defy the cipher authentication purpose.


I don't understand this last sentece, could you please explain it?

Yep, the wording is not clear enough, I'll re-write this sentence. The point is as follows: say we have a claims processor that needs to run a query on yesterday's data, making sure it had not been replaced with an older data. The query will specify the AAD based on the expected file name: abcd_yesterday_date. There is no point in keeping this AAD in the file itself - because it can be tampered with / replaced - but mostly because the reader should know apriori what data version (date) it looks for.

I had to do some research to understand this, it turned out I had mistunderstood AAD earlier. I would propose a decription along the following lines: First state how AAD is used conventionally (AAD is not stored alongside the ciphertext, the ciphertext can only be decrypted when the AAD is provided to the decryption algorithm, essentially behaves like a second password but is used for an entirely different purpose). Then explain that in Parquet we do store the AAD as well (emphasizing that it is not the usual practice), because it allows more flexibility:

Readers that don't care about checking integrity can still read the data without providing an AAD, since they can read it from the file itself.

Readers that want integrity checking can provide an expected AAD and encryption will fail unless the same AAD was used for encryption (thereby ensuring integrity).

Readers that provide multiple AADs can first check whether the stored value is amongst the provided AADs and then do the integrity check using that single AAD instead of trying to decrypt the data using all providedf AADs.

Since this does not seem to be the usual practice, please also explain why storing the AAD does not degrade its security.

Ah, okay, so reading your description once more, it seems that I misunderstood again. :) The fields should not store the AAD itself, just some way to identify it (select which one to use).

zivanfi · 2018-10-25T14:09:30Z

Encryption.md

-
-Only the `FileCryptoMetaData` is written as a plaintext, all other file parts are protected
+The same magic bytes are written at the beginning of the file (offset 0).
+Parquet readers start file parsing by reading and checking the magic string. Therefore, the encrypted footer mode uses a new magic string ("PARE") in order to immediately inform legacy readers (expecting ‘PAR1’ bytes) that they can’t parse this file.


I would say the purpose of a different magic string is not to inform legacy reader that they can't read the file (they would fail anyway), but instead to inform new readers to look for a file crypto metadata at the end of the file instead of a footer.

Agreed, this is the main reason. Still, having an early and clear exception in legacy readers, informing of a wrong file type (instead of a more obscure exception later on) is helpful too. I'll update the text accordingly.

Encryption.md

zivanfi · 2018-10-30T10:12:19Z

Encryption.md

 length of `aad_metadata` is 512 bytes. Typically, readers would know the file AAD without relying
 on this field, using instead a method adopted by the organization. For example, the AAD can be the name of 
 the file - if constructed with a timestamp or version number, the AAD will allow readers to make sure they
 work with a correct file version. If organization works with a number of different 
-AAD construction methods, the `aad_metadata` can contain the method ID. In any case, the AAD string itself
-must not be kept in this field - a reader reliance on this AAD will defy the cipher authentication purpose.
+AAD construction methods, the `aad_metadata` can contain the method ID; otherwise, the `aad_metadata` can


Why did you remove the sentence that the AAD itself must not be kept in this field? I think that was one of the most important points. In fact, I think you should elaborate on that and describe the intended use of this field. If I understand correctly: if there are multiple AADs in use, we need a way to decide which one to use in each case. Storing some kind of ID or reference to the AAD in this field solves this purpose. Is this the intended use case? Is there any other one? Please describe them here briefly.

Turns out there is a usecase where it might make sense to store the AAD in this field. Say we have a dataset with 3 files: customers_ny_oct18, customers_pa_oct18 and customers_ct_oct18. We can store these strings as AADs in the 'aad_metadata' field - as long as the reader knows it must get Oct18 version and checks for "customer_??_oct18" AADs. The main message is in the last part - a reader must know what dataset it needs, and what AADs should be there. I'm adding a section on AADs to the doc.

zivanfi · 2018-10-30T10:16:01Z

Encryption.md

+possible to infer from other metadata. This might be exploited for an exotic
+attack on file integrity (such as swapping dictionary pages of two columns). While these
+attacks succeed under rare conditions and in any case don't harm data confidentiality - 
+they can be prevented by using the offset as a part of the GCM AAD formation. Therefore,


The offset of the very first page of every Parquet file is the same. Does this mean that a malicious person can swap the first pages of two Parquet files without knowing the encryption key?

Nope, this won't work, but another attack might: replacement of column pages in row group 0 by pages from the same column in row group 1.

ggershinsky · 2018-10-30T10:58:33Z

Yup, committed this too fast, will add more details on the AADs.

…

On Tue, Oct 30, 2018, 12:17 Zoltan Ivanfi ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In Encryption.md <#114 (comment)>: > length of `aad_metadata` is 512 bytes. Typically, readers would know the file AAD without relying on this field, using instead a method adopted by the organization. For example, the AAD can be the name of the file - if constructed with a timestamp or version number, the AAD will allow readers to make sure they work with a correct file version. If organization works with a number of different -AAD construction methods, the `aad_metadata` can contain the method ID. In any case, the AAD string itself -must not be kept in this field - a reader reliance on this AAD will defy the cipher authentication purpose. +AAD construction methods, the `aad_metadata` can contain the method ID; otherwise, the `aad_metadata` can Why did you remove the sentence that the AAD itself must not be kept in this field? I think that was one of the most important points. In fact, I think you should elaborate on that and describe the intended use of this field. If I understand correctly: if there are multiple AADs in use, we need a way to decide which one to use in each case. Storing some kind of ID or reference to the AAD in this field solves this purpose. Is this the intended use case? Is there any other one? Please describe them here briefly. ------------------------------ In Encryption.md <#114 (comment)>: > The column pages (data and dictionary) are encrypted after compression. For each page, the encryption buffer is comprised of an IV (nonce), ciphertext and (in case of AES_GCM_V1) of a tag, as described in the Algorithms section. Only the buffer is written to the output stream - no need to write the length of the buffer, since the length (page size after compression and encryption) is kept in the page headers. +| IV (12 bytes) | ciphertext (`page_size` bytes) | tag (16 bytes) | +|---------------|--------------------------------|----------------| + +or + +| IV (16 bytes) | ciphertext (`page_size` bytes) | +|---------------|--------------------------------| + +Sometimes, a number of encrypted columns have their file offset either directly visible or +possible to infer from other metadata. This might be exploited for an exotic +attack on file integrity (such as swapping dictionary pages of two columns). While these +attacks succeed under rare conditions and in any case don't harm data confidentiality - +they can be prevented by using the offset as a part of the GCM AAD formation. Therefore, The offset of the very first page of every Parquet file is the same. Does this mean that a malicious person can swap the first pages of two Parquet files without knowing the encryption key? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#114 (review)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ALecvPVnL33UxwZilCx01DQovsYLzn8oks5uqCdQgaJpZM4X6BXq> .

zivanfi · 2018-10-31T09:02:47Z

Looks good to me. One final thing to consider (at least in my opinion) is the mixing of IVs and nonces in the terminology. Currently these terms are used somewhat inconsistently. I would prefer if the docs either always used IV or if it explicitly called out that the correct terminology depends on the encryption mode in use.

Additionally, according to this discussion, a nonce seems to be a weaker term than an IV, since a nonce only requires uniqueness, while an IV also requires unpredictability, so the first can be a regular counter but the latter must be random. This sounds like one more reason not to mix their usage.

ggershinsky · 2018-10-31T09:54:30Z

As Ryan has pointed out, GCM usually uses "IV" terminology, while CTR calls them "nonces". I'll add this detail to the doc.

The referenced discussion is related to a different AES mode (CBC), where an IV indeed must be unpredictable. There is no such requirement in GCM or CTR modes. In general, the term IV refers to a byte array used for cipher initialization, with requirements depending on the cipher mode (eg there are modes where an IV doesnt have to be unique).

julienledem · 2018-11-06T17:19:20Z

@rdblue: Gidon has updated the doc based on your comments. Does it look good to you now?

rdblue · 2018-11-06T18:00:27Z

Encryption.md

 while allowing for a regular Parquet functionality (columnar projection, 
-predicate pushdown, encoding and compression). The mechanism also enables column access 
+predicate pushdown, encoding and compression). This mechanism also enables column access 


I don't think this document should make the claim that this "enables column access control". Using different encryption keys can be used in a scheme that provides some form of access control, but in itself it doesn't provide what we commonly think of as "access control" because there is no way to revoke that access.

This document should be clear about what the feature provides -- the ability to encrypt columns with different keys or not encrypt a column -- and should not make claims about security guarantees that can be partially built with these features.

rdblue · 2018-11-06T18:03:14Z

Encryption.md

@@ -39,32 +39,32 @@ speed and access control granularity.
 2. Implement "client-side" encryption/decryption (storage client). The storage server 
 must not see plaintext data, metadata or encryption keys.
 3. Leverage authenticated encryption that allows clients to check integrity of the 
-retrieved data - making sure the file (or file parts) had not been replaced with a 
+retrieved data - making sure the file (or file parts) have not been replaced with a 
 wrong version, or tampered with otherwise.
 4. Support column access control - by enabling different encryption keys for different 


As I noted above, this goal should be rephrased to state that columns can use different encryption keys. It should not claim to provide access control.

rdblue · 2018-11-06T18:10:22Z

Encryption.md

-with its version (or creation timestamp) as the AAD, to verify the file has not been 
-replaced with an older version.
+with its version (or creation timestamp) as the AAD, to verify that the file has not been 
+replaced with an older version (see below).


See what below? Which section? This should be a description, ideally with a link.

In the AAD section below, an AAD scheme is specified. If an AAD scheme is required, then the statement that the user can do something is false. This encryption spec requires that the AAD is used to authenticate streams such that no stream can be replaced. with one from a different file or moved.

From below:

Parquet modular encryption uses the AAD mode to protect against swapping ciphertext modules inside a file, between files - or against swapping full files (for example, replacement of a file with an old version)"

rdblue · 2018-11-06T18:21:42Z

Encryption.md

 columns in an encrypted file.
-9. Miminize overhead of encryption: in terms of size of encrypted files, and throughput
+9. Minimize overhead of encryption: in terms of size of encrypted files, and throughput
 of write/read operations.


 ## Technical Approach

 Each Parquet module (footer, page headers, pages, column indexes, column metadata) is 


The use of "module" is confusing later because it isn't used elsewhere it Parquet documentation. Please add a definition instead of using it and defining it at the same time.

rdblue · 2018-11-06T18:27:38Z

Encryption.md

-written after each ciphertext. The IVs (12 bytes) are written before each ciphertext.
+All modules are encrypted by an AES-GCM cipher, without padding. Unique IVs are generated and 
+written before each ciphertext. The IV length is 12 
+bytes (96 bits), which is generally considered to be the optimal size of GCM IVs.  The NIST 


Please remove statements of opinion, like "... which is generally considered to be the optimal size of ..."

rdblue · 2018-11-06T18:30:14Z

Encryption.md



 ### AES_GCM_V1
-All modules are encrypted with the AES-GCM cipher. The authentication tags (16 bytes) are 
-written after each ciphertext. The IVs (12 bytes) are written before each ciphertext.
+All modules are encrypted by an AES-GCM cipher, without padding. Unique IVs are generated and 


Is the padding statement clear enough? Is there a reference we could point to that shows how to encrypt the last, under-filled block?

I'm using the standard APIs for that, such as
javax.crypto.Cipher.getInstance("AES/GCM/NoPadding"); in Java.
As for a reference, we can point to
https://crypto.stackexchange.com/questions/42412/gcm-padding-or-not
or
https://docs.oracle.com/javase/9/docs/api/javax/crypto/Cipher.html
or some other reference

The mode used for a Java library is not sufficient to document this. A reference for exactly what that mode does would be, if not a description here of how this is done.

Yep, makes sense. I'll try to find one, or let me know if you come across something suitable.

Ok, it looks like the most authoritative references for this are
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
https://tools.ietf.org/html/rfc3686

CTR is explicitly excluded from the list of padded ciphers in the former. The latter says:
"AES-CTR does not require the plaintext to be padded to a multiple of the block size"

Also, we have these from different sources:

"because CTR mode (contrary to CBC mode) does not require padding to block boundaries, and AES-CTR would also be compatible with no padding at all, or with strict PKCS#5 padding"

"Internally GCM really is CTR mode along with a polynomial hashing function applied on the ciphertext. CTR-mode doesn't need padding because you can just partly use the bits the last counter block generated and the polynomial hash does use (zero-)padding."

"GCM is a streaming mode which means that the ciphertext is only as long as the plaintext (not including authentication tag). GCM doesn't require a padding. This means that the PKCS5Padding version is actually only a synonym for NoPadding for convenience during programming. Some providers don't have this strange mode."

So, while the NIST spec text is not great on this, the picture is clear. GCM/NoPadding implementations will be compatible in any language and crypto provider. Same is true for CTR/NoPadding.

Are there "no padding" modes for C++, Python, etc.? The libraries you mention are all JVM-based.

OpenSSL is C++, we're using it in the parquet-cpp implementation. Tested for interop with the parquet-mr.
Anyways, adding the no-padding details to the spec.

Sounds good. I just want to make sure that there isn't any confusion that would cause incompatibility. If based on the available APIs you think it is perfectly clear, then that is fine.

Starting to look at this, I've realized I'm not comfortable with providing details that are not provided in the NIST specifications for these ciphers. The reason NIST explicitly ignores padding for the CTR and GCM modes is likely to be this: ".. CTR modes do not require any special measures to handle messages whose lengths are not multiples of the block size, since the modes work by XORing the plaintext with the output of the block cipher".
Therefore, I think we need to require from a Parquet implementer to use a cryptographic provider that implements these two ciphers according to the NIST specification. That will be sufficient to guarantee an interop. As an example, we already know the SunJCE and OpenSSL providers are interoperable, since we have tested this in the parquet-mr and parquet-cpp implementations.

Okay, then let's leave it at "no padding" then.

rdblue · 2018-11-06T18:41:38Z

Encryption.md

-Thrift modules are encrypted with the AES-GCM cipher, as described above. 
-The pages are encrypted with AES-CTR, where the IVs (16 bytes) are written before each 
-ciphertext.
+Thrift modules are encrypted with AES-GCM, as described above. The pages are encrypted by an AES-CTR


This mixes the layout for AES-GCM or AES-CTR with the Parquet modes, AES_GCM_v1 and AES_GCM_CTR_V1.

Those should be separate to be more clear for implementers. This should have a section for AES-GCM and AES-CTR that specifies the layout (CTR: IV, length, ciphertext; GCM: IV, length, ciphertext, auth tag) and a separate section that specifies the Parquet modes: AES_GCM_V1 that uses AES-GCM mode used for all encryption, and AES_GCM_CTR_V1 that uses AES-GCM mode for metadata structures and AES-CTR for page data.

rdblue · 2018-11-06T18:42:35Z

Encryption.md

-The pages are encrypted with AES-CTR, where the IVs (16 bytes) are written before each 
-ciphertext.
+Thrift modules are encrypted with AES-GCM, as described above. The pages are encrypted by an AES-CTR
+cipher without padding. The nonces are generated and written before each ciphertext, with the nonce length of 16 bytes 


Either "nonce" or "IV" should be used throughout the doc. Terminology should not change between GCM and CTR.

rdblue · 2018-11-06T18:43:50Z

Encryption.md

-ciphertext.
+Thrift modules are encrypted with AES-GCM, as described above. The pages are encrypted by an AES-CTR
+cipher without padding. The nonces are generated and written before each ciphertext, with the nonce length of 16 bytes 
+(128 bits). At least 12 bytes out of 16 are constructed with an RBG.


Why are nonces and IVs different lengths? Why can't a 12-byte IV that is entirely generated with an RBG be used for CTR mode?

https://crypto.stackexchange.com/questions/41601/aes-gcm-recommended-iv-size-why-12-bytes/41610
GCM runs CTR internally which requires a 16-byte counter. The IV provides 12 of those, the other 4 are an actual block-wise counter.
The AES block size is 128 bits, that's 16 bytes.
Out of those 16, the first 12 are the IV and the remaining 4 are the counter (starting from 1).

https://stackoverflow.com/questions/4608489/how-to-pick-an-appropriate-iv-initialization-vector-for-aes-ctr-nopadding
use a cryptographically secure random number generator, and create a new 16-byte random IV for every message. I underline "cryptographically secure" because that's important; a basic random number generator is not enough. With Java, use java.util.SecureRandom. With Win32, call CryptGenRandom(). With a random selection, the space of possible 128-bit IV is large enough that collisions are extremely improbable. Actually, that's why AES uses 128-bit blocks (thus implying 128-bit IV).

https://security.stackexchange.com/questions/90848/encrypting-using-aes-256-can-i-use-256-bits-iv
AES uses 128-bit blocks, so a 128-bit IV. Note that AES-256 uses a 256-bit key (hence the name), but still with 128-bit blocks.

The links don't address my question. AES-GCM requires a 12-byte IV, to which a 4-byte counter is added to get the nonce for AES-CTR. Why doesn't the counter mode do the same and take an IV to which it adds the counter? Is this a quirk of the API for the AES-CTR library you're using, or is there a real difference between how the two are handled?

It works like this:

AES CTR
requires a user to provide a 16 byte IV for each encryption operation [API call to convert n-byte plaintext into n-byte ciphertext - takes the following input parameters: key (16 bytes), IV (16 bytes), plaintext offset, and length n].

Per the quotes above, the IV can be constructed by the user in one of the two following ways:
a. First 12 bytes are random bytes, the other 4 are set to a value of 1.
b. All 16 bytes are random.

Inside, CTR implementation will use the last 4 bytes as a counter field (starting with the provided value) when running over n bytes split into 16 byte chunks.

AES GCM
requires a user to provide a 12 byte IV (see NIST spec) for each encryption operation [API call to convert n-byte plaintext into n-byte ciphertext and 16-byte tag - takes the following input parameters: key (16 bytes), IV (12 bytes), plaintext offset, and length n].

Per the NIST spec section 8.2.2, the IV is constructed by the user as a 12-byte random array.

Inside, GCM runs a CTR cipher that gets a 16-byte IV, constructed from the 12-byte array provided by user, and extra 4 bytes provided by the GCM implementation itself. The CTR cipher then runs as described above ("will use the last 4 bytes as a counter field (starting with the provided value) when running over n bytes split into 16 byte chunks").

Okay, then it sounds like the problem is just with the API that is used. One accepts a 12-byte IV and constructs a 16-byte IV+counter, the other accepts a 16-byte nonce where 12 bytes are an IV and it overwrites some bytes with the counter.

For Parquet, we don't need two formats. The encryption library should use the same 12-byte IV to construct a nonce that is acceptable for passing into AES-CTR if that API is different.

Sounds good. The CTR option a. above allows us to use 12 bytes for IV storage in Parquet for both GCM and CTR ciphers.

rdblue · 2018-11-06T18:49:55Z

Encryption.md

+the file. 
+
+Parquet constructs a module AAD from two components: an optional AAD prefix - a string provided 
+by the user for the file, and an AAD suffix, built internally for each encrypted module inside 


For each AES-GCM encrypted module.

rdblue · 2018-11-06T18:51:03Z

Encryption.md

+the file. The AAD prefix should reflect the target identity that helps to detect file swapping - 
+for example, table name and version or date / timestamp (e.g., "employees_23_05_2018"). The 
+AAD suffix reflects the internal identity of modules inside the file, which for example prevents 
+replacement of column pages in row group 0 by pages from the same column in row group 1. The module 


Instead of "pages" use "AES-GCM ciphertext" because pages may use AES-CTR.

rdblue · 2018-11-06T18:54:23Z

Encryption.md

+
+A file writer passes an AAD prefix string upon file creation, that allows to differentate it from other
+files and datasets. The reader should know this string apriori, or should be able to retrieve and verify
+it, using a convention accepted in the organization.


Why not generate a random AAD prefix?

If Parquet relies on users to supply a prefix, then it allows the case where files are not protected by an AAD or where files use the same AAD. That allows attackers to swap ciphertext streams when the user has been lazy or doesn't understand the purpose of the AAD.

A random AAD prefix would ensure that ciphertext cannot be moved between files and the AAD suffix ensures that ciphertext cannot be moved between row groups or columns.

Another alternative is to allow users to pass a prefix, but to add a random portion to the suffix that makes the file AADs unique.

Random AAD prefix is possible, but requires a trusted service where these prefixes are stored for every file. A reader must be able to retrieve an AAD prefix for every file. Since the data storage is untrusted (thats why we use the keys and AADs in the first place), the AAD prefixes must be stored in a secure location, and serviced by an authenticated process.
Unlike keys, AADs are not secret. Hence, having a "well known" method of AAD prefix creation (such as <table name>_<date> or <table name>_<version>) allows writers and readers to create/verify the file AAD prefix without a remote service.
Still, I'll add your suggestion as an example of possible AAD prefixes and their store/fetch service.

The AAD prefix is stored in the encryption metadata. I'm suggesting adding a per-file random to that. Users can add to the AAD prefix if they choose, but if they don't supply anything, then something unique to the file should be there. That way even if users are dumb, files are protected from replacing columns or pages with encrypted ones from other files. Isn't that the point?

AAD prefix is not necessarily stored in the aad_metadata. The "well known" method approach, described above, doesn't require setting the aad_metadata at all. But I agree it does require smart users, that can generate and verify unique AAD prefixes. Dumb users should work with a trusted service that keeps the random AAD prefixes created for each file. I'll add this to the text (maybe avoiding the dumb/smart terminology :). Similar to key metadata section, we'll have a number of example usages of the aad metadata field.

The example above describes the "smart user" scenario - because both writer and reader know the AAD without a help from any service. They just use a convention "well known" in the organization for table AAD creation. That is, the same code is running in both writer and reader, creating matching AADs for every table.

Here goes the "dumb user" scenario - which also doesn't require the aad_metadata field.

We use a random string as the file AAD.

Writer generates a random string and uses it as an AAD for the file. It also sends the table name ("employees"), month of creation, "10_<CURRENT_YEAR>", and the generated AAD, to a trusted AAD service. It might also send the file name ("employees_10_<CURRENT_YEAR>).

Reader runs on the day of 9 Nov 2018 - it goes to the trusted (authenticated) AAD service, and asks for the AAD for the latest version of employees table. The service sends the AAD, and the file name. The reader opens the file, using the AAD to verify its authenticity.

As before, we don't need "aad_metadata" for any of these. Putting the random AAD in this field or elsewhere in the file is not safe, can be modified by attacker.

The attack I'm addressing is slightly different. In this scenario, users don't know about supplying an AAD. User AAD metadata is empty in the 2017 version and the 2018 version.

The suffix part of the AAD is used to prevent pages from being swapped, so I can't overwrite row group 0 column 1 page 3 with row group 0 column 1 page 4.

But there isn't anything that prevents an attacker from replacing row group 0 column 1 page 4 with the same page from the 2017 version of the file. A per-file unique that is stored in the file would prevent page swapping between any 2 files, even if the unique string is in the file metadata.

This would not protect against copying 2017 over 2018 like your example, but it does at least ensure the pages a file has not been tampered with, in relation to each other.

Oh, I see. Protecting against page swapping between files - without being dependent on user input. Sounds good, adding aad_file_unique to the EncryptionAlgorithm structs.

I find the described scenarios a bit oversimplied. Of course, one can use Parquet in the way desribed (separate Parquet files for each month each being read explicitly), but I think in the much more common scenario a directory of Parquet files will correspond to a table without a one-to-one mapping between files and meaningful attributes like months.

When one uses MapReduce or RDD to process big data, different parts of the dataset get processed and written by different machines of the cluster. Instead of writing the data to a single file, each machine writes to a separate file. The individual file names don't even matter.

Additionally, when more data is written to the same table, it is not appended to existing files but new files get added. Impala can't even write arbitrary amount of data into a single Parquet file, it starts a new one for each block instead.

Could the discussed integrity checking measures be used to protect against (partial or full) file-swapping/duplicating/deleting attacks in such a scenario?

Yep, the PARQUET-1457 addresses this scenario. As for the AAD prefix section in this doc - I'm currently updating it with more details - with the same focus on integrity of individual files, since Parquet API (ParquetFileWriter and ParquetFileReader) is designed for writing/reading individual files. Data sets (tables, dataFrames, etc) are handled by frameworks above Parquet. The data set integrity can be handled by framework tools, with or without PARQUET-1457 help. I'll mention this in the doc.

rdblue · 2018-11-06T19:02:06Z

Encryption.md

+confidentiality - they can be prevented by using the column and row group identity as a part of GCM AAD 
+formation. Therefore, the `PageHeader` structures and the pages (in case of AES_GCM_V1) are enciphered using 
+an AAD suffix built by direct concatenation of two parts: the row group offset in the file (8 bytes, little endian) 
+and the column name (for nested columns, a concatenated path with "." separator) serialized with the UTF-8 charset.


An AAD suffix constructed this way could go over the size. Why not use column ordinal, which is required by Parquet to match the order in the file schema?

I think the suffix should be made from a per-file random, row group ordinal, column ordinal, and an id for the type of structure encrypted (e.g., data page, data page header, etc.)

The limit on AAD size is in gigabytes, so the column name and row group offset are ok. The ordinals can also work, but they require extra code, and also can't be verified directly since they are not kept in the protected structures such as ColumnMetadata and RowGroup.
We'll add per-file random and the structure type.

Checked the code - the column ordinal will be feasible, as both readers and writers have a simple loop on column chunks. I agree this will save the path String concatenation and serialization. We need to decide whether to use 2 or 4 bytes for the column ordinal component in AAD suffixes. Is there a limit on the number of columns? Is 32K enough?

The situation with row groups is different. In readers, the RowGroup list is filtered inside the FileMetaData structure, before being processed by a higher level code. Therefore, we can't rely on the list/loop count.
On the other hand, the cost of serialization of 8 byte longs is basically as negligible as the cost of serialization of 4 byte ints or 2 byte shorts. Therefore, we can use the row group offset (kept inside RowGroup structures) as a part of AAD suffix.
Or we can add an i32 or i16 "ordinal" field to the RowGroup structure.
I'm fine with either option, let me know your preference.

32k sounds like enough to me.

I'm fine either adding an ordinal or using the offset. I slightly prefer ordinal since it sounds generally useful. And a 2-byte ordinal would work here, too.

rdblue · 2018-11-06T19:03:49Z

Encryption.md

@@ -132,49 +203,164 @@ the encryption buffer is comprised of an IV, ciphertext and tag, as described in
 Algorithms section. The length of the encryption buffer (a 4-byte little endian) is 
 written to the output stream, followed by the buffer itself.

+|length (4 bytes) | IV (12 bytes) | ciphertext (`length` bytes) | tag (16 bytes) |


+1 for including this diagram.

Why is IV between length and ciphertext? I think IV should come before length. Otherwise, it isn't clear that length is the ciphertext length and not the length of IV, ciphertext, and tag, or for CTR, just IV and ciphertext.

rdblue · 2018-11-06T19:09:03Z

Encryption.md

 compression and encryption) is kept in the page headers.

-A `crypto_meta_data` field in set in each `ColumnChunk` in the encrypted columns. 
+| IV (12 bytes) | ciphertext (`page_size` bytes) | tag (16 bytes) |


I would rather have length of the ciphertext stored in all cases so that these are always the same. The overhead isn't that high.

The main reason is not the storage overhead. Having the length here, exposes the offset and size of every page - which can be exploited for integrity and other attacks. Storing the length inside encrypted and tamper-proofed PageHeaders makes the page offset/size invisible (only the offset of the first page is known; the size of the first page and offset/size of all other pages are hidden).
And saves a bit of file space.

Can you describe an integrity attack that uses this information?

In general, if you're worried about the length of the ciphertext giving away information, I think you should be addressing that directly.

E.g. page swapping attacks - but mostly the attacks we don't know about today, its impossible to foresee all possible future threats. Given two options - one that hides the position/size of every ciphertext (with its IVs, tags, etc), at no cost - and one that exposes them all, with storage overhead - we should choose the first one, unless there is a very strong reason not to.

AES-GCM and AADs are used to avoid page swapping attacks.

I would rather all encryption buffers be stored the same way, but I don't have a strong opinion here.

@julienledem or @zivanfi: do you have an opinion?

Oh, I see. 'plaintext' threw me off, we keep the ciphertext length, but you mean the plaintext length before encryption.

I'm okay with using option 1 for all IVs.
Sounds good, adding to the spec.

Because the page header must have a length, an attacker can identify where each encrypted page header starts and work backward from there.
Yep, possible, but with a certain effort / probability of success. If we replicate and explicitly expose the page length, an attack will require no effort and will be 100% successful. I guess my point here - security is not a binary thing, its often about defense in depth, making an attacker work harder to learn something about the sensitive data. Sometimes, its worth an effort on the defender part to hide things. But in this case, we don't even have to make an effort - we get this feature out of box. We just shouldn't make an effort to expose this field; especially since we won't use it; only an attacker would. More on this later.

security through obscurity
Agreed, we don't want this. The only place I relied on obscurity was for page swapping attacks within the same chunk. I'm adding a page ordinal to the spec (4 bytes; but let me know if 2 are enough). With this addition, the spec won't have any dependency on obscurity, both implicitly and explicitly (it is not mentioned as a part of the security design).

hiding the lengths ... makes this spec harder to implement
That's where we differ. Page length is kept in page headers and read from there by the existing Parquet code. Once the page header is encrypted, we don't need to make any effort to hide the length. Exposing the length will actually require an effort and Parquet code changes, as follows:

Writer will need to serialize the length and write it to the stream.

Reader doesn't need this field. We will have to change Parquet reader code - to either read and deserialize this field, in addition to the length in PageHeaders - or to ignore it, which will require some form of data skipping, with or without a seek.

Again, all this might be done, for a good cause. But in my view, the net result of this effort would be helping an attacker.

If the length of a plaintext page is something that should be protected, then Parquet should pad before encrypting. Anything else is security through obscurity, which you agree we should not employ here. I don't buy the argument about making it harder. Page headers are reliably short and those lengths will be easy to identify.

Hiding the length does make this harder to implement. Instead of a function to go read an encrypted buffer and return its bytes, you need to have two methods: one with an explicit length and one without. Instead of having a method that encrypts bytes and writes out a buffer, you need two: one that write the length and one that only returns it. You need to make sure that the right method is used in the right places. And worse, you must make it absolutely clear for other people that are implementing the spec.

Any time you add special cases, it makes a spec harder to implement.

This comes from the same motivation for using only one IV format. The fewer cases, the easier this is to implement. That makes it more reliable.

I'm against trading clarity in the spec just to obscure something that can be easily recovered and isn't a problem anyway. If you want to hide the length of the plaintext, then hide it by padding before encryption. Don't obscure it at the cost of simplicity.

let me know if 2 are enough

2 bytes for a page ordinal within a row group should be sufficient.

Ok. Given the technical trade-offs involved, this is not worth delaying the work.

rdblue · 2018-12-03T19:07:55Z

Encryption.md

+   (potentially, along with file names/paths).
+
+#### 4.4.2 AAD_Suffix
+File integrity can be disrupted by module swapping inside the file - such as replacement 


These sentences are written in a misleading way. They state that "file integrity can be disrupted by module swapping inside the file" first, and only later introduce the AAD_suffix to prevent it.

Instead, this should state that the purpose of the AAD_suffix is to prevent the following integrity attacks, then have a list of some of the swapping attacks. That way, the reader knows that this specification prevents those attacks.

Consider a similar change to the AAD_prefix section above.

rdblue · 2018-12-03T19:08:33Z

Encryption.md

+of column pages in row group 0 by pages from the same column in row group 1. Also, if an encryption 
+key is re-used in multiple files and the user has not provided a unique AAD_Prefix for each file – an 
+attacker can swap modules between the files. While these attacks succeed under rare conditions and in 
+any case don't harm data confidentiality - they can be prevented by using an internally generated file 


They are prevented, at least in full GCM mode.

rdblue · 2018-12-03T19:09:54Z

Encryption.md

+and module identity as a part of GCM AAD formation in Parquet. 
+
+The AAD_Suffix is built by direct concatenation of the following parts: 
+1.	[All modules] random byte array (file identifier): variable length


I don't think that "(file identifier)" is sufficient. Be more clear that this is a random byte array generated once per file and stored in the footer encryption struct.

I think just referring to this as the "file identifier, a random byte array generated for each file of implementation-defined length" works.

rdblue · 2018-12-03T19:12:04Z

Encryption.md

+
+|                      |File ID     | Module type | Row group ordinal | Column ordinal | Page ordinal|
+|----------------------|------------|-------------|-------------------|----------------|-------------|
+| Footer               |     V      |    V (0)    |         X         |       X        |      X      |


What do V and X indicate?

V for check (yes), X for no.
Suggestions on alternatives are welcome.

I would probably use a character to indicate it is used and blank to indicate it is not. V for a checkmark isn't clear.

Whatever characters you decide to use, the convention should be explained somewhere. Readers should never guess what something means in a spec.

rdblue · 2018-12-03T19:14:38Z

Encryption.md

+|-----------------|------------------|------------------------------|
+
+
+A `crypto_meta_data` field is set in each ColumnChunk in the encrypted columns. ColumnCryptoMetaData 


I thought that the decision was to use "metadata" instead of "meta_data"?

parquet-cpp has released the version 1.5.0 with these crypto structures.
https://github.com/apache/parquet-cpp/blob/apache-parquet-cpp-1.5.0/src/parquet/parquet.thrift#L708

While this version cant read files with encrypted footer, or encrypted columns in files with plaintext footer - it should be able to read plaintext columns in encrypted files with plaintext footer - as any legacy reader.

However, this means we cant rename a couple of fields, including this one. The cpp thrift above has:

struct ColumnChunk { 8: optional ColumnCryptoMetaData crypto_meta_data

I thought the point of field IDs was that we can rename fields. It may just work.

As for the 1.5.0 release of parquet-cpp, I don't think we intend to support files written with the encryption structures from that release. The spec was not approved.

And aren't we making a lot of other changes to the spec anyway? I would expect more than just this to break. Column metadata, for example, has moved.

As for the 1.5.0 release of parquet-cpp, I don't think we intend to support files written with the encryption structures from that release. The spec was not approved.
Yep, but this version can't even write files with encryption structures (they are not set, simply because it has no code that drives encryption). My concern is different - this version, like any other legacy parquet reader, should be able to read plaintext columns in encrypted files (with plaintext footer), written by new writers.

And aren't we making a lot of other changes to the spec anyway? I would expect more than just this to break. Column metadata, for example, has moved.
things should be ok for parquet-cpp-1.5.0 working as a legacy reader on plaintext columns. Unless there is a name collision on a couple of crypto fields in ColumnChunk and FileMetaData. Keeping eg the name crypto_meta_data instead of crypto_metadata seems to be a reasonable price to pay. I'll also experiment with your suggestion on fields IDs - if this works, then no problem with renaming them.

I think IDs should work here.

They do! The names seem to be used just for API generation.
Renaming the fields.

rdblue · 2018-12-03T19:15:34Z

Encryption.md

+  1: required list<string> path_in_schema
+
+  /** Retrieval metadata of the column-specific key **/
+  2: optional binary column_key_metadata


Should this be named "key_metadata" since it is already in a column-specific structure?

Same as above,
https://github.com/apache/parquet-cpp/blob/apache-parquet-cpp-1.5.0/src/parquet/parquet.thrift#L672

rdblue · 2018-12-03T19:17:03Z

Encryption.md

+   readers get the public or symmetric key, verify the signature file and get the AAD prefixes for the table files 
+   (potentially, along with file names/paths).
+
+#### 4.4.2 AAD_Suffix


Why is this "AAD_Suffix" and not "aad_suffix"? I would prefer keeping it lower-case since that's used elsewhere. For example, "aad_metadata".

rdblue · 2018-12-03T19:21:39Z

Encryption.md

+...
+
+  /** Column metadata for this chunk.. **/
+  3: optional ColumnMetaData meta_data


Do we know how older readers act when meta_data is null?

That will be the case with an unencrypted footer and encrypted columns.

Nevermind, I see that meta_data is still required in plaintext footer mode. Glad to see you already caught it!

rdblue · 2018-12-03T19:23:56Z

Encryption.md

+number of rows, key-value properties, column sort order, names of the encrypted columns  
+and metadata of the column encryption keys. 
+
+The columns encrypted with the same key as the footer leave the column metadata at the original 


To be clear, can you update this to "must leave"? I don't want this interpreted as optional when the key is identical.

rdblue · 2018-12-03T19:25:51Z

Encryption.md

+  /** Column metadata for this chunk.. **/
+  3: optional ColumnMetaData meta_data
+..
+


You might want to copy the crypto_metadata field in this snippet as well, to be clear that it is along side encrypted_column_metadata and meta_data.

rdblue · 2018-12-03T19:27:36Z

Encryption.md

+is Thrift-serialized, encrypted with the column key and written to the `encrypted_column_metadata` 
+field in the `ColumnChunk` structure, as described in the section 5.1.
+
+A Thrift-serialized `FileCryptoMetaData` structure is written before the encrypted footer. 


A table diagram of the layout like the ones for encrypted buffers would help clarity here.

This should be visible in the figures I'm working on.

rdblue · 2018-12-03T19:28:26Z

Encryption.md

+}
+```
+
+<!-- 


Remove this?

I guess this is the figure placeholder. I'll uncomment it when adding them back next week.

rdblue · 2018-12-03T19:31:27Z

Encryption.md

+
+  /** Retrieval metadata of key used for encryption of footer, 
+   *  and (possibly) columns **/
+  2: optional binary footer_key_metadata


Should this be just key_metadata since it is in a footer-specific struct?

rdblue · 2018-12-03T19:42:46Z

Encryption.md

+   -->
+
+
+### 5.4 New fields for vectorized readers


What broke that requires this?

I think this is an addition to the Parquet format, not to encryption. Why is it needed for encryption but not for normal vectorized reads? Can't total compressed size be inferred from page locations?

Also, total_compressed_size is misleading. It is total size on disk, not just the compressed size. Page buffers are compressed, but may also be encrypted. Page headers aren't compressed.

This helps normal vectorized reads too - instead of looping on all column chunks, they can just read the row group size field.

The column chunk length is not available in files with encrypted footer (unless a reader has all column keys) - therefore, these new fields are required there. It might be possible to reconstruct the size from page lengths and other available file metadata, but it wont be easy or elegant ( eg the row group offset is also unknown, so we wouldn't know where the pages start; its also not obvious where they end for each row group).
I think adding these two fields is the right thing to do for both regular and encrypted files.

Regarding the name total_compressed_size - it copies an existing field in ColumnMetaData structure, called the same:

/** total byte size of all compressed pages in this column chunk (including the headers) **/ 7: required i64 total_compressed_size

regarding section 5.4 - we can replace it by adding a short text in section "5.2 Encrypted footer mode", explaining that these two fields must be set in RowGroup structures in files created in this mode.

I think adding these two fields is the right thing to do for both regular and encrypted files

If that's the case, then we should add the fields to the spec as a separate change.

A reader can find the starting point of all of the columns it intends to read and work from there. Vectorized reads don't necessarily need a stopping point and that can be inferred most of the time.

I don't see a need to over-complicate this spec with this concern, let's do it separately.

rdblue · 2018-12-03T19:43:43Z

Encryption.md

+## 6. Encryption Overhead
+The size overhead of Parquet modular encryption is negligible, since most of the encryption 
+operations are performed on pages (the minimal unit of Parquet data storage and compression). 
+The overhead order of magnitude is adding 1 byte per each ~30,000 bytes of original compressed 


This is fine because it is a fact. But what is this based on?

Overhead is going to depend on the total size of data after encoding and compression. So overhead is dependent on the data that was written.

rdblue · 2018-12-03T19:45:32Z

Encryption.md

+The throughput overhead of Parquet modular encryption depends on whether AES enciphering is 
+done in software or hardware. In both cases, performing encryption on full pages (~1MB buffers) 
+instead of on much smaller individual data values causes AES to work at its maximal speed. 
+Preliminary tests show Parquet modular encryption throughput overhead to be up to a few 


A spec should not contain statements that will be outdated later as more data is gathered. The spec should not argue for itself to exist. It should simply and clearly state its requirements and little else.

ggershinsky · 2018-12-06T17:28:24Z

@rdblue As before, the comments I haven't replied to, sound good. Please review the replies.

rdblue · 2018-12-06T18:06:15Z

@ggershinsky, I replied, I think we're mostly in agreement.

rdblue · 2018-12-14T17:09:48Z

@ggershinsky, looks good to me. Do you want to call a vote or do you want me to?

ggershinsky · 2018-12-14T17:39:33Z

Cool, will do.

rdblue · 2019-01-17T00:33:25Z

The vote passed! I'm going to merge this.

Gidon Gershinsky and others added 3 commits October 25, 2018 13:52

Readme update

803b402

Update Encryption.md

ebb3ac7

Update Encryption.md

964242b

zivanfi reviewed Oct 25, 2018

View reviewed changes

ggershinsky commented Oct 26, 2018

View reviewed changes

Encryption.md Outdated Show resolved Hide resolved

ggershinsky commented Oct 29, 2018

View reviewed changes

Encryption.md Show resolved Hide resolved

ggershinsky added 2 commits October 29, 2018 18:28

Update Encryption.md

0c94d60

spelling fix.

24c2470

zivanfi reviewed Oct 30, 2018

View reviewed changes

Update Encryption.md

3b5ded1

Clean up

cb0c491

rdblue reviewed Nov 6, 2018

View reviewed changes

rdblue reviewed Dec 3, 2018

View reviewed changes

ggershinsky added 3 commits December 14, 2018 09:18

Update Encryption.md

4365c7b

Picture updates

cde99ea

Update Encryption.md

d8bd3c8

Gidon Gershinsky added 2 commits December 16, 2018 15:57

cleanup

9d50c03

pics cleanup

17a5069

rdblue merged commit 41b9701 into apache:encryption Jan 17, 2019

rdblue mentioned this pull request Jan 28, 2019

PARQUET-1477: Thrift crypto updates #124

Merged

ggershinsky mentioned this pull request May 20, 2020

PARQUET-1229: Parquet MR encryption apache/parquet-java#776

Merged

4 tasks

ggershinsky mentioned this pull request Sep 4, 2020

ARROW-9318: [C++] Parquet encryption key management apache/arrow#8023

Closed

		\|-----------------\|------------------\|------------------------------\|


		A `crypto_meta_data` field is set in each ColumnChunk in the encrypted columns. ColumnCryptoMetaData

+              }
+              ```
+              <!--

PARQUET-1450: Detailed crypto specification #114

PARQUET-1450: Detailed crypto specification #114

Conversation

ggershinsky commented Oct 25, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zivanfi Oct 26, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ggershinsky Oct 26, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ggershinsky commented Oct 30, 2018 via email

zivanfi commented Oct 31, 2018

ggershinsky commented Oct 31, 2018

julienledem commented Nov 6, 2018

rdblue Nov 6, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ggershinsky Nov 8, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ggershinsky Nov 9, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zivanfi Oct 26, 2018 •

edited

ggershinsky Oct 26, 2018 •

edited

rdblue Nov 6, 2018 •

edited

ggershinsky Nov 8, 2018 •

edited

ggershinsky Nov 9, 2018 •

edited

ggershinsky Nov 8, 2018 •

edited