Description
I've been wondering why it was decided not to offer a stream-based API for interacting with AES-GCM. I would like to describe a theoretical model built around the API added in this pull request: dotnet/corefx#31389. I understand that the reason behind this decision is security. In that case, however, please explain why the following approach would not meet your security standards.
Motivation
The maximum array length of 2147483591 bytes restricts the size of a byte sequence that can be encrypted to roughly 2 GB. It is likely that a customer will try to encrypt a larger piece of data.
Theoretical approach
Imagine a huge piece of data encoded in the following way:
- Header
  - Initialization vector anchor (32 bits)
  - Initialization vector counter (64 bits)
  - Number a of additional data chunks (32 bits)
  - A series of a numbers, each representing the size of an additional data chunk (each number 32 bits)
  - Number b of ciphertext chunks (32 bits)
  - A series of b numbers, each representing the size of a ciphertext chunk (each number 32 bits)
  - Header tag, used to authenticate the above fields (128 bits)
- `Math.Min(a, b)` chunks, each having:
  - a_i additional data chunk
  - b_i ciphertext chunk
  - Tag used to authenticate the chunk
- Remaining chunks. If `a != b`, then one type of data (either ciphertext or additional data) requires more chunks. In such a case all of them are laid out with the following configuration:
  - "This kind of data" chunk
  - Tag used to authenticate this chunk
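A minimal sketch of how the paired chunks could be encrypted with the AesGcm class introduced by dotnet/corefx#31389 (the class and its Encrypt signature come from that PR; everything else, including the ChunkedAesGcm, BuildNonce, and EncryptPairedChunks names, is my own illustration). The 12-byte nonce for chunk i is assembled from the 32-bit anchor plus the 64-bit counter incremented once per chunk:

```csharp
using System;
using System.Security.Cryptography;

static class ChunkedAesGcm
{
    // 128-bit tag, matching the header description above.
    const int TagSize = 16;

    // Assembles the 12-byte AES-GCM nonce from the 32-bit anchor and the
    // per-chunk 64-bit counter (machine endianness; a real format would fix one).
    static byte[] BuildNonce(uint anchor, ulong counter)
    {
        byte[] nonce = new byte[12];
        BitConverter.GetBytes(anchor).CopyTo(nonce, 0);
        BitConverter.GetBytes(counter).CopyTo(nonce, 4);
        return nonce;
    }

    // Encrypts the Math.Min(a, b) paired chunks: each additional data chunk is
    // authenticated together with its ciphertext chunk under a fresh nonce,
    // producing one 128-bit tag per chunk.
    public static void EncryptPairedChunks(
        byte[] key, uint anchor, ulong startCounter,
        byte[][] plaintextChunks, byte[][] aadChunks,
        byte[][] ciphertextChunks, byte[][] tags)
    {
        using var aes = new AesGcm(key);
        int pairs = Math.Min(plaintextChunks.Length, aadChunks.Length);

        for (int i = 0; i < pairs; i++)
        {
            byte[] nonce = BuildNonce(anchor, startCounter + (ulong)i);
            ciphertextChunks[i] = new byte[plaintextChunks[i].Length];
            tags[i] = new byte[TagSize];
            aes.Encrypt(nonce, plaintextChunks[i], ciphertextChunks[i], tags[i], aadChunks[i]);
        }
        // Remaining chunks (when a != b) would be processed the same way, with
        // either the plaintext or the additional data left empty.
    }
}
```

Because the counter grows by one per chunk, every chunk is sealed under a distinct nonce for the same key, which is the property GCM relies on to stay secure.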
Some thoughts
- This scheme allows encrypting and decrypting thousands of gigabytes of data.
- A single read/write pass over the data is enough to decrypt/encrypt it.
- The number or order of chunks cannot be changed by an attacker.
- This can be simplified if we assume that every chunk is 2147483591 bytes long except for the last one; in that case storing only the number of additional data chunks and the number of ciphertext chunks in the header is enough.
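To make the last bullet concrete, here is a small sketch (the FixedSizeChunks helper is hypothetical, not part of any proposed API) showing that with fixed-size chunks every chunk size can be recomputed from the total length, so the header only needs to carry the two counts:

```csharp
static class FixedSizeChunks
{
    // Maximum .NET byte[] length, as mentioned in the Motivation section.
    const long MaxChunkSize = 2147483591;

    // Returns how many chunks a payload of totalBytes occupies and how large the
    // last (possibly shorter) chunk is; every other chunk is MaxChunkSize bytes.
    public static (long ChunkCount, long LastChunkSize) Describe(long totalBytes)
    {
        if (totalBytes <= 0) return (0, 0);
        long chunkCount = (totalBytes + MaxChunkSize - 1) / MaxChunkSize; // ceiling division
        long lastChunkSize = totalBytes - (chunkCount - 1) * MaxChunkSize;
        return (chunkCount, lastChunkSize);
    }
}

// Example: FixedSizeChunks.Describe(1L << 40) returns (513, 29184), i.e. 1 TiB
// splits into 512 full chunks plus one 29,184-byte final chunk.
```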