
Why not stream based AES-GCM? #27348

Closed
pgolebiowski opened this issue Sep 8, 2018 · 4 comments

@pgolebiowski
Contributor

I've been wondering why it was decided not to provide a stream-based API for interacting with AES-GCM. I would like to describe a theoretical model built around the API added in this pull request: dotnet/corefx#31389. I understand that the reason behind this decision is security. In that case, however, please explain why the following approach would not meet your security standards.

Motivation

The maximum .NET array length of 2,147,483,591 bytes caps the size of a byte sequence that can be encrypted in a single call at roughly 2 GB. It is likely that a customer will want to encrypt a larger piece of data.

Theoretical approach

Imagine a huge piece of data encoded in the following way:

  • Header
    • Initialization vector anchor (32 bits)
    • Initialization vector counter (64 bits)
    • Number a of additional data chunks (32 bits)
    • A series of a numbers, each representing the size of an additional data chunk (each number 32 bits)
    • Number b of ciphertext chunks (32 bits)
    • A series of b numbers, each representing the size of a ciphertext chunk (each number 32 bits)
    • Header tag, used to authenticate the above fields (128 bits)
  • Math.Min(a, b) chunks, each having:
    • a_i additional data chunk
    • b_i ciphertext chunk
    • Tag used to authenticate a chunk
  • Remaining chunks. If a != b then one type of data (either ciphertext or additional data) requires more chunks. In such a case all of them are laid out with the following configuration:
    • "This kind of data" chunk
    • Tag used to authenticate this chunk
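For illustration, the header portion of this layout could be serialized as follows. This is a minimal Python sketch, not part of the proposal itself: the field order follows the list above, but the big-endian byte order and the function names are my assumptions.

```python
import struct

def pack_header(iv_anchor, iv_counter, aad_sizes, ct_sizes, header_tag):
    """Serialize the proposed header: 32-bit IV anchor, 64-bit IV counter,
    chunk counts a and b, per-chunk sizes, and a 128-bit header tag."""
    assert len(header_tag) == 16  # 128-bit tag
    out = struct.pack(">IQ", iv_anchor, iv_counter)
    out += struct.pack(">I", len(aad_sizes))                    # number a
    out += b"".join(struct.pack(">I", s) for s in aad_sizes)    # a sizes
    out += struct.pack(">I", len(ct_sizes))                     # number b
    out += b"".join(struct.pack(">I", s) for s in ct_sizes)     # b sizes
    return out + header_tag

def unpack_header(buf):
    """Parse the header back into its fields."""
    iv_anchor, iv_counter = struct.unpack_from(">IQ", buf, 0)
    off = 12
    (a,) = struct.unpack_from(">I", buf, off); off += 4
    aad_sizes = list(struct.unpack_from(f">{a}I", buf, off)); off += 4 * a
    (b,) = struct.unpack_from(">I", buf, off); off += 4
    ct_sizes = list(struct.unpack_from(f">{b}I", buf, off)); off += 4 * b
    header_tag = buf[off:off + 16]
    return iv_anchor, iv_counter, aad_sizes, ct_sizes, header_tag
```

In the real scheme the header tag would be produced by authenticating these fields (e.g. as GCM additional data); here it is just carried as opaque bytes.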

Some thoughts

  • This layout allows encrypting and decrypting thousands of gigabytes of data.
  • A single read/write of the data is enough to decrypt/encrypt.
  • The number or order of chunks cannot be changed by an attacker.
  • The format can be simplified by fixing every chunk at 2147483591 bytes except the last; the header then only needs the number of additional data chunks and the number of ciphertext chunks.
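One plausible reading of the "anchor + counter" IV scheme (my assumption; the issue does not spell it out) is that each chunk's 96-bit GCM nonce is the 32-bit anchor concatenated with the 64-bit counter, incremented once per chunk, which guarantees a distinct nonce per chunk under one key:

```python
import struct

def chunk_nonces(iv_anchor, iv_counter, chunk_count):
    """Yield a distinct 96-bit nonce per chunk: 32-bit anchor || 64-bit counter.
    The counter wraps modulo 2**64; a (key, nonce) pair must never repeat."""
    for i in range(chunk_count):
        yield struct.pack(">IQ", iv_anchor, (iv_counter + i) % 2**64)
```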

@krwq @vcsjones @bartonjs @morganbr @Drawaes @blowdart


bartonjs commented Sep 8, 2018

The biggest problem is streaming decryption. There's a tendency to want to read the data before the tag verifies, and that's dangerous.

That said, the model looks fair and sound. But it's a protocol, not direct exposure to the algorithm. The API we added allows access to the algorithm for anything up to int.MaxValue bytes, and the protocol you've described can be written in terms of it (provided you have 2x2GB of addressable memory to buffer with).

If that protocol was a standard, we would likely implement it as the standard, but we aren't currently interested in custom protocols or schemes.
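The "protocol on top of the per-call algorithm API" idea can be sketched as follows. The `seal` callable and helper names are hypothetical stand-ins of my own; in .NET the per-chunk call would be the AesGcm encrypt/decrypt API, each invocation limited to int.MaxValue bytes.

```python
# Per-call ceiling: the maximum .NET byte-array length.
MAX_CHUNK = 2_147_483_591

def encrypt_chunked(seal, nonces, plaintext, chunk_size=MAX_CHUNK):
    """Split plaintext into <= chunk_size pieces and seal each under its
    own nonce. `seal(nonce, chunk)` stands in for one AEAD encryption call
    (AesGcm in .NET); `nonces` must yield a distinct nonce per chunk."""
    chunks = [plaintext[i:i + chunk_size]
              for i in range(0, len(plaintext), chunk_size)] or [b""]
    return [seal(n, c) for n, c in zip(nonces, chunks)]
```

With a real AEAD, each output element would carry its ciphertext and tag; decryption would verify every chunk's tag before releasing any plaintext.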

@bartonjs bartonjs closed this as completed Sep 8, 2018
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020

AronParker commented Apr 1, 2020

There is another problem caused by the lack of stream-based AES-GCM: using it in conjunction with System.IO.Pipelines. Pipelines hands you a ReadOnlySequence<byte> that internally consists of multiple buffers, so the data can't be deciphered in one go. Reducing messages to smaller packets won't help either, as the amount of data you receive per socket recv call is never guaranteed. I'd appreciate it if this issue could be reopened.


krwq commented Apr 2, 2020

@AronParker can you read the data into a buffer first instead? AES-GCM by design limits how much can be encrypted per (key, nonce) pair, and that amount should fit in memory. Also, to authenticate correctly, the library is not supposed to reveal any data to the user until the whole decryption is finished and the tag is verified.
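For reference, the per-invocation limit krwq alludes to is specified in NIST SP 800-38D: GCM accepts at most 2^39 − 256 bits of plaintext per (key, nonce) pair, which works out to just under 64 GiB:

```python
# GCM's per-invocation plaintext limit (NIST SP 800-38D): 2**39 - 256 bits.
max_bits = 2**39 - 256
max_bytes = max_bits // 8
print(max_bytes)           # 68719476704 bytes
print(max_bytes / 2**30)   # just under 64 GiB
```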

@AronParker

> @AronParker can you read the data into a buffer first instead? AES-GCM by design limits how much can be encrypted per (key, nonce) pair, and that amount should fit in memory. Also, to authenticate correctly, the library is not supposed to reveal any data to the user until the whole decryption is finished and the tag is verified.

Yes, I can read the data into a buffer first, but that defeats the scalability purpose of System.IO.Pipelines; the whole point of the package is to avoid these kinds of memory allocations.

That limit won't be hit: each message to be decrypted with AES-GCM is at most 2^16 bytes. Yet due to the unpredictable sizes returned by Socket.Receive calls, it's not possible to rely on receiving a message in one piece; in practice the received data will be segmented across multiple buffers.

I agree with making the API design safe by default, because people do forget to verify the tag. However, the option for advanced use should not be taken away, as it is necessary in some high-performance cases, in particular in conjunction with System.IO.Pipelines.

@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 15, 2020