
Why not stream based AES-GCM? #27348

Closed
pgolebiowski opened this issue Sep 8, 2018 · 4 comments

@pgolebiowski
Contributor

I've been wondering why it was decided not to provide a stream-based API for interacting with AES-GCM. I would like to describe a theoretical model built around the API added in this pull request: dotnet/corefx#31389. I understand that the reason behind this decision is security. In that case, however, please explain why the following approach would not meet your security standards.

Motivation

The maximum .NET array length of 2,147,483,591 bytes caps the size of a byte sequence that can be encrypted in a single call at roughly 2 GB. It is likely that a customer will want to encrypt a larger piece of data.

Theoretical approach

Imagine a huge piece of data encoded in the following way:

  • Header
    • Initialization vector anchor (32 bits)
    • Initialization vector counter (64 bits)
    • Number a of additional data chunks (32 bits)
    • A series of a numbers, each representing the size of an additional data chunk (each number 32 bits)
    • Number b of ciphertext chunks (32 bits)
    • A series of b numbers, each representing the size of a ciphertext chunk (each number 32 bits)
    • Header tag, used to authenticate the above fields (128 bits)
  • Math.Min(a, b) chunks, each having:
    • a_i additional data chunk
    • b_i ciphertext chunk
    • Tag used to authenticate a chunk
  • Remaining chunks. If a != b then one type of data (either ciphertext or additional data) requires more chunks. In such a case all of them are laid out with the following configuration:
    • "This kind of data" chunk
    • Tag used to authenticate this chunk
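For illustration, the header portion of this layout could be serialized as follows. This is a minimal Python sketch, not part of the proposal itself: the field order follows the list above, but the big-endian byte order and the function names are my assumptions.

```python
import struct

def pack_header(iv_anchor, iv_counter, aad_sizes, ct_sizes, header_tag):
    """Serialize the proposed header: 32-bit IV anchor, 64-bit IV counter,
    chunk counts a and b, per-chunk sizes, and a 128-bit header tag."""
    assert len(header_tag) == 16  # 128-bit tag
    out = struct.pack(">IQ", iv_anchor, iv_counter)
    out += struct.pack(">I", len(aad_sizes))                    # number a
    out += b"".join(struct.pack(">I", s) for s in aad_sizes)    # a sizes
    out += struct.pack(">I", len(ct_sizes))                     # number b
    out += b"".join(struct.pack(">I", s) for s in ct_sizes)     # b sizes
    return out + header_tag

def unpack_header(buf):
    """Parse the header back into its fields."""
    iv_anchor, iv_counter = struct.unpack_from(">IQ", buf, 0)
    off = 12
    (a,) = struct.unpack_from(">I", buf, off); off += 4
    aad_sizes = list(struct.unpack_from(f">{a}I", buf, off)); off += 4 * a
    (b,) = struct.unpack_from(">I", buf, off); off += 4
    ct_sizes = list(struct.unpack_from(f">{b}I", buf, off)); off += 4 * b
    header_tag = buf[off:off + 16]
    return iv_anchor, iv_counter, aad_sizes, ct_sizes, header_tag
```

In the real scheme the header tag would be produced by authenticating these fields (e.g. as GCM additional data); here it is just carried as opaque bytes.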

Some thoughts

  • This layout allows encrypting and decrypting thousands of gigabytes of data.
  • A single read/write of the data is enough to decrypt/encrypt.
  • The number or order of chunks cannot be changed by an attacker.
  • The format can be simplified by fixing every chunk at 2147483591 bytes except the last; the header then only needs the number of additional data chunks and the number of ciphertext chunks.
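One plausible reading of the "anchor + counter" IV scheme (my assumption; the issue does not spell it out) is that each chunk's 96-bit GCM nonce is the 32-bit anchor concatenated with the 64-bit counter, incremented once per chunk, which guarantees a distinct nonce per chunk under one key:

```python
import struct

def chunk_nonces(iv_anchor, iv_counter, chunk_count):
    """Yield a distinct 96-bit nonce per chunk: 32-bit anchor || 64-bit counter.
    The counter wraps modulo 2**64; a (key, nonce) pair must never repeat."""
    for i in range(chunk_count):
        yield struct.pack(">IQ", iv_anchor, (iv_counter + i) % 2**64)
```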

@krwq @vcsjones @bartonjs @morganbr @Drawaes @blowdart


bartonjs commented Sep 8, 2018

The biggest problem is streaming decryption. There's a tendency to want to read the data before the tag verifies, and that's dangerous.

That said, the model looks fair and sound. But it's a protocol, not direct exposure to the algorithm. The API we added allows access to the algorithm for anything up to int.MaxValue bytes, and the protocol you've described can be written in terms of it (provided you have 2x2GB of addressable memory to buffer with).

If that protocol was a standard, we would likely implement it as the standard, but we aren't currently interested in custom protocols or schemes.
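The "protocol on top of the per-call algorithm API" idea can be sketched as follows. The `seal` callable and helper names are hypothetical stand-ins of my own; in .NET the per-chunk call would be the AesGcm encrypt/decrypt API, each invocation limited to int.MaxValue bytes.

```python
# Per-call ceiling: the maximum .NET byte-array length.
MAX_CHUNK = 2_147_483_591

def encrypt_chunked(seal, nonces, plaintext, chunk_size=MAX_CHUNK):
    """Split plaintext into <= chunk_size pieces and seal each under its
    own nonce. `seal(nonce, chunk)` stands in for one AEAD encryption call
    (AesGcm in .NET); `nonces` must yield a distinct nonce per chunk."""
    chunks = [plaintext[i:i + chunk_size]
              for i in range(0, len(plaintext), chunk_size)] or [b""]
    return [seal(n, c) for n, c in zip(nonces, chunks)]
```

With a real AEAD, each output element would carry its ciphertext and tag; decryption would verify every chunk's tag before releasing any plaintext.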

@bartonjs bartonjs closed this as completed Sep 8, 2018
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020

AronParker commented Apr 1, 2020

There is another problem caused by the lack of stream-based AES-GCM: using it in conjunction with System.IO.Pipelines. Pipelines hands you a ReadOnlySequence<byte> that internally consists of multiple buffers, so the data can't be deciphered in one go. Reducing messages to smaller packets won't help either, as the amount of data you receive per socket recv call is never guaranteed. I'd appreciate it if this issue could be reopened.


krwq commented Apr 2, 2020

@AronParker can you read the data into a buffer first instead? AES-GCM by design limits how much can be encrypted per (key, nonce) pair, and that amount should fit in memory. Also, to authenticate correctly, the library is not supposed to reveal any data to the user until the whole decryption is finished and the tag is verified.
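For reference, the per-invocation limit krwq alludes to is specified in NIST SP 800-38D: GCM accepts at most 2^39 − 256 bits of plaintext per (key, nonce) pair, which works out to just under 64 GiB:

```python
# GCM's per-invocation plaintext limit (NIST SP 800-38D): 2**39 - 256 bits.
max_bits = 2**39 - 256
max_bytes = max_bits // 8
print(max_bytes)           # 68719476704 bytes
print(max_bytes / 2**30)   # just under 64 GiB
```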

@AronParker

> @AronParker can you read the data into a buffer first instead? AES-GCM by design limits how much can be encrypted per (key, nonce) pair, and that amount should fit in memory. Also, to authenticate correctly, the library is not supposed to reveal any data to the user until the whole decryption is finished and the tag is verified.

Yes, I can read the data into a buffer first, but that defeats the scalability purpose of System.IO.Pipelines; the whole point of the package is to avoid these kinds of memory allocations.

That limit won't be hit: each message to be decrypted with AES-GCM is at most 2^16 bytes. Yet due to the unpredictable sizes returned by Socket.Receive calls, it's not possible to rely on receiving a message in one piece; in practice the received data will be segmented across multiple buffers.

I agree with making the API design safe by default, because people do forget to verify the tag. However, the option for advanced use should not be taken away, as it is necessary in some high-performance cases, in particular in conjunction with System.IO.Pipelines.

@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 15, 2020