Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: ArrayRecord.TotalElementsCount #106644

Closed
adamsitnik opened this issue Aug 19, 2024 · 2 comments · Fixed by #106629
Closed

[API Proposal]: ArrayRecord.TotalElementsCount #106644

adamsitnik opened this issue Aug 19, 2024 · 2 comments · Fixed by #106629
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Formats.Nrbf binaryformatter-migration Issues related to the removal of BinaryFormatter and migrations away from it blocking-release in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@adamsitnik
Copy link
Member

adamsitnik commented Aug 19, 2024

Background and motivation

NRBF format exposes a possibility to represent multiple nulls with a single record (we need one byte for its type and another four bytes for the null count). It allows the attackers to send a small payload that represents a very large array. For example, serialized representation of following jagged array is just 90 bytes.

string[3][] jagged = new string[3][];
jagged[0] = new string[Array.MaxLength];
jagged[1] = new string[Array.MaxLength];
jagged[2] = new string[Array.MaxLength];

using MemoryStream payload = new();
new BinaryFormatter().Serialize(payload, jagged);
Assert.Equal(90, payload.Length);

while it takes more than 50 GB to instantiate it.

ArrayRecord arrayRecord = (ArrayRecord)NrbfDecoder.Decode(payload);
string[][] jagged = (string[][])arrayRecord.GetArray(expectedArrayType: typeof(string[][])); // allocates few dozens GB of memory

During the Threat Model session, it turned out that we currently don’t expose any API that could allow the users to avoid getting into this particular trap.

API Proposal

Providing TotalElementsCount is going to allow the users to perform such checks (and also avoid the need to compute it on their own for multi-dimensional arrays). By elements I mean the T values stored by the inner most arrays.

namespace System.Formats.Nrbf;

public abstract partial class ArrayRecord : SerializationRecord
{
    public abstract System.ReadOnlySpan<int> Lengths { get; }
+   public virtual long TotalElementsCount { get; }
}

API Usage

ArrayRecord arrayRecord = (ArrayRecord)NrbfDecoder.Decode(payload);
if (arrayRecord.TotalElementsCount > 100_000)
{
    throw new Exception($"Provided record had too many elements: {arrayRecord.TotalElementsCount}");
}
// from now we know that GetArray won't allocate more space than required for storing 100_000 elements.
string[][] jagged = (string[][])arrayRecord.GetArray(expectedArrayType: typeof(string[][]));

Alternative Designs

It's not obvious what total element count means for jagged arrays of jagged arrays. To avoid that, we could provide an API that would return the total allocated bytes.

public long GetTotalAllocatedBytes();

The API would need to take references into account (for every array record, GetArray allocates the array only once). In following example:

byte[] referenced = new byte[Array.MaxLength];
byte[][] jagged = new byte[3][];
jagged[0] = referenced;
jagged[1] = referenced;
jagged[2] = referenced;

The allocated bytes would be 2 GB (rather than 6 GB). Would that be a problem for users who would like to use the new API as initial guard and then process it later without any checks?

We don't provide any similar API in the BCL and I suspect that it would be hard to make it always return 100% exact value: we would need to take many runtime implementation details into account: object headers, method table pointers, alignment. But we could also just document that the API does not take these extra fields into account and returns estimated value. But would that make us vulnerable to attack where the payload contains MANY contained arrays with just 1 element?

Risks

No response

Other

#106629 contains a working impl

@adamsitnik adamsitnik added api-ready-for-review API is ready for review, it is NOT ready for implementation area-System.Formats.Nrbf labels Aug 19, 2024
@adamsitnik adamsitnik added this to the 9.0.0 milestone Aug 19, 2024
@adamsitnik adamsitnik self-assigned this Aug 19, 2024
@adamsitnik adamsitnik added the blocking Marks issues that we want to fast track in order to unblock other important work label Aug 19, 2024
@bartonjs
Copy link
Member

bartonjs commented Aug 20, 2024

Video

  • We renamed TotalElementsCount to FlattenedLength to match the same concept on TensorSpan
  • The hierarchy is closed, so virtual or not is author's discretion.
namespace System.Formats.Nrbf;

public abstract partial class ArrayRecord : SerializationRecord
{
    public virtual long FlattenedLength { get; }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Aug 20, 2024
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Aug 20, 2024
@adamsitnik adamsitnik added the binaryformatter-migration Issues related to the removal of BinaryFormatter and migrations away from it label Aug 21, 2024
Copy link
Contributor

Tagging subscribers to 'binaryformatter-migration': @adamsitnik, @bartonjs, @jeffhandley, @terrajobst

@github-actions github-actions bot locked and limited conversation to collaborators Oct 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Formats.Nrbf binaryformatter-migration Issues related to the removal of BinaryFormatter and migrations away from it blocking-release in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants