Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deserialization of big files #940

Merged

Conversation

shaosss
Copy link
Contributor

@shaosss shaosss commented May 29, 2020

This is a fix for deserialization of files with Length more than int.MaxValue.

@AArnott
Copy link
Collaborator

AArnott commented May 29, 2020

Holy cow, that would be a huge stream. Thanks for submitting a fix, but this would still leave MessagePack trying to allocate potentially a 4GB area of contiguous memory, which may be problematic as well. On a 64-bit process this might be feasible, but in a 32-bit process, even finding 1GB of contiguous memory may be unlikely.

I think instead of this particular change, we should simply set a reasonable cap (somewhere between 80KB and 1MB perhaps) on how large the arg is to GetSpan and simply loop to copy the entire content of the stream as necessary.

@AArnott
Copy link
Collaborator

AArnott commented May 29, 2020

It looks like the loop is already there, so if we just use Math.Max(..., 1024*1024) that might be best.

@shaosss
Copy link
Contributor Author

shaosss commented May 29, 2020

Did you mean Math.Min? Because right now we are trying to cast long value to int and getting arithmetic exception. That ruins the deserialization completely(you can serialize file of that size, but cannot deserialize).
I will look at you suggestion tomorrow, thanks.
And yes, we use Messagepack to save some captured data,which sometimes is a hell of amount.:)

@shaosss
Copy link
Contributor Author

shaosss commented Jun 1, 2020

I checked the code and profile it. It seems it is already using the chunks of size:
private static readonly int DefaultLengthFromArrayPool = 1 + (4095 / Marshal.SizeOf<T>());
So for the stream scenario it seems line no additional work needed. The memoryStream code branch doesn't need any changes since you can't have that long memory streams right now (since you can't have array of bytes with length more than 0x7FFFFFC7).
Not sure about any other overrides of Deserialize function (since most of them are probably can't have big sizes because of array of bytes limitations), but async version needs the same fix (will push in a few minutes and update PR).
Not sure if any tests are needed, since it is fix for simple arithmetic exception.

limiting the size of Hint, fix async version of Deserialize
@shaosss
Copy link
Contributor Author

shaosss commented Jun 1, 2020

@AArnott updated the PR. Found that between 0x7FFFFFC7 and int.MaxValue things still not work because of the array of bytes length limit, so I added the limit to the HintSize as you suggested (not sure that it's in the right place though).
Really appreciate your feedback.

Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented, please revert the change to Sequence<T> and apply the fix to MessagePackSerializer itself to avoid asking for egregiously large spans.

Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thank you.

@AArnott AArnott merged commit b29e271 into MessagePack-CSharp:master Jun 3, 2020
@shaosss shaosss deleted the fixBigFilesDeserialization branch June 3, 2020 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants