Increase ObjectManager.MaxArraySize to improve BinaryFormatter deserialization time #17949

Merged
merged 1 commit into dotnet:master on Apr 5, 2017

Conversation

stephentoub commented Apr 5, 2017

To support fix-ups during BinaryFormatter deserialization, ObjectManager maintains a table of ObjectHolders. This table is an array, where each element/bucket of the array is a linked list of ObjectHolders, and where the right bucket is found via id % MaxArraySize, where MaxArraySize is a const power of 2. IDs are allocated monotonically and contiguously, such that the max list size is equal to the total number of object holders divided by MaxArraySize. Every time an object is searched for, we need to find the right bucket, and then walk the list, and so for large numbers of objects, searching for every object as is done during deserialization is effectively an O(N^2) operation.
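The lookup path described above can be sketched as follows. This is a Python model of the C# internals for illustration only; the class and method names are mine, not the actual ObjectManager API.

```python
MAX_ARRAY_SIZE = 0x1000  # 4K buckets, the value before this change


class ObjectHolder:
    def __init__(self, obj_id, obj):
        self.obj_id = obj_id
        self.obj = obj
        self.next = None  # next holder in this bucket's linked list


class ObjectManagerModel:
    def __init__(self):
        self.buckets = [None] * MAX_ARRAY_SIZE

    def add(self, obj_id, obj):
        # Bucket selection via id % MaxArraySize; new holders go at the head.
        holder = ObjectHolder(obj_id, obj)
        index = obj_id % MAX_ARRAY_SIZE
        holder.next = self.buckets[index]
        self.buckets[index] = holder

    def find(self, obj_id):
        # Walk the bucket's list: O(list length) per lookup.
        holder = self.buckets[obj_id % MAX_ARRAY_SIZE]
        steps = 0
        while holder is not None:
            steps += 1
            if holder.obj_id == obj_id:
                return holder.obj, steps
            holder = holder.next
        return None, steps


if __name__ == "__main__":
    mgr = ObjectManagerModel()
    for obj_id in range(1, 10_001):  # contiguous ids, as during deserialization
        mgr.add(obj_id, object())
    _, steps = mgr.find(1)
    print(steps)  # bucket 1 holds ids 1, 4097, 8193 -> 3 nodes walked
```

With contiguous ids, every bucket's list grows in lockstep, so each of the N lookups pays roughly N / MaxArraySize steps.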

There are a variety of ways to address this. But a very simple stop-gap is to simply increase the MaxArraySize. Doing so doesn't impact small deserializations, as the array wouldn't have grown beyond MaxArraySize anyway, and for large deserializations, we allocate more memory for the array but only proportionally to the number of objects added to the table. The primary downside is such a large array is more likely to end up in the LOH, which will make collecting the array more expensive, but nowhere near as expensive as the O(N^2) algorithm that would result with a smaller array of long lists.

Thus, this mitigation simply increases the MaxArraySize, from 4K to 1M.

For the repro in #16991, before this change I get:

Serialization time 2.30s
Deserialization 21.11s

and after I get:

Serialization time 2.49s
Deserialization 2.07s

If I increase the size further, from 500K to 1M Book instances in the list, before this change I get:

Serialization time 4.98s
Deserialization 82.30s

and after I get:

Serialization time 4.88s
Deserialization 4.13s

For super huge graphs, we could still end up with long bucket lists and thus still exhibit some polynomial behavior, but the chances/impact of that are significantly decreased.
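As a back-of-envelope check of why the bigger table helps (my own cost model, not measured data): with contiguous ids each bucket list holds about N / buckets holders, so looking up all N objects walks on the order of N^2 / (2 * buckets) nodes.

```python
# Rough cost model for looking up all N object holders (an illustration;
# the constants mirror ObjectManager's id % MaxArraySize bucketing).

def total_walk_steps(n_objects: int, n_buckets: int) -> float:
    avg_list_len = n_objects / n_buckets  # contiguous ids spread evenly
    return n_objects * avg_list_len / 2   # each lookup walks half a list on average

old = total_walk_steps(1_000_000, 4 * 1024)     # 4K buckets (before)
new = total_walk_steps(1_000_000, 1024 * 1024)  # 1M buckets (after)
print(f"4K buckets: ~{old:.1e} node visits; 1M buckets: ~{new:.1e}")
```

The model predicts a 256x reduction in list-walking work for a million objects, consistent in spirit with the ~10-20x wall-clock improvements measured above (the rest of the time is dominated by other deserialization costs).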

cc: @morganbr, @jkotas, @Alois-xx
Closes #16991

Increase ObjectManager.MaxArraySize
To support fix-ups during BinaryFormatter deserialization, ObjectManager maintains a table of ObjectHolders.  This table is an array, where each element/bucket of the array is a linked list of ObjectHolders, and where the right bucket is found via `id % MaxArraySize`.  IDs are allocated monotonically and contiguously, such that the max list size is equal to the total number of object holders divided by MaxArraySize.  Every time an object is searched for, we need to find the right bucket, and then walk the list, and so for large numbers of objects, searching for every object is effectively an O(N^2) operation.

There are a variety of ways to address this.  But a very simple stop-gap is to simply increase the MaxArraySize.  Doing so doesn't impact small deserializations, as the array wouldn't have grown beyond MaxArraySize anyway, and for large deserializations, we allocate more memory for the array but only proportionally to the number of objects added to the table.  The primary downside is such a large array is more likely to end up in the LOH, which will make collecting the array more expensive, but nowhere near as expensive as the O(N^2) algorithm that would result with a smaller array of long lists.

Thus, this mitigation simply increases the MaxArraySize from 4K to 1M.

jkotas approved these changes Apr 5, 2017

@stephentoub stephentoub merged commit afde3c1 into dotnet:master Apr 5, 2017

19 checks passed:

- CROSS Check
- Innerloop CentOS7.1 Debug x64 Build and Test
- Innerloop CentOS7.1 Release x64 Build and Test
- Innerloop OSX10.12 Debug x64 Build and Test
- Innerloop OSX10.12 Release x64 Build and Test
- Innerloop PortableLinux Debug x64 Build and Test
- Innerloop PortableLinux Release x64 Build and Test
- Innerloop Tizen armel Debug Cross Build
- Innerloop Tizen armel Release Cross Build
- Innerloop Ubuntu14.04 Debug x64 Build and Test
- Innerloop Ubuntu14.04 Release x64 Build and Test
- Innerloop Ubuntu14.04 arm Release Cross Build
- Innerloop Ubuntu16.04 arm Debug Cross Build
- Innerloop Windows_NT Debug x86 Build and Test
- Innerloop Windows_NT Release x64 Build and Test
- Vertical netfx Build
- Vertical uap Build
- Vertical uapaot Build
- Windows_NT Debug AllConfigurations Build

@stephentoub stephentoub deleted the stephentoub:deser_perf branch Apr 5, 2017


gnilsson2 commented Apr 7, 2017

Just to point out how bad the performance is:
In #16991 I wrote "Deserialization of a 114 MB Dictionary takes about 10 minutes and uses 1 GB of memory!"
Since then I have written a simple binary save/load of the same data; it loads in 3 seconds!
Of course this improvement is not attainable in a general binary formatter, but with clever ordering of data it should be possible to come close.

stephentoub's O(N^2) I just can't understand. O(N) should be possible, I think.


stephentoub commented Apr 7, 2017

stephentoub's O(N^2) I just can't understand

You mean you don't understand my analysis that says the current implementation is O(N^2)? Worst case, you're doing a lookup for N items, and since those N items are stored in lists that are being walked, each lookup can be O(N), hence O(N^2).
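For the curious, the quadratic blow-up is easy to reproduce with a toy model (my sketch, not the actual C# code): with a fixed bucket count, doubling the number of objects roughly quadruples the total nodes walked.

```python
# Toy model: ids are contiguous and holders are inserted at each bucket's head,
# so an id's position from the head of its list is ((n - 1 - id) // buckets) + 1.

def total_lookup_steps(n_objects: int, n_buckets: int) -> int:
    total = 0
    for obj_id in range(n_objects):
        position_from_head = (n_objects - 1 - obj_id) // n_buckets + 1
        total += position_from_head
    return total

small = total_lookup_steps(8_192, 16)
large = total_lookup_steps(16_384, 16)
print(large / small)  # roughly 4: twice the objects, ~4x the walking
```

That 2x-objects -> ~4x-work scaling is the signature of O(N^2); a proper hash table or id-indexed array would make each lookup O(1) and the whole pass O(N).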


gnilsson2 commented Apr 7, 2017

Sorry, I was unclear.
Yes, you are right if you need the lookup.
My point is that (at least in my case) it's possible to get rid of the lookup with smart ordering of data.
In my case 600 seconds -> 3 seconds, with N ~ 1.3 million.
Going from O(N^2) to O(N) should give an improvement of N times, so my 3 seconds must be disk and other "overhead".

I argue that just increasing the buffer size is not a satisfactory solution.
(Today we can't rely on increasing hardware performance to solve this anymore.)
Certainly we expect the .NET library to give us near-optimal performance.


stephentoub commented Apr 7, 2017

just increasing the buffer size is not a satisfactory solution

As I stated in the PR, it's a workaround that helps to mitigate the problem, with few downsides. If you'd like to submit a PR that does better, we'd welcome it. But we're not planning to spend a lot of time working on BinaryFormatter, which was brought to .NET Core primarily for compatibility and is not a technology we're heavily investing in moving forward.


gnilsson2 commented Apr 7, 2017

Excuse my illiteracy; what does PR mean?
Ah... I guess Problem Resolution.


stephentoub commented Apr 7, 2017

what does PR mean

"Pull Request", meaning you could implement a solution and submit it to this repo on GitHub, which would allow us to review the changes and potentially merge them into .NET Core.


gnilsson2 commented Apr 7, 2017

I might consider taking a deep dive into BinaryFormatter, but having browsed the code I realize it's not an easy task to understand or get a grip on...

I have two reasons for pushing this issue.

First: in my hobby coding, I want a simple way to serialize, and BinaryFormatter is much easier to apply than XML.

Second: in my work, I recently recommended a switch from C++ to C#, arguing that the .NET library is a good, fast library, compensating for the move from precompiled to JIT-compiled code.

Now my trust in .NET decreases. (I recently also experienced bad performance in string.IndexOf; I found a workaround.)
