Make BinaryFormatter faster #16991

Closed
Alois-xx opened this Issue Mar 10, 2017 · 10 comments

Comments

Projects
None yet
10 participants
@Alois-xx

Alois-xx commented Mar 10, 2017

When BinaryFormatter encounters a larger object list it gets quadratic deserialization times due to the linear search in

>	ConsoleApp2.dll!System.Runtime.Serialization.ObjectManager.FindObjectHolder(long objectID) Line 68	C#
 	ConsoleApp2.dll!System.Runtime.Serialization.ObjectManager.FindOrCreateObjectHolder(long objectID) Line 81	C#
 	ConsoleApp2.dll!System.Runtime.Serialization.ObjectManager.RegisterFixup(System.Runtime.Serialization.FixupHolder fixup, long objectToBeFixed, long objectRequired) Line 888	C#
 	ConsoleApp2.dll!System.Runtime.Serialization.ObjectManager.RecordArrayElementFixup(long arrayToBeFixed, int[] indices, long objectRequired) Line 966	C#

Sample

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

namespace ConsoleApp2
{
    [Serializable]
    class Book
    {
        public string Name;
        public string Id;
    }

    class Program
    {
        static void Main(string[] args)
        {
            var formatter = new BinaryFormatter();
            List<Book> books = new List<Book>();
            var mem = new MemoryStream();
            for(int i=0;i<500*1000;i++)
            {
                books.Add(new Book { Id = i.ToString() });
            }

            var sw = Stopwatch.StartNew();
            formatter.Serialize(mem, books);
            sw.Stop();
            Console.WriteLine($"Serialization time {sw.Elapsed.TotalSeconds:F2}s");
            mem.Position = 0;
            sw = Stopwatch.StartNew();
            List<Book> booksDeser = (List<Book>)formatter.Deserialize(mem);
            sw.Stop();
            Console.WriteLine($"Deserialization {sw.Elapsed.TotalSeconds:F2}s");
        }
    }
}

Serialization time 2.31s
Deserialization 21.16s

This caused some unexpected "slowdowns" in production code when "real" big objects (e.g. 20 - 100 MB) object graphs are deserialized. Deserialization times of 10 minutes are not uncommon due to this. It would be great if this ugly thing gets cleaned up in .NET Core.

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Mar 11, 2017

Member

A trivial change here that would help would simply be to increase the allowed size of the array used by the ObjectManager, e.g. here:
https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Serialization.Formatters/src/System/Runtime/Serialization/ObjectManager.cs#L14
change:

private const int MaxArraySize = 0x1000;

to:

private const int MaxArraySize = 0x10000;

That should make a very measurable difference in the deserialization time for a large graph like the one in the repro, at the expense of using some more memory, but not a ton more, especially given that it'll be relatively proportional to the size of the graph in question. Subsequently a different data structure could be considered, e.g. a tree of some kind, an array of trees of some kind, etc., rather than an array of lists.

Member

stephentoub commented Mar 11, 2017

A trivial change here that would help would simply be to increase the allowed size of the array used by the ObjectManager, e.g. here:
https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Serialization.Formatters/src/System/Runtime/Serialization/ObjectManager.cs#L14
change:

private const int MaxArraySize = 0x1000;

to:

private const int MaxArraySize = 0x10000;

That should make a very measurable difference in the deserialization time for a large graph like the one in the repro, at the expense of using some more memory, but not a ton more, especially given that it'll be relatively proportional to the size of the graph in question. Subsequently a different data structure could be considered, e.g. a tree of some kind, an array of trees of some kind, etc., rather than an array of lists.

@karelz karelz added the enhancement label Mar 11, 2017

@karelz karelz modified the milestone: Future Mar 11, 2017

@morganbr

This comment has been minimized.

Show comment
Hide comment
@morganbr

morganbr Mar 11, 2017

Contributor

@Alois-xx , can you please tell us more about the way you're using BinaryFormatter? I'd like to better understand the parts of deserialization performance that are important to your scenario.

Contributor

morganbr commented Mar 11, 2017

@Alois-xx , can you please tell us more about the way you're using BinaryFormatter? I'd like to better understand the parts of deserialization performance that are important to your scenario.

@Alois-xx

This comment has been minimized.

Show comment
Hide comment
@Alois-xx

Alois-xx Mar 11, 2017

I have found in our regular .NET based product large deserialization times which did lead to strange architectural changes because the root cause was never fully understood. I have now checked .NET Core if BinaryFormatter has become better and in fact it did not. Hence my request to improve at least .NET Core and if possible backmerge it to the .NET Desktop BCL as well.

I have found in our regular .NET based product large deserialization times which did lead to strange architectural changes because the root cause was never fully understood. I have now checked .NET Core if BinaryFormatter has become better and in fact it did not. Hence my request to improve at least .NET Core and if possible backmerge it to the .NET Desktop BCL as well.

@joperezr joperezr added this to the Future milestone Mar 16, 2017

@gnilsson2

This comment has been minimized.

Show comment
Hide comment
@gnilsson2

gnilsson2 Apr 4, 2017

Deserialization of an 114Mbyte Dictionary, takes about 10 minutes, and uses 1Gbyte memory!
When finished only 316MBytes memory is used, 175MB for the dictionary.
(Serialization in < 1 minute)

So there is a big potential for improvement here

Object Type    Count diff.  Size Diff.        Inclusive...
ObjectHolder	-730 231	-75 944 024	-184 019 992	3	312	488
LongList	-365 116	-29 209 280	-29 209 280	1	80	80
FixupHolderList	-365 116	-26 288 352	-40 893 112	1	72	72

It seems that it's all forward references that needs fixup.
"If the reference in the serialized stream is a forward reference, then the Formatter can register a fixup with the ObjectManager."

Deserialization of an 114Mbyte Dictionary, takes about 10 minutes, and uses 1Gbyte memory!
When finished only 316MBytes memory is used, 175MB for the dictionary.
(Serialization in < 1 minute)

So there is a big potential for improvement here

Object Type    Count diff.  Size Diff.        Inclusive...
ObjectHolder	-730 231	-75 944 024	-184 019 992	3	312	488
LongList	-365 116	-29 209 280	-29 209 280	1	80	80
FixupHolderList	-365 116	-26 288 352	-40 893 112	1	72	72

It seems that it's all forward references that needs fixup.
"If the reference in the serialized stream is a forward reference, then the Formatter can register a fixup with the ObjectManager."

@Alois-xx

This comment has been minimized.

Show comment
Hide comment
@Alois-xx

Alois-xx May 16, 2017

Will this change be merged to the Full .NET Framework? Will this be part of a minor update and what is the expected timeframe for it to arrive?

Will this change be merged to the Full .NET Framework? Will this be part of a minor update and what is the expected timeframe for it to arrive?

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub May 16, 2017

Member

The change is marked for consideration for being ported to the .NET Framework, but there are currently no plans for if/when that would happen.

Member

stephentoub commented May 16, 2017

The change is marked for consideration for being ported to the .NET Framework, but there are currently no plans for if/when that would happen.

@mludlum

This comment has been minimized.

Show comment
Hide comment
@mludlum

mludlum Sep 26, 2017

Is there a way I can integrate this change to ObjectManager into my application without replacing the complete framework I'm pointing to? There doesn't seem to be a way to substituted a type or it's methods without modifying the original type somehow. I could do this easily in Java just by bumping a jar up in the classpath.

mludlum commented Sep 26, 2017

Is there a way I can integrate this change to ObjectManager into my application without replacing the complete framework I'm pointing to? There doesn't seem to be a way to substituted a type or it's methods without modifying the original type somehow. I could do this easily in Java just by bumping a jar up in the classpath.

@Alois-xx

This comment has been minimized.

Show comment
Hide comment
@Alois-xx

Alois-xx Sep 26, 2017

The next .NET Framework update 4.7.3? will contain a fix for the regular .NET Framework. For .NET Core the fix is already in with .NET Core 2.0.

The next .NET Framework update 4.7.3? will contain a fix for the regular .NET Framework. For .NET Core the fix is already in with .NET Core 2.0.

@jrocha

This comment has been minimized.

Show comment
Hide comment
@jrocha

jrocha Oct 25, 2017

Hope it come soon to .NET Framework, it is really painful to wait for large graph deserialization.

jrocha commented Oct 25, 2017

Hope it come soon to .NET Framework, it is really painful to wait for large graph deserialization.

@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas

jkotas Oct 26, 2017

Member

This fix is scheduled to be included in .NET Framework 4.7.2 update. It won't be enabled by default. It will be only enabled when Switch.System.Runtime.Serialization.UseNewMaxArraySize config switch is set.

Early access preview builds of .NET Framework 4.7.2 should start showing up on https://github.com/Microsoft/dotnet-framework-early-access/ soon. Please give it a try once the build with this fix is available.

For reference, the .NET Framework port is internal MS bug DevDiv 443438.

cc @AlexGhiondea

Member

jkotas commented Oct 26, 2017

This fix is scheduled to be included in .NET Framework 4.7.2 update. It won't be enabled by default. It will be only enabled when Switch.System.Runtime.Serialization.UseNewMaxArraySize config switch is set.

Early access preview builds of .NET Framework 4.7.2 should start showing up on https://github.com/Microsoft/dotnet-framework-early-access/ soon. Please give it a try once the build with this fix is available.

For reference, the .NET Framework port is internal MS bug DevDiv 443438.

cc @AlexGhiondea

dotnet-bot added a commit that referenced this issue Mar 19, 2018

Prevent concurrent use corruption from causing infinite loops (#16991)
Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

stephentoub added a commit that referenced this issue Mar 19, 2018

Prevent concurrent use corruption from causing infinite loops (#16991)
Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

ericstj added a commit to ericstj/corefx that referenced this issue Mar 28, 2018

Prevent concurrent use corruption from causing infinite loops (#16991)
Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

satano added a commit to satano/corefx that referenced this issue May 30, 2018

Prevent concurrent use corruption from causing infinite loops (#16991)
Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment