Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.ValueTuple causes a StackOverflow when used in a large Dictionary. #8980

Open
jcouv opened this issue Sep 20, 2017 · 24 comments
Open
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug JitUntriaged CLR JIT issues needing additional triage
Milestone

Comments

@jcouv
Copy link
Member

jcouv commented Sep 20, 2017

Moved (and edited) this issue from the roslyn repo.

Summary: you add an empty Main method to the attached code, then compile it with csc.exe geocoords_stackoverflow.cs.txt. When you run the resulting executable, you get an overflow error (Process is terminated due to StackOverflowException.)

From discussion with @VSadov, we think this is a JIT issue rather than a compiler issue.
FYI @jkotas for triage. Thanks


@andras-ferencz commented on Tue Feb 28 2017

When using the new System.ValueTuple in a large Dictionary, I receive a stack overflow exception where as I do not with System.Tuple.

Dictionary<int, (double lat, double lon)> zipcodes = new Dictionary<int, (double lat, double lon)>() {{123,(18.180555,-66.749961)} }; works fine just for reference.

geocoords_stackoverflow.cs.txt
geocoords_worksfine.cs.txt


@jcouv commented on Tue Feb 28 2017

I'm able to see the repro file. The code causing stack overflow looks like this:

using System;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Net.Mail;

namespace HAL
{
    public static partial class geo
    {
        public static Dictionary<int,(double lat, double lon)> zipcodes = new Dictionary<int, (double lat, double lon)>()
{{00601,(18.180555,-66.749961)},
{00602,(18.361945,-67.175597)},
{00603,(18.455183,-67.119887)},
{00606,(18.158345,-66.932911)},
{00610,(18.295366,-67.125135)},
{00612,(18.402253,-66.711397)},
{00616,(18.420412,-66.671979)},
{00617,(18.445147,-66.559696)},
{00622,(17.991245,-67.153993)},
{00623,(18.083361,-67.153897)},
{00624,(18.064919,-66.716683)},
{00627,(18.412600,-66.863926)},
{00631,(18.190607,-66.832041)},
... goes on and on...
{99922,(55.307528,-133.046815)},
{99923,(56.002315,-130.041026)},
{99925,(55.550204,-132.945933)},
{99926,(55.138352,-131.470424)},
{99927,(56.239062,-133.457924)},
{99929,(56.370751,-131.693301)}}; // total is about 33100 lines

    }
}

@jcouv commented on Wed Sep 20 2017

I'm able to repro this. The program compiles fine, but it overflows when run.

The strange thing is that decompiling the static constructor shows that it only expects to use very little IL stack (.maxstack 4 from debug compilation):

.method private hidebysig specialname rtspecialname static 
        void  .cctor() cil managed
{
  // Code size       1160053 (0x11b375)
  .maxstack  4
  .locals init (class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>> V_0)
  IL_0000:  newobj     instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::.ctor()
  IL_0005:  stloc.0
  IL_0006:  ldloc.0
  IL_0007:  ldc.i4     0x259
  IL_000c:  ldc.r8     18.180554999999998
  IL_0015:  ldc.r8     -66.749960999999999
  IL_001e:  newobj     instance void valuetype [mscorlib]System.ValueTuple`2<float64,float64>::.ctor(!0,
                                                                                                     !1)
  IL_0023:  callvirt   instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::Add(!0,
                                                                                                                                                                  !1)
  IL_0028:  nop
  IL_0029:  ldloc.0
  IL_002a:  ldc.i4     0x25a
  IL_002f:  ldc.r8     18.361944999999999
  IL_0038:  ldc.r8     -67.175596999999996
  IL_0041:  newobj     instance void valuetype [mscorlib]System.ValueTuple`2<float64,float64>::.ctor(!0,
                                                                                                     !1)
  IL_0046:  callvirt   instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::Add(!0,
                                                                                                                                                                  !1)
  IL_004b:  nop
  IL_004c:  ldloc.0
  IL_004d:  ldc.i4     0x25b
  IL_0052:  ldc.r8     18.455183000000002
  IL_005b:  ldc.r8     -67.119887000000006
  IL_0064:  newobj     instance void valuetype [mscorlib]System.ValueTuple`2<float64,float64>::.ctor(!0,
                                                                                                     !1)
  IL_0069:  callvirt   instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::Add(!0,
                                                                                                                                                                  !1)
  IL_006e:  nop
  IL_006f:  ldloc.0
  IL_0070:  ldc.i4     0x25e
  IL_0075:  ldc.r8     18.158345000000001
  IL_007e:  ldc.r8     -66.932911000000004
  IL_0087:  newobj     instance void valuetype [mscorlib]System.ValueTuple`2<float64,float64>::.ctor(!0,
                                                                                                     !1)
  IL_008c:  callvirt   instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::Add(!0,
                                                                                                                                                                  !1)

               ... repeats ...

I've tested this with debug build and also optimized build (/o+ compiler option), but both overflow at runtime.


category:correctness
theme:large-methods
skill-level:expert
cost:medium

@jkotas
Copy link
Member

jkotas commented Sep 20, 2017

cc @dotnet/jit-contrib

it only expects to use very little IL stack (.maxstack 4 from debug compilation)

There are number of IL constructs that require the JIT to create temporary locals. maxstack is a poor approximation of actual stack consumption of a method.

The JIT turns off optimizations for very large methods like this one to get them the compile in a reasonable amount of time. And when the optimizations are off, there is no lifetime tracking for locals and each of these temps gets its own spot in the stack frame.

Even when it "works", initialization pattern like this will have poor performance. Initializing the dictionary by reading the numbers from a text file would be likely order of magnitude faster.

@mikedn
Copy link
Contributor

mikedn commented Sep 21, 2017

maxstack is a poor approximation of actual stack consumption of a method.

For completeness, the compiled version of such a method requires ~1.3 megabytes of stack.

@AndyAyersMS
Copy link
Member

Thanks to dotnet/coreclr#14890, the jit allocates two structs on the stack for each initialization (I wanted to verify that temp growth here is only linear, thankfully it is...).

So the ~33,000 initializer tuples create ~66,000 structs on the stack, each 16 bytes. These require around 1,056,000 bytes of stack frame and puts this method over the 1MB default size limit and hence causes stack overflow.

The best fix would be to realize that we really only need a small number of structs. It might be possible to have reusable temps sort of like we do (did) for the box temp; basically we'd reuse the temp or temps at stack empty points.

But another possible mitigation here is to fix dotnet/coreclr#14890, it would cut the stack usage in half.

@AndyAyersMS
Copy link
Member

One more thing occurred to me -- I'll note it here on the off chance that it is helpful. The jit is very conservative about optimizing class constructors, since it knows they will only run once. So moving the initialization to a helper method at least allows the jit to think about optimizing.

Unfortunately the method is large enough that the jit decides not to optimize and we end up with more or less the same code.

But if you then prejit the assembly that uses the helper, the jit will decide to optimize and the stack size is cut roughly in half, as the extra structs introduced by dotnet/coreclr#14890 end up getting register promoted and so don't use up stack space.

@AndyAyersMS
Copy link
Member

Box temp like reuse can't help here because (with recent CSCs at least) the eval stack does not empty between the struct newobjs. So importer can't easily tell if an existing temp is free.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Feb 9, 2018

Thought about it some more. We might be able to "free up" a struct temp when it gets popped from the stack as the pop entails a copy.

Implemented this and the rest of the caching logic and it knocks the stack usage below 1MB. We still create one non-reused stack temp per call as there's no way currently to describe where the copy that gets passed as a call argument lives.

This temp re-use helps out in some other similar methods, eg System.Reflection.Emit.OpCodes:.cctor() but also causes some sizeable regressions, as evidently there are some cases where we decide the temp needs to be address taken. So I plan to to look deeper at diffs and see what causes that.

@AndyAyersMS
Copy link
Member

Seems like in cases where we can re-use a struct temp, we need to be able to anticipate if that use of the temp was going to end up being promoted or not. If it will be promoted then a unique temp seems best, or at least a temp that is only re-used for other promotable instances (one might consider keeping a second temp that is used for all the address-exposed uses). For minopts/fast jitting where we won't ever promote we can just reuse with impunity. Also we won't promote certain classes of structs so we could run those checks early and just always reuse the temps in those cases too.

Unfortunately we don't have reliable indications about potential enregistration at the point that we need to decide whether to re-use a temp (and if we keep several around, which one). There are perhaps some tells; the newobj typically feeds a constructor call and while we don't yet know if that call is an inline candidate (and if so, if it will actually get inlined) we can tell sometimes that it won't be an inline candidate -- say for instance if we're in a catch or some other region where we generally inhibit inlining.

So one practical opt strategy might be to never re-use and then rely on some later stack-packing phase to collapse compatible disjoint-lifetime struct temps down onto just one on-stack instance. (This would also incidentally solve stack spill temp problems, at least when optimizing). But we don't have such a phase and they're not cheap to run.

Another might be to disentangle after the importer runs -- allow the importer to reuse temps as it sees fit, then run struct liveness after inlining (augmented to realize that non-inlined struct ctors won't escape their this params), then split out any disjoint lifetimes that are not address-exposed and promotable into their own temp (or temps).

So pragmatically I'm going to experiment with:

  • always re-use the temp, if in minopts/debug/(fastopt), similar to what happens with the box temp
  • never re-use the temp if the instance is not promotable, or perhaps use a "second temp"
  • never re-use if constructor is not inlinable (at best a guess), or perhaps use a "second temp"
  • otherwise re-use if possible

Also perhaps look at what it would mean to have morph or whatever phase it is that creates the out param temps try and do some reuse too...?

@AndyAyersMS
Copy link
Member

I don't see us getting a fix for this in time for the 2.1 release, so I'm moving it to Future.

@masonwheeler
Copy link
Contributor

@jkotas

Even when it "works", initialization pattern like this will have poor performance. Initializing the dictionary by reading the numbers from a text file would be likely order of magnitude faster.

That's kind of horrifying. Why doesn't the Roslyn compiler simply dump this big literal array into a sdata blob with whatever its equivalent of ModuleBuilder.DefineInitializedData is, then block-load it at runtime?

@briansull
Copy link
Contributor

briansull commented Feb 26, 2018

Beacuse you can't initialize objects in the managed heap en-mass.
It is intrinsically a one at a time operation.

@masonwheeler
Copy link
Contributor

@briansull That doesn't actually make sense. Are you familiar with how DefineInitializedData works?

@briansull
Copy link
Contributor

briansull commented Feb 26, 2018

Actually DefineInitializedData sounds like it is dealing with unmanaged BLOBs, not Object types.

@masonwheeler
Copy link
Contributor

masonwheeler commented Feb 26, 2018

Not object types, struct types.

Basically, your compiler serializes your struct array literal to a binary blob which it saves to the PE file. At runtime, that blob gets loaded into a static field as a byte array, and then you use System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray to blit that into your struct array.

@briansull
Copy link
Contributor

Well you can make a feature request for some functionally like that to be "added" to Roslyn.
In the test case above we are populating a Dictionary<K,V> which is quite a bit different than a struct or an array.

@masonwheeler
Copy link
Contributor

@briansull *facepalm* OK, that's on me. I was about to point out how you can initialize it with an array of structs with the IEnumerable<KeyValuePair<K, V>> constructor, when I double-checked and found that that's not actually a thing!

How did the team manage to overlook something that obvious?!? All the other collections can be initialized from arrays via IEnumerable<T>...

@jkotas
Copy link
Member

jkotas commented Feb 27, 2018

initialize it with an array of structs with the IEnumerable<KeyValuePair<K, V>> constructor

I do not think that this constructor would help. It would flip the problem into how to create huge pre-initialized array of structs that suffers from the same issue.

The best way to do this kind of initialization today is via arrays of primitive constants. It is the only one that does not suffer from this issue. Here is how the example above could be written in lean and performant way:

public static partial class geo
{
    public static Dictionary<int,(double lat, double lon)> zipcodes = CreateZipCodes();

    static Dictionary<int,(double lat, double lon)> CreateZipCodes()
    {
        double[] data = new double[] {
            00601, 18.180555, -66.749961, 
            00602, 18.361945, -67.175597,
            ...
        };

        Dictionary dict = new Dictionary<int, (double lat, double lon)>(data.Length / 3);
        for (int i = 0; i < data.Length; i += 3)
           dict.Add((int)data[i], (data[i+1], data[i+2]));
        return dict;
    }
}

@masonwheeler
Copy link
Contributor

@jkotas

I do not think that this constructor would help. It would flip the problem into how to create huge pre-initialized array of structs that suffers from the same issue.

Nope. If you can save it into a blob at compile-time, and then at runtime blit that blob back into your array with System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray, there's no chance of the stack overflowing because a System.Array is a reference type. Then the constructor could pass in that array and for-loop over it, and no matter how big the array is, you still won't blow the stack.

@jkotas
Copy link
Member

jkotas commented Feb 27, 2018

@masonwheeler So assume that there is Dictionary constructor that takes IEnumerable<KeyValuePair<K, V>. What would the code look like for the example in this issue?

@VSadov
Copy link
Member

VSadov commented Feb 27, 2018

FWIW C# specifies collection initializers in terms of calls to Add methods. It is assumed that Add could be sideeffecting so compiler must emit as many calls to Add as there are items in the initializer.

It might be possible to pack all the literals in the initializer into an array, or multiple arrays (if they have different types) and after rehydrating those arrays feed them to the loop that populates the collection via Add.

You may imagine the complexity of the transformation and the conditions for the optimizaion to be actually applicable/useful.

@masonwheeler
Copy link
Contributor

@jkotas

  1. The compiler would take the dictionary initializer literal and transform it into a []byte, which is saved as a const field with csc's metadata model's equivalent to ModuleBuilder.DefineInitializedData.
  2. Compiler gets a reference to this field
  3. Compiler emits the following pseudocode IL:
    • ldc.i4.s initializer_count
    • newarr System.Collections.Generic.KeyValuePair<K,V>
    • dup
    • ldtoken const_field_reference
    • call System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray
  4. This initializes the array and leaves a reference to it on the top of the execution stack. This is the argument to the dict constructor, so emit a call to it.

@jkotas
Copy link
Member

jkotas commented Feb 27, 2018

InitializeArray works for primitive types only today. You are describing a new set of features across language and runtime. The dictionary constructor to support this pattern is far from the only thing missing.

@masonwheeler
Copy link
Contributor

@jkotas Does it? That's not actually in the documentation; it just says "this method is used by compilers."

Well that's disappointing to learn, then.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@i3anaan
Copy link

i3anaan commented May 22, 2020

We ran into the same issue with a dictionary of signature new Dictionary<string, object>(), and about 4000 key/value pairs. (our application generates C# code files based on some database and compiles it into a dll)
Surely caused some headaches on our side debugging this issue, as the message you get is not very informational, and we did certainly not expect it to be caused by this.

@AndyAyersMS
Copy link
Member

I wouldn't expect you to hit the problem noted above for Dictionary<string, object> as neither key or value types are structs. If you can get me a repro case I'll take a closer look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug JitUntriaged CLR JIT issues needing additional triage
Projects
None yet
Development

No branches or pull requests

10 participants