System.ValueTuple causes a StackOverflow when used in a large Dictionary. #8980

jcouv · 2017-09-20T23:14:40Z

Moved (and edited) this issue from the roslyn repo.

Summary: you add an empty Main method to the attached code, then compile it with csc.exe geocoords_stackoverflow.cs.txt. When you run the resulting executable, you get an overflow error (Process is terminated due to StackOverflowException.)

From discussion with @VSadov, we think this is a JIT issue rather than a compiler issue.
FYI @jkotas for triage. Thanks

@andras-ferencz commented on Tue Feb 28 2017

When using the new System.ValueTuple in a large Dictionary, I receive a stack overflow exception where as I do not with System.Tuple.

Dictionary<int, (double lat, double lon)> zipcodes = new Dictionary<int, (double lat, double lon)>() {{123,(18.180555,-66.749961)} }; works fine just for reference.

geocoords_stackoverflow.cs.txt
geocoords_worksfine.cs.txt

@jcouv commented on Tue Feb 28 2017

I'm able to see the repro file. The code causing stack overflow looks like this:

using System;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Net.Mail;

namespace HAL
{
    public static partial class geo
    {
        public static Dictionary<int,(double lat, double lon)> zipcodes = new Dictionary<int, (double lat, double lon)>()
{{00601,(18.180555,-66.749961)},
{00602,(18.361945,-67.175597)},
{00603,(18.455183,-67.119887)},
{00606,(18.158345,-66.932911)},
{00610,(18.295366,-67.125135)},
{00612,(18.402253,-66.711397)},
{00616,(18.420412,-66.671979)},
{00617,(18.445147,-66.559696)},
{00622,(17.991245,-67.153993)},
{00623,(18.083361,-67.153897)},
{00624,(18.064919,-66.716683)},
{00627,(18.412600,-66.863926)},
{00631,(18.190607,-66.832041)},
... goes on and on...
{99922,(55.307528,-133.046815)},
{99923,(56.002315,-130.041026)},
{99925,(55.550204,-132.945933)},
{99926,(55.138352,-131.470424)},
{99927,(56.239062,-133.457924)},
{99929,(56.370751,-131.693301)}}; // total is about 33100 lines

    }
}

@jcouv commented on Wed Sep 20 2017

I'm able to repro this. The program compiles fine, but it overflows when run.

The strange thing is that decompiling the static constructor shows that it only expects to use very little IL stack (.maxstack 4 from debug compilation):

.method private hidebysig specialname rtspecialname static 
        void  .cctor() cil managed
{
  // Code size       1160053 (0x11b375)
  .maxstack  4
  .locals init (class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>> V_0)
  IL_0000:  newobj     instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::.ctor()
  IL_0005:  stloc.0
  IL_0006:  ldloc.0
  IL_0007:  ldc.i4     0x259
  IL_000c:  ldc.r8     18.180554999999998
  IL_0015:  ldc.r8     -66.749960999999999
  IL_001e:  newobj     instance void valuetype [mscorlib]System.ValueTuple`2<float64,float64>::.ctor(!0,
                                                                                                     !1)
  IL_0023:  callvirt   instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::Add(!0,
                                                                                                                                                                  !1)
  IL_0028:  nop
  IL_0029:  ldloc.0
  IL_002a:  ldc.i4     0x25a
  IL_002f:  ldc.r8     18.361944999999999
  IL_0038:  ldc.r8     -67.175596999999996
  IL_0041:  newobj     instance void valuetype [mscorlib]System.ValueTuple`2<float64,float64>::.ctor(!0,
                                                                                                     !1)
  IL_0046:  callvirt   instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::Add(!0,
                                                                                                                                                                  !1)
  IL_004b:  nop
  IL_004c:  ldloc.0
  IL_004d:  ldc.i4     0x25b
  IL_0052:  ldc.r8     18.455183000000002
  IL_005b:  ldc.r8     -67.119887000000006
  IL_0064:  newobj     instance void valuetype [mscorlib]System.ValueTuple`2<float64,float64>::.ctor(!0,
                                                                                                     !1)
  IL_0069:  callvirt   instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::Add(!0,
                                                                                                                                                                  !1)
  IL_006e:  nop
  IL_006f:  ldloc.0
  IL_0070:  ldc.i4     0x25e
  IL_0075:  ldc.r8     18.158345000000001
  IL_007e:  ldc.r8     -66.932911000000004
  IL_0087:  newobj     instance void valuetype [mscorlib]System.ValueTuple`2<float64,float64>::.ctor(!0,
                                                                                                     !1)
  IL_008c:  callvirt   instance void class [mscorlib]System.Collections.Generic.Dictionary`2<int32,valuetype [mscorlib]System.ValueTuple`2<float64,float64>>::Add(!0,
                                                                                                                                                                  !1)

               ... repeats ...

I've tested this with debug build and also optimized build (/o+ compiler option), but both overflow at runtime.

category:correctness
theme:large-methods
skill-level:expert
cost:medium

The text was updated successfully, but these errors were encountered:

jkotas · 2017-09-20T23:36:12Z

cc @dotnet/jit-contrib

it only expects to use very little IL stack (.maxstack 4 from debug compilation)

There are number of IL constructs that require the JIT to create temporary locals. maxstack is a poor approximation of actual stack consumption of a method.

The JIT turns off optimizations for very large methods like this one to get them the compile in a reasonable amount of time. And when the optimizations are off, there is no lifetime tracking for locals and each of these temps gets its own spot in the stack frame.

Even when it "works", initialization pattern like this will have poor performance. Initializing the dictionary by reading the numbers from a text file would be likely order of magnitude faster.

mikedn · 2017-09-21T05:01:06Z

maxstack is a poor approximation of actual stack consumption of a method.

For completeness, the compiled version of such a method requires ~1.3 megabytes of stack.

AndyAyersMS · 2017-11-07T20:24:42Z

Thanks to dotnet/coreclr#14890, the jit allocates two structs on the stack for each initialization (I wanted to verify that temp growth here is only linear, thankfully it is...).

So the ~33,000 initializer tuples create ~66,000 structs on the stack, each 16 bytes. These require around 1,056,000 bytes of stack frame and puts this method over the 1MB default size limit and hence causes stack overflow.

The best fix would be to realize that we really only need a small number of structs. It might be possible to have reusable temps sort of like we do (did) for the box temp; basically we'd reuse the temp or temps at stack empty points.

But another possible mitigation here is to fix dotnet/coreclr#14890, it would cut the stack usage in half.

AndyAyersMS · 2017-11-07T21:08:42Z

One more thing occurred to me -- I'll note it here on the off chance that it is helpful. The jit is very conservative about optimizing class constructors, since it knows they will only run once. So moving the initialization to a helper method at least allows the jit to think about optimizing.

Unfortunately the method is large enough that the jit decides not to optimize and we end up with more or less the same code.

But if you then prejit the assembly that uses the helper, the jit will decide to optimize and the stack size is cut roughly in half, as the extra structs introduced by dotnet/coreclr#14890 end up getting register promoted and so don't use up stack space.

AndyAyersMS · 2018-02-08T18:14:33Z

Box temp like reuse can't help here because (with recent CSCs at least) the eval stack does not empty between the struct newobjs. So importer can't easily tell if an existing temp is free.

AndyAyersMS · 2018-02-09T00:15:56Z

Thought about it some more. We might be able to "free up" a struct temp when it gets popped from the stack as the pop entails a copy.

Implemented this and the rest of the caching logic and it knocks the stack usage below 1MB. We still create one non-reused stack temp per call as there's no way currently to describe where the copy that gets passed as a call argument lives.

This temp re-use helps out in some other similar methods, eg System.Reflection.Emit.OpCodes:.cctor() but also causes some sizeable regressions, as evidently there are some cases where we decide the temp needs to be address taken. So I plan to to look deeper at diffs and see what causes that.

AndyAyersMS · 2018-02-10T02:32:30Z

Seems like in cases where we can re-use a struct temp, we need to be able to anticipate if that use of the temp was going to end up being promoted or not. If it will be promoted then a unique temp seems best, or at least a temp that is only re-used for other promotable instances (one might consider keeping a second temp that is used for all the address-exposed uses). For minopts/fast jitting where we won't ever promote we can just reuse with impunity. Also we won't promote certain classes of structs so we could run those checks early and just always reuse the temps in those cases too.

Unfortunately we don't have reliable indications about potential enregistration at the point that we need to decide whether to re-use a temp (and if we keep several around, which one). There are perhaps some tells; the newobj typically feeds a constructor call and while we don't yet know if that call is an inline candidate (and if so, if it will actually get inlined) we can tell sometimes that it won't be an inline candidate -- say for instance if we're in a catch or some other region where we generally inhibit inlining.

So one practical opt strategy might be to never re-use and then rely on some later stack-packing phase to collapse compatible disjoint-lifetime struct temps down onto just one on-stack instance. (This would also incidentally solve stack spill temp problems, at least when optimizing). But we don't have such a phase and they're not cheap to run.

Another might be to disentangle after the importer runs -- allow the importer to reuse temps as it sees fit, then run struct liveness after inlining (augmented to realize that non-inlined struct ctors won't escape their this params), then split out any disjoint lifetimes that are not address-exposed and promotable into their own temp (or temps).

So pragmatically I'm going to experiment with:

always re-use the temp, if in minopts/debug/(fastopt), similar to what happens with the box temp
never re-use the temp if the instance is not promotable, or perhaps use a "second temp"
never re-use if constructor is not inlinable (at best a guess), or perhaps use a "second temp"
otherwise re-use if possible

Also perhaps look at what it would mean to have morph or whatever phase it is that creates the out param temps try and do some reuse too...?

AndyAyersMS · 2018-02-26T22:46:46Z

I don't see us getting a fix for this in time for the 2.1 release, so I'm moving it to Future.

masonwheeler · 2018-02-26T23:06:56Z

@jkotas

Even when it "works", initialization pattern like this will have poor performance. Initializing the dictionary by reading the numbers from a text file would be likely order of magnitude faster.

That's kind of horrifying. Why doesn't the Roslyn compiler simply dump this big literal array into a sdata blob with whatever its equivalent of ModuleBuilder.DefineInitializedData is, then block-load it at runtime?

briansull · 2018-02-26T23:12:20Z

Beacuse you can't initialize objects in the managed heap en-mass.
It is intrinsically a one at a time operation.

masonwheeler · 2018-02-26T23:16:49Z

@briansull That doesn't actually make sense. Are you familiar with how DefineInitializedData works?

briansull · 2018-02-26T23:30:34Z

Actually DefineInitializedData sounds like it is dealing with unmanaged BLOBs, not Object types.

masonwheeler · 2018-02-26T23:33:21Z

Not object types, struct types.

Basically, your compiler serializes your struct array literal to a binary blob which it saves to the PE file. At runtime, that blob gets loaded into a static field as a byte array, and then you use System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray to blit that into your struct array.

briansull · 2018-02-27T00:01:47Z

Well you can make a feature request for some functionally like that to be "added" to Roslyn.
In the test case above we are populating a Dictionary<K,V> which is quite a bit different than a struct or an array.

masonwheeler · 2018-02-27T22:09:15Z

@briansull *facepalm* OK, that's on me. I was about to point out how you can initialize it with an array of structs with the IEnumerable<KeyValuePair<K, V>> constructor, when I double-checked and found that that's not actually a thing!

How did the team manage to overlook something that obvious?!? All the other collections can be initialized from arrays via IEnumerable<T>...

jkotas · 2018-02-27T22:40:17Z

initialize it with an array of structs with the IEnumerable<KeyValuePair<K, V>> constructor

I do not think that this constructor would help. It would flip the problem into how to create huge pre-initialized array of structs that suffers from the same issue.

The best way to do this kind of initialization today is via arrays of primitive constants. It is the only one that does not suffer from this issue. Here is how the example above could be written in lean and performant way:

public static partial class geo
{
    public static Dictionary<int,(double lat, double lon)> zipcodes = CreateZipCodes();

    static Dictionary<int,(double lat, double lon)> CreateZipCodes()
    {
        double[] data = new double[] {
            00601, 18.180555, -66.749961, 
            00602, 18.361945, -67.175597,
            ...
        };

        Dictionary dict = new Dictionary<int, (double lat, double lon)>(data.Length / 3);
        for (int i = 0; i < data.Length; i += 3)
           dict.Add((int)data[i], (data[i+1], data[i+2]));
        return dict;
    }
}

masonwheeler · 2018-02-27T22:51:38Z

@jkotas

I do not think that this constructor would help. It would flip the problem into how to create huge pre-initialized array of structs that suffers from the same issue.

Nope. If you can save it into a blob at compile-time, and then at runtime blit that blob back into your array with System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray, there's no chance of the stack overflowing because a System.Array is a reference type. Then the constructor could pass in that array and for-loop over it, and no matter how big the array is, you still won't blow the stack.

jkotas · 2018-02-27T22:55:50Z

@masonwheeler So assume that there is Dictionary constructor that takes IEnumerable<KeyValuePair<K, V>. What would the code look like for the example in this issue?

VSadov · 2018-02-27T23:02:33Z

FWIW C# specifies collection initializers in terms of calls to Add methods. It is assumed that Add could be sideeffecting so compiler must emit as many calls to Add as there are items in the initializer.

It might be possible to pack all the literals in the initializer into an array, or multiple arrays (if they have different types) and after rehydrating those arrays feed them to the loop that populates the collection via Add.

You may imagine the complexity of the transformation and the conditions for the optimizaion to be actually applicable/useful.

masonwheeler · 2018-02-27T23:18:07Z

@jkotas

The compiler would take the dictionary initializer literal and transform it into a []byte, which is saved as a const field with csc's metadata model's equivalent to ModuleBuilder.DefineInitializedData.
Compiler gets a reference to this field
Compiler emits the following pseudocode IL:
- ldc.i4.s initializer_count
- newarr System.Collections.Generic.KeyValuePair<K,V>
- dup
- ldtoken const_field_reference
- call System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray
This initializes the array and leaves a reference to it on the top of the execution stack. This is the argument to the dict constructor, so emit a call to it.

jkotas · 2018-02-27T23:26:29Z

InitializeArray works for primitive types only today. You are describing a new set of features across language and runtime. The dictionary constructor to support this pattern is far from the only thing missing.

masonwheeler · 2018-02-27T23:29:03Z

@jkotas Does it? That's not actually in the documentation; it just says "this method is used by compilers."

Well that's disappointing to learn, then.

i3anaan · 2020-05-22T13:50:16Z

We ran into the same issue with a dictionary of signature new Dictionary<string, object>(), and about 4000 key/value pairs. (our application generates C# code files based on some database and compiles it into a dll)
Surely caused some headaches on our side debugging this issue, as the message you get is not very informational, and we did certainly not expect it to be caused by this.

AndyAyersMS · 2020-05-22T17:38:07Z

I wouldn't expect you to hit the problem noted above for Dictionary<string, object> as neither key or value types are structs. If you can get me a repro case I'll take a closer look.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

AndyAyersMS mentioned this issue Apr 9, 2020

Reuse temporaries for every new by default #4275

Open

AndyAyersMS mentioned this issue Oct 21, 2020

StackOverflowException from collection initializer with huge number of elements #43659

Closed

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

jkotas mentioned this issue Mar 19, 2021

[bug] excessive stack usage, seems values not popped from the stack, resulting in stack overflow #49842

Closed

jakubmisek mentioned this issue Mar 21, 2021

weird stack overflow exception peachpiecompiler/peachpie#923

Closed

jakubmisek mentioned this issue May 12, 2021

Problems related to WordPress PHPUnit tests peachpiecompiler/peachpie#791

Open

11 tasks

jkotas mentioned this issue Jun 25, 2021

Massive allocations in unmanaged memory on Dictionary creation #54688

Closed

jakobbotsch mentioned this issue Oct 12, 2022

[RyuJIT] Introduce a general reverse copy prop for struct temps #8887

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

System.ValueTuple causes a StackOverflow when used in a large Dictionary. #8980

System.ValueTuple causes a StackOverflow when used in a large Dictionary. #8980

jcouv commented Sep 20, 2017

jkotas commented Sep 20, 2017 •

edited

Loading

mikedn commented Sep 21, 2017

AndyAyersMS commented Nov 7, 2017

AndyAyersMS commented Nov 7, 2017

AndyAyersMS commented Feb 8, 2018

AndyAyersMS commented Feb 9, 2018 •

edited

Loading

AndyAyersMS commented Feb 10, 2018

AndyAyersMS commented Feb 26, 2018

masonwheeler commented Feb 26, 2018

briansull commented Feb 26, 2018 •

edited

Loading

masonwheeler commented Feb 26, 2018

briansull commented Feb 26, 2018 •

edited

Loading

masonwheeler commented Feb 26, 2018 •

edited

Loading

briansull commented Feb 27, 2018

masonwheeler commented Feb 27, 2018

jkotas commented Feb 27, 2018 •

edited

Loading

masonwheeler commented Feb 27, 2018

jkotas commented Feb 27, 2018

VSadov commented Feb 27, 2018

masonwheeler commented Feb 27, 2018

jkotas commented Feb 27, 2018

masonwheeler commented Feb 27, 2018

i3anaan commented May 22, 2020

AndyAyersMS commented May 22, 2020

System.ValueTuple causes a StackOverflow when used in a large Dictionary. #8980

System.ValueTuple causes a StackOverflow when used in a large Dictionary. #8980

Comments

jcouv commented Sep 20, 2017

jkotas commented Sep 20, 2017 • edited Loading

mikedn commented Sep 21, 2017

AndyAyersMS commented Nov 7, 2017

AndyAyersMS commented Nov 7, 2017

AndyAyersMS commented Feb 8, 2018

AndyAyersMS commented Feb 9, 2018 • edited Loading

AndyAyersMS commented Feb 10, 2018

AndyAyersMS commented Feb 26, 2018

masonwheeler commented Feb 26, 2018

briansull commented Feb 26, 2018 • edited Loading

masonwheeler commented Feb 26, 2018

briansull commented Feb 26, 2018 • edited Loading

masonwheeler commented Feb 26, 2018 • edited Loading

briansull commented Feb 27, 2018

masonwheeler commented Feb 27, 2018

jkotas commented Feb 27, 2018 • edited Loading

masonwheeler commented Feb 27, 2018

jkotas commented Feb 27, 2018

VSadov commented Feb 27, 2018

masonwheeler commented Feb 27, 2018

jkotas commented Feb 27, 2018

masonwheeler commented Feb 27, 2018

i3anaan commented May 22, 2020

AndyAyersMS commented May 22, 2020

jkotas commented Sep 20, 2017 •

edited

Loading

AndyAyersMS commented Feb 9, 2018 •

edited

Loading

briansull commented Feb 26, 2018 •

edited

Loading

briansull commented Feb 26, 2018 •

edited

Loading

masonwheeler commented Feb 26, 2018 •

edited

Loading

jkotas commented Feb 27, 2018 •

edited

Loading