-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.ValueTuple causes a StackOverflow when used in a large Dictionary. #8980
Comments
cc @dotnet/jit-contrib
There are number of IL constructs that require the JIT to create temporary locals. maxstack is a poor approximation of actual stack consumption of a method. The JIT turns off optimizations for very large methods like this one to get them the compile in a reasonable amount of time. And when the optimizations are off, there is no lifetime tracking for locals and each of these temps gets its own spot in the stack frame. Even when it "works", initialization pattern like this will have poor performance. Initializing the dictionary by reading the numbers from a text file would be likely order of magnitude faster. |
For completeness, the compiled version of such a method requires ~1.3 megabytes of stack. |
Thanks to dotnet/coreclr#14890, the jit allocates two structs on the stack for each initialization (I wanted to verify that temp growth here is only linear, thankfully it is...). So the ~33,000 initializer tuples create ~66,000 structs on the stack, each 16 bytes. These require around 1,056,000 bytes of stack frame and puts this method over the 1MB default size limit and hence causes stack overflow. The best fix would be to realize that we really only need a small number of structs. It might be possible to have reusable temps sort of like we do (did) for the box temp; basically we'd reuse the temp or temps at stack empty points. But another possible mitigation here is to fix dotnet/coreclr#14890, it would cut the stack usage in half. |
One more thing occurred to me -- I'll note it here on the off chance that it is helpful. The jit is very conservative about optimizing class constructors, since it knows they will only run once. So moving the initialization to a helper method at least allows the jit to think about optimizing. Unfortunately the method is large enough that the jit decides not to optimize and we end up with more or less the same code. But if you then prejit the assembly that uses the helper, the jit will decide to optimize and the stack size is cut roughly in half, as the extra structs introduced by dotnet/coreclr#14890 end up getting register promoted and so don't use up stack space. |
Box temp like reuse can't help here because (with recent CSCs at least) the eval stack does not empty between the struct newobjs. So importer can't easily tell if an existing temp is free. |
Thought about it some more. We might be able to "free up" a struct temp when it gets popped from the stack as the pop entails a copy. Implemented this and the rest of the caching logic and it knocks the stack usage below 1MB. We still create one non-reused stack temp per call as there's no way currently to describe where the copy that gets passed as a call argument lives. This temp re-use helps out in some other similar methods, eg |
Seems like in cases where we can re-use a struct temp, we need to be able to anticipate if that use of the temp was going to end up being promoted or not. If it will be promoted then a unique temp seems best, or at least a temp that is only re-used for other promotable instances (one might consider keeping a second temp that is used for all the address-exposed uses). For minopts/fast jitting where we won't ever promote we can just reuse with impunity. Also we won't promote certain classes of structs so we could run those checks early and just always reuse the temps in those cases too. Unfortunately we don't have reliable indications about potential enregistration at the point that we need to decide whether to re-use a temp (and if we keep several around, which one). There are perhaps some tells; the So one practical opt strategy might be to never re-use and then rely on some later stack-packing phase to collapse compatible disjoint-lifetime struct temps down onto just one on-stack instance. (This would also incidentally solve stack spill temp problems, at least when optimizing). But we don't have such a phase and they're not cheap to run. Another might be to disentangle after the importer runs -- allow the importer to reuse temps as it sees fit, then run struct liveness after inlining (augmented to realize that non-inlined struct ctors won't escape their So pragmatically I'm going to experiment with:
Also perhaps look at what it would mean to have morph or whatever phase it is that creates the out param temps try and do some reuse too...? |
I don't see us getting a fix for this in time for the 2.1 release, so I'm moving it to Future. |
That's kind of horrifying. Why doesn't the Roslyn compiler simply dump this big literal array into a sdata blob with whatever its equivalent of ModuleBuilder.DefineInitializedData is, then block-load it at runtime? |
Beacuse you can't initialize objects in the managed heap en-mass. |
@briansull That doesn't actually make sense. Are you familiar with how |
Actually |
Not object types, struct types. Basically, your compiler serializes your struct array literal to a binary blob which it saves to the PE file. At runtime, that blob gets loaded into a static field as a byte array, and then you use |
Well you can make a feature request for some functionally like that to be "added" to Roslyn. |
@briansull *facepalm* OK, that's on me. I was about to point out how you can initialize it with an array of structs with the How did the team manage to overlook something that obvious?!? All the other collections can be initialized from arrays via |
I do not think that this constructor would help. It would flip the problem into how to create huge pre-initialized array of structs that suffers from the same issue. The best way to do this kind of initialization today is via arrays of primitive constants. It is the only one that does not suffer from this issue. Here is how the example above could be written in lean and performant way: public static partial class geo
{
public static Dictionary<int,(double lat, double lon)> zipcodes = CreateZipCodes();
static Dictionary<int,(double lat, double lon)> CreateZipCodes()
{
double[] data = new double[] {
00601, 18.180555, -66.749961,
00602, 18.361945, -67.175597,
...
};
Dictionary dict = new Dictionary<int, (double lat, double lon)>(data.Length / 3);
for (int i = 0; i < data.Length; i += 3)
dict.Add((int)data[i], (data[i+1], data[i+2]));
return dict;
}
} |
Nope. If you can save it into a blob at compile-time, and then at runtime blit that blob back into your array with |
@masonwheeler So assume that there is |
FWIW C# specifies collection initializers in terms of calls to It might be possible to pack all the literals in the initializer into an array, or multiple arrays (if they have different types) and after rehydrating those arrays feed them to the loop that populates the collection via You may imagine the complexity of the transformation and the conditions for the optimizaion to be actually applicable/useful. |
|
InitializeArray works for primitive types only today. You are describing a new set of features across language and runtime. The dictionary constructor to support this pattern is far from the only thing missing. |
@jkotas Does it? That's not actually in the documentation; it just says "this method is used by compilers." Well that's disappointing to learn, then. |
We ran into the same issue with a dictionary of signature |
I wouldn't expect you to hit the problem noted above for |
Moved (and edited) this issue from the roslyn repo.
Summary: you add an empty
Main
method to the attached code, then compile it withcsc.exe geocoords_stackoverflow.cs.txt
. When you run the resulting executable, you get an overflow error (Process is terminated due to StackOverflowException.
)From discussion with @VSadov, we think this is a JIT issue rather than a compiler issue.
FYI @jkotas for triage. Thanks
@andras-ferencz commented on Tue Feb 28 2017
When using the new System.ValueTuple in a large Dictionary, I receive a stack overflow exception where as I do not with System.Tuple.
Dictionary<int, (double lat, double lon)> zipcodes = new Dictionary<int, (double lat, double lon)>() {{123,(18.180555,-66.749961)} };
works fine just for reference.geocoords_stackoverflow.cs.txt
geocoords_worksfine.cs.txt
@jcouv commented on Tue Feb 28 2017
I'm able to see the repro file. The code causing stack overflow looks like this:
@jcouv commented on Wed Sep 20 2017
I'm able to repro this. The program compiles fine, but it overflows when run.
The strange thing is that decompiling the static constructor shows that it only expects to use very little IL stack (
.maxstack 4
from debug compilation):I've tested this with debug build and also optimized build (
/o+
compiler option), but both overflow at runtime.category:correctness
theme:large-methods
skill-level:expert
cost:medium
The text was updated successfully, but these errors were encountered: