Introduce detection of generic cycles #1681

MichalStrehovsky · 2021-10-27T09:02:48Z

Motivated by npgsql/npgsql#4057. The generic recursion that was in Npgsql made its way back into Npgsql. We've seen these show up too often: not because it's common to write generic recursion in C#, but because popular NuGet packages end up containing it and many people then hit it.

This brings over the generic cycle detection from .NET Native and hooks it up in a rudimentary way. The pull request is split into logical commits for better reviewability.

We detect generic recursion at two spots:

In the code generation phase. This detects all the cases where shared generics are not involved.
In generic dictionaries. This is the more typical case where recursion results in never-ending expansion in terms of types and generic dictionaries.

If we detect a recursion, we unceremoniously cut it off after a couple levels of nesting. The cutoff point is fairly low because in the cases I’ve seen, the recursion also causes an expansion in breadth, not just depth, and allowing that to go unchecked was resulting in sadness. If the code goes beyond the cutoff point at runtime, it will not work.
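For readers unfamiliar with the problem, here is a minimal sketch of the kind of generic recursion being discussed. The names `Wrapper<T>` and `Recurse` are made up for illustration and are not taken from Npgsql or from the tests in this pull request. Each call instantiates the method over a new, deeper struct type, so an ahead-of-time compiler that tries to pregenerate every reachable instantiation would never finish without some cutoff.

```csharp
// Hypothetical example: every recursive call uses a new instantiation,
// Recurse<int> -> Recurse<Wrapper<int>> -> Recurse<Wrapper<Wrapper<int>>> -> ...
struct Wrapper<T>
{
    public T Value;
}

static class GenericRecursionExample
{
    public static int Recurse<T>(int depth, T value)
    {
        // The recursion is bounded at run time, but the set of
        // instantiations that are statically reachable is unbounded.
        if (depth == 0)
            return 0;

        return 1 + Recurse(depth - 1, new Wrapper<T> { Value = value });
    }
}
```

With a JIT this runs fine because instantiations are created on demand. With the cutoff described above, a call that nests deeper than the cutoff would be expected to fail at runtime instead.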

The recursion detector is essentially a prepass – it goes over everything in the assembly, looking for recursions and generating a list of the types and generic methods involved in a cycle. I’m fairly convinced it cannot be done in other ways than with a prepass.

Looking at the .NET Native code, we had quite a few limitations where generic recursion would not be detected. Given that a failure to detect a cycle results in a compiler failure, and we’ve not heard of this failure over the years, the .NET Native approach is likely sufficient.

What’s missing

If we cut off the generic recursion within a generic dictionary, the failure mode at runtime is a NullRef shortly after a dictionary lookup (because the cut-off slot will be null). It’s not great. I think we can update the lookup helpers to check for null while restricting the check to only the helpers where we’ve seen a damaged slot. We also need to make sure such lookups never get inlined by RyuJIT and always go through our helpers. I think it’s doable and not hard. I just don’t want to put more in this pull request than what’s strictly necessary.
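The null check proposed above could look roughly like the sketch below. This is not the actual lookup helper: the `object[]` dictionary model, the helper name `LookupChecked`, and the choice of `NotSupportedException` are all assumptions made for illustration.

```csharp
using System;

static class DictionaryLookupSketch
{
    // Hypothetical model: a generic dictionary is an array of slots, and a
    // null entry stands for a slot that the compiler cut off.
    public static object LookupChecked(object[] dictionary, int slot)
    {
        object entry = dictionary[slot];

        // Fail with a descriptive exception at the lookup itself instead of
        // letting a later dereference surface as a NullReferenceException.
        if (entry == null)
            throw new NotSupportedException("Generic recursion went past the compile-time cutoff.");

        return entry;
    }
}
```

Restricting such a check to helpers that are known to touch a damaged slot, as suggested above, would keep the extra branch off the common lookup path.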

Lazy generics. If the recursion happens over reference types, we could potentially generate fully working code by falling back to lazy generics. The cases I was looking at were not over reference types.

We might want to allow specifying the cutoff point at compile time (also allow disabling all of this).

Perf

I've seen a 2% regression in compilation throughput for the WebApi template. We can make this more efficient by e.g. not hydrating type system objects for everything when we're scanning for cycles and staying with S.R.Metadata for longer.

MichalStrehovsky · 2021-10-27T09:04:28Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/LazyGenerics/Graph.Cycles.cs

+                    if (previousAlgorithmTimeoutWatch.ElapsedMilliseconds > s_previousAlgorithmTimeout)
+                    {
+                        abortedDueToTimeout = true;
+                        break;
+                    }


We might not want this for determinism. I've not seen things go anywhere near the timeout. Even in unoptimized debug builds most assemblies are analyzed within 2 seconds.

So delete it for now and see whether it is going to show up as an actual problem?

We may want to just print a message to the verbose logger, so that it is easy to tell that the compilation is stuck in this recursion detector.

I looked at this a bit more and the "previous algorithm" this is referring to is the original algorithm .NET Native used. .NET Native originally had a naive implementation of this that would take a really long time. Then David changed it to use Tarjan's algorithm. He kept the original algorithm and in DEBUG builds we run both and compare the results. So all of the timeouts are under #if DEBUG. It's probably fine to just keep this or delete it all. I don't have a strong opinion either way. It's probably easier to see why the cycle formed in the naive algorithm.
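For reference, a compact sketch of Tarjan's strongly connected components algorithm follows. It is a generic textbook version over integer node ids, not the code in Graph.Cycles.cs; treating nodes as stand-ins for generic types or methods is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

static class TarjanSketch
{
    // Returns the strongly connected components of a directed graph.
    // edges[v] lists the successors of node v.
    public static List<List<int>> FindComponents(int nodeCount, List<int>[] edges)
    {
        var index = new int[nodeCount];
        var lowLink = new int[nodeCount];
        var onStack = new bool[nodeCount];
        var stack = new Stack<int>();
        var result = new List<List<int>>();
        int next = 0;
        Array.Fill(index, -1); // -1 marks an unvisited node

        void Visit(int v)
        {
            index[v] = lowLink[v] = next++;
            stack.Push(v);
            onStack[v] = true;

            foreach (int successor in edges[v])
            {
                if (index[successor] == -1)
                {
                    Visit(successor);
                    lowLink[v] = Math.Min(lowLink[v], lowLink[successor]);
                }
                else if (onStack[successor])
                {
                    lowLink[v] = Math.Min(lowLink[v], index[successor]);
                }
            }

            // v is the root of a component: pop the component off the stack.
            if (lowLink[v] == index[v])
            {
                var component = new List<int>();
                int member;
                do
                {
                    member = stack.Pop();
                    onStack[member] = false;
                    component.Add(member);
                } while (member != v);
                result.Add(component);
            }
        }

        for (int v = 0; v < nodeCount; v++)
        {
            if (index[v] == -1)
                Visit(v);
        }

        return result;
    }

    // A component describes a cycle if it has several nodes or a self-edge.
    public static bool IsCycle(List<int> component, List<int>[] edges)
        => component.Count > 1 || edges[component[0]].Contains(component[0]);
}
```

Each node and edge is visited once, which is why this scales so much better than a naive search that re-walks paths from every node. The recursive `Visit` is kept for brevity; very deep graphs would need an explicit stack.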

MichalStrehovsky · 2021-10-28T02:56:07Z

With the latest commit, the impact on compiler throughput is within noise range. The largest assembly in WebApi gets analyzed in 88 ms. Most assemblies are in single-digit ms range, a lot even at 0. We do pretty much all of this work in the multithreaded phase.

jkotas · 2021-10-28T18:45:58Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/LazyGenerics/ModuleCycleInfo.cs

+                    {
+                        if (referentType != null)
+                        {
+                            // TODO: better exception string ID?


This is more part of the compiler, not part of the core type system. I think we can just use regular logger to communicate the problem here, and also log the types and method involved in the cycle here.

This exception will still be rethrown at runtime when the cutoff code is actually reached, so we need a string ID because we might end up generating throwing code out of this. Throwing a type system exception makes all of this fall out nicely because we have all the infrastructure to do that.

I'll revisit this when I work on a nicer experience around this.

jkotas

Thank you!

MichalStrehovsky · 2021-10-29T01:20:04Z

Cc @yowl - the change in CorInfoImpl.RyuJit.cs (or the equivalent change in ILImporter.Scanner.cs) is relevant to WASM as well - without the extra call to check if we're forming a cycle, the new tests in Generics.cs will run out of memory at compile time trying to compile an endless generic recursion.

hez2010 · 2021-10-29T03:49:17Z

With above changes there're still some tests failing, see #776 for test cases.

GenericExpansionWithGenericMethod: Arg_NullReferenceException (if I reduce the expansion level, it will yield NotSupported_SubclassOverride)
GenericExpansionWithInterfaces: NotSupported_SubclassOverride
GenericExpansionWithStructsViaDirectMethodCall: compilation never finishes

MichalStrehovsky · 2021-10-29T04:09:03Z

With above changes there're still some tests failing, see #776 for test cases.

Thanks for checking! Yeah, I intentionally didn't make this resolve #776 - this is just the bare minimum. I expect 2-3 more pull requests before it gets in a shape that I'll be happy with.

MichalStrehovsky · 2021-10-29T06:39:28Z

Are you running the tests in reflection disabled mode? The tests do use reflection and reflection disabled mode didn't bother overriding non-abstract methods on System.Type, so that would explain NotSupported_SubclassOverride.

The NullReferenceException is expected because "If we cut off the generic recursion within a generic dictionary, the failure mode at runtime is a NullRef shortly after a dictionary lookup (because the cut-off slot will be null). It’s not great. I think we can update the lookup helpers to check for null while restricting the check to only the helpers where we’ve seen a damaged slot. We also need to make sure such lookups never get inlined by RyuJIT and always go through our helpers. I think it’s doable and not hard. I just don’t want to put more in this pull request than what’s strictly necessary."

hez2010 · 2021-10-29T07:51:46Z

Are you running the tests in reflection disabled mode?

Yes. I forgot to enable reflection while testing :D

yowl · 2021-10-29T12:14:10Z

Cc @yowl - the change in CorInfoImpl.RyuJit.cs (or the equivalent change in ILImporter.Scanner.cs) is relevant to WASM as well -

Alternatively, I suppose if the methods in question were compiled with the new RyuJit process they would get the benefit of your changes in CorInfoImpl.RyuJit.cs?

MichalStrehovsky · 2021-10-30T00:34:17Z

Alternatively I suppose if the methods in questions were compiled with the new RyuJit process they would get the benefit of your changes in CorInfoImpl.RyuJit.cs?

Yup, that should just fall out there!

MichalStrehovsky requested a review from jkotas October 27, 2021 09:02

Add source code from the .NET Native compiler

bdc1c47

MichalStrehovsky force-pushed the lazygenerics branch from e20ec47 to a4e5c58 Compare October 27, 2021 23:25

MichalStrehovsky added 2 commits October 28, 2021 12:50

Make things build

895f5db

Hook things up

1ec14c5

MichalStrehovsky force-pushed the lazygenerics branch from ff42a68 to 1ec14c5 Compare October 28, 2021 03:51

jkotas reviewed Oct 28, 2021

jkotas approved these changes Oct 28, 2021

MichalStrehovsky merged commit 644d17d into dotnet:feature/NativeAOT Oct 28, 2021

MichalStrehovsky deleted the lazygenerics branch October 28, 2021 23:42

This was referenced Nov 4, 2021

[aspnetcore-corert] Upgrade to .NET 6.0-RC2 TechEmpower/FrameworkBenchmarks#6862

Merged

Detect infinite generic expansion #776

Closed

yowl mentioned this pull request Jan 19, 2022

[NativeAOT-LLVM]: Outstanding tasks tracking issue #1828

Open

31 tasks

MichalStrehovsky mentioned this pull request Jul 28, 2022

[NativeAOT] crash on unix-arm64 in TestSimpleGenericRecursion__RecurseOverClass dotnet/runtime#72966

Closed

MichalStrehovsky mentioned this pull request Oct 28, 2023

Detect generic cycles in the Roslyn analyzer dotnet/runtime#94131

Open
