Skip to content

Conversation

@MichalStrehovsky
Copy link
Member

Motivated by npgsql/npgsql#4057. The generic recursion that was in Npgsql made its way back into Npgsql. We've seen these show up too often. Not because it's common to write recursions in C#, but because popular NuGets end up having these and then people hit it too often.

This brings over the generic cycle detection from .NET Native and hooks it up in a rudimentary way. The pull request is split into logical commits for better reviewability.

We detect generic recursion at two spots:

  • In the code generation phase. This detects all the cases where shared generics are not involved.
  • In generic dictionaries. This is the more typical case where recursion results in never-ending expansion in terms of types and generic dictionaries.

If we detect a recursion, we unceremoniously cut it off after a couple levels of nesting. The cutoff point is fairly low because in the cases I’ve seen, the recursion also causes an expansion in breadth, not just depth, and allowing that to go unchecked was resulting in sadness. If the code goes beyond the cutoff point at runtime, it will not work.

The recursion detector is essentially a prepass – it goes over everything in the assembly, trying to look for recursions and generating a list of types/generic method involved in a cycle. I’m fairly convinced it cannot be done in other ways than with a prepass.

Looking at the .NET Native code, we had quite a few limitations where generic recursion would not be detected. Given a failure to detect a cycle results in a compiler failure, and we’ve not heard of this failure over the years, the .NET Native approach is likely sufficient.

What’s missing

If we cutoff the generic recursion within a generic dictionary, the failure mode at runtime is a NullRef shortly after a dictionary lookup (because the cut off slot will be null). It’s not great. I think we can update the lookup helpers to check for null while restricting the check to only the helpers where we’ve seen a damaged slot. We also need to make sure such lookups never get inlined by RyuJIT and always go through our helpers. I think it’s doable and not hard. I just don’t want to put more in this pull request than what’s strictly necessary.

Lazy generics. If the recursion happens over reference types, we could potentially generate fully working code by failing back to lazy generics. The cases I was looking at were not over reference types.

We might want to allow specifying the cutoff point at compile time (also allow disabling all of this).

Perf

I've seen a 2% regression in compilation throughput for WebApi template. We can make this more efficient by e.g. not hydrating type system objects for everything when we're scanning for cycles and staying with S.R.Metadata for longer.

Comment on lines +283 to +287
if (previousAlgorithmTimeoutWatch.ElapsedMilliseconds > s_previousAlgorithmTimeout)
{
abortedDueToTimeout = true;
break;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might not want this for determinism. I've not seen things go anywhere near the timeout. Even in unoptimized debug builds most assemblies are analyzed within 2 seconds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So delete it for now and see whether it is going to show up as actual problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to just print a message to verbose logger, so that it is easy to tell that the compilation is struck in this recursion detector.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this a bit more and the "previous algorithm" this is referring to is the original algorithm .NET Native used. .NET Native originally had a naive implementation of this that would take a really long time. Then David changed it to use Tarjan's algorithm. He kept the original algorithm and in DEBUG builds we run both and compare the results. So all of the timeouts are under #if DEBUG. It's probably fine to just keep this or delete it all. I don't have a strong opinion either way. It's probably easier to see why the cycle formed in the naive algorithm.

@MichalStrehovsky
Copy link
Member Author

With the latest commit, the impact on compiler throughput is within noise range. The largest assembly in WebApi gets analyzed in 88 ms. Most assemblies are in single-digit ms range, a lot even at 0. We do pretty much all of this work in the multithreaded phase.

{
if (referentType != null)
{
// TODO: better exception string ID?
Copy link
Member

@jkotas jkotas Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more part of the compiler, not part of the core type system. I think we can just use regular logger to communicate the problem here, and also log the types and method involved in the cycle here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception will still be rethown at runtime when the cutoff code is actually reached, so we need a string ID because we might end up generating throwing code out of this. Throwing a type system exception makes all of this fall out of it nicely because we have all the infrastructure to do that.

I'll revisit this when I work on a nicer experience around this.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@MichalStrehovsky MichalStrehovsky merged commit 644d17d into dotnet:feature/NativeAOT Oct 28, 2021
@MichalStrehovsky MichalStrehovsky deleted the lazygenerics branch October 28, 2021 23:42
@MichalStrehovsky
Copy link
Member Author

Cc @yowl - the change in CorInfoImpl.RyuJit.cs (or the equivalent change in ILImporter.Scanner.cs) is relevant to WASM as well - without the extra call to check if we're forming a cycle, the new tests in Generics.cs will run out of memory at compile time trying to compile an endless generic recursion.

@hez2010
Copy link

hez2010 commented Oct 29, 2021

With above changes there're still some tests failing, see #776 for test cases.

  • GenericExpansionWithGenericMethod: Arg_NullReferenceException (if I reduce the expansion level, it will yield NotSupported_SubclassOverride)
  • GenericExpansionWithInterfaces: NotSupported_SubclassOverride
  • GenericExpansionWithStructsViaDirectMethodCall: compilation never finish

@MichalStrehovsky
Copy link
Member Author

With above changes there're still some tests failing, see #776 for test cases.

Thanks for checking! Yeah, I intentionally didn't make this resolve #776 - this is just the bare minimum. I expect 2-3 more pull requests before it gets in a shape that I'll be happy with.

@MichalStrehovsky
Copy link
Member Author

Are you running the tests in reflection disabled mode? The tests do use reflection and reflection disabled mode didn't bother overriding non-abstract methods on System.Type, so that would explain NotSupported_SubclassOverride.

The NullReferenceException is expected because "If we cutoff the generic recursion within a generic dictionary, the failure mode at runtime is a NullRef shortly after a dictionary lookup (because the cut off slot will be null). It’s not great. I think we can update the lookup helpers to check for null while restricting the check to only the helpers where we’ve seen a damaged slot. We also need to make sure such lookups never get inlined by RyuJIT and always go through our helpers. I think it’s doable and not hard. I just don’t want to put more in this pull request than what’s strictly necessary.").

@hez2010
Copy link

hez2010 commented Oct 29, 2021

Are you running the tests in reflection disabled mode?

Yes. I forgot to enable reflection while testing :D

@yowl
Copy link
Contributor

yowl commented Oct 29, 2021

Cc @yowl - the change in CorInfoImpl.RyuJit.cs (or the equivalent change in ILImporter.Scanner.cs) is relevant to WASM as well -

Alternatively I suppose if the methods in questions were compiled with the new RyuJit process they would get the benefit of your changes in CorInfoImpl.RyuJit.cs?

@MichalStrehovsky
Copy link
Member Author

Alternatively I suppose if the methods in questions were compiled with the new RyuJit process they would get the benefit of your changes in CorInfoImpl.RyuJit.cs?

Yup, that should just fall out there!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants