System.Text.Json incremental source generator performance issues #68353
Comments
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
|
cc @chsienki |
Here's a naive workaround that improves performance when editing the project in Visual Studio. It avoids regenerating sources for unchanged context classes by avoiding [...]. As it stands, though, the workaround suffers from a couple of issues:
|
We clearly have a lot more to learn about building these. Do we have the right inputs and APIs available to make our source-regeneration incremental? I know @eiriktsarpalis and @sharwell looked at this before and made improvements. One thing to be careful of -- which I noticed for our own SLN -- is that it seemed to be loading the "first" source generator for JSON that it hit which was sometimes the old non-incremental one. It wouldn't load the incremental version since it had the exact same name. That's an internal only problem since customers would see only the one appropriate for their compiler version. |
[With this comment I'm assuming that generating fewer files might have less impact on memory and cores, even if the material content generated remains the same.]
I recall @eerhardt gave feedback about reducing the number of files we output. Today we generate individual files for the following:
Since we've identified this as a perf bottleneck, we should consider combining these files (especially the first 3). Unique files for each type is nice for debugging source-generated code for a single type, so it might be helpful to define a file-generation strategy, say based on an msbuild property. |
Issues 2 & 3 should be looked into and addressed.
Definitely worth looking into resolving this. |
Never do this. This action instructs the command line compiler that it is allowed to produce inaccurate outputs in an actual .dll or .exe, and will allow a production machine to publish/sign/ship that invalid output. |
There were several Roslyn bugs that were causing this to occur when it shouldn't. Roslyn's design is to not regenerate sources on every keystroke regardless of the implementation of the source generator itself, and either 17.2 or 17.3 has been updated to better reflect this desired behavior. |
FWIW I deleted the Roslyn3.1 projects locally to ensure no interference and can confirm that the performance problems persist even with just the 4.0 projects present. |
That's interesting, I would have assumed there were no incremental caching concerns when building from the command line. I can see how this can cause the source generator to fire less frequently than is desirable in interactive scenaria, but I don't think it should be capable of passing unsound inputs to the source generator. |
Hmmm, I tried attaching the Roslyn process from the latest VS 17.3 to the debugger and it would seem like every edit still triggers a new compilation. |
It's also possible to enable the incremental cache for VBCSCompiler, with a long-term goal to improve compilation performance by reducing the number of cases where a source generator needs to run.
Edits can trigger new compilations, but it's no longer tied to the inner typing loop and should operate on longer delays. Continued typing should automatically increase the delay. Some (non-default) user settings can override this behavior and increase the frequency the generators run. Under a debugger, it may be hard to tell the difference between the 17.1 and 17.3 behavior. However, if you watch the ETW events to see the total number of generator executions/updates, you should see a significant reduction in the numbers during active typing scenarios. |
This is definitely an issue with the latest versions of Roslyn. Locally, I'm measuring the JSON generator using 75% of all the CPU in devenv and our external server. This impacts all features that need accurate semantics, for example lightbulbs, diagnostics, etc. I've measured tens of seconds of delay introduced by this. |
#69332 helps address:
|
Moving to 8.0.0 as we won't have time to address this in .NET 7 |
I'm developing a similar source generator as [...]. I'm a bit shocked by Roslyn's "incremental" source generation; it needs significant changes to make it really "incremental":
So the viable solution would be using |
Correct. That's because a good incremental generator will rarely ever get to that step after the pipeline has run once. With most user edits/changes, the pipeline will stop early, so we can use the texts/trees from the prior generation as is.
As you said, it's an implementation detail. Exposing it would lock us into that impl detail forever.
You should extract from the INamedTypeSymbol the data your generator actually needs, and use that subset of data as the value your IncrementalValueProvider (IVP) produces. That way, if that subset doesn't change, the pipeline will stop. |
So if I understand correctly, the pipeline can be short-circuited, but it's not incremental. Let's say we have 100 classes, type1...type100, defined in the project. The first run will generate source files 1...100, each depending on INamedTypeSymbol1...INamedTypeSymbol100 and on the public members of each INamedTypeSymbol. If an edit in the IDE triggers no change to any of the 100 generated files, the pipeline can stop before actually generating the files. This is a good improvement compared to V1. Now let's get to the pipeline short-circuit. Theoretically we can extract from the |
Short circuiting is incremental. This is the same way that incremental parsing works. The portions to be redone are computed, and the system stops once it discovers that redoing work would produce the same results as before. At that point, the old results are carried forward. |
It does support this. Just produce the same model values for nodes 1-4 and 6-100 and we won't do the following steps after that. We'll only recompute the source for model 5 and reuse the rest. |
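To make the "recompute only model 5" point concrete, here is a minimal toy sketch in Python. It is not Roslyn's implementation or API: `TypeModel`, `CachingStage`, `generate_source` and `build_models` are hypothetical names invented for illustration, and it assumes only that a pipeline step compares each item's model by value and reuses the previous output when the model is equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeModel:
    """Hypothetical value-equal model: only the data the generator needs."""
    name: str
    properties: tuple  # e.g. (("Id", "int"), ("Name", "string"))


class CachingStage:
    """Toy stand-in for a per-item pipeline step (an assumption, not Roslyn code)."""

    def __init__(self, generate):
        self._generate = generate
        self._cache = {}  # key -> (model, output) from the previous run
        self.runs = 0     # how many times the expensive step actually ran

    def update(self, models):
        outputs, new_cache = {}, {}
        for key, model in models.items():
            cached = self._cache.get(key)
            if cached is not None and cached[0] == model:
                output = cached[1]  # equal model: carry the old output forward
            else:
                self.runs += 1
                output = self._generate(model)
            new_cache[key] = (model, output)
            outputs[key] = output
        self._cache = new_cache
        return outputs


def generate_source(model):
    props = ", ".join(f"{type_name} {name}" for name, type_name in model.properties)
    return f"// generated for {model.name}({props})"


def build_models(edited=None):
    """Build models for Type1..Type100; the 'edited' type gains a property."""
    models = {}
    for i in range(1, 101):
        props = (("Id", "int"),)
        if i == edited:
            props += (("Name", "string"),)
        models[i] = TypeModel(f"Type{i}", props)
    return models


stage = CachingStage(generate_source)
first = stage.update(build_models())
runs_after_first = stage.runs
second = stage.update(build_models(edited=5))
runs_after_edit = stage.runs - runs_after_first
```

In this sketch the first run generates all 100 outputs, and the second run regenerates only the output for type 5 because the other 99 models compare equal to their cached values.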
No, you won't have to do this. The SyntaxTree instances themselves are identical. So, if they are your model value (or part of your model), they will be equal across edits outside of them. So there is no need to diff them at all. And even if you did diff, you'd just do the pointer check. You only need to diff the trees that are actually different (i.e. don't have reference equality). |
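The reference-equality argument can be illustrated with a small Python toy. Everything here is hypothetical (`Tree`, `apply_edit`, `changed_trees`) and only models the idea that an edit replaces the tree for the edited file while all other tree objects are reused unchanged; it says nothing about how Roslyn stores trees internally.

```python
class Tree:
    """Hypothetical immutable stand-in for a syntax tree."""

    def __init__(self, path, text):
        self.path = path
        self.text = text


def apply_edit(trees, path, new_text):
    """Editing one file replaces only that tree; the others are reused as-is."""
    updated = dict(trees)
    updated[path] = Tree(path, new_text)
    return updated


def changed_trees(old, new):
    """Return the paths whose tree object differs by identity (pointer check)."""
    return [path for path, tree in new.items() if old.get(path) is not tree]


before = {f"File{i}.cs": Tree(f"File{i}.cs", f"class C{i} {{}}") for i in range(1, 101)}
after = apply_edit(before, "File5.cs", "class C5 { int X; }")
to_reanalyze = changed_trees(before, after)
```

Only `File5.cs` ends up in `to_reanalyze`; the other 99 trees pass the identity check without any content comparison.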
You only need the red node tree for the current file. And that's not a problem because literally every feature (including the compiler) needs it. So it's going to be computed. If you are that worried though, you can use IsIncrementallyEquivalent on the nodes as you walk the tree to avoid going into red nodes we can prove are identical. |
Note: if you'd like to see a demonstration of this in action, I recommend looking at the impl in roslyn of ForAttributeWithMetadataName and the simple name helper it calls. You'll see how it is able to just reanalyze only a particular tree when it changes to build intermediary state. And it then uses that state to determine what even to examine elsewhere in the entire project. |
Thanks a lot for all your explanations! I dug a little further into the Roslyn source code repo and you're right, the pipeline is incremental.
I can't find |
Sorry, it's called IsIncrementallyIdenticalTo. :) |
We'd need to speak practically about what you're trying to do to be able to make any determinations about the issue. |
The System.Text.Json.SourceGeneration.Roslyn4.0.Tests project defines ~35 large JsonSerializerContext classes, resulting in approximately 3500 generated source files. The size of the project has had a detrimental effect on performance, both when building but especially when editing the project: on my primary dev machine the IDE essentially goes unresponsive when attempting to make any change in the project.

I ran a performance investigation and came up with the following findings:

1) [...] SourceProductionContext.AddSource method and is not related to the source generator itself.
2) [...] Compilation object. This is explicitly called out as an anti-pattern in the incremental source generation doc, since any change will force re-evaluation of the source generator.
3) [...] Collect() on the ClassDeclarationSyntax values provider, feeding every single JsonSerializerContext declaration to the source generator regardless of whether it's a newly updated or cached value.
4) The source generator is not honoring the CancellationToken passed by Roslyn. (Fixed in "Make json source generation cancellable" #69332.)

I'm not sure if 1) could be improved, however it definitely seems like 2) is a performance bug in our source generator implementation that we should try to fix. Not sure how it could be done without some refactoring of the Parser class.

cc @layomia @ericstj @eerhardt
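As a rough illustration of why feeding every declaration to a single step is costly, the Python toy below contrasts a batched step with a per-item step. It is a simplified model under stated assumptions, not the generator's code: `run_batched`, `run_per_item` and the context names are hypothetical, and the batch is assumed to be treated as one value, so any change to any element invalidates the whole batch.

```python
def run_batched(previous_batch, batch, generate):
    """Collected input: the whole tuple is one value, so one change reruns everything."""
    if previous_batch == batch:
        return 0
    for item in batch:
        generate(item)
    return len(batch)


def run_per_item(previous, current, generate):
    """Per-item input: only entries whose value changed are regenerated."""
    calls = 0
    for key, value in current.items():
        if previous.get(key) != value:
            generate(value)
            calls += 1
    return calls


def generate(declaration):
    return f"// source for {declaration}"


# 35 hypothetical context declarations; the edit touches only Context7.
old = {f"Context{i}": f"v1 of Context{i}" for i in range(1, 36)}
new = dict(old)
new["Context7"] = "v2 of Context7"

batched_calls = run_batched(tuple(old.values()), tuple(new.values()), generate)
per_item_calls = run_per_item(old, new, generate)
```

In this toy the batched step regenerates all 35 contexts after a single edit, while the per-item step regenerates one.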