System.Text.Json incremental source generator performance issues #68353

eiriktsarpalis · 2022-04-21T22:02:03Z

The System.Text.Json.SourceGeneration.Roslyn4.0.Tests project defines ~35 large JsonSerializerContext classes, resulting in approximately 3500 generated source files. The size of the project has had a detrimental effect on performance, both when building but especially when editing the project: on my primary dev machine the IDE essentially goes unresponsive when attempting to make any change in the project.

I ran a performance investigation and came up with the following findings:

There appear to be performance issues inherent in Roslyn ingesting such big numbers of generated source files. On my machine a build will max out all 20 cores of my CPU and consume 4-7 GB of memory. It can be isolated to making a large number of calls to the SourceProductionContext.AddSource method and is not related to the source generator itself.
Our incremental source generator implementation will regenerate every single source file on every keystroke. This is attributable to a combination of two factors:
- Our incremental values are combining the entire Compilation object. This is explicitly called out as an anti-pattern in the incremental source generation doc, since any change will force re-evaluation of the source generator.
- We are calling Collect() on the ClassDeclarationSyntax values provider, feeding every single JsonSerializerContext declaration to the source generator regardless of whether it's a newly updated or cached value.
~~The source generator is not honoring the CancellationToken passed by Roslyn.~~ Fixed in Make json source generation cancellable #69332

I'm not sure if 1) could be improved, however it definitely seems like 2) is a performance bug in our source generator implementation that we should try to fix. Not sure how it could be done without some refactoring of the Parser class.

cc @layomia @ericstj @eerhardt

The text was updated successfully, but these errors were encountered:

ghost · 2022-04-21T22:02:10Z

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

The System.Text.Json.SourceGeneration.Roslyn4.0.Tests project defines ~35 large JsonSerializerContext classes, resulting in approximately 3500 generated source files. The size of the project has had a detrimental effect on performance, both when building but especially when editing the project: on my primary dev machine the IDE essentially goes unresponsive when attempting to make any change in the project.

I ran a performance investigation and came up with the following findings:

There appear to be performance issues inherent in Roslyn ingesting such big numbers of generated source files. On my machine a build will max out all 20 cores of my CPU and consume 4-7 GB of memory. It can be isolated to making a large number of calls to the SourceProductionContext.AddSource method and is not related to the source generator itself.
Our incremental source generator implementation will regenerate every single source file on every keystroke. This is attributable to a combination of two factors:
- Our incremental values are combining the entire Compilation object. This is explicitly called out as an anti-pattern in the incremental source generation doc, since any change will force re-evaluation of the source generator.
- We are calling Collect() on the ClassDeclarationSyntax values provider, feeding every single JsonSerializerContext declaration to the source generator regardless of whether it's a newly updated or cached value.
The source generator is not honoring the CancellationToken passed by Roslyn.

I'm not sure if 1) could be improved, however it definitely seems like 2) is a performance bug in our source generator implementation that we should try to fix. Not sure how it could be done without some refactoring of the Parser class.

cc @layomia @ericstj @eerhardt

Author:	eiriktsarpalis
Assignees:	-
Labels:	`area-System.Text.Json`, `untriaged`
Milestone:	-

eiriktsarpalis · 2022-04-22T06:49:54Z

cc @chsienki

eiriktsarpalis · 2022-04-22T13:27:48Z

Here's a naive workaround that improves performance when editing the project in Visual Studio. It avoids regenerating sources for unchanged context classes by avoiding Collect() and invoking the source generator separately for each newly updated class. It also employs a custom equality comparer that ignores the Compilation component so that arbitrary changes do not trigger source generation.

As it stands though the workaround suffers from a couple of issues:

Invoking the source generator independently for every source class is suboptimal in batch compilation. I was looking for a variant of IncrementValuesProvider.Collect() that only surfaces the newly updated values but such functionality doesn't seem to exist.
The source generator will only trigger if the context classes themselves are modified, but not if any of their JsonSerializableAttribute dependencies are. A refinement of this might need to resolve the syntax nodes from the attribute symbols but I'm not sure if and how this might be possible.

ericstj · 2022-04-23T00:22:48Z

We clearly have a lot more to learn about building these. Do we have the right inputs and APIs available to make our source-regeneration incremental? I know @eiriktsarpalis and @sharwell looked at this before and made improvements.

One thing to be careful of -- which I noticed for our own SLN -- is that it seemed to be loading the "first" source generator for JSON that it hit which was sometimes the old non-incremental one. It wouldn't load the incremental version since it had the exact same name. That's an internal only problem since customers would see only the one appropriate for their compiler version.

layomia · 2022-04-28T17:57:15Z

[With this comment I'm assuming that generating less files might have less impact on memory and cores, even if the material content generated remains the same]

It can be isolated to making a large number of calls to the SourceProductionContext.AddSource method and is not related to the source generator itself.

I recall @eerhardt gave feedback about reducing the number of files we output. Today we generate individual files for the following:

the template context implementation including ctors, intialization of source-generated options, helpers to get converters
the JsonSerializerContext.GetTypeInfo implementation
JsonEncodedText-typed members for property names used for fast-path deserialization
unique files for every type specified on the context, including types nested in their object graphs (this can be a lot)

Since we've identified this as a perf bottleneck, we should consider combining these files (especially the first 3). Unique files for each type is nice for debugging source-generated code for a single type, so it might be helpful to define a file-generation strategy, say based on an msbuild property.

layomia · 2022-04-28T17:57:40Z

Issues 2 & 3 should be looked into and addressed.

One thing to be careful of -- which I noticed for our own SLN -- is that it seemed to be loading the "first" source generator for JSON that it hit which was sometimes the old non-incremental one. It wouldn't load the incremental version since it had the exact same name. That's an internal only problem since customers would see only the one appropriate for their compiler version.

Definitely worth looking into resolving this.

sharwell · 2022-04-28T17:59:21Z

It also employs a custom equality comparer that ignores the Compilation component

Never do this. This action instructs the command line compiler that it is allowed to produce inaccurate outputs in an actual .dll or .exe, and will allow a production machine to publish/sign/ship that invalid output.

sharwell · 2022-04-28T18:03:06Z

will regenerate every single source file on every keystroke

There were several Roslyn bugs that were causing this to occur when it shouldn't. Roslyn's design is to not regenerate sources on every keystroke regardless of the implementation of the source generator itself, and either 17.2 or 17.3 has been updated to better reflect this desired behavior.

eiriktsarpalis · 2022-04-29T10:03:00Z

One thing to be careful of -- which I noticed for our own SLN -- is that it seemed to be loading the "first" source generator for JSON that it hit which was sometimes the old non-incremental one. It wouldn't load the incremental version since it had the exact same name. That's an internal only problem since customers would see only the one appropriate for their compiler version.

Definitely worth looking into resolving this.

FWIW I deleted the Roslyn3.1 projects locally to ensure no interference and can confirm that the performance problems persist even with just the 4.0 projects present.

eiriktsarpalis · 2022-04-29T10:14:34Z

It also employs a custom equality comparer that ignores the Compilation component

Never do this. This action instructs the command line compiler that it is allowed to produce inaccurate outputs in an actual .dll or .exe, and will allow a production machine to publish/sign/ship that invalid output.

That's interesting, I would have assumed there were no incremental caching concerns when building from the command line. I can see how this can cause the source generator to fire less frequently than is desirable in interactive scenaria, but I don't think it should be capable of passing unsound inputs to the source generator.

eiriktsarpalis · 2022-04-29T11:20:30Z

will regenerate every single source file on every keystroke

There were several Roslyn bugs that were causing this to occur when it shouldn't. Roslyn's design is to not regenerate sources on every keystroke regardless of the implementation of the source generator itself, and either 17.2 or 17.3 has been updated to better reflect this desired behavior.

Hmmm, I tried attaching the Roslyn process from the latest VS 17.3 to the debugger and it would seem like every edit still triggers a new compilation.

sharwell · 2022-04-29T14:42:49Z

That's interesting, I would have assumed there were no incremental caching concerns when building from the command line.

It's also possible to enable the incremental cache for VBCSCompiler, with a long-term goal to improve compilation performance by reducing the number of cases where a source generator needs to run.

Hmmm, I tried attaching the Roslyn process from the latest VS 17.3 to the debugger and it would seem like every edit still triggers a new compilation.

Edits can trigger new compilations, but it's no longer tied to the inner typing loop and should operate on longer delays. Continued typing should automatically increase the delay.

Some (non-default) user settings can override this behavior and increase the frequency the generators run.

Under a debugger, it may be hard to tell the difference between the 17.1 and 17.3 behavior. However, if you watch the ETW events to see the total number of generator executions/updates, you should see a significant reduction in the numbers during active typing scenarios.

CyrusNajmabadi · 2022-05-13T20:05:32Z

This is definitely an issue with the latest versions of roslyn. Locally, i'm measuring the json generator using 75% of all the CPU in devenv and our external server. This impacts all features that need accurate semantics. For example, lightbulbs, diagnostics, etc. i've measured tens of seconds of delay introduced by this.

CyrusNajmabadi · 2022-05-16T22:46:08Z

#69332 Helps address:

The source generator is not honoring the CancellationToken passed by Roslyn.

eiriktsarpalis · 2022-08-04T11:56:31Z

Moving to 8.0.0 as we won't have time to address this in .NET 7

weifenluo · 2023-03-16T04:33:40Z

I'm developing a similar source generator as System.Text.Json, to generate a custom type descriptor for each type.

I'm a bit shocked with Roslyn "incremental" source generation, it needs significant change to make it real "incremental":

First of all, SourceProductionContext.AddSource, which is the final step of any source generator, is not incremental. Only values of IncrementalValueProvider<T> and IncrementalValuesProvider<T> during transformations can be cached, the transformations, however, are normally cheap operations;
Roslyn is implemented via red and green nodes: You, the user, only ever see the red tree; the green tree is an implementation detail. We need the green tree node to detect whether it has been changed. For example, we cannot cache INamedTypeSymbol, because it is created on demand and varies from every Compilation object. The green tree node must be exposed publicly to write incremental source generator.

So the viable solution would be using RegisterImplementationSourceOutput instead of RegisterSourceOutput, which declares that the source produced has no semantic impact on user code from the point of view of code analysis - we can workaround this by defining types into a separate project/assembly. I'm not sure STJ can go this approach though.

CyrusNajmabadi · 2023-03-16T14:51:46Z

First of all, SourceProductionContext.AddSource, which is the final step of any source generator, is not incremental

Correct. That's because a good incremental generator will rarely ever get to that step after the pipeline has run once. With most users edits/changes, the pipeline will stop early, so we can use the texts/trees from the prior generation as is.

We need the green tree node to detect whether it has been changed

As you said, it's an implementation detail. Exposing it would lock us into that impl detail forever.

For example, we cannot cache INamedTypeSymbol, because it is created on demand and varies from every Compilation object.

You should extract from the INamedType the data your generator actually needs, and use that subset of data as the value your IVP produces. That way, if that subset doesn't change, the pipeline will stop.

weifenluo · 2023-03-17T00:54:53Z

First of all, SourceProductionContext.AddSource, which is the final step of any source generator, is not incremental

Correct. That's because a good incremental generator will rarely ever get to that step after the pipeline has run once. With most users edits/changes, the pipeline will stop early, so we can use the texts/trees from the prior generation as is.

We need the green tree node to detect whether it has been changed

As you said, it's an implementation detail. Exposing it would lock us into that impl detail forever.

For example, we cannot cache INamedTypeSymbol, because it is created on demand and varies from every Compilation object.

You should extract from the INamedType the data your generator actually needs, and use that subset of data as the value your IVP produces. That way, if that subset doesn't change, the pipeline will stop.

So if I understand correctly, the pipeline can be short-circuited, it's not incremental.

Let's say we have 100 classes type1...type100 defined in the project, the first run will generate source code file 1...100, each depends on INamedTypeSymbol1...INamedTypeSymbol100, and public members of each INamedTypeSymbol.

If an edit in IDE triggers no change of generated 100 files at all, the pipeline can stop before actually generating the files. This is good improvement compares to V1 ISourceGenerator; however, if an edit in IDE triggers a change which will affect output source file5, all 100 files will be re-generated all together. IMO this is not incremental - an incremental source generator will only need to generate file5, leaving others untouched. I believe current pipeline does not support this, a real incremental source generator API cannot be that simple, that's why I was bit shocked when I first saw it.

Now let's get to pipeline short-circuit. Theoretically we can extract from the INamedTypeSymbol the data which generator actually needs, and use that subset of data as the value your IVP produces. Practically we're diffing almost the whole tree for almost every key stroke in IDE. The IDE compiler is true incremental, but also very complex, because it only needs to update a very small portion of green tree nodes for every change; however, the source generator has a different story, it receives every snapshot of Compilation object, and since we don't know where the edit happened, we have to diff the entire tree for every pipeline run. This will create a lot, if not all, red nodes, which is anti-pattern of Roslyn API, assuming only a very small portion of red nodes need to be created on demand.

CyrusNajmabadi · 2023-03-17T22:06:22Z

the pipeline can be short-circuited, it's not incremental.

Short circuiting is incremental. This is the same way that incremental parsing works. The portions to be redone are computed, and the system stops once it discovers that redoing work would produce the same results as before. At that point, the old results are carried forward.

CyrusNajmabadi · 2023-03-17T22:09:11Z

I believe current pipeline does not support this

It does support this. Just produce the same model values for nodes 1-4 and 6-100 and we won't do the following steps after that. We'll only recompute the source for model 5 and reuse the rest.

CyrusNajmabadi · 2023-03-17T22:11:44Z

we have to diff the entire tree for every pipeline run.

No, you won't have to do this. The syntaxtree instances themselves are identical. So, if they are your model value (or part of your model), they will be equal across edits outside of them. So no need to diff them at all. And, if you even did diff, you'd just do the pointer check.

You only need to diff the trees that are actually different (i.e. don't have reference equality).

CyrusNajmabadi · 2023-03-17T22:14:09Z

we have to diff the entire tree for every pipeline run. This will create a lot, if not all, red nodes, which is anti-pattern of Roslyn API, assuming only a very small portion of red nodes need to be created on demand.

You only need the red node tree for the current file. And that's not a problem because literally every feature (including the compiler) needs it. So it's going to be computed.

If you are that worried though, you can use IsIncrementallyEquivalent on the nodes as you walk the tree to avoid going into red nodes we can prove are identical.

CyrusNajmabadi · 2023-03-17T22:21:58Z

Note: if you'd like to see a demonstration of this in action, I recommend looking at the impl in roslyn of ForAttributeWithMetadataName and the simple name helper it calls. You'll see how it is able to just reanalyze only a particular tree when it changes to build intermediary state. And it then uses that state to determine what even to examine elsewhere in the entire project.

https://github.com/dotnet/roslyn/blob/e47b15b2485bee978bafb5c7dcce99d8bdc2f8fd/src/Compilers/Core/Portable/SourceGeneration/Nodes/SyntaxValueProvider_ForAttributeWithSimpleName.cs#L65

weifenluo · 2023-03-21T01:25:39Z

@CyrusNajmabadi

Thanks a lot for all your explanations! I dug a little further into Roslyn source code repo, you're right, the pipeline is incremental.

If you are that worried though, you can use IsIncrementallyEquivalent on the nodes as you walk the tree to avoid going into red nodes we can prove are identical.

I can't find IsIncrementallyEquivalent anywhere, neither API documentation nor search in the source code repo. I do worry about the excessive red node creation, because the scenario I posted is greatly simplified for demonstration purpose only.

CyrusNajmabadi · 2023-03-21T01:48:00Z

Sorry, it's called IsIncrementallyIdenticalTo. :)

CyrusNajmabadi · 2023-03-21T01:48:57Z

because the scenario I posted is greatly simplified for demonstration purpose only.

We'd need to speak practically about what you're trying to do to to be able to make any determinations about the issue.

dotnet-issue-labeler bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Apr 21, 2022

layomia removed the untriaged New issue has not been triaged by the area owner label Apr 28, 2022

layomia added this to the 7.0.0 milestone Apr 28, 2022

eiriktsarpalis mentioned this issue Apr 29, 2022

System.Text.Json source generator crashes when processing unresolved symbols #68072

Closed

eiriktsarpalis added the tenet-performance Performance related issue label May 3, 2022

eiriktsarpalis mentioned this issue Jul 6, 2022

Polyfill the incremental generator ForAttributeWithMetadataName from roslyn (for JsonGenerator). #71653

Merged

eiriktsarpalis modified the milestones: 7.0.0, 8.0.0 Aug 4, 2022

This was referenced Aug 26, 2022

Add custom comparer to use incremental generator caching for LoggerMessageGenerator and JsonSourceGenerator #74558

Closed

LoggerMessageGenerator and JsonSourceGenerator not using incremental generator caching #74557

Closed

3schwartz mentioned this issue Aug 29, 2022

Fix missing incremental generator caching andrewlock/NetEscapades.EnumGenerators#38

Closed

layomia mentioned this issue Oct 13, 2022

System.Text.Json Source generator improvements for .NET 8.0 #77019

Closed

26 tasks

eiriktsarpalis mentioned this issue Dec 14, 2022

Add nullable reference type annotations to System.Text.Json source gen #79613

Merged

eiriktsarpalis mentioned this issue Feb 22, 2023

Create source generator for configuration binding #82179

Merged

eiriktsarpalis added the source-generator Indicates an issue with a source generator feature label Mar 9, 2023

eiriktsarpalis self-assigned this Mar 22, 2023

eiriktsarpalis mentioned this issue Apr 21, 2023

Remove common tests from the Roslyn3.11 test project. #85165

Merged

This was referenced May 11, 2023

Make the STJ source generator cache friendly #86121

Merged

Remove System.Reflection wrappers from the STJ source generator. #86334

Merged

This was referenced May 19, 2023

Apply the SourceWriter pattern to the STJ source generator. #86526

Merged

Avoid capturing diagnostic objects in SG incremental values #86605

Merged

Remove STJ dependency on IncrementalValuesProvider.Collect() #86616

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label May 23, 2023

eiriktsarpalis closed this as completed in #86616 May 23, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 23, 2023

Youssef1313 mentioned this issue Jun 4, 2023

Generator testing should assert cancellation token respect in a timely manner dotnet/roslyn-sdk#1098

Open

ghost locked as resolved and limited conversation to collaborators Jun 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

System.Text.Json incremental source generator performance issues #68353

System.Text.Json incremental source generator performance issues #68353

eiriktsarpalis commented Apr 21, 2022 •

edited

Loading

ghost commented Apr 21, 2022

eiriktsarpalis commented Apr 22, 2022

eiriktsarpalis commented Apr 22, 2022 •

edited

Loading

ericstj commented Apr 23, 2022

layomia commented Apr 28, 2022

layomia commented Apr 28, 2022

sharwell commented Apr 28, 2022 •

edited

Loading

sharwell commented Apr 28, 2022

eiriktsarpalis commented Apr 29, 2022

eiriktsarpalis commented Apr 29, 2022

eiriktsarpalis commented Apr 29, 2022

sharwell commented Apr 29, 2022 •

edited

Loading

CyrusNajmabadi commented May 13, 2022

CyrusNajmabadi commented May 16, 2022

eiriktsarpalis commented Aug 4, 2022

weifenluo commented Mar 16, 2023

CyrusNajmabadi commented Mar 16, 2023

weifenluo commented Mar 17, 2023

CyrusNajmabadi commented Mar 17, 2023

CyrusNajmabadi commented Mar 17, 2023

CyrusNajmabadi commented Mar 17, 2023 •

edited

Loading

CyrusNajmabadi commented Mar 17, 2023

CyrusNajmabadi commented Mar 17, 2023 •

edited

Loading

weifenluo commented Mar 21, 2023

CyrusNajmabadi commented Mar 21, 2023

CyrusNajmabadi commented Mar 21, 2023

System.Text.Json incremental source generator performance issues #68353

System.Text.Json incremental source generator performance issues #68353

Comments

eiriktsarpalis commented Apr 21, 2022 • edited Loading

ghost commented Apr 21, 2022

eiriktsarpalis commented Apr 22, 2022

eiriktsarpalis commented Apr 22, 2022 • edited Loading

ericstj commented Apr 23, 2022

layomia commented Apr 28, 2022

layomia commented Apr 28, 2022

sharwell commented Apr 28, 2022 • edited Loading

sharwell commented Apr 28, 2022

eiriktsarpalis commented Apr 29, 2022

eiriktsarpalis commented Apr 29, 2022

eiriktsarpalis commented Apr 29, 2022

sharwell commented Apr 29, 2022 • edited Loading

CyrusNajmabadi commented May 13, 2022

CyrusNajmabadi commented May 16, 2022

eiriktsarpalis commented Aug 4, 2022

weifenluo commented Mar 16, 2023

CyrusNajmabadi commented Mar 16, 2023

weifenluo commented Mar 17, 2023

CyrusNajmabadi commented Mar 17, 2023

CyrusNajmabadi commented Mar 17, 2023

CyrusNajmabadi commented Mar 17, 2023 • edited Loading

CyrusNajmabadi commented Mar 17, 2023

CyrusNajmabadi commented Mar 17, 2023 • edited Loading

weifenluo commented Mar 21, 2023

CyrusNajmabadi commented Mar 21, 2023

CyrusNajmabadi commented Mar 21, 2023

eiriktsarpalis commented Apr 21, 2022 •

edited

Loading

eiriktsarpalis commented Apr 22, 2022 •

edited

Loading

sharwell commented Apr 28, 2022 •

edited

Loading

sharwell commented Apr 29, 2022 •

edited

Loading

CyrusNajmabadi commented Mar 17, 2023 •

edited

Loading

CyrusNajmabadi commented Mar 17, 2023 •

edited

Loading