Only format source generated file on demand #49685

YairHalberstadt · 2020-12-01T09:50:07Z

This is probably more of a VS issue than a roslyn one, but it will demand some coordination between the two so I'll post it here.

I maintain the StrongInject Source Generator.

When generating the code, I generate everything on a single line, without any superfluous whitespace.

I then call CSharpSyntaxTree.ParseText(SourceText.From(file, Encoding.UTF8)).GetRoot().NormalizeWhitespace().SyntaxTree.GetText(); to format the generated code.

Well over 2/3ds of the time in my Source Generator is spent just on this, even though StrongInject is a very computation intensive SG.

However in the majority of cases this is completely unnecessary, as the user will never see the code.

Instead it would be useful if when adding a file to a compilation via the SourceGenerator I could set an unformatted flag. Then when VS displays the file to the user it will format it for me. I imagine this would lead to extremely large savings in a lot of SG projects.

The text was updated successfully, but these errors were encountered:

jaredpar · 2020-12-01T13:06:20Z

This is an interesting question. Essentially how do we balance speed in source generators vs. displaying the text nicely in the IDE when customers do step through it. Think it's probably more on the IDE side though so moving there for now.

jinujoseph · 2020-12-01T19:08:23Z

cc @jasonmalinowski , @chsienki

jasonmalinowski · 2020-12-01T20:14:45Z

I do wonder what the expensive part is in that dance -- is the intermediate parse or the normalization? And if it's the normalization is there some low hanging fruit we can fix there?

@chsienki: as it stands @YairHalberstadt is having to do two text reparses: the initial parse to get a tree to pass to NormalizeWhitespace, and then there's a second parse of the tree he had to convert back to. If the "add text" API had a "please format it" option, then we could do a parse, and do NormalizeWhitespace, and presume internally that NormalizeWhitespace doesn't a valid tree into an invalid one. (If it does, that's our own bug.) Even if we can't reduce the cost of the normalization (or the need to do it entirely), at least we can reduce a parse here.

@jaredpar: the tricky bit here is debugging: if the compiled version is the unformatted code, then the spans embedded in the PDB won't be right.

@AArnott also reported this internally a few weeks ago.

sharwell · 2020-12-01T20:17:45Z

Do we have a feedback ticket with a performance trace to make sure NormalizeWhitespace isn't doing more work than necessary?

YairHalberstadt · 2020-12-01T20:19:00Z

It's almost all NormalizeWhitespace, although the other parts aren't cheap either.

Also doing the double parse creates a lot of garbage.

jasonmalinowski · 2020-12-01T20:19:01Z

@sharwell: @AArnott did share an ETL internally. I can fully imagine NormalizeWhitespace was never carefully optimized.

YairHalberstadt · 2020-12-02T04:42:28Z

If this is only formatted on demand by VS, it would also allow us to use the simplify APIs.

Currently I generate fully qualified code for everything (including global::), to remove any chance of a conflict. However users would much rather see the file with imports added and names simplified. That API is both expensive, and only available in the IDE layer, so not possible to do in a source generator.

CyrusNajmabadi · 2020-12-02T20:26:39Z

I'd rather we just fixup NormalizeWhitespace to be fast. I don't see why it would need to be slow.

CyrusNajmabadi · 2020-12-02T20:30:57Z

Do we have a feedback ticket with a performance trace to make sure NormalizeWhitespace isn't doing more work than necessary?

My understanding of NW (from conversations over hte years) is taht it's completely unoptimized. So every token is visited and every token is rewritten no matter what. Though, in yair's case, that just might what needs to happen as he's just emitting one long string.

sharwell · 2020-12-02T21:18:23Z

If someone submits a feedback ticket with a performance trace I'll be all over it 😄

sharwell · 2020-12-02T22:33:22Z

Jason sent me a trace. First thing I notice is #48568 will help this. I'll send a PR shortly with additional improvements.

sharwell · 2020-12-02T23:36:27Z

I submitted #49758 to implement a targeted improvement, and filed #49759 to address one of the more significant sources of performance overhead later. With both of those implemented, the next issue is finding a way to avoid calling GetNextToken so many times during the normalization process.

stefanloerwald · 2020-12-03T08:06:28Z

As much as I love performance improvements to NormalizeWhitespace: shouldn't we avoid computations in SGs as much as possible to keep their performance high? I would expect it to be a tiny fraction of generated source files that will ever be seen by a human, so operations that only do cosmetic changes to the generated code seem badly placed when they're happening essentially on every keystroke rather than just when the user actually wants to see the code.

AArnott · 2020-12-03T15:28:07Z

@stefanloerwald We should normalize anyway. Imagine the debugging experience and exception stack traces. When an exception is thrown and shows a line number from a generated source file, do you want it to always say it's on "line 1"? If I were the one investigating a failure, I would certainly want a meaningful line number shown there.
Beyond stack traces, the pdb will only match against your unformatted source file. So if the IDE were to format it just as the document opened, the debugger would reject the file. While that might be a solvable problem, it would involve several teams and be quite expensive. If we can make NormalizeWhitespace decently fast, it's a better use of resources and ends up with VS shipping more meaningful features.

stefanloerwald · 2020-12-03T15:38:29Z

Good points, @AArnott, thanks for elaborating. Although of course not everything has to be in just one line, even if you don't normalize whitespace ;-)

YairHalberstadt · 2020-12-03T15:41:58Z

Although of course not everything has to be in just one line, even if you don't normalize whitespace ;-)

Yep. The complexity of indenting my generated code is much higher than that of adding new lines.

sharwell · 2020-12-03T16:54:53Z

The overhead can be nearly eliminated by making NormalizeWhitespace operate one token "ahead" of where it does today (i.e. instead of calling GetNextToken it just reads a field storing the previous token), in combination with #49759. I'm not a fan of different content for different build scenarios because it breaks debugging and all editor features. If we were to optimize a fast-path here, it would be the creation of a skeleton document (member signatures present, but the implementations all throw null) so IntelliSense understands the available members quickly.

YairHalberstadt · 2021-01-22T10:07:05Z

@sharwell

The overhead can be nearly eliminated by making NormalizeWhitespace operate one token "ahead" of where it does today (i.e. instead of calling GetNextToken it just reads a field storing the previous token), in combination with #49759.

I'm currently giving this a go.

Currently NormalizeWhitespace uses GetNextToken() to calculate trailing trivia. Since CSharpSyntaxRewriter modifies the same token as it is currently visiting, this will require one of the following:

NormalizeWhitespace moves such trailingTrivia to LeadingTrivia of the next token, with some special casing for the final token.
We separately use InternalSyntax.GreenNode.EnumerateNodes to calculate the NextToken, without having to create a Red tree.
Something else? I'm not sure what exactly - do you have any ideas?

sharwell · 2021-01-22T17:00:39Z

@YairHalberstadt I'm not sure what the final form here would look like. I don't think either of the proposed changes will be particularly easy.

Eli-Black-Work · 2022-04-14T02:34:57Z

I'd like to follow up on this, since it's been a year 🙂

We have the same use case as @YairHalberstadt: We've written a source generator that outputs C# via string interpolation (e.g. $@"string {varName} = ""{stringValue}""), and we'd like to format the source before writing it.

InflexCZE · 2022-08-02T21:57:07Z

Same use case for us. Any improvements there would be greatly appreciated.
Regarding the possible optimizations of NormalizeWhitespace, there are few hot candidates:

GC -> 30%
SyntaxNormalizer.GetDeclarationDepth -> 10%
GreenNode.CreateList -> 5%
ChildSyntaxList.ItemInternal -> 3%

dotTrace sampling profile follows:

CyrusNajmabadi · 2022-08-02T22:00:33Z

This is currently on the backlog. Recommendation would be to write out the new added source using just a string-builder, or invest in PRs that can improve things here. SyntaxNormalizer is very heavyweight though and, imo, not suitable for use in generators.

Eli-Black-Work · 2022-11-17T06:22:31Z

@AArnott Do you think it'd be worth filing a separate issue with your suggestion from #49685 (comment) (code generated by source generators should automatically be normalized)?

I fear that excellent suggestion will be lost among the other comments in this issue 🙂

CyrusNajmabadi · 2022-11-17T06:40:19Z

automatic normalization absolutely cannot happen. It breaks semantics. Normalization doesn't preserve the original meaning of code, and thus must absolutely only happen in an opt-in fashion.

Eli-Black-Work · 2022-11-17T06:55:30Z

@CyrusNajmabadi Example? I'm having a hard time visualizing.

CyrusNajmabadi · 2022-11-17T07:42:02Z

Foo();

//...
void Foo([CallerLineNumber] int i = 0);

Move 'Foo' and you change the meaning of the code and what is emitted. There are tools today that use these sorts of locations to make decisions. Changing the code around can change that and violate expectations they have for how code will be laid out (e.g. putting multiple things on the same line that may not have been before).

SGs are not a quick-and-dirty system that tries to infer what the user wants. It's a plain API: give us the exact source you want us to compile and we will treat that exactly as is.

If you want normalization, then just normalize the code yourself. Then you are stated explicitly that that is fine for your domain. But it leaves the compiler out of the job of having to say that it will both manipulate your code, but also completely preserve its meaning.

AArnott · 2022-11-17T13:53:04Z

I think normalization will always be a perf hit and should therefore always be opt in. But I am curious when whitespace normalization would ever change the semantics of the code. @CyrusNajmabadi in your example, where would normalization move Foo such that semantics would change? Certainly rearranging code relative to each other can cause a problem, but how does adding/removing whitespace break it?
I'm excluding the case where whitespace is mandatory in order to avoid build breaks, since emitting a syntax tree with no trivia at all tends to not compile, yet adding whitespace tends to be what the developer intended.

CyrusNajmabadi · 2022-11-17T14:23:36Z

I consider a "break" to be any scenario where the compiler emits different IL or metadata. In that event what is in the final dll is not the same as what would be there if no normalization had happened.

The above example demonstrates one case where such a break would occur. Location of code is reflected in the IL we emit, and I know if at least two cases where SG authors use that information. E.g. they use the calling file/line info to look up data in a map that the SG has prepopulated. So if the code moves, that will no longer work.

AArnott · 2022-11-17T14:48:04Z

File/line info is in the PDB though. That's not IL or metadata. Are you only talking about moving source code invalidating where the SG expected to find the code references in the PDB? Or are you saying that adding whitespace changes IL or metadata in the dll itself?

CyrusNajmabadi · 2022-11-17T14:55:39Z

File/line info is in the PDB thoug

No. It's also in IL. See my example from above. This is a technique already being used.

Or are you saying that adding whitespace changes IL or metadata in the dll itself?

Yup. That's what I'm saying.

Eli-Black-Work · 2022-11-18T03:43:35Z

@CyrusNajmabadi Would you mind expounding on those two cases where SG authors use file/line info to look up data in a map that the SG has prepopulated? 🙂 I'm having a bit of trouble following.

Eli-Black-Work · 2023-02-06T06:00:35Z

Foo();

//...
void Foo([CallerLineNumber] int i = 0);
Move 'Foo' and you change the meaning of the code and what is emitted. ...

@CyrusNajmabadi I just gave this another read. Now I understand your argument about how automatically formatting SG output could change the generated IL 🙂

Dotnet-GitSync-Bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Dec 1, 2020

jaredpar added Area-IDE New Feature - Source Generators Source Generators and removed Area-Compilers labels Dec 1, 2020

jinujoseph added Concept-Continuous Improvement and removed untriaged Issues and PRs which have not yet been triaged by a lead labels Dec 1, 2020

jinujoseph added this to the Backlog milestone Dec 1, 2020

sharwell mentioned this issue Dec 2, 2020

Avoid boxing in WithLeadingTrivia and WithTrailingTrivia #49758

Draft

YairHalberstadt mentioned this issue Jan 26, 2021

Improve SyntaxNormalizer performance #50778

Draft

lsoft mentioned this issue Mar 26, 2021

Source Generators: pass for intellisense output #52170

Closed

This was referenced Jun 2, 2022

Make NormalizeWhitespace cancellable #61561

Open

Improve NormalizeWhitespace for object initializers #61204

Closed

CyrusNajmabadi added Area-Compilers and removed Area-IDE labels Aug 2, 2022

Eli-Black-Work mentioned this issue Nov 16, 2022

Proposal: SyntaxNode.ToFullStringWithNormalizedWhitespaces API #52914

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only format source generated file on demand #49685

Only format source generated file on demand #49685

YairHalberstadt commented Dec 1, 2020

jaredpar commented Dec 1, 2020

jinujoseph commented Dec 1, 2020

jasonmalinowski commented Dec 1, 2020

sharwell commented Dec 1, 2020

YairHalberstadt commented Dec 1, 2020

jasonmalinowski commented Dec 1, 2020

YairHalberstadt commented Dec 2, 2020

CyrusNajmabadi commented Dec 2, 2020

CyrusNajmabadi commented Dec 2, 2020

sharwell commented Dec 2, 2020

sharwell commented Dec 2, 2020

sharwell commented Dec 2, 2020 •

edited

Loading

stefanloerwald commented Dec 3, 2020

AArnott commented Dec 3, 2020

stefanloerwald commented Dec 3, 2020

YairHalberstadt commented Dec 3, 2020

sharwell commented Dec 3, 2020 •

edited

Loading

YairHalberstadt commented Jan 22, 2021

sharwell commented Jan 22, 2021

Eli-Black-Work commented Apr 14, 2022

InflexCZE commented Aug 2, 2022

CyrusNajmabadi commented Aug 2, 2022

Eli-Black-Work commented Nov 17, 2022

CyrusNajmabadi commented Nov 17, 2022

Eli-Black-Work commented Nov 17, 2022

CyrusNajmabadi commented Nov 17, 2022

AArnott commented Nov 17, 2022

CyrusNajmabadi commented Nov 17, 2022

AArnott commented Nov 17, 2022

CyrusNajmabadi commented Nov 17, 2022 •

edited

Loading

Eli-Black-Work commented Nov 18, 2022

Eli-Black-Work commented Feb 6, 2023 •

edited

Loading

Only format source generated file on demand #49685

Only format source generated file on demand #49685

Comments

YairHalberstadt commented Dec 1, 2020

jaredpar commented Dec 1, 2020

jinujoseph commented Dec 1, 2020

jasonmalinowski commented Dec 1, 2020

sharwell commented Dec 1, 2020

YairHalberstadt commented Dec 1, 2020

jasonmalinowski commented Dec 1, 2020

YairHalberstadt commented Dec 2, 2020

CyrusNajmabadi commented Dec 2, 2020

CyrusNajmabadi commented Dec 2, 2020

sharwell commented Dec 2, 2020

sharwell commented Dec 2, 2020

sharwell commented Dec 2, 2020 • edited Loading

stefanloerwald commented Dec 3, 2020

AArnott commented Dec 3, 2020

stefanloerwald commented Dec 3, 2020

YairHalberstadt commented Dec 3, 2020

sharwell commented Dec 3, 2020 • edited Loading

YairHalberstadt commented Jan 22, 2021

sharwell commented Jan 22, 2021

Eli-Black-Work commented Apr 14, 2022

InflexCZE commented Aug 2, 2022

CyrusNajmabadi commented Aug 2, 2022

Eli-Black-Work commented Nov 17, 2022

CyrusNajmabadi commented Nov 17, 2022

Eli-Black-Work commented Nov 17, 2022

CyrusNajmabadi commented Nov 17, 2022

AArnott commented Nov 17, 2022

CyrusNajmabadi commented Nov 17, 2022

AArnott commented Nov 17, 2022

CyrusNajmabadi commented Nov 17, 2022 • edited Loading

Eli-Black-Work commented Nov 18, 2022

Eli-Black-Work commented Feb 6, 2023 • edited Loading

sharwell commented Dec 2, 2020 •

edited

Loading

sharwell commented Dec 3, 2020 •

edited

Loading

CyrusNajmabadi commented Nov 17, 2022 •

edited

Loading

Eli-Black-Work commented Feb 6, 2023 •

edited

Loading