Only format source generated file on demand #49685
Comments
This is an interesting question. Essentially, how do we balance speed in source generators against displaying the text nicely in the IDE when customers do step through it? I think it's probably more on the IDE side, so I'm moving it there for now.
I do wonder what the expensive part is in that dance: is it the intermediate parse or the normalization? And if it's the normalization, is there some low-hanging fruit we can fix there? @chsienki: as it stands, @YairHalberstadt has to do two text parses: the initial parse to get a tree to pass to NormalizeWhitespace, and then a second parse of the text he had to convert the tree back into. If the "add text" API had a "please format it" option, we could do one parse, run NormalizeWhitespace, and presume internally that NormalizeWhitespace doesn't turn a valid tree into an invalid one. (If it does, that's our own bug.) Even if we can't reduce the cost of the normalization (or eliminate the need for it entirely), at least we can save a parse. @jaredpar: the tricky bit here is debugging: if the compiled version is the unformatted code, then the spans embedded in the PDB won't be right. @AArnott also reported this internally a few weeks ago.
Do we have a feedback ticket with a performance trace to make sure NormalizeWhitespace isn't doing more work than necessary?
It's almost all NormalizeWhitespace, although the other parts aren't cheap either. Also, doing the double parse creates a lot of garbage.
If this is only formatted on demand by VS, it would also allow us to use the simplify APIs. Currently I generate fully qualified code for everything (including
I'd rather we just fix up NormalizeWhitespace to be fast. I don't see why it would need to be slow.
My understanding of NormalizeWhitespace (from conversations over the years) is that it's completely unoptimized: every token is visited and every token is rewritten, no matter what. Though in Yair's case, that might just be what needs to happen, since he's emitting one long string.
If someone submits a feedback ticket with a performance trace, I'll be all over it 😄
Jason sent me a trace. The first thing I notice is that #48568 will help with this. I'll send a PR shortly with additional improvements.
As much as I love performance improvements to
@stefanloerwald We should normalize anyway. Imagine the debugging experience and exception stack traces. When an exception is thrown and shows a line number from a generated source file, do you want it to always say it's on "line 1"? If I were the one investigating a failure, I would certainly want a meaningful line number shown there.
Good points, @AArnott, thanks for elaborating. Although, of course, not everything has to be on just one line, even if you don't normalize whitespace ;-)
Yep. The complexity of indenting my generated code is much higher than that of adding new lines.
The overhead can be nearly eliminated by making
I'm currently giving this a go. Currently NormalizeWhitespace uses
@YairHalberstadt I'm not sure what the final form here would look like. I don't think either of the proposed changes will be particularly easy.
I'd like to follow up on this, since it's been a year 🙂 We have the same use case as @YairHalberstadt: we've written a source generator that outputs C# via string interpolation (e.g.
Same use case for us. Any improvements there would be greatly appreciated.
This is currently on the backlog. The recommendation would be to write out the newly added source using just a StringBuilder, or to invest in PRs that can improve things here. SyntaxNormalizer is very heavyweight, though, and IMO not suitable for use in generators.
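A minimal sketch of the StringBuilder approach, assuming the generator controls its own nesting: the hypothetical IndentedSourceBuilder below (an illustration, not a Roslyn API) writes newlines and indentation as it goes, so the emitted text is readable and has meaningful line numbers without any NormalizeWhitespace pass. The BCL's System.CodeDom.Compiler.IndentedTextWriter is an existing alternative for the same idea.

```csharp
using System;
using System.Text;

// Hypothetical helper: tracks indentation while the generator appends lines,
// so the output never needs a separate whitespace-normalization pass.
public sealed class IndentedSourceBuilder
{
    private readonly StringBuilder _sb = new StringBuilder();
    private int _depth;

    // Appends one line, prefixed with four spaces per indentation level.
    // Empty lines get no trailing indentation.
    public IndentedSourceBuilder Line(string text = "")
    {
        if (text.Length > 0)
        {
            _sb.Append(' ', _depth * 4).Append(text);
        }
        _sb.Append('\n');
        return this;
    }

    // Writes the header and an opening brace, then indents everything
    // written until the returned scope is disposed.
    public IDisposable Block(string header)
    {
        Line(header);
        Line("{");
        _depth++;
        return new Scope(this);
    }

    public override string ToString() => _sb.ToString();

    private sealed class Scope : IDisposable
    {
        private readonly IndentedSourceBuilder _owner;
        private bool _disposed;

        public Scope(IndentedSourceBuilder owner) => _owner = owner;

        // Closes the block: drops one indentation level and writes the closing brace.
        public void Dispose()
        {
            if (_disposed) return;
            _disposed = true;
            _owner._depth--;
            _owner.Line("}");
        }
    }
}
```

Nesting `using (builder.Block("public static class Hello"))` inside `using (builder.Block("namespace Generated"))` and writing one member line produces correctly nested braces at four-space indentation; the result can then be handed to AddSource via SourceText.From(builder.ToString(), Encoding.UTF8). Tracking depth while writing costs one integer per line, whereas reparsing and normalizing revisits every token.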
@AArnott Do you think it'd be worth filing a separate issue with your suggestion from #49685 (comment) (code generated by source generators should automatically be normalized)? I fear that excellent suggestion will be lost among the other comments in this issue 🙂
Automatic normalization absolutely cannot happen. It breaks semantics. Normalization doesn't preserve the original meaning of code, and thus must only ever happen in an opt-in fashion.
@CyrusNajmabadi Example? I'm having a hard time visualizing.
Foo();
//...
void Foo([CallerLineNumber] int i = 0);
Move the call to 'Foo' and you change the meaning of the code and what is emitted. There are tools today that use these sorts of locations to make decisions. Changing the code around can change that and violate the expectations they have for how code will be laid out (e.g. putting multiple things on the same line that were not on the same line before). SGs are not a quick-and-dirty system that tries to infer what the user wants. It's a plain API: give us the exact source you want us to compile and we will treat it exactly as is. If you want normalization, then just normalize the code yourself. That way you are stating explicitly that it is fine for your domain, and it keeps the compiler out of the business of promising that it will both manipulate your code and completely preserve its meaning.
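The Foo example can be made concrete with a small, self-contained sketch (the class and method names are illustrative). It relies only on the standard [CallerLineNumber] attribute: the compiler substitutes the call site's line number as a constant argument, so placing the same two calls on one line versus two lines changes the constants compiled into the caller's IL.

```csharp
using System.Runtime.CompilerServices;

public static class CallerLineDemo
{
    // The compiler replaces the default value with the caller's line number,
    // emitted as a constant at each call site.
    public static int Where([CallerLineNumber] int line = 0) => line;

    // Both calls sit on one line, so both receive the same constant.
    public static (int First, int Second) OnOneLine() => (Where(), Where());

    // The same two calls split across two lines receive different constants.
    public static (int First, int Second) OnTwoLines() => (
        Where(),
        Where());
}
```

OnOneLine returns two equal values, while OnTwoLines returns values one apart. A formatter that joins or splits generated lines would therefore change the compiled output, not just the PDB.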
I think normalization will always be a perf hit and should therefore always be opt-in. But I am curious when whitespace normalization would ever change the semantics of the code. @CyrusNajmabadi in your example, where would normalization move
I consider a "break" to be any scenario where the compiler emits different IL or metadata. In that event, what is in the final dll is not the same as what would be there if no normalization had happened. The example above demonstrates one case where such a break would occur. The location of code is reflected in the IL we emit, and I know of at least two cases where SG authors use that information: for example, they use the calling file/line info to look up data in a map that the SG has prepopulated. So if the code moves, that will no longer work.
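To make the lookup pattern concrete, here is a minimal sketch of its runtime half, assuming a generator has emitted one Register call per call site it discovered. CallSiteTable, Here, Register and Lookup are hypothetical names, not part of Roslyn or of any particular generator; the sketch relies only on the standard [CallerFilePath] and [CallerLineNumber] attributes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;

// Hypothetical runtime side of the pattern: data keyed by the (file, line)
// of each call site, as the generator saw it when it ran.
public static class CallSiteTable
{
    private static readonly Dictionary<(string File, int Line), string> Entries =
        new Dictionary<(string File, int Line), string>();

    // Returns a key for the caller's location; used here to stand in for
    // the literal keys a generator would emit.
    public static (string File, int Line) Here(
        [CallerFilePath] string file = "",
        [CallerLineNumber] int line = 0) =>
        (Path.GetFileName(file), line);

    // A generator would emit one of these per discovered call site.
    public static void Register((string File, int Line) site, string value) =>
        Entries[site] = value;

    // Looks up data for the caller's own file and line; returns null if the
    // call is no longer where the generator expected it.
    public static string? Lookup(
        [CallerFilePath] string file = "",
        [CallerLineNumber] int line = 0) =>
        Entries.TryGetValue((Path.GetFileName(file), line), out var value) ? value : null;
}
```

A Lookup on the same line as its registration finds the data; the identical call one line lower returns null. That is the failure a reformatter would cause by moving code that such a table was built against.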
File/line info is in the PDB, though. That's not IL or metadata. Are you only talking about moving source code invalidating where the SG expected to find the code references in the PDB? Or are you saying that adding whitespace changes IL or metadata in the dll itself?
No. It's also in IL. See my example from above. This is a technique already being used.
Yup. That's what I'm saying.
@CyrusNajmabadi Would you mind expounding on those two cases where SG authors use file/line info to look up data in a map that the SG has prepopulated? 🙂 I'm having a bit of trouble following.
@CyrusNajmabadi I just gave this another read. Now I understand your argument about how automatically formatting SG output could change the generated IL 🙂
This is probably more of a VS issue than a roslyn one, but it will demand some coordination between the two so I'll post it here.
I maintain the StrongInject Source Generator.
When generating the code, I generate everything on a single line, without any superfluous whitespace.
I then call
CSharpSyntaxTree.ParseText(SourceText.From(file, Encoding.UTF8)).GetRoot().NormalizeWhitespace().SyntaxTree.GetText();
to format the generated code. Well over two-thirds of the time in my Source Generator is spent on this alone, even though StrongInject is a very computation-intensive SG.
However, in the majority of cases this is completely unnecessary, as the user will never see the code.
Instead, it would be useful if, when adding a file to a compilation from a Source Generator, I could set an "unformatted" flag. Then, when VS displays the file to the user, it would format it for me. I imagine this would lead to extremely large savings in a lot of SG projects.