Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generators should be able to add source files originating from a StringBuilder #49082

Open
jasonmalinowski opened this issue Oct 30, 2020 · 14 comments · May be fixed by #61251
Open

Generators should be able to add source files originating from a StringBuilder #49082

jasonmalinowski opened this issue Oct 30, 2020 · 14 comments · May be fixed by #61251

Comments

@jasonmalinowski
Copy link
Member

Right now the only way you add source is with a SourceText. Unfortunately if you want to use a StringBuilder to build up your generated file, and then want to pass that, your only out-of-the-box way to do it is to realize the StringBuilder to a string. That's pretty icky if the string is huge.

See where I worked around this: #47252 (comment)

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged Issues and PRs which have not yet been triaged by a lead label Oct 30, 2020
@jaredpar jaredpar removed the untriaged Issues and PRs which have not yet been triaged by a lead label Nov 2, 2020
@jaredpar jaredpar added this to the 16.9 milestone Nov 2, 2020
@jasonmalinowski jasonmalinowski added this to Compiler Backlog in Source Generators Nov 16, 2020
@AArnott
Copy link
Contributor

AArnott commented Nov 20, 2020

I don't want to use a StringBuilder either. I have a CompilationUnit that I'd like to just "write out" to the SourceText but can't figure out how to do it given the API.

@chsienki
Copy link
Contributor

@AArnott This mostly comes back to the parsing problem. Roslyn has some strong internal assumptions that anything it's working with is well formed (at least from the perspective that it was successfully parsed, it can of course contain error tokens etc).

In order to accept arbitrary syntax, we have to validate that somehow. The only real way to do that is to parse it, so even if you passed us the syntax, we'd just turn it into a string and re-parse it ourselves anyway.

I do wonder if we should accept syntax, so that under the hood we can 'stream' convert it to text and re-parse it, rather than making it the responsibility of the user to do so.

@jasonmalinowski
Copy link
Member Author

@chsienki That might be good at least because that way we can do it "efficiently" as opposed to letting the generator author do it less efficiently. 😄

@chsienki
Copy link
Contributor

@jasonmalinowski Yeah that's exactly my thinking.

@jasonmalinowski
Copy link
Member Author

I also wonder if the parsing overhead ever becomes expensive, then we can attempt to do some well-formedness validation. I know that's not easy to do (read: seriously I don't have a good idea either, good luck), but at least we'd be able to change it rather than suffer for the fact it was embedded in the generator itself.

@chsienki
Copy link
Contributor

@jasonmalinowski Yep, I think it basically gives us more flexibility under the hood to make changes in the future, while also potentially making it easier for users. Win-win.

I'll add it to the proposal backlog and schedule some time to get a proper proposal written and reviewed.

@AArnott
Copy link
Contributor

AArnott commented Nov 20, 2020

By far the very slowest part of my code generation is the (non-cancelable!) call to NormalizeWhitespace(), so if I could just pass my CompilationUnit to Roslyn and let Roslyn not only decide how/whether to re-parse it but also decide how to format it with whitespace so that it complies with .editorconfig or whatever, that would be great.

BTW, It's very likely that my syntax tree isn't exactly what the C# parser would produce, particularly around the tokens that trivia is attached to. So I have no problem with you re-parsing it. In my last comment I wasn't exactly asking for you to take the CompilationUnit itself but rather than you give me a way to efficiently write it out to a SourceText without allocating huge strings so that I can then send you the SourceText to parse from.

@YairHalberstadt
Copy link
Contributor

Implementation at #49689

@chsienki chsienki modified the milestones: 16.9, Compiler.Next Jan 14, 2021
@Neme12 Neme12 linked a pull request May 11, 2022 that will close this issue
@Neme12
Copy link
Contributor

Neme12 commented May 12, 2022

Alternative implementation at #61251
API proposal here: #61326

@Neme12
Copy link
Contributor

Neme12 commented May 24, 2022

@AArnott

In my last comment I wasn't exactly asking for you to take the CompilationUnit itself but rather than you give me a way to efficiently write it out to a SourceText without allocating huge strings so that I can then send you the SourceText to parse from.

The best way to get a SourceText from a CompilationUnit, or any SyntaxNode in general, is to use syntaxNode.GetText(Encoding.UTF8) - that way, no huge string should be allocated.

IMO, it would be a good idea to mention this somewhere in the docs for source generators, so that people do the right thing, as opposed to calling syntaxNode.ToString().

@AArnott
Copy link
Contributor

AArnott commented May 24, 2022

Wow. It's that easy? I had no idea I could just replace ToFullString() with GetText(Encoding.UTF8). Thank you!
I agree, this should be documented in the walkthroughs.

@Neme12
Copy link
Contributor

Neme12 commented May 25, 2022

@jasonmalinowski

I also wonder if the parsing overhead ever becomes expensive, then we can attempt to do some well-formedness validation. I know that's not easy to do (read: seriously I don't have a good idea either, good luck), but at least we'd be able to change it rather than suffer for the fact it was embedded in the generator itself.

I don't think this could be done. If an overload that takes a syntax was added today and it just reparsed the text, it couldn't be changed in the future to do some validation, because that would be a breaking change - if any validation was added later, it could throw on syntax that previously worked because it used to reparse the text. So it wouldn't give Roslyn any flexibility to be able to change it in the future.

Also, getting a SourceText from the syntax is already easy to do in a pretty good way, by using syntaxNode.GetText(Encoding.UTF8). This just needs to be documented so that source generator authors know that this is the recommended practice for getting the text from a syntax, in case they're using the syntax APIs to generate the source.

@canton7
Copy link
Contributor

canton7 commented May 25, 2022

That only helps if you're building a SyntaxTree in your SG. Microsoft guidance is (or at least was) to build a string, and that leaves you with a StringBuilder.

@Neme12
Copy link
Contributor

Neme12 commented May 25, 2022

That only helps if you're building a SyntaxTree in your SG. Microsoft guidance is (or at least was) to build a string, and that leaves you with a StringBuilder.

Yes, the approach I mentioned is for when you're building syntax, which a lot of people do because they find it useful. That was just a response to @AArnott. But you don't have to use syntax. And this issue for being able to pass in a StringBuilder is still open and unrelated to that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment