Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul how regex generated code is structured #66432

Merged
merged 4 commits into from Mar 10, 2022

Conversation

stephentoub
Copy link
Member

Fixes #63502
Fixes #63512

For every regex, we currently emit a private type next to the partial member we're implementing, and that private type contains all the code for the regex, including all the helpers it needs. There are multiple issues with this:

  • We're emitting a lot of duplicated code into an assembly that has multiple/many uses of [RegexGenerator(...)]. System.Private.Xml has ~10 uses of it now and it has ~1000 lines of unnecessarily duplicative code being generated.
  • We have to global::-qualify all uses of types we rely on in the implementation, because C# binding rules would otherwise bind to things declared inside the user's partial type we're generating into. This is particularly bad for extension methods, which we then can't use as extensions, and we make heavy use of MemoryExtensions throughout the generated code for all of our interactions with spans.
  • We could avoid the code duplication by putting all the helpers into a shared utilities type, but then that type would need to be internal and would effectively be shipping surface area we don't want to ship.
  • Code inside that user's type can see the private type we're emitting, and EditorBrowsable doesn't currently help (EditorBrowsable is useless for source generators to hide stuff roslyn#60080).

With this PR, we instead move all of the types into a separate namespace and parent class, which then enables a) having usings inside the namespace declaration so that we can avoid fully-qualifying everything and b) separating out all of the helper methods into a class that's then private to the parent wrapper class, shielding it from the rest of the assembly.

Hopefully C# 11 will see a solution to dotnet/csharplang#5529, at which point we can use that to hide all of the separated out generated code other than the partial method definitions (with the changes in this PR, doing so should involve minimal tweaks to the code generation). Until then, we still have a similar problem as we do today for the generated Regex-derived types being visible to other code in the assembly, but at least the shared helpers aren't.

cc: @joperezr, @chsienki

@ghost
Copy link

ghost commented Mar 10, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #63502
Fixes #63512

For every regex, we currently emit a private type next to the partial member we're implementing, and that private type contains all the code for the regex, including all the helpers it needs. There are multiple issues with this:

  • We're emitting a lot of duplicated code into an assembly that has multiple/many uses of [RegexGenerator(...)]. System.Private.Xml has ~10 uses of it now and it has ~1000 lines of unnecessarily duplicative code being generated.
  • We have to global::-qualify all uses of types we rely on in the implementation, because C# binding rules would otherwise bind to things declared inside the user's partial type we're generating into. This is particularly bad for extension methods, which we then can't use as extensions, and we make heavy use of MemoryExtensions throughout the generated code for all of our interactions with spans.
  • We could avoid the code duplication by putting all the helpers into a shared utilities type, but then that type would need to be internal and would effectively be shipping surface area we don't want to ship.
  • Code inside that user's type can see the private type we're emitting, and EditorBrowsable doesn't currently help (EditorBrowsable is useless for source generators to hide stuff roslyn#60080).

With this PR, we instead move all of the types into a separate namespace and parent class, which then enables a) having usings inside the namespace declaration so that we can avoid fully-qualifying everything and b) separating out all of the helper methods into a class that's then private to the parent wrapper class, shielding it from the rest of the assembly.

Hopefully C# 11 will see a solution to dotnet/csharplang#5529, at which point we can use that to hide all of the separated out generated code other than the partial method definitions (with the changes in this PR, doing so should involve minimal tweaks to the code generation). Until then, we still have a similar problem as we do today for the generated Regex-derived types being visible to other code in the assembly, but at least the shared helpers aren't.

cc: @joperezr, @chsienki

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: 7.0.0

@ghost ghost assigned stephentoub Mar 10, 2022
// We emit usings here now that we're inside of a namespace block and are no longer emitting code into
// a user's partial type. We can now rely on binding rules mapping to these usings and don't need to
// use global-qualified names for the rest of the implementation.
writer.WriteLine($" using System;");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a new pragma warnings in the headers though? I believe it is common for people to add stylecop rules that require using statements to be outside of a namespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such rules by default shouldn't apply to generated code. But even if they did, I don't want us to add pragmas for warnings that aren't in the .NET SDK; otherwise we open the doors to an unending laundry list of arbitrary 3rd-party analyzers we'd need to start considering.

Note there is a hidden diagnostic issued for this by the C# compiler (CS8019), but pragmas appear to be ineffective at disabling it. @sharwell may know why.

Copy link
Member

@sharwell sharwell Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CS8019

This is a hidden diagnostic used for fading out unnecessary usings in the IDE. It is intentionally reported even in generated files, and will not impact builds. IDE0005 is the complementary rule that is only reported in non-generated files, and might be raised to warning/error in builds.

Do we need a new pragma warnings in the headers though?

As long as you use a hint name ending in .g.cs or have // <auto-generated/> as the first line of the file, it will be treated as generated code and code style analyzers will not run in the file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

It is intentionally reported even in generated files

I was surprised that they were still issued even with:

#pragma warning disable CS8019

at the top of the file. Is that expected?

As long as you use a hint name ending in .g.cs or have // as the first line of the file

We do both.

private static readonly string s_generatedCodeAttribute = $"[global::System.CodeDom.Compiler.GeneratedCodeAttribute(\"{typeof(RegexGenerator).Assembly.GetName().Name}\", \"{typeof(RegexGenerator).Assembly.GetName().Version}\")]";
/// <summary>Header comments and usings to include at the top of every generated file.</summary>
private static readonly string[] s_headers = new string[]
/// <summary>Emits the definition of the partial method. This method just delegates to the property cache on the generated Regex-derived type.</summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smal NIT here, should we take the namespace and type declaration out of this method? My question is basically that if you have one partial type which has two source generated methods, it would be nice if the generated code only has one namespace and type declaration which contains both of the generated methods that fetch the instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds reasonable but will require some additional changes, and this was preexisting these changes. We can do it subsequently. Though it will add additional expense to figure out all the groupings... I'm not sure if it's really worth it to avoid the extra partial definitions.

Copy link
Member

@joperezr joperezr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments but this looks great, thanks @stephentoub

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
5 participants