
Escape U+2028/U+2029 in regex source generator XML doc comments #126242

Merged
danmoseley merged 1 commit into dotnet:main from stephentoub:fix/regex-xml-escape-line-separators
Mar 28, 2026

@stephentoub
Member

Note

This PR was generated with the assistance of GitHub Copilot.

U+2028 (Line Separator) and U+2029 (Paragraph Separator) are valid XML characters but are C# line terminators. When the regex source generator emits these literally into /// doc comments, the compiler sees a line break mid-comment, and the continuation isn't a /// line — causing XML parse errors and CS1519/CS1056 compilation failures in the generated code.

The fix excludes these two characters from the literal pass-through range in EscapeXmlComment so they are escaped as \u2028 and \u2029 text instead. All other C# line terminators (CR, LF, NEL) were already handled by falling outside the existing ranges.
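
The escaping rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from RegexGenerator.Emitter.cs: the class name `XmlCommentEscaping`, the exact pass-through ranges, and the `\uXXXX` formatting are assumptions made for the example. Only the idea of excluding U+2028 and U+2029 from the pass-through set comes from the PR.

```csharp
using System.Text;

// Hypothetical sketch; the real EscapeXmlComment may use different ranges and formatting.
static class XmlCommentEscaping
{
    public static string EscapeXmlComment(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Assumed pass-through set: printable ASCII plus most of the BMP,
            // minus U+2028/U+2029, which C# treats as line terminators.
            bool passThrough = c is
                (>= ' ' and <= '~') or
                (>= '\u00A0' and <= '\uD7FF' and not '\u2028' and not '\u2029') or
                (>= '\uE000' and <= '\uFFFD');

            if (c == '&') sb.Append("&amp;");
            else if (c == '<') sb.Append("&lt;");
            else if (c == '>') sb.Append("&gt;");
            else if (passThrough) sb.Append(c);
            // Everything else is written as visible \uXXXX text so that it
            // cannot end the /// comment line.
            else sb.Append("\\u").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

With this sketch, a pattern containing a literal U+2028 comes out with the six characters `\u2028` in its place, so the emitted doc comment stays on a single source line.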

Changes

  • RegexGenerator.Emitter.cs: Add not 0x2028 and not 0x2029 to the valid-character pass-through pattern in EscapeXmlComment.
  • Regex.Match.Tests.cs: Add test cases with literal \u2028, \u2029, and \uFFFE in regex patterns, exercised across all engines including the source generator.
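
The shape of the added coverage can be approximated with a standalone check like the one below. It is only a sketch: `SeparatorPatternChecks` and the `a…b` patterns are hypothetical and are not taken from Regex.Match.Tests.cs. It covers only the engines that can be constructed at run time (interpreter, `RegexOptions.Compiled`, `RegexOptions.NonBacktracking`, the last requiring .NET 7 or later). The source generator path, which is the one this PR fixes, needs a `[GeneratedRegex]` partial method and is not covered here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the real tests live in Regex.Match.Tests.cs and use their own harness.
static class SeparatorPatternChecks
{
    public static bool AllEnginesMatch()
    {
        // The C# escapes below produce strings containing the literal characters.
        string[] specials = { "\u2028", "\u2029", "\uFFFE" };
        RegexOptions[] engines =
        {
            RegexOptions.None,            // interpreter
            RegexOptions.Compiled,        // IL-compiled
            RegexOptions.NonBacktracking, // automaton-based engine
        };

        foreach (string special in specials)
        {
            string pattern = "a" + special + "b";
            foreach (RegexOptions engine in engines)
            {
                // Each engine should treat the character as an ordinary literal.
                if (!new Regex(pattern, engine).IsMatch("xa" + special + "by"))
                    return false;
            }
        }
        return true;
    }
}
```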

U+2028 (Line Separator) and U+2029 (Paragraph Separator) are valid XML
characters but are C# line terminators. When emitted literally into ///
doc comments by the regex source generator, they break the comment
across lines, causing compilation errors in the generated code.

Exclude these two characters from the literal pass-through range in
EscapeXmlComment so they are escaped as \u2028 and \u2029 text instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@stephentoub stephentoub requested a review from danmoseley March 28, 2026 03:57
Contributor

Copilot AI left a comment

Pull request overview

Fixes a source-generator compilation hazard where patterns containing U+2028 (Line Separator) / U+2029 (Paragraph Separator) could be emitted verbatim into /// XML doc comments, inadvertently terminating the C# line and breaking the generated source.

Changes:

  • Update EscapeXmlComment to exclude U+2028 and U+2029 from the “pass-through” XML character ranges so they’re emitted as \u2028 / \u2029 text.
  • Add functional test cases with patterns containing U+2028, U+2029, and U+FFFE to ensure the source-generator engine can compile and execute these regexes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs | Ensures U+2028/U+2029 don’t get emitted as literal line terminators inside generated /// doc comments. |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs | Adds coverage to exercise patterns containing these characters across all regex engines, including the source generator. |

@stephentoub
Member Author

/ba-g known failures

@danmoseley danmoseley merged commit 28be713 into dotnet:main Mar 28, 2026
98 of 102 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 28, 2026