-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Significantly reduce allocations in ExpandMetadataLeaveEscaped #12002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors ExpandMetadataLeaveEscaped
to eliminate intermediate Regex.Replace
allocations by streaming matches directly into a SpanBasedStringBuilder
. It also converts the metadata evaluator to a struct and adds new ReplaceAndAppend
methods along with a span-based Equals
override.
- Introduce
SpanBasedStringBuilder.Equals(ReadOnlySpan<char>)
to compare builder contents without allocating. - Replace all
Regex.Replace
calls withRegularExpressions.ReplaceAndAppend
to append directly into the span builder. - Convert
MetadataMatchEvaluator
from a class to a struct and refactor itsExpandSingleMetadata
method. - Add two overloads of
ReplaceAndAppend
inRegularExpressions
.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
src/StringTools/SpanBasedStringBuilder.cs | Added Equals(ReadOnlySpan<char>) for zero-allocation comparisons. |
src/Build/Evaluation/Expander.cs | Replaced Regex.Replace invocations with ReplaceAndAppend , refactored evaluator to struct, and updated return logic. |
src/Build/Evaluation/Expander.cs (RegularExpressions) | Added ReplaceAndAppend overloads to stream matches into SpanBasedStringBuilder . |
Comments suppressed due to low confidence (3)
src/StringTools/SpanBasedStringBuilder.cs:123
- The new Equals(ReadOnlySpan) implementation needs unit tests covering empty spans, single-span vs. multi-span cases, and mismatched content to ensure correctness.
public bool Equals(ReadOnlySpan<char> other)
src/Build/Evaluation/Expander.cs:3194
- Add unit tests for this ReplaceAndAppend overload, including no-match, single-match, RightToLeft, and count-limited scenarios to validate correct behavior.
public static void ReplaceAndAppend(string input, Func<Match, MetadataMatchEvaluator, string> evaluator, MetadataMatchEvaluator metadataMatchEvaluator, SpanBasedStringBuilder builder, Regex regex)
src/Build/Evaluation/Expander.cs:3199
- [nitpick] The two overloads of ReplaceAndAppend use different parameter orders for the builder and regex. Standardizing the parameter sequence or encouraging named arguments could reduce confusion.
public static void ReplaceAndAppend(string input, Func<Match, MetadataMatchEvaluator, string> evaluator, MetadataMatchEvaluator metadataMatchEvaluator, int count, int startat, SpanBasedStringBuilder stringBuilder, Regex regex)
There is #12009 targetting the same area. Are these complementary or is this one making the other redundant please? |
Chatted offline. The changes are in related but separate areas. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it and I think it is safe enough.
The expander is well covered for a lot of edge cases, see for example Expander_Tests.cs -> Medley.
Previously when I was playing around in expander, it caught basically anything so if the tests are passing, we should be good to go.
Overall looks good, |
Context
The current implementation of
ExpandMetadataLeaveEscaped()
heavily leverages Regex methods, and doing so allocates a considerable amount. I measure around 3.8GB of allocations in this path while building Roslyn. Many of these allocations come from a combination of internal Regex implementation and the existing patterns of consuming them. Take this code for example:Here there are several calls to
Regex.Replace()
that each return a new string that gets appended to theSpanBasedStringBuilder
. If you dig into the implementation ofRegex.Replace()
you can start to see where the inefficiencies really add upSo, for each call to
Regex.Replace()
we're, we're creating a newStringBuilder
and appending to it, causing it to expand it's internal data structures, as we evaluate the matches. Finally, we return the resulting string and add it to the span based builder. By modifying this implementation to directly use theSpanBasedStringBuilder
, we can eliminate a considerable amount of duplication. We can do some additional cleanup in this code to avoid some additional allocations as well.Before:
After:
Many types that were previously allocated are either entirely eliminated or significantly reduced. Before there are ~3.8GB of allocations here and only 2.7GB here after.
Changes Made
Testing
Notes