Skip to content

Significantly reduce allocations in ExpandMetadataLeaveEscaped #12002

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 30, 2025

Conversation

Erarndt
Copy link
Contributor

@Erarndt Erarndt commented Jun 11, 2025

Context

The current implementation of ExpandMetadataLeaveEscaped() heavily leverages Regex methods, and doing so allocates a considerable amount. I measure around 3.8GB of allocations in this path while building Roslyn. Many of these allocations come from a combination of internal Regex implementation and the existing patterns of consuming them. Take this code for example:

                    using SpanBasedStringBuilder finalResultBuilder = Strings.GetSpanBasedStringBuilder();

                    int start = 0;
                    MetadataMatchEvaluator matchEvaluator = new MetadataMatchEvaluator(metadata, options, elementLocation, loggingContext);

                    if (itemVectorExpressions != null)
                    {
                        // Move over the expression, skipping those that have been recognized as an item vector expression
                        // Anything other than an item vector expression we want to expand bare metadata in.
                        for (int n = 0; n < itemVectorExpressions.Count; n++)
                        {
                            string vectorExpression = itemVectorExpressions[n].Value;

                            // Extract the part of the expression that appears before the item vector expression
                            // e.g. the ABC in ABC@(foo->'%(FullPath)')
                            string subExpressionToReplaceIn = expression.Substring(start, itemVectorExpressions[n].Index - start);
                            string replacementResult = RegularExpressions.NonTransformItemMetadataRegex.Replace(subExpressionToReplaceIn, new MatchEvaluator(matchEvaluator.ExpandSingleMetadata));

                            // Append the metadata replacement
                            finalResultBuilder.Append(replacementResult);

                            // Expand any metadata that appears in the item vector expression's separator
                            if (itemVectorExpressions[n].Separator != null)
                            {
                                vectorExpression = RegularExpressions.NonTransformItemMetadataRegex.Replace(itemVectorExpressions[n].Value, new MatchEvaluator(matchEvaluator.ExpandSingleMetadata), -1, itemVectorExpressions[n].SeparatorStart);
                            }

                            // Append the item vector expression as is
                            // e.g. the @(foo->'%(FullPath)') in ABC@(foo->'%(FullPath)')
                            finalResultBuilder.Append(vectorExpression);

                            // Move onto the next part of the expression that isn't an item vector expression
                            start = (itemVectorExpressions[n].Index + itemVectorExpressions[n].Length);
                        }
                    }

Here there are several calls to Regex.Replace() that each return a new string that gets appended to the SpanBasedStringBuilder. If you dig into the implementation of Regex.Replace() you can start to see where the inefficiencies really add up

        stringBuilder = new StringBuilder();
        int num = 0;
        do
        {
            if (match.Index != num)
            {
                stringBuilder.Append(input, num, match.Index - num);
            }

            num = match.Index + match.Length;
            stringBuilder.Append(evaluator(match));
            if (--count == 0)
            {
                break;
            }

            match = match.NextMatch();
        }
        while (match.Success);
        if (num < input.Length)
        {
            stringBuilder.Append(input, num, input.Length - num);
        }

So, for each call to Regex.Replace() we're, we're creating a new StringBuilder and appending to it, causing it to expand it's internal data structures, as we evaluate the matches. Finally, we return the resulting string and add it to the span based builder. By modifying this implementation to directly use the SpanBasedStringBuilder, we can eliminate a considerable amount of duplication. We can do some additional cleanup in this code to avoid some additional allocations as well.

Before:

image

After:

image

Many types that were previously allocated are either entirely eliminated or significantly reduced. Before there are ~3.8GB of allocations here and only 2.7GB here after.

Changes Made

Testing

Notes

@Copilot Copilot AI review requested due to automatic review settings June 11, 2025 23:59
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors ExpandMetadataLeaveEscaped to eliminate intermediate Regex.Replace allocations by streaming matches directly into a SpanBasedStringBuilder. It also converts the metadata evaluator to a struct and adds new ReplaceAndAppend methods along with a span-based Equals override.

  • Introduce SpanBasedStringBuilder.Equals(ReadOnlySpan<char>) to compare builder contents without allocating.
  • Replace all Regex.Replace calls with RegularExpressions.ReplaceAndAppend to append directly into the span builder.
  • Convert MetadataMatchEvaluator from a class to a struct and refactor its ExpandSingleMetadata method.
  • Add two overloads of ReplaceAndAppend in RegularExpressions.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/StringTools/SpanBasedStringBuilder.cs Added Equals(ReadOnlySpan<char>) for zero-allocation comparisons.
src/Build/Evaluation/Expander.cs Replaced Regex.Replace invocations with ReplaceAndAppend, refactored evaluator to struct, and updated return logic.
src/Build/Evaluation/Expander.cs (RegularExpressions) Added ReplaceAndAppend overloads to stream matches into SpanBasedStringBuilder.
Comments suppressed due to low confidence (3)

src/StringTools/SpanBasedStringBuilder.cs:123

  • The new Equals(ReadOnlySpan) implementation needs unit tests covering empty spans, single-span vs. multi-span cases, and mismatched content to ensure correctness.
public bool Equals(ReadOnlySpan<char> other)

src/Build/Evaluation/Expander.cs:3194

  • Add unit tests for this ReplaceAndAppend overload, including no-match, single-match, RightToLeft, and count-limited scenarios to validate correct behavior.
public static void ReplaceAndAppend(string input, Func<Match, MetadataMatchEvaluator, string> evaluator, MetadataMatchEvaluator metadataMatchEvaluator, SpanBasedStringBuilder builder, Regex regex)

src/Build/Evaluation/Expander.cs:3199

  • [nitpick] The two overloads of ReplaceAndAppend use different parameter orders for the builder and regex. Standardizing the parameter sequence or encouraging named arguments could reduce confusion.
public static void ReplaceAndAppend(string input, Func<Match, MetadataMatchEvaluator, string> evaluator, MetadataMatchEvaluator metadataMatchEvaluator, int count, int startat, SpanBasedStringBuilder stringBuilder, Regex regex)

@SimaTian
Copy link
Member

There is #12009 targetting the same area. Are these complementary or is this one making the other redundant please?

@Erarndt
Copy link
Contributor Author

Erarndt commented Jun 24, 2025

There is #12009 targetting the same area. Are these complementary or is this one making the other redundant please?

Chatted offline. The changes are in related but separate areas.

@YuliiaKovalova YuliiaKovalova self-assigned this Jun 26, 2025
Copy link
Member

@SimaTian SimaTian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it and I think it is safe enough.
The expander is well covered for a lot of edge cases, see for example Expander_Tests.cs -> Medley.
Previously when I was playing around in expander, it caught basically anything so if the tests are passing, we should be good to go.

@YuliiaKovalova
Copy link
Member

Overall looks good, ReplaceAndAppend is undocumented and slightly challenging to follow. Please adjust it

@SimaTian SimaTian merged commit 436b95c into dotnet:main Jun 30, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants