Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory consumption for string concatenation from O(n^2) to O(n) #37915

Merged
merged 2 commits into from Aug 15, 2019

Conversation

@gafter
Copy link
Member

commented Aug 12, 2019

Fixes #7398
Fixes #37572
Relates to aspnet/Razor#614 and many other customer accomodations that are no longer needed.

History of this bug: When constant folding a long sequence of string concatentations, there is an intermediate constant value for every left-hand operand. So the total memory consumed to compute the whole concatenation was O(n^2). The compiler would simply perform this work and eventually run out of memory, simply crashing with no useful diagnostic. Later, the concatenation implementation was instrumented so it would detect when it was likely to run out of memory soon, and would instead report a diagnostic at the last step. See f177077. Test Bug529600() was added to demonstrate that we produced a diagnostic. However, the compiler still consumed O(n^2) memory for the concatenation and this test used to consume so much memory that it would cause other tests running in parallel to fail because they might not have enough memory to succeed. So the test was disabled and eventually removed. The compiler would still crash with programs containing large string concatenations, or consume huge amounts of memory and take a long time before reporting a diagnostic, so the underlying problem had not been addressed.

Here we have revised the implementation of constant folding string concatenations so that it requires O(n) memory and remove the old instrumentation. As a consequence the test Bug529600() now runs very quickly and does not consume gobs of memory.

Reduce memory consumption for string concatenation from O(n^2) to O(n)
Fixes #7398
Fixes #37572
Relates to aspnet/Razor#614 and many other customer accomodations that are no longer needed.

History of this bug:  When constant folding a long sequence of string concatentations, there is
an intermediate constant value for every left-hand operand.  So the total memory consumed to
compute the whole concatenation was *O(n^2)*.  The compiler would simply perform this work and
eventually run out of memory, simply crashing with no useful diagnostic.  Later, the concatenation
implementation was instrumented so it would detect when it was likely to run out of memory soon,
and would instead report a diagnostic at the last step.
See f177077
Test `Bug529600()` was added to demonstrate that
we produced a diagnostic.  However, the compiler still consumed *O(n^2)* memory for the
concatenation and this test used to consume so much memory that it would cause other tests running
in parallel to fail because they might not have enough memory to succeed.  So the test was
disabled and eventually removed.  The compiler would still crash with some programs constaining large
string concatenations, so the underlying problem had not been addressed.

Here we have revised the implementation of constant folding string concatenations so that it
requires O(n) memory and remove the old instrumentation.  As a consequence the test `Bug529600()`
now runs very quickly and does not consume gobs of memory.

@gafter gafter added this to the 16.4 milestone Aug 12, 2019

@gafter gafter requested a review from dotnet/roslyn-compiler as a code owner Aug 12, 2019

@gafter gafter self-assigned this Aug 12, 2019

@gafter gafter added this to In Review in Compiler: Gafter Aug 12, 2019

{
if (chars0.Current != chars1.Current)
return false;
}

This comment has been minimized.

Copy link
@rynowak

rynowak Aug 12, 2019

Member

won't this loop bail early if one of the ropes is longer than the other? 'AAA' vs 'AA' #Closed

This comment has been minimized.

Copy link
@RikkiGibson

RikkiGibson Aug 12, 2019

Contributor

seems like we already ensured the two ropes have equal lengths by this point. #Closed

@RikkiGibson
Copy link
Contributor

left a comment

:shipit:

@333fred

This comment has been minimized.

Copy link
Member

commented Aug 12, 2019

// Copyright (c) Microsoft. All Rights Reserved. Licensed under the Apache License, Version 2.0. See License.txt in the project root for license information.

Consider enabling nullable for this file, since it's a totally new class. #Resolved


Refers to: src/Compilers/Core/Portable/Collections/Rope.cs:1 in a750642. [](commit_id = a750642, deletion_comment = False)

@333fred
Copy link
Member

left a comment

LGTM (commit 1)

{
int result = Length;
foreach (char c in GetChars())
result = Hash.Combine((int)c, result);

This comment has been minimized.

Copy link
@CyrusNajmabadi

CyrusNajmabadi Aug 12, 2019

Contributor

note: there are ways you can design ropes such that the sub parts contain their hashes, and they can be combined such that any equivalent ropes would have the same hash (i did this in my Rope implementation at Google :)).

If this is an operation called a lot, that may be worth doing. However, if the compiler has no need to hash these, then that work isn't necessary. #ByDesign

This comment has been minimized.

Copy link
@gafter

gafter Aug 14, 2019

Author Member

The compiler hashes these, but rarely. The only times it does is when a string appears as a case in a switch.


In reply to: 313170138 [](ancestors = 313170138)


return true;
}
public override int GetHashCode()

This comment has been minimized.

Copy link
@CyrusNajmabadi

CyrusNajmabadi Aug 12, 2019

Contributor
Suggested change
public override int GetHashCode()
public override int GetHashCode()
``` #Resolved

protected override IEnumerable<char> GetChars()
{
var stack = new Stack<Rope>();

This comment has been minimized.

Copy link
@CyrusNajmabadi

CyrusNajmabadi Aug 12, 2019

Contributor

if GetChars is called many times, consider pooling this stack. #ByDesign

This comment has been minimized.

Copy link
@gafter

gafter Aug 14, 2019

Author Member

It isn't.


In reply to: 313170423 [](ancestors = 313170423)

@@ -85,6 +85,14 @@ public override string StringValue
}
}

internal override Rope RopeValue

This comment has been minimized.

Copy link
@CyrusNajmabadi

CyrusNajmabadi Aug 12, 2019

Contributor
Suggested change
internal override Rope RopeValue
internal override Rope RopeValue => null;
``` #ByDesign

This comment has been minimized.

Copy link
@gafter

gafter Aug 14, 2019

Author Member

I would do that if all the surrounding ones used that style.


In reply to: 313170693 [](ancestors = 313170693)

public static readonly Rope Empty = ForString("");
public abstract override string ToString();
public abstract int Length { get; }
protected abstract IEnumerable<char> GetChars();

This comment has been minimized.

Copy link
@jaredpar

jaredpar Aug 13, 2019

Member

Did you consider using a Span<char> here given this type is meant to help with perf scenarios? #ByDesign

if (Length == 0)
return true;
var chars0 = GetChars().GetEnumerator();
var chars1 = other.GetChars().GetEnumerator();

This comment has been minimized.

Copy link
@jaredpar

jaredpar Aug 13, 2019

Member

Implementing GetChars in terms of Span<char> would avoid allocations in this method and let you use Span<T>.SequenceEquals which is very efficient on CoreClr #ByDesign

/// <summary>
/// A rope that wraps a simple string.
/// </summary>
private class StringRope : Rope

This comment has been minimized.

Copy link
@jaredpar

jaredpar Aug 13, 2019

Member

sealed #Resolved

/// <summary>
/// A rope that represents the concatenation of two ropes.
/// </summary>
private class ConcatRope : Rope

This comment has been minimized.

Copy link
@jaredpar

jaredpar Aug 13, 2019

Member

sealed #Resolved

return psb.ToStringAndFree();
}

protected override IEnumerable<char> GetChars()

This comment has been minimized.

Copy link
@jaredpar

jaredpar Aug 13, 2019

Member

Hmm, this method is going to make it hard to have a Span<char> method here :frown: #ByDesign

@gafter gafter merged commit 99cc381 into dotnet:release/dev16.4-preview1 Aug 15, 2019

20 checks passed

WIP Ready for review
Details
license/cla All CLA requirements met.
Details
roslyn-CI Build #20190814.4 succeeded
Details
roslyn-CI (Linux_Test coreclr) Linux_Test coreclr succeeded
Details
roslyn-CI (Linux_Test mono) Linux_Test mono succeeded
Details
roslyn-CI (MacOs_Test) MacOs_Test succeeded
Details
roslyn-CI (Windows_CoreClr_Unit_Tests debug) Windows_CoreClr_Unit_Tests debug succeeded
Details
roslyn-CI (Windows_CoreClr_Unit_Tests release) Windows_CoreClr_Unit_Tests release succeeded
Details
roslyn-CI (Windows_Correctness_Test) Windows_Correctness_Test succeeded
Details
roslyn-CI (Windows_Desktop_Spanish_Unit_Tests) Windows_Desktop_Spanish_Unit_Tests succeeded
Details
roslyn-CI (Windows_Desktop_Unit_Tests debug_32) Windows_Desktop_Unit_Tests debug_32 succeeded
Details
roslyn-CI (Windows_Desktop_Unit_Tests debug_64) Windows_Desktop_Unit_Tests debug_64 succeeded
Details
roslyn-CI (Windows_Desktop_Unit_Tests release_32) Windows_Desktop_Unit_Tests release_32 succeeded
Details
roslyn-CI (Windows_Desktop_Unit_Tests release_64) Windows_Desktop_Unit_Tests release_64 succeeded
Details
roslyn-CI (Windows_Determinism_Test) Windows_Determinism_Test succeeded
Details
roslyn-integration-CI Build #20190814.4 succeeded
Details
roslyn-integration-CI (VS_Integration debug_async) VS_Integration debug_async succeeded
Details
roslyn-integration-CI (VS_Integration debug_legacy) VS_Integration debug_legacy succeeded
Details
roslyn-integration-CI (VS_Integration release_async) VS_Integration release_async succeeded
Details
roslyn-integration-CI (VS_Integration release_legacy) VS_Integration release_legacy succeeded
Details
@gafter gafter referenced this pull request Aug 15, 2019

@jcouv jcouv modified the milestones: 16.4, 16.4.P1 Sep 6, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.