Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Proposal: Add a ValueStringBuilder #25587

Open
JeremyKuhne opened this issue Mar 22, 2018 · 60 comments
Open

API Proposal: Add a ValueStringBuilder #25587

JeremyKuhne opened this issue Mar 22, 2018 · 60 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime blocked Issue/PR is blocked on something - see comments
Milestone

Comments

@JeremyKuhne
Copy link
Member

We should consider making a value based StringBuilder to allow low allocation building of strings. While Span allows you to provide a writable buffer, in many scenarios we have a need to get or build strings and we don't know precisely how much space will be needed ahead of time. Having an abstraction that can grow beyond a given initial buffer is particularly useful as it doesn't require looping with Try* APIs- which can be both complicated and have negative performance implications.

We currently use ValueStringBuilder for this purpose internally. It starts with an optional initial buffer (which we often stackalloc) and will grow using ArrayPool if needed.

Design Goals

  1. Allow safe usage of stack memory
  2. Use pooled memory when needed to reduce GC pressure
  3. Allow dynamic and explicit capacity growth
  4. Facilitate interop scenarios (i.e. passing as char* szValue)
  5. Follow API semantics of StringBuilder & string where possible
  6. Be stack allocated

API

Here is the proposed API:

namespace System.Text
{
    public ref struct ValueStringBuilder
    {
        public ValueStringBuilder(Span<char> initialBuffer);

        // The logical length of the builder (end of the "string")
        public int Length { get; set; }

        // Available space in chars
        public int Capacity { get; }

        // Ensure there is at least this amount of space
        public void EnsureCapacity(int capacity);

        // Get a pinnable reference to the builder. "terminate" ensures the builder has a null char after Length in the buffer.
        public ref char GetPinnableReference(bool terminate = false);

        // Indexer, allows setting/getting individual chars
        public ref char this[int index] { get; }

        // Returns a string based off of the current position
        public override string ToString();

        // Returns a span around the contents of the builder. "terminate" ensures the builder has a null char after Length in the buffer.
        public ReadOnlySpan<char> AsSpan(bool terminate);

        // To ensure inlining perf, we have a separate overload for terminate
        public ReadOnlySpan<char> AsSpan();

        public bool TryCopyTo(Span<char> destination, out int charsWritten);

        public void Insert(int index, char value, int count = 1);
        public void Insert(int index, ReadOnlySpan<char> value, int count = 1);

        public void Append(char c, int count = 1);
        public void Append(ReadOnlySpan<char> value);

        // This gives you an appended span that you can write to
        public Span<char> AppendSpan(int length);

        // Returns any ArrayPool buffer that may have been rented
        public void Dispose()
    }
}

This is the current shape of our internal ValueStringBuilder:

namespace System.Text
{
    internal ref struct ValueStringBuilder
    {
        public ValueStringBuilder(Span<char> initialBuffer);
        public int Length { get; set; }
        public int Capacity { get; }
        public void EnsureCapacity(int capacity);

        /// <summary>
        /// Get a pinnable reference to the builder.
        /// </summary>
        /// <param name="terminate">Ensures that the builder has a null char after <see cref="Length"/></param>
        public ref char GetPinnableReference(bool terminate = false);
        public ref char this[int index] { get; }

        // Returns a string based off of the current position
        public override string ToString();

        /// <summary>
        /// Returns a span around the contents of the builder.
        /// </summary>
        /// <param name="terminate">Ensures that the builder has a null char after <see cref="Length"/></param>
        public ReadOnlySpan<char> AsSpan(bool terminate);

        // To ensure inlining perf, we have a separate overload for terminate
        public ReadOnlySpan<char> AsSpan();

        public bool TryCopyTo(Span<char> destination, out int charsWritten);
        public void Insert(int index, char value, int count);
        public void Append(char c);
        public void Append(string s);
        public void Append(char c, int count);
        public unsafe void Append(char* value, int length);
        public void Append(ReadOnlySpan<char> value);

        // This gives you an appended span that you can write to
        public Span<char> AppendSpan(int length);

        // Returns any ArrayPool buffer that may have been rented
        public void Dispose()
    }
}

Sample Code

Here is a common pattern on an API that could theoretically be made public if ValueStringBuilder was public:
(Although we would call this one GetFullUserName or something like that.)

https://github.com/dotnet/corefx/blob/050bc33738887d9d8fcc9bc5965b7d9ca65bc7f4/src/System.Runtime.Extensions/src/System/Environment.Win32.cs#L40-L56

The caller is above this method:

https://github.com/dotnet/corefx/blob/050bc33738887d9d8fcc9bc5965b7d9ca65bc7f4/src/System.Runtime.Extensions/src/System/Environment.Win32.cs#L13-L38

Usage of AppendSpan:

https://github.com/dotnet/corefx/blob/3538128fa1fb2b77a81026934d61cd370a0fd7f5/src/System.Runtime.Numerics/src/System/Numerics/BigNumber.cs#L550-L560

I'll add more usage details and possible API surface area.

Notes

@JeremyKuhne
Copy link
Member Author

@jkotas, @stephentoub, @KrzysztofCwalina, @danmosemsft, @ahsonkhan

@stephentoub
Copy link
Member

bool terminate

I didn't see that we'd added these arguments. I don't think we should have them. If you want to null terminate, you can call Append('\0').

@JeremyKuhne
Copy link
Member Author

I didn't see that we'd added these arguments. I don't think we should have them. If you want to null terminate, you can call Append('\0').

Discussed this one at length with @jkotas in dotnet/coreclr#16921. The problem with appending a null is that it changes the Length and breaks the convention on what Length means for strings & StringBuilder (i.e. doesn't include the terminator). It is incredibly error prone as a result (as my need for the mentioned pull demonstrates).

StringBuilder always terminates, but doing this comes at a cost. It is only really needed in interop calls that take a sz (most Win32).

@stephentoub
Copy link
Member

The problem with appending a null is that it changes the Length and breaks the convention on what Length means for strings & StringBuilder (i.e. doesn't include the terminator).

The bool terminate is also changing the Length.

@JeremyKuhne
Copy link
Member Author

The bool terminate is also changing the Length.

No, it leaves Length as is. It might increase Capacity.

@stephentoub
Copy link
Member

Hmm, I see, ok.

@omariom
Copy link
Contributor

omariom commented Mar 22, 2018

Are they needed if there is an overload taking ROSpan?

public void Append(string s);
public unsafe void Append(char* value, int length);

@JeremyKuhne
Copy link
Member Author

Are they needed if there is an overload taking ROSpan?

No, they aren't. I'll add a proposed section.

@omariom
Copy link
Contributor

omariom commented Mar 23, 2018

Can we get C# to allow using ref structs that have a Dispose() in a using statement?

There is a proposal dotnet/csharplang#93
It would be even better with this one dotnet/csharplang#114

@KrzysztofCwalina
Copy link
Member

KrzysztofCwalina commented Mar 23, 2018

There are several related efforts in corfxlab. We should try to unify (or make them consistent). See:
https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.ReaderWriter/System/Buffers/Writer/BufferWriter.cs
https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.ReaderWriter/System/Buffers/Writer/BufferWriterT_ints.cs
https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.ReaderWriter/System/Buffers/Writer/BufferWriter_strings.cs
And here support for non-allocating composite formats: https://github.com/dotnet/corefxlab/blob/master/src/System.Text.Formatting/System/Text/Formatting/CompositeFormat.cs

This ValueStringBuilder is a specialized BufferWriter. And so if we add it, we need to rationalize it with BufferWriter, i.e. we should not have both BufferWriter and ValueStringBuilder in .NET.

BufferWriter (the non-generic one) is a byref struct that allows writing (and resizing) to a span buffer. Today, it writes Utf8, but you could imagine writing either Utf8 or utf16 based on some setting. After the buffer is written, tere could be ToString or something like that to produce string result.

@JeremyKuhne
Copy link
Member Author

@KrzysztofCwalina thanks for the links! I've updated the text with design goals and will review the other efforts.

@MarcoRossignoli
Copy link
Member

MarcoRossignoli commented Mar 24, 2018

Should we allow you to make it non-growable? (So we can avoid TryGet above?)

// some reason it isn't, we'll grow the buffer.

In case of growable could be useful to have a boolean properties to know if VSB has grown? In a hot path could be useful avoid to add pressure to underlying ArrayPool.Shared(today) and improve performance by choosing the right size of initial buffer.

@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 25, 2018

Some questions, I guess...

public void Insert(int index, char value, int count = 1);
public void Insert(int index, ReadOnlySpan<char> value, int count = 1);
public void Append(char c, int count = 1);
public void Append(ReadOnlySpan<char> value);
  • Are these the only methods to build with, or will there be more parity with the existing StringBuilder API (AppendLine, string parameter, et al.) and those are omitted from the post for brevity?
  • Is there a reason these don't return the this instance to enable fluent building?

@khellang
Copy link
Member

Speaking of related efforts... There's also the InplaceStringBuilder from Microsoft.Extensions.Primitives, which is used in MVC and the HTTP abstractions. Might want to sync up with them as well.

// @pakrym @Tratcher

@JeremyKuhne
Copy link
Member Author

In case of growable could be useful to have a boolean properties to know if VSB has grown?

Perhaps we should add an EventSource log for this?

@JeremyKuhne
Copy link
Member Author

@Joe4evr

Are these the only methods to build with, or will there be more parity with the existing StringBuilder API (AppendLine, string parameter, et al.) and those are omitted from the post for brevity?

These were the most critical/obvious (based on current usage). I didn't want to start the discussion with too much complexity. Attacking this in priority order seemed like it would go a little quicker.

Is there a reason these don't return the this instance to enable fluent building?

Hadn't considered it is all. I'll take a look.

@jkotas
Copy link
Member

jkotas commented Mar 25, 2018

fluent building

Fluent pattern adds overhead, and it does not work for structs that needs to be disposed.

@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 25, 2018

Fluent pattern adds overhead, and it does not work for structs that needs to be disposed.

Does it, now? Surely, it's less overhead than display class allocation or something.

@ghost
Copy link

ghost commented Mar 25, 2018

Combining two of @khellang's comment https://github.com/dotnet/corefx/issues/20378#issuecomment-304400894 & https://github.com/dotnet/corefx/issues/28379#issuecomment-375975366, it would be super helpful for almost all the .NET developers if we get a comprehensive blog article or Channel 9 video on all these newly optimized string related types in .NET Core and ASP.NET Core detailing when to use which and why. During that exercise if some overlapping types are found, please consider deprecating one. With every upcoming string-related type, it is getting scarier. 😱

@khellang
Copy link
Member

khellang commented Mar 25, 2018

it would be super helpful [...] if we get a comprehensive blog article or Channel 9 video on all these newly optimized string related types [...] detailing when to use which and why.

I think the general guidance (once Span<T> and friends officially hit the streets) is to use Span<T> (or ReadOnlySpan<T>, Memory<T>, ReadOnlyMemory<T> etc.) whenever you can.

StringSegment, ArraySegment<T> and various other related types (some of which has been around for some time) are all efforts to achieve some of the same benefits as the System.Memory types. At this point, I think you could argue they should all be viewed as "obsolete" or "redundant".

@jnm2
Copy link
Contributor

jnm2 commented Mar 26, 2018

StringSegment is obsolete? https://github.com/dotnet/corefx/issues/20378

@khellang
Copy link
Member

khellang commented Mar 26, 2018

No, but with the other new APIs (on ReadOnlySpan<char>), it's arguable how much value yet another slice-of-string type would bring vs the overhead of the type existing at all. So I would say "redundant" 😉

@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 26, 2018

😕 The API proposal itself clearly states:

but spans have different type system constraints that not all consumers can -- or want to -- opt into.

@khellang
Copy link
Member

khellang commented Mar 26, 2018

I'm assuming @terrajobst was referring to the stack-only-ness of ReadOnlySpan<T>. That doesn't apply to ReadOnlyMemory<T>.

There might be valid reasons for yet another slice-of-string type, but as I mentioned in https://github.com/dotnet/corefx/issues/20378#issuecomment-304400894, the number of these types are getting a bit out of hand and it's hard to know which to use and which to support in libraries.

@JeremyKuhne
Copy link
Member Author

The key thing we're missing that this addresses is a low allocation expandable Span<T> provider. We could go super simple with something like ref SpanProvider<T> cutting all string-like methods and perhaps just providing the functionality as extensions. I'm not fundamentally opposed to going that route, but the fact is that we do provide StringBuilder. It is a concept that people are familiar with and we need a mechanism to replace it for performance sensitive code paths. It seems less burdensome to me to juggle StringBuilder/ValueStringBuilder than StringBuilder/SpanProvider<char>, particularly as we're already moving down the road of having Value* value type equivalents of other reference types (e.g. it is an emergent pattern).

@fanoI
Copy link

fanoI commented Mar 28, 2018

I have a question regarding the Dispose() of this structure it should be always called?
If so the posted examples are not risking to leaking memory?

@JeremyKuhne
Copy link
Member Author

I have a question regarding the Dispose() of this structure it should be always called?

It returns any potentially rented buffers to the array pool. ToString() internally also returns the buffer, I'll call that out. On a related note, we don't usually use these with try ... finally blocks as in many cases we don't expect throws to be common enough to warrant taking the performance hit of introducing the block- we're ok with not returning the buffer to the pool in those cases (which will get it garbage collected).

@JeremyKuhne
Copy link
Member Author

Note that I'm exploring alternative options utilizing some of the concepts @KrzysztofCwalina brought up (with @KrzysztofCwalina). We'll circle back around with more info.

@ViktorHofer
Copy link
Member

@JeremyKuhne have you made progress on exploring new options? I think we should definitely get this into 3.0.

@JeremyKuhne
Copy link
Member Author

have you made progress on exploring new options?

dotnet/corefxlab#2177 is part of what might negate the need for this. I currently own productizing BufferReader/Writer (dotnet/corefxlab#2166) and have just started in earnest.

I'll be experimenting with how I can provide the same functionality and taking a look at any tradeoffs. ValueStringBuilder is the backup plan if I can't provide a reasonable solution with the other new types.

I think we should definitely get this into 3.0.

I intend to get this or an equivalent in. I really want it as well as not having this sort of type is blocking adding other APIs that I'm very interested in seeing. :)

@Entomy
Copy link

Entomy commented Feb 15, 2020

A few months back I implemented basically this, thinking it was a good idea as well. Exact implementation will cause performance to vary, so take it with a grain of salt, but I was found was roughly:

For allocations of 5 chars long, less than 12 allocations was faster to use the traditional StringBuilder (tSB). For allocations less than 5 chars long, it was almost always faster to use the tSB. For allocations more than 10 chars long, it was almost always faster to use the tSB. For allocations of 5 chars long, only between 12 to 41 allocations were faster with the value/ref StringBuilder (vSB). For allocations of 8 chars long, only between 18 to 22 allocations were faster for the vSB.

Hopefully you're getting the picture without me needing to clone the old commit and recreate the 3D graphs. The amount of situations where the vSB was actually higher performing were extremely limited, and not clearly or obviously defined.

Furthmore, in every single code base I have where this might have been useful, I worked out ways to just use an array/buffer, where even additional computations or value passing still outperformed the vSB in every single instance.

So, it might be useful, but as a heads up to whoever implements this, you're gonna need to benchmark the ever living **** out of it, and document the small hole where it's actually useful.

@danmoseley
Copy link
Member

@Entomy by faster, presumably you mean allocation. There is also the garbage to consider.

@Entomy
Copy link

Entomy commented Feb 18, 2020

@danmosemsft Right, absolutely. That's basically the point I'm trying to make, is that this is complicated and it seems like there's only a very small, and unusual, range in which this is viable.

@stephentoub
Copy link
Member

stephentoub commented Feb 18, 2020

there's only a very small, and unusual

Reduction in allocation often has little-to-no measurable impact on serialized benchmark throughput. The benefit of allocation reduction when it comes to throughput is generally about the whole app running at scale.

@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2020
@joperezr joperezr modified the milestones: 5.0.0, Future Jul 6, 2020
@danmoseley
Copy link
Member

This is marked blocked because it potentially requires language features. We need a way to enforce that structs are passed only by reference so that we don't double return arrays to the pool.

@kamronbatman
Copy link
Contributor

Anything slated for language changes to allow this? I think forcing ref structs makes sense as a new language construct.

@GazziFX
Copy link

GazziFX commented Mar 17, 2021

Will it come out in 6.0.0?

@kamronbatman
Copy link
Contributor

For now I exposed my own by essentially shamelessly copying it:
https://github.com/modernuo/ModernUO/blob/main/Projects/Server/Buffers/ValueStringBuilder.cs

Also you can use https://github.com/Cysharp/ZString which has a great StringBuilder-like API.

@danmoseley
Copy link
Member

This is essentially blocked on resolving discussion in #50389.

@SupinePandora43
Copy link

SupinePandora43 commented Jul 11, 2022

Since roslyn adds support for utf8 strings handled as ReadOnlySpan<byte>, i think it will be better to have something like ValueSpanBuilder<T> instead.

@CodingMadness
Copy link

Has been anything further done in this regard, i ask out of curiosity

@lsoft
Copy link

lsoft commented Jan 29, 2023

ValueStringBuilder is in runtime now, as I can see: https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/System/Text/ValueStringBuilder.cs

but not public for some reason. what is that reason? public ValueStringBuilder is the key to provide, for example, global::System.Net.WebUtility.HtmlDecode(ReadOnlySpan<char>) into the public space.

@stephentoub
Copy link
Member

but not public for some reason. what is that reason?

#25587 (comment)

@ericstj
Copy link
Member

ericstj commented Apr 3, 2024

Just saw another copy of this type pop up, and see that we currently have 20+ copies in the product. https://source.dot.net/#q=ValueStringBuilder

Is there any progress on the language feature that would make us comfortable exposing this type?

@stephentoub
Copy link
Member

and see that we currently have 20+ copies in the product

To clarify (I'm not sure if this is what you meant or not), we have that many in binaries, not that many source copies, with the same source built as internal into many places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime blocked Issue/PR is blocked on something - see comments
Projects
None yet
Development

No branches or pull requests