Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Get a Read-only view of a StringBuilder body and clear the StringBuilder. #97570

Open
jkoritzinsky opened this issue Jan 26, 2024 · 10 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime untriaged New issue has not been triaged by the area owner

Comments

@jkoritzinsky
Copy link
Member

Background and motivation

The Roslyn and runtime teams are looking at improving the experience of writing formatted text, primarily around source generators. An important feature for Roslyn is a mechanism to, without a copy, get a read-only view of any text that no one else can write to. The lack of an API to provide this has been blocking dotnet/roslyn#61326.

The Roslyn team has mentioned recently that they are okay with a non-optimal copy-based API for downlevel scenarios as long as there is a no-copy API on the horizon. This proposal provides such an API.

API Proposal

namespace System.Text;

public class StringBuilder
{
    public System.Buffers.ReadOnlySequence<char> ToReadOnlySequenceAndClear();
}

This API will create a read-only sequence over the chunks in the StringBuilder's content, reset the internal buffer of the builder to a new backing array, and remove references to previous chuncks.

This clearing behavior is in contrast to the behavior when Clear is called, where the internal array is not thrown away.

API Usage

ReadOnlySequence<char> GetReadOnlyView(StringBuilder builder)
{
    return builder.ToReadOnlySequenceAndClear();
}

Alternative Designs

Instead of enhancing StringBuilder as-is, we could make a type that wraps StringBuilder and provides a more fully-featured API set for the formatted text writing and provides this sort of API (built on top of StringBuilder.GetChunks() and the fact that it would own the StringBuilder instance).

Risks

No response

@jkoritzinsky jkoritzinsky added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jan 26, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 26, 2024
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 26, 2024
@jkoritzinsky jkoritzinsky added area-System.Runtime and removed untriaged New issue has not been triaged by the area owner needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 26, 2024
@ghost
Copy link

ghost commented Jan 26, 2024

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

The Roslyn and runtime teams are looking at improving the experience of writing formatted text, primarily around source generators. An important feature for Roslyn is a mechanism to, without a copy, get a read-only view of any text that no one else can write to. The lack of an API to provide this has been blocking dotnet/roslyn#61326.

The Roslyn team has mentioned recently that they are okay with a non-optimal copy-based API for downlevel scenarios as long as there is a no-copy API on the horizon. This proposal provides such an API.

API Proposal

namespace System.Text;

public class StringBuilder
{
    public System.Buffers.ReadOnlySequence<char> ToReadOnlySequenceAndClear();
}

This API will create a read-only sequence over the chunks in the StringBuilder's content, reset the internal buffer of the builder to a new backing array, and remove references to previous chuncks.

This clearing behavior is in contrast to the behavior when Clear is called, where the internal array is not thrown away.

API Usage

ReadOnlySequence<char> GetReadOnlyView(StringBuilder builder)
{
    return builder.ToReadOnlySequenceAndClear();
}

Alternative Designs

Instead of enhancing StringBuilder as-is, we could make a type that wraps StringBuilder and provides a more fully-featured API set for the formatted text writing and provides this sort of API (built on top of StringBuilder.GetChunks() and the fact that it would own the StringBuilder instance).

Risks

No response

Author: jkoritzinsky
Assignees: -
Labels:

api-suggestion, area-System.Runtime

Milestone: -

@jkoritzinsky jkoritzinsky added the untriaged New issue has not been triaged by the area owner label Jan 26, 2024
@jkoritzinsky
Copy link
Member Author

cc: @CyrusNajmabadi @stephentoub

@stephentoub
Copy link
Member

Related to #87362

(As with that one, this would necessitate moving ReadOnlySequence into corelib.)

@MichalPetryka
Copy link
Contributor

I don't see how the Clear behaviour here makes sense, seems like you effectively want it to alloc a new StringBuilder instead.

@CyrusNajmabadi
Copy link
Member

I don't see how the Clear behaviour here makes sense, seems like you effectively want it to alloc a new StringBuilder instead.

We want to transfer ownership of hte underlying data. That's only safe if after the operation the SB itself doesn't point at the data (hence, it is "cleared").

@lambdageek
Copy link
Member

lambdageek commented Jan 26, 2024

I don't see how the Clear behaviour here makes sense, seems like you effectively want it to alloc a new StringBuilder instead.

We want to transfer ownership of hte underlying data. That's only safe if after the operation the SB itself doesn't point at the data (hence, it is "cleared").

I'm still unclear on the scenario. Either there's a single reference to the string builder instance - in which case it's ok to have 2 separate operations - one that gives a view of the underlying chunks and a separate clear operation (different from the existing Clear which mutates the internal array).

Or there are multiple references - in which case don't they already need some kind of coordination mechanism? Otherwise how could they possibly call Append twice and have any confidence that they're still writing to the same buffer.

It seems the "and clear" part can be separate.

@jkoritzinsky
Copy link
Member Author

I don't see how the Clear behaviour here makes sense, seems like you effectively want it to alloc a new StringBuilder instead.

We want to transfer ownership of hte underlying data. That's only safe if after the operation the SB itself doesn't point at the data (hence, it is "cleared").

I'm still unclear on the scenario. Either there's a single reference to the string builder instance - in which case it's ok to have 2 separate operations - one that gives a view of the underlying chunks and a separate clear operation (different from the existing Clear which mutates the internal array).

This is the expected scenario. In the Roslyn source generator scenario, the user would build up the source in the StringBuilder and then pass the resulting sequence between phases (so the StringBuilder would be single-use or a pooled instance, never saved in a field for later usage based on previous state).

Or there are multiple references - in which case don't they already need some kind of coordination mechanism? Otherwise how could they possibly call Append twice and have any confidence that they're still writing to the same buffer.

Today, there's not a good mechanism for coordination when using the GetChunks method while also clearing the buffer, which is what this API effectively enables.

It seems the "and clear" part can be separate.

We could just introduce a ResetBuffers() that would release all the internal buffers instead of what "Clear" does and then Roslyn could provide the "create a ReadOnlySequence and reset" mechanism on top (possibly internally).

@reflectronic
Copy link
Contributor

A similar API #72625 went from "ToImmutableAndClear" to "DrainToImmutable" in API review, perhaps this 'drain' naming pattern should be used here as well--DrainToReadOnlySequence?

@jkoritzinsky
Copy link
Member Author

I didn't realize there was precident for that naming in dotnet/runtime. I'd be fine with DrainToReadOnlySequence or the like.

@hamarb123
Copy link
Contributor

hamarb123 commented Apr 7, 2024

+1 for this, except I want an unsafe version that doesn't clear it to use.

namespace System.Text
{
    public class StringBuilder
    {
        public System.Buffers.ReadOnlySequence<char> DrainToReadOnlySequence(); //new
    }
}

namespace System.Runtime.InteropServices
{
    public class SequenceMarshal
    {
        public System.Buffers.ReadOnlySequence<char> FromStringBuilder(StringBuilder sb); //new
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

7 participants