Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a source generator for Stream that emits the boilerplate for most members and enforces using the most-efficient members #79839

Open
2 tasks
jozkee opened this issue Dec 20, 2022 · 16 comments
Assignees
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Milestone

Comments

@jozkee
Copy link
Member

jozkee commented Dec 20, 2022

Background and motivation

Based on #79261 (comment), there's the possibility of using a source generator to guide developers to implement Stream correctly i.e: use span-based and task-based APIs instead of byte-array overloads and APM (Begin*/End*).
We can emit boilerplate for virtuals, non-span-based, and non-task-based methods; and then emit partials to enforce using the newest members in Stream implementations in projects that install the source generator proposed here.

A working prototype can be found here: https://github.com/Jozkee/StreamSourceGen.

API Proposal

namespace System.IO
{
    /// <summary>
    /// Instructs the System.IO.Stream source generator to generate boilerplate implementations for multiple legacy members, leaving the implementer only with the task of providing implementations for the core operations.
    /// </summary>
    [AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
    public sealed class GenerateStreamBoilerplateAttribute : Attribute
    {
        public bool CanRead { get; set; } = true;
        public bool CanSeek { get; set; } = true;
        public bool CanWrite { get; set; } = true;
        public bool CanTimeout { get; set; } = false;
    }
}

API Usage

This new source generator most likely will ship as an OOB NuGet package. Once installed, you opt-in into boilerplate generation by annotating any partial Stream implementation with [GenerateStreamBoilerplate].

[GenerateStreamBoilerplate]
public partial class MyStream : Stream
{
    public override long Length 
    { 
        get { throw new NotImplementedException(); }
    }

    public override long Position
    {
        get { throw new NotImplementedException(); }
        set { throw new NotImplementedException(); }
    }

    public override void Flush()
    {
        throw new NotImplementedException();
    }

    private partial int ReadCore(Span<byte> buffer)
    {
        // Do Read
    }

    private partial ValueTask<int> ReadCoreAsync(Memory<byte> buffer, CancellationToken cancellationToken)
    {
        // Do ReadAsync
    }

    private partial long SeekCore(long offset, SeekOrigin origin)
    {
        // Do Seek
    }

    private partial void SetLengthCore(long value)
    {
        // Do SetLength
    }

    private partial void WriteCore(ReadOnlySpan<byte> buffer)
    {
        // Do Write
    }

    private partial ValueTask WriteCoreAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken)
    {
        // Do WriteAsync
    }
}

Notes:

If you opt into boilerplate generation, you will get most methods sourcegen'd and you will be asked to provide bodies for partial *Core* methods. Opting in doesn't prevent you from overriding any member.

You can specify which members your stream supports by specifying CanRead, CanSeek, and CanWrite properties in GenerateStreamBoilerplateAttribute, i.e: if you want to turn-off one or more stream capabilities e.g: seeking, you need to specify CanSeek = false in the attribute, this will tell the source generator that for seek-related methods (Seek, SetLength), we will implement them to always throw NotSupportedException("Stream does not support seeking").

If one operation is specified as unsupported or if you override all the methods related to one kind of operation, e.g: Read, the partial ReadCore method won't be emitted and you will need to remove it if you were previously providing a body to it.

Alternative Designs

Be more aggressive on generation and emit boilerplate for all Stream-derived classes with a partial modifier.

We can relax the filter and encompass more types if we decide that any Stream-derived type would be a good candidate for boilerplate generation. This would also mean that we don't need an attribute for opting-in but we may want to offer one for opting-out.

Provide analyzers+fixers instead.

We already have an analyzer+fixer for preferring Memory overloads for Stream.Read/WriteAsync (#33790).
We could provide similar ones for sync Read/Write, and for APM (Begin*/End*).

Notes

TaskToApm is not public API so we will need to emit the helpers with the source generator.
We could lower the amount of sourcegen'd code for Begin*/End* if we reconsider #61729.

As suggested in #79261 (comment), I considered adding an Async property to GenerateStreamBoilerplateAttribute (to tell whether the stream was going to support sync, async, or both) but removed it at the end since I think custom streams are more likely to support both contexts and you can still manually override what you don't want to support.

Risks

N/A

Thanks to @madelson, @teo-tsirpanis for their suggestions in #79261.


Update

Based on recent conversations, we will start by providing a source generator with flexibility as most important aspect. Once done, tune the flexible vs prescriptive dial based on user experience.

This work can be split into 2 items:

  • A source generator that detects the attribute and a set of hints based on the Stream current implementation. This bullet addresses generation only. This is being tracked in Add Stream boilerplate source-generator and attribute runtimelab#2185.
  • Add analyzer rules into the source generator that prompt users about poor implementations, these rules will shepherd implementers to the best possible implementation. These rules can also be tuned to increase (error) or lower (warn or info) their severity.
@jozkee jozkee added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 20, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost
Copy link

ghost commented Dec 20, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Based on #79261 (comment), there's the possibility of using a source generator to guide developers to implement Stream correctly i.e: use span-based and task-based APIs instead of byte-array overloads and APM (Begin*/End*).
We can emit boilerplate for virtuals, non-span-based, and non-task-based methods; and then emit partials to force using the newest members in Stream implementations that install the source generator proposed here.

API Proposal

namespace System.IO
{
    /// <summary>
    /// Instructs the System.IO.Stream source generator to generate boilerplate implementations for multiple legacy members, leaving the implementer only with the task of providing implementations for the core operations.
    /// </summary>
    [AttributeUsage(AttributeTargets.Class, AllowMultiple = false, Inherited = false)]
    public sealed class GenerateStreamBoilerplateAttribute : Attribute
    {
        public bool CanRead { get; set; } = true;
        public bool CanSeek { get; set; } = true;
        public bool CanWrite { get; set; } = true;
    }
}

API Usage

This new source generator most likely will ship as an OOB NuGet package. Once installed, you opt-in into boilerplate generation by annotating any partial Stream implementation with [GenerateStreamBoilerplate].

[GenerateStreamBoilerplate]
public partial class MyStream : Stream
{
    public override long Length 
    { 
        get { throw new NotImplementedException(); }
    }

    public override long Position
    {
        get { throw new NotImplementedException(); }
        set { throw new NotImplementedException(); }
    }

    public override void Flush()
    {
        throw new NotImplementedException();
    }

    private partial int ReadCore(Span<byte> buffer)
    {
        // Do Read
    }

    private partial ValueTask<int> ReadCoreAsync(Memory<byte> buffer, CancellationToken cancellationToken)
    {
        // Do ReadAsync
    }

    private partial long SeekCore(long offset, SeekOrigin origin)
    {
        // Do Seek
    }

    private partial void SetLengthCore(long value)
    {
        // Do SetLength
    }

    private partial void WriteCore(ReadOnlySpan<byte> buffer)
    {
        // Do Write
    }

    private partial ValueTask WriteCoreAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken)
    {
        // Do WriteAsync
    }
}

Notes:

If you opt into boilerplate generation, you will get most methods sourcegen'd and you will be asked to provide bodies for partial *Core* methods. Opting in doesn't prevent you from overriding any member.

You can specify which member your stream supports by specifying CanRead, CanSeek, and CanWrite properties in GenerateStreamBoilerplateAttribute, i.e: if you want to turn-off one or more stream capabilities e.g: seeking, you need to specify CanSeek = false in the attribute, this will tell the source generator that for seek-related methods (Seek, SetLength), we will implement them to always throw NotSupportedException("Stream does not support seeking").

If one operation is specified as unsupported or if you override all the methods related to one kind of operation, e.g: Read, the partial ReadCore method won't be emitted and you will need to remove it if you were previously providing a body to it.

Alternative Designs

Be more aggressive on generation and emit boilerplate for all Stream-derived classes with a partial modifier.

We can relax the filter and encompass more types if we decide that any Stream-derived type would be a good candidate for boilerplate generation. This would also mean that we don't need an attribute for opting-in but we may want to offer one for opting-out.

Provide analyzers+fixers instead.

We already have an analyzer+fixer for preferring Memory overloads for Stream.Read/WriteAsync (#33790).
We could provide similar ones for sync Read/Write, and for APM (Begin*/End*).

Notes

TaskToApm is not public API so we will need to emit the helpers with the source generator.
We could lower the amount of sourcegen'd code for Begin*/End* if we reconsider #61729.

As suggested in #79261 (comment), I considered adding an Async property to GenerateStreamBoilerplateAttribute (to tell whether the stream was going to support sync, async, or both) but removed it at the end since I think custom streams are more likely to support both contexts and you can still manually override what you don't want to support.

Risks

N/A

Thanks to @madelson, @teo-tsirpanis for their suggestions in #79261.

Author: Jozkee
Assignees: -
Labels:

api-suggestion, area-System.IO

Milestone: -

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 20, 2022
@jozkee jozkee added this to the 8.0.0 milestone Dec 20, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Dec 20, 2022
@madelson
Copy link
Contributor

madelson commented Dec 22, 2022

I think the codegen for CanSeek = true should be more aggressive. For example, rather thank making consumers implement SeekCore we can just implement Seek for them in terms of Length and Position:

public sealed override long Seek(long offset, SeekOrigin origin)
{
    // check for disposal (e.g. via CanSeek)

    long newPosition = origin switch
    {
        SeekOrigin.Begin => offset,
        SeekOrigin.Current => PositionCore + offset,
        SeekOrigin.End => LengthCore + offset,
        _ => throw ...
    };
    if (newPosition < 0) throw ...

    return Position = newPosition;
}

In addition, I think it is worth considering whether SetLengthCore could be replaced with a setter on the private LengthCore property.

These changes would lower the implementation burden for implementers (2 get/set properties instead of 1 get property, 1 get/set property, and 2 methods).

@jozkee
Copy link
Member Author

jozkee commented Dec 22, 2022

private LengthCore property.

@madelson you mean to have LengthCore as a partial property? That is not yet supported on C#, see dotnet/csharplang#6420.

Alternatively what we can do is:
source-gen Position and a private partial long GetPosition() method that can be called by Position getter:

public override long Position 
{
    get => GetPosition();
    set => Seek(SeekOrigin.Current, value);
}

That will make it 1 get property (Length) and 3 methods (SeekCore, GetPosition, and SetLengthCore).

@Joe4evr
Copy link
Contributor

Joe4evr commented Dec 23, 2022

That will make it 1 get property (Length) and 3 methods (SeekCore, GetPosition, and SetLengthCore).

But I presume that Length, SeekCore, and SetLengthCore are also used from different code paths, so that buys some efficiency overall with regards to what a user needs to implement.

@adamsitnik
Copy link
Member

we can just implement Seek for them in terms of Length and Position

Length is not always available for a seekable stream. Some file systems like procfs on Linux return 0 for file length, while the files are seekable and have non-empty content.

@madelson
Copy link
Contributor

madelson commented Jan 1, 2023

@madelson you mean to have LengthCore as a partial property? That is not yet supported on C#

@jozkee good call. Guess this would have to be separate getter and setter methods if we went with my approach instead.

Alternatively what we can do is:
source-gen Position and a private partial long GetPosition() method that can be called by Position getter

This is an interesting approach. Technically I think we could avoid GetPosition because Seek returns the new position within the stream. So GetPosition() could be replaced with SeekCore(0, SeekOrigin.Current).

The upside is eliminating a redundant method. The downside is that it implicitly wants this seek edge-case to be implemented efficiently as a position retrieval, which users may not consider. I suppose that if we keep GetPosition(), we could have Seek handle the special case and call GetPosition() instead of SeekCore.

On balance, I'd probably lean towards having fewer methods to implement. Thoughts?

Length is not always available for a seekable stream. Some file systems like procfs on Linux return 0 for file length, while the files are seekable and have non-empty content.

@adamsitnik thanks I was not aware of this edge case. However, since Seek returns the current position, if the stream fully supports seeking then couldn't Length be implemented this way:

var originalPosition = Position;
try { return Seek(0, SeekOrigin.End); }
finally { Position = originalPosition; }

I'm not saying that this would be a good default implementation of Length because it calls Seek twice and temporarily changes the stream's position; just trying to understand the constraint.

If we need to support SeekOrigin.End without Length, I think that @jozkee 's proposal of implementing Position on top of Seek would be a good approach to handling that.

@madelson
Copy link
Contributor

madelson commented Jan 1, 2023

2 additional ideas: Async and CanTimeout.

Async

I think we should add a property like this to the attribute:

public bool Async { get; set; } = true;

If this property is set to true, then the codegen will be the same as in the original proposal except that the user must implement FlushAsyncCore(CancellationToken) and DisposeAsyncCore() in addition to the other methods. With the original proposal, I worry that these methods may be forgotten, leaving accidental usage of the default sync implementation for DisposeAsync() and the default async-over-sync default implementation of FlushAsync().

If Async is set to false, I think the codegen should implement all async methods synchronously, e.g. how DisposeAsync works today.

CanTimeout

I think we should add a property like this to the attribute:

public bool CanTimeout { get; set; } = false;

This would influence the codegen for the CanTimeout property and Read/WriteTimeout properties (forcing you to implement vs. throwing, checking CanRead/Write, checking disposal, and validating arguments). It would still be up to users to incorporate the values of the read/write timeout properties into their read/write implementations.

The main benefit here is clarity for users. With this addition, we can entirely eliminate optional virtual method sets in favor of required implementations. This gives the user a guarantee that the are providing only and all the necessary implementations for their desired set of capabilities at compile time.

@jozkee
Copy link
Member Author

jozkee commented Jan 23, 2023

@madelson thanks for the input, sorry for the delay.

Async

Instead, we could make that the generator emit partials for FlushAsyncCore and DisposeAsyncCore based on a hint, say, ReadAsync or WriteAsync are implemented by the user.

CanTimeout

I agree that adding CanTimeout is good, but I'm concerned with it being false by default. It will be practically non-existent, unless the user discovers that they can set it on the Attribute ctor. Will include it in the proposal and see if there's more input against it.
Maybe there's also a hint that we can use to force implementing *Timeout properties.

@jozkee
Copy link
Member Author

jozkee commented Jan 23, 2023

@stephentoub please take a look.

@jozkee jozkee self-assigned this Jan 23, 2023
@madelson
Copy link
Contributor

Instead, we could make that the generator emit partials for FlushAsyncCore and DisposeAsyncCore based on a hint, say, ReadAsync or WriteAsync are implemented by the user.

@jozkee can you explain what this means in practice? I'd like it to be the case that if I decide to implement async IO then I need to provide implementation of FlushAsync/DisposeAsync and otherwise I should ignore them. Do you agree?

I'm concerned with it being false by default

That's fair, but FWIW, CanTimeout defaults to false today on Stream and most stream types do not implement it, so I don't think it would be crazy to follow the same pattern in the attribute.

I think that if we can eliminate optional virtual methods from the equation that will be a really nice boon for implementers. It is much easier to implement an abstract class when the compiler tells you exactly what set of methods you MUST implement.

@stephentoub
Copy link
Member

I provided feedback to @jozkee offline, but I'll summarize here.

We're going to have to make tradeoffs here between how flexible we are and how prescriptive we are. For me, I'd start by erring on the side of flexibility with the goal of reducing as much boilerplate as possible. Summarizing, I'd like to see us start with an approach where the developer writes only the minimal amount they need to do what they want and the generator fills in the rest as best as possible, while also issuing diagnostics about opportunities a developer has to do better.

For example, if a developer just writes:

[GeneratedStream]
public class MyStream
{
   public int Read(byte[] buffer, int offset, int count) { ... } 
}

then the generator would generate a CanRead that returns true, CanWrite and CanSeek that return false, and throwing implementations for all the Write methods. It would also issue an info-level diagnostic that without a span-based Read implementation, various inefficiencies result.

Or for example, if a developer just writes:

[GeneratedStream]
public class MyStream
{
   public ValueTask WriteAsync(ReadOnlyMemory<byte> source, CancellationToken cancellationToken) { ... }
}

CanWrite would be generated to return true, CanRead and CanSeek to return false, and all of the Read methods would be generated to throw. Most interestingly, the Write(byte[] buffer, ...) method would be generated as a sync-over-async delegation to WriteAsync, the BeginWrite/EndWrite methods would be generated to delegate to WriteAsync as well, and a diagnostic would be issued about the lack of a synchronous Write.

We could debate what diagnostics are issued and their default levels, but this would not only enable a developer to opt-in to things being more or less impactful (e.g. maybe they want to change the level about lack of async methods to be warnings instead of info), it would also enable evolution if/as more virtuals are added to Stream in the future (the source generator would generate things as best it could and issue a new diagnostic about the new method being available).

Effectively, the generator would heuristically look at everything the developer wrote and then choose to override some additional set of Stream's methods based on that.

As for next steps, I think we should start with a prototype in dotnet/runtimelab. No matter what path we choose here, I expect there's going to be a lot of work in tuning the approach and the specifics to get it "right". I'd also like to see us try it out on all of the streams in dotnet/runtime and other repos; how close to absolute minimal boilerplate does it let us get?

@madelson
Copy link
Contributor

@stephentoub thanks for sharing. I think that with sufficient diagnostics this can achieve what I'd personally hope to get out of this feature which is giving developers confidence that they've implemented exactly what they need to, no more and no less. I'd love to see the diagnostics be fairly opinionated by default in terms of guiding you to a complete and high-performing implementation.

@stephentoub
Copy link
Member

I'd love to see the diagnostics be fairly opinionated by default in terms of guiding you to a complete and high-performing implementation.

That is the intent of my suggested approach: the generator fills in literally everything such that no matter how much you write, you have a compiling implementation (e.g. if you write literally nothing in the class implementation, you end up with all the CanXx properties returning false and any method that requires an implementation and that can nop be implemented to throw), and it generates the best possible implementation based on what you wrote. And then a potential slew of diagnostics are issued to tell you all the things you could/should be doing better. In some cases the diagnostics could be suggesting you override additional methods, in some cases they could be suggesting you delete code you added, in some cases they could be analyzing code you wrote and making recommendations for how to do it better, etc.

@jozkee
Copy link
Member Author

jozkee commented Feb 13, 2023

I think we can split this work into two items, one is the sole generation being as flexible as possible, which will be addressed by dotnet/runtimelab#2185, and the next one is emitting warnings and errors that get caught by the source-gen. I've edited the top description to reflect this.

We can continue adding more items as they keep appearing.

@jeffhandley
Copy link
Member

I'm moving this to Future. For 8.0, we'll get Add Stream boilerplate source-generator and attribute by Jozkee · #2185 merged into runtimelab and available for trials, and then we'll assess if that should be further productized during .NET 9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Projects
None yet
Development

No branches or pull requests

6 participants