Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CancellationToken to StreamReader.Read* methods #20824

Closed
Tracked by #64596
ChadNedzlek opened this issue Mar 29, 2017 · 32 comments · Fixed by #61898
Closed
Tracked by #64596

Add CancellationToken to StreamReader.Read* methods #20824

ChadNedzlek opened this issue Mar 29, 2017 · 32 comments · Fixed by #61898
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.IO
Milestone

Comments

@ChadNedzlek
Copy link
Member

ChadNedzlek commented Mar 29, 2017

EDIT: link to updated proposal: #20824 (comment)

Today we have the following methods on StreamReader:

public class StreamReader
{
    public Task<string> ReadLineAsync();
    public Task<string> ReadToEndAsync();
    public Task<int> ReadAsync(char[] buffer, int index, int count);
    public Task<int> ReadBlockAsync(char[] buffer, int index, int count);
}

None of these support cancellation via CancellationTokens, even though the underlying Stream ReadAsync API's do

Proposed API

Add the following methods

public class StreamReader
{
    public Task<string> ReadLineAsync(CancellationToken token);
    public Task<string> ReadToEndAsync(CancellationToken token);
    public Task<int> ReadAsync(char[] buffer, int index, int count, CancellationToken token);
    public Task<int> ReadBlockAsync(char[] buffer, int index, int count, CancellationToken token);
}

Motivation

Since these methods read from an underlying stream that may have long delays before data is available, they can block for long periods of time. Especially if the underlying stream is network based.

When trying to create a responsive application in these circumstances, where long running operations that might never return can be cancelled to return control back to the user, the inability to cancel these operations in difficult.

Currently the only real option is to fork the call into a fake cancellable operation that ends up ignoring original Task when a cancellation happens, with something like ThreadingTools.WithCancellation. But that leaves the reading Task (and potentially the stream and any continuations registered on it) floating, unreferenced.

@airbreather
Copy link

airbreather commented Sep 13, 2017

This raises some obvious questions... I guess I'll be the one to ask them.

  1. Should this go as high as System.IO.TextReader?
    1. It seems like bolting this on there, at least in a way that's similar to how I would design it if cancellation were supported from the beginning (i.e., subclasses can only override a method that has cancellation support), would be either a breaking change or a confusing API.
    2. On the plus side, it looks like the concerns I have re: System.IO.TextReader actually don't necessarily apply to System.IO.StreamReader. The latter already uses blocks like this to explicitly account for subclasses, so a subclass just might not get the benefits of cancellation support by calling base (much like they don't get all the benefits of async support today by calling base).
  2. Should this also apply to System.IO.StreamWriter?
    1. If so, everything in my previous bullet point also applies to that side of things... here's a block that's similar to the one I linked for System.IO.StreamReader.

@ChadNedzlek
Copy link
Member Author

ChadNedzlek commented Sep 14, 2017

@airbreather In regards to dotnet/corefx#2, in my opinion, it's slightly less important for the Writer side of things, since it's a less common scenario to be blocked forever on a Write that will never finish, and you now can't cancel out of. At least... as far as my imagination can take me down that route.

Though it probably makes sense to add them there too, for consistency sake, and I'm sure there is some networking scenario where you are blocked on a write operation somehow that you'd want to cancel.

@JeremyKuhne
Copy link
Member

We just added CancellationToken support to TextReader with our new Span APIs. dotnet/corefx#24434

@stephentoub Any comments on adding the cancellation token support for the existing methods?

@stephentoub
Copy link
Member

Any comments on adding the cancellation token support for the existing methods?

The new Memory-based overload we've added take care of ReadAsync and ReadBlockAsync; there's no need to add additional overloads for those.

They don't cover ReadLineAsync or ReadToEndAsync, so we could still choose to add cancelable overloads for those.

@JeremyKuhne
Copy link
Member

The new Memory-based overload we've added take care of ReadAsync and ReadBlockAsync; there's no need to add additional overloads for those.

So rely on wrapping the array rather than add an overload for char[]?

@stephentoub
Copy link
Member

Right.

@narciero
Copy link

narciero commented Sep 5, 2018

any update on adding cancellation token overloads to ReadLineAsync ?

This is a common use case for reading data from a streaming HTTP API via HttpContentStream, would be nice to be able to set a timeout per read

@JeremyKuhne
Copy link
Member

@narciero Just needs someone to drive updating the proposal. Feedback is:

  1. Should target TextReader
  2. Should only update ReadToEnd and ReadLine
  3. Should follow the same implementation pattern as ReadAsync(Memory<char>) (including overrides in StreamReader and StringReader)

Another question that should be considered is how a lower allocation version of these two APIs could/should be implemented. It might make more sense for these to return (and possibly take) StringBuilder, for example (potentially eliminating allocations). And/or it may be useful to have a TryRead* that takes a Memory<char> buffer.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@JeremyKuhne JeremyKuhne removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@Shane32
Copy link
Contributor

Shane32 commented Mar 26, 2020

I'd also like to see CancellationTokens implemented for these methods. The above feedback sounds good.

@ChadNedzlek
Copy link
Member Author

ChadNedzlek commented Mar 31, 2020

@JeremyKuhne If it's only ReadLine and ReadToEndAsync, this doesn't actually cover the scenario that was blocking me that caused me to open this issue. Unless ReadAsync is somehow covered by some other changes, and now has a cancellable version.

@bartonjs
Copy link
Member

Unless ReadAsync is somehow covered by some other changes, and now has a cancellable version.

Yeah, there's now TextReader.ReadAsync(Memory<char>, CancellationToken)

@ChadNedzlek
Copy link
Member Author

That's good to hear. I should be able to rearrange my pile of inherited Streams/StreamReaders to be able to take advantage of that... but I still need the other methods. :-)

@khellang
Copy link
Member

Here's an updated proposal based on the feedback above:

public abstract class TextReader
{
    public virtual ValueTask<string?> ReadLineAsync(CancellationToken token);
    public virtual ValueTask<string> ReadToEndAsync(CancellationToken token);
}

The following changes have been made:

  • Moved the methods up to TextReader
  • Removed ReadAsync and ReadBlockAsync as they seem to be covered by the new Memory<byte>-based overloads.
  • Changed the return type to ValueTask
  • Added nullable annotations

Any chance this might get into a review anytime soon?

@orthoxerox
Copy link

I understand that the new methods must be virtual to ensure backward compatibility, but there's just no sensible default implementation of TextReader.ReadLineAsync(CancellationToken) I can come up with. I could just make it ignore the token, of course, since StringReader doesn't really require cancellation and StreamReader gets its own implementation, but it would make me feel dirty. Does it really have to go on TextReader? What are the use cases?

@airbreather
Copy link

Does it really have to go on TextReader? What are the use cases?

A caller may have a TextReader that may or may not be a StreamReader (or similar), and they have a CancellationToken that they would be happy to pass through if it's supported. Whenever it's a StreamReader, they want to be able to cancel the pending read, but for StringReader (or similar), they're fine with the read completing, since it's strictly synchronous anyway.

An actual, specific, real-world use case I do not have, but it would be really awkward for all callers to have to write this:

switch (someTextReader)
{
	case StreamReader someStreamReader:
		return await someStreamReader.ReadLineAsync(cancellationToken);

	default:
		return await someTextReader.ReadLineAsync();
}

So writing virtual methods on TextReader whose base implementations accept and ignore a CancellationToken will definitely be weird here, but IMO it's not actually that bad, and it seems to meet the strict compatibility requirements of .NET, so I support it.

@Shane32
Copy link
Contributor

Shane32 commented May 13, 2021

The existing default implementations of cancelable asynchronous methods (e.g. ReadAsync(Memory<Char>, CancellationToken)) currently start a new thread and synchronously read the data from that thread. Attempting to cancel with the cancellation token only cancels the read if the cancel request occurs before the thread is allocated. Once the synchronous read has begun, it completes synchronously. This makes sense given the scenario. The same logic would apply to a default implementation of ReadLineAsync(CancellationToken):

        public virtual Task<string?> ReadLineAsync(CancellationToken cancellationToken) =>
            Task<string?>.Factory.StartNew(static state => ((TextReader)state!).ReadLine(), this,
                cancellationToken, TaskCreationOptions.DenyChildAttach, TaskScheduler.Default);

For reference, see the existing default implementation of ReadLineAsync() here:
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/TextReader.cs#L206..L208

For a truly asynchronous implementation, StreamReader must override the method and utilize asynchronous reads to the underlying stream. StringReader would likely not override the method at all, relying on the default implementation shown above.

An actual, specific, real-world use case I do not have

We can come up with one easy - how about a CSV data parser? The method may be defined as ValueTask<T> ReadRow<T>(TextReader reader, CancellationToken token = default). It relies on ReadLineAsync(CancellationToken cancellationToken) to read a row of text, and then parses it into columns and ultimately somehow into T -- perhaps by examining the first row and matching header names to property names. This is a simple use case of ReadLineAsync(CancellationToken) where the cancellation token should pass through to the underlying TextReader. It should be noted that this could be much faster code than attempting to work around the limitation by calling ReadAsync(Memory<char>, CancellationToken) repeatedly one character at a time, since the underlying StreamReader could potentially examine its underlying buffer to directly scan for newline tokens.

While it is certainly possible to add the asynchronous overload directly to StreamReader, I would say that the method belongs on TextReader along with the other existing asynchronous methods. It would certainly be unfortunate if we ended up in a scenario similar to IQueryable or IDbConnection where there is no standard for asynchronous methods.

@orthoxerox
Copy link

The changes to the classes do not look too threatening (although they haven't been tested yet), so the API review is probably a bigger blocker.

@lichti81
Copy link

lichti81 commented Jul 8, 2021

@orthoxerox: I think there are more changes necessary
Lets assume the client usually do following:

StreamReader reader = ...
while (!reader.EndOfStream)
{
  await reader.ReadLineAsync();
  ...
}

A short look at the implementation of EndOfStream shows me that there is the stream read into cache. This means when it comes to ReadLineAsync it is already to late, because there the info from cache is read to identify lines in text.

I think to be able to support cancelation following is needed:

StreamReader reader = ...
while (!await reader.IsEndOfStreamAsync(cancelationToken))
{
  await reader.ReadLineAsync(cancelationToken);
  ...
}

But I'm not a expert. I just invested a view minutes for check of code.

A possible alternative would be to provide and implement a own class which supports asynchron stream reading with cancelation.

As somebody was asking for a usecase. Here is one:
When you want like to implement server side events with ASP.NET Core you generate streams on server which you need to read as stream on client. There could be a long time (minutes or hours) between two junks of information dropping in on client side.
During this time the client hangs in the reader.EndOfStream, because there it is waiting for new content from the server.
A use of cancelation is not possible at the moment as the stream reader is not able to cancel the wait for new content.

Edit: It seems to be that the Grpc team had the same problem in the past. Therefore they defined and implemented a HttpContentClientStreamReader hidden behind interface IAsyncStreamReader. Please take a look at interface documentation to see how it behaves.
After some hops the implementation lands in ReadMessageAsync extension method which uses Stream.ReadAsync to read with cancelation the values out of stream.

@adamsitnik
Copy link
Member

@khellang thank you for updating the proposal (#20824 (comment)) !

I don't think that ReadToEndAsync should return a ValueTask (it allocates a LOT of memory since it reads the file to the end)

Here is the proposal that I would like to present to the review board:

public abstract class TextReader
{
    public virtual Task<string?> ReadLineAsync()
+   public virtual ValueTask<string?> ReadLineAsync(CancellationToken cancellationToken)
    public virtual Task<string> ReadToEndAsync()
+   public virtual Task<string> ReadToEndAsync(CancellationToken cancellationToken)
}

@adamsitnik adamsitnik self-assigned this Jan 17, 2022
@adamsitnik adamsitnik added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jan 17, 2022
@adamsitnik adamsitnik modified the milestones: Future, 7.0.0 Jan 17, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 17, 2022
@airbreather
Copy link

I don't think that ReadToEndAsync should return a ValueTask (it allocates a LOT of memory since it reads the file to the end)

Can you please elaborate on why that should affect the decision on whether to return Task or ValueTask?

My intuition is that ValueTask<string> could still be a reasonable choice, because:

  • a derived class may wish to use an IValueTaskSource<string>, and also because (separately)
  • not all use cases that would benefit from a cancellable ReadToEndAsync involve large files: they might be small files with high latency, or where it's backed by one of those Stream implementations that read a limited window of another Stream and its small data happens to be cut between internal read buffers.

@khellang
Copy link
Member

I don't think that ReadToEndAsync should return a ValueTask (it allocates a LOT of memory since it reads the file to the end)

@adamsitnik You seem pretty focused on a TextReader over a FileStream here, but this is an abstract class that anyone can implement to do whatever. I don't know how you can decide that every single implementation of this abstract class will allocate a lot and thus it doesn't make sense to return ValueTask<string>?

@stephentoub
Copy link
Member

I think @adamsitnik's point is this API is inherently not a high-performance API. By its nature, it's allocating a new byte[] on every call (unless the source is empty). If someone cares about the extra allocation for the task, why are they using this API in the first place rather than, say, renting an array, reading into it using Stream.ReadAsync, and returning the pooled array?

@Shane32
Copy link
Contributor

Shane32 commented Jan 18, 2022

I think @adamsitnik's point is this API is inherently not a high-performance API. By its nature, it's allocating a new byte[] on every call (unless the source is empty). If someone cares about the extra allocation for the task, why are they using this API in the first place rather than, say, renting an array, reading into it using Stream.ReadAsync, and returning the pooled array?

I agree; whether it returns Task or ValueTask doesn't matter much, considering the inefficient nature of the specific API calls.

For me, the point is to support cancellation during what could easily be a long-running task. Obviously there are more memory-efficient ways to read data than ReadToEndAsync. But as of now, if someone needs to do exactly that -- read the whole stream into a string -- there's no way to properly support canceling the operation in-process.

I believe these should be added to the TextReader with default implementations that map to the existing methods that do not have cancellation token support (see my comments above). StreamReader should override these methods to provide cancellation support when reading from streams.

On a side note, ValueTask might make sense for ReadLineAsync because the line of text may be buffered by the implementation, especially for blank or short lines. ReadToEndAsync would never be called in a loop, so whether it's another allocation or not doesn't matter much. So I might suggest ValueTask for ReadLineAsync but Task for ReadToEndAsync. But it doesn't matter to me, and I'm not sure of all the benefits of either choice, so I would defer to others' opinions on this.

I also see no need to add the ReadLineAsync(char[]...) method with cancellation token support, since there is already an overload of ReadLine with cancellation token support. However, I believe all async methods should accept a cancellation token as a matter of form. Ditto for ReadBlockAsync.

@bartonjs
Copy link
Member

bartonjs commented Jan 18, 2022

Video

Looks good as proposed

namespace System.IO
{
    public partial class TextReader
    {
        public virtual ValueTask<string?> ReadLineAsync(CancellationToken cancellationToken) => throw null;
        public virtual Task<string> ReadToEndAsync(CancellationToken cancellationToken) => throw null;
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jan 18, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 19, 2022
@huoyaoyuan
Copy link
Member

I'd prefer consistency when choosing between Task and ValueTask.

@stephentoub
Copy link
Member

I'd prefer consistency when choosing between Task and ValueTask.

Consistency with what? The existing overloads are Task.

@huoyaoyuan
Copy link
Member

huoyaoyuan commented Jan 19, 2022

Consistency with what?

With existing similar methods. ValueTask<string> looks strange that I can't recall any method returning it.

@stephentoub
Copy link
Member

stephentoub commented Jan 19, 2022

With existing similar methods. ValueTask<string> looks strange that I can't recall any method returning it.

ValueTask<int> was "inconsistent" the first time we added it :-) ValueTask<bool> was "inconsistent" the first time we added it. There's always a first time for a new type. There are also multiple places that return ValueTask<T>, which could easily have T==string.

@Shane32
Copy link
Contributor

Shane32 commented Jan 19, 2022

Side note: TextWriter.FlushAsync also needs a cancellation token argument. IO.Stream.FlushAsync has had a cancellation token argument since .NET Standard 1.0, and there exist TextWriter.WriteAsync methods with cancellation tokens -- but not FlushAsync.

bgrainger added a commit to bgrainger/runtime that referenced this issue Jan 20, 2022
System.IO - Streams automation moved this from To do to Done Jan 25, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 25, 2022
@adamsitnik
Copy link
Member

Side note: TextWriter.FlushAsync also needs a cancellation token argument. IO.Stream.FlushAsync has had a cancellation token argument since .NET Standard 1.0, and there exist TextWriter.WriteAsync methods with cancellation tokens -- but not FlushAsync.

Could you please create a new issue with the proposal?

@Shane32
Copy link
Contributor

Shane32 commented Jan 26, 2022

Side note: TextWriter.FlushAsync also needs a cancellation token argument. IO.Stream.FlushAsync has had a cancellation token argument since .NET Standard 1.0, and there exist TextWriter.WriteAsync methods with cancellation tokens -- but not FlushAsync.

Could you please create a new issue with the proposal?

Done

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.IO
Projects
Development

Successfully merging a pull request may close this issue.