Add ValueTask to corefx #4708

Closed
benaadams opened this Issue Nov 28, 2015 · 22 comments

@benaadams (Collaborator) commented Nov 28, 2015

ValueTask being developed for System.Threading.Tasks.Channels in corefxlab is very valuable on its own and should be included independently of Channels (and earlier).

ReadAsync is defined to return a ValueTask<T>, a new struct type included in this library. ValueTask<T> is a discriminated union of a T and a Task<T>, making it allocation-free for ReadAsync to synchronously return a T value it has available (in contrast to using Task.FromResult<T>, which needs to allocate a Task<T> instance). ValueTask<T> is awaitable, so most consumption of instances will be indistinguishable from using a Task<T>. If a Task<T> is needed, one can be retrieved using ValueTask<T>'s AsTask() method, which will either return the Task<T> it stores internally or will return an instance created from the T it stores.
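
To make the shape concrete, here is a minimal sketch of a struct-based discriminated union along those lines (illustrative only, not the corefxlab implementation; the real ValueTask<T> also implements the awaitable pattern and handles faulted and canceled tasks):

using System.Threading.Tasks;

public struct SimpleValueTask<TResult>
{
    private readonly TResult _result;      // set when the value was available synchronously
    private readonly Task<TResult> _task;  // set when the operation completed asynchronously

    public SimpleValueTask(TResult result) { _result = result; _task = null; }
    public SimpleValueTask(Task<TResult> task) { _result = default(TResult); _task = task; }

    // Returns the wrapped Task<TResult>, or allocates one from the stored result.
    public Task<TResult> AsTask() => _task ?? Task.FromResult(_result);
}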

@stephentoub (Member) commented Nov 28, 2015

This is something we plan to do, it's just a matter of when and how. It's in the Channels library right now as a placeholder. You can see there's also a copy of it used internally in corefx, at least in one place.

stephentoub self-assigned this Nov 28, 2015

stephentoub added this to the Future milestone Nov 28, 2015

@davidfowl (Contributor) commented Nov 28, 2015

We're going to do some measurements and get some concrete numbers. This could be huge for us (asp.net) in certain scenarios.

@mgravell (Contributor) commented Nov 28, 2015

Oh hell yes. Would use that. Would be great (although I imagine that ship has sailed) if libs like ADO.NET could use that for scenarios where data is probably available, like:

while(await reader.ReadAsync(cancel)) {
    ...
}

And of course sockets etc (it reminds me of the cheap return option of the non-task async IO API), and streams.

I can't see any nice way of retrofitting this onto all the existing code without making the APIs ugly, but: yes. So much yes. I spend my life hunting allocations.

@benaadams (Collaborator) commented Nov 29, 2015

Might lose some benefits with the Stream async methods needing Task<int> returns; I've amended my Stream API proposal dotnet/coreclr#2179 to use ValueTask<int> rather than Task<int>; other non-value Tasks are OK as they can use cached Tasks.
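
As a hedged illustration only (the actual shape is whatever dotnet/coreclr#2179 ends up proposing; the class name here is hypothetical), the kind of signature change in question might look like:

using System.Threading;
using System.Threading.Tasks;

public abstract class SketchStream
{
    // Hypothetical ValueTask-based read: when the requested bytes are already
    // buffered, the count can be returned without allocating a Task<int>.
    public abstract ValueTask<int> ReadAsync(
        byte[] buffer, int offset, int count, CancellationToken cancellationToken);
}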

@JonHanna (Collaborator) commented Nov 29, 2015

Glancing through this code is like when you open a present and only then realise you'd been wanting this thing you hadn't known existed, for some time.

@stephentoub (Member) commented Nov 29, 2015

Glad you like it :)

benaadams referenced this issue in aspnet/KestrelHttpServer Nov 29, 2015: Value tasks + Less Async #432 (Closed)

@omariom (Contributor) commented Nov 29, 2015

👍

@JonHanna (Collaborator) commented Nov 30, 2015

Are there plans to let async be used with this? Having an async method use the non-task-building constructor when it goes through a path that doesn't hit an await would be awesome. Having it do so when it does go through an await but only ever got a non-task-containing ValueTask would be even more so.

@svick (Contributor) commented Nov 30, 2015

@JonHanna I think you can generalize that to just "completed synchronously". That includes not hitting any awaits, awaiting non-task ValueTasks, awaiting already completed Tasks and awaiting any other awaitable whose IsCompleted returned true.

@stephentoub (Member) commented Nov 30, 2015

Are there plans to let async be used with this?

No concrete plans, but it's being discussed (cc: @jaredpar, @MadsTorgersen). There is value in doing so. However, using a ValueTask<TResult> instead of a Task<TResult> isn't a pure win. There is overhead associated with it: for example, it's a field bigger, which means not only are you passing back an extra field from a method, but if you're awaiting one of these it's likely going to increase the size of an async caller's state machine object by the size of a field, and that's pure overhead without benefit in the case where the operation does complete asynchronously. That means developers really need to think carefully about switching to such a ValueTask<TResult> when exposing it in an API, and if they're optimizing at that level, they likely also care about other involved overheads. As such, it's likely that a careful developer who cares about such costs would choose to use a non-async method that wraps an async one, with that wrapper doing argument validation, checking for a fast path, etc., e.g.:

public ValueTask<Something> BarAsync(string arg)
{
    if (arg == null) throw new ArgumentNullException("arg");
    return CanUseFastPath(arg) ?
        new ValueTask<Something>(ComputeSync(arg)) :
        new ValueTask<Something>(BarAsyncCore(arg));
}

private async Task<Something> BarAsyncCore(string arg) { ... }

That's not to say there isn't value in being able to write:

public ValueTask<Something> BarAsync(string arg)
{
    if (arg == null) throw new ArgumentNullException("arg");
    return BarAsyncCore(arg);
}

private async ValueTask<Something> BarAsyncCore(string arg) { ... }

There obviously is. I'm simply highlighting it to point out that ValueTask<TResult> should only be used after careful consideration, and with such careful consideration, you're likely also considering other optimizations that might lead you away from this. And of course there's cost involved in developing such a feature to the language, in developers needing to learn about it, etc.
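
For completeness, a hedged sketch of how a caller might consume either shape (assuming the BarAsync signature above; Something is the placeholder type from the example):

public async Task ConsumeAsync()
{
    // Awaiting a ValueTask<Something> looks the same as awaiting a Task<Something>.
    Something first = await BarAsync("hello");

    // When an actual Task<Something> is needed (e.g. to pass to Task.WhenAll),
    // AsTask() returns the wrapped task or allocates one from the stored result.
    Task<Something> asTask = BarAsync("world").AsTask();
    Something second = await asTask;
}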

@JonHanna (Collaborator) commented Nov 30, 2015

I think you can generalize that to just "completed synchronously"

In terms of desire, certainly. In terms of implementation I could see one being simpler than the other.

That means developers really need to think carefully about switching to such a ValueTask when exposing it in an API, and if they're optimizing it, they likely also care about other involved overheads.

If it's ever attempted and abandoned I'd still love to know how close it came to having the first example you give perform like the second. (If it's ever attempted and adopted I'll likely be doing my own experiments on the results to decide when I should and shouldn't use it anyway).

@stephentoub (Member) commented Nov 30, 2015

I would like to add this to corefx as part of a new library: System.Threading.Tasks.Extensions.dll.

Here’s a commit containing the library; I'll submit it as a PR once approved:
stephentoub@217376f

Here’s the API surface area being added:
https://github.com/stephentoub/corefx/blob/217376f756071bfd124b5408538457f411c7865c/src/System.Threading.Tasks.Extensions/ref/System.Threading.Tasks.Extensions.cs

The motivation for adding ValueTask is performance. We still encourage folks to use Task and Task<TResult> by default as the return type from async methods. However, for cases where the async method returns a Task<TResult>, very frequently returns synchronously, and is invoked on such a hot path that the cost of allocating the Task<TResult> for the synchronous result is prohibitive, ValueTask<TResult> may be used instead. ValueTask<TResult> is a struct-based discriminated union of a Task<TResult> and a TResult. If the method completes successfully and synchronously, then it doesn't need to allocate a Task<TResult>, and can instead just return a ValueTask<TResult> created from the raw TResult. Otherwise, it can return a ValueTask<TResult> created from the Task<TResult>.

Existing examples:

@benaadams (Collaborator) commented Nov 30, 2015

it's a field bigger, which means you're not only passing back an extra field from a method

If the ValueTask return is always the last statement in each method in the call chain, will this type play well with JIT tail call optimizations? The best I can find is from 2009, which isn't RyuJIT.

http://blogs.msdn.com/b/clrcodegeneration/archive/2009/05/11/tail-call-improvements-in-net-framework-4.aspx

  • The caller or callee returns a value type.

We changed the JIT to recognize this case as well. Specifically the X64 calling convention dictates that for value types that don’t fit in a register (1,2,4, or 8 byte sized value types), that the caller should pass a ‘hidden’ argument called the return buffer. The CLR 2 JIT would pass a new return buffer to the callee, and then copy the data from the callee’s return buffer to the caller’s return buffer. The CLR 4 runtime now recognizes when it is safe to do so, and then just passes the caller’s return buffer into the callee as its return buffer. This means even when we don’t do a tail call, the code has at least one less copy, and it enables one more case to be ‘easy’!

Which would suggest that even though sizeof(type + ptr) > 8 bytes, it wouldn't be copied at each step of the stack unwind, at least?

@mgravell (Contributor) commented Nov 30, 2015

I'd trade a tail-call to avoid an unnecessary allocation in a heartbeat...

The bigger problem (for me, at least) is that a lot of the obvious places to use this are in existing Task-based APIs: sockets, file-streams, ADO.NET data-readers - the places that have probably read more than one "thing" and can hand back synchronously most of the time. Without getting this down into those layers, many of the useful scenarios may be limited.

benaadams changed the title from "Add ValueTask to BCL" to "Add ValueTask to corefx" Nov 30, 2015

@stephentoub (Member) commented Nov 30, 2015

lot of the obvious places to use this are in existing Task-based APIs

Yes. This is a niche thing, very valuable in a few cases and unnecessary or not applicable in the majority.

For non-generic Tasks, Task.CompletedTask is perfectly good. For generic Tasks, it's often the case that a cached task of some shape or form is sufficient to cover the majority of cases, e.g. MemoryStream/FileStream/etc.'s ReadAsync frequently returning a Task<int> containing the same result due to being asked for the same amount of data each time and having it available, and thus caching the last successful task it returned to be able to return it again. It's really only for the generic cases where caching isn't reasonable with a hot-path, frequently-synchronously-returning method that this provides value.

(I also really do want folks to think carefully about using this, as it not only has the potential to add overhead, it also complicates the programming model when it's used, at least a bit. But for the more niche cases where it adds value, it often adds a lot of value.)
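
As a rough sketch of that caching pattern (simplified and hypothetical; real stream implementations also have to deal with cancellation and error paths), the idea is to reuse the last completed Task<int> whenever the new result matches it:

using System.Threading.Tasks;

public class CachedReadResultExample
{
    // The last successfully returned Task<int>; reused when the byte count repeats.
    private Task<int> _lastReadTask;

    // Hypothetical synchronous-completion path of a ReadAsync-style method.
    private Task<int> FromCachedResult(int bytesRead)
    {
        Task<int> cached = _lastReadTask;
        return (cached != null && cached.Result == bytesRead)
            ? cached                                        // same count as last time: reuse it
            : (_lastReadTask = Task.FromResult(bytesRead)); // new count: allocate and cache
    }
}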

@benaadams (Collaborator) commented Nov 30, 2015

returning a Task<int> containing the same result due to being asked for the same amount of data each time and having it available

A weird side-effect of this would be that, in certain circumstances, reading with a smaller buffer and then copying to a larger one may be more performant, as it would allocate fewer Tasks, than reading with the larger buffer, which would return different values; or, better, reading with small windows on the larger buffer.

Though that would probably end up with far stranger code than using a ValueTask with multiple paths.

@stephentoub (Member) commented Nov 30, 2015

A weird side-effect of this would be that, in certain circumstances, reading with a smaller buffer and then copying to a larger one may be more performant, as it would allocate fewer Tasks, than reading with the larger buffer, which would return different values; or, better, reading with small windows on the larger buffer.

I didn't quite follow that, but it's often the case that an optimization applies to a particular common pattern and stepping away from that pattern causes the optimization to not apply. That would be the case here as well. It's extremely common to always read from a given stream the same number of bytes each time, and often for you to get back the number you requested (for a stream like MemoryStream, that only doesn't happen if you're done reading, and even with a stream like FileStream it tries to give you the data from a buffer and then tries to read the remaining you asked for from the file), in which case the optimization would apply. If you frequently change the number of bytes you request, then yeah, this particular optimization wouldn't be relevant. There are certainly patterns of access that do that, they're just not the most common in my experience.

@benaadams (Collaborator) commented Nov 30, 2015

For example, Stream.CopyToAsync without a specified buffer size will use _DefaultCopyBufferSize = 81920, which is a pretty sizeable buffer for anything to have completed synchronously; so unless the stream being read from always has the same amount buffered, each loop iteration will need to return a different Task with a different value.

So you may see perf gains by specifying a much smaller buffer size, as it could then return a cached Task<int> on each read.

That assumes the following is true for the stream being read (else you'd just be increasing the call count for no gain):

  • The underlying stream continues to buffer between calls to ReadAsync.
  • ReadAsync doesn't wait until the buffer is completely full before returning - e.g. I'd hope a network-type stream wouldn't wait for an 82 kB buffer to fill before returning.

@terrajobst (Member) commented Dec 4, 2015

No major concerns. We should agree, though, how we name types that provide a struct-based alternative to a class. Should this be a prefix or a suffix?

  • Prefix. Matches how most people read C# code, i.e. struct Foo, ValueFoo.
  • Suffix. Sorts better in IntelliSense, documentation, or any other case where APIs are sorted.

The best location seems to be System.Threading.Tasks. We need to check whether we can make this a partial façade for .NET Framework.

@i3arnon (Collaborator) commented Dec 5, 2015

Another place where this can be very useful is TPL Dataflow, as calls to await block.SendAsync(item) on most blocks complete synchronously until BoundedCapacity is reached, and only then does asynchronous blocking become relevant.

Also, since TDF is used for concurrency to begin with, the probability of performance being an issue is higher than normal.
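
A hedged illustration of that scenario (assumes the System.Threading.Tasks.Dataflow package; the block type and capacity are illustrative):

using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowSendExample
{
    public static async Task ProduceAsync(int[] items)
    {
        var block = new BufferBlock<int>(
            new DataflowBlockOptions { BoundedCapacity = 100 });

        foreach (int item in items)
        {
            // Completes synchronously while the block is below BoundedCapacity;
            // only once the buffer is full does it need to wait asynchronously.
            await block.SendAsync(item);
        }

        block.Complete();
    }
}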

@stephentoub (Member) commented Dec 5, 2015

Another place

But that's also a case where it can and does return a cached task.

@i3arnon (Collaborator) commented Dec 5, 2015

it can and does return a cached task.

Well then... another nice surprise from TPL Dataflow :)

I should have looked at the source first.
