Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for async functions which run to completion #1

Open
wants to merge 13 commits into
base: completion_fn
Choose a base branch
from

Conversation

Matthias247
Copy link
Owner

This RFC introduces a new type of asynchronous functions, which have
run-to-completion semantics compared to cancel-anytime semantics.

This RFC introduces a new type of asynchronous functions, which have
run-to-completion semantics compared to cancel-anytime semantics.
Copy link

@LucioFranco LucioFranco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic! No real complaints just left a few nits/comments.

text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved
text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved

The proposal adds 2 new items to Rust:
- A new `Future` type, which will reside in `core::task`. In the remains of this
document we will refer to this type as `RunToCompletionFuture`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
document we will refer to this type as `RunToCompletionFuture`.
document we will refer to this type as `CompletionFuture`.

What about that? :)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it too. Definitely more concise. I was only not sure whether it conveys that well what it is. I'm personally fine with both

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo I find it says the same thing but we can always leave the bike shedding up to the real post :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that CompletionFuture is more concise.

text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved
text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved
text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved
text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved

Users will continue to be able to write simple and safe `Future` implementations
with cancel-anytime semantics manually, and be able to use them in combination
with powerful flow-control macros like `select!` and `join!`. A lot of types

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The big issue I see is that this affects leaf futures and therefore would require the entire future chain to be a completion future. Which is quite unfortunate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we provided some tokio::spawn_completion type then you could use its join handle to "forget" about that. That should be a somewhat and minimal way to not have to funnel completion up the chain.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean and leave it running in the background?
I think in order to avoid having those runaway tasks we will implement the scope proposal to work based on this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, what I have now for uring does this, even if the future is canceled it will still run in the background. But I think that is useful to mix with other control flow.

Co-Authored-By: Lucio Franco <luciofranco14@gmail.com>
Copy link

@DemiMarie DemiMarie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions

the main exception here are future types which need to run to completion, like
wrappers around IO completion operations. These are however unsafe by nature
and require careful implementation and review. Therefore the `unsafe` attribute
added here is justified.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to consider combinator authors. Combinators can be much more efficient in this design, but they will require unsafe code internally.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Combinators and any wrapping of futures. I'd like to see a section on combinators specifically. I fear that two disjunct traits would lead to a lot of duplication in combinators.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I think that unfortunately combinators are so deeply affected that the proposal becomes infeasible in its current state:

Say a combinator like select: once returned from it, what happens to the unfinished future? Return to caller just like today? Who ensures the future is polled again before dropped?

Same for FuturesUnordered: Poll a couple of items out, but leave some inside, then drop it - compiler is none the wiser -- and the invariant is broken.

In general: A combinator that accepts completion futures must poll all of its inner futures to completion before returning from any of its async functions. It can neither store or return any incomplete futures because then they can be arbitrarily dropped.

Unfortunately, this gives us three bad options:

  • Combinators disallow this - severely degrading their utility.
  • Combinators allow this, but lose most of the guarantees of this proposal
  • Drop-bombs - panic at runtime if futures and combinators are not polled to completion at drop-time.

It all boils down to this existing drop-semantics we have today - whether we're in async or not. Hence, some rudimentary linear type features are necessary, as far as I can tell.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betamos a combinator that accepts a completion future would itself be a completion future, so this problem is pushed up to the caller and ultimately to the executor.

Same for FuturesUnordered: Poll a couple of items out, but leave some inside, then drop it - compiler is none the wiser -- and the invariant is broken.

FuturesUnordered is a Stream, and a completion variant would be needed for this as well. One approach would be to instead buffer all of the futures into a single Vec before returning them.

That said, FuturesUnordered’s approach is an anti-pattern in many cases. A better API is

let (result1, result2, result3) = select!(completion_future_1, completion_future_2, completion_future_3);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you, but I think if we invalidate people's existing use of combinators, we're gonna have a harder time getting this through review. There's a lot more convincing to do.

FuturesUnordered is used under the hood in many other combinators and stream transforms too..

How would you solve the standard issue of:

  • A TCP Listener produces a stream of Connections
  • For each connection concurrently:
    • Run a handler which returns a result. The handler is a completion future.
  • Collect the results out-of-order (in a buffer that never exceeds O(concurrent connections))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a completion version of FuturesUnordered that itself must be polled to completion. Of course, there would need to be a method for cancellation as well.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is basically the structured concurrency scope/TaskGroup/Nursery thing.

tokio-rs/tokio#2579 had an implementation for Rust/tokio without support for run to completion tasks. https://github.com/DougGregor/swift-evolution/blob/structured-concurrency/proposals/nnnn-structured-concurrency.md proposes a TaskGroup for Swift which is basically the same: A FuturesUnordered for run-to-completion coroutines with builtin support graceful cancellation.

And as said on zulip I don't think the current select!/join! combinators should go away. They will still have their roles in waiting e.g. for a channel and a timer in parallel.

The new combinators will focus more on the orchestration of bigger tasks. E.g. on the server example that @betamos had the TCP listener would open a task group, and start the processing of each connection as a subtask in this group. If the listener gets cancelled (e.g. via SIGINT), the subtasks also receive a graceful cancellations signal which still allows them to e.g. finish processing the current request, and then shut down.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw: I think the original comment from @DemiMarie was about something different. I interpret it more has "combinators which can assume run to completion guarantees can be more efficient and offer different features", than a "old combinators are no longer useful".

I fully agree with that, but it's to be seen what combinators we really need. The number 1 on my list is really the TaskGroup thing, which however could also enable borrowing from the parent tasks as a new feature (like crossbeam scopes).

more similar to normal functions than current `async fn`s do. It might therefore
have been interesting to see `#[completion] async fn` as the new "default" async
function type - which might be the mostly used function type used by application
writers in the end. However this is not possible anymore, since `async fn` is

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible in Rust 2021 😄


A future Edition of Rust could adopt a different keyword for async completion
functions in order to reduce the verbosity. This should be done
after carefully studying the usage of the various `async fn` types.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
after carefully studying the usage of the various `async fn` types.
after carefully studying the usage of the various `async fn` types.
It would also be possible to make `RunToCompletionFuture` be the default. `#[completion] async` would become just `async`, and `async` would become `#[completion(false)] async` (or similar). This transformation can be performed automatically.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's totally possible. But I would rather not recommend this, since those kinds of semantic changes mean a lot of churn for people updating their projects, as well as might confuse people which work on projects in both editions. I therefore don't expect this to face a lot of acceptance.

Copy link

@betamos betamos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm late on the ball. Lots of stuff in here is gold - and I have really been bothered by the same issue for a long time. +1000 we need to solve these issues.

In its current state, unfortunately I think it's not feasible due to the drop-semantics of Rust programs today.

async fn transmit_data() {
let buffer: String = "Data to transfer".into();
let data: Bytes = buffer.into();
let (bytes_transferred, data) = engine.send(data).await;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I get how this workaround solves the problem. If engine.send is still a multi-step operation you could end up in an inconsistent state by using abortable/carelessly using select and so on? To me there is no reasonable workaround today (except through docs and manually checking each call site - impossible in e.g. a library). As such, it may give the wrong impression to state that this problem can be worked around.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It mainly solves the problem of corrupting memory if the operation is aborted. With this workaround the memory is temporarily leaked while the operation continues running in the background, but the overall operation is safe.
As you mentioned, it might not solve other logic issues.


For certain functions it is absolutely necessary for correctness that they
actually runs to completion.
The reason is typically that the function forms an atomic transaction. If the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the buffer use-case is more "real-world", but there's an even simpler example:

async fn commit(&mut self) {
  self.a = true;
  something_else().await; // possible exit point
  self.b = true;
}

This method can abort at the await point and leave inconsistent data - very different from sync code. The next user of the value gets mutable access, and can't even detect the inconsistent state. Working around this "every await point is possible termination" is infeasible in practice.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Or just the pattern which is kind of common in other languages:

async fn doSth(&mut self) -> Result<(), ()> {
  if self.errored {
    return Err(());
  }
  self.errored = true;
  something_else().await?;
  self.errored = false;
  Ok(())
}

There's countless variations of it. The question is which ones are most easy to understand and follow for the audience.

text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved
and still deliver a return value. It **could** however also continue to run for
a certain time. E.g. in order to finalize the important transaction.
3. The issuer of the cancellation request waits for the cancelled async function
to return. This can e.g. be achieved through a `WaitGroup` or `Semaphore` type.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow why this is necessary? Normally a cancelled function would just return a cancellation error, and the caller just handles the error.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does the cancellation error get to the caller? The answer here is: When the caller waits for the operation to complete, and notices the error.

So this step is necessary. Note that also in Rust we don't have any standardized exception base class, and therefore also can't have a standardized derived CancellationError. This means even if new primitives (e.g. a new file IO API using completion based IO) returns a CancellationError, the methods around it might need to translate it to whatever error type the next layer is expecting.

in a backwards compatible fashion. It will not break code already running
on those runtimes.

## Cancellation support for run to completion functions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps put in title "graceful cancellation". The way graceful cancellation is implemented with your proposal remains the same as it is today. I've already implemented something similar, for instance. The community would publish and converge on a pattern, perhaps even as part of Context in the future, as you say.

I'd shorten this section and state that graceful cancellation is itself not affected by the proposal, but that graceful cancellation becomes more important to get right since futures will increasingly become completion based. This is to avoid confusion that graceful cancellation must be fully resolved in order to make progress - which is not true. We already have the tools to build it!

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are fully right. We can cover graceful cancellation fully separate from this proposal. And in fact I thought about proposing a stop_token similar to the C++ one which can be used in sync as well as async contexts.

The main reason this was brought up here is to avoid questions from readers along:

Futures where great because they where so easy to cancel - how do I do the same in this new world?

If you and other feels we should move it out of this doc to focus purely on run to completion, I'm also good with it.

[guide-level-explanation]: #guide-level-explanation

The proposal adds 3 new items to Rust:
- A new `Future` type, which will reside in `core::task`. In the remains of this
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Trait. This had me confused until the end of this doc, where the trait was explained.

the main exception here are future types which need to run to completion, like
wrappers around IO completion operations. These are however unsafe by nature
and require careful implementation and review. Therefore the `unsafe` attribute
added here is justified.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Combinators and any wrapping of futures. I'd like to see a section on combinators specifically. I fear that two disjunct traits would lead to a lot of duplication in combinators.

actually be equal to the one `poll` itself, since `poll` is also expected to
perform any final cleanup.

A different alternative might be to pair run to completion methods later with
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see, we came to the same conclusion. I think defer is very interesting for multi-exit control flow that we have so much of in Rust with ?.

cooperatively cancel methods with run to completion semantics also had been
successfully deployed in these environments.

# Unresolved questions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about crash consistency, and how this works in the case of panics.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we implement poll_drop, then panics could run poll_drop to completion as the stack is being unwound. C++ takes a similar approach via unhandled_exception.

the main exception here are future types which need to run to completion, like
wrappers around IO completion operations. These are however unsafe by nature
and require careful implementation and review. Therefore the `unsafe` attribute
added here is justified.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I think that unfortunately combinators are so deeply affected that the proposal becomes infeasible in its current state:

Say a combinator like select: once returned from it, what happens to the unfinished future? Return to caller just like today? Who ensures the future is polled again before dropped?

Same for FuturesUnordered: Poll a couple of items out, but leave some inside, then drop it - compiler is none the wiser -- and the invariant is broken.

In general: A combinator that accepts completion futures must poll all of its inner futures to completion before returning from any of its async functions. It can neither store or return any incomplete futures because then they can be arbitrarily dropped.

Unfortunately, this gives us three bad options:

  • Combinators disallow this - severely degrading their utility.
  • Combinators allow this, but lose most of the guarantees of this proposal
  • Drop-bombs - panic at runtime if futures and combinators are not polled to completion at drop-time.

It all boils down to this existing drop-semantics we have today - whether we're in async or not. Hence, some rudimentary linear type features are necessary, as far as I can tell.

the main exception here are future types which need to run to completion, like
wrappers around IO completion operations. These are however unsafe by nature
and require careful implementation and review. Therefore the `unsafe` attribute
added here is justified.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betamos a combinator that accepts a completion future would itself be a completion future, so this problem is pushed up to the caller and ultimately to the executor.

Same for FuturesUnordered: Poll a couple of items out, but leave some inside, then drop it - compiler is none the wiser -- and the invariant is broken.

FuturesUnordered is a Stream, and a completion variant would be needed for this as well. One approach would be to instead buffer all of the futures into a single Vec before returning them.

That said, FuturesUnordered’s approach is an anti-pattern in many cases. A better API is

let (result1, result2, result3) = select!(completion_future_1, completion_future_2, completion_future_3);

text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved
/// Callers are not allowed `drop()` a future which returned `Pending` as
/// its last poll result. Futures are only allowed to be dropped if they
/// either had never been polled, or if the last `poll()` call returned `Ready`.
unsafe fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A supertrait of Future would be fine; any Future is trivially a RunToCompletionFuture.

Panics are a bigger problem. There are a few solutions I can think of:

  • Poll drop: whenever a panic happens in async contexts, any objects on the stack are dropped asynchronously. The actual panic doesn’t propagate until all the poll_drop methods have returned Poll::Ready. I believe this is how try-finally works in async ECMAScript functions. Since the only way to call RunToCompletionFuture::poll without using unsafe is via .await, this should work.
  • Premature drop calls abort: leaking a polled RunToCompletionFuture would still be undefined behavior, but dropping it would call abort instead.


Alternative proposals, like the one in
https://internals.rust-lang.org/t/pre-pre-rfc-unsafe-futures, also proposed to
directly add a `.cancel()` method on the new `Future` type which needs to initiate

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, I believe that robust handling of panic in #[completion] async fns requires a .poll_drop() that is invoked by the compiler in such cases. This is effectively a .cancel() method.

Copy link

@betamos betamos Jan 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying this is a bad idea at all - but cancel and drop are completely different in my mind. Since drop is on the panic path, it simply attempts a best effort cleanup (dropping file descriptors), and protects against memory unsafety.

Cancel OTOH is highly cooperative - and on the expected path. Say you have a search problem where three tasks race to find a solution. Once one is found, the other two are cancelled - gracefully, and awaited as normal (their return values would be Err(Canceled) and discarded).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I suspect the implementations will often coincide, but that is by no means guaranteed.

trait to use it: The `unsafe` contract in the trait could require callers to
either `.poll()` the future to completion, or to switch over to calling `poll_drop`.

However the benefit seems low. By doing this the Future would again move away

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panic handling seems to require poll_drop or similar, unless one wants to force panic=abort for async code.

Copy link

@betamos betamos Jan 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree. The issue I find most hard is during parallelism, i.e. real threads. This occurs in multithreaded async runtimes, but also without async in regular threads. Crossbeam's scoped threads catch the panic and waits for the threads to terminate. Which can wait forever. It's somewhat terrifying that panic propagation is unbounded in time, but similarly terrifying to not be able to statically borrow between threads.. I have no good answers here - just sharing my thoughts.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panic propagation is already unbounded in time, since we have Turing-complete destructors. What this does mean is that we need std::panic::catch_unwind to be extremely fast, ideally free.

cooperatively cancel methods with run to completion semantics also had been
successfully deployed in these environments.

# Unresolved questions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we implement poll_drop, then panics could run poll_drop to completion as the stack is being unwound. C++ takes a similar approach via unhandled_exception.

Co-authored-by: Demi Marie Obenour <demiobenour@gmail.com>

For certain functions it is absolutely necessary for correctness that they
actually runs to completion.
The reason is typically that the function forms an atomic transaction. If the
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Or just the pattern which is kind of common in other languages:

async fn doSth(&mut self) -> Result<(), ()> {
  if self.errored {
    return Err(());
  }
  self.errored = true;
  something_else().await?;
  self.errored = false;
  Ok(())
}

There's countless variations of it. The question is which ones are most easy to understand and follow for the audience.

in a backwards compatible fashion. It will not break code already running
on those runtimes.

## Cancellation support for run to completion functions
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are fully right. We can cover graceful cancellation fully separate from this proposal. And in fact I thought about proposing a stop_token similar to the C++ one which can be used in sync as well as async contexts.

The main reason this was brought up here is to avoid questions from readers along:

Futures where great because they where so easy to cancel - how do I do the same in this new world?

If you and other feels we should move it out of this doc to focus purely on run to completion, I'm also good with it.

and still deliver a return value. It **could** however also continue to run for
a certain time. E.g. in order to finalize the important transaction.
3. The issuer of the cancellation request waits for the cancelled async function
to return. This can e.g. be achieved through a `WaitGroup` or `Semaphore` type.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does the cancellation error get to the caller? The answer here is: When the caller waits for the operation to complete, and notices the error.

So this step is necessary. Note that also in Rust we don't have any standardized exception base class, and therefore also can't have a standardized derived CancellationError. This means even if new primitives (e.g. a new file IO API using completion based IO) returns a CancellationError, the methods around it might need to translate it to whatever error type the next layer is expecting.

text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved
text/0000-run-to-completion-async-fn.md Outdated Show resolved Hide resolved
/// Callers are not allowed `drop()` a future which returned `Pending` as
/// its last poll result. Futures are only allowed to be dropped if they
/// either had never been polled, or if the last `poll()` call returned `Ready`.
unsafe fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether there is a big difference between requiring the callers to call poll_drop after a panic, or simply continue to call poll. Since everything that the compiler puts into poll_drop could be in the same way in poll.

Maybe one of the arguments for a dedicated poll_drop is that it doesn't have the output value that poll has.
I think I would be ok with adding a poll_drop to the new Future - but more for pure compiler-generated drop glue than for humans to implement.

Premature drop calls abort: leaking a polled RunToCompletionFuture would still be undefined behavior, but dropping it would call abort instead.

Yeah, that's the "simple way out". For my use-cases I would be rather fine with it, since I think panics + coroutines/futures is a super indeterministic thing anyway. But if others rely on the ability to handle panics in async run to completion contexts, it should indeed be taken into account.

Is this implication always true? I buy that the consistency and result of the function is inconsistent and "poisoned", but memory safety sounds like it would only apply in certain cases, such as the subtask-example.

It doesn't matter whether its 1 example or thousands. If it's more than 0 examples it has to be labeled as unsafe in Rust. People rely on safe functions really being safe - even under heavy API misuse. That's why there also had been very unpleasant discussions for authors of crates which provided support for completion based IO. Those had APIs which very basically always safe, as long as one doesn't do really weird things with them. But still they got push-back from the community about not marking them unsafe because theoretically those could lead to memory issues.

the main exception here are future types which need to run to completion, like
wrappers around IO completion operations. These are however unsafe by nature
and require careful implementation and review. Therefore the `unsafe` attribute
added here is justified.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is basically the structured concurrency scope/TaskGroup/Nursery thing.

tokio-rs/tokio#2579 had an implementation for Rust/tokio without support for run to completion tasks. https://github.com/DougGregor/swift-evolution/blob/structured-concurrency/proposals/nnnn-structured-concurrency.md proposes a TaskGroup for Swift which is basically the same: A FuturesUnordered for run-to-completion coroutines with builtin support graceful cancellation.

And as said on zulip I don't think the current select!/join! combinators should go away. They will still have their roles in waiting e.g. for a channel and a timer in parallel.

The new combinators will focus more on the orchestration of bigger tasks. E.g. on the server example that @betamos had the TCP listener would open a task group, and start the processing of each connection as a subtask in this group. If the listener gets cancelled (e.g. via SIGINT), the subtasks also receive a graceful cancellations signal which still allows them to e.g. finish processing the current request, and then shut down.

the main exception here are future types which need to run to completion, like
wrappers around IO completion operations. These are however unsafe by nature
and require careful implementation and review. Therefore the `unsafe` attribute
added here is justified.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw: I think the original comment from @DemiMarie was about something different. I interpret it more has "combinators which can assume run to completion guarantees can be more efficient and offer different features", than a "old combinators are no longer useful".

I fully agree with that, but it's to be seen what combinators we really need. The number 1 on my list is really the TaskGroup thing, which however could also enable borrowing from the parent tasks as a new feature (like crossbeam scopes).

@SabrinaJewson
Copy link

I have published a crate based off the ideas in this RFC - completion. A couple things to note:

  • I did not make CompletionFuture a supertrait of Future. Doing so would prevent implementing CompletionFuture for Box<S: CompletionFuture> as it would overlap with the Future for Box<S: Future> implementation. It would also prevent implementing Future for wrappers whose inner type also implements Future, which could be useful.
  • If the completion future panics, I allowed it to be dropped. I chose this because it simplifies implementation and it should be up to the I/O futures to not panic as part of their soundness. Types that poll multiple futures at once (like Zip) are implemented with catch_unwind.

@Matthias247
Copy link
Owner Author

I have published a crate based off the ideas in this RFC - completion. A couple things to note:

Wow, that's pretty cool! Thanks for investing all the efforts to dive into this - it looks pretty immense! I did not think about using proc-macros to polyfill it, and wasn't even sure it would be possible.

How do you allow to await CompletionFutures only in completion functions? Is it using some::path::__FutureOrCompletionFuture(fut1).__into_future_unsafe().await;? So it wraps every future in a completion block in this kind of wrapper future, and polls it that way? I guess that might work, but you might need to bubble the unsafe attribute a bit higher up to be fully correct.

One thing I noticed is that you also implemented already the whole ecosystem around it (combinators, completion streams, completion async trait, etc). I think that's great in order to figure out whether there is a big oversight or blocker. Did you experience anything like this?
I however don't think we should look into stabilizing all those things at once. As we learned from the normal work on futures, it's a long process to get concensus on all of those features.

I did not make CompletionFuture a supertrait of Future. Doing so would prevent implementing CompletionFuture for Box<S: CompletionFuture> as it would overlap with the Future for Box<S: Future> implementation. It would also prevent implementing Future for wrappers whose inner type also implements Future, which could be useful.

Thanks. Those are good arguments for using blanket impls instead of using a supertrait.

If the completion future panics, I allowed it to be dropped. I chose this because it simplifies implementation and it should be up to the I/O futures to not panic as part of their soundness. Types that poll multiple futures at once (like Zip) are implemented with catch_unwind.

The panic behavior (as also pointed out by @DemiMarie and @betamos ) is definitely something that needs a bit more thought. I think the overall pitch needs to be "just works like a synchronous function", and this should also apply to panics. If a panic happens inside the function, it can directly return from there. The main tricky part is "what happens if a panic happens more towards the root of a task/await stack" - e.g. in the stack of a TaskGroup which manages multiple completion task. Those still need to run to completion, and therefore the panic can't immediatel tear everything down. It needs to continue to poll those inner tasks. In order to do this, the CompletionFuture which represents the TaskGroups scope might need to have a poll_drop method which is called in that path - or there is just a flag set and the next layer continues to call the normal poll function. Maybe this part also needs to be prototyped to get a better understanding what is required.

Another reason for a requirement to add poll_drop could be the addition of a defer function for both sync and async function, which could guarantee that code is even executed after panics - which seems related to what @betamos is looking for. That code path could then be executed inside the poll_drop function. However I am speculating the compiler might also be able to move all that code into the poll method.

@SabrinaJewson
Copy link

How do you allow to await CompletionFutures only in completion functions? Is it using some::path::__FutureOrCompletionFuture(fut1).__into_future_unsafe().await;? So it wraps every future in a completion block in this kind of wrapper future, and polls it that way? I guess that might work, but you might need to bubble the unsafe attribute a bit higher up to be fully correct.

This is explained more in src/macros/future.rs. It uses inherent method-based specialization: __FutureOrCompletionFuture<F: Future> has a no-op method __into_future_unsafe, and additionally there is a trait that provides a method __into_future_unsafe for __FutureOrCompletionFuture<F> which wraps it in an AssertCompletes. I'm not sure what you mean by bubbling the unsafe, but it's sound because that transformation is only applied to top-level .awaits inside the macro, and so the containing async block is guaranteed to complete by the macro.

I think that's great in order to figure out whether there is a big oversight or blocker. Did you experience anything like this?

There was nothing major I experienced (assuming my code is sound 😆). Most of the implementations are identical to ones in futures, but with CompletionFuture instead of Future.

I however don't think we should look into stabilizing all those things at once. As we learned from the normal work on futures, it's a long process to get concensus on all of those features.

The majority of the features I implemented were either already implemented in the standard library, futures, futures-lite or Tokio - I haven't been particularly adventurous with it. My only original creations are AssertCompletes and MustComplete, the I/O traits, ReadBufMut, and io::TakeUntil (just because it made implementing read_until and read_line easier). But yes, I'm not looking to stabilize any of this any time soon.

With regards to the panics, I have a suggestion. First, when poll panics the future is allowed to be dropped. Second, we add a new method to CompletionFuture:

fn cancel(self: Pin<&mut Self>);

All it will do is suggest to the future that it might want to cancel its operation. There are a couple reasons I think this is a good idea:

  1. It plays excellently with APIs like io_uring. The cancel method will simply send an async cancel request with the user data that the future already knows about.
  2. It avoids the complexity of multiple methods and several phases of the future. The future will only have one phase - running - and it can only complete in one way.
  3. It provides a way for dealing with cancellation in general.

I'm unsure about the details of this however - should it have an empty default implementation or not? And how would it work with async blocks?

With regards to defer, I consider that somewhat orthogonal. If async destructors get added to the language they'll be run on panic via the panicking mechanism, and there's no difference there when compared to regular futures.

@Matthias247
Copy link
Owner Author

With regards to the panics, I have a suggestion. First, when poll panics the future is allowed to be dropped. Second, we add a new method to CompletionFuture:

This is discussed in the section Omission of cancellation in the new RunToCompletionFuture type of the RFC, and the rationale for leaving it out is pointed out (it seems like it can be done on application level - and a suitable StopToken/CancellationToken can be standardized orthogonal to this effort, or just left in a foundational library).

If panic needs to call cancel this obviously won't work, since the compiler isn't aware about the token. But I'm not sure yet whether it needs to, or whether the panic handler in a higher level future (like a TaskGroup) would simply need to be aware of it's application-defined cancel token, cancel that, and then poll futures inside it to completion independent of how long it takes.

@SabrinaJewson
Copy link

SabrinaJewson commented Jan 18, 2021

My main problem with CancellationToken-based cancellation is that I would really like these two lines of code to just work:

select(a.write(data), b.write(data)).await;
select(#[completion] async { a.write(data).await }, #[completion] async { b.write(data).await }).await;

Having to using tokens just feels too verbose:

select(|token| a.write(data, token), |token| b.write(data, token)).await;
select(
    |token| #[completion] async { a.write(data, token).await },
    |token| #[completion] async { b.write(data, token).await },
).await;

And especially in a function like join/zip where a cancellation token has to be passed in case of panics, the user is forced to write out the whole closure just to support the really minor edge case of panicking. So while you're right that it can be done at the application level, it would be a lot more ergonomic to do it at this level.

This is why I now think that cancellation should be built-in to the CompletionFuture trait, because otherwise functions like join just become a pain to use. For compiler-generated #[completion] async block futures, this would be easy to implement - cancel would call cancel on the currently .awaited future and set a flag in the generated future, then when the .awaited future completes the outer future will exit. This method of cancellation could also be implemented in my macro.

Edit: Actually, on further thought my method would require having fn poll_cancel(self: Pin<&mut Self>, cx: Context<'_>) -> Poll<()>; because otherwise cancelled futures would be forced to return a value. I didn't really want to do that because of the complexity of having multiple states and having to call them in a specific order, I wonder if there's a better way.

@SabrinaJewson
Copy link

I would like to decide on a solution for cancellation so I can implement it in my library - @Matthias247, what do you think of my poll_cancel approach?

@Matthias247
Copy link
Owner Author

@KaiJewson sorry, currently also on a few other projects, therefore i didn’t have more time to dig into this yet.

I think for normal cancellation - initiated by other code path in the application - this approach won’t work. The cancellation can happen concurrently while the future is executing. Which means a mutable reference to it won’t be available.

I think a cancel method which takes a non mutable reference that inits the cancellation, plus another method that polls to completion can work.

But since that is equivalent to just passing a cancellation/stop token, and waiting with the normal poll method, i don’t think there is any benefit in this. The only use case I can currently see is regarding handling panics of the super task, and I didn’t think about it more in detail yet.

@SabrinaJewson
Copy link

I think for normal cancellation - initiated by other code path in the application - this approach won’t work.

Is this a common use case? In the current async ecosystem, I think that most premature dropping of futures occurs in select or when one of the futures in join panics. Personally I've never remotely cancelled a future.

I think a cancel method which takes a non mutable reference that inits the cancellation, plus another method that polls to completion can work.

If the poller has a mutable reference to it the canceller won't be able to get a shared reference either, so I don't think that this would have any advantages over poll_cancel for remote cancellation.

@SabrinaJewson
Copy link

SabrinaJewson commented Feb 4, 2021

An idea just occured to me. Currently, I/O futures typically register their provided Wakers in the reactor. However, since they can be forgotten at any time, it is necessary to make sure the wakers they use have the ability to outlive the future, so that use-after-frees do not occur. But now that we have completion futures, I/O futures have the ability to utilize a waker that only lives for as long as the future itself - so in theory we don't even need an Arc to store wakers.

The way this could be supported is by (1) adding a new unsafe method raw_clone to Waker that simply copies over the pointer and vtable pointer to a new Waker and (2) enforcing that when raw_clone is called on the waker passed to CompletionFuture::poll, that waker must be valid as long as the future exists. Callers can then use lazily-allocated wakers, which will probably look like this:

struct LazyWaker {
    heap: OnceCell<Arc<WakerState>>,
    local: WakerState,
}

So callers simply need to store a LazyWaker that lives for as long as the future. This system would mean that in the common case, no allocations are needed at all.

Edit: After more thinking, I realized that a raw_clone-based method wouldn't work because destructors would be called twice. Instead, the API could look something like this:

#[derive(Debug, Clone, Copy)]
struct WakerRef<'a> { ... }
impl WakerRef<'_> {
    pub fn wake(self) { ... }
}
impl Waker {
    pub fn as_ref(&self) -> WakerRef<'_> { ... }
}

Completion futures would then be given the guarantee that if they transmute WakerRef to a longer lifetime it is guaranteed to be valid as long as the future is running.

Edit 2: I have now experimentally implemented this in completion, and with it I wrote a zero-allocation implementation of block_on.

Edit 3: Turns out, at least for block_on, this method is slower. Additionally it adds a lot of complexity to implementations of things like executors mostly because it has to handle both the case of the waker being cloned and the waker not being cloned. So even though this could in theory provide performance advantages, I'm going to give up on this. Sorry for writing a page of text for nothing :D

@Venryx
Copy link

Venryx commented Apr 7, 2022

Just wanted to mention that, as a newcomer to Rust, this behavior of futures being "silently cancelled" was very unintuitive to me, and required several hours of (confused) debugging for me to diagnose.

A simplified version of what perplexed me is below:

pub async fn test1() {
    let mtx = Mtx {};

    // === [part 1/4]: this ran
    println!("1");

    tokio::time::sleep(Duration::from_millis(5000)).await;
    
    // === [part 2/4]: but this didn't run
    println!("2");

    mtx.end();
}

pub struct Mtx {}
impl Mtx {
    pub fn end(&self) {
        // === [part 3/4]: this didn't run either
        println!("Mtx::end called.");
    }
}
impl Drop for Mtx {
    fn drop(&mut self) {
        // === [part 4/4]: and yet somehow, this *did* run, despite parts 2 and 3 above not running
        println!("Mtx::drop called.");
    }
}

Ultimately, the issue seems to have been that my GraphQL server library async-graphql was dropping/canceling the futures for a query/subscription resolver when it saw that the client disconnected or canceled its subscription.

This led to confusion for me though, because I kept looking at my logs and wondering how the mtx struct was getting dropped without the println!("2") line executing. I eventually found this pull-request by searching for "rust await drop", and what it describes helped me identify what the likely underlying cause is.

All that to say: This feature would be very useful to me! (to ensure that in the futures I write, I can rely on the lines in my functions all executing, rather than cancelling at arbitrary points mid-way; I can see the usefulness of that in some cases, but in others it can be confusing and/or cause inconsistent states!)

In the meantime, is there any way to:

  1. Somehow wrap or modify my async function to ensure it always runs to completion? (I noticed the completion crate mentioned above, but from what I can tell, you can only use that if you control the "root" async function, such that you can modify it to return CompletionFuture rather than Future?)
  2. If no. 1 isn't possible yet, is there some way to "detect when a future is cancelled mid way", and immediately log a warning? (to at least help avoid the case of trying to debug an async function's execution, only to find that the issue was due to it getting canceled in the middle)

@DemiMarie
Copy link

Not sure.

Venryx added a commit to debate-map/app that referenced this pull request Apr 8, 2022
…e the write-lock acquisition was attempted prior to the read lock being dropped.

* Slightly improved formatting for error stack-traces.
* Added a new `Result::expect_lazy` function -- like `except`, but with a closure passed rather than a string.
* Fixed a couple mistakes in lq_batch. (batching system still not working yet, but making progress)

Note: The deadlock above took a long time to find, due to my not realizing that Rust futures can be "canceled" in the middle of a function -- causing the structs in the scope to be dropped, despite the scope's statements not having all completed. See here for more info: Matthias247/rfcs#1 (comment)
@Venryx
Copy link

Venryx commented Apr 9, 2022

For my first question, I've found a solution: use tokio::spawn within the async function, to create an independent future that will run to completion, and then just await the result from that "inner future".

Example:

async fn func_returning_future_that_might_be_dropped(param: Arc<MyParam>) -> MyData {
    let param_clone = param.clone();
    let result = tokio::spawn(async move {
        let my_data = get_my_data(param_clone).await;
        my_data
    }).await.unwrap();
    result
}

For this approach to work, apparently every variable that gets captured by the inner-scope must be "owned", such that tokio::spawn can capture ownership into itself, to satisfy the 'static constraint that tokio::spawn requires.

In my case, this was not too difficult, because the data I needed passed in was already cloneable (eg. by using Arc). If that weren't the case, restructuring the program to satisfy that constraint could work, but would be inconvenient. So I will likely raise an issue in the async-graphql repo, as this seems like a pretty serious foot-gun (that the async functions used for the graphql resolvers can get terminated mid-function at any .await call).

Venryx added a commit to debate-map/app that referenced this pull request Apr 9, 2022
… resolvers, to prevent the async functions from getting terminated at arbitrary `.await` points -- as would happen when the outer async-graphql future gets abruptly dropped/terminated. (see: Matthias247/rfcs#1 (comment))

* With the change above, the query-batching system finally works! (albeit slowly atm, due to lock contention)
@Matthias247
Copy link
Owner Author

Hi @Venryx . Thanks for your experience report. If you find the feature useful, it might be best if you can write your experience report at https://rust-lang.zulipchat.com/#narrow/stream/187312-wg-async where the async working group is reading.

Btw: Note that your workaround is only correct if the tokio runtime that is enclosing the code doesn't get dropped. If it would, tasks could still not run to completion. This is also one of the reasons why the 'static lifetime is necessary.

If no. 1 isn't possible yet, is there some way to "detect when a future is cancelled mid way", and immediately log a warning?

This should be possible btw. You could wrap your Future in a WarningFuture type with a custom .poll() method and a drop implemention which emits the warning if Poll::Ready has not been returned. But you will need to make sure you don't miss wrapping all Futures with that one.