Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak CancellableContinuation.invokeOnCancellation contract for consistency #1915

Closed
elizarov opened this issue Apr 16, 2020 · 3 comments
Closed

Comments

@elizarov
Copy link
Contributor

Consider the following code that uses suspendCancellableCoroutine:

val result = suspendCancellableCoroutiune<Unit> { cont: CancellableContinuation<Unit> ->
   // call later, asynchronously
   cont.resume(Unit)
}

This call to suspendCancellableCoroutiune will not necessarily produce result == Unit. It may also end up throwing CancellationException. While indistinguishable from the outside, internally there are two conceptually different cases when CancellationExcption could be produced:

  • Case A: The coroutine running this code may get canceled before the call to cont.resume(Unit).
  • Case B: The coroutine running this code may get canceled after the call to cont.resume(Unit), but before the continuation gets dispatched to the caller.

Note, that in a general case the caller of cont.resume(Unit) cannot distinguish these two cancellation cases either, because coroutine cancellation is asynchronous and may race with the call to cont.resume(Unit). The latter does not provide any indication to the caller whether cancellation or resumption came first. There's only an internal API called tryResume that can be used to find it out.

Now there are two ways to install a handler that gets invoked on cancellation:

They are currently not consistent between themselves with respect as to which cancellation cases they react to:

  • cont.invokeOnCancelltion { ... } continuation handler only gets invoked in case A of cancellation. If resume happened before cancellation, but the coroutine was canceled later (case B), then the continuation cancellation handler is not called.
  • cont.resume(Unit) { ... } cancellation handler gets invoked in both cases.

The proposal is to make them consistent so that both are invoked in both cases. While this is technically a breaking change, it is not clear what kind of correctly working code this kind of change can break, since the difference in behavior would only manifest in cases when cont.resume was called after cancellation, but there is no way to figure out whether cancel or resume was first anyway.

There is a motivation beyond consistency, too. This inconsistency turned out while working on #1813. When the prototype without atomic cancellation was implemented, the implementation of Semaphore stoped to work correctly in case of acquire cancellation. The root of the problem is that with atomic cancellation the case B was not possible and there was no need to worry about it. However, making the proposed change to invokeOnCancellation (so that it is called in case B) fixed Semaphore. This leads us to believe that this change may also eliminate some cancel/resume races that currently exist in 3rd party code, too, because 3rd party code did not have access to atomic cancellation so it did not have any means to distinguish two cases of cancellation and was likely incorrect in racing scenarios.

elizarov added a commit that referenced this issue Apr 21, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Apr 22, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Apr 22, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Apr 22, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
@1zaman
Copy link
Contributor

1zaman commented Jun 7, 2020

It seems to me that the two APIs have different use-cases in most cases:

  • cont.invokeOnCancellation { ... } is used to handle cancellation while the function is still suspended, by cancelling the asynchronous query that's running.
  • cont.resume(Unit) { ... } is used to handle cancellation after the function has resumed the continuation by returning a result, by potentially cleaning up the result which is now being discarded.

If the cancellation and resumption are done in separate threads, then they can race. This should not matter for the cont.resume(Unit) { ... } callback, since it will be invoked in any outcome of the race, which is the desired behaviour if we want to ensure that the result is handled before being discarded. For cont.invokeOnCancellation { ... }, this means that there might be cases where it's invoked even after the suspending function has received a result.

Since it can happen in race conditions anyway, I can see why it might seem a good idea to extend cont.invokeOnCancellation { ... } to handle all cancellations after resumption. Most futures/promises APIs should have no problem with handling cancellations after they have completed, and this can fix the cases like Semaphore where it needs to handle cancellation at all points in the same manner. However, if an implementation can't handle cancellations after resumption, and avoids race conditions by ensuring that both cancellations and resumptions are always performed on the same thread, then it could be broken by the proposed change.

For example, the Glide image loading library in Android does not provide an API for cancelling a request directly; instead we can only cancel the ongoing request on a specific target view. Therefore cancellation can only be performed correctly while the request is ongoing, since performing the cancellation after completion might cause a different request to be cancelled. Since it's a UI library, it's bound to the main thread, and all calls, callbacks, and cancellations can only be done on the main thread. Due to this, there can be no race conditions in cancellation and resumption calls, so using cont.invokeOnCancellation { ... } will ensure that the cancellation would only be performed on an ongoing request. This implementation would be broken after the proposed change, with the only alternative being to track resumption manually, and add a check for it in the cancellation handler. Of course, it will be fine as long as the Dispatchers.Main.immediate dispatcher is used, since it can do the dispatch directly without there being any intervening period where the thread is executing, but if the Dispatchers.Main dispatcher is used, then it will break.

While such cases are probably rare, cases like Semaphore where there needs to be one cancellation handler to be invoked on all cancellations might also be rare. Such cases could be handled by combining both cancellation APIs, though they would need to handle the case where both handlers are invoked due to race conditions. Another alternative might be to add a new API for handling all cancellations, whether encountered before or after resumption.

@elizarov
Copy link
Contributor Author

elizarov commented Jun 8, 2020

However, if an implementation can't handle cancellations after resumption, and avoids race conditions by ensuring that both cancellations and resumptions are always performed on the same thread, then it could be broken by the proposed change.

@1zaman Cancellation with coroutines is always asynchronous in nature and can be initiated by any thread. You cannot simply avoid it. The only way to ensure that cancellation and resumption are always performed on the same thread is to only use a single thread in your whole application, which is impractical limitation.

For example, the Glide image loading library in Android does not provide an API for cancelling a request directly; instead we can only cancel the ongoing request on a specific target view.

Note, that cont.invokeOnCancellation { ... } lambda can be called from any thread at any time now (and this is going to stay). So, to integrate with a library that provides only limited cancellation options you'll have to dispatch onto the right thread from inside of invokeOnCancellation and double-check all the relevant conditions in the right thread anyway, so I don't see how this code will be affected by the change in when invokeOnCancellation is called in the first place.

@1zaman
Copy link
Contributor

1zaman commented Jun 8, 2020

Hmm, this library can only be used from UI code, so both the scope cancellation and resumption will always be done from the main thread. However, I suppose there could also be edge cases like the coroutine being cancelled from another (child?) coroutine that was running in a separate thread, and therefore we should be prepared to handle cancellations from other threads as well.

So I suppose we can conclude that even though this change might break some working implementations in rare cases, these cases wouldn't have been implemented correctly according to the documentation to begin with. In light of this, I'll remove my objections to this change.

elizarov added a commit that referenced this issue Jun 16, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
elizarov added a commit that referenced this issue Sep 16, 2020
This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for #1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic

Fixes #1265
Fixes #1813
Fixes #1915
recheej pushed a commit to recheej/kotlinx.coroutines that referenced this issue Dec 28, 2020
…otlin#1937)

This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for Kotlin#1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic
* Channel onUnderliveredElement is introduced as a replacement.

Fixes Kotlin#1265
Fixes Kotlin#1813
Fixes Kotlin#1915
Fixes Kotlin#1936

Co-authored-by: Louis CAD <louis.cognault@gmail.com>
Co-authored-by: Vsevolod Tolstopyatov <qwwdfsad@gmail.com>
recheej pushed a commit to recheej/kotlinx.coroutines that referenced this issue Dec 28, 2020
…otlin#1937)

This is a problematic for Android when Main dispatcher is cancelled on destroyed activity.
Atomic nature of channels is designed to prevent loss of elements,
which is really not an issue for a typical application, but creates problem when used with channels.

* Internal suspendAtomicCancellableCoroutine -> suspendCancellableCoroutine
* Internal suspendAtomicCancellableCoroutineReusable -> suspendCancellableCoroutineReusable
* Remove atomic cancellation from docs
* Ensures that flowOn does not resume downstream after cancellation.
* MODE_ATOMIC_DEFAULT renamed into MODE_ATOMIC
* Introduced MODE_CANCELLABLE_REUSABLE to track suspendCancellableCoroutineReusable
* Better documentation for MODE_XXX constants.
* Added stress test for proper handling of MODE_CANCELLABLE_REUSABLE
  and fixed test for Kotlin#1123 bug with job.join (working in MODE_CANCELLABLE) that was not
  properly failing in the absence of the proper code in CancellableContinuationImpl.getResult
* Added test for Flow.combine that should be fixed
* Support extended invokeOnCancellation contract
* Introduced internal tryResumeAtomic
* Channel onUnderliveredElement is introduced as a replacement.

Fixes Kotlin#1265
Fixes Kotlin#1813
Fixes Kotlin#1915
Fixes Kotlin#1936

Co-authored-by: Louis CAD <louis.cognault@gmail.com>
Co-authored-by: Vsevolod Tolstopyatov <qwwdfsad@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants