Introduce API to avoid cancel/fail races with CancellableContinuation #830
Consider this attempt to integrate coroutines with cancellable callback-based API using
So, this code attempts to avoid calling
There is no public (non-experimental, non-internal) API at the moment to work around this race.
W1. Use internal API on
Replace LINE (1) and LINE(2) with the following code:
This workaround ignores exceptions when the continuation is cancelled, and that is also its disadvantage. If a true exception (failure) happens concurrently with cancellation, it is ignored instead of being handled. Arguably this problem is much less severe than the original race, but it is still a problem.
W2. Use internal API on
This workaround correctly handles exceptions that occur concurrently with cancellation (the call is either going to be cancelled or fails and we learn what had happened).
There are several possible solutions:
S1. Ignore exceptions when resuming a cancelled continuation using resumeWithException. Technically, this is a breaking change, but that is not a major problem here. The problem is that there could be a genuine exception that needs to be handled and ignoring it in
S2. Introduce new
S3. New API for cancellable callbacks with an intermediate cancelling state. The idea is that invocation of
changed the title from "Introduce CancellableContinuation API to avoid cancel/fail races" to "Introduce API to avoid cancel/fail races with CancellableContinuation" (Nov 14, 2018)
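Let me clarify S3. Right now, when CancellableContinuation is cancelled, it immediately resumes the coroutine with CancellationException. So if the operation we were waiting for had already crashed and resumeWithException is about to be called, it has no choice but to either ignore this exception or invoke the uncaught exception handler.
With S3 the idea is that we don't immediately resume a cancelled coroutine. Instead, we mark it as "cancelling" and wait until resumeXxx is invoked. If resumeWithException is invoked, we can resume the coroutine with this exception, so we no longer face the ugly choice between losing the exception and handling it as an uncaught one.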
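To make the "cancelling" state of S3 concrete, here is a toy state machine in plain Kotlin. It is only a sketch under stated assumptions: `ToyState` and `ToyContinuation` are hypothetical names, nothing here is the kotlinx.coroutines implementation, and the rule that a value arriving after cancellation is reported as cancellation is my assumption, not something the proposal specifies.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Toy model of the S3 idea. All names are hypothetical; this is NOT how
// kotlinx.coroutines implements CancellableContinuation.
sealed class ToyState {
    object Active : ToyState()
    object Cancelling : ToyState()                       // cancelled, but not resumed yet
    data class Resumed(val outcome: String) : ToyState() // final state
}

class ToyContinuation {
    private val state = AtomicReference<ToyState>(ToyState.Active)

    val current: ToyState get() = state.get()

    // Cancellation only marks the continuation; it does not resume anything.
    fun cancel(): Boolean = state.compareAndSet(ToyState.Active, ToyState.Cancelling)

    // A failure is always delivered, even if cancel() was called first.
    fun resumeWithException(e: Throwable): Boolean = finish { "failed: ${e.message}" }

    // Assumption: a value that arrives after cancel() is reported as cancellation.
    fun resume(value: String): Boolean =
        finish { s -> if (s === ToyState.Cancelling) "cancelled" else "value: $value" }

    // Moves to the final state exactly once; later calls return false.
    private fun finish(outcome: (ToyState) -> String): Boolean {
        while (true) {
            val s = state.get()
            if (s is ToyState.Resumed) return false
            if (state.compareAndSet(s, ToyState.Resumed(outcome(s)))) return true
        }
    }
}
```

In this model a continuation on which `cancel()` was called but no `resume`/`resumeWithException` ever arrives stays in `Cancelling` forever, so the sketch relies on the underlying API always invoking its completion callback.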
But that means the coroutine may be left dangling in a cancelling state if resumeXxx is never called? I mean, in OkHttp, if we have a request that is still in progress and there's an outer cancellation before it miserably fails with a 404 error or an IOException, the invokeOnClose lambda will be called, which will call cancel() on the Call<T>, and resumeXxx will never be called because the callbacks are unregistered/forgotten after the cancel() call. So the suspendCancellableCoroutine machinery would be waiting for a resumeXxx call that will never happen. I think I misunderstood something; maybe we're not talking about the same resumeXxx call site?
On Wed, Nov 14, 2018, 4:27 PM Roman Elizarov wrote: Let me clarify S3. Right now, when CancellableContinuation is cancelled it immediately resumes the coroutine with CancellationException, so now if the operation we were waiting for had already crashed and resumeWithException is about to be called, it has no choice but to either ignore this exception or invoke uncaught exception handler. With S3 the idea is that we don't immediately resume a cancelled coroutine. Instead, we mark it as "cancelling" and wait until resumeXxx is invoked, so that if resumeWithException is invoked, we can resume the coroutine with this exception, so that we don't have this ugly choice between losing exception or handling it as uncaught one.
@elizarov Since all my tests and the beta work perfectly with W1, I think I'll go to prod with that solution.
Since it's internal, is there any risk that this workaround stops working at some point before the new API is created, or is the risk low enough to be OK for prod?
@elizarov So I went to prod with https://gist.github.com/Tolriq/dcbdc62c0e29d3b321034e990c3c85ce containing Workaround 1.
While it greatly reduces the problem (caught exceptions being rethrown), which was no longer seen in a 7K-user beta group, it still happens with a much larger active user base.
I then tried adding a CoroutineExceptionHandler in case the issue was somewhere else, but it did not fix anything.
I then tested Workaround 2 and it did not change anything.
I then stopped relying on exceptions and encapsulated them in a sealed class.
And the problem was solved.
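For readers wondering what "encapsulating exceptions in a sealed class" can look like, here is a minimal sketch in plain Kotlin. It is not the code from the gist: `CallResult`, `runCatchingAsResult` and `describe` are hypothetical names chosen for illustration. The idea is that a failure travels as an ordinary value, so the callback would presumably always resume the continuation with a value and never need `resumeWithException`.

```kotlin
// Hypothetical result type; the names are illustrative, not from the thread.
sealed class CallResult<out T> {
    data class Success<T>(val value: T) : CallResult<T>()
    data class Failure(val error: Throwable) : CallResult<Nothing>()
}

// Runs a block and turns any exception it throws into a Failure value.
inline fun <T> runCatchingAsResult(block: () -> T): CallResult<T> =
    try {
        CallResult.Success(block())
    } catch (e: Exception) {
        CallResult.Failure(e)
    }

// The caller inspects the value instead of catching an exception.
fun describe(result: CallResult<String>): String = when (result) {
    is CallResult.Success -> "ok: ${result.value}"
    is CallResult.Failure -> "error: ${result.error.message}"
}
```

The caller then branches with `when` on the result. The trade-off is that every call site has to handle `Failure` explicitly, since errors no longer propagate as exceptions.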
So I wonder where else in the code there's a race: maybe somewhere in the worker pool I use, or maybe inside coroutines or channels, but I have no idea where to search.
My expectation is that most APIs will still call the completion callback in a cancellation scenario, so S3 seems adequate for most cases. (@LouisCAD regarding Retrofit2, the documentation is not clear, but there seems to be a test for exactly that case: https://github.com/square/retrofit/blob/master/retrofit/src/test/java/retrofit2/CallTest.java#L647)
IMO, there should also be a public (non-internal, non-experimental) way of getting the behaviour of W2 for the success case as well, i.e., ignoring a cancellation if the callback is called with success.
I am encountering the same error.