Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CancellationException can be thrown not only to indicate cancellation #3658

Open
dkhalanskyjb opened this issue Mar 6, 2023 · 18 comments
Open
Labels

Comments

@dkhalanskyjb
Copy link
Contributor

https://betterprogramming.pub/the-silent-killer-thats-crashing-your-coroutines-9171d1e8f79b raises a good point: CancellationException should indicate that the coroutine in which it is raised is cancelled, but it's not always actually the case:

    fun testFoo() {
        val scope = CoroutineScope(Dispatchers.IO)
        val f = scope.async {
            delay(1000)
        }
        scope.cancel()
        var result = false
        runBlocking {
            launch {
                f.await() // throws CancellationException, silently stopping the `launch` coroutine
                result = true
            }.join()
        }
        check(result) { "issue" }
    }

Here, the programmer didn't make the mistake of catching a CancellationException and rethrowing it somewhere else, but still, the CancellationException leaked from a different scope.

@dkhalanskyjb
Copy link
Contributor Author

Need to dive deep into the implementation to see if this is possible, but it looks like modifying Deferred to throw a non-CancellationException may not be a good solution.

Consider this simplified scenario:

runBlocking {
  val result = async(Dispatchers.Default) {
    delay(5.seconds)
  }
  launch(Dispatchers.Unconfined) {
    result.await()
  }
  delay(1.seconds)
  cancel() // cancel the common scope
}

Here, we possibly have a race: the global scope is cancelled, which causes the coroutines in async and launch also to be cancelled. However, it can happen that this sequence of events happens before launch gets cancelled:

  1. async gets cancelled,
  2. the Deferred learns about the cancellation exception,
  3. await finishes with a non-CancellationException in the launch,
  4. which causes the launch to fail with a non-CancellationException,
  5. with which the entire runBlocking fails!

So, the code can fail with one exception or rarely with another. This is just one simplified example of a broader class of races that we could introduce.

@dovchinnikov
Copy link
Contributor

await (probably another function since this one is taken) should return kotlin.Result

  • if await resumes with CE, then it means the current coroutine was cancelled;
  • if await resumes with Result, then async was completed before the current coroutine was cancelled, and it represents a value/CE/failure of the async.

@dovchinnikov
Copy link
Contributor

dovchinnikov commented Mar 9, 2023

The above could be implemented as an extension function:

suspend fun Deferred<T>.awaitResult() : Result<T> {
  return try {
    Result.success(await())
  }
  catch (ce: CancellationException) {
    // this can "override" the original CE, but the current coroutine cancellation is preferred, so it's okay
    currentCoroutineContext().job.ensureActive()
    Result.failure(t)
  }
  catch (t: Throwable) {
    // resume awaitResult with the original throwable result to give a chance to handle it
    // even if the current coroutine is cancelled concurrently
    Result.failure(t)
  }
}

@dkhalanskyjb
Copy link
Contributor Author

This is a robust solution, but non-Kotlin-style. Kotlin tries to keep hierarchies of return types as thin as possible to save the users the boilerplate of unpacking the results. For example, Map<Int, String?>::get doesn't return an Optional<String?>, it just returns String?.

@dovchinnikov
Copy link
Contributor

dovchinnikov commented Mar 9, 2023

Well, this is as minimal as Value | Throwable union can be at the moment

@dkhalanskyjb
Copy link
Contributor Author

Just a Value (with the implicit Throwable that can surface at any point in a JVM program) is even more minimal.

@dovchinnikov
Copy link
Contributor

But then you lose info about the origin of the Throwable, which is the very problem here. To check the origin, one has to catch the CE and use ensureActive manually.

@dovchinnikov
Copy link
Contributor

dovchinnikov commented Mar 9, 2023

Another approach is to expose a subclass of CE, which will mean "another coroutine was cancelled":

try {
  await()
}
catch (dce: DeferredCancellationException) { // name TBD
  // deferred was cancelled
}
catch (ce: CancellationException) {
  // this coroutine was cancelled
}

@fvasco
Copy link
Contributor

fvasco commented Mar 12, 2023

Another approach is to expose a subclass of CE, which will mean "another coroutine was cancelled":

What should happen if the code in async fails with a DeferredCancellationException?

@elizarov
Copy link
Contributor

If you are using deferred.await() in our code and you want to distinguish the case where the coroutine you're awaiting for was cancelled or not, you already can do it by explicitly checking deferred.isCancelled. However, I don't think the need to make this distinction happens very often.

IMO, the great takeaway from the original post on the "The Silent Killer" is this pattern:

try {
    doStuff()
} catch (error: Throwable) {
    coroutineContext.ensureActive()
    handleError(error)
}

That should be the recommended pattern to use instead of trying to catch CancellationException and ignoring it. That is a clear documentation thing, but we might also consider doing some utilities to help with it.

However, there is also one thing that can be fixed in the coroutines library itself in a backwards compatible way. The current state of affairs is that the coroutine that dies by a rogue CancellationException indeed does so completely silently, even though it is not cancelled itself. This can be clearly demonstrated by the following code:

suspend fun main() {
    val cancelledDeferred = CompletableDeferred<Int>()
    cancelledDeferred.cancel()
    supervisorScope {
        // This coroutine fails in a regular way
        launch(CoroutineName("one")) {
            error("Regular failure")
        }
        // This coroutine fails while waiting on cancelled coroutine
        launch(CoroutineName("two")) {
            cancelledDeferred.await()
        }
    }
}

Playground link

Neither coroutine "one" nor coroutine "two" is cancelled, both fail with exception, yet only the exception for the failure of the first coroutine is shown. We can consider changing this logic to report that the second coroutine has failed with CancelletationException, too.

@dovchinnikov
Copy link
Contributor

The current state of affairs is that the coroutine that dies by a rogue CancellationException indeed does so completely silently, even though it is not cancelled itself

It can be handled by treating CE as failure if the current coroutine was not actually cancelled.
The coroutine can be cancelled concurrently, though, but the concurrent cancellation behaviour with a caught CE should be equivalent to the behaviour with a caught error("Regular failure").

Treating CEs in a non-cancelled coroutine as failure would imply that the manual cancellation should be done differently:

// instead of this:
launch {
  throw CancellationException() // would be a failure
}

// one would need to use this:
launch {
  coroutineContext.cancel()
  coroutineContext.ensureActive() // this CE will be caught in cancelled state -> not a failure
}

Now, a function for cancel() + ensureActive() which returns Nothing is useful regardless of how this would be resolved: cancel() (comparing to throw CancellationException()) triggers the cancellation of the child coroutines right away without waiting for CE to be caught by the coroutine.

@wkornewald
Copy link

If you are using deferred.await() in our code and you want to distinguish the case where the coroutine you're awaiting for was cancelled or not, you already can do it by explicitly checking deferred.isCancelled. However, I don't think the need to make this distinction happens very often.

IMO, the great takeaway from the original post on the "The Silent Killer" is this pattern:

try {
    doStuff()
} catch (error: Throwable) {
    coroutineContext.ensureActive()
    handleError(error)
}

That should be the recommended pattern to use instead of trying to catch CancellationException and ignoring it. That is a clear documentation thing, but we might also consider doing some utilities to help with it.

Please don't make that the recommended pattern. The point of CancellationException is that it's the only exception that safely bubbles up to the next checkpoint (be that launch or some try block with a custom subclass of CancellationException). This bubbling behavior is very important. The proposed pattern is bad because intermediate try-catch layers will believe there is an error instead of controlled backtracking. We use CancellationException as a signal for safe abortion in our codebase because that's the only exception which gets actively ignored by intermediate code layers.

The proper solution is to fix the APIs where possible and introduce new fixed APIs where backwards compatibility is needed. The ensureActive() workaround is imprecise and not a true causal relationship.

@revintec
Copy link

revintec commented Jul 19, 2023

I was debugging something like this for two days(in a production environment, had to add many log/counter and deploy and wait a reproducer, do that again for at least 20 times, VERY bad experience), even suspected at one point that it was a compiler BUG...

// simplified reproducer, the original is more complicated
// thus it's even harder to debug such kind of control-flow issue
suspend fun structuredConcurrency(toBeProcessedData:Collection<Any>){
    coroutineScope{
        for(data in toBeProcessedData)
            launch{handleData(data)}
    }
    println("all data was processed")
}

the all data was processed got printed so I assume that all data was processed(seriously, is this code looks anything wrong for anybody?
but guess what, it's not, and NO EXCEPTIONS IS LOGGED! it looks like a mysterious break was in the for loop, the coroutine just vanishes... no return, no exception, how is that possible...

well, this is because deep down in the handleData(data) callstack, someone introduced withTimeout... a reasonable change, but blow up the whole thing :(

the point is, throwing CancellationException IS NOT THE SAME as cancelling a coroutine! ANYTHING CAN THROW ANY EXCEPTION, isn't that the point of structured concurrency? if we call something, it has to EITHER RETURN OR THROW, not vanishing without a trace...

Anyway I diverge, I agree with @elizarov "We can consider changing this logic to report that the second coroutine has failed with CancelletationException, too", and here's my advice:

  1. when cancelling a coroutine, the automatically produced CancellationException should have a private mark on it / or the cancelling coroutine should have a private mark of the current propagating CancellationException
  2. if the marked CancellationException propagated, and is in the correct structured concurrency way, keep the current behavior
  3. if any other exception propagated, or if the afore mentioned CancellationException is not propagating in the correct structured concurrency way(for example the case mentioned in the original post, or the user grab that reference and throwing them in another coroutine etc), treat them as error(throw outwards and don't swallow them
  4. please don't complicate things by introducing Result or un-subclassing TimeoutCancellationException from CancellationException(backwards incompatible and very hard to find in tons of catch(e:CancellationException){...}

@dkhalanskyjb
Copy link
Contributor Author

@revintec I think what you describe is a different issue: #1374

@throwable-one
Copy link

Here are my 5 cents.

When you are waiting for coroutine you might face two different states:

  1. Coroutine completed and returned a value
  2. Coroutine was stopped. Be it error or scope cancellation: it doesn't matter.

Should I expect the latter?
It depends on how my code is structured.

suspend fun doAll() {
  coroutineScope { 
    async { 
      
    }.await() // I do not want to catch CE here, it doesn't make sense.
  }
}

But from the other hand:

suspend fun welcomeUser(getUserNameProcess:Deferred<String>) {
  try {
    val userName = getUserNameProcess.await()
    println("...$userName")
  }catch (_:CancellationException) {
    println("Failed to get user name, it seems you just cancelled this process!")
  }
}

See: coroutines are like processes. Sometimes you do not expect them to die (if this process
is an integral part of your application). Sometimes you are ok with it (if they are really "external" processes).

I'd say that CE shouldn't cross the scope boundaries. If you are waiting for Job from another scope -- your own scope shouldn't be affected by foreign CE.
Imagine your process killed because you waited for some other process that got SIGKILL.

I wish we had

sealed class AwaitResult<T> {
  abstract fun unwrap(): T
  class Stopped<T>(val e: CancellationException) : AwaitResult<T>() {
    override fun unwrap(): T = throw e
  }

  class Ok<T>(val v: T) : AwaitResult<T>() {
    override fun unwrap(): T = v
  }
}

suspend fun <T> Deferred<T>.newAwait(): AwaitResult<T> = ; /**/

suspend fun doAll() {
  coroutineScope {
    async {

    }.newAwait().unwrap() // I do not want to catch CE here, it doesn't make sense.
  }
}

suspend fun welcomeUser(getUserNameProcess: Deferred<String>) {
  // I do care about result here. I am not interested in catching foreign CE though
  when (val userName = getUserNameProcess.newAwait()) {
    is AwaitResult.Ok -> println("Welcome $userName")
    is AwaitResult.Stopped -> println("Failed to get user name, it seems you just cancelled this process!")
  }
}

@dkhalanskyjb
Copy link
Contributor Author

@throwable-one, it seems your problem is not with how exceptions are handled in the library overall but with the specific behavior of await. Here's one way to implement the function you want:

@OptIn(ExperimentalCoroutinesApi::class)
suspend fun <T> Deferred<T>.newAwait(): AwaitResult<T> {
    join()
    return when (val exception = getCompletionExceptionOrNull()) {
        null -> AwaitResult.Ok(getCompleted())
        is CancellationException -> AwaitResult.Stopped(exception)
        else -> throw exception
    }
}

dkhalanskyjb added a commit that referenced this issue Feb 12, 2024
Fixes two issues:
* It is surprising for some users that the same exception can be
  thrown several times. Clarified this point explicitly.
* Due to #3658, `await` can throw `CancellationException` in
  several cases: when the `await` call is cancelled, or when the
  `Deferred` is cancelled. This is clarified with an example of
  how to handle this.

Fixes #3937
dkhalanskyjb added a commit that referenced this issue Feb 13, 2024
Fixes two issues:
* It is surprising for some users that the same exception can be
  thrown several times. Clarified this point explicitly.
* Due to #3658, `await` can throw `CancellationException` in
  several cases: when the `await` call is cancelled, or when the
  `Deferred` is cancelled. This is clarified with an example of
  how to handle this.

Fixes #3937
@francescotescari
Copy link
Contributor

2 cents on the design, please destroy everything I say.

AFAIU CancellationException is just a signal to terminate a cancelled coroutine promptly without breaking the exceptions effect system (aka run the finally blocks). If there were no exceptions, then a coroutine could be terminated by simply suspending it without resuming it.
However, currently kotlinx coroutines

  • uses CancellationException not only as a signal, but also as a result (the result of a cancelled deferred coroutine can be CancellationException).
  • doesn't check for the origin of the CancellationException (even me manually throwing a CancellationException can result in a silent coroutine death)
fun main() = runBlocking {
    launch {
        throw CancellationException()
    }
    println("Hello, world!")
}

Kotlinx coroutines also encodes three possible states:

  • Finished(success)
  • Finished(failure)
  • Finished(cancelled)

with a single cause: Throwable? where the cancelled and failure states are unionized in the cause != null (cancelled=cause is CancellationException). In practice it makes sense, as the result of a couroutine can be technically an exception or a result, but IMHO it's important to remember that we use an exception just because we are forced to do so just for the cancellation signal to not break the exception system. Once that the exception reaches the coroutine it should be unpacked to its actual meaning: is this the cancellation signal? then the coroutine state changes to cancelled. Is this not the cancellation signal? The the coroutine state changes to completed exceptionally. Again, if there were no exceptions and we implemented cancellation by just suspending the coroutine, we would likely have the cancelled state as a seperate one instead of encoding it as cause is CancellationException.

If the above makes sense, then kotlinx coroutines could:

  • not use CancellationException as the result of a deferred coroutine. For example, if the deferred coroutine is cancelled, then await could throw a CancelledCoroutineException (a failure).
  • add a way to identity the correct cancellation signals, for example, adding a token to each CancellationException which allows the coroutine completed with it to verify the token and update the state in response.

In an ideal world, CancellationException could be an internal implementation detail of kotlinx coroutines which is used to implement the signal transparently. In practice, developers catching Throwable will always also catch the CancellationException, so they have to rethrow it.

Instead of

interface Job {
    public fun cancel(cause: CancellationException? = null)
}  

we could have

internal class CancellationException: Throwable

interface Job {
    public fun cancel(message: String? = null, cause: Throwable? = null)
}

fun Throwable.rethrowIfCancellationSignal() {
  if (this is CancellationException) { throw this }
}

The state of the coroutine conceptually would be:

sealed class State {
    data class Completed(val value: Any?): State()
    data class Failed(val cause: Throwable): State()
    data class Cancelled(val message: String, val cause: Throwable?): State()
}

and Deferred.await would do

fun await() = when(state) {
    is State.Cancelled -> throw CoroutineCancelledException(state.message, state.cause)
    is State.Completed -> state.value
    is State.Failed -> throw state.cause
}

This is just to say that IMHO this should not be "solved" by the developer using the coroutineContext.ensureActive() pattern, which is just a way of condensing the following logic:

  • if this coroutine is cancelled, suppress the other exception or cancellation signal incoming and replace it with my cancellation signal.
  • if this coroutine is not cancelled, then the exception is either a failure or a cancellation signal of another coroutine (how did it get there though!), handle them separately.

This is not a great pattern IMHO, and should be handled by the library if possible, even if getting there with backwards compatibility could be painful.

@dkhalanskyjb
Copy link
Contributor Author

dkhalanskyjb commented Feb 14, 2024

Overall, I agree with your points.

should be handled by the library if possible, even if getting there with backwards compatibility could be painful.

This is the main problem. Yes, this would be painful, to the point I personally would call hopeless. kotlinx-coroutines guarantees, for example, that if library A depends on B, where B only uses stable APIs, and C depends on kotlinx-coroutines, then upgrading the version of kotlinx-coroutines should not require recompiling the library B. So, even if we somehow magically manage to make all code automatically behave strictly better after it's recompiled, there's a whole world of binary compatibility headaches of the form "what if A sends a Deferred to B," or "what if B returns a Deferred to A," or "what if B calls cancel, but A checks the cancellation exception," and so on, and so on. There's a plethora of interactions we'd need to consider to even hope to do this correctly. By my estimates, this would amount to dropping everything else we're doing and brainstorming just this task, and even then, there wouldn't be a guarantee that we'd be able to evolve this (really ubiquitous) API.

If there ever is a kotlinx.coroutines 2.0 that is allowed to break compatibility freely (NB: I don't mean indiscriminately), I'm certain this is an area we'd do differently, more in line with what you're saying.

if this coroutine is cancelled, suppress the other exception or cancellation signal incoming and replace it with my cancellation signal.

I agree that the pattern proposed above is suboptimal. Here's a better one:

try {
  // do anything
} catch (e: Throwable) {
  if (e is CancellationException) currentCoroutineContext().ensureActive()
  handleException(e)
}

The pattern without if (e is CancellationException) isn't in line with the prompt cancellation guarantee that a lot of our functions give. In essence, prompt cancellation states that cancellation of the calling coroutine takes precedence over normal values, but not over exceptions. Without the if, the cancellation exception really does suppress the other exception, which is inconsistent and probably non-desirable.

dkhalanskyjb added a commit that referenced this issue Feb 15, 2024
Fixes two issues:
* It is surprising for some users that the same exception can be
  thrown several times. Clarified this point explicitly.
* Due to #3658, `await` can throw `CancellationException` in
  several cases: when the `await` call is cancelled, or when the
  `Deferred` is cancelled. This is clarified with an example of
  how to handle this.

Fixes #3937
caiodev added a commit to caiodev/GithubProfileSearcher that referenced this issue May 9, 2024
…lling Coroutines {

    https://betterprogramming.pub/the-silent-killer-thats-crashing-your-coroutines-9171d1e8f79b
    Kotlin/kotlinx.coroutines#3658 (comment)
    https://kotlinlang.org/docs/exception-handling.html
}

- Create the "Neutral" state for when CancellationExceptions are wrongfully launched (Refer to the above links to know when and how that happens)

- Add an Architecture that uses both By Layer and By Feature separation of concerns (https://www.youtube.com/watch?v=16SwTvzDO0A)

- Refactor dependencies between modules

- Remove datasource dependency from midfield (a.k.a domain)

- Create specific feature modules inside each base layer (ui, midfield (domain) and datasource (data/model))

- Bump dependency versions to newer ones

- Turn common dependencies into bundles

- Replace multiple dependency references with Bundles

- Refactor code

- Remove all occurences of suspend functions being executed using runBlocking from the project

- Remove runBlocking from the project

- Remove fundamental features (Pagination, Upper views management etc...) (Temporarily until I implement them back, but in Compose)

- Create specific modules for both JVM and Integrated/UI tests (The old way, JVM tests had access to Integrated/UI test dependencies and vice-versa, now that's sorted out)

- Split Remote datasource models from Local datasource models (Then, there would be only one class having both kotlin.serialization and Room annotations)

- Create mapping from Remote datasource model to Local datasource model (Entity)

- Create mapping from Local datasource model to clean model (No external dependencies referenced) (Plain Old Kotlin Object)

KNOWN ISSUES:

1- There's no pagination anymore (It will be re-implemented in the future using Paging Compose)
2- There may be UI inconsistencies when CancellationExceptions are launched intentionally or not (Will be addressed in the future)

FEATURES:

No new features have been introduced
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants