Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't join a canceled task #64

Closed
blastrock opened this issue May 31, 2017 · 21 comments
Closed

Can't join a canceled task #64

blastrock opened this issue May 31, 2017 · 21 comments

Comments

@blastrock
Copy link

Hi,

In the following code, I want to make sure my code blocks until the task is really finished:

        val job = async(CommonPool) {
            while (isActive);
            Thread.sleep(1000)
            System.out.println("end of async")
        }

        job.cancel()
        job.join()
        System.out.println("end of test")

The result is that I see "end of test" instantly, and "end of async" appears a second after, even though I asked to join() the task. Am I misusing the library or is this a bug?

@cy6erGn0m
Copy link
Contributor

According to join() it only waiting for completion. Job and running coroutine are not the same things and have different life-cycles. Once you cancel job it becomes completed: you can see it if you add job.invokeOnCompletion { println("completed") }. However even after job completion a running coroutine could continue execution.

@cy6erGn0m
Copy link
Contributor

@elizarov However I see why it is confusing as async {}, run {} and launch {} looks like a thread {} while job.join() looks like thread.join() this is why users get confused.

@cy6erGn0m
Copy link
Contributor

cy6erGn0m commented Jun 2, 2017

@blastrock so to get execution synchronized you can need some different way, consider the following


import kotlinx.coroutines.experimental.*
import kotlinx.coroutines.experimental.channels.*
import java.util.concurrent.*

fun main(args: Array<String>) {
    runBlocking {
        val c = RendezvousChannel<Boolean>()

        val job = async(CommonPool) {
            while (isActive);

            Thread.sleep(1000)

            System.out.println("end of async")
            c.send(isActive)
        }

        job.invokeOnCompletion {
            println("completed")
        }

        delay(1, TimeUnit.SECONDS)
        job.cancel()

        c.receive()

        System.out.println("end of test")

        delay(1, TimeUnit.SECONDS)
    }
}

Notice that it this solution is not the same as join()

@elizarov
Copy link
Contributor

elizarov commented Jun 2, 2017

The difference in behaviour between Thread.join and Job.join is indeed confusing. Thanks for pointing it out. Maybe we should find a better name.

@blastrock
Copy link
Author

Thanks for the answer.

If I may, even if you rename join to make it less confusing, I think a join() API would be very useful. I came up with a workaround similar to this:

        @Volatile
        var keepRunning = true
        val job = async(CommonPool) {
            while (keepRunning);
            Thread.sleep(1000)
            System.out.println("end of async")
        }

        keepRunning = false
        job.join()
        System.out.println("end of test")

Both your workaround and mine seem cumbersome to me.

It happens very often to me that I need to cancel a coroutine and be sure it is finished to ensure there is no race condition, for example when I am shutting down, or if I want to start a new coroutine which will access the same data as the one that is running.

I think the join() primitive would be very useful with cancelation, and probably not much harder to implement than what you have now.

@elizarov
Copy link
Contributor

elizarov commented Jun 5, 2017

The reason launch/async behaves like it is now was intentional to make the state of Job/Deferred objects simpler. Right now, there have essentially two states: active and completed, and a join/await wait for completion. Introducing a join operation that waits until the coroutine has finished it execution (let's call it a terminated state) makes their state more complex. A new state appears -- "cancelled by not yet terminated". I'm not convinced yet that the presented use cases warrant the complication of API that is already not so simple.

However, the workarounds presented so far are not ideal either. Channel is too heavy-weight for such light-weight task as tracking job termination. I'll think on how it can be done easier.

@elizarov
Copy link
Contributor

elizarov commented Jun 5, 2017

@blastrock As a better workaround for cases where you need to wait until the task is fully terminated I'd suggest the following utility function:

fun launchWithTermination(
    context: CoroutineContext,
    block: CoroutineScope.() -> Unit
): Job {
    val terminated = Job()
    launch(context) {
        try {
            block()
        } finally {
            terminated.cancel()
        }
    }
    return terminated
}

You can use it instead of launch and the Job that it return completes when the code is really finished. You cannot cancel the running coroutine though the resulting Job object, though. It is used only to signal termination. You can still cancel it via the parent Job in the context.

Note, that is the next release of kotlinx.coroutines (version 0.16) you'll need to change launch(context) in the above code to launch(context, CoroutineStart.ATOMIC) to make sure its body starts to execute even if it is cancelled before start.

@cy6erGn0m
Copy link
Contributor

@elizarov exception also need to be propagated, so it requires a lot of preparation to get this fake job work as intended

@elizarov
Copy link
Contributor

elizarov commented Jun 5, 2017

I would also highly recommend using an actor for the case where you need to ensure that at most one action is running at the same time. However, changing join behavior so that it waits until the job fully finishes on cancel is also on the table (with the downside of having a more complicated state machine).

@blastrock
Copy link
Author

Indeed it would introduce a new state in the Job class, but maybe this state doesn't need to be exposed. I have yet to find a use-case, in kotlin or another language, where I need to know if (or when) a job is "completed", I usually only need to know when it is "terminated" (as you called it). Of course the "canceled but running"-state must still be exposed inside the coroutine through the isActive call.

Maybe I can make a wrapper with a workaround, but maybe I would need to expose my own Job class to do that. I'll think about it.

As for the Actor pattern, I don't think I can implement job cancelation easily with it. But the one-task-at-a-time guarantee is there indeed.

@elizarov
Copy link
Contributor

elizarov commented Jun 26, 2017

I've pushed the corresponding changes to the develop branch. Now jobs (including Deferred) have an additional cancelling state. So, on invocation of cancel they go into cancelling state. isActive starts returning false, but isComplete is still false in this state. Both join and await wait for completion, that is they wait until the coroutine completes it execution.

It is important to note, that in current implementation a cancelled coroutine is doomed to complete exceptionally. Even if CancellationException is caught inside a coroutine and is replaced with any kind of result, it will still go into cancelled state on completion.

Note, that the naming of various states (see docs on Job and Deferred) are still under consideration and may change: https://github.com/Kotlin/kotlinx.coroutines/blob/develop/kotlinx-coroutines-core/src/main/kotlin/kotlinx/coroutines/experimental/Job.kt

@Nemikolh
Copy link

@elizarov Thanks a lot for working on this issue, it is truly appreciated. I am also facing a similar issue.
Do you know when would a new release with this new change would come?

I'm trying to figure out whether or not it's best for us to wait on this fix or use one of the workaround.

@elizarov
Copy link
Contributor

I plan to get it released soon (this or next week)

@kingsleyadio
Copy link
Contributor

@elizarov I'm thinking. Maybe we don't need to change the behavior of join. Or perhaps, maybe there could be another method that continues to behave like the current implementation.
The early return in the current implementation could sometimes be desirable.

Meanwhile. What will be the behavior in both situations below with the new implementation.

  1. Join immediately after cancel
  2. Cancel immediately after join

Thank you.

@elizarov
Copy link
Contributor

elizarov commented Jul 18, 2017

The reason to change the implementation of join and await is two-fold. One is consistency with Thread.join as was mentioned above, but that would not be so compelling if not for the other. The other is to ensure that async(context) { doSomething() }.await() is really working just as run(context) { doSomething() } without any spurious concurrency on cancellation. This is what really bothers me in the current (version <= 0.16) implementation of join and await, it is that when I cancel the coroutine, then the other coroutine that was waiting for it is going to be executed concurrently with whatever code that was still running in the coroutine that was cancelled, but has not completed just yet. This concurrency would be so rare, that it would be a source of very hard-to-diagnose bugs.

@elizarov
Copy link
Contributor

Released in version 0.17

@matejdro
Copy link

matejdro commented Jan 19, 2018

Is there a way to get old behavior in the new version? Maybe with another method instead of await/join?

For example if I'm wrapping coroutine around blocking task that is outside my control (task itself cannot be cancelled mid-execution) and I cancel the coroutine, I would want to return execution immediately and let that dangling blocking function just finish in the background instead of blocking everything.

@elizarov
Copy link
Contributor

You can just use cancel. It does not block. If you don't want to wait, then just don't invoke await.

@matejdro
Copy link

Yes, but what if cancel is triggered by another thread?

For example, on Android, coroutine is await-ing in the background, but in the meantime, user closes down the application, so cancel is called from Android UI thread. I want to cleanup as fast as possible instead of waiting for that to finish.

@elizarov
Copy link
Contributor

elizarov commented Jan 20, 2018

You can install callback that gets triggered as soon as cancel is invoked with job.invokeOnCompletion(onCancelling = true) { ... }. Using this callback you can define an extension that waits for a job to become cancelling (as soon as cancel is invoked) in the usual way:

suspend fun Job.awaitCancelling() = 
    suspendCancellableCoroutine<Unit> { cont -> 
        val handle = this@awaitCancelling.invokeOnCompletion(onCancelling = true) {
            cont.resume(Unit) 
        }
        disposeOnCompletion(handle)
    }

@matejdro
Copy link

that sounds like good workaround. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants