Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack trace recovery #74

Closed
wetnose opened this issue Jun 19, 2017 · 34 comments
Closed

Stack trace recovery #74

wetnose opened this issue Jun 19, 2017 · 34 comments
Assignees
Milestone

Comments

@wetnose
Copy link

wetnose commented Jun 19, 2017

Look at the example below

fun main(args: Array<String>) = runBlocking {
    xxx()
}


suspend fun xxx() {
    delay(100)
    throw Exception()
}

The execution of the code results to an exception with the incomplete stack trace.

Exception in thread "main" java.lang.Exception
	at TestKt$xxx$1.doResume(Test.kt:20) // throw Exception()
	at kotlin.coroutines.experimental.jvm.internal.CoroutineImpl.resume(CoroutineImpl.kt:54)
        ....
	at TestKt.main(Test.kt:13) //  runBlocking

The stack is starting from the bridge function and completes with the last continuation. All the intermediate calls are missing.

The problem makes it difficult to debug the app because it is not possible to determine the true continuation caller. The stack trace of an exception, thrown from a coroutine, becomes useless.

@elizarov
Copy link
Contributor

This is indeed a known issue on our plate. Thank you for starting the thread, though. I'm curious what kind of stack-track you would personally prefer to see in this case?

@wetnose
Copy link
Author

wetnose commented Jun 19, 2017

I want to see pointers to actual code lines with function calls. In my example above, the pointer to the line with call of xxx function is missing. If expand this example with a chain of calls (xxx -> yyy -> zzz -> suspend/resume -> throw Exception) then all the pointers will be missing in the stack trace excepting the last one which throws the exception.

@wetnose
Copy link
Author

wetnose commented Jun 20, 2017

One more thing. I think the problem is critical for Kotlin coroutines project: non-informative stack traces make debugging of coroutines too expensive as well as supporting of a program that uses coroutines. From my point of view the main goals of coroutines is to make asynchronous programming both easier and cheaper. But with expensive debugging this goal cannot be achieved.

@elizarov
Copy link
Contributor

It is indeed important. Kotlin coroutines are already easier than alternative async solutions which are all notoriously known for their hard-to-decipher stack traces, but we can also make Kotlin coroutines stack traces better. It is not a straightforward thing to do on JVM and will likely come at a considerable performance penalty (to be paid when exception is being propagated though coroutine), so the current idea is to make it available under a debug system property.

@wetnose
Copy link
Author

wetnose commented Jun 20, 2017

I think in the beginning the cost does not matter, especially that it will be optional.

@molixiaoge
Copy link

The serious thing is the main func immediately exit, when some func throw exception!

Example Codes:

fun main(args: Array<String>) = runBlocking {
    println("start")
    xxx()
    println("end")
}


suspend fun xxx() {
    delay(100)
    throw Exception()
}

OutPuts:

start
Exception in thread "main" java.lang.Exception
at MainKt$xxx$1.doResume(main.kt:97)
at kotlin.coroutines.experimental.jvm.internal.CoroutineImpl.resume(CoroutineImpl.kt:54)
at kotlinx.coroutines.experimental.DispatchedContinuation$resume$1.run(CoroutineDispatcher.kt:152)
at kotlinx.coroutines.experimental.EventLoopImpl$Dispatch.run(EventLoop.kt)
at kotlinx.coroutines.experimental.EventLoopImpl.processNextEvent(EventLoop.kt:90)
at kotlinx.coroutines.experimental.BlockingCoroutine.joinBlocking(Builders.kt:137)
at kotlinx.coroutines.experimental.BuildersKt.runBlocking(Builders.kt:90)
at kotlinx.coroutines.experimental.BuildersKt.runBlocking$default(Builders.kt:82)
at MainKt.main(main.kt:101)

Process finished with exit code 1

@fvasco
Copy link
Contributor

fvasco commented Jun 21, 2017

This is a typical thing really ugly to do, but very nice to have.

Manipulate stack trace looks hard, perhaps a cheaper way is to improve the current coroutine debugging facilities and provide this information only when DEBUG is enabled.

Idea for stack trace:

Exception in thread "main" java.lang.Exception
    at MainKt$main$1.doResume(main.kt:82)    
caused by: java.lang.Exception
    at MainKt$xxx$1.doResume(main.kt:97)
    ...

@fvasco
Copy link
Contributor

fvasco commented Jun 21, 2017

I reply myself: we cannot replace the original exception with a new one for many reason: primary the Exception constructor is not public.

Moreover using cause exception looks wrong.

May be a good idea to annotate correlate exceptions (the main stack trace and so on) using Java 7 suppressed exceptions?

@elizarov
Copy link
Contributor

@molixiaoge runBlocking works this way by design. You should use some other coroutine builder (look at launch and async) if you need different behavior.

@elizarov
Copy link
Contributor

elizarov commented Jun 21, 2017

@fvasco There are two basic approach to fix stack-traces in the example presented by @wetnose:

One approach is to simply modify the stack trace in exceptions that are thrown from inside of coroutines. This could be done by a byte-code modification debug agent that you would attach to your JVM for better debugging of coroutines and would generally incur additional cost only when exceptions are being thrown. We could do anything we want, e.g, for example, keep some nice dividers showing both the actual Java stack and the coroutine stack (that still lives on the heap)

The other approach is to provide some "debug" mode for Kotlin compiler that would emit byte code in such a way, that Java stack of a coroutine is fully restored on coroutine resume (similar to the way Quasar does it). However, this would make resumption of any coroutine a more eager, and thus more costly, procedure.

@fvasco
Copy link
Contributor

fvasco commented Jun 21, 2017

Regarding the second option: is it possible to produce the required bytecode but use a flag to enable it?
If it is false then the JVM should prune out the dead code.

@elizarov Thanks for clarification, both looks really better solutions.

@elizarov
Copy link
Contributor

That would considerable increase byte code size (or so it seems now). I'd rathe produce minimal byte code, but have a byte-code instrumenting agent that adjusts for better stack-traces in debug mode... That is all just stipulation so far and does require research.

@wetnose
Copy link
Author

wetnose commented Jun 21, 2017

@elizarov, the 2nd solution allows us to analyze stack traces from logs as well, so I prefer it.

@elizarov
Copy link
Contributor

@wetnose The first, too. It does not really matter. The end result is that exceptions will have the correct stack traces (how they got them is an implementation detail)

@wetnose
Copy link
Author

wetnose commented Jun 21, 2017

@elizarov, is this feature already planned to be included in an incoming lib version?

@fvasco
Copy link
Contributor

fvasco commented Jun 21, 2017

That would considerable increase byte code size

I suspected it, it is the same case of debug symbols and probably configure a flag to turn in on/off is the simplest part of the work.

Now Quasar use live instrumentation or AOT instrumentation, so it covers both scenarios.

@elizarov
Copy link
Contributor

@wetnose We are planning to work on the coroutine status & stack-trace recovery during this summer. Stay tuned.

@elizarov
Copy link
Contributor

elizarov commented Jul 24, 2017

Here is an update on this issue. Under the most recent version of Kotlin compiler (1.1.4-eap-33) the topic-starter's example code produces the following stack-trace:

Exception in thread "main" java.lang.Exception
	at TestKt.xxx(Test.kt:10)
	at TestKt$xxx$1.doResume(Test.kt)
...
	at TestKt.main(Test.kt:4)

As you can see, named suspending functions now execute directly in the the correspondingly named functions on JVM, so the top of the exception stack-trace directly refers to the name of the corresponding suspending function that threw exception in the source code.

@elizarov
Copy link
Contributor

I'm closing this particular issue since 1.1.4 was just released

@Groostav
Copy link
Contributor

I mean, this recovers a little bit of the data, but in terms of the instrimentation solution we were talking about this doesn't even come close...

consider:

    //this is line 100 in my test suite
    @Test fun `entry-point`() = runBlocking {

        PlatformImpl.startup{}

        val event = 1

        val result = run(JavaFx){
            doUIWork(event)
        }

        println("got $result!")
    }

    suspend fun doUIWork(event: Int): Int{

        val value = event

        return run(CommonPool){
            doHeavyWork(value)
        }
    }

    suspend fun doHeavyWork(value: Int): Int{

        val innerVal = value

        val result = run(BlockableThreadPool.asCoroutineDispatcher()){
            doIOWork(innerVal)
        }

        return result
    }

    suspend fun doIOWork(value: Int): Int = TODO("value + 1")

outputs

kotlin.NotImplementedError: An operation is not implemented: value + 1

	at com.empowerops.language.KotlinCoroutinesFixture.doIOWork(KotlinCoroutinesFixture.kt:134)
	at com.empowerops.language.KotlinCoroutinesFixture$doHeavyWork$result$1.doResume(KotlinCoroutinesFixture.kt:128)
	at kotlin.coroutines.experimental.jvm.internal.CoroutineImpl.resume(CoroutineImpl.kt:54)
	at kotlinx.coroutines.experimental.DispatchTask.run(CoroutineDispatcher.kt:120)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

when it should output something resembling


kotlin.NotImplementedError: An operation is not implemented: value + 1

	at com.empowerops.language.KotlinCoroutinesFixture.doIOWork(KotlinCoroutinesFixture.kt:135)
	at com.empowerops.language.KotlinCoroutinesFixture$doHeavyWork$result$1.doResume(KotlinCoroutinesFixture.kt:128)
	at kotlin.coroutines.experimental.jvm.internal.CoroutineImpl.resume(CoroutineImpl.kt:54)
	at kotlinx.coroutines.experimental.DispatchTask.run(CoroutineDispatcher.kt:120)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

	at kotlin.cotoutines.experimental.intrinsics.Intrinsics.suspendCoroutineOrReturn(Intrinsicts.kt:45)
	at kotlinx.coroutines.experimental.BuildersKt.run(Builders.kt:90)

	at com.empowerops.language.KotlinCoroutinesFixture.doHeavyWork(KotlinCoroutinesFixture.kt:127)
	at com.empowerops.language.KotlinCoroutinesFixture$doUIWork$2.doResume(KotlinCoroutinesFixture.kt:119)
	at kotlin.coroutines.experimental.jvm.internal.CoroutineImpl.resume(CoroutineImpl.kt:54)
	at kotlinx.coroutines.experimental.DispatchTask.run(CoroutineDispatcher.kt:120)
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
	
	at kotlin.cotoutines.experimental.intrinsics.Intrinsics.suspendCoroutineOrReturn(Intrinsicts.kt:45)
	at kotlinx.coroutines.experimental.BuildersKt.run(Builders.kt:90)

	at com.empowerops.language.KotlinCoroutinesFixture.doUIWork(KotlinCoroutinesFixture.kt:118)
	at com.empowerops.language.KotlinCoroutinesFixture$entry-point$1$result$1.doResume(KotlinCoroutinesFixture.kt:108)
	at kotlin.coroutines.experimental.jvm.internal.CoroutineImpl.resume(CoroutineImpl.kt:54)
	at kotlinx.coroutines.experimental.DispatchTask.run(CoroutineDispatcher.kt:120)
	at com.sun.javafx.application.PlatformImpl.lambda$null$173(PlatformImpl.java:295)
	at java.security.AccessController.doPrivileged(Native Method)
	at com.sun.javafx.application.PlatformImpl.lambda$runLater$174(PlatformImpl.java:294)
	at com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
	at com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
	at com.sun.glass.ui.win.WinApplication.lambda$null$148(WinApplication.java:191)
	at java.lang.Thread.run(Thread.java:745)
	
	at kotlin.cotoutines.experimental.intrinsics.Intrinsics.suspendCoroutineOrReturn(Intrinsicts.kt:45)
	at kotlinx.coroutines.experimental.BuildersKt.run(Builders.kt:90)

	at com.empowerops.language.KotlinCoroutinesFixture$entry-point$1.doResume(KotlinCoroutinesFixture.kt:107)
	at kotlin.coroutines.experimental.jvm.internal.CoroutineImpl.resume(CoroutineImpl.kt:54)
	at kotlinx.coroutines.experimental.DispatchTask.run(CoroutineDispatcher.kt:120)
	at kotlinx.coroutines.experimental.EventLoopBase$QueuedRunnableTask.run(EventLoop.kt:189)
	at kotlinx.coroutines.experimental.EventLoopBase.processNextEvent(EventLoop.kt:129)
	at kotlinx.coroutines.experimental.BlockingCoroutine.joinBlocking(Builders.kt:225)
	at kotlinx.coroutines.experimental.BuildersKt.runBlocking(Builders.kt:150)
	at kotlinx.coroutines.experimental.BuildersKt.runBlocking$default(Builders.kt:142)
	at com.empowerops.language.KotlinCoroutinesFixture.entry-point(KotlinCoroutinesFixture.kt:101)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

notice I'm saying "at Thread.run at suspendCoroutineOrReturn" which is where I'm getting a little inventive. It could instead use the standard caused-by nested exception, but I think that just muddies things.

@elizarov
Copy link
Contributor

Unfortunately, that's way too expensive to capture a stack-trace on every launch of a new coroutine (that is the very last stack of your output). However, IDEA has "async stracktraces" features where it does capture this kind of information in debug mode.

Meanwhile, though instrumentation we are tackling a subset of this problem. In particular, stitching various context switched that happen inside the coroutine (the first parts of your desired stack).

@matejdro
Copy link

matejdro commented Jun 21, 2018

How about this one?

fun main(args : Array<String>) {
    runBlocking {
        getData()
    }
}

private suspend fun getData() {
    suspendCoroutine<Unit> { continuation ->
        thread {
            continuation.resumeWithException(Exception("Test Exception"))
        }
    }
}

Running this code will produce this exception:

Exception in thread "main" java.lang.Exception: Test Exception
	at TestKt$getData$2$1.invoke(Test.kt:14)
	at TestKt$getData$2$1.invoke(Test.kt)
	at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

This exception only works if getData() is called from one place in my application. Otherwise it is very unhelpful, since I have no way of knowing where this method was called.

This is especially frustrating when coroutines is wrapped around 3rd party library such as Google Play services's Task API. All Google Play code is completely obfuscated, so exceptions will contain only jibberish class names without any hint for where this code is called from.

However, IDEA has "async stracktraces" features where it does capture this kind of information in debug mode.

This is helpful if developer can reproduce the crash in the IDE. But more often than not crashes arrive via crash reporting tools where developer can only see the exception.

@cleishm
Copy link

cleishm commented Aug 17, 2018

I concur with the previous post - to be of any help, the stacktrace needs to include the point where the suspendCoroutine call was made (and ideally the stack leading up to that).

Is there no information about the original stack that can be attached to the exception when resumeWithException(e) is called?

@elizarov elizarov reopened this Aug 18, 2018
qwwdfsad added a commit that referenced this issue Sep 14, 2018
  * Agent automatically captures stacktrace when coroutine is created
  * Agent patches coroutine completion, so backtrace augmentation (#493) prepends creation stacktrace to the augmented stacktrace
  * Agent can be dynamically loaded and unloaded

Fixes #74
@SUPERCILEX
Copy link
Contributor

What's the status? (The PR was closed without a merge)

@qwwdfsad
Copy link
Collaborator

@SUPERCILEX It will be released in 1.1

@SUPERCILEX
Copy link
Contributor

Looks like the label needs to be updated then 🙂

@yaitskov
Copy link

Any progress on that. 1.3.50 gives stack trace as bad as it was.
Classic completable future looks works.

@elizarov
Copy link
Contributor

@yaitskov What is your use-case? The issue here was fixed a while ago. If you have something else, please report. Also, make sure you actually enable stack trace recovery: https://github.com/Kotlin/kotlinx.coroutines/blob/master/docs/debugging.md

@yaitskov
Copy link

[ERROR] koroChase(StackTraceChaseTest)  Time elapsed: 0.058 s  <<< ERROR!
java.lang.NullPointerException: ops x = 11 > 0
	at StackTraceChaseTest.c(StackTraceChaseTest.kt:20)
	at StackTraceChaseTest.b(StackTraceChaseTest.kt:17)
	at StackTraceChaseTest.a(StackTraceChaseTest.kt:16)
	at StackTraceChaseTest.koroChase(StackTraceChaseTest.kt:6)
import kotlinx.coroutines.runBlocking
import org.junit.Test

class StackTraceChaseTest {
    @Test
    fun koroChase() = runBlocking {  // line 6
        a(0)
        a(1)
        a(2)
        a(3)
        a(10)
        a(11)  // line 12. I want to see it in stack trace
        a(12)
    }

    suspend fun a(x: Int): Unit = b(x)  // line 16
    suspend fun b(x: Int): Unit = c(x)
    suspend fun c(x: Int) {
        if (x > 10) {
            throw NullPointerException("ops x = ${x} > 0")
        }
    }
}

My stack trace looks same with -ea or kotlinx.coroutines.stacktrace.recovery.

<kotlin.version>1.3.50</kotlin.version>
...
 <plugin>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
                    <argLine>-ea -Dkotlinx.coroutines.stacktrace.recovery=true</argLine>

I tried to pass garbage in argLine like -ae then java complains.

jdk1.8.0_211

$ mvn clean test  -Dtest=StackTraceChaseTest 

@elizarov
Copy link
Contributor

That's what I get on your examples. All the line numbers are correct:

java.lang.NullPointerException: ops x = 11 > 0
	at StackTraceChaseTest.c(Scratch.kt:20)
	at StackTraceChaseTest.b(Scratch.kt:17)
	at StackTraceChaseTest.a(Scratch.kt:16)
	at StackTraceChaseTest$koroChase$1.invokeSuspend(Scratch.kt:12) // <--- Line 12
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)

I'm running the latest versions:

  • Kotlin version 1.3.61.
  • Coroutines version 1.3.3.

I've tried with older versions, but I still cannot reproduce your problem. Can you share a self-contained maven project, please?

@yaitskov
Copy link

yaitskov commented Jan 4, 2020

@riflockle7
Copy link

riflockle7 commented Mar 24, 2020

I applied the question.
I wrote android test source code 

// AnyTest.kt

@Test
fun callStackCrop_a() = runBlocking {
  a(0)
  a(1)
  a(2)
  a(3)
  a(10)
  ***a(11)***   // 76 line
  a(12)
}

suspend fun a(x: Int): Unit = b(x)   // this action cropped....
suspend fun b(x: Int) = withContext(Dispatchers.IO) {
  c(x)      // 82 line
}
suspend fun c(x: Int) {
  if (x > 10) {
    throw NullPointerException("ops x = ${x} > 0")   // 86 line
  }
}

Test result is that

at ___.AnyTest.c(AnyTest.kt:86)
	at ___.AnyTest$b$2.invokeSuspend(AnyTest.kt:82)
	at |b|b|b(Coroutine boundary.|b(|b)
	at ___.AnyTest$koroChase$1.invokeSuspend(AnyTest.kt:76)
Caused by: java.lang.NullPointerException: ops x = 11 > 0
	at ___.AnyTest.c(AnyTest.kt:86)
	at ___.AnyTest$b$2.invokeSuspend(AnyTest.kt:82)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)

"Calling b() method at a()" is removed…..
I want to see all call stack trace in coroutine
how to recover this call stack.....? (both debug and release)

Attach a link with a similar problem

@elizarov
Copy link
Contributor

Yes. Tail calls to suspending functions are removed from stacktraces. We might consider recovering them in the future, but it is not in our near term plans.

@Anamorphosee
Copy link

Anamorphosee commented Nov 24, 2021

I faced the same problem. The Kotlin coroutine debug library didn't help me in any way. Therefore, after studying the implementation of coroutines, I wrote my own solution based on bytecode generation and MethodHandle API. It supports JVM 1.8 and Android API 26 or higher.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests