Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[builds] Print Java process stack traces of stalled builds #389

Closed
wants to merge 1 commit into from

Conversation

uce
Copy link
Contributor

@uce uce commented Feb 12, 2015

After discussion with @rmetzger we came up with the following idea to manually monitor stalled builds and to print the Java process stack traces if they do stall.

Furthermore, a stalling build is killed with a != 0 exit code and really fails the Travis build instead of just timing it out.

For testing, I've a branch where I block a test on purpose. The output can be found here: https://travis-ci.org/uce/incubator-flink/jobs/50491750

==============================================================================
Maven produced no output for 300 seconds.
==============================================================================
==============================================================================
The following Java processes are running (JPS)
==============================================================================
21670 Jps
12703 surefirebooter5726841031685552603.jar
2220 Launcher
==============================================================================
Printing stack trace of Java process 12703
==============================================================================
2015-02-12 14:14:42
Full thread dump OpenJDK 64-Bit Server VM (23.25-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f4e18001000 nid=0x54ed runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Low Memory Detector" daemon prio=10 tid=0x00007f4e64112000 nid=0x31d2 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f4e64110000 nid=0x31ce waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f4e6410d000 nid=0x31cd waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f4e6410b000 nid=0x31cb runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f4e640b8800 nid=0x31c8 in Object.wait() [0x00007f4e32fee000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000d23275e0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
    - locked <0x00000000d23275e0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

"Reference Handler" daemon prio=10 tid=0x00007f4e640b6800 nid=0x31c7 in Object.wait() [0x00007f4e330ef000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000d23273a0> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x00000000d23273a0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f4e64009800 nid=0x31a4 in Object.wait() [0x00007f4e6b98c000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000f1c81e18> (a org.apache.flink.api.common.io.BinaryInputFormatTest)
    at java.lang.Object.wait(Object.java:502)
    at org.apache.flink.api.common.io.BinaryInputFormatTest.testCreateInputSplitsWithOneFile(BinaryInputFormatTest.java:51)
    - locked <0x00000000f1c81e18> (a org.apache.flink.api.common.io.BinaryInputFormatTest)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

"VM Thread" prio=10 tid=0x00007f4e640a7800 nid=0x31c2 runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f4e64017000 nid=0x31a8 runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f4e64019000 nid=0x31ac runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f4e6401b000 nid=0x31af runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f4e6401c800 nid=0x31b1 runnable 

"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007f4e6401e800 nid=0x31b3 runnable 

"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007f4e64020800 nid=0x31b5 runnable 

"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007f4e64022000 nid=0x31b7 runnable 

"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007f4e64024000 nid=0x31b9 runnable 

"GC task thread#8 (ParallelGC)" prio=10 tid=0x00007f4e64026000 nid=0x31bb runnable 

"GC task thread#9 (ParallelGC)" prio=10 tid=0x00007f4e64027800 nid=0x31be runnable 

"GC task thread#10 (ParallelGC)" prio=10 tid=0x00007f4e64029800 nid=0x31bf runnable 

"GC task thread#11 (ParallelGC)" prio=10 tid=0x00007f4e6402b800 nid=0x31c0 runnable 

"GC task thread#12 (ParallelGC)" prio=10 tid=0x00007f4e6402d800 nid=0x31c1 runnable 

"VM Periodic Task Thread" prio=10 tid=0x00007f4e6411c800 nid=0x31d4 waiting on condition 

JNI global references: 146

The stack trace is very helpful in pointing to the problem for stalled builds.

@rmetzger
Copy link
Contributor

Very nicely implemented! My bash scripts are certainly uglier ;)

+1 for merging it

@tillrohrmann
Copy link
Contributor

I really like this feature. Will make all our lives less frustrating ;-)

On Thu, Feb 12, 2015 at 3:33 PM, Robert Metzger notifications@github.com
wrote:

Very nicely implemented! My bash scripts are certainly uglier ;)

+1 for merging it


Reply to this email directly or view it on GitHub
#389 (comment).

@uce
Copy link
Contributor Author

uce commented Feb 16, 2015

Subsumed by #407.

@uce uce closed this Feb 16, 2015
@uce uce deleted the travis branch February 23, 2015 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants