Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix debug launch thread hang using modified denyska patch. #148

Merged
merged 1 commit into from May 22, 2015
Merged

Fix debug launch thread hang using modified denyska patch. #148

merged 1 commit into from May 22, 2015

Conversation

phensley
Copy link
Contributor

I ran into the 57% hang bug with the latest versions of Eclipse and testng-eclipse, and have resolved it by applying a slightly-modified version of patch #36 mentioned in issue #55.

Debug setup:

  • Eclipse Luna non-EE, Mac Yosemite
  • Simple gradle project, one class, using TestNG 6.9.4
  • No exception or line breakpoints are set.
  • Replaced the do..while loop in TestRunnerViewPart.startTestRunListening with a for 0..3 loop.
  • Dtrace to monitor TCP connection attempts.

Test case:

public class DemoTest {
  @Test
  public void testDemo() {
    Assert.assertEquals(1, 1);
  }
}

Launch as Debug -> TestNG, here is the sequence of events:

  1. Local port 55677 is selected by testng-eclipse plugin to listen on.
  2. Eclipse launches RemoteTestNG DemoTest process in debug mode using -serport 55677 argument.
  3. The DemoTest JVM connection uses port 55678 to talk to Eclipse debugger.
  4. Eclipse debugger connection established with DemoTest JVM on port 55678. JVM is paused, waiting for Eclipse to signal the JVM to start executing.
  5. TestNG starts its listener on port 55677, but DemoTest JVM is paused, so it will never attempt to connect.
  6. TestRunnerViewPart for loop exits, displaying the error popup in Eclipse.
  7. Immediately the DemoTest JVM begins executing and attempting to connect to 55677, but the testng-eclipse listener has already shut down. RemoteTestNG process keeps running, attempting to connect every few seconds.

Dtrace transcripts

RemoteTestNG process paused, waiting for Eclipse debugger to signal it to begin executing (stack trace of process):

          libsystem_kernel.dylib`__gettimeofday+0xa
          libjvm.dylib`os::PlatformEvent::park(long)+0x194
          libjvm.dylib`Monitor::IWait(Thread*, long)+0xa8
          libjvm.dylib`Monitor::wait(bool, long, bool)+0x177
          libjvm.dylib`VMThread::loop()+0x1bc
          libjvm.dylib`VMThread::run()+0x79
          libjvm.dylib`java_start(Thread*)+0xf6
          libsystem_pthread.dylib`_pthread_body+0x83
          libsystem_pthread.dylib`_pthread_body
          libsystem_pthread.dylib`thread_start+0xd

TCP connect attempts during the test:

2015 May 21 13:47:42 85869 java: 127.0.0.1:55678
.. eclipse is hanging here..

.. popup appears and startTestRunListening() returns, connect attempts start ..
2015 May 21 13:47:57 85869 java: 0.0.0.0:55677
2015 May 21 13:48:01 85869 java: 0.0.0.0:55677
2015 May 21 13:48:05 85869 java: 0.0.0.0:55677

Fix

The fix is to apply this modified version of @denyska's patch. It resolves the deadlock, since it allows the startTestRunListening() method to return, which allows the RemoteTestNG process to begin executing and connect to testng-eclipse listener. Since running this version I've had zero problems with the 57% hang.

@cbeust
Copy link
Collaborator

cbeust commented May 22, 2015

Wow, this has to be the most elaborately described problem and fix I've ever seen, this is just plain awesome.

Thanks a lot, Patrick!

cbeust added a commit that referenced this pull request May 22, 2015
Fix debug launch thread hang using modified denyska patch.
@cbeust cbeust merged commit 72a1aea into testng-team:master May 22, 2015
@ghost
Copy link

ghost commented May 23, 2015

Awesome I had same issue

@phensley
Copy link
Contributor Author

@cbeust 👍 Thanks for the fast merge!

@stephenhartley
Copy link

I am still seeing this issue running Luna Service Release 2 (4.4.2) Build id: 20150219-0600 with the testng plugin version 6.9.5.201506120235

Is someone able to confirm has this fix been included in this version of the plugin?

Just FYI, running the same test using "Run" works perfectly, it is only when running as "Debug" that I get the 57% hang and error modal.

I just checked out the latest from master and still see the same issue. Debug output is:

[RemoteTestNG] Invoked with -serport 58295 -d C:\steve\githubrepos\camel\test-output C:\Users\steve\AppData\Local\Temp\testng-eclipse--535199138\testng-customsuite.xml 
[BaseMessageSender] Waiting for Eclipse client on localhost:58295

Thanks

@cbeust
Copy link
Collaborator

cbeust commented Jun 13, 2015

Yes, the update site contains the fix. Are you seeing this from a fresh
workspace too?

Cédric

On Fri, Jun 12, 2015 at 2:21 PM, Stephen Hartley notifications@github.com
wrote:

I am still seeing this issue running Luna Service Release 2 (4.4.2) Build
id: 20150219-0600 with the testng plugin version 6.9.5.201506120235

Is someone able to confirm has this fix been included in this version of
the plugin?

Thanks


Reply to this email directly or view it on GitHub
#148 (comment)
.

@stephenhartley
Copy link

@cbeust Tried out in a clean workspace and still see the same problem.

Interestingly it is only happening when I debug and have a breakpoint in the JDK source, for example in java.util.Properties. If I have a breakpoint in my own source lines then the debug completes successfully.

Any thoughts as to why this might be?

I don't know if this is relevant but the maven dependency for testng is version 6.9.4 but the eclipse plugin is at version 6.9.5, does this have any negative impact? JDK version is 1.8.0_45.

The same test debugs as expected when using JUnit, even with breakpoints in the JDK source...

Thanks!

Steve

@cbeust
Copy link
Collaborator

cbeust commented Jun 13, 2015

Yes. this usually happens because of break points. I'm guessing your break
point is preventing RemoteTestNG from starting. Can you see the break point
trigger? If yes, does the stack involve RemoteTestNG?

Cédric

On Sat, Jun 13, 2015 at 4:23 AM, Stephen Hartley notifications@github.com
wrote:

@cbeust https://github.com/cbeust Tried out in a clean workspace and
still see the same problem.

Interestingly it is only happening when I debug and have a breakpoint in
the JDK source, for example in java.util.Properties. If I have a
breakpoint in my own source lines then the debug completes successfully.

Any thoughts as to why this might be?

Thanks!

Steve


Reply to this email directly or view it on GitHub
#148 (comment)
.

@stephenhartley
Copy link

Yes, the break point does trigger, here's the stack. I can see that RemoteTestNG is right at the top. When the breakpoint triggers I get the 'Couldn't contact the RemoteTestNG client' modal and after pressing resume in Eclipse debugger nothing further happens. However the thread is still running. Pressing terminate stops the thread but no testng results are displayed...

AlchemetricsFilenameTest.testProperties [TestNG]    
org.testng.remote.RemoteTestNG at localhost:52087   
    Thread [main] (Suspended (breakpoint at line 166 in Properties))    
        owns: Properties  (id=32)   
        owns: Class<T> (java.util.TimeZone) (id=19) 
        owns: JarFile  (id=33)  
        owns: URLClassPath  (id=34) 
        owns: Object  (id=35)   
        Properties.setProperty(String, String) line: 166    
        System.setProperty(String, String) line: 796    
        TimeZone$1.run() line: 684  
        TimeZone$1.run() line: 681  
        AccessController.doPrivileged(PrivilegedAction<T>) line: not available [native method]  
        TimeZone.setDefaultZone() line: 681 
        TimeZone.getDefaultRef() line: 636  
        Date.<init>(int, int, int, int, int, int) line: 254 
        ZipUtils.dosToJavaTime(long) line: 74   
        JarFile(ZipFile).getZipEntry(String, long) line: 570    
        JarFile(ZipFile).getEntry(String) line: 313 
        JarFile.getEntry(String) line: 240  
        JarFile.getJarEntry(String) line: 223   
        JarFile.getManEntry() line: 494 
        JarFile.checkForSpecialAttributes() line: 553   
        JarFile.hasClassPathAttribute() line: 518   
        JavaUtilJarAccessImpl.jarFileHasClassPathAttribute(JarFile) line: 37    
        URLClassPath$JarLoader.getClassPath() line: 1149    
        URLClassPath.getLoader(int) line: 495   
        URLClassPath.getNextLoader(int[], int) line: 457    
        URLClassPath.getResource(String, boolean) line: 211 
        URLClassLoader$1.run() line: 365    
        URLClassLoader$1.run() line: 362    
        AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]   
        Launcher$AppClassLoader(URLClassLoader).findClass(String) line: 361 
        Launcher$AppClassLoader(ClassLoader).loadClass(String, boolean) line: 424   
        Launcher$AppClassLoader.loadClass(String, boolean) line: 331    
        Launcher$AppClassLoader(ClassLoader).loadClass(String) line: 357    
        LauncherHelper.checkAndLoadMain(boolean, int, String) line: 495 
C:\steve\jdk1.8.0_45\bin\javaw.exe (13 Jun 2015 19:56:16)   

@cbeust
Copy link
Collaborator

cbeust commented Jun 14, 2015

Yup, pretty much explains everything. The Eclipse plug-in launches
RemoteTestNG as a separate process but because of the break point,
RemoteTestNG doesn't take off so it never opens a socket. Eventually, the
plug-in's attempt at connecting on this socket time out.

Cédric

On Sat, Jun 13, 2015 at 12:04 PM, Stephen Hartley notifications@github.com
wrote:

Yes, the break point does trigger, here's the stack. I can see that
RemoteTestNG is right at the top. When the breakpoint triggers I get the
'Couldn't contact the RemoteTestNG client' modal and after pressing resume
in Eclipse debugger nothing further happens. However the thread is still
running. Pressing terminate stops the thread but no testng results are
displayed...

AlchemetricsFilenameTest.testProperties [TestNG]
org.testng.remote.RemoteTestNG at localhost:52087
Thread [main](Suspended %28breakpoint at line 166 in Properties%29)
owns: Properties (id=32)
owns: Class (java.util.TimeZone) (id=19)
owns: JarFile (id=33)
owns: URLClassPath (id=34)
owns: Object (id=35)
Properties.setProperty(String, String) line: 166
System.setProperty(String, String) line: 796
TimeZone$1.run() line: 684
TimeZone$1.run() line: 681
AccessController.doPrivileged(PrivilegedAction) line: not available [native method]
TimeZone.setDefaultZone() line: 681
TimeZone.getDefaultRef() line: 636
Date.(int, int, int, int, int, int) line: 254
ZipUtils.dosToJavaTime(long) line: 74
JarFile(ZipFile).getZipEntry(String, long) line: 570
JarFile(ZipFile).getEntry(String) line: 313
JarFile.getEntry(String) line: 240
JarFile.getJarEntry(String) line: 223
JarFile.getManEntry() line: 494
JarFile.checkForSpecialAttributes() line: 553
JarFile.hasClassPathAttribute() line: 518
JavaUtilJarAccessImpl.jarFileHasClassPathAttribute(JarFile) line: 37
URLClassPath$JarLoader.getClassPath() line: 1149
URLClassPath.getLoader(int) line: 495
URLClassPath.getNextLoader(int[], int) line: 457
URLClassPath.getResource(String, boolean) line: 211
URLClassLoader$1.run() line: 365
URLClassLoader$1.run() line: 362
AccessController.doPrivileged(PrivilegedExceptionAction, AccessControlContext) line: not available [native method]
Launcher$AppClassLoader(URLClassLoader).findClass(String) line: 361
Launcher$AppClassLoader(ClassLoader).loadClass(String, boolean) line: 424
Launcher$AppClassLoader.loadClass(String, boolean) line: 331
Launcher$AppClassLoader(ClassLoader).loadClass(String) line: 357
LauncherHelper.checkAndLoadMain(boolean, int, String) line: 495
C:\steve\jdk1.8.0_45\bin\javaw.exe (13 Jun 2015 19:56:16)


Reply to this email directly or view it on GitHub
#148 (comment)
.

@stephenhartley
Copy link

Thanks for the explanation Cédric. Does this mean this is a bug in testng-eclipse? How can I help to get it fixed - should I raise a new issue?

I cloned testng and built the latest version as testng-6.9.5-SNAPSHOT.jar and see the same problem there too.

Thanks again

Steve

@missedone
Copy link
Collaborator

@stephenhartley
yes, it's a bug, please go ahead to create a new issue to track this.
thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants