Fixes [SUREFIRE-1516]: Poor performance in reuseForks=false #253

jon-bell · 2019-11-11T21:27:43Z

Hi,
This PR resolves the performance bug noted in SUREFIRE-1516, which appears when using the reuseForks=false configuration. The root cause of the observed performance problem is the teardown time of the forked JVM.

The issue is that the forked JVM should not block on IO to read more from the host JVM once it has received BYE_ACK. Threads blocked on read may not be interruptible until they poll for interrupts (every 350 msec for stdin), which can introduce significant latency for short-lived processes such as surefire forked JVMs that run just a single test class at a time. This 350 msec overhead adds up on projects that have thousands of test classes, each of which might take only several hundred msec to run.
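
To give a rough sense of how this adds up, here is a back-of-the-envelope sketch. It assumes the forks run strictly one after another (for example with forkCount=1), so the per-fork delay accumulates linearly; the class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate of the cumulative teardown penalty.
// Assumption: forks run one after another, so the per-fork delay adds up linearly.
final class TeardownOverheadEstimate {

    /** Total extra wall-clock time in milliseconds for the given number of forked JVMs. */
    static long totalOverheadMillis(int forkedJvms, long perForkDelayMillis) {
        return forkedJvms * perForkDelayMillis;
    }

    public static void main(String[] args) {
        long perForkDelayMillis = 350; // interrupt-poll latency described above
        for (int testClasses : new int[] {100, 1_000, 5_000}) {
            long total = totalOverheadMillis(testClasses, perForkDelayMillis);
            System.out.printf("%d test classes: %.1f s of extra teardown time%n",
                    testClasses, total / 1000.0);
        }
    }
}
```

Under these assumptions, 1,000 test classes come to 350 seconds of added teardown time. With parallel forks the wall-clock cost would be lower.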

To measure the scope of the problem and confirm its resolution, I created a simple benchmark test suite, which consists of 100 JUnit test classes, each with a single test that calls Thread.sleep(250). I instrumented the JVM to record the time that each JVM starts, the time that the test starts (as measured by JUnit), the time that the test ends (as measured by JUnit), and the time that the JVM terminates. For comparison, I also did the same experiment with ant and gradle.
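
A suite of this shape could be produced with a small generator along the following lines. This is only a sketch, not the generator actually used: the class names (`SleepTest000` and so on), the choice of JUnit 4 (`org.junit.Test`), and the generator itself are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generates sources for a benchmark suite like the one described above.
// Hypothetical: the original suite's class names and JUnit version are not given.
final class BenchmarkSuiteGenerator {

    /** Source text of one test class with a single test that sleeps. */
    static String testClassSource(String className, long sleepMillis) {
        return String.join("\n",
                "import org.junit.Test;",
                "",
                "public class " + className + " {",
                "    @Test",
                "    public void sleeps() throws InterruptedException {",
                "        Thread.sleep(" + sleepMillis + ");",
                "    }",
                "}",
                "");
    }

    /** Maps file name to source text for the requested number of test classes. */
    static Map<String, String> generate(int classCount, long sleepMillis) {
        Map<String, String> sources = new LinkedHashMap<>();
        for (int i = 0; i < classCount; i++) {
            String className = String.format("SleepTest%03d", i);
            sources.put(className + ".java", testClassSource(className, sleepMillis));
        }
        return sources;
    }

    public static void main(String[] args) {
        Map<String, String> suite = generate(100, 250);
        System.out.println(suite.size() + " classes generated; first file:");
        System.out.println(suite.get("SleepTest000.java"));
    }
}
```

Writing each entry to `src/test/java` and running the build with reuseForks=false would then start one forked JVM per generated class.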

The table below shows the results, which represent the average time for each test (over the 100 samples):

| Configuration | Time to start forked JVM (ms) | Time to run test (ms) | Time to tear down forked JVM (ms) |
| --- | --- | --- | --- |
| ant 1.10.6 | 250.42 | 252.81 | 8.75 |
| gradle 5.6.1 | 394.91 | 253.12 | 16.9 |
| mvn (`b97df5a`) | 250.21 | 252.59 | 358.59 |
| mvn (`2fbe44f`) | 216.66 | 252.32 | 16.9 |

Overall, most build systems took similar amounts of time to spin up the JVM, and all took the expected 250 ms to run each test. You can see that the current master version of surefire (b97df5a) takes unusually long to tear down the forked JVM: about 350 msec longer than ant, which matches the time it takes the JVM to interrupt a thread reading from stdin, as explained in this fantastic Stack Overflow post. This is an easy fix though: after receiving the BYE_ACK message, the forked JVM can stop reading commands from the main surefire process, since it is shutting down. After implementing this fix, the overhead goes away (as shown in 2fbe44f).
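
The idea behind the fix can be illustrated with a simplified command loop. This is a sketch only and not the actual surefire `CommandReader`: the `Command` enum is a hypothetical subset, and commands are assumed to arrive one per line as plain text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified model of a command-reading loop; NOT the actual surefire CommandReader.
final class CommandLoopSketch {

    // Hypothetical subset of commands, one per line on the stream.
    enum Command { RUN_CLASS, NOOP, BYE_ACK }

    /** Reads commands until end of stream or BYE_ACK, whichever comes first. */
    static List<Command> readUntilByeAck(Reader in) {
        List<Command> handled = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                Command command = Command.valueOf(line.trim());
                handled.add(command);
                if (command == Command.BYE_ACK) {
                    // The fork is shutting down: leave the loop instead of
                    // blocking in another read that nobody will answer.
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handled;
    }

    public static void main(String[] args) {
        Reader input = new StringReader("RUN_CLASS\nBYE_ACK\nNOOP\n");
        System.out.println(readUntilByeAck(input)); // prints [RUN_CLASS, BYE_ACK]
    }
}
```

In the sketch, the trailing NOOP is never consumed because the loop exits as soon as BYE_ACK is handled. On a real stdin stream, that exit is what avoids waiting for the blocked read to be interrupted.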

Tibor17 · 2019-11-12T02:05:45Z

The build fails on macOS.
ForkModeMultiModuleIT.testForkCountTwoNoReuse:70->doTest:137 » SurefireVerifier
Can you pls have a look?
https://github.com/apache/maven-surefire/pull/253/checks?check_run_id=298330219

Tibor17 · 2019-11-13T01:47:22Z

surefire-api/src/main/java/org/apache/maven/surefire/booter/CommandReader.java

@@ -412,6 +412,10 @@ public void run()
                                CommandReader.this.wakeupIterator();
                                callListeners( command );
                                break;
+                            case BYE_ACK:


It is not a very elegant solution.
In ForkedBooter the listener is already registered with `commandReader.addByeAckListener( new CommandListener()` (see line 356). See the next comment.

One thing I do not understand is why this change causes the IT failure on macOS.

It would be great to reproduce it with several runs of the build.

jon-bell · 2019-11-13T07:42:42Z

It looks like several of the tests may be flaky: when I try to reproduce locally on macOS, a different set of tests fails. Unfortunately I do not have time to debug these tests this week.

jon-bell · 2019-11-21T14:51:09Z

@Tibor17 can you trigger the macOS build to run again? I just ran the test suite locally again on my Mac and this time none of the integration tests failed. I think there is definitely a flaky test here (and the failure is not due to my change).

jon-bell · 2019-12-16T14:58:35Z

@Tibor17 please note that all of the tests are now passing.

Tibor17 · 2019-12-17T11:29:31Z

@jon-bell I have triggered the build again.

eolivelli

Very nice trick.
Thank you for providing this enhancement.

I hope we can commit this patch soon; I left my comment.

Tibor17 · 2019-12-17T13:21:31Z

The build failed again https://github.com/apache/maven-surefire/runs/352389395
@jon-bell What happens if you push a new dummy commit several times an hour? Pls try and we will investigate them then. Thx

jon-bell · 2019-12-17T13:30:38Z

@Tibor17 I made a dummy commit and it looked like it passed - 06c74da

Tibor17 · 2019-12-18T13:56:49Z

@jon-bell
ok, let's continue with a new commit for #253 (comment)
and then i will create a branch to run the build on our Jenkins.

Tibor17 · 2019-12-29T02:36:20Z

@jon-bell Thx for contributing. I used your changes, updated the IT 705 and closed the Jira as fixed.
Can you pls run new measurements against master, to see how far we are from the last measurement?

jon-bell · 2019-12-30T20:45:08Z

@Tibor17 - Thanks! The numbers from current master (5148b02) look pretty much the same on my side (down to 16.9msec to tear down the process).

Fixes [SUREFIRE-1516]: Forked JVM should not block in IO to read more…

2fbe44f

… from the host JVM after it sends BYE_ACK. Threads blocking on `read` may not be interruptable until they poll for interrupts (every 350msec), which can introduce significant latency for short-lived processes.

Tibor17 reviewed Nov 13, 2019

surefire-api/src/main/java/org/apache/maven/surefire/booter/CommandReader.java

Make pointless change to comment to re-trigger build

06c74da

eolivelli requested changes Dec 17, 2019

surefire-api/src/main/java/org/apache/maven/surefire/booter/CommandReader.java

Add changes suggested in review

1364c62

jon-bell force-pushed the SUREFIRE-1516 branch from b4c509d to 1364c62 Compare December 19, 2019 20:29

jon-bell requested a review from eolivelli December 19, 2019 20:31

Tibor17 mentioned this pull request Dec 27, 2019

[SUREFIRE-1516] Poor performance in reuseForks=false #259

Closed

asfgit pushed a commit that referenced this pull request Dec 29, 2019

The JVM teardown has speedup by 360 milli seconds, see #253

e062e82

Tibor17 closed this Dec 29, 2019

Tibor17 reopened this Dec 29, 2019

Tibor17 closed this Dec 29, 2019
