Skip to content

[BROOKLYN-356] Transformer value resolver test#365

Merged
asfgit merged 4 commits intoapache:masterfrom
neykov:transformer-resolver-test
Oct 12, 2016
Merged

[BROOKLYN-356] Transformer value resolver test#365
asfgit merged 4 commits intoapache:masterfrom
neykov:transformer-resolver-test

Conversation

@neykov
Copy link
Member

@neykov neykov commented Oct 2, 2016

  • Adds thread startup jittering so we can lure out any concurrency bugs hiding in the code base. Testing core with this results in a handful of failures. Similar problems are very apparent on the windows builds where the machines have a single CPU and are frequently failing. I'd enable this by default for testing only if it wasn't slowing the tests to a craw.
  • Adds a test which shows the problem with the transformer resolver. It's failing under normal conditions infrequently, so use with the above enabled.

Tests are marked as Broken because there's no fix available for BOOKLYN-356 yet so they might fail builds. The tests are otherwise fully implemented/working.

@neykov neykov force-pushed the transformer-resolver-test branch from 361574c to 1f82109 Compare October 2, 2016 20:12
invokeCallback(flags.get("newTaskStartCallback"), task);
}

private void jitterThreadStart(Task<?> task) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

I wonder if we should make it non-static so it can be set by a test, and unset in its tearDown method. But let's go with static. For now, we can use it by setting the system property when doing mvn clean install.

Once we've fixed (most of) the failures that it exposes, then we can think about enabling it for some tests in jenkins - for regression testing purposes. But we want jenkins to be deterministic!

Throwable error = null;
String oldThreadName = Thread.currentThread().getName();
try {
if (RENAME_THREADS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this is removed from here because internalBeforeStart() and internalAfterEnd() are already doing the thread renaming? If so, then great.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Details in the commit message:

Remove duplicate thread renaming

The code in beforeStartAtomicTask overrides the value set here. Keeping the other one as it's called for scheduled tasks as well.

}

@Test(groups={"Integration", "Broken"}, invocationCount=10)
public void testTransformerResolvesResolvableValuesIntegration() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test. Without the jitter, I ran it with invocationCount=100 and it failed 2 out of 100 times with errors like:

java.lang.AssertionError: entity=Application[s01l91es]; attribute=Sensor: port.transformed (java.lang.String) expected [10889] but found [null]

testTransformerResolvesResolvableValues(START_PORT, 200);
}

@Test(groups={"Integration", "Broken"}, invocationCount=10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth including a comment above the test that points people to https://issues.apache.org/jira/browse/BROOKLYN-356

@aledsage
Copy link
Contributor

aledsage commented Oct 5, 2016

Jenkins build failure looks unrelated: when building "Brooklyn CAMP REST API", got Build timed out (after 60 minutes). Marking the build as aborted.

It was in the middle of invoking DslAndRebindYamlTest.testDslAttributeWhenReadyPersistedWithoutLeakingResolvedValue(). Below is the logging. Note the long GC pause times (approx 0.2 seconds each time). That suggests to me that it was doing a full GC. But it's hard to say on a shared build machine that might be starved of resources.

We should (separately) investigate DslAndRebindYamlTest to see how resource-hungry it is.

Unfortunately we don't have the "brooklyn gc" logging to tell us about the memory usage over time (it's logged at debug). But then for all tests except that one, the test would complete within one minute so we'd have deleted the managementContext before the brooklynGC did any logging.

Maybe we want something like the BrooklynLeakListener to spawn a scheduled executor in the background, and to log the memory usage of the VM periodically so we can see what the test suite is doing over time?

2016-10-02 21:04:32,889 INFO  TESTNG INVOKING: "Surefire test" - org.apache.brooklyn.camp.brooklyn.DslAndRebindYamlTest.testDslAttributeWhenReadyPersistedWithoutLeakingResolvedValue()
2016-10-02 21:04:33,052 INFO  Started application BasicApplicationImpl{id=gpt66t7160}
2016-10-02 21:04:33,052 INFO  Waiting on 3 task(s)
SUREFIRE-859: [GC 538982K->318659K(759808K), 0.1438800 secs]
SUREFIRE-859: [GC 542915K->355674K(767488K), 0.3288260 secs]
SUREFIRE-859: [GC 579930K->383114K(718336K), 0.2902200 secs]
SUREFIRE-859: [GC 558218K->401746K(736256K), 0.2247390 secs]
SUREFIRE-859: [GC 576850K->402834K(729088K), 0.2136890 secs]
SUREFIRE-859: [GC 559506K->395786K(711680K), 0.1718160 secs]
SUREFIRE-859: [GC 552458K->398330K(730624K), 0.1999770 secs]
SUREFIRE-859: [GC 550906K->397842K(709632K), 0.1918430 secs]
SUREFIRE-859: [GC 550418K->397994K(730112K), 0.1925270 secs]
SUREFIRE-859: [GC 548522K->395114K(730624K), 0.1828260 secs]
SUREFIRE-859: [GC 545642K->396522K(732160K), 0.1813480 secs]
SUREFIRE-859: [GC 549610K->398146K(731648K), 0.1988880 secs]
SUREFIRE-859: [GC 551234K->398202K(732672K), 0.1862850 secs]
SUREFIRE-859: [GC 552314K->396794K(732160K), 0.1816440 secs]
SUREFIRE-859: [GC 550906K->397530K(734208K), 0.1915190 secs]
SUREFIRE-859: [GC 554202K->398810K(733184K), 0.1886670 secs]
SUREFIRE-859: [GC 555482K->398810K(734720K), 0.1962980 secs]
SUREFIRE-859: [GC 557530K->398066K(734720K), 0.1893480 secs]
SUREFIRE-859: [GC 556786K->398322K(736768K), 0.1909740 secs]
SUREFIRE-859: [GC 560114K->400122K(735744K), 0.1974540 secs]
SUREFIRE-859: [GC 561914K->400274K(736768K), 0.2081040 secs]
SUREFIRE-859: [GC 563090K->399050K(736768K), 0.1930720 secs]
SUREFIRE-859: [GC 561866K->399466K(738304K), 0.1916980 secs]
SUREFIRE-859: [GC 564842K->400842K(737792K), 0.2071120 secs]
SUREFIRE-859: [GC 566218K->400866K(738816K), 0.2107460 secs]
SUREFIRE-859: [GC 567266K->399898K(738304K), 0.1974350 secs]
SUREFIRE-859: [GC 566298K->400474K(739840K), 0.2096590 secs]
SUREFIRE-859: [GC 568922K->401506K(739328K), 0.2051750 secs]
SUREFIRE-859: [GC 569954K->401882K(740352K), 0.2205450 secs]
SUREFIRE-859: [GC 571866K->400362K(740352K), 0.1994660 secs]
SUREFIRE-859: [GC 570346K->401922K(740864K), 0.2073610 secs]
SUREFIRE-859: [GC 572930K->402458K(740864K), 0.2085150 secs]
SUREFIRE-859: [GC 573466K->402618K(741376K), 0.2088340 secs]
SUREFIRE-859: [GC 574138K->400634K(740864K), 0.1965650 secs]
SUREFIRE-859: [GC 572154K->402330K(742400K), 0.2098550 secs]
SUREFIRE-859: [GC 575898K->403098K(741888K), 0.2152220 secs]
SUREFIRE-859: [GC 576666K->401906K(742400K), 0.2178950 secs]
SUREFIRE-859: [GC 575986K->401714K(742400K), 0.2041080 secs]
Build timed out (after 60 minutes). Marking the build as aborted.

@aledsage
Copy link
Contributor

aledsage commented Oct 5, 2016

@neykov looks very good to me. A couple of extremely minor things relating to this PR (the comment, and confirming about the thread renaming). Can you kick off the jenkins build again, and then merge this?

@aledsage
Copy link
Contributor

aledsage commented Oct 5, 2016

@neykov previous build failure here could well be fixed by #368 - our BrooklynLeakListener wasn't finding any left-behind management contexts to stop them, so our memory usage might have been climbing over time.

Once that PR is confirmed to work in jenkins (including checking its stdout that the leak-detector really executed) then I'll merge #368).

Useful for troubleshooting/testing concurrency related code.
The code in beforeStartAtomicTask overrides the value set here. Keeping the other one as it's called for scheduled tasks as well.
@neykov neykov force-pushed the transformer-resolver-test branch from 1f82109 to 670f83d Compare October 11, 2016 06:20
@neykov
Copy link
Member Author

neykov commented Oct 11, 2016

Addressed comments and kicked off another build.

@asfgit asfgit merged commit 670f83d into apache:master Oct 12, 2016
asfgit pushed a commit that referenced this pull request Oct 12, 2016
@neykov neykov deleted the transformer-resolver-test branch October 12, 2016 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants