Doozer build fails on Utilities - TokenManagerTest line 411 #405

jondgoodwin · 2018-04-16T06:58:21Z

Doozer fails on the GCC unit tests, but not the Clang and OSX unit tests. It is build #68.

This is the point of failure: https://github.com/BitFunnel/BitFunnel/blob/master/src/Common/Utilities/test/TokenManagerTest.cpp#L411

Value of: lastToken.GetSerialNumber() > 0
  Actual: false
Expected: true

This may be a poorly designed test that fails randomly. The token manager is used to allow lock-free coordination between threads, so it is possible that the unit test is not principled. We should see if it launches a bunch of random threads and seems to be non-deterministic. If so, it is likely a random error.

In the code it states:

        // DESIGN NOTE: this test used to have random sleeps in the methods
        // and a 5 second sleep here. All the sleeps were reduced because we
        // still get some randomness in ordering and this test shouldn't
        // take 5+ seconds to run. However, it's possible that's too extreme
        // and we should larger sleeps.
        // This test, with the reduced sleeps, found a threading bug that
        // was triggered every 10s to 100s of executions. With the old version
        // of this test, that would have taken an unreasonably long time, so
        // even if this test is sacrificing some "randomness" by having
        // a shorter duration, I believe it makes up for it by the increased
        // number of executions possible per unit time, and that we should
        // really have a seperate test that's designed for overnight use
        // that can be more comprehensive.
        // 5 is an arbitrary number that's > 2x 2.

This failure is likely due to slow startup of the TokenTrackerThread. Perhaps on Doozer the VM is so overloaded that the thread had not even started by the time this check came along.

The correct fix is to probably add some handshaking between the TokenTrackerThread and the main thread, using an std::condition_variable. We probably also want a burn-in test that runs overnight.

Not sure this is worth it now as the code has been trouble free for years in Bing and we're only seeing the problem on a slow VM in CI. My recommendation is to log the bug with all of the info we have and then see if it happens again (especially in CI).

The text was updated successfully, but these errors were encountered:

jondgoodwin · 2018-04-17T05:24:33Z

I re-ran the build & test on Doozer from the master branch. All tests passed.

As per the earlier recommendation, I will not pursue correcting this timing issue at this time.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Doozer build fails on Utilities - TokenManagerTest line 411 #405

Doozer build fails on Utilities - TokenManagerTest line 411 #405

jondgoodwin commented Apr 16, 2018

jondgoodwin commented Apr 17, 2018

Doozer build fails on Utilities - TokenManagerTest line 411 #405

Doozer build fails on Utilities - TokenManagerTest line 411 #405

Comments

jondgoodwin commented Apr 16, 2018

jondgoodwin commented Apr 17, 2018