Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doozer build fails on Utilities - TokenManagerTest line 411 #405

Open
jondgoodwin opened this issue Apr 16, 2018 · 1 comment
Open

Doozer build fails on Utilities - TokenManagerTest line 411 #405

jondgoodwin opened this issue Apr 16, 2018 · 1 comment

Comments

@jondgoodwin
Copy link
Collaborator

Doozer fails on the GCC unit tests, but not the Clang and OSX unit tests. It is build #68.

This is the point of failure: https://github.com/BitFunnel/BitFunnel/blob/master/src/Common/Utilities/test/TokenManagerTest.cpp#L411

Value of: lastToken.GetSerialNumber() > 0
  Actual: false
Expected: true

This may be a poorly designed test that fails randomly. The token manager is used to allow lock-free coordination between threads, so it is possible that the unit test is not principled. We should see if it launches a bunch of random threads and seems to be non-deterministic. If so, it is likely a random error.

In the code it states:

        // DESIGN NOTE: this test used to have random sleeps in the methods
        // and a 5 second sleep here. All the sleeps were reduced because we
        // still get some randomness in ordering and this test shouldn't
        // take 5+ seconds to run. However, it's possible that's too extreme
        // and we should larger sleeps.
        // This test, with the reduced sleeps, found a threading bug that
        // was triggered every 10s to 100s of executions. With the old version
        // of this test, that would have taken an unreasonably long time, so
        // even if this test is sacrificing some "randomness" by having
        // a shorter duration, I believe it makes up for it by the increased
        // number of executions possible per unit time, and that we should
        // really have a seperate test that's designed for overnight use
        // that can be more comprehensive.
        // 5 is an arbitrary number that's > 2x 2.

This failure is likely due to slow startup of the TokenTrackerThread. Perhaps on Doozer the VM is so overloaded that the thread had not even started by the time this check came along.

The correct fix is to probably add some handshaking between the TokenTrackerThread and the main thread, using an std::condition_variable. We probably also want a burn-in test that runs overnight.

Not sure this is worth it now as the code has been trouble free for years in Bing and we're only seeing the problem on a slow VM in CI. My recommendation is to log the bug with all of the info we have and then see if it happens again (especially in CI).

@jondgoodwin
Copy link
Collaborator Author

I re-ran the build & test on Doozer from the master branch. All tests passed.

As per the earlier recommendation, I will not pursue correcting this timing issue at this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant