Fix the flaky test - TestZkWatch#749

Merged: junkaixue merged 1 commit into apache:master from i3wangyi:fixTest on Feb 12, 2020

@i3wangyi (Contributor) commented Feb 11, 2020

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

(Fixes #746)

Description

  • Here are some details about my PR, including screenshots of any UI changes:

(Write a concise description including what, why, how)

The instability is caused by signaling the condition at the wrong time: threads waiting on the condition should be notified only after zkClient has finished unsubscribing all listeners.
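
The ordering described above can be illustrated with a minimal sketch. It is not the actual TestZkWatch or ZkClient code: `FakeClient`, `cleanUp`, `awaitCleanup`, and `demo` are hypothetical names invented for this example, and the sketch assumes the test thread waits on a `java.util.concurrent.locks.Condition`. The point it shows is that the signal is sent only after the listeners have been removed, so a woken thread never observes leftover listeners.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class NotifyAfterUnsubscribe {
  // Hypothetical stand-in for a client that tracks listeners (not Helix's ZkClient).
  static class FakeClient {
    private final List<String> listeners = new ArrayList<>();
    synchronized void subscribe(String listener) { listeners.add(listener); }
    synchronized void unsubscribeAll() { listeners.clear(); }
    synchronized int listenerCount() { return listeners.size(); }
  }

  final FakeClient client = new FakeClient();
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition cleanupDone = lock.newCondition();
  private boolean cleanedUp = false;

  // Runs on the callback thread: finish the cleanup first, signal last.
  void cleanUp() {
    lock.lock();
    try {
      client.unsubscribeAll();
      cleanedUp = true;
      cleanupDone.signalAll();
    } finally {
      lock.unlock();
    }
  }

  // Runs on the test thread; returns false if the timeout elapses first.
  boolean awaitCleanup(long timeoutMs) throws InterruptedException {
    long nanosLeft = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    lock.lock();
    try {
      while (!cleanedUp) { // loop guards against spurious wakeups
        if (nanosLeft <= 0) {
          return false;
        }
        nanosLeft = cleanupDone.awaitNanos(nanosLeft);
      }
      return true;
    } finally {
      lock.unlock();
    }
  }

  // Returns the listener count seen after wake-up, or -1 on timeout.
  static int demo() {
    NotifyAfterUnsubscribe fixture = new NotifyAfterUnsubscribe();
    fixture.client.subscribe("dataListener");
    fixture.client.subscribe("childListener");
    Thread callback = new Thread(fixture::cleanUp);
    callback.start();
    try {
      boolean signaled = fixture.awaitCleanup(5_000);
      callback.join();
      return signaled ? fixture.client.listenerCount() : -1;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("listeners after wake-up: " + demo());
  }
}
```

If the signal were sent before `unsubscribeAll()` and outside the lock, the waiting thread could wake up and assert on the listener count while the cleanup was still in progress, which is the kind of race that makes such a test flaky.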

Tests

  • The following tests are written for this issue:

(List the names of added unit/integration tests)

  • The following is the result of the "mvn test" command on the appropriate module:

(Copy & paste the result of "mvn test")
Shut down zookeeper at port 2183 in thread main
[ERROR] Tests run: 1086, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4,587.356 s <<< FAILURE! - in TestSuite
[ERROR] testMissingTopStateDurationMonitoring(org.apache.helix.integration.controller.TestControllerLeadershipChange) Time elapsed: 4.779 s <<< FAILURE!
java.lang.AssertionError: expected:<true> but was:<false>
at org.apache.helix.integration.controller.TestControllerLeadershipChange.testMissingTopStateDurationMonitoring(TestControllerLeadershipChange.java:262)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestControllerLeadershipChange.testMissingTopStateDurationMonitoring:262 expected:<true> but was:<false>
[INFO]
[ERROR] Tests run: 1086, Failures: 1, Errors: 0, Skipped: 0
[INFO]

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Code Quality

  • My diff has been formatted using helix-style.xml

@narendly (Contributor) commented

@i3wangyi
Thanks for the fix! Could you run the full mvn test a few more times to make sure this doesn't show up as a failure?

@i3wangyi (Contributor Author) commented Feb 11, 2020

Latest mvn test

[ERROR] Tests run: 1083, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4,534.661 s <<< FAILURE! - in TestSuite
[ERROR] testMissingTopStateDurationMonitoring(org.apache.helix.integration.controller.TestControllerLeadershipChange)  Time elapsed: 4.564 s  <<< FAILURE!
java.lang.AssertionError: expected:<true> but was:<false>
	at org.apache.helix.integration.controller.TestControllerLeadershipChange.testMissingTopStateDurationMonitoring(TestControllerLeadershipChange.java:262)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TestControllerLeadershipChange.testMissingTopStateDurationMonitoring:262 expected:<true> but was:<false>
[INFO] 
[ERROR] Tests run: 1083, Failures: 1, Errors: 0, Skipped: 0

This confirms that TestZkWatch is now stable. However, TestControllerLeadershipChange.testMissingTopStateDurationMonitoring appears to be another flaky test, reproducible at least on my machine. I talked with @alirezazamani, and it is not reproducible on his machine. In any case, I created a separate issue to track the fix.

Issue #753

@alirezazamani alirezazamani left a comment

LGTM

@i3wangyi (Contributor Author) commented

The PR is ready to be merged into master, approved by @alirezazamani @narendly

final commit message:

Fix the unstable TestZkWatch

  • The root cause of the instability is due to the incorrect condition notify time: it should notify other threads after zkClient finishes the unsubscribing the listener

@junkaixue junkaixue merged commit afc62ab into apache:master Feb 12, 2020
narendly pushed a commit to narendly/helix that referenced this pull request Feb 20, 2020
mgao0 pushed a commit to mgao0/helix that referenced this pull request Mar 6, 2020
huizhilu pushed a commit to huizhilu/helix that referenced this pull request Aug 16, 2020
Development

Successfully merging this pull request may close these issues.

Test ZKWatch is unstable