[FLINK-20428][test] Fix the unstable test ZooKeeperLeaderElectionConnectionHandlingTest#testConnectionSuspendedHandlingDuringInitialization #14268

Closed


wangyang0918
Contributor

When the ZooKeeper connection is suspended, the LeaderRetrievalEventHandler is notified with empty leader information. The failing test ZooKeeperLeaderElectionConnectionHandlingTest.testConnectionSuspendedHandlingDuringInitialization expects that no leader is notified at all, which is wrong. It usually passed only because the timeout was very small (50 ms), so the wait expired before the empty leader information could arrive.

This PR fixes the unstable test by correcting the expectation.
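
To illustrate why a 50 ms timeout could hide the notification, here is a minimal sketch in plain Java. It is not Flink code: the class `ShortTimeoutRace`, the `EMPTY_LEADER` marker, and the delays are all assumptions made for the example. A notifier thread delivers the marker after a delay, and the caller waits for it with a bounded `poll`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the test setup; none of these names come from Flink.
class ShortTimeoutRace {

    /** Marker for the "empty leader information" notification (an assumption of this sketch). */
    static final String EMPTY_LEADER = "<empty leader>";

    /**
     * Delivers EMPTY_LEADER after notifyDelayMs and waits at most pollTimeoutMs for it.
     * Returns null if the wait expires before the notification arrives.
     */
    static String awaitNotification(long notifyDelayMs, long pollTimeoutMs) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread notifier = new Thread(() -> {
            try {
                Thread.sleep(notifyDelayMs);
                queue.put(EMPTY_LEADER);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        notifier.setDaemon(true);
        notifier.start();
        try {
            return queue.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Short wait, slow notification: the wait expires and it looks as if nothing was sent.
        System.out.println(awaitNotification(500, 50));
        // Generous wait: the empty-leader notification is observed.
        System.out.println(awaitNotification(50, 5_000));
    }
}
```

With a short wait, an assertion that "nothing arrived" passes whenever the notification is slow and fails whenever it is fast, which matches the instability described above.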

@flinkbot
Collaborator

flinkbot commented Dec 1, 2020

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 00f90b8 (Tue Dec 01 06:10:09 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@wangyang0918
Contributor Author

cc @XComp @tillrohrmann Could you please have a look?

@flinkbot
Collaborator

flinkbot commented Dec 1, 2020

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for opening this PR @wangyang0918. I think we can further simplify the QueueLeaderElectionListener to not have a timeout field at all. Please take a look at my comments.

@@ -281,6 +282,10 @@ public void notifyLeaderAddress(LeaderInformation leaderInformation) {
}

public CompletableFuture<String> next() {
return next(timeout);
Contributor

Can we clean up the timeout field of the QueueLeaderElectionListener? I think it does not make sense that the leader election listener has such a field.

Contributor

Also, notifyLeaderAddress should probably wait until it can add the leaderInformation to the queue.
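
One way to read this suggestion is to back the listener with a bounded queue and use the blocking `put` instead of a non-blocking insert. The sketch below is only an illustration under that assumption: `BlockingNotifySketch` is a hypothetical class, leader addresses are plain strings, and the real QueueLeaderElectionListener may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical listener; it carries no timeout field of its own.
class BlockingNotifySketch {

    private final BlockingQueue<String> queue;

    BlockingNotifySketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Blocks until the queue has room, so a notification is never dropped. */
    void notifyLeaderAddress(String leaderAddress) {
        try {
            queue.put(leaderAddress);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Blocks until a notification is available. */
    String take() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Fills a capacity-1 queue, then shows that the second notification waits for room. */
    static List<String> demo() {
        BlockingNotifySketch listener = new BlockingNotifySketch(1);
        listener.notifyLeaderAddress("first");
        // The queue is full now, so this call blocks until "first" has been taken.
        Thread producer = new Thread(() -> listener.notifyLeaderAddress("second"));
        producer.setDaemon(true);
        producer.start();
        List<String> seen = new ArrayList<>();
        seen.add(listener.take());
        seen.add(listener.take());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that `BlockingQueue` rejects `null` elements, so an empty leader would need its own representation (for example a wrapper object) in a queue like this.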

return next(timeout);
}

public CompletableFuture<String> next(Duration timeout) {
Contributor

Suggested change
- public CompletableFuture<String> next(Duration timeout) {
+ public CompletableFuture<String> next(@Nullable Duration timeout) {

- assertThat("No result is expected since there was no leader elected before stopping the server, yet.",
- secondAddress, is(nullValue()));
+ // QueueLeaderElectionListener will be notified with an empty leader when ZK connection is suspended
+ final CompletableFuture<String> secondAddress = queueLeaderElectionListener.next(Duration.ofSeconds(3));
Contributor

If we are expecting an empty leader information, then a timeout is not required here.
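
As a rough sketch of this idea, a `next` method can treat a `null` timeout as "wait until something arrives" and reserve a bounded wait for callers that expect nothing. Everything here is an assumption for illustration: the class `NullableTimeoutSketch` is hypothetical, and an empty leader is modelled as `Optional.empty()`, which is not necessarily how Flink represents it.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical listener; an empty Optional stands for "empty leader information".
class NullableTimeoutSketch {

    private final BlockingQueue<Optional<String>> queue = new LinkedBlockingQueue<>();

    void notifyLeaderAddress(Optional<String> leaderAddress) {
        queue.add(leaderAddress);
    }

    /** Waits without a timeout for the next notification. */
    CompletableFuture<Optional<String>> next() {
        return next(null);
    }

    /**
     * Waits for the next notification. A null timeout means "wait indefinitely";
     * otherwise the future completes with null if nothing arrives in time.
     */
    CompletableFuture<Optional<String>> next(Duration timeout) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return timeout == null
                        ? queue.take()
                        : queue.poll(timeout.toMillis(), TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        });
    }

    public static void main(String[] args) {
        NullableTimeoutSketch listener = new NullableTimeoutSketch();
        listener.notifyLeaderAddress(Optional.empty());
        // An empty-leader notification is expected, so no timeout is needed.
        System.out.println(listener.next().join().isPresent());
        // Nothing else was sent, so only a bounded wait makes sense here.
        System.out.println(listener.next(Duration.ofMillis(50)).join());
    }
}
```

When a notification is expected, the unbounded wait removes the timing assumption from the assertion; a hung test is then caught by the test framework's own timeout rather than by a guess inside the listener.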

…ectionHandlingTest#testConnectionSuspendedHandlingDuringInitialization
@wangyang0918
Contributor Author

@tillrohrmann Thanks for the review. I have addressed the comments.

Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for updating this PR @wangyang0918. LGTM. I will merge this PR once AZP gives green light.

@XComp
Contributor

XComp commented Dec 1, 2020

I went through the code and have nothing to add. Thanks @wangyang0918

tillrohrmann pushed a commit that referenced this pull request Dec 2, 2020
…ectionHandlingTest#testConnectionSuspendedHandlingDuringInitialization

This closes #14268.