Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17077: Don't cancel twice when rejoining shard leader election #2084

Merged
merged 3 commits into from Dec 5, 2023

Conversation

psalagnac
Copy link
Contributor

@psalagnac psalagnac commented Nov 17, 2023

https://issues.apache.org/jira/browse/SOLR-17077

Description

When we invoke REJOINELECTION command on a replica, the replica must first leave its current election position by deleting the sequential ephemeral node in Zookeeper. This is done by invoking cancelElection() method on the election context.

This issue is cancelElection() method is invoked twice. Once in ZkController and once in retryElection() in the leader elector. Consequently we try to delete the Zookeeper leader election node twice. The second attempt always fails since the node was already removed by the first attempt.

Solution

Remove the call to cancelElection() from ZkController. Also remove the creation of a new election context here, since a new one will be created in retryElection().

Second commit on the pull request is to remove useless parameters shardId and electionNode from the REJOINELECTION command, that are not used anymore.

We have this change in production for 6 months and spotted no issue.

Tests

No test added.
I made sure all leader election tests still pass.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional side benefit is a bit less code :-)

@@ -26,13 +26,9 @@ public class RejoinLeaderElectionPayload implements ReflectMapWriter {
// ones are meant to be required without that being specified on the v1 API or elsewhere
@JsonProperty public String collection;

@JsonProperty public String shard;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this backwards-compatible? I'm not super familiar with this, but from what I can see, there are some inputs that are no longer needed. It seems they weren't really used before (yes?). What would happen if a client supplies it?

CC @gerlowskija you might know if this produces a failure or not

Copy link
Contributor Author

@psalagnac psalagnac Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this @dsmiley.
I did a test locally and it seems to me any additional property in the Json payload here is simply ignored. Since this change is only removing some properties, I don't think we can have backwards-compatibility issues.

@dsmiley
Copy link
Contributor

dsmiley commented Dec 5, 2023

@psalagnac can you add a CHANGES.txt entry please? In "Other" or "Improvements" depending on your point of view. Try to both be brief and communicate the scope/impact -- kind of hard sometimes ;-)

@psalagnac
Copy link
Contributor Author

psalagnac commented Dec 5, 2023

Thanks.
I added an entry in "Improvements", but not sure whether it should go to bug "Bug Fixes". Maybe the code should have been like that the from the beginning.

@dsmiley dsmiley merged commit 5f7c685 into apache:main Dec 5, 2023
@psalagnac psalagnac deleted the cancel-twice branch December 5, 2023 15:50
@dsmiley
Copy link
Contributor

dsmiley commented Dec 5, 2023

By entering this CHANGES.txt into the 10.0 release section, it means you believe it should not be back ported to 9x. Is that what you meant or was that accidental? I'll move it and backport if you wish.

@psalagnac
Copy link
Contributor Author

Yes, just main is fine.
I want to follow-up on this work with other cleanups in leader election code. I don't think it is worth to backport them to 9x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants