New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17077: Don't cancel twice when rejoining shard leader election #2084
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional side benefit is a bit less code :-)
@@ -26,13 +26,9 @@ public class RejoinLeaderElectionPayload implements ReflectMapWriter { | |||
// ones are meant to be required without that being specified on the v1 API or elsewhere | |||
@JsonProperty public String collection; | |||
|
|||
@JsonProperty public String shard; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this backwards-compatible? I'm not super familiar with this, but from what I can see, there are some inputs that are no longer needed. It seems they weren't really used before (yes?). What would happen if a client supplies it?
CC @gerlowskija you might know if this produces a failure or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this @dsmiley.
I did a test locally and it seems to me any additional property in the Json payload here is simply ignored. Since this change is only removing some properties, I don't think we can have backwards-compatibility issues.
@psalagnac can you add a CHANGES.txt entry please? In "Other" or "Improvements" depending on your point of view. Try to both be brief and communicate the scope/impact -- kind of hard sometimes ;-) |
Thanks. |
By entering this CHANGES.txt into the 10.0 release section, it means you believe it should not be back ported to 9x. Is that what you meant or was that accidental? I'll move it and backport if you wish. |
Yes, just main is fine. |
https://issues.apache.org/jira/browse/SOLR-17077
Description
When we invoke
REJOINELECTION
command on a replica, the replica must first leave its current election position by deleting the sequential ephemeral node in Zookeeper. This is done by invokingcancelElection()
method on the election context.This issue is
cancelElection()
method is invoked twice. Once inZkController
and once inretryElection()
in the leader elector. Consequently we try to delete the Zookeeper leader election node twice. The second attempt always fails since the node was already removed by the first attempt.Solution
Remove the call to
cancelElection()
from ZkController. Also remove the creation of a new election context here, since a new one will be created inretryElection()
.Second commit on the pull request is to remove useless parameters
shardId
andelectionNode
from theREJOINELECTION
command, that are not used anymore.We have this change in production for 6 months and spotted no issue.
Tests
No test added.
I made sure all leader election tests still pass.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.