RATIS-1769. Avoid changing priorities in TransferCommand unless necessary #808
@szetszwo @codings-dan what do you think? |
I have come up with 3 options to preserve backward compatibility.
Personally I prefer the 2nd way. |
@kaijchen Thanks for working on this. I prefer the second option; we can add a new command to ensure compatibility. |
@codings-dan @szetszwo PTAL, thanks. |
@kaijchen Thanks for working on this. Could you add a unit test for the new command? See ElectionCommandIntegrationTest#testElectionTransferCommand
Thanks @codings-dan, I have added |
@kaijchen , thanks a lot for working on this. The change looks good.
We should have a test that transfers leadership to a peer that does not have the highest priority. It should then get an exception.
Thanks @szetszwo for reviewing. Good idea. |
@kaijchen , I am thinking how to deal with the existing and the new commands.
|
I'm OK to call it
The existing
Based on these reasons, I chose to deprecate the old command. @szetszwo What do you think? |
@kaijchen , I got a new idea -- we may update the
What do you think? |
The old server (prior to 2.4.1) always requires the transferee to have the highest priority in order to transfer leadership.
Personally I prefer to print a message like "transferee does not have highest priority, please call
Example
Let's say we have a cluster like this,
New server
If we want to transfer leader from A to E, we should only raise E's priority to highest.
Old server
Since the old server requires transferee to be the only peer with highest priority.
However, the old command will reset every node's priority, and the old priorities are erased.
|
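To make the difference between the two strategies concrete, here is a minimal, self-contained sketch. It is not Ratis code: `PriorityPlanSketch`, its methods, and the peer priorities in `main` are hypothetical and chosen only for illustration. It assumes, as described above, that a new server accepts a transferee whose priority ties for the highest, while an old server needs the transferee to be strictly above every other peer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the two priority strategies; this is not Ratis code.
class PriorityPlanSketch {

  // Highest priority currently assigned to any peer (0 if there are no peers).
  static int highest(Map<String, Integer> priorities) {
    return priorities.values().stream().mapToInt(Integer::intValue).max().orElse(0);
  }

  // Returns a copy in which the transferee has at least the target priority.
  // All other peers keep their priorities.
  static Map<String, Integer> lift(Map<String, Integer> priorities, String transferee, int target) {
    Integer current = priorities.get(transferee);
    if (current == null) {
      throw new IllegalArgumentException("Unknown peer: " + transferee);
    }
    Map<String, Integer> result = new LinkedHashMap<>(priorities);
    if (current < target) {
      result.put(transferee, target);
    }
    return result;
  }

  // New server: a tie for the highest priority is enough.
  static Map<String, Integer> planForNewServer(Map<String, Integer> priorities, String transferee) {
    return lift(priorities, transferee, highest(priorities));
  }

  // Old server: the transferee must be the only peer with the highest priority.
  static Map<String, Integer> planForOldServer(Map<String, Integer> priorities, String transferee) {
    return lift(priorities, transferee, highest(priorities) + 1);
  }

  public static void main(String[] args) {
    // Made-up priorities for peers A..E.
    Map<String, Integer> cluster = new LinkedHashMap<>();
    cluster.put("A", 3);
    cluster.put("B", 2);
    cluster.put("C", 2);
    cluster.put("D", 1);
    cluster.put("E", 0);
    System.out.println("new server: " + planForNewServer(cluster, "E"));
    System.out.println("old server: " + planForOldServer(cluster, "E"));
  }
}
```

In both plans only the transferee's entry changes, so the priorities of the other peers survive the transfer.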
@kaijchen , that's a good point. We may change the existing
Does it sound good? |
Yes. But how do we deal with old server (backward compatibility)?
Can we just drop backward compatibility? |
@kaijchen , When a new client talks to an old server, the client should get a |
OK, sounds good. |
Manually test backward compatibility, Ratis shell version
$ bin/ratis sh election transfer -peers 127.0.0.1:10024,127.0.0.1:10124,127.0.0.1:11124 -address 127.0.0.1:11124
[main] INFO org.reflections.Reflections - Reflections took 164 ms to scan 1 urls, producing 5 keys and 18 values
[main] WARN org.apache.ratis.metrics.MetricRegistries - Found multiple MetricRegistries implementations: class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl. Using first found implementation: org.apache.ratis.metrics.impl.MetricRegistriesImpl@1e67a849
Transferring leadership to server with address <127.0.0.1:11124>
Changing priority of <127.0.0.1:11124> to 1
: caught an error when executing transfer: n0@group-ABB3109A44C1 refused to transfer leadership to peer n2 as it does not has highest priority 9: peers:[n0|rpc:127.0.0.1:10024|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER, n1|rpc:127.0.0.1:10124|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER, n2|rpc:127.0.0.1:11124|admin:|client:|dataStream:|priority:1|startupRole:FOLLOWER]|listeners:[], old=null
Transferring leadership to server with address <127.0.0.1:11124>
Changing priority of <127.0.0.1:11124> to 2
: Transferring leadership initiated |
Note: sometimes the leadership transfer succeeds, but the request times out. This will lead to
To fix this problem, we should wait for the new leader to be elected in the shell. |
Set default timeout to
Added TODO in comments. It's better to keep changes small in this PR. |
@szetszwo please take another look, thanks. |
@kaijchen , thanks for the update. Please see the comments inlined.
final long timeoutDefault = 3_000L;
// Default timeout for legacy mode matches with the legacy command (version 2.4.x and older).
final long timeoutLegacy = 60_000L;
final Optional<Long> timeout = !cl.hasOption(TIMEOUT_OPTION_NAME) ? Optional.empty() :
    Optional.of(Long.parseLong(cl.getOptionValue(TIMEOUT_OPTION_NAME)) * 1000L);
Use TimeDuration to support different units such as 100ms, 1min.
final TimeDuration timeoutDefault = TimeDuration.valueOf(3, TimeUnit.SECONDS);
// Default timeout for legacy mode matches with the legacy command (version 2.4.x and older).
final TimeDuration timeoutLegacy = TimeDuration.valueOf(60, TimeUnit.SECONDS);
final Optional<TimeDuration> timeout = !cl.hasOption(TIMEOUT_OPTION_NAME) ? Optional.empty() :
Optional.of(TimeDuration.valueOf(cl.getOptionValue(TIMEOUT_OPTION_NAME), TimeUnit.SECONDS));
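The idea behind this suggestion (a timeout string with an optional unit, plus separate defaults for normal and legacy mode) can be sketched without the Ratis dependency. The class below is a hypothetical stand-in built on `java.time.Duration`. The unit names it accepts are an assumption for illustration and do not reproduce the exact grammar of Ratis `TimeDuration`. The 3-second and 60-second defaults are the values from the snippet under review.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the timeout handling discussed above; not Ratis code.
class TimeoutOptionSketch {
  static final Duration TIMEOUT_DEFAULT = Duration.ofSeconds(3);
  static final Duration TIMEOUT_LEGACY = Duration.ofSeconds(60);

  // A non-negative integer followed by an optional unit, e.g. "100ms", "1min", "5".
  private static final Pattern FORMAT = Pattern.compile("\\s*(\\d+)\\s*([a-zA-Z]*)\\s*");

  // Parses the option value; a bare number is interpreted as seconds.
  static Duration parse(String text) {
    Matcher m = FORMAT.matcher(text);
    if (!m.matches()) {
      throw new IllegalArgumentException("Cannot parse timeout: " + text);
    }
    long amount = Long.parseLong(m.group(1));
    String unit = m.group(2).toLowerCase(Locale.ROOT);
    switch (unit) {
      case "ms":
        return Duration.ofMillis(amount);
      case "":
      case "s":
      case "sec":
        return Duration.ofSeconds(amount);
      case "m":
      case "min":
        return Duration.ofMinutes(amount);
      default:
        throw new IllegalArgumentException("Unknown time unit: " + unit);
    }
  }

  // A user-supplied timeout wins; otherwise the default depends on the mode.
  static Duration effectiveTimeout(Optional<Duration> userTimeout, boolean legacyMode) {
    return userTimeout.orElse(legacyMode ? TIMEOUT_LEGACY : TIMEOUT_DEFAULT);
  }

  public static void main(String[] args) {
    System.out.println(parse("100ms").toMillis());
    System.out.println(effectiveTimeout(Optional.empty(), true).toMillis());
  }
}
```

Keeping the user value in an `Optional` is what lets the command choose a different default for the legacy retry without overriding an explicit `-timeout`.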
    .mapToInt(RaftPeer::getPriority).max().orElse(0);
RaftPeer newLeader = getRaftGroup().getPeers().stream()
    .filter(peer -> peer.getAddress().equals(strAddr)).findAny().orElse(null);
if (newLeader == null) {
Print an error message.
printf("Peer with address %s not found.", strAddr);
private Throwable tryTransfer(RaftClient client, RaftPeer newLeader, int highestPriority, long timeout) {
  printf("Transferring leadership to server with address <%s> %n", newLeader.getAddress());
  try {
    // lift the new leader to the highest priority
    if (newLeader.getPriority() < highestPriority) {
      setPriority(client, newLeader.getAddress(), highestPriority);
    }
    RaftClientReply transferLeadershipReply =
        client.admin().transferLeadership(newLeader.getId(), timeout);
    processReply(transferLeadershipReply, () -> "election failed");
  } catch (Throwable t) {
    printf("caught an error when executing transfer: %s%n", t.getMessage());
    return t;
  }
  println("Transferring leadership initiated");
  return null;
}
Let's return a boolean to indicate whether the transfer succeeded.
private boolean tryTransfer(RaftClient client, RaftPeer newLeader, int highestPriority,
    TimeDuration timeout) throws IOException {
  printf("Transferring leadership to server with address <%s> %n", newLeader.getAddress());
  try {
    // lift the new leader to the highest priority
    if (newLeader.getPriority() < highestPriority) {
      setPriority(client, newLeader.getAddress(), highestPriority);
    }
    RaftClientReply transferLeadershipReply =
        client.admin().transferLeadership(newLeader.getId(), timeout.toLong(TimeUnit.MILLISECONDS));
    processReply(transferLeadershipReply, () -> "election failed");
  } catch (TransferLeadershipException tle) {
    // an old server refuses the transfer with this message; report it so the caller can retry in legacy mode
    if (tle.getMessage().contains("it does not has highest priority")) {
      return false;
    }
    throw tle;
  }
  println("Transferring leadership initiated");
  return true;
}
Throwable err = tryTransfer(client, newLeader, highestPriority, timeout.orElse(timeoutDefault));
if (err instanceof TransferLeadershipException
    && err.getMessage().contains("it does not has highest priority")) {
  // legacy mode, transfer leadership by setting priority.
  err = tryTransfer(client, newLeader, highestPriority + 1, timeout.orElse(timeoutLegacy));
}
if (err != null) {
  return -1;
}
After changing tryTransfer to return a boolean, we could make the corresponding change as below:
try (RaftClient client = RaftUtils.createClient(getRaftGroup())) {
  // transfer leadership
  if (!tryTransfer(client, newLeader, highestPriority, timeout.orElse(timeoutDefault))) {
    // legacy mode, transfer leadership by setting priority.
    tryTransfer(client, newLeader, highestPriority + 1, timeout.orElse(timeoutLegacy));
  }
} catch (Throwable t) {
  printf("Failed to transfer peer %s with address %s: ",
      newLeader.getId(), newLeader.getAddress());
  t.printStackTrace(getPrintStream());
  return -1;
}
return 0;
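The try-then-fall-back flow suggested here can be exercised in isolation with a simulated server. The sketch below is hypothetical and uses no Ratis classes: `FakeServer` is an invented stand-in whose `legacy` flag models the old rule (the transferee must be the only peer with the highest priority), while the non-legacy rule is assumed to accept a tie. The scenario in `main` uses the starting priorities implied by the manual test log above (n0=1, n1=0, n2=0).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the fallback flow; none of these types are Ratis classes.
class TransferFallbackSketch {

  // Invented stand-in for a server that accepts or refuses a leadership transfer.
  static class FakeServer {
    final boolean legacy;
    final Map<String, Integer> priorities;
    String leader;

    FakeServer(boolean legacy, Map<String, Integer> priorities, String leader) {
      this.legacy = legacy;
      this.priorities = new HashMap<>(priorities);
      this.leader = leader;
    }

    int highestPriority() {
      return priorities.values().stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    // Returns false when the transfer is refused because of the transferee's priority.
    boolean transferLeadership(String peer) {
      int highest = highestPriority();
      boolean hasHighest = priorities.get(peer) == highest;
      long peersAtHighest = priorities.values().stream().filter(p -> p == highest).count();
      boolean accepted = legacy ? (hasHighest && peersAtHighest == 1) : hasHighest;
      if (accepted) {
        leader = peer;
      }
      return accepted;
    }
  }

  // Raise the new leader's priority only if it is below the target, then request the transfer.
  static boolean tryTransfer(FakeServer server, String newLeader, int targetPriority) {
    if (server.priorities.get(newLeader) < targetPriority) {
      server.priorities.put(newLeader, targetPriority);
    }
    return server.transferLeadership(newLeader);
  }

  // First attempt with the current highest priority; fall back to highest + 1 (legacy mode).
  static boolean transfer(FakeServer server, String newLeader) {
    int highest = server.highestPriority(); // computed once, before any priority change
    if (tryTransfer(server, newLeader, highest)) {
      return true;
    }
    return tryTransfer(server, newLeader, highest + 1);
  }

  public static void main(String[] args) {
    Map<String, Integer> initial = new HashMap<>();
    initial.put("n0", 1);
    initial.put("n1", 0);
    initial.put("n2", 0);
    FakeServer oldServer = new FakeServer(true, initial, "n0");
    System.out.println(transfer(oldServer, "n2") + " leader=" + oldServer.leader);
  }
}
```

Against the simulated old server the first attempt is refused and the second succeeds with the priority raised to 2, mirroring the two "Changing priority" lines in the manual test log. Against a non-legacy `FakeServer` the first attempt is enough.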
Thanks @szetszwo for the review, updated. |
+1 the change looks good.
What changes were proposed in this pull request?
This is a follow-up of RATIS-1762.
Try to avoid changing priorities before transferring leadership in TransferCommand.
It will fall back to "transfer leadership by changing priority" for backward compatibility.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-1769
How was this patch tested?
Integration tests:
ElectionCommandIntegrationTest#testElectionTransferCommand() and
ElectionCommandIntegrationTest#testElectionTransferCommandToHigherPriority()
Manually
Transfer between peers with same priority. (new feature)
Backward compatibility
Ratis shell version: 3.0.0-SNAPSHOT.
Ratis server version: 2.4.1.
In case of failure
In most cases, just a retry will fix the problem. Users can also set the timeout manually with the -timeout option.