HDDS-5916. Datanodes stuck in leader election in Kubernetes #3186
Conversation
…ection in Kubernetes env
FYI, the Ratis pre-vote feature could avoid infinite leader election; see https://issues.apache.org/jira/browse/RATIS-993 . Of course, it is good to fix the underlying problem, i.e. updating the IP address.
Can you please check?
@adoroszlai , do you know how I can run these tests locally? I am not familiar with it. Thanks
@sokui You can run this acceptance test (Robot tests in Docker Compose-based environment) locally by:
Thanks @adoroszlai . I believe the test is fixed now.
Thanks @sokui for fixing the test. There are some checkstyle problems, can you please fix those, too?
@adoroszlai done.
@@ -96,7 +100,13 @@ public static UUID toDatanodeId(RaftProtos.RaftPeerProto peerId) {
   }

   private static String toRaftPeerAddress(DatanodeDetails id, Port.Name port) {
-    return id.getIpAddress() + ":" + id.getPort(port).getValue();
+    if (datanodeUseHostName()) {
+      LOG.debug("Datanode is using hostname for raft peer address");
Might as well print the actual value calculated in the debug log.
Sure
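As a rough illustration of the address selection being added in the hunk above, here is a self-contained sketch; the signature, the useHostName flag, and the sample values are illustrative, not the actual Ozone API:

```java
public class RaftPeerAddressSketch {
  // Hypothetical helper: prefer the hostname over the IP when the datanode
  // is configured to use hostnames, so that a pod restarted with a new IP
  // keeps a stable Raft peer address.
  static String toRaftPeerAddress(String hostName, String ipAddress,
      int port, boolean useHostName) {
    String host = useHostName ? hostName : ipAddress;
    return host + ":" + port;
  }

  public static void main(String[] args) {
    // With hostnames enabled, the address survives an IP change.
    System.out.println(
        toRaftPeerAddress("datanode-0.dn-svc", "10.1.2.3", 9858, true));
    // Old behavior: address is bound to the (possibly stale) IP.
    System.out.println(
        toRaftPeerAddress("datanode-0.dn-svc", "10.1.2.3", 9858, false));
  }
}
```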
@@ -125,7 +125,7 @@ private void persistContainerDatanodeDetails() {
     File idPath = new File(dataNodeIDPath);
     DatanodeDetails datanodeDetails = this.context.getParent()
         .getDatanodeDetails();
-    if (datanodeDetails != null && !idPath.exists()) {
What's the motivation for dropping this check?
This is because when the datanode gets restarted in k8s, its IP changes, so the original info in this file is no longer accurate. This makes sure we update it with the latest info.
And when we are not using k8s, I think it is not harmful to always update this file whenever the node restarts.
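A dependency-free sketch of the always-overwrite behavior described above; the real code persists a protobuf-backed DatanodeDetails, so the plain string and the method name here are hypothetical stand-ins:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

public class PersistSketch {
  // Hypothetical sketch: rewrite the ID file on every startup, so details
  // persisted by a previous pod (with a now-stale IP) are replaced.
  static void persistDatanodeDetails(File idFile, String details)
      throws IOException {
    // Note: no idFile.exists() guard, per the discussion above.
    Files.write(idFile.toPath(), details.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
  }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("dnid", ".txt");
    persistDatanodeDetails(f, "ip=10.0.0.1");
    persistDatanodeDetails(f, "ip=10.0.0.9");  // restart with a new IP
    // The file now holds the latest details only.
    System.out.println(new String(Files.readAllBytes(f.toPath()),
        StandardCharsets.UTF_8));
    f.delete();
  }
}
```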
  // The field parent in DatanodeDetails class has the circular reference
  // which will result in Gson infinite recursive parsing. We need to exclude
  // this field when generating json string for DatanodeDetails object
  static class DatanodeDetailsGsonExclusionStrategy
This change can be merged as a quick PR and not wait on this PR.
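The actual fix uses Gson's ExclusionStrategy; as a dependency-free illustration of why the parent back-reference must be skipped during serialization, here is a toy reflective serializer with a field-exclusion predicate (all class and field names are hypothetical):

```java
import java.lang.reflect.Field;
import java.util.function.Predicate;

public class ExclusionSketch {
  // Minimal stand-in for the situation above: a child object holds a
  // back-reference to its parent, so naive recursive serialization never
  // terminates unless the "parent" field is excluded (the role Gson's
  // ExclusionStrategy plays for DatanodeDetails).
  static class Detail {
    String name;
    Detail parent;
    Detail(String name) { this.name = name; }
  }

  // Tiny reflective JSON writer that skips any field matching the predicate.
  static String toJson(Object o, Predicate<Field> exclude) throws Exception {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Field f : o.getClass().getDeclaredFields()) {
      if (exclude.test(f)) {
        continue;  // skip excluded fields, e.g. the circular parent
      }
      f.setAccessible(true);
      Object v = f.get(o);
      if (!first) {
        sb.append(",");
      }
      first = false;
      sb.append("\"").append(f.getName()).append("\":");
      if (v == null) {
        sb.append("null");
      } else if (v instanceof String) {
        sb.append("\"").append(v).append("\"");
      } else {
        sb.append(toJson(v, exclude));  // recurse into nested objects
      }
    }
    return sb.append("}").toString();
  }

  public static void main(String[] args) throws Exception {
    Detail parent = new Detail("rack-1");
    Detail child = new Detail("dn-0");
    child.parent = parent;
    parent.parent = child;  // cycle: without exclusion this recurses forever
    System.out.println(toJson(child, f -> f.getName().equals("parent")));
  }
}
```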
  if (datanodeDetails.getPersistedOpState()
      != HddsProtos.NodeOperationalState.IN_SERVICE) {
    decommissionManager.continueAdminForNode(datanodeDetails);
It might be better to depend on the guarantees of continueAdminForNode (and update its javadoc) and always call that method here.
continueAdminForNode implements the logic for when the dn should be monitored. Let's not replicate it.
public synchronized void continueAdminForNode(DatanodeDetails dn)
throws NodeNotFoundException {
if (!scmContext.isLeader()) {
LOG.info("follower SCM ignored continue admin for datanode {}", dn);
return;
}
NodeOperationalState opState = getNodeStatus(dn).getOperationalState();
if (opState == NodeOperationalState.DECOMMISSIONING
|| opState == NodeOperationalState.ENTERING_MAINTENANCE
|| opState == NodeOperationalState.IN_MAINTENANCE) {
LOG.info("Continue admin for datanode {}", dn);
monitor.startMonitoring(dn);
}
}
Sure
  } catch (NodeNotFoundException e) {
    // Should not happen, as the node has just registered to call this event
    // handler.
    LOG.warn(
Log as an error.
updated
I have not completed my review; these are some minor nits to improve the code. I still have to go over the test code.
      datanodeDetails.getUuidString(),
      datanodeInfo,
      datanodeDetails);
  if (clusterMap.contains(datanodeInfo)) {
It might be better to implement clusterMap.update(datanodeDetails). This would keep the locking and concurrency issues in check.
updated
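A minimal sketch of what such a clusterMap.update() could look like, assuming the map tracks address-to-UUID entries; the names and structure here are simplified and hypothetical, not the actual Ozone NetworkTopology API:

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterMapSketch {
  // Hypothetical sketch of the reviewer's suggestion: the stale entry for a
  // node that re-registered with a new address is removed and the fresh one
  // inserted under a single lock, instead of callers doing
  // contains()/remove()/add() themselves.
  private final Map<String, String> addressToUuid = new HashMap<>();
  private final Map<String, String> uuidToAddress = new HashMap<>();

  public synchronized void update(String uuid, String newAddress) {
    String oldAddress = uuidToAddress.put(uuid, newAddress);
    if (oldAddress != null) {
      addressToUuid.remove(oldAddress);  // drop the stale address entry
    }
    addressToUuid.put(newAddress, uuid);
  }

  public synchronized String lookupByAddress(String address) {
    return addressToUuid.get(address);
  }
}
```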
  removeEntryFromDnsToUuidMap(oldDnsName);
  addEntryToDnsToUuidMap(dnsName, datanodeDetails.getUuidString());
Same here, better to implement a new method updateEntryInDnsToUuidMap(oldDnsName, dnsName, datanodeDetails.getUuidString())
done
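A sketch of the suggested combined update method, assuming the map goes from a DNS name to a set of UUIDs; all names are hypothetical:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DnsToUuidMapSketch {
  // Hypothetical sketch of updateEntryInDnsToUuidMap(): the removal from
  // the old DNS name and the addition under the new one happen inside one
  // synchronized method, so readers never observe the UUID missing from both.
  private final Map<String, Set<String>> dnsToUuid = new HashMap<>();

  public synchronized void updateEntry(String oldDnsName, String newDnsName,
      String uuid) {
    Set<String> old = dnsToUuid.get(oldDnsName);
    if (old != null) {
      old.remove(uuid);
      if (old.isEmpty()) {
        dnsToUuid.remove(oldDnsName);  // no nodes left under the old name
      }
    }
    dnsToUuid.computeIfAbsent(newDnsName, k -> new HashSet<>()).add(uuid);
  }

  public synchronized Set<String> lookup(String dnsName) {
    return dnsToUuid.getOrDefault(dnsName, new HashSet<>());
  }
}
```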
@kerneltime Your comments are addressed
LGTM
For this test failure, “build-branch / integration (flaky) (pull_request)”, does it need to pass? I checked the flaky annotation, which has the following definition:
@adoroszlai addressed all the PR comments. Please have another look. Thanks
Thanks @sokui for updating the patch. LGTM, but let's wait for another review.
FWIW @adoroszlai I'm very interested in the PR and will try to review it in the next few days. |
@@ -64,7 +65,8 @@ public class BackgroundPipelineCreator implements SCMService {
    * SCMService related variables.
    * 1) after leaving safe mode, BackgroundPipelineCreator needs to
    *    wait for a while before really take effect.
-   * 2) NewNodeHandler, NonHealthyToHealthyNodeHandler, PreCheckComplete
+   * 2) NewNodeHandler, NodeIpOrHostnameUpdateHandler,
Shouldn't this be "NodeAddressUpdateHandler" instead of "NodeIpOrHostnameUpdateHandler"?
Good catch. Let me fix this.
@@ -28,7 +28,19 @@ regenerate_resources

 start_k8s_env

-execute_robot_test scm-0 smoketest/basic/basic.robot
Don't we still want to run this test, in addition to those below?
@adoroszlai The change to this file comes from cherry-picking your commits. Could you please let me know why you deleted this line? If we need it back, I can simply put it back. I just want to understand your reasoning. Thanks
I'm OK either way. basic.robot has two tests:
- HTTP request for static web resource
- Freon key generation/validation
The latter (Freon) is performed by the new code, too, so only the web test is missing.
   * @param datanodeDetails new datanodeDetails
   */
  @Override
  public void closeStalePipelines(DatanodeDetails datanodeDetails) {
I don't see a unit test for this. Is it not worth it?
Added a unit test
thanks!
      "Datanodes may be used up. Try to see if any pipeline is in " +
      "ALLOCATED state, and then will wait for it to be OPEN",
      repConfig, se);
  List<Pipeline> allocatedPipelines = findPipelinesByState(repConfig,
It doesn't appear as if the test code covers waiting for pipelines to open. Does it need to?
Do you mean testing the waitOnePipelineReady() method? It involves a timer and multiple threads, so I think it is not a good candidate for a unit test. For an integration test, I do not know how to set it up. Any suggestions?
@sokui i was thinking of something like this:
sokui/ozone@HDDS-5916-support-datanode-change-ip-hostname...GeorgeJahad:gbjAllocateTest2
Feel free to ignore it if you don't like it.
I usually do not use sleep() in unit tests, because it can make them unreliable. But for this test we have no better way to do it, and the sleep() here should not make the test case unstable. Let me include your commit. Thank you!
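One way to keep such a sleep-based test robust is to poll the condition with a deadline rather than rely on a single fixed sleep; Hadoop's GenericTestUtils.waitFor offers a similar utility. A small hand-rolled version (the pipeline simulation below is purely illustrative) might look like:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitForSketch {
  // Poll the condition until it holds or the deadline passes. This keeps
  // the test fast in the common case and tolerant of slow CI machines.
  static boolean waitFor(BooleanSupplier condition, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (condition.getAsBoolean()) {
        return true;
      }
      Thread.sleep(10);  // short poll interval
    }
    return condition.getAsBoolean();
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicBoolean pipelineOpen = new AtomicBoolean(false);
    // Simulate a background thread opening a pipeline after a short delay.
    new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException ignored) {
      }
      pipelineOpen.set(true);
    }).start();
    System.out.println(waitFor(pipelineOpen::get, 2000));  // true
  }
}
```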
Thank you for all the hard work you've put into this PR. We are very interested in it.
LGTM
    InetAddress dnAddress = Server.getRemoteIp();
    if (dnAddress != null) {
      // Mostly called inside an RPC, update ip and peer hostname
      datanodeDetails.setHostName(dnAddress.getHostName());
I deleted this line because, when I tested recently, I found that dnAddress.getHostName() sometimes returns the IP instead of the hostname, which breaks datanode restarting. Please let me know if it is OK to delete this line. @GeorgeJahad @adoroszlai
@sokui @adoroszlai I'm nervous about removing the call to setHostName().
I just looked around and it seems to get used in many places. Some examples:
- ozone/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java, line 259 in f57a019: key += pipeline.getClosestNode().getHostName();
- ozone/hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/DatanodeChunkGenerator.java, line 171 in f57a019: if (datanodeHosts.contains(dn.getHostName())) {
- line 576 in f57a019: hostList.add(dn.getHostName());
Why does restart not work when it returns the IP string instead of the host string?
Hi @GeorgeJahad ,
To not change the old code path, I added the if condition: when useHostname is true, we will not setHostName, but when it is false (the old code), we keep the old logic which sets the hostname. There are two things I want to explain:

1. When the datanode first registers with SCM, the datanodeDetails already contains the hostName. The code we are talking about only resets datanodeDetails.hostName. So the code you listed above won't return null if we remove the line datanodeDetails.setHostName(dnAddress.getHostName());.

2. Why doesn't it work when we reset datanodeDetails.hostName while useHostname is true? In k8s, when the datanode first registers with SCM, dnAddress.getHostName() may return the IP instead of the hostname (maybe because of a k8s DNS lookup service delay, I am not exactly sure). This results in the IP instead of the hostname being used in the datanode Ratis communication for pipelines. When the datanode gets restarted with a different IP, the Ratis communication with the old IP throws a HostNotFoundException. But if we remove this line, we are sure that datanodeDetails.hostName always contains the hostname instead of the IP, so there is no Ratis communication problem.

This is the whole story. That's why I keep the old code path the same, but if useHostname is true, we do not do datanodeDetails.setHostName(dnAddress.getHostName()); in the register process. Please let me know if it makes sense to you.
When datanode fist register with scm, the datanodeDetails already contains hostName. Here the code we are talking about is just to reset the datanodeDetails.hostName.
If you are sure this is true, then I'm fine with the change.
To not change the old code path, I added the if condition: when useHostname is true, we will not setHostname, but when it is false (old code),
I'm confused about this statement. Isn't the old code path when "(!isNodeRegistered(datanodeDetails))" is true, not when "(!useHostname)" is true? What am I missing?
Old codepath means current master, before this PR (or when DFS_DATANODE_USE_DN_HOSTNAME is not enabled).
Sorry for the confusion. The old path is the current master. So I made the following change:
from
if (dnAddress != null) {
// Mostly called inside an RPC, update ip and peer hostname
datanodeDetails.setHostName(dnAddress.getHostName());
...
}
To
if (dnAddress != null) {
// Mostly called inside an RPC, update ip and peer hostname
if (!useHostname) {
datanodeDetails.setHostName(dnAddress.getHostName());
}
...
}
What I was saying is that even if we delete the line datanodeDetails.setHostName(dnAddress.getHostName());, it should still be fine, because when the datanode registers with SCM the datanodeDetails already has the hostName info. But to be conservative, I just use the above logic to make sure that when DFS_DATANODE_USE_DN_HOSTNAME is not enabled, the code is exactly the same as before.
Added the unit test and replied to the comment. Please have another look. Thank you!
I am looking into the verification failures. How can I run a .robot test case locally? For example, I just want to run /opt/hadoop/smoketest/recon/recon-api.robot. @adoroszlai
To run acceptance tests in a specific environment (replace the environment directory as needed):

mvn -DskipTests clean package
cd hadoop-ozone/dist/target/ozone-*-SNAPSHOT/compose/ozonesecure
./test.sh

You can edit
@adoroszlai I feel confused. After I pushed the last change (a one-line change), some tests failed with unrelated problems. The reported error lines do not seem to match my code. Do you know what's going on?
Hi @adoroszlai , when you have time, could you please take a look at my question above? I think the failed tests in this PR are not related to my current code (the reported error lines do not match my code). If the testing relied on a wrong version of the code, could you please re-trigger it? Thank you!
…atanode-change-ip-hostname
@sokui Pull requests are built and tested as if the source branch (your code) was merged into the base branch. The compile error in
Nice. Seems all the tests passed. Please let me know if we can merge it. Thank you!
@sodonnel @nandakumar131 would you like to take a look?
Thanks @sokui for the patch, @GeorgeJahad, @kerneltime, @Xushaohong for the review.
What changes were proposed in this pull request?
Make Ozone support datanodes changing IPs and hostnames (as long as the UUID does not change).
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-5916
How was this patch tested?
Tested in a k8s production environment with Kerberos enabled. Each datanode is attached to a PVC. Ozone still works well after killing any number of datanodes (they are rescheduled with different IPs).