HDDS-5916. Add IP check when DN register to SCM #2802

Closed
wants to merge 3 commits into from

Xushaohong
Contributor

What changes were proposed in this pull request?

SCM does not update a DN's NodeInfo when that DN restarts.
This causes the issue described in the JIRA, which in a K8s environment can leave the cluster unavailable.
We should therefore update the DN's LastknownIpAddress in the NodeMap when it registers again.
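
A minimal sketch of the idea, assuming a simplified in-memory node map keyed by datanode UUID. The names `NodeRegistrySketch`, `NodeInfo`, `register`, and `lastKnownIp` are hypothetical and chosen for illustration only; they are not the actual SCM classes or the code in this patch.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the SCM node map; not the real Ozone API. */
class NodeRegistrySketch {

  /** What this sketch remembers about one datanode. */
  static final class NodeInfo {
    private final UUID uuid;
    private volatile String ipAddress;

    NodeInfo(UUID uuid, String ipAddress) {
      this.uuid = uuid;
      this.ipAddress = ipAddress;
    }

    UUID getUuid() { return uuid; }
    String getIpAddress() { return ipAddress; }
    void setIpAddress(String ipAddress) { this.ipAddress = ipAddress; }
  }

  private final Map<UUID, NodeInfo> nodeMap = new ConcurrentHashMap<>();

  /**
   * Registers a datanode. If the UUID is already known but the reported
   * address differs, the stored address is refreshed.
   *
   * @return true only when an existing entry's address was updated
   */
  boolean register(UUID uuid, String ipAddress) {
    NodeInfo existing = nodeMap.putIfAbsent(uuid, new NodeInfo(uuid, ipAddress));
    if (existing == null) {
      return false; // first registration, nothing to refresh
    }
    if (existing.getIpAddress().equals(ipAddress)) {
      return false; // same address, e.g. a restart on the same host
    }
    existing.setIpAddress(ipAddress); // restarted with a new address
    return true;
  }

  /** Returns the last address recorded for the datanode, or null if unknown. */
  String lastKnownIp(UUID uuid) {
    NodeInfo info = nodeMap.get(uuid);
    return info == null ? null : info.getIpAddress();
  }
}
```

In this sketch a re-registration with an unchanged address is a no-op, so only a restart that actually changes the address touches the stored entry.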

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5916

How was this patch tested?

Manual Test on K8s

Contributor

@bharatviswa504 bharatviswa504 left a comment

Question: for already created pipelines, will there be any issue, and how is it handled?

@Xushaohong
Contributor Author

The old pipelines would get stuck in leader election, time out, and then be closed because some DN has been a candidate for too long.
I think it is acceptable to wait through a transition period until new pipelines are created. @bharatviswa504

@Xushaohong Xushaohong force-pushed the HDDS-5916 branch 2 times, most recently from e59fb4d to a77b09f Compare November 5, 2021 02:38
@Xushaohong
Contributor Author

Xushaohong commented Nov 5, 2021

This fix leaves one defect: in my K8s env there may still be one or two pipelines that do not work, even though the log shows that the raft peer info within the pipeline's raft group is up to date. I have 156 pipelines in total and only one is unstable, which is tricky to explain, since most newly created pipelines work fine. I have redeployed the K8s env several times and the problem still occurs. Sometimes it recovers by itself.
[Screenshot, 2021-11-05 2:58:02 PM]

@techwhizbang

@Xushaohong re:

> The old pipelines would get stuck in leader election, timeout and then get closed for some DN being candidate too long.
> I think this is acceptable to wait for a transition time to create new pipelines.

Does this mean that the leader election is eventually resolved after some timeout? Or does it persist indefinitely for previously created pipelines?

@Xushaohong
Contributor Author

Xushaohong commented Nov 6, 2021

> @Xushaohong re:
>
> > The old pipelines would get stuck in leader election, timeout and then get closed for some DN being candidate too long.
> > I think this is acceptable to wait for a transition time to create new pipelines.
>
> Does this mean that eventually after some timeout that the leader election is resolved? Or does it perpetuate for any previously created pipelines?

The previously created pipelines would be closed after the timeout. Most of the new pipelines work normally, except for one or two that are still dangling. @techwhizbang

@adoroszlai
Contributor

I think this problem is more completely addressed in #3186.

@Xushaohong
Contributor Author

> I think this problem is more completely addressed in #3186.

Thx for this information! I will close this PR.

@Xushaohong Xushaohong closed this May 18, 2022
4 participants