[#956] refactor: Changes the Boolean flag that determines whether a Node is healthy to a state #959
proto/src/main/proto/Rss.proto
@@ -251,7 +251,6 @@ message ShuffleServerHeartBeatRequest {
int64 availableMemory = 4;
int32 eventNumInFlush = 5;
repeated string tags = 6;
google.protobuf.BoolValue isHealthy = 7;
Don't modify the field. You can add a // deprecated comment instead. If you modify it, this becomes an incompatible change. Although we haven't released version 1.0 and don't need to guarantee compatibility yet, I still hope we can avoid breaking changes like this.
Okay, I'll restore it.
ShuffleServerMetrics.gaugeIsHealthy.set(1);
return;
}
}
ShuffleServerMetrics.gaugeIsHealthy.set(0);
isHealthy.set(true);
if (shuffleServer.getServerStatus() == ServerStatus.DECOMMISSIONING) {
Why do we need this?
In ShuffleServer, after decommission is called, many applications may still be unfinished, so the ServerStatus stays DECOMMISSIONING. The health-check thread must not change the ServerStatus in that case.
Why do we assign DECOMMISSIONING to it if its original value is already DECOMMISSIONING? It seems that we don't change the value.
Sorry, I made a mistake.
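The thread above settles on one rule: the health check may switch a node between healthy and unhealthy, but must leave DECOMMISSIONING untouched, and reassigning DECOMMISSIONING to itself is a no-op. A minimal sketch of that decision as a pure function follows. The three-value ServerStatus enum and the nextStatus name are hypothetical simplifications, not the project's actual types.

```java
public class DecommissionGuardSketch {
  // Hypothetical subset of server states; the real ServerStatus enum may differ.
  enum ServerStatus { ACTIVE, UNHEALTHY, DECOMMISSIONING }

  // Decides the next status from one health-check result.
  // A DECOMMISSIONING node keeps that status whatever the checkers report,
  // because applications may still be running on it.
  static ServerStatus nextStatus(ServerStatus current, boolean healthy) {
    if (current == ServerStatus.DECOMMISSIONING) {
      return current;
    }
    return healthy ? ServerStatus.ACTIVE : ServerStatus.UNHEALTHY;
  }

  public static void main(String[] args) {
    System.out.println(nextStatus(ServerStatus.ACTIVE, false));         // UNHEALTHY
    System.out.println(nextStatus(ServerStatus.UNHEALTHY, true));       // ACTIVE
    System.out.println(nextStatus(ServerStatus.DECOMMISSIONING, true)); // DECOMMISSIONING
  }
}
```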
@@ -226,7 +226,7 @@ private void initialization() throws Exception {
if (healthCheckEnable) {
List<Checker> builtInCheckers = Lists.newArrayList();
builtInCheckers.add(storageManager.getStorageChecker());
healthCheck = new HealthCheck(isHealthy, shuffleServerConf, builtInCheckers);
healthCheck = new HealthCheck(this, shuffleServerConf, builtInCheckers);
You can pass ServerStatus as a parameter instead, using AtomicReference<ServerStatus> as its type.
Let me revise it.
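A minimal sketch of the suggestion above, assuming the server owns a single AtomicReference<ServerStatus> and hands the same instance to the health check, so HealthCheck no longer needs a reference to the whole server. HealthCheck, Checker and runOnce here are simplified hypothetical stand-ins, not the project's classes. Using updateAndGet makes the read, decide and write step atomic: if a decommission sets DECOMMISSIONING between the read and the write, the compare-and-set fails and the update is retried against the new value, so the decommission is not lost.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class SharedStatusSketch {
  // Hypothetical subset of states; the real ServerStatus enum may differ.
  enum ServerStatus { ACTIVE, UNHEALTHY, DECOMMISSIONING }

  // Hypothetical checker contract standing in for the project's Checker type.
  interface Checker {
    boolean checkIsHealthy();
  }

  // Simplified health check that shares the server's status reference.
  static class HealthCheck {
    private final AtomicReference<ServerStatus> serverStatus;
    private final List<Checker> checkers;

    HealthCheck(
        AtomicReference<ServerStatus> serverStatus,
        List<Checker> checkers) {
      this.serverStatus = serverStatus;
      this.checkers = checkers;
    }

    // One round of checks: every checker must pass for the node to be ACTIVE.
    void runOnce() {
      boolean healthy = checkers.stream().allMatch(Checker::checkIsHealthy);
      // Atomic read-modify-write; DECOMMISSIONING is never overwritten.
      serverStatus.updateAndGet(current ->
          current == ServerStatus.DECOMMISSIONING
              ? current
              : (healthy ? ServerStatus.ACTIVE : ServerStatus.UNHEALTHY));
    }
  }

  public static void main(String[] args) {
    // The server creates the reference and passes the same instance to HealthCheck.
    AtomicReference<ServerStatus> status = new AtomicReference<>(ServerStatus.ACTIVE);
    HealthCheck check = new HealthCheck(status, List.<Checker>of(() -> true, () -> false));
    check.runOnce();
    System.out.println(status.get()); // UNHEALTHY: the server sees the update directly
  }
}
```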
private final long checkIntervalMs;
private final Thread thread;
private volatile boolean isStop = false;
private List<Checker> checkers = Lists.newArrayList();

public HealthCheck(AtomicBoolean isHealthy, ShuffleServerConf conf, List<Checker> buildInCheckers) {
this.isHealthy = isHealthy;
public HealthCheck(AtomicReference<ServerStatus> serverStatus,
Could we use a style consistent with other places?
public HealthCheck(
    AtomicReference<ServerStatus> serverStatus,
    ShuffleServerConf conf,
    List<Checker> buildInCheckers) {
@yl09099 Could you address this comment?
This indent should be 4.
Sorry, it has been changed.
boolean isHealthy = true;
if (request.hasIsHealthy()) {
isHealthy = request.getIsHealthy().getValue();
/**
* Compatible with isHealthy version
* Compatible with isHealthy version
* Compatible with older version
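The compatibility branch above exists because older shuffle servers still send the deprecated isHealthy flag, and the BoolValue wrapper lets the receiver tell "absent" from "false" via hasIsHealthy(). A minimal sketch of one way to resolve a status from either kind of heartbeat follows. The Heartbeat class, the assumption that newer servers send a separate status field, and the precedence rule (status first, then the legacy flag, absent flag treated as healthy like the original default of true) are all hypothetical; Optional.empty() models an unset field.

```java
import java.util.Optional;

public class HeartbeatCompatSketch {
  // Hypothetical subset of states; the real ServerStatus enum may differ.
  enum ServerStatus { ACTIVE, UNHEALTHY, DECOMMISSIONING }

  // Hypothetical, simplified view of a heartbeat request.
  static final class Heartbeat {
    final Optional<ServerStatus> status;     // assumed field sent by newer servers
    final Optional<Boolean> legacyIsHealthy; // deprecated flag sent by older servers

    Heartbeat(Optional<ServerStatus> status, Optional<Boolean> legacyIsHealthy) {
      this.status = status;
      this.legacyIsHealthy = legacyIsHealthy;
    }
  }

  static ServerStatus resolveStatus(Heartbeat hb) {
    if (hb.status.isPresent()) {
      return hb.status.get();
    }
    // Older server: derive the state from the deprecated Boolean flag.
    // An absent flag counts as healthy, matching the original default of true.
    boolean healthy = hb.legacyIsHealthy.orElse(true);
    return healthy ? ServerStatus.ACTIVE : ServerStatus.UNHEALTHY;
  }

  public static void main(String[] args) {
    System.out.println(resolveStatus(
        new Heartbeat(Optional.of(ServerStatus.DECOMMISSIONING), Optional.empty()))); // DECOMMISSIONING
    System.out.println(resolveStatus(
        new Heartbeat(Optional.empty(), Optional.of(false)))); // UNHEALTHY
    System.out.println(resolveStatus(
        new Heartbeat(Optional.empty(), Optional.empty()))); // ACTIVE
  }
}
```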
@@ -1,38 +1,26 @@
/*
Why do we remove the license header?
I moved this file and forgot to add the license header back. I'll add it.
LGTM, thanks @yl09099
What changes were proposed in this pull request?
Replace the Boolean flag that records whether a ServerNode is healthy with an unhealthy state of its ServerStatus.
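To illustrate why a single state is easier to consume than a separate Boolean flag, here is a minimal sketch in which health is derived from the status rather than stored next to it, so the two can never disagree. The ServerNode class, the three-value enum, and the rule that only ACTIVE nodes are offered for new assignments are hypothetical simplifications for illustration, not the coordinator's actual assignment logic.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ServerNodeStateSketch {
  // Hypothetical subset of states; the real ServerStatus enum may differ.
  enum ServerStatus { ACTIVE, UNHEALTHY, DECOMMISSIONING }

  // Hypothetical, simplified node: one status field replaces the Boolean flag.
  static final class ServerNode {
    final String id;
    final ServerStatus status;

    ServerNode(String id, ServerStatus status) {
      this.id = id;
      this.status = status;
    }

    // Health is a view of the state, not an independent field.
    boolean isHealthy() {
      return status != ServerStatus.UNHEALTHY;
    }
  }

  // In this sketch, only ACTIVE nodes are eligible for new assignments.
  static List<String> assignableIds(List<ServerNode> nodes) {
    return nodes.stream()
        .filter(n -> n.status == ServerStatus.ACTIVE)
        .map(n -> n.id)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<ServerNode> nodes = List.of(
        new ServerNode("s1", ServerStatus.ACTIVE),
        new ServerNode("s2", ServerStatus.UNHEALTHY),
        new ServerNode("s3", ServerStatus.DECOMMISSIONING));
    System.out.println(assignableIds(nodes)); // [s1]
  }
}
```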
Why are the changes needed?
The unhealthy status should not be kept as a separate flag, isolated from the other server states.
Fix: #956
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing unit tests