Observe cluster state on health request #8350
Conversation
@bleskes here you go :)
    }
    if (System.currentTimeMillis() > endTime) {
The first line of the method can be removed now - the endTime variable is not used.
long endTime = System.currentTimeMillis() + request.timeout().millis();
I moved this over to be entirely non-blocking now...
Force-pushed from c25bef8 to b2c8288
@bleskes I updated the PR as we discussed
@@ -241,15 +240,15 @@ public void onTimeout(TimeValue timeout) {
        }
    }

-    public interface Listener {
+    public static abstract class Listener {
I don't think there is an added value to have this as an abstract class anymore, right?
oh I will move back
pushed another round
@@ -90,14 +86,15 @@ public ClusterState execute(ClusterState currentState) {

    @Override
    public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
-        latch.countDown();
+        TimeValue newTimeout = TimeValue.timeValueMillis(Math.max(0, endTime - System.currentTimeMillis()));
I think a time out of 0 is problematic in the InternalClusterService logic. We should protect for it, either in executeHealth or in the ClusterStateObserver
hmm but can't you set this via the API already?
I double checked and the scheduler can take a value of 0 and the listener removal logic is also OK where postAdded and onTimeout are called in rapid succession. I still think the check for 0 here is clearer. Thanks for adding it.
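The guard being discussed can be illustrated with a small self-contained sketch (plain Java stand-ins, not Elasticsearch's actual `TimeValue`/`ClusterStateObserver` classes): clamp the remaining time to zero, and treat a result of zero as "deadline already passed" so the caller fails fast instead of scheduling a timeout task.

```java
public class TimeoutClamp {

    /**
     * Returns the millis remaining until endTimeMillis, clamped to 0.
     * A result of 0 means the deadline has already passed and the
     * caller should bail out rather than register another observer.
     */
    static long remainingMillis(long endTimeMillis, long nowMillis) {
        return Math.max(0, endTimeMillis - nowMillis);
    }

    public static void main(String[] args) {
        long now = 1000L;
        System.out.println(remainingMillis(1500L, now)); // 500 -> keep waiting
        System.out.println(remainingMillis(800L, now));  // 0   -> fail fast
    }
}
```

Even if the underlying scheduler tolerates a zero delay, the explicit check makes the intent obvious at the call site, which matches the reviewer's point about clarity.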
pushed a check for 0
Agreed. LGTM - thanks for cleaning this up.
Today we use busy waiting and sampling when we execute HealthRequests on the master. This is tricky since we might sample a not yet fully applied cluster state and make decisions based on the partial cluster state. This can lead to ugly problems, since requests might be routed to nodes where shards are already marked as relocated, but in the actual cluster state they are still started. While this window is very small, it can lead to ugly test failures. This commit moves the health request over to a listener pattern that gets the actual applied cluster state. Closes elastic#8350
Force-pushed from df835d8 to cc8e8e6
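As a rough illustration of the listener pattern the commit message describes (simplified stand-ins, not the real `ClusterStateObserver` API): instead of polling for a cluster state that may be only partially applied, the health logic registers a listener that is invoked with a state only after it has actually been applied.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerPatternSketch {

    // Hypothetical listener: invoked with a fully applied state, or on timeout.
    interface StateListener {
        void onNewState(String appliedState);
        void onTimeout();
    }

    // Hypothetical observer: holds listeners until the next applied state.
    static class StateObserver {
        private final List<StateListener> listeners = new ArrayList<>();

        void waitForNextChange(StateListener listener) {
            listeners.add(listener);
        }

        // Called only once a new state has been fully applied,
        // so listeners never see a partial state.
        void publish(String appliedState) {
            for (StateListener l : listeners) {
                l.onNewState(appliedState);
            }
            listeners.clear();
        }
    }

    public static void main(String[] args) {
        StateObserver observer = new StateObserver();
        StringBuilder seen = new StringBuilder();
        observer.waitForNextChange(new StateListener() {
            @Override public void onNewState(String s) { seen.append(s); }
            @Override public void onTimeout() { seen.append("timeout"); }
        });
        observer.publish("green");
        System.out.println(seen); // prints "green"
    }
}
```

The key design point is that `publish` runs only after the state is applied, which removes the sampling race the busy-waiting approach suffered from.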