Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observe cluster state on health request #8350

Merged
merged 1 commit into from Nov 7, 2014

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Nov 5, 2014

Today we use busy waiting and sampling when we execute HealthReqeusts
on the master. This is tricky sicne we might sample a not yet fully applied
cluster state and make a decsions base on the partial cluster state. This can
lead to ugly problems since requests might be routed to nodes where shards are
already marked as relocated but on the actual cluster state they are still started.
Yet, this window is very small usually it can lead to ugly test failures.

This commit moves the health request over to a listener pattern that gets the actual
applied cluster state.

@s1monw
Copy link
Contributor Author

s1monw commented Nov 5, 2014

@bleskes here you go :)

}
if (System.currentTimeMillis() > endTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first line of the method can be removed now - the endTime variable is not used.

        long endTime = System.currentTimeMillis() + request.timeout().millis();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this over to be entirely non-blocking now...

@s1monw
Copy link
Contributor Author

s1monw commented Nov 6, 2014

@bleskes I updated the PR as we discussed

@@ -241,15 +240,15 @@ public void onTimeout(TimeValue timeout) {
}
}

public interface Listener {
public static abstract class Listener {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is an added value to have this as an abstract class anymore, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I will move back

@s1monw
Copy link
Contributor Author

s1monw commented Nov 6, 2014

pushed another round

@@ -90,14 +86,15 @@ public ClusterState execute(ClusterState currentState) {

@Override
public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
latch.countDown();
TimeValue newTimeout = TimeValue.timeValueMillis(Math.max(0, endTime - System.currentTimeMillis()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a time out of 0 is problematic in the InternalClusterService logic. We should protect for it, either in executeHealth or in the ClusterStateObserver

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm but can't you set this via the API already?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked and the scheduler can take a value of 0 and the listener removal logic is also OK where postAdded and onTimeout are called in rapid succession. I still think the check for 0 here is clearer. Thanks for adding it.

@s1monw
Copy link
Contributor Author

s1monw commented Nov 6, 2014

pushed a check for timeout == 0 I think it's ready

@bleskes
Copy link
Contributor

bleskes commented Nov 6, 2014

Agreed. LGTM - thanks for cleaning this up.

@bleskes bleskes removed the review label Nov 6, 2014
Today we use busy waiting and sampling when we execute HealthReqeusts
on the master. This is tricky sicne we might sample a not yet fully applied
cluster state and make a decsions base on the partial cluster state. This can
lead to ugly problems since requests might be routed to nodes where shards are
already marked as relocated but on the actual cluster state they are still started.
Yet, this window is very small usually it can lead to ugly test failures.

This commit moves the health request over to a listener pattern that gets the actual
applied cluster state.

Closes elastic#8350
@s1monw s1monw merged commit cc8e8e6 into elastic:master Nov 7, 2014
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Nov 7, 2014
Today we use busy waiting and sampling when we execute HealthReqeusts
on the master. This is tricky sicne we might sample a not yet fully applied
cluster state and make a decsions base on the partial cluster state. This can
lead to ugly problems since requests might be routed to nodes where shards are
already marked as relocated but on the actual cluster state they are still started.
Yet, this window is very small usually it can lead to ugly test failures.

This commit moves the health request over to a listener pattern that gets the actual
applied cluster state.

Closes elastic#8350
@s1monw s1monw deleted the cluster_health_on_update branch November 7, 2014 10:24
@clintongormley clintongormley added :Core/Infra/Core Core issues without another label >enhancement labels Mar 19, 2015
@clintongormley clintongormley changed the title [STATE] Observe cluster state on health request Observe cluster state on health request Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants