Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting index statistics could contain failed shards #2210

Merged
merged 1 commit into from May 11, 2016
Merged

Getting index statistics could contain failed shards #2210

merged 1 commit into from May 11, 2016

Conversation

kroepke
Copy link
Member

@kroepke kroepke commented May 10, 2016

During rotation and retention we requested all index statistics multiple times, which, in a overloaded cluster, could lead to shard failures due to timeouts.
This failure wasn't logged and could lead to using the wrong (older) index to base rotation decisions on, effectively rotating indices too early.

This change makes Graylog use a more lightweight API to determine all index names including their aliases, reducing the usage of the expensive Index Statistics to the indices page only.
The current rotation and retention strategies do not need to know all index statistics which require to touch every single shard in the cluster.

fixes #2194

During rotation and retention we requested all index statistics multiple times, which, in a overloaded cluster, could lead to shard failures due to timeouts.
This failure wasn't logged and could lead to using the wrong (older) index to base rotation decisions on, effectively rotating indices too early.

This change makes Graylog use a more lightweight API to determine all index names including their aliases, reducing the usage of the expensive Index Statistics to the indices page only.
The current rotation and retention strategies do not need to know all index statistics which require to touch every single shard in the cluster.

fixes #2194
@kroepke kroepke added this to the 2.0.1 milestone May 10, 2016
@dennisoelkers dennisoelkers self-assigned this May 10, 2016
} catch (NumberFormatException ex) {
LOG.debug("Couldn't extract index number from index name " + indexName, ex);
LOG.warn("Couldn't extract index number from index name " + indexName, ex);
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about replacing this with a streams call like this:

        final Optional<Integer> highestIndexNumber = indexNames.stream()
            .filter(indexName -> !this.isGraylogDeflectorIndex(indexName))
            .map(Deflector::extractIndexNumber)
            .max(Integer::max);

Makes it easier to grasp what is done (at least for me).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory I agree, but in this case we'd need to refactor extractIndexNumber as well and its callers, because it throws an exception, and it already felt I changed a lot of code already :(

How about making those changes on master?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@bernd bernd assigned bernd and unassigned dennisoelkers May 11, 2016
@bernd
Copy link
Member

bernd commented May 11, 2016

LGTM 👍

@bernd bernd merged commit bc6042c into 2.0 May 11, 2016
@bernd bernd deleted the issue-2194 branch May 11, 2016 11:36
bernd pushed a commit that referenced this pull request May 11, 2016
During rotation and retention we requested all index statistics multiple times, which,
in a overloaded cluster, could lead to shard failures due to timeouts.
This failure wasn't logged and could lead to using the wrong (older) index to base
rotation decisions on, effectively rotating indices too early.

This change makes Graylog use a more lightweight API to determine all index
names including their aliases, reducing the usage of the expensive Index Statistics
to the indices page only.
The current rotation and retention strategies do not need to know all index statistics
which require to touch every single shard in the cluster.

Fixes #2194
(cherry picked from commit bc6042c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants