Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting index statistics could contain failed shards #2210

Merged
merged 1 commit into from May 11, 2016
Merged

Getting index statistics could contain failed shards #2210

merged 1 commit into from May 11, 2016

Conversation

@kroepke
Copy link
Member

@kroepke kroepke commented May 10, 2016

During rotation and retention we requested all index statistics multiple times, which, in a overloaded cluster, could lead to shard failures due to timeouts.
This failure wasn't logged and could lead to using the wrong (older) index to base rotation decisions on, effectively rotating indices too early.

This change makes Graylog use a more lightweight API to determine all index names including their aliases, reducing the usage of the expensive Index Statistics to the indices page only.
The current rotation and retention strategies do not need to know all index statistics which require to touch every single shard in the cluster.

fixes #2194

During rotation and retention we requested all index statistics multiple times, which, in a overloaded cluster, could lead to shard failures due to timeouts.
This failure wasn't logged and could lead to using the wrong (older) index to base rotation decisions on, effectively rotating indices too early.

This change makes Graylog use a more lightweight API to determine all index names including their aliases, reducing the usage of the expensive Index Statistics to the indices page only.
The current rotation and retention strategies do not need to know all index statistics which require to touch every single shard in the cluster.

fixes #2194
@kroepke kroepke added this to the 2.0.1 milestone May 10, 2016
@dennisoelkers dennisoelkers self-assigned this May 10, 2016
} catch (NumberFormatException ex) {
LOG.debug("Couldn't extract index number from index name " + indexName, ex);
LOG.warn("Couldn't extract index number from index name " + indexName, ex);
}
}

This comment has been minimized.

@dennisoelkers

dennisoelkers May 10, 2016
Member

What about replacing this with a streams call like this:

        final Optional<Integer> highestIndexNumber = indexNames.stream()
            .filter(indexName -> !this.isGraylogDeflectorIndex(indexName))
            .map(Deflector::extractIndexNumber)
            .max(Integer::max);

Makes it easier to grasp what is done (at least for me).

This comment has been minimized.

@kroepke

kroepke May 10, 2016
Author Member

In theory I agree, but in this case we'd need to refactor extractIndexNumber as well and its callers, because it throws an exception, and it already felt I changed a lot of code already :(

How about making those changes on master?

This comment has been minimized.

@bernd bernd assigned bernd and unassigned dennisoelkers May 11, 2016
@bernd
Copy link
Member

@bernd bernd commented May 11, 2016

LGTM 👍

@bernd bernd merged commit bc6042c into 2.0 May 11, 2016
4 checks passed
4 checks passed
ci-server-integration Jenkins build graylog2-server-integration-pr 896 has succeeded
Details
ci-web-linter Jenkins build graylog-pr-linter-check 384 has succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@bernd bernd deleted the issue-2194 branch May 11, 2016
bernd added a commit that referenced this pull request May 11, 2016
During rotation and retention we requested all index statistics multiple times, which,
in a overloaded cluster, could lead to shard failures due to timeouts.
This failure wasn't logged and could lead to using the wrong (older) index to base
rotation decisions on, effectively rotating indices too early.

This change makes Graylog use a more lightweight API to determine all index
names including their aliases, reducing the usage of the expensive Index Statistics
to the indices page only.
The current rotation and retention strategies do not need to know all index statistics
which require to touch every single shard in the cluster.

Fixes #2194
(cherry picked from commit bc6042c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.