Getting index statistics could contain failed shards #2210
During rotation and retention we requested all index statistics multiple times, which, in an overloaded cluster, could lead to shard failures due to timeouts. These failures weren't logged and could cause rotation decisions to be based on the wrong (older) index, effectively rotating indices too early. This change makes Graylog use a more lightweight API to determine all index names, including their aliases, and limits use of the expensive index statistics to the indices page. The current rotation and retention strategies do not need the full index statistics, which require touching every single shard in the cluster. Fixes #2194
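To illustrate the lightweight approach, here is a minimal Java sketch that finds the index currently targeted by the deflector alias using only a map from index names to their aliases, with no per-shard statistics involved. The map shape, the `DeflectorTargetSketch` class, and the alias name `graylog_deflector` are assumptions for illustration, not Graylog's actual API.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

final class DeflectorTargetSketch {
    // Hypothetical alias name; the real one depends on the configured index prefix.
    static final String DEFLECTOR_ALIAS = "graylog_deflector";

    // Returns the single index carrying the deflector alias, or empty if the alias
    // points at zero or several indices (both treated as unresolved here).
    static Optional<String> deflectorTarget(Map<String, Set<String>> aliasesByIndex) {
        List<String> targets = aliasesByIndex.entrySet().stream()
                .filter(entry -> entry.getValue().contains(DEFLECTOR_ALIAS))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        return targets.size() == 1 ? Optional.of(targets.get(0)) : Optional.empty();
    }
}
```

Treating an alias that resolves to several indices as unresolved, rather than picking one, avoids silently basing rotation on the wrong index, which is the failure mode described above.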
Review comment on the following change (log level raised from debug to warn):
     } catch (NumberFormatException ex) {
-        LOG.debug("Couldn't extract index number from index name " + indexName, ex);
+        LOG.warn("Couldn't extract index number from index name " + indexName, ex);
     }
 }
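A self-contained sketch of the kind of loop this catch block sits in: it skips the deflector alias, parses the numeric suffix of each index name, and logs unparseable names at warning level instead of debug. It assumes a hypothetical `<prefix>_<number>` naming scheme and uses `java.util.logging` in place of the project's logger; `HighestIndexNumberSketch` and its methods are hypothetical.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

final class HighestIndexNumberSketch {
    private static final Logger LOG = Logger.getLogger(HighestIndexNumberSketch.class.getName());
    // Hypothetical naming scheme: "<prefix>_<number>" plus a "<prefix>_deflector" alias.
    static final String DEFLECTOR_NAME = "graylog_deflector";

    // Parses the numeric suffix; throws NumberFormatException for names like "graylog_foo".
    static int extractIndexNumber(String indexName) {
        String suffix = indexName.substring(indexName.lastIndexOf('_') + 1);
        return Integer.parseInt(suffix);
    }

    // Returns the highest index number, or -1 if no name could be parsed.
    static int highestIndexNumber(List<String> indexNames) {
        int highest = -1;
        for (String indexName : indexNames) {
            if (indexName.equals(DEFLECTOR_NAME)) {
                continue; // the alias itself carries no number
            }
            try {
                highest = Math.max(highest, extractIndexNumber(indexName));
            } catch (NumberFormatException ex) {
                // Warning level (not debug) so unexpected index names are visible to operators.
                LOG.log(Level.WARNING, "Couldn't extract index number from index name " + indexName, ex);
            }
        }
        return highest;
    }
}
```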
What about replacing this with a streams call like this:
final Optional<Integer> highestIndexNumber = indexNames.stream()
        .filter(indexName -> !this.isGraylogDeflectorIndex(indexName))
        .map(Deflector::extractIndexNumber)
        .max(Integer::compare); // Stream.max needs a Comparator; Integer::max compiles but does not order
Makes it easier to grasp what is done (at least for me).
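One detail in the suggestion as originally posted: `Stream.max` expects a `Comparator`, and `Integer::max` happens to fit that shape, so it compiles but does not actually order the elements. The sketch below, with hypothetical class and method names, contrasts it with `Integer::compare`.

```java
import java.util.Optional;
import java.util.stream.Stream;

final class MaxComparatorSketch {
    // Correct: Integer::compare returns negative, zero or positive, as Comparator requires.
    static Optional<Integer> maxWithCompare(Stream<Integer> numbers) {
        return numbers.max(Integer::compare);
    }

    // Pitfall: Integer::max also matches Comparator<Integer>'s (Integer, Integer) -> int
    // shape, so it compiles, but it returns the larger value rather than an ordering.
    static Optional<Integer> maxWithMax(Stream<Integer> numbers) {
        return numbers.max(Integer::max);
    }
}
```

With OpenJDK's implementation, the `Integer::max` variant keeps the first element for non-negative inputs, because any non-negative "comparison" result is read as "first is greater or equal".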
In theory I agree, but in this case we'd also need to refactor extractIndexNumber and its callers, because it throws an exception, and it already feels like I changed a lot of code :(
How about making those changes on master?
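The refactoring deferred to master could look roughly like this: a hypothetical `extractIndexNumber` variant that returns `Optional<Integer>` instead of throwing, which lets the stream pipeline drop unparseable names without a try/catch. This is a sketch under the same assumed `<prefix>_<number>` naming scheme, not the change that was actually made.

```java
import java.util.List;
import java.util.Optional;

final class OptionalIndexNumberSketch {
    static final String DEFLECTOR_NAME = "graylog_deflector"; // hypothetical alias name

    // Hypothetical variant of extractIndexNumber that reports failure via Optional
    // instead of throwing NumberFormatException.
    static Optional<Integer> extractIndexNumber(String indexName) {
        String suffix = indexName.substring(indexName.lastIndexOf('_') + 1);
        try {
            return Optional.of(Integer.parseInt(suffix));
        } catch (NumberFormatException ex) {
            return Optional.empty();
        }
    }

    static Optional<Integer> highestIndexNumber(List<String> indexNames) {
        return indexNames.stream()
                .filter(name -> !name.equals(DEFLECTOR_NAME))
                .map(OptionalIndexNumberSketch::extractIndexNumber)
                .filter(Optional::isPresent) // drop names without a numeric suffix
                .map(Optional::get)
                .max(Integer::compare);
    }
}
```

Note that this version silently skips bad names, so the warning log from the loop version would have to move into the extractor if it is still wanted.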
👍
LGTM 👍
During rotation and retention we requested all index statistics multiple times, which, in an overloaded cluster, could lead to shard failures due to timeouts. These failures weren't logged and could cause rotation decisions to be based on the wrong (older) index, effectively rotating indices too early. This change makes Graylog use a more lightweight API to determine all index names, including their aliases, and limits use of the expensive index statistics to the indices page. The current rotation and retention strategies do not need the full index statistics, which require touching every single shard in the cluster. Fixes #2194 (cherry picked from commit bc6042c)