Getting index statistics could contain failed shards #2210

kroepke · 2016-05-10T12:35:48Z

During rotation and retention we requested all index statistics multiple times, which, in an overloaded cluster, could lead to shard failures due to timeouts.
This failure wasn't logged and could cause rotation decisions to be based on the wrong (older) index, effectively rotating indices too early.

This change makes Graylog use a more lightweight API to determine all index names including their aliases, restricting use of the expensive Index Statistics call to the indices page only.
The current rotation and retention strategies do not need to know all index statistics, which requires touching every single shard in the cluster.
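
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not Graylog code: the class, the `indexNumber` helper and the index names are all hypothetical, and it assumes that an index whose shards timed out is simply missing from the statistics result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

public class NewestIndexSketch {
    // Hypothetical helper: parses the numeric suffix of names such as "graylog_12".
    static int indexNumber(String indexName) {
        return Integer.parseInt(indexName.substring(indexName.lastIndexOf('_') + 1));
    }

    // Picks the highest index number from whatever names the caller was given.
    static OptionalInt newestIndexNumber(List<String> indexNames) {
        return indexNames.stream()
                .mapToInt(NewestIndexSketch::indexNumber)
                .max();
    }

    public static void main(String[] args) {
        // Complete list of index names, as a names/aliases lookup would return it.
        List<String> allNames = Arrays.asList("graylog_10", "graylog_11", "graylog_12");

        // Assumption: the shards of graylog_12 timed out, so that index is
        // silently absent from the statistics result.
        List<String> namesFromPartialStats = Arrays.asList("graylog_10", "graylog_11");

        System.out.println(newestIndexNumber(allNames).getAsInt());              // prints 12
        System.out.println(newestIndexNumber(namesFromPartialStats).getAsInt()); // prints 11
    }
}
```

With the partial result, index 11 looks like the newest one, so any decision derived from it is based on an older index.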

fixes #2194


dennisoelkers · 2016-05-10T13:52:49Z

graylog2-server/src/main/java/org/graylog2/indexer/Deflector.java

            } catch (NumberFormatException ex) {
-                LOG.debug("Couldn't extract index number from index name " + indexName, ex);
+                LOG.warn("Couldn't extract index number from index name " + indexName, ex);
            }
        }



What about replacing this with a streams call like this:

    final Optional<Integer> highestIndexNumber = indexNames.stream()
        .filter(indexName -> !this.isGraylogDeflectorIndex(indexName))
        .map(Deflector::extractIndexNumber)
        .max(Integer::compare);

Makes it easier to grasp what is done (at least for me).

kroepke · 2016-05-10

In theory I agree, but in this case we'd also need to refactor extractIndexNumber and its callers, because it throws an exception, and it already felt like I had changed a lot of code :(

dennisoelkers · 2016-05-11

How about making those changes on master?
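
For reference, one possible shape of that follow-up refactoring is sketched below. This is only a sketch under stated assumptions, not the actual Deflector code: the class name, the `tryExtract` wrapper, the deflector alias name and the stand-in `extractIndexNumber` are hypothetical, and the stand-in is assumed to throw `NumberFormatException` for names without a numeric suffix.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Optional;

public class HighestIndexNumberSketch {
    // Hypothetical alias name, used only for this example.
    static final String DEFLECTOR_NAME = "graylog_deflector";

    static boolean isDeflectorAlias(String indexName) {
        return DEFLECTOR_NAME.equals(indexName);
    }

    // Stand-in for the real method: throws NumberFormatException
    // when the name carries no numeric suffix.
    static int extractIndexNumber(String indexName) {
        int separator = indexName.lastIndexOf('_');
        if (separator < 0) {
            throw new NumberFormatException("No index number in " + indexName);
        }
        return Integer.parseInt(indexName.substring(separator + 1));
    }

    // Wraps the throwing call so it can be used inside a stream pipeline.
    static Optional<Integer> tryExtract(String indexName) {
        try {
            return Optional.of(extractIndexNumber(indexName));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    static Optional<Integer> highestIndexNumber(Collection<String> indexNames) {
        return indexNames.stream()
                .filter(name -> !isDeflectorAlias(name))
                .map(HighestIndexNumberSketch::tryExtract)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .max(Integer::compare);
    }

    public static void main(String[] args) {
        System.out.println(highestIndexNumber(
                Arrays.asList("graylog_2", "graylog_10", "graylog_deflector", "bogus")));
        // prints Optional[10]
    }
}
```

`Stream.max` expects a `Comparator`, which is why the sketch passes `Integer::compare`. A real version would probably still log unparsable names rather than drop them silently.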

bernd · 2016-05-11T11:33:48Z

LGTM 👍

During rotation and retention we requested all index statistics multiple times, which, in an overloaded cluster, could lead to shard failures due to timeouts. This failure wasn't logged and could cause rotation decisions to be based on the wrong (older) index, effectively rotating indices too early. This change makes Graylog use a more lightweight API to determine all index names including their aliases, restricting use of the expensive Index Statistics call to the indices page only. The current rotation and retention strategies do not need to know all index statistics, which requires touching every single shard in the cluster. Fixes #2194 (cherry picked from commit bc6042c)

kroepke added the ready-for-review label May 10, 2016

kroepke added this to the 2.0.1 milestone May 10, 2016

dennisoelkers self-assigned this May 10, 2016

dennisoelkers reviewed May 10, 2016

bernd assigned bernd and unassigned dennisoelkers May 11, 2016

bernd merged commit bc6042c into 2.0 May 11, 2016

bernd deleted the issue-2194 branch May 11, 2016 11:36

dependabot-preview bot mentioned this pull request May 3, 2019

Update eslint-plugin-react requirement from 7.12.4 to 7.13.0 in /graylog2-web-interface/packages/eslint-config-graylog #5926

Merged
