Add an API to locate unrecovered shards and their state #11545

areek · 2015-06-08T22:41:20Z

This API provides store information for shard copies of indices.
Store information reports the nodes on which shard copies exist, the
version of each shard copy (indicating how recent it is), and any exceptions
encountered while opening the shard index or from an earlier engine failure.

By default, it only lists store information for shards that have at least one
unallocated copy. When the cluster health status is yellow, this lists
store information for shards that have at least one unassigned replica.
When the cluster health status is red, this lists store information
for shards that have unassigned primaries.

Endpoints provide shard store information for a specific index, several
indices, or all indices:

curl -XGET 'http://localhost:9200/test/_shard_stores'
curl -XGET 'http://localhost:9200/test1,test2/_shard_stores'
curl -XGET 'http://localhost:9200/_shard_stores'

The scope of shards for which store information is listed can be changed through
the status param. It defaults to 'yellow' and 'red'. 'yellow' lists store information for
shards with at least one unassigned replica, and 'red' lists it for shards with an
unassigned primary.
Use 'green' to list store information for shards with all copies assigned.

curl -XGET 'http://localhost:9200/_shard_stores?status=green'
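
As a rough illustration of the status filter described above, the following Java sketch classifies a shard as green, yellow, or red from the assignment state of its copies and checks whether it falls within a requested set of statuses. The names `ShardStatusFilterSketch`, `Health`, `classify`, and `isListed` are hypothetical and do not come from the Elasticsearch code base; this only models the rules stated in the description.

```java
import java.util.EnumSet;
import java.util.Set;

final class ShardStatusFilterSketch {

    /** Hypothetical stand-in for a per-shard health value. */
    enum Health { GREEN, YELLOW, RED }

    /** Default scope from the description: 'yellow' and 'red'. */
    static final Set<Health> DEFAULT_STATUSES = EnumSet.of(Health.YELLOW, Health.RED);

    /** Red: primary unassigned. Yellow: at least one replica unassigned. Green: all copies assigned. */
    static Health classify(boolean primaryAssigned, int unassignedReplicas) {
        if (!primaryAssigned) {
            return Health.RED;
        }
        if (unassignedReplicas > 0) {
            return Health.YELLOW;
        }
        return Health.GREEN;
    }

    /** True when the shard's health is one of the requested statuses. */
    static boolean isListed(Set<Health> requested, boolean primaryAssigned, int unassignedReplicas) {
        return requested.contains(classify(primaryAssigned, unassignedReplicas));
    }

    public static void main(String[] args) {
        // A shard with an assigned primary and one unassigned replica.
        System.out.println(isListed(DEFAULT_STATUSES, true, 1));
        // A fully assigned shard is only listed when 'green' is requested.
        System.out.println(isListed(DEFAULT_STATUSES, true, 0));
        System.out.println(isListed(EnumSet.of(Health.GREEN), true, 0));
    }
}
```

Under this model, a fully assigned shard is skipped by the default scope and only appears when 'green' is requested explicitly.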

Response:

The shard stores information is grouped by indices and shard ids.

{
    ...
   "0": { <1>
        "stores": [ <2>
            {
                "sPa3OgxLSYGvQ4oPs-Tajw": { <3>
                    "name": "node_t0",
                    "transport_address": "local[1]",
                    "attributes": {
                        "enable_custom_paths": "true",
                        "mode": "local"
                    }
                },
                "version": 4, <4>
                "allocation" : "primary" | "replica" | "unused", <5>
                "store_exception": ... <6>
            },
            ...
        ]
   },
    ...
}

<1> The key is the corresponding shard id for the store information
<2> A list of store information for all copies of the shard
<3> The node information that hosts a copy of the store, the key
is the unique node id.
<4> The version of the store copy
<5> The status of the store copy, whether it is used as a
primary, replica or not used at all
<6> Any exception encountered while opening the shard index or
from earlier engine failure
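
To make the grouping by index and shard id concrete, here is a small Java sketch that models the response as nested maps and picks, for each shard, the copy with the highest version that reported no store exception. The class `ShardStoresResponseSketch`, the `StoreCopy` type, the helper names, and the node ids in `main` are all hypothetical and exist only for illustration; they are not the response classes introduced by this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

final class ShardStoresResponseSketch {

    /** Hypothetical model of one entry in a shard's "stores" list. */
    static final class StoreCopy {
        final String nodeId;          // key of the node object in the response
        final long version;           // "version"
        final String allocation;      // "primary", "replica" or "unused"
        final String storeException;  // null when no exception was reported

        StoreCopy(String nodeId, long version, String allocation, String storeException) {
            this.nodeId = nodeId;
            this.version = version;
            this.allocation = allocation;
            this.storeException = storeException;
        }
    }

    /** Files a copy under index name -> shard id, mirroring the grouping of the response. */
    static void add(Map<String, Map<Integer, List<StoreCopy>>> grouped,
                    String index, int shardId, StoreCopy copy) {
        grouped.computeIfAbsent(index, k -> new TreeMap<>())
               .computeIfAbsent(shardId, k -> new ArrayList<>())
               .add(copy);
    }

    /** Picks the copy with the highest version among those without a store exception. */
    static Optional<StoreCopy> mostRecentUsableCopy(List<StoreCopy> copies) {
        return copies.stream()
                .filter(c -> c.storeException == null)
                .max(Comparator.comparingLong((StoreCopy c) -> c.version));
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, List<StoreCopy>>> grouped = new TreeMap<>();
        add(grouped, "test", 0, new StoreCopy("node-a", 4, "unused", null));
        add(grouped, "test", 0, new StoreCopy("node-b", 5, "unused", "corrupted index"));
        grouped.forEach((index, shards) -> shards.forEach((shardId, copies) ->
                System.out.println(index + "[" + shardId + "] -> "
                        + mostRecentUsableCopy(copies).map(c -> c.nodeId).orElse("none"))));
    }
}
```

Treating a copy with a store exception as unusable is a choice made for this sketch; the description itself only says that the exception is reported alongside the copy.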

closes #10952

areek · 2015-06-08T22:47:41Z

@s1monw This is still a WIP in terms of documentation and testing, would appreciate a review.

s1monw · 2015-06-09T19:02:17Z

...va/org/elasticsearch/action/admin/indices/shards/TransportIndicesUnassignedShardsAction.java

+            this.metaData = metaData;
+            this.listener = listener;
+            this.expectedOps = expectedOps;
+            this.opsCount = new AtomicInteger(0);


We have a class called CountDown.java for this - check it out, it might make things simpler here

Thanks for the pointer, switched to using CountDown
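
For readers unfamiliar with the pattern, the sketch below shows the general idea behind replacing an `expectedOps` / `opsCount` pair with a single countdown object: exactly one caller is told that the last outstanding operation has completed. `CountDownSketch` is a simplified, hypothetical stand-in written for this illustration; it is not Elasticsearch's `CountDown.java`, whose actual API may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CountDownSketch {

    private final AtomicInteger remaining;

    CountDownSketch(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0 but was " + count);
        }
        this.remaining = new AtomicInteger(count);
    }

    /** Returns true only for the single call that moves the counter to zero. */
    boolean countDown() {
        while (true) {
            int current = remaining.get();
            if (current == 0) {
                return false; // already finished, nothing left to count
            }
            if (remaining.compareAndSet(current, current - 1)) {
                return current == 1;
            }
        }
    }

    boolean isCountedDown() {
        return remaining.get() == 0;
    }

    public static void main(String[] args) {
        CountDownSketch pending = new CountDownSketch(3);
        for (int i = 0; i < 3; i++) {
            // Only the last response triggers the completion branch.
            if (pending.countDown()) {
                System.out.println("all responses received");
            }
        }
    }
}
```

Because only one call ever returns true, the completion callback can be invoked from that branch without a separate comparison against the expected count.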

s1monw · 2015-06-09T19:10:17Z

wow @areek this looks pretty awesome. I left some comments

areek · 2015-06-10T03:57:56Z

Thanks for the review @s1monw! Addressed all the comments.

Was wondering if there are any tests that I can look at to get the cluster to have a bunch of unassigned nodes (currently just stopping random nodes)?
I see CorruptedFileTest#corruptRandomPrimaryFile, was wondering if there is an easier way to test out the corruption exception for the response?

bleskes · 2015-06-10T07:13:18Z

...va/org/elasticsearch/action/admin/indices/shards/TransportIndicesUnassignedShardsAction.java

+        Set<String> requestedIndices = new HashSet<>();
+        requestedIndices.addAll(Arrays.asList(request.indices()));
+        List<ShardId> shardIdsToFetch = new ArrayList<>();
+        for (MutableShardRouting shard : Iterables.concat(routingNodes.unassigned(), routingNodes.ignoredUnassigned())) {


you should be able to use state.routingTable().shardsWithState(ShardRoutingState.UNASSIGNED);

changed to using state.routingTable().shardsWithState(ShardRoutingState.UNASSIGNED)

bleskes · 2015-06-10T07:39:37Z

@areek I think this is an awesome API. Left some comments here and there. I think we need to beef up the tests to check the actual content of the shard responses (check that it finds stuff, check that it detects corruption, etc.). I'll respond to the naming part on the ticket.

bleskes · 2015-06-11T07:30:51Z

...va/org/elasticsearch/action/admin/indices/shards/TransportIndicesUnassignedShardsAction.java

+                    indexShardsBuilder.put(res.shardId.id(), shardStatuses);
+                    shardsResponseBuilder.put(res.shardId.getIndex(), indexShardsBuilder.build());
+                    for (FailedNodeException failure : res.failures) {
+                        failureBuilder.add(failure);


I think we lose the information about which shard has failed. Should we wrap it in DefaultShardOperationFailedException?

bleskes · 2015-06-11T07:34:03Z

@areek change looks good. Did you see my comment about beefing up the testing?

areek · 2015-06-23T02:09:20Z

@bleskes @clintongormley, I have updated the description with the new API, thoughts? It turned out to be a bit different from what we have discussed before, in terms of default behaviour. It would be good to have this reviewed.

bleskes · 2015-07-14T11:07:37Z

.../src/main/java/org/elasticsearch/action/admin/indices/shards/IndicesShardsStoresRequest.java

+    /**
+     * Status used to choose shards to get store information on
+     */
+    public enum Status {


Do we still need this? Can't we use ClusterHealthStatus? Now that we have the EnumSet, we don't need ALL anymore...

Thanks for the suggestion, we now use ClusterHealthStatus
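
The point about `EnumSet` can be sketched as follows: once the requested statuses are held in a set of enum values, "all statuses" is simply the full set, so a dedicated `ALL` constant becomes unnecessary. The `Health` enum below is a hypothetical stand-in for `ClusterHealthStatus`, and the comma-separated parameter format is an assumption made for this example rather than something stated in the PR.

```java
import java.util.EnumSet;
import java.util.Locale;

final class StatusParamSketch {

    /** Hypothetical stand-in for ClusterHealthStatus. */
    enum Health { GREEN, YELLOW, RED }

    /** Parses a comma-separated status parameter; null or blank falls back to yellow and red. */
    static EnumSet<Health> parse(String param) {
        if (param == null || param.trim().isEmpty()) {
            return EnumSet.of(Health.YELLOW, Health.RED);
        }
        EnumSet<Health> statuses = EnumSet.noneOf(Health.class);
        for (String token : param.split(",")) {
            statuses.add(Health.valueOf(token.trim().toUpperCase(Locale.ROOT)));
        }
        return statuses;
    }

    /** The full set takes the place of a separate ALL constant. */
    static EnumSet<Health> everyStatus() {
        return EnumSet.allOf(Health.class);
    }

    public static void main(String[] args) {
        System.out.println(parse("green"));
        System.out.println(parse(null));
        System.out.println(everyStatus());
    }
}
```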

bleskes · 2015-07-14T13:17:36Z

I left some final minor comments. I think we are getting close!

areek · 2015-07-14T17:34:06Z

@bleskes Thanks for the review, addressed all your comments

bleskes · 2015-07-15T15:18:36Z

...ain/java/org/elasticsearch/action/admin/indices/shards/IndicesShardsStoreRequestBuilder.java

@@ -0,0 +1,62 @@
+/*


Left over IndicesShard_S_ ..

bleskes · 2015-07-15T15:44:58Z

Left some final suggestions. Thx @areek

areek · 2015-07-15T18:07:45Z

@bleskes Thanks for the review, updated the PR addressing all your comments.

bleskes · 2015-07-16T09:03:42Z

LGTM. Left some very minor comments. Thx for all the hard work 👍

areek · 2015-07-16T22:43:50Z

merged to master 7a21d84

areek added >feature v2.0.0-beta1 WIP :Data Management/Stats Statistics tracking and retrieval APIs labels Jun 8, 2015

areek mentioned this pull request Jun 8, 2015

API for locating unrecovered shard copies and their state #10952

Closed

areek force-pushed the enhancement/10952 branch from a3e12a6 to 1a2fac9 Compare June 8, 2015 22:50

s1monw reviewed Jun 9, 2015

areek force-pushed the enhancement/10952 branch from 1a2fac9 to 426b6b5 Compare June 10, 2015 02:49

areek force-pushed the enhancement/10952 branch from 742b0ef to 1894bdc Compare June 10, 2015 05:57

bleskes reviewed Jun 10, 2015

bleskes reviewed Jun 11, 2015

areek force-pushed the enhancement/10952 branch 2 times, most recently from 3848ce1 to 9f8d224 Compare June 23, 2015 01:56

bleskes reviewed Jul 14, 2015

incorporate another round of feedback

c0302c4

areek force-pushed the enhancement/10952 branch from 65262dc to c0302c4 Compare July 14, 2015 17:32

bleskes reviewed Jul 15, 2015

incorporate feedback

9b540f2

areek force-pushed the enhancement/10952 branch from bd78872 to 9b540f2 Compare July 15, 2015 18:15

areek closed this Jul 16, 2015

clintongormley added the release highlight label Jul 18, 2015

ppf2 mentioned this pull request Aug 7, 2015

API to return info/details on why shards are in unassigned state #9471

Closed

ppf2 mentioned this pull request Nov 6, 2015

Add shard allocation explain API to explain why shards are (or aren't) UNASSIGNED #14593

Closed
