API for locating unrecovered shard copies and their state #10952

Closed
clintongormley opened this issue May 4, 2015 · 15 comments
Labels
:Data Management/Stats Statistics tracking and retrieval APIs >feature v2.0.0-beta1

Comments

@clintongormley

During shard recovery, Elasticsearch reaches out to all the nodes to find out which nodes hold copies of the shard. It then chooses one copy and tries to recover it. Recovery may fail (eg if there is corruption).

It would be good to have an API which exposes this information, eg:

  • which nodes hold copies of the shard
  • how recent is each copy (ie which one is likely to be chosen)
  • has recovery of that copy failed (and the reason for that)
@clintongormley clintongormley added >feature help wanted adoptme :Data Management/Stats Statistics tracking and retrieval APIs labels May 4, 2015
@s1monw
Contributor

s1monw commented May 22, 2015

I think this would be awesome to have. We already have most of the infrastructure to get this information. We can also make it relatively fast with our new async shard fetching classes, org.elasticsearch.gateway.AsyncShardFetch together with TransportNodesListGatewayStartedShards. We might need to enhance the response information a bit here, but it can give us everything that was asked for here in a fast way. We don't need to read store file metadata at all. @areek do you want to give this a try? We spoke about getting more into the core of the system and this seems like a rather easy start.

@s1monw s1monw added v2.0.0-beta1 and removed help wanted adoptme labels May 22, 2015
@clintongormley
Author

@areek what information are you able to expose? I'm thinking that maybe we should add a new API like:

GET /{indices}/_shard_copies

@areek
Contributor

areek commented Jun 8, 2015

Hey @clintongormley,

The API:

GET /{indices}/_unassigned_shards

GET /_unassigned_shards

(action name is up for debate)

The response will report the following on all copies of unassigned shards for specified indices:

  • node id of the host node
  • shard version of the copy
  • exception (if any) that has been encountered when trying to open the shard index. (will catch if the shard index has been corrupted, etc.)
{
  "indices": {
    "<index>": {
      "shards": {
        "<shard_id>": [
          {
            "node": "UcZRcveHSjyfiN0rHalDsQ",
            "version": 6,
            "exception": {..}
          },
          {
            ....
          }
        ],
        "<shard_id>": [
          ....
        ]
      }
    }
  }
}

Thoughts?
PR: #11545
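
A minimal sketch of how a client might consume the response proposed above, assuming the JSON shape shown in this comment (it is a proposal, not a confirmed final API). The function name `best_copy_per_shard` is hypothetical. It treats a copy as usable when it reports no exception and a non-negative version, and picks the highest version, on the assumption from the issue description that the most recent copy is the one likely to be chosen.

```python
from typing import Any, Dict, Optional


def best_copy_per_shard(response: Dict[str, Any]) -> Dict[str, Dict[str, Optional[dict]]]:
    """Return, per index and shard id, the usable copy with the highest version.

    Assumes the proposed response shape:
    {"indices": {index: {"shards": {shard_id: [copy, ...]}}}}
    A shard with no usable copy maps to None.
    """
    result: Dict[str, Dict[str, Optional[dict]]] = {}
    for index_name, index_body in response.get("indices", {}).items():
        per_shard: Dict[str, Optional[dict]] = {}
        for shard_id, copies in index_body.get("shards", {}).items():
            # Assumption: a copy with an exception, or with a negative
            # version, cannot be opened and is therefore skipped.
            usable = [
                c for c in copies
                if not c.get("exception") and c.get("version", -1) >= 0
            ]
            per_shard[shard_id] = max(usable, key=lambda c: c["version"], default=None)
        result[index_name] = per_shard
    return result
```

Shards that map to `None` are the ones an operator would need to investigate, since no node reported a copy that could be opened.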

@bleskes
Contributor

bleskes commented Jun 9, 2015

Nice. I would call this "_list_stores" or something similar (the unassigned name means shards for which we have no store available or which are throttled from being assigned). Question: what is the exception? I presume it's a corruption marker? If so, call it corruption?

@s1monw
Contributor

s1monw commented Jun 9, 2015

@bleskes there is a PR linked #11545

@areek
Contributor

areek commented Jun 10, 2015

@bleskes thanks for the suggestion. "_list_stores" seems to imply that the API will list all the shard stores for the indices, rather than only the stores of the copies of the currently unassigned shards. But I cannot think of a better name :). The exception is thrown when trying to open the shard index; it can be caused by a corruption marker or by the segment infos file being unreadable. Maybe it still makes sense to call it corruption?

@bleskes
Contributor

bleskes commented Jun 10, 2015

@areek I looked at the implementation and I'm not happy with _list_stores either, mainly because we use the TransportNodesListGatewayStartedShards and not the TransportNodesListShardStoreMetaData (which is good, but the naming will be confusing). I wonder if we should just rename TransportNodesListGatewayStartedShards to TransportNodesListShardStores. Will think about this some more.

re: exception vs corruption - I think we should call it corruption and also make Store#canOpenIndex leave a corruption marker (as another change). If we can't open an index, for whatever reason, I think it is safe to mark it as such. @s1monw thoughts?

In general it feels better to me to have the API list unassigned shards by default, but have a parameter to make it list everything the nodes have. It should be very simple to implement and will give us an alternative to telling customers to grep their disks.

@s1monw
Contributor

s1monw commented Jun 11, 2015

re: exception vs corruption - I think we should call it corruption and also make Store#canOpenIndex leave a corruption marker (as another change). If we can't open an index, for whatever reason, I think it is safe to mark it as such. @s1monw thoughts?

well we can also have exceptions if there is no index there for instance which is not a corruption?

@areek
Contributor

areek commented Jun 12, 2015

We could expose this as a _list_stores or _list_shard_stores API with a query parameter, state, that takes the shard routing states to filter shards by. By default, it will only report on shards that are unassigned.

# lists all shards in unassigned state
GET /_list_stores

# lists all {index} shards
GET /{index}/_list_stores?state=_all

Thoughts?

@bleskes
Contributor

bleskes commented Jun 12, 2015

well we can also have exceptions if there is no index there for instance which is not a corruption?

I hear you and see the distinction, but I think it's OK to call this corruption (if the index is gone, is that distinguishable from a missing segment_n file?). All I was getting at is that we need a name other than a generic exception, because currently it's confusing: it's hard to tell if the operation has failed (ES/API exception) or the shard is bad (and I know we are adding a failures section; I think that's still confusing).

GET /{index}/_list_stores?state=_all

_list_shard_stores is maybe even better. I would suggest shards=_all, and the default would be shards=_unassigned. @clintongormley how do you feel about these names?

@clintongormley
Author

I don't like including _list because that's implied by the GET. We have a _search_shards endpoint. I'm wondering if this should just be:

GET {index}/_shards

The state filter should use unassigned, relocating, started etc, rather than _unassigned, and it should accept a comma-separated list.

I think the default should actually be all, instead of unassigned. Of course the response should include the shard state.

Do we need to filter by node? I was considering whether this should be a _node API, but leaning towards no. We could add a nodes param to allow filtering on a node spec, perhaps?
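
A minimal sketch of how the comma-separated state filter suggested above could be parsed, with "all states" as the default when the parameter is absent. The function name `parse_state_filter` and the exact set of accepted values are assumptions for illustration (the values are taken from the routing states mentioned in this thread); this is not the actual Elasticsearch implementation.

```python
from typing import FrozenSet, Optional

# Assumption: the accepted values mirror the shard routing states named in
# this thread, written in lowercase and without a leading underscore.
VALID_STATES = ("unassigned", "initializing", "started", "relocating")


def parse_state_filter(raw: Optional[str]) -> FrozenSet[str]:
    """Parse a comma-separated `state` parameter into a set of states.

    A missing or empty parameter selects every state, matching the
    suggestion that the default should be all rather than unassigned.
    """
    if raw is None or not raw.strip():
        return frozenset(VALID_STATES)
    requested = {part.strip().lower() for part in raw.split(",") if part.strip()}
    unknown = requested - set(VALID_STATES)
    if unknown:
        raise ValueError("unknown shard state(s): " + ", ".join(sorted(unknown)))
    return frozenset(requested)
```

Rejecting unknown values up front, rather than silently ignoring them, keeps a typo such as `state=unasigned` from returning an empty but seemingly valid result.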

@areek
Contributor

areek commented Jun 12, 2015

I'm wondering if this should just be:

GET {index}/_shards

The state filter should use unassigned, relocating, started etc, rather than _unassigned, and it should accept a comma-separated list.
I think the default should actually be all, instead of unassigned. Of course the response should include the shard state.

IMO it makes the purpose of the API clear. Locate all shards and their state meta-data for some index/indices.
The response format can be:

{
  "indices": {
    "<index>": {
      "shards": {
        "<shard_id>": [
          {
            "node": "UcZRcveHSjyfiN0rHalDsQ",
            "version": 6,
            "state": UNASSIGNED | STARTED | INITIALIZING | RELOCATING,
            "exception": {..}
          },
          {
            ....
          }
        ],
        "<shard_id>": [
          ....
        ]
      }
    }
  }
}

Thoughts?

Do we need to filter by node? I was considering whether this should be a _node API, but leaning towards no. We could add a nodes param to allow filtering on a node spec, perhaps?

We currently filter by indices not nodes. Though possible, I think adding a nodes param would be confusing. The API should filter on either indices or nodes but not both.

well we can also have exceptions if there is no index there for instance which is not a corruption?

@s1monw currently this is indicated by version being -1 for the shard in the response, should it be reported under exception instead? I can't think of a way to distinguish between shards that have failed in a node (no ShardStateMetaData) and shards that never existed. Thoughts?

@clintongormley
Author

"state": UNASSIGNED | STARTED | INITIALIZING | RELOCATING

lgtm

We currently filter by indices not nodes. Though possible, I think adding a nodes param would be confusing. The API should filter on either indices or nodes but not both.

Nodes and indices are two ways of looking at the same data. eg we have indices stats, and we have nodes-indices stats (which show index stats per node). But agreed that it is a nice to have, most of the time we're interested in viewing this data by index.

@bleskes
Contributor

bleskes commented Jun 15, 2015

GET {index}/_shards

My concern with just shards is that it's not clear we're going to disk and reporting what we found. It has nothing to do with the current state of shards as far as the cluster is concerned (at least as implemented now): we don't care if shards are relocating, available for search, etc. We need a name that reflects this (or change the API to do more in the future). Maybe something like

GET {indices}/_shards

returns:

{
  "index": {
    "1": {
      "routing": [{ ... extracted from the cluster state ... }],
      "stores": [{ ... this is where we put the current output ... }]
    }
  }
}

That said, something simple like {indices}/_stores (or something else that gives you what we return now) has my preference, assuming we can find a good name.

We currently filter by indices not nodes. Though possible, I think adding a nodes param would be confusing. The API should filter on either indices or nodes but not both.

+1. If we need a node-oriented API, we can add that as well later on.

"node": "UcZRcveHSjyfiN0rHalDsQ",

Can we use a DiscoveryNode instead of just an ID? IDs are hard to resolve and don't mean much on their own (they are random).

"state": UNASSIGNED | STARTED | INITIALIZING | RELOCATING

These are shard routings values. Things like initializing and relocating are a bit confusing imho. I think a simpler message is whether the store is "used" or not. I would use "state": "unassigned|assigned" .

I can't think of a way to distinguish between shards that have failed in a node (no ShardStateMetaData) and shards that never existed

I'm confused here - a shard that never existed will not have an entry, right? Shards where the ShardStateMetaData is corrupted will only have an exception (or corruption), and shards where the ShardStateMetaData is OK but the Lucene index is bad will have both. I feel like I'm missing something...
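
The cases described above can be sketched as a small classifier over one per-node entry of the proposed response. This is an illustrative reading of the discussion, not the behaviour of the actual API: the function name `classify_copy` and the returned labels are hypothetical, and treating a version of -1 as "no state metadata found" is an assumption based on the earlier comment that a missing index is currently reported with version -1.

```python
from typing import Optional


def classify_copy(entry: Optional[dict]) -> str:
    """Classify one node's entry for a shard copy.

    `entry` is None when the node reported nothing for the shard,
    otherwise a dict with optional "version" and "exception" keys.
    """
    if entry is None:
        # No entry at all: the shard never existed on this node.
        return "absent"

    version = entry.get("version")
    # Assumption: a missing or negative version means no usable state metadata.
    has_version = isinstance(version, int) and version >= 0
    has_exception = bool(entry.get("exception"))

    if has_exception:
        # Version and exception together: metadata is fine, the index is not.
        # Exception alone: the state metadata itself could not be read.
        return "index_unreadable" if has_version else "state_metadata_unreadable"
    return "ok" if has_version else "no_state_metadata"
```

The `no_state_metadata` label covers the ambiguous case raised earlier in the thread, where a copy reports version -1 without any exception.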

@clintongormley
Author

Rethinking this API a bit. When would you need it? The typical use case is: I have unassigned primary shards and I want to know if copies of the shard exist somewhere in my cluster and whether they can recover or not. Replicas don't matter because they can recover from the primary.

So what I want to see is a list of shards which could become the primary, where they are, and whether there has already been a problem trying to open the shard. So perhaps we can drop the assigned shards completely and make this an API for unassigned primaries only.

What about:

GET {index}/_primary_candidates

And we can drop the state key as well.

areek added a commit to areek/elasticsearch that referenced this issue Jul 1, 2015
…atuses of their copies in the cluster for shard recovery.

This PR adds an API that reports the nodes that hold copies of unassigned shards for specific indices, the shard versions
(which indicate how recent the copy is) and any exceptions encountered while trying to open the shard indices.
The action backing the API is implemented as a master read operation, reading the list of unassigned shards for specified
indices from the cluster state and then fetching shard metadata from all the nodes in the cluster.

closes elastic#10952
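
The two-step flow in the commit message (read the unassigned shards from the cluster state, then ask every node for its copy) can be sketched with plain in-memory structures. Everything here is a stand-in for illustration: `collect_unassigned_copies`, the routing-table list, and the per-node fetch callables are hypothetical and do not correspond to real Elasticsearch classes. The output follows the response shape proposed earlier in this thread.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical per-node fetcher: given (index, shard id), return the node's
# metadata for that shard copy, or None if the node holds no copy.
NodeFetcher = Callable[[str, int], Optional[dict]]


def collect_unassigned_copies(
    routing_table: List[dict],
    node_fetchers: Dict[str, NodeFetcher],
    indices: Optional[Iterable[str]] = None,
) -> dict:
    """Build {"indices": {index: {"shards": {shard_id: [copies]}}}}."""
    wanted = set(indices) if indices else None

    # Step 1: pick the unassigned shards for the requested indices
    # out of the (stand-in) cluster state.
    unassigned = sorted({
        (entry["index"], entry["shard"])
        for entry in routing_table
        if entry["state"] == "UNASSIGNED"
        and (wanted is None or entry["index"] in wanted)
    })

    # Step 2: ask every node what it has on disk for each of those shards.
    response: dict = {"indices": {}}
    for index_name, shard_id in unassigned:
        copies = []
        for node_id, fetch in node_fetchers.items():
            meta = fetch(index_name, shard_id)
            if meta is not None:
                copies.append({"node": node_id, **meta})
        shards = response["indices"].setdefault(index_name, {"shards": {}})["shards"]
        shards[str(shard_id)] = copies
    return response
```

An unassigned shard for which no node reports a copy still appears in the output with an empty list, which makes "no copy exists anywhere" visible rather than silent.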
@areek areek closed this as completed in 7a21d84 Jul 16, 2015