
Add recovery progress (% recovered) to the cat API #10805

Closed
clintongormley opened this issue Apr 25, 2015 · 5 comments

@clintongormley

It'd be useful to have a cat API which reports the percentage of shards recovered, to make it easy for users to assess how much longer recovery will take.

@clintongormley
Author

Other useful data to display in the result include:

  • number of pending tasks in the queue
  • how long the first task in the queue has been waiting
  • cluster health
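The first two of these can already be derived from the pending cluster tasks API (GET _cluster/pending_tasks). Below is a minimal Python sketch. It assumes the response is already parsed into a dict with a "tasks" list whose entries carry time_in_queue_millis. The helper name pending_task_summary and the sample values are made up for illustration.

```python
from typing import Any


def pending_task_summary(pending_tasks_response: dict[str, Any]) -> tuple[int, int]:
    """Return (number of pending tasks, longest time in queue in ms).

    Hypothetical helper. It assumes the shape of a parsed
    GET _cluster/pending_tasks response: a "tasks" list whose entries
    each carry "time_in_queue_millis".
    """
    tasks = pending_tasks_response.get("tasks", [])
    # The longest wait across all entries is the wait of the oldest queued task.
    longest_wait = max((t["time_in_queue_millis"] for t in tasks), default=0)
    return len(tasks), longest_wait


# Made-up sample response, for illustration only.
sample = {
    "tasks": [
        {"insert_order": 101, "priority": "URGENT", "source": "create-index [foo]",
         "time_in_queue_millis": 86, "time_in_queue": "86ms"},
        {"insert_order": 46, "priority": "HIGH", "source": "shard-started",
         "time_in_queue_millis": 842, "time_in_queue": "842ms"},
    ]
}
print(pending_task_summary(sample))  # (2, 842)
```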

@jpountz jpountz removed the help wanted adoptme label May 22, 2015
@spinscale
Contributor

A couple of existing APIs already cover similar ground:

  • /_cat/recovery/{index} has files_percent and bytes_percent per shard
  • /_cat/pending_tasks lists all pending tasks and thus also the first task in the queue
  • /_cat/health lists the health status in the status field
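Until a single endpoint exists, a rough cluster-wide recovery figure can be approximated on the client by averaging the per-shard bytes_percent column. Below is a minimal sketch. It assumes verbose output restricted to a few columns, for example GET _cat/recovery?v&h=index,shard,bytes_percent, with the header row first and values formatted like 42.5%. The function name and the sample rows are hypothetical.

```python
def overall_recovery_percent(cat_recovery_text: str) -> float:
    """Average the bytes_percent column of verbose _cat/recovery output.

    Hypothetical helper. It assumes the first non-blank line is the header
    (the ?v flag) and that cells contain no spaces, which holds when only a
    few columns are requested via h=. Every shard counts equally,
    regardless of its size.
    """
    lines = [line for line in cat_recovery_text.splitlines() if line.strip()]
    if len(lines) < 2:
        return 100.0  # assumption: no recovery rows means nothing left to recover
    header, rows = lines[0].split(), lines[1:]
    col = header.index("bytes_percent")
    percents = [float(row.split()[col].rstrip("%")) for row in rows]
    return sum(percents) / len(percents)


# Made-up sample output, for illustration only.
sample = """
index shard bytes_percent
logs  0     100.0%
logs  1     50.0%
logs  2     0.0%
"""
print(overall_recovery_percent(sample))  # 50.0
```

Because this is an unweighted mean, a small shard at 0% pulls the figure down as much as a large one. Weighting by shard size would need an additional size column.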

So we would need a new endpoint here: one you can ask when the full cluster will be ready, or that gives you a very general overview. Something like /_cat/status (though that name is very generic).

Proposed output format

# curl '127.0.0.1:9200/_cat/status?v'
health pending_tasks pending_task_time_in_queue recovery_percent
green  1234          423234                     100.0%

@bleskes
Contributor

bleskes commented May 27, 2015

This is what we expose with GET _cluster/health, for inspiration:

{
   "cluster_name": "boaz",
   "status": "yellow",
   "timed_out": false,
   "number_of_nodes": 1,
   "number_of_data_nodes": 1,
   "active_primary_shards": 2,
   "active_shards": 2,
   "relocating_shards": 0,
   "initializing_shards": 0,
   "unassigned_shards": 2,
   "number_of_pending_tasks": 0,
   "number_of_in_flight_fetch": 0
}
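The counts in this response are already enough for a rough "percentage of shards active" figure. Below is a sketch that relies on two assumptions. First, it uses the hypothetical definition active_shards / (active_shards + initializing_shards + unassigned_shards). Second, it assumes relocating shards are already counted in active_shards. With the counts from the sample above, it gives 50.0.

```python
import json


def active_shards_percent(health: dict) -> float:
    """Share of all shard copies that are active, as a percentage.

    Hypothetical definition: active / (active + initializing + unassigned).
    Assumes relocating shards are already included in active_shards.
    """
    active = health["active_shards"]
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    if total == 0:
        return 100.0  # an empty cluster has nothing left to recover
    return 100.0 * active / total


# Counts taken from the sample _cluster/health response above.
health = json.loads("""
{
   "cluster_name": "boaz",
   "status": "yellow",
   "active_shards": 2,
   "relocating_shards": 0,
   "initializing_shards": 0,
   "unassigned_shards": 2
}
""")
print(active_shards_percent(health))  # 50.0
```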

I also think that every number we feel the need to add to the proposed _cat/status API, and that isn't part of that list, should be added to _cluster/health. Concretely, I hear pending_task_time_in_queue and recovery_percent. Also, if we add _cat/status, I wonder whether we should have a dedicated JSON _cluster/status. Maybe we should instead extend _cat/health to give more.

@spinscale
Contributor

@bleskes I tend to agree: we should extend /_cat/health and /_cluster/health with the two missing metrics instead of adding a new endpoint. I was not aware that the pending tasks are included in the cluster health.

@bleskes
Contributor

bleskes commented May 27, 2015

I was not aware that the pending tasks are included in the cluster health

Yeah, I got annoyed with having to grep pending tasks dumps a long time ago :)

spinscale added a commit to spinscale/elasticsearch that referenced this issue May 28, 2015
To get a quick overview simply by checking the cluster health and its
corresponding cat API, the following two attributes have been added
to the cluster health response:

* pending_task_time_in_queue: how long the first task in the queue
  has been waiting
* recovery percent: the percentage of shards that are in the
  initializing state

This makes the cluster health API a handy way to check when a fully
restarted cluster is back up and running.

In addition, a small serialization fix has been added, which removes
version checks for this branch in ClusterHealthResponse.

Closes elastic#10805
spinscale added a commit to spinscale/elasticsearch that referenced this issue Jun 22, 2015
To get a quick overview simply by checking the cluster health and its
corresponding cat API, the following two attributes have been added
to the cluster health response:

* task max waiting time: how long the first task in the queue has
  been waiting
* active shards percent: the percentage of shards that are in the
  initializing state

This makes the cluster health API a handy way to check when a fully
restarted cluster is back up and running.

Closes elastic#10805
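With a percentage attribute in the health response, waiting for a restarted cluster reduces to a simple polling loop. Below is a minimal sketch. Here fetch_percent is a hypothetical callable standing in for a GET _cluster/health request that returns the active shards percent as a float. The sleep and clock functions are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable


def wait_until_recovered(
    fetch_percent: Callable[[], float],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until fetch_percent() reports 100%, or give up after timeout_s.

    fetch_percent is hypothetical: in practice it would call
    GET _cluster/health and read the percentage attribute from it.
    Returns True if the cluster reached 100% before the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        if fetch_percent() >= 100.0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage with a fake fetcher that reports more progress on each poll.
readings = iter([25.0, 60.0, 100.0])
print(wait_until_recovered(lambda: next(readings), sleep=lambda s: None))  # True
```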