Recovery API

Adds a new REST endpoint at /_recovery and a corresponding method in the Java API. The recovery API reports the recovery status of all shards in the cluster, including percent complete, recovery type, and which files have been copied. Closes #4637
elastic · Mar 20, 2014 · 91627fa
1 parent 6977479
commit 91627fa

Showing 35 changed files with 2,448 additions and 501 deletions.
diff --git a/docs/reference/indices.asciidoc b/docs/reference/indices.asciidoc
@@ -46,6 +46,7 @@ and warmers.
 * <<indices-status>>
 * <<indices-stats>>
 * <<indices-segments>>
+* <<indices-recovery>>
 
 [float]
 [[status-management]]
@@ -94,6 +95,8 @@ include::indices/stats.asciidoc[]
 
 include::indices/segments.asciidoc[]
 
+include::indices/recovery.asciidoc[]
+
 include::indices/clearcache.asciidoc[]
 
 include::indices/flush.asciidoc[]

diff --git a/docs/reference/indices/recovery.asciidoc b/docs/reference/indices/recovery.asciidoc
@@ -0,0 +1,194 @@
+[[indices-recovery]]
+== Indices Recovery
+
+The indices recovery API provides insight into on-going shard recoveries.
+Recovery status may be reported for specific indices, or cluster-wide.
+
+For example, the following command shows recovery information for the indices "index1" and "index2".
+
+[source,js]
+--------------------------------------------------
+curl -XGET 'http://localhost:9200/index1,index2/_recovery?pretty=true'
+--------------------------------------------------
+
+To see cluster-wide recovery status, omit the index names.
+
+[source,js]
+--------------------------------------------------
+curl -XGET 'http://localhost:9200/_recovery?pretty=true'
+--------------------------------------------------
+
+Response:
+
+[source,js]
+--------------------------------------------------
+{
+  "index1" : {
+    "shards" : [ {
+      "id" : 0,
+      "type" : "snapshot",
+      "stage" : "index",
+      "primary" : true,
+      "start_time" : "2014-02-24T12:15:59.716",
+      "stop_time" : 0,
+      "total_time_in_millis" : 175576,
+      "source" : {
+        "repository" : "my_repository",
+        "snapshot" : "my_snapshot",
+        "index" : "index1"
+      },
+      "target" : {
+        "id" : "ryqJ5lO5S4-lSFbGntkEkg",
+        "hostname" : "my.fqdn",
+        "ip" : "10.0.1.7",
+        "name" : "my_es_node"
+      },
+      "index" : {
+        "files" : {
+          "total" : 73,
+          "reused" : 0,
+          "recovered" : 69,
+          "percent" : "94.5%"
+        },
+        "bytes" : {
+          "total" : 79063092,
+          "reused" : 0,
+          "recovered" : 68891939,
+          "percent" : "87.1%"
+        },
+        "total_time_in_millis" : 0
+      },
+      "translog" : {
+        "recovered" : 0,
+        "total_time_in_millis" : 0
+      },
+      "start" : {
+        "check_index_time" : 0,
+        "total_time_in_millis" : 0
+      }
+    } ]
+  }
+}
+--------------------------------------------------
+
+The above response shows a single index recovering a single shard. In this case, the source of the recovery is a snapshot repository
+and the target of the recovery is the node with name "my_es_node".
+
+Additionally, the output shows the number and percent of files recovered, as well as the number and percent of bytes recovered.
+
+In some cases a higher level of detail may be preferable. Setting "detailed=true" adds a list of the physical files being recovered.
+
+[source,js]
+--------------------------------------------------
+curl -XGET 'http://localhost:9200/_recovery?pretty=true&detailed=true'
+--------------------------------------------------
+
+Response:
+
+[source,js]
+--------------------------------------------------
+{
+  "index1" : {
+    "shards" : [ {
+      "id" : 0,
+      "type" : "gateway",
+      "stage" : "done",
+      "primary" : true,
+      "start_time" : "2014-02-24T12:38:06.349",
+      "stop_time" : "2014-02-24T12:38:08.464",
+      "total_time_in_millis" : 2115,
+      "source" : {
+        "id" : "RGMdRc-yQWWKIBM4DGvwqQ",
+        "hostname" : "my.fqdn",
+        "ip" : "10.0.1.7",
+        "name" : "my_es_node"
+      },
+      "target" : {
+        "id" : "RGMdRc-yQWWKIBM4DGvwqQ",
+        "hostname" : "my.fqdn",
+        "ip" : "10.0.1.7",
+        "name" : "my_es_node"
+      },
+      "index" : {
+        "files" : {
+          "total" : 26,
+          "reused" : 26,
+          "recovered" : 26,
+          "percent" : "100.0%",
+          "details" : [ {
+            "name" : "segments.gen",
+            "length" : 20,
+            "recovered" : 20
+          }, {
+            "name" : "_0.cfs",
+            "length" : 135306,
+            "recovered" : 135306
+          }, {
+            "name" : "segments_2",
+            "length" : 251,
+            "recovered" : 251
+          },
+           ...
+          ]
+        },
+        "bytes" : {
+          "total" : 26001617,
+          "reused" : 26001617,
+          "recovered" : 26001617,
+          "percent" : "100.0%"
+        },
+        "total_time_in_millis" : 2
+      },
+      "translog" : {
+        "recovered" : 71,
+        "total_time_in_millis" : 2025
+      },
+      "start" : {
+        "check_index_time" : 0,
+        "total_time_in_millis" : 88
+      }
+    } ]
+  }
+}
+--------------------------------------------------
+
+This response shows a detailed listing (truncated for brevity) of the actual files recovered and their sizes.
+
+Also shown are the timings in milliseconds of the various stages of recovery: index retrieval, translog replay, and index start time.
+
+Note that the above listing indicates that the recovery is in stage "done". All recoveries, whether on-going or complete, are kept in
+cluster state and may be reported on at any time. Setting "active_only=true" will cause only on-going recoveries to be reported.
+
+Here is a complete list of options:
+
+[horizontal]
+`detailed`::        Display a detailed view. This is primarily useful for viewing the recovery of physical index files. Default: false.
+`active_only`::     Display only those recoveries that are currently on-going. Default: false.
+
+Description of output fields:
+
+[horizontal]
+`id`::              Shard ID
+`type`::            Recovery type:
+                        * gateway
+                        * snapshot
+                        * replica
+                        * relocating
+`stage`::           Recovery stage:
+                        * init:     Recovery has not started
+                        * index:    Reading index meta-data and copying bytes from source to destination
+                        * start:    Starting the engine; opening the index for use
+                        * translog: Replaying transaction log
+                        * finalize: Cleanup
+                        * done:     Complete
+`primary`::         True if shard is primary, false otherwise
+`start_time`::      Timestamp of recovery start
+`stop_time`::       Timestamp of recovery finish
+`total_time_in_millis`::    Total time to recover shard in milliseconds
+`source`::          Recovery source:
+                        * repository description if recovery is from a snapshot
+                        * description of source node otherwise
+`target`::          Destination node
+`index`::           Statistics about physical index recovery
+`translog`::        Statistics about translog recovery
+`start`::           Statistics about time to open and start the index
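
The `percent` fields and the `stage` list documented above can be illustrated with a short Java sketch. It is not taken from this commit: the class name `RecoveryMath`, the one-decimal rounding, the `0.0%` result for an empty total, and the choice to treat every stage other than `done` as on-going are all assumptions, picked to agree with the sample responses.

```java
import java.util.Locale;

// Hypothetical helper, not part of the commit: it mirrors the documented fields only.
final class RecoveryMath {

    // Stage names in the order the documentation lists them.
    enum Stage { INIT, INDEX, START, TRANSLOG, FINALIZE, DONE }

    private RecoveryMath() {}

    // Formats recovered/total as a percentage with one decimal place.
    // Assumption: an empty total is reported as "0.0%".
    static String percent(long recovered, long total) {
        if (total <= 0) {
            return "0.0%";
        }
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * recovered / total);
    }

    // Assumption: a recovery counts as on-going until it reaches the "done" stage.
    static boolean isActive(Stage stage) {
        return stage != Stage.DONE;
    }
}
```

With the figures from the first sample response, `RecoveryMath.percent(69, 73)` gives the same `94.5%` shown for files, and `RecoveryMath.percent(68891939L, 79063092L)` gives the `87.1%` shown for bytes.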
diff --git a/rest-api-spec/api/cat.recovery.json b/rest-api-spec/api/cat.recovery.json
@@ -17,10 +17,6 @@
           "description" : "The unit in which to display byte values",
           "options": [ "b", "k", "m", "g" ]
         },
-        "local": {
-          "type" : "boolean",
-          "description" : "Return local information, do not retrieve the state from master node (default: false)"
-        },
         "master_timeout": {
           "type" : "time",
           "description" : "Explicit operation timeout for connection to master node"

diff --git a/rest-api-spec/api/indices.recovery.json b/rest-api-spec/api/indices.recovery.json
@@ -0,0 +1,34 @@
+{
+    "indices.recovery" : {
+        "documentation": "http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/indices-recovery.html",
+        "methods": ["GET"],
+        "url": {
+            "path": "/_recovery",
+            "paths": ["/_recovery", "/{index}/_recovery"],
+            "parts": {
+                "index": {
+                    "type" : "list",
+                    "description" : "A comma-separated list of index names; use `_all` or empty string to perform the operation on all indices"
+                }
+            },
+            "params": {
+                "detailed" : {
+                    "type": "boolean",
+                    "description": "Whether to display detailed information about shard recovery",
+                    "default": false
+                },
+                "active_only" : {
+                    "type": "boolean",
+                    "description": "Display only those recoveries that are currently on-going",
+                    "default": false
+                },
+                "human": {
+                    "type": "boolean",
+                    "description": "Whether to return time and byte values in human-readable format.",
+                    "default": false
+                }
+            }
+        },
+        "body": null
+    }
+}
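
The two paths and the `detailed` / `active_only` parameters in the spec above can be combined into a request path as in the following sketch. `RecoveryPath` and its `build` method are hypothetical names used only for illustration, and the sketch assumes index names that need no URL escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the commit: builds the request path from the spec.
final class RecoveryPath {

    private RecoveryPath() {}

    // An empty index list selects the cluster-wide form "/_recovery";
    // otherwise the indices are joined with commas as in "/{index}/_recovery".
    static String build(List<String> indices, boolean detailed, boolean activeOnly) {
        StringBuilder path = new StringBuilder();
        if (!indices.isEmpty()) {
            path.append('/').append(String.join(",", indices));
        }
        path.append("/_recovery");

        // Both parameters default to false in the spec, so only true values are sent.
        List<String> params = new ArrayList<>();
        if (detailed) {
            params.add("detailed=true");
        }
        if (activeOnly) {
            params.add("active_only=true");
        }
        if (!params.isEmpty()) {
            path.append('?').append(String.join("&", params));
        }
        return path.toString();
    }
}
```

The resulting string contains `?` and possibly `&`, so it should be quoted when pasted into a shell command such as `curl`.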
diff --git a/rest-api-spec/test/cat.recovery/10_basic.yaml b/rest-api-spec/test/cat.recovery/10_basic.yaml
@@ -0,0 +1,26 @@
+---
+"Test cat recovery output":
+
+  - do:
+      cat.recovery: {}
+
+  - match:
+      $body: >
+               /^$/
+
+  - do:
+      index:
+        index:  index1
+        type:   type1
+        id:     1
+        body:   { foo: bar }
+        refresh: true
+  - do:
+      cluster.health:
+        wait_for_status: yellow
+  - do:
+      cat.recovery: {}
+  - match:
+      $body: >
+              /^(index1\s+\d+\s+\d+\s+(gateway|replica|snapshot|relocating)\s+(init|index|start|translog|finalize|done)\s+([a-zA-Z_0-9/.])+\s+([a-zA-Z_0-9/.])+\s+([a-zA-Z_0-9/.])+\s+([a-zA-Z_0-9/.])+\s+\d+\s+\d+\.\d+\%\s+\d+\s+\d+\.\d+\%\s+\n?){1,}$/
+
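
The row pattern asserted in the test above can be tried outside the REST test runner with `java.util.regex`. In this sketch the expression is copied from the YAML with backslashes doubled for a Java string literal. The class name `CatRecoveryLine` is hypothetical, and any sample row passed to it is invented for illustration rather than captured from a real cluster.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the commit: wraps the expression from the YAML test.
final class CatRecoveryLine {

    private CatRecoveryLine() {}

    // One or more rows: index name, two numbers, recovery type, stage,
    // four name-like tokens, then count/percent pairs for files and bytes.
    static final Pattern ROWS = Pattern.compile(
        "^(index1\\s+\\d+\\s+\\d+\\s+(gateway|replica|snapshot|relocating)"
        + "\\s+(init|index|start|translog|finalize|done)"
        + "\\s+([a-zA-Z_0-9/.])+\\s+([a-zA-Z_0-9/.])+"
        + "\\s+([a-zA-Z_0-9/.])+\\s+([a-zA-Z_0-9/.])+"
        + "\\s+\\d+\\s+\\d+\\.\\d+\\%\\s+\\d+\\s+\\d+\\.\\d+\\%\\s+\\n?){1,}$");

    // True when the whole body consists of rows in the expected shape.
    static boolean matches(String body) {
        return ROWS.matcher(body).matches();
    }
}
```

Note that the expression requires at least one whitespace character after the final percent column, so a row is only accepted with its trailing newline or padding in place.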
diff --git a/rest-api-spec/test/indices.recovery/10_basic.yaml b/rest-api-spec/test/indices.recovery/10_basic.yaml
@@ -0,0 +1,32 @@
+---
+"Indices recovery test":
+
+  - skip:
+      features: gtelte
+
+  - do:
+      indices.create:
+        index:  test_1
+
+  - do:
+      indices.recovery:
+        index: [test_1]
+
+  - match: { test_1.shards.0.type:                              "GATEWAY"               }
+  - match: { test_1.shards.0.stage:                             "DONE"                  }
+  - match: { test_1.shards.0.primary:                           true                    }
+  - match: { test_1.shards.0.target.ip:                         /^\d+\.\d+\.\d+\.\d+$/  }
+  - gte:   { test_1.shards.0.index.files.total:                 0                       }
+  - gte:   { test_1.shards.0.index.files.reused:                0                       }
+  - gte:   { test_1.shards.0.index.files.recovered:             0                       }
+  - match: { test_1.shards.0.index.files.percent:               /^\d+\.\d\%$/           }
+  - gte:   { test_1.shards.0.index.bytes.total:                 0                       }
+  - gte:   { test_1.shards.0.index.bytes.reused:                0                       }
+  - gte:   { test_1.shards.0.index.bytes.recovered:             0                       }
+  - match: { test_1.shards.0.index.bytes.percent:               /^\d+\.\d\%$/           }
+  - gte:   { test_1.shards.0.translog.recovered:                0                       }
+  - gte:   { test_1.shards.0.translog.total_time_in_millis:     0                       }
+  - gte:   { test_1.shards.0.start.check_index_time_in_millis:  0                       }
+  - gte:   { test_1.shards.0.start.total_time_in_millis:        0                       }
+
+
diff --git a/src/main/java/org/elasticsearch/action/ActionModule.java b/src/main/java/org/elasticsearch/action/ActionModule.java
@@ -95,6 +95,8 @@
 import org.elasticsearch.action.admin.indices.optimize.TransportOptimizeAction;
 import org.elasticsearch.action.admin.indices.refresh.RefreshAction;
 import org.elasticsearch.action.admin.indices.refresh.TransportRefreshAction;
+import org.elasticsearch.action.admin.indices.recovery.RecoveryAction;
+import org.elasticsearch.action.admin.indices.recovery.TransportRecoveryAction;
 import org.elasticsearch.action.admin.indices.segments.IndicesSegmentsAction;
 import org.elasticsearch.action.admin.indices.segments.TransportIndicesSegmentsAction;
 import org.elasticsearch.action.admin.indices.settings.get.GetSettingsAction;
@@ -284,6 +286,7 @@ protected void configure() {
         registerAction(MultiPercolateAction.INSTANCE, TransportMultiPercolateAction.class, TransportShardMultiPercolateAction.class);
         registerAction(ExplainAction.INSTANCE, TransportExplainAction.class);
         registerAction(ClearScrollAction.INSTANCE, TransportClearScrollAction.class);
+        registerAction(RecoveryAction.INSTANCE, TransportRecoveryAction.class);
 
         // register Name -> GenericAction Map that can be injected to instances.
         MapBinder<String, GenericAction> actionsBinder

diff --git a/src/main/java/org/elasticsearch/action/admin/indices/recovery/RecoveryAction.java b/src/main/java/org/elasticsearch/action/admin/indices/recovery/RecoveryAction.java
@@ -0,0 +1,46 @@
+/*
+ * Licensed to Elasticsearch under one or more contributor
+ * license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright
+ * ownership. Elasticsearch licenses this file to you under
+ * the Apache License, Version 2.0 (the "License"); you may
+ * not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.elasticsearch.action.admin.indices.recovery;
+
+import org.elasticsearch.client.IndicesAdminClient;
+import org.elasticsearch.action.admin.indices.IndicesAction;
+
+/**
+ * Recovery information action
+ */
+public class RecoveryAction extends IndicesAction<RecoveryRequest, RecoveryResponse, RecoveryRequestBuilder> {
+
+    public static final RecoveryAction INSTANCE = new RecoveryAction();
+    public static final String NAME = "indices/recovery";
+
+    private RecoveryAction() {
+        super(NAME);
+    }
+
+    @Override
+    public RecoveryRequestBuilder newRequestBuilder(IndicesAdminClient client) {
+        return new RecoveryRequestBuilder(client);
+    }
+
+    @Override
+    public RecoveryResponse newResponse() {
+        return new RecoveryResponse();
+    }
+}
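
In `RecoveryAction`, `INSTANCE` is declared before `NAME`, yet the constructor passes `NAME` to `super`. This works because `NAME` is a `static final String` initialized with a constant expression, which the Java compiler inlines at its use sites. The sketch below reproduces the pattern with stand-in classes. `NamedAction` and `SketchRecoveryAction` are hypothetical, greatly simplified substitutes for the Elasticsearch types and are not part of the commit.

```java
// Hypothetical, simplified stand-in for the IndicesAction base class.
class NamedAction {
    private final String name;

    protected NamedAction(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// Same field order as RecoveryAction: the singleton is declared before the name constant.
final class SketchRecoveryAction extends NamedAction {

    static final SketchRecoveryAction INSTANCE = new SketchRecoveryAction();
    // A compile-time constant, so the constructor sees its value
    // even though this declaration comes after INSTANCE.
    static final String NAME = "indices/recovery";

    private SketchRecoveryAction() {
        super(NAME);
    }
}
```

If `NAME` were instead assigned from a non-constant expression such as a method call, the constructor would run before the field was set and would observe `null`, so declaring the constant first is the more defensive ordering.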