Docs: Documented delayed allocation settings
Relates to: elastic#11712
clintongormley authored and szroland committed Jun 30, 2015
1 parent d392458 commit 53cb4e6
Showing 5 changed files with 238 additions and 133 deletions.
30 changes: 17 additions & 13 deletions docs/reference/cluster/health.asciidoc
@@ -7,18 +7,22 @@ of the cluster.
[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "testcluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 5,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}
--------------------------------------------------

@@ -61,14 +65,14 @@ The cluster health API accepts the following request parameters:
`wait_for_status`::
One of `green`, `yellow` or `red`. Will wait (until
the timeout provided) until the status of the cluster changes to the one
provided or better, i.e. `green` > `yellow` > `red`. By default, will not
wait for any status.

`wait_for_relocating_shards`::
Controls how many relocating shards to wait for. Usually set to `0`
to wait until all relocations have completed. By default, does not
wait.

`wait_for_active_shards`::
Controls how many active shards to wait for. By default, does not wait.
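
These parameters can be combined in a single request. As a sketch (assuming a
node listening on `localhost:9200`), the following waits up to 50 seconds for
the cluster to reach at least `yellow` status:

[source,sh]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=50s'
--------------------------------------------------

If the timeout expires before the requested status is reached, the response is
returned anyway with `timed_out` set to `true`.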
127 changes: 7 additions & 120 deletions docs/reference/index-modules/allocation.asciidoc
@@ -2,130 +2,17 @@
== Index Shard Allocation

This module provides per-index settings to control the allocation of shards to
nodes:

* <<shard-allocation-filtering,Shard allocation filtering>>: Controlling which shards are allocated to which nodes.
* <<delayed-allocation,Delayed allocation>>: Delaying allocation of unassigned shards caused by a node leaving.
* <<allocation-total-shards,Total shards per node>>: A hard limit on the number of shards from the same index per node.

include::allocation/filtering.asciidoc[]

include::allocation/delayed.asciidoc[]


[float]
=== Total Shards Per Node

The cluster-level shard allocator tries to spread the shards of a single index
across as many nodes as possible. However, depending on how many shards and
indices you have, and how big they are, it may not always be possible to spread
shards evenly.

The following _dynamic_ setting allows you to specify a hard limit on the total
number of shards from a single index allowed per node:

`index.routing.allocation.total_shards_per_node`::

The maximum number of shards (replicas and primaries) that will be
allocated to a single node. Defaults to unbounded.

[WARNING]
=======================================
This setting imposes a hard limit which can result in some shards not
being allocated.
Use with caution.
=======================================
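
As an illustrative sketch (reusing the index `test` from the examples above),
the limit can be updated dynamically, here allowing at most five shards of the
index per node:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.total_shards_per_node": 5
}
------------------------
// AUTOSENSE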
include::allocation/total_shards.asciidoc[]



91 changes: 91 additions & 0 deletions docs/reference/index-modules/allocation/delayed.asciidoc
@@ -0,0 +1,91 @@
[[delayed-allocation]]
=== Delaying allocation when a node leaves

When a node leaves the cluster for whatever reason, intentional or otherwise,
the master reacts by:

* Promoting a replica shard to primary to replace any primaries that were on the node.
* Allocating replica shards to replace the missing replicas (assuming there are enough nodes).
* Rebalancing shards evenly across the remaining nodes.

These actions are intended to protect the cluster against data loss by
ensuring that every shard is fully replicated as soon as possible.

Even though we throttle concurrent recoveries both at the
<<recovery,node level>> and at the <<shards-allocation,cluster level>>, this
``shard-shuffle'' can still put a lot of extra load on the cluster which
may not be necessary if the missing node is likely to return soon. Imagine
this scenario:

* Node 5 loses network connectivity.
* The master promotes a replica shard to primary for each primary that was on Node 5.
* The master allocates new replicas to other nodes in the cluster.
* Each new replica makes an entire copy of the primary shard across the network.
* More shards are moved to different nodes to rebalance the cluster.
* Node 5 returns after a few minutes.
* The master rebalances the cluster by allocating shards to Node 5.

If the master had just waited for a few minutes, then the missing shards could
have been re-allocated to Node 5 with a minimum of network traffic. This
process would be even quicker for idle shards (shards not receiving indexing
requests) which have been automatically <<indices-synced-flush,sync-flushed>>.

The allocation of replica shards which become unassigned because a node has
left can be delayed with the `index.unassigned.node_left.delayed_timeout`
dynamic setting, which defaults to `0` (reassign shards immediately).

This setting can be updated on a live index (or on all indices):

[source,js]
------------------------------
PUT /_all/_settings
{
"settings": {
"index.unassigned.node_left.delayed_timeout": "5m"
}
}
------------------------------
// AUTOSENSE

With delayed allocation enabled, the above scenario changes to look like this:

* Node 5 loses network connectivity.
* The master promotes a replica shard to primary for each primary that was on Node 5.
* The master logs a message that allocation of unassigned shards has been delayed, and for how long.
* The cluster remains yellow because there are unassigned replica shards.
* Node 5 returns after a few minutes, before the timeout expires.
* The missing replicas are re-allocated to Node 5 (and sync-flushed shards recover almost immediately).

NOTE: This setting will not affect the promotion of replicas to primaries, nor
will it affect the assignment of replicas that have not been assigned
previously.

==== Monitoring delayed unassigned shards

The number of shards whose allocation has been delayed by this timeout setting
can be viewed with the <<cluster-health,cluster health API>>:

[source,js]
------------------------------
GET _cluster/health <1>
------------------------------
<1> This request will return a `delayed_unassigned_shards` value.

==== Removing a node permanently

If a node is not going to return and you would like Elasticsearch to allocate
the missing shards immediately, just update the timeout to zero:


[source,js]
------------------------------
PUT /_all/_settings
{
"settings": {
"index.unassigned.node_left.delayed_timeout": "0"
}
}
------------------------------
// AUTOSENSE

You can reset the timeout as soon as the missing shards have started to recover.
97 changes: 97 additions & 0 deletions docs/reference/index-modules/allocation/filtering.asciidoc
@@ -0,0 +1,97 @@
[[shard-allocation-filtering]]
=== Shard Allocation Filtering

Shard allocation filtering allows you to specify which nodes are allowed
to host the shards of a particular index.

NOTE: The per-index shard allocation filters explained below work in
conjunction with the cluster-wide allocation filters explained in
<<shards-allocation>>.

It is possible to assign arbitrary metadata attributes to each node at
startup. For instance, nodes could be assigned a `rack` and a `group`
attribute as follows:

[source,sh]
------------------------
bin/elasticsearch --node.rack rack1 --node.size big <1>
------------------------
<1> These attribute settings can also be specified in the `elasticsearch.yml` config file.

These metadata attributes can be used with the
`index.routing.allocation.*` settings to allocate an index to a particular
group of nodes. For instance, we can move the index `test` to either `big` or
`medium` nodes as follows:

[source,json]
------------------------
PUT test/_settings
{
"index.routing.allocation.include.size": "big,medium"
}
------------------------
// AUTOSENSE

Alternatively, we can move the index `test` away from the `small` nodes with
an `exclude` rule:

[source,json]
------------------------
PUT test/_settings
{
"index.routing.allocation.exclude.size": "small"
}
------------------------
// AUTOSENSE

Multiple rules can be specified, in which case all conditions must be
satisfied. For instance, we could move the index `test` to `big` nodes in
`rack1` with the following:

[source,json]
------------------------
PUT test/_settings
{
"index.routing.allocation.include.size": "big",
"index.routing.allocation.include.rack": "rack1"
}
------------------------
// AUTOSENSE

NOTE: If some conditions cannot be satisfied then shards will not be moved.

The following settings are _dynamic_, allowing live indices to be moved from
one set of nodes to another:

`index.routing.allocation.include.{attribute}`::

Assign the index to a node whose `{attribute}` has at least one of the
comma-separated values.

`index.routing.allocation.require.{attribute}`::

Assign the index to a node whose `{attribute}` has _all_ of the
comma-separated values.

`index.routing.allocation.exclude.{attribute}`::

Assign the index to a node whose `{attribute}` has _none_ of the
comma-separated values.
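
For instance, a `require` rule matches only nodes that carry _all_ of the
listed values. A sketch combining the `size` and `rack` attributes from the
earlier examples:

[source,json]
------------------------
PUT test/_settings
{
  "index.routing.allocation.require.size": "big",
  "index.routing.allocation.require.rack": "rack1"
}
------------------------
// AUTOSENSE

Unlike two `include` rules, which each only need one of their values to match,
a `require` rule excludes any node missing even one listed value.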

These special attributes are also supported:

[horizontal]
`_name`:: Match nodes by node name
`_ip`:: Match nodes by IP address (the IP address associated with the hostname)
`_host`:: Match nodes by hostname

All attribute values can be specified with wildcards, eg:

[source,json]
------------------------
PUT test/_settings
{
"index.routing.allocation.include._ip": "192.168.2.*"
}
------------------------
// AUTOSENSE
