Jenkins checking in autogenerated rST files
AthenaNebula Jenkins committed May 24, 2017
1 parent fc472cf commit f029234
Showing 1 changed file with 23 additions and 19 deletions.
42 changes: 23 additions & 19 deletions autogenerated_rst_docs/Handling_Failed_Nodes.rst
@@ -3,24 +3,28 @@ Dealing with Failed Nodes

Nodes can be easily removed from a Clearwater deployment by following
the instructions for `elastic
-scaling <Clearwater_Elastic_Scaling.html>`__. However sometimes a node or
+scaling <Clearwater_Elastic_Scaling.html>`__. However, sometimes a node or
nodes may fail unexpectedly. If the nodes cannot be recovered, then you
-should do the following (in the order specified). \* If one or more
-nodes have failed that were acting as etcd masters (see
-`configuration <Clearwater_Configuration_Options_Reference.html>`__) and
-as a result you have lost 50% (or more) of your etcd master nodes in any
-one site then the etcd cluster for that site will have lost "quorum" and
-have become read-only. To recover the etcd cluster you will need to
-follow the process `here <Handling_Multiple_Failed_Nodes.html>`__. \* If
-one or more nodes have failed that were acting as etcd masters but
-*more* than half of your etcd cluster remains operational then you must
-first follow the steps below: “removing a failed node from an etcd
-cluster” \* If a Vellum node has failed then you should follow the
-instructions below: “removing a failed Vellum node from the data store
-clusters” \* You can now spin up a new node to replace the lost
-capacity. If you are replacing a node that had been acting as an etcd
-master then you should typically configure the new node to also be an
-etcd master in order to retain your original etcd cluster size.
+should do the following (in the order specified):
+
+- If one or more nodes have failed that were acting as etcd masters
+  (see
+  `configuration <Clearwater_Configuration_Options_Reference.html>`__)
+  and as a result you have lost 50% (or more) of your etcd master nodes
+  in any one site then the etcd cluster for that site will have lost
+  "quorum" and have become read-only. To recover the etcd cluster you
+  will need to follow the process
+  `here <Handling_Multiple_Failed_Nodes.html>`__.
+- If one or more nodes have failed that were acting as etcd masters but
+  *more* than half of your etcd cluster remains operational then you
+  must first follow the steps below: "Removing a failed node from an
+  etcd cluster"
+- If a Vellum node has failed then you should follow the instructions
+  below: "Removing a failed Vellum node from the data store clusters"
+- You can now spin up a new node to replace the lost capacity. If you
+  are replacing a node that had been acting as an etcd master then you
+  should typically configure the new node to also be an etcd master in
+  order to retain your original etcd cluster size.

The processes described below do not affect call processing and can be
run on a system handling call traffic.
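
Not part of the diff, but for orientation: the first branch in the rewritten list above hinges on whether the affected site's etcd cluster still has quorum. A minimal way to check, assuming shell access to a surviving etcd master and the etcd v2 etcdctl client (Clearwater deployments also ship a clearwater-etcdctl wrapper that takes the same subcommands, though that wrapper is an assumption here, not something shown in this diff):

    # Run on any surviving etcd master in the affected site.
    # If a majority of the site's etcd masters are still reachable and healthy,
    # the cluster retains quorum and the failed member can simply be removed
    # (see the next hunk); otherwise follow Handling_Multiple_Failed_Nodes.html.
    etcdctl cluster-health
    etcdctl member list    # note the ID of the failed member for later removal
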
@@ -29,11 +33,11 @@ Removing a failed node from an etcd cluster
-------------------------------------------

If a node fails that was acting as an etcd master then it must be
-manually removed from the sites etcd cluster. Failure to do so may
+manually removed from the site's etcd cluster. Failure to do so may
leave the site in a state where future scaling operations do not work,
or where in-progress scaling operations fail to complete.

-This process assumes that more than half of the sites etcd cluster is
+This process assumes that more than half of the site's etcd cluster is
still healthy and so the etcd cluster still has quorum. If 50% or more
of the etcd masters in a given site have failed then you will need to
first follow the process `here <Handling_Multiple_Failed_Nodes.html>`__.
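
For illustration only (the concrete commands live in the part of the file below this hunk, which the diff does not show): with quorum intact, removing a failed etcd master from the site's cluster generally amounts to finding its member ID and deregistering it. A sketch using the etcd v2 etcdctl client (or Clearwater's clearwater-etcdctl wrapper, assumed here to accept the same subcommands):

    # Run from any healthy etcd master in the affected site.
    etcdctl member list                  # identify the ID of the failed member
    etcdctl member remove <member-id>    # deregister it so the cluster no longer
                                         # counts it towards quorum
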
