Merge pull request #742 from kgaillot/stonith
Minor stonith fixes and refactoring
Showing 5 changed files with 429 additions and 81 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
# Directory contents | ||
|
||
* `admin.c`, `stonith_admin.8`: `stonith_admin` command-line tool and its man | ||
page | ||
* `commands.c`, `internal.h`, `main.c`, `remote.c`, `stonithd.7`: stonithd and | ||
its man page | ||
* `fence_dummy`, `fence_legacy`, `fence_legacy.8`, `fence_pcmk`, | ||
`fence_pcmk.8`: Pacemaker-supplied fence agents and their man pages | ||
* `regression.py(.in)`: regression tests for `stonithd` | ||
* `standalone_config.c`, `standalone_config.h`: abandoned project | ||
* `test.c`: `stonith-test` command-line tool | ||
|
||
# How fencing requests are handled | ||
|
||
## Bird's eye view | ||
|
||
In the broadest terms, stonith works like this: | ||
|
||
1. The initiator (an external program such as `stonith_admin`, or the cluster | ||
itself via the `crmd`) asks the local `stonithd`, "Hey, can you fence this | ||
node?" | ||
1. The local `stonithd` asks all the `stonithd` instances in the cluster (including | ||
itself), "Hey, what fencing devices do you have access to that can fence | ||
this node?" | ||
1. Each `stonithd` in the cluster replies with a list of available devices that | ||
it knows about. | ||
1. Once the original `stonithd` gets all the replies, it asks the most | ||
appropriate `stonithd` peer to actually carry out the fencing. It may send | ||
out more than one such request if the target node must be fenced with | ||
multiple devices. | ||
1. The chosen `stonithd(s)` call the appropriate fencing resource agent(s) to | ||
do the fencing, then reply to the original `stonithd` with the result. | ||
1. The original `stonithd` broadcasts the result to all `stonithd` instances. | ||
1. Each `stonithd` sends the result to each of its local clients (including, at | ||
some point, the initiator). | ||
|
||
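The steps above can be modeled in a few lines. This is only an illustrative sketch in Python, not Pacemaker code: the `Peer` class, `fence_node()` function, and the result dictionary are hypothetical names, and "most appropriate peer" is simplified to "first peer that reports a usable device".

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical stand-in for one stonithd instance."""
    name: str
    devices: dict  # device name -> set of node names the device can fence
    notified: list = field(default_factory=list)

    def query(self, target):
        # Steps 2-3: report the devices this peer could use against target.
        return sorted(d for d, nodes in self.devices.items() if target in nodes)

    def fence(self, device, target):
        # Step 5: a real daemon would run the fence agent here.
        return {"target": target, "device": device, "by": self.name, "rc": 0}


def fence_node(peers, target):
    # Steps 2-3: broadcast the query and collect every peer's reply.
    replies = {p.name: p.query(target) for p in peers}
    # Step 4 (simplified assumption): pick the first peer with any device.
    for p in peers:
        if replies[p.name]:
            result = p.fence(replies[p.name][0], target)
            break
    else:
        result = {"target": target, "device": None, "by": None, "rc": 1}
    # Steps 6-7: broadcast the result so every peer can notify its clients.
    for p in peers:
        p.notified.append(result)
    return result
```

In this model, a peer with no devices still takes part in the query and still receives the final result, which mirrors why the query and the result are both broadcast.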
## Detailed view | ||
|
||
### Initiating a fencing request | ||
|
||
A fencing request can be initiated by the cluster or externally, using the | ||
libfencing API. | ||
|
||
* The cluster always initiates fencing via `crmd/te_actions.c:te_fence_node()` | ||
(which calls the `fence()` API). This occurs when a graph synapse contains a | ||
`CRM_OP_FENCE` XML operation. | ||
* The main external clients are `stonith_admin` and `stonith-test`. | ||
|
||
Highlights of the fencing API: | ||
* `stonith_api_new()` creates and returns a new `stonith_t` object, whose | ||
`cmds` member has methods for connect, disconnect, fence, etc. | ||
* The `fence()` method creates and sends a `STONITH_OP_FENCE` XML request with | ||
the desired action and target node. Callers do not have to choose or even | ||
have any knowledge about particular fencing devices. | ||
|
||
### Fencing queries | ||
|
||
The function calls for a stonith request go something like this as of this writing: | ||
|
||
The local `stonithd` receives the client's request via an IPC or messaging | ||
layer callback, which calls | ||
* `stonith_command()`, which (for requests) calls | ||
* `handle_request()`, which (for `STONITH_OP_FENCE` from a client) calls | ||
* `initiate_remote_stonith_op()`, which creates a `STONITH_OP_QUERY` XML | ||
request with the target, desired action, timeout, etc., then broadcasts | ||
the operation to the cluster group (i.e. all `stonithd` instances) and | ||
starts a timer. The query is broadcast because (1) location constraints | ||
might prevent the local node from accessing the stonith device directly, | ||
and (2) even if the local node does have direct access, another node | ||
might be preferred to carry out the fencing. | ||
|
||
Each `stonithd` receives the original `stonithd`'s `STONITH_OP_QUERY` broadcast | ||
request via IPC or messaging layer callback, which calls: | ||
* `stonith_command()`, which (for requests) calls | ||
* `handle_request()`, which (for `STONITH_OP_QUERY` from a peer) calls | ||
* `stonith_query()`, which calls | ||
* `get_capable_devices()` with `stonith_query_capable_device_db()` to add | ||
device information to an XML reply and send it. (A message is | ||
considered a reply if it contains `T_STONITH_REPLY`, which is only set | ||
by `stonithd` peers, not clients.) | ||
|
||
The original `stonithd` receives all peers' `STONITH_OP_QUERY` replies via IPC | ||
or messaging layer callback, which calls: | ||
* `stonith_command()`, which (for replies) calls | ||
* `handle_reply()`, which (for `STONITH_OP_QUERY`) calls | ||
* `process_remote_stonith_query()`, which allocates a new query result | ||
structure, parses device information into it, and adds it to the operation | ||
object. It increments the number of replies received for this operation, | ||
and compares it against the expected number of replies (i.e. the number | ||
of active peers), and if this is the last expected reply, calls | ||
* `call_remote_stonith()`, which calculates the timeout and sends | ||
`STONITH_OP_FENCE` request(s) to carry out the fencing. If the target | ||
node has a fencing "topology" (which allows specifications such as | ||
"this node can be fenced either with device A, or devices B and C in | ||
combination"), it will choose the device(s), and send out as many | ||
requests as needed. If it chooses a device, it also chooses the peer; a | ||
peer is preferred if it has "verified" access to the desired device, | ||
meaning that it has the device "running" on it and thus has a monitor | ||
operation ensuring reachability. | ||
|
||
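The reply bookkeeping and peer preference described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the `stonithd` implementation: `QueryCollector`, `add_reply()`, and `choose_peer()` are hypothetical names, each reply is assumed to be a mapping of device name to a "verified" flag, and the query timer is not modeled.

```python
class QueryCollector:
    """Hypothetical model of query-reply bookkeeping for one fencing operation."""

    def __init__(self, expected_replies):
        # Assumption: one reply is expected per active peer.
        self.expected_replies = expected_replies
        self.results = []  # list of (peer name, {device name: verified flag})

    def add_reply(self, peer, devices):
        """Record one peer's reply; return True once all expected replies are in."""
        self.results.append((peer, dict(devices)))
        return len(self.results) >= self.expected_replies

    def choose_peer(self, device):
        """Prefer a peer with verified access, else any peer that has the device."""
        fallback = None
        for peer, devices in self.results:
            if device in devices:
                if devices[device]:
                    return peer  # verified access wins immediately
                if fallback is None:
                    fallback = peer  # remember the first unverified candidate
        return fallback
```

Only the call that delivers the last expected reply returns `True`, which is the point at which the model would move on to sending fence requests.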
### Fencing operations | ||
|
||
Each `STONITH_OP_FENCE` request goes something like this as of this writing: | ||
|
||
The chosen peer `stonithd` receives the `STONITH_OP_FENCE` request via IPC or | ||
messaging layer callback, which calls: | ||
* `stonith_command()`, which (for requests) calls | ||
* `handle_request()`, which (for `STONITH_OP_FENCE` from a peer) calls | ||
* `stonith_fence()`, which calls | ||
* `schedule_stonith_command()` (using supplied device if | ||
`F_STONITH_DEVICE` was set, otherwise the highest-priority capable | ||
device obtained via `get_capable_devices()` with | ||
`stonith_fence_get_devices_cb()`), which adds the operation to the | ||
device's pending operations list and triggers processing. | ||
|
||
The chosen peer `stonithd`'s mainloop is triggered and calls | ||
* `stonith_device_dispatch()`, which calls | ||
* `stonith_device_execute()`, which pops off the next item from the device's | ||
pending operations list. If acting as the (internally implemented) watchdog | ||
agent, it panics the node, otherwise it calls | ||
* `stonith_action_create()` and `stonith_action_execute_async()` to call the fencing agent. | ||
|
||
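The scheduling and dispatch steps above amount to a per-device queue. The Python sketch below illustrates that idea only; `Device`, `schedule_command()`, and `execute_next()` are hypothetical names, and it assumes that a larger `priority` number means a more preferred device.

```python
from collections import deque


class Device:
    """Hypothetical fencing device with its own pending-operations list."""

    def __init__(self, name, priority, targets):
        self.name = name
        self.priority = priority      # assumption: higher number = preferred
        self.targets = set(targets)   # nodes this device is able to fence
        self.pending = deque()        # operations waiting to be executed


def schedule_command(devices, target, action, device_name=None):
    """Queue an action on the named device, or on the best capable one."""
    if device_name is not None:
        candidates = [d for d in devices if d.name == device_name]
    else:
        candidates = [d for d in devices if target in d.targets]
    if not candidates:
        return None
    device = max(candidates, key=lambda d: d.priority)
    device.pending.append((action, target))
    return device


def execute_next(device):
    """Pop the oldest pending operation, or return None if the list is empty."""
    return device.pending.popleft() if device.pending else None
```

Keeping one queue per device means operations on the same device are handled in order, while different devices can make progress independently.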
The chosen peer `stonithd`'s mainloop is triggered again once the fencing agent returns, and calls | ||
* `stonith_action_async_done()`, which adds the results to an action object then calls its | ||
* done callback (`st_child_done()`), which calls `schedule_stonith_command()` | ||
for a new device if there are further required actions to execute or if the | ||
original action failed, then builds and sends an XML reply to the original | ||
`stonithd` (via `stonith_send_async_reply()`), then checks whether any | ||
pending actions are the same as the one just executed and merges them if so. | ||
|
||
### Fencing replies | ||
|
||
The original `stonithd` receives the `STONITH_OP_FENCE` reply via IPC or | ||
messaging layer callback, which calls: | ||
* `stonith_command()`, which (for replies) calls | ||
* `handle_reply()`, which calls | ||
* `process_remote_stonith_exec()`, which calls either | ||
`call_remote_stonith()` (to retry a failed operation, or try the next | ||
device in a topology if appropriate, which issues a new | ||
`STONITH_OP_FENCE` request, proceeding as before) or `remote_op_done()` | ||
(if the operation is definitively failed or successful). | ||
* `remote_op_done()` broadcasts the result to all peers. | ||
|
||
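The retry-or-finish decision, combined with a fencing topology such as "device A, or devices B and C in combination", can be illustrated with a small model. This Python sketch is an assumption-laden simplification: `run_topology()` and its `try_device` callback are hypothetical, a topology is represented as a list of levels (each a list of device names), and the real request/reply exchange is asynchronous rather than a loop.

```python
def run_topology(levels, try_device):
    """Try each level in order; a level succeeds only if all its devices do.

    Returns (succeeded, attempted), where attempted lists the devices tried.
    """
    attempted = []
    for level in levels:
        level_ok = bool(level)  # an empty level cannot fence anything
        for device in level:
            attempted.append(device)
            if not try_device(device):
                level_ok = False
                break  # this level failed; fall through to the next one
        if level_ok:
            return True, attempted  # definitive success
    return False, attempted  # every level failed: definitive failure
```

For example, `run_topology([["devA"], ["devB", "devC"]], try_device)` tries `devA` first and moves on to `devB` and `devC` only if `devA` fails.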
Finally, all peers receive the broadcast result and call | ||
* `remote_op_done()`, which sends the result to all local clients. |