Commit
Merge pull request #742 from kgaillot/stonith
Minor stonith fixes and refactoring
davidvossel committed Jul 1, 2015
2 parents 1edcf61 + 5966ddb commit c4d6be4
Showing 5 changed files with 429 additions and 81 deletions.
3 changes: 2 additions & 1 deletion cts/CTStests.py
@@ -1314,7 +1314,8 @@ def __call__(self, node):
self.debug("Shooting %s aka. %s" % (rsc.clone_id, rsc.id))

pats = []
pats.append("pengine.*: warning: Processing failed op %s for %s on" % (self.action, self.rid))
pats.append(r"pengine.*: warning: Processing failed op %s for (%s|%s) on" % (self.action,
rsc.id, rsc.clone_id))

if rsc.managed():
pats.append(self.templates["Pat:RscOpOK"] % (self.rid, "stop_0"))
145 changes: 145 additions & 0 deletions fencing/README.md
@@ -0,0 +1,145 @@
# Directory contents

* `admin.c`, `stonith_admin.8`: `stonith_admin` command-line tool and its man
page
* `commands.c`, `internal.h`, `main.c`, `remote.c`, `stonithd.7`: stonithd and
its man page
* `fence_dummy`, `fence_legacy`, `fence_legacy.8`, `fence_pcmk`,
`fence_pcmk.8`: Pacemaker-supplied fence agents and their man pages
* `regression.py(.in)`: regression tests for `stonithd`
* `standalone_config.c`, `standalone_config.h`: abandoned project
* `test.c`: `stonith-test` command-line tool

# How fencing requests are handled

## Bird's eye view

In the broadest terms, stonith works like this:

1. The initiator (an external program such as `stonith_admin`, or the cluster
itself via the `crmd`) asks the local `stonithd`, "Hey, can you fence this
node?"
1. The local `stonithd` asks all the `stonithd` instances in the cluster (including
itself), "Hey, what fencing devices do you have access to that can fence
this node?"
1. Each `stonithd` in the cluster replies with a list of available devices that
it knows about.
1. Once the original `stonithd` gets all the replies, it asks the most
appropriate `stonithd` peer to actually carry out the fencing. It may send
out more than one such request if the target node must be fenced with
multiple devices.
1. The chosen `stonithd` peer(s) call the appropriate fencing resource agent(s) to
   do the fencing, then reply to the original `stonithd` with the result.
1. The original `stonithd` broadcasts the result to all `stonithd` instances.
1. Each `stonithd` sends the result to each of its local clients (including, at
some point, the initiator).
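
The numbered steps above can be condensed into a toy simulation. This is an illustrative sketch only, not Pacemaker code: the `Stonithd` class, the `fence()` helper, and the device dictionaries are invented names, and the real implementation exchanges XML messages over the cluster messaging layer rather than calling Python methods.

```python
class Stonithd:
    """Toy stand-in for one stonithd daemon and the devices it can reach."""
    def __init__(self, name, devices):
        self.name = name
        self.devices = devices

    def query(self, target):
        # Steps 2-3: reply with the devices that can fence the target
        return [d for d in self.devices if target in d["targets"]]

def fence(peers, target):
    # Steps 1-3: broadcast the query and collect every peer's reply
    replies = {p.name: p.query(target) for p in peers}
    # Step 4: ask the most appropriate peer (here simply the first capable one)
    for name, devices in replies.items():
        if devices:
            # Step 5: that peer runs the fence agent; assume it succeeds
            # Steps 6-7: the result is broadcast and handed to local clients
            return {"executioner": name, "device": devices[0]["id"], "rc": 0}
    return {"rc": 1}  # no peer has a capable device: the request fails

peers = [
    Stonithd("node1", []),
    Stonithd("node2", [{"id": "ipmi-node3", "targets": {"node3"}}]),
]
print(fence(peers, "node3"))
```

Note how fencing fails outright only when no peer anywhere reports a capable device.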

## Detailed view

### Initiating a fencing request

A fencing request can be initiated by the cluster or externally, using the
libfencing API.

* The cluster always initiates fencing via `crmd/te_actions.c:te_fence_node()`
(which calls the `fence()` API). This occurs when a graph synapse contains a
`CRM_OP_FENCE` XML operation.
* The main external clients are `stonith_admin` and `stonith-test`.

Highlights of the fencing API:
* `stonith_api_new()` creates and returns a new `stonith_t` object, whose
`cmds` member has methods for connect, disconnect, fence, etc.
* The `fence()` method creates and sends a `STONITH_OP_FENCE` XML request with
  the desired action and target node. Callers do not have to choose, or even
  have any knowledge of, particular fencing devices.

### Fencing queries

As of this writing, the function calls for a stonith request go something like this:

The local `stonithd` receives the client's request via an IPC or messaging
layer callback, which calls
* `stonith_command()`, which (for requests) calls
* `handle_request()`, which (for `STONITH_OP_FENCE` from a client) calls
* `initiate_remote_stonith_op()`, which creates a `STONITH_OP_QUERY` XML
request with the target, desired action, timeout, etc., then broadcasts
the operation to the cluster group (i.e. all `stonithd` instances) and
starts a timer. The query is broadcast because (1) location constraints
might prevent the local node from accessing the stonith device directly,
and (2) even if the local node does have direct access, another node
might be preferred to carry out the fencing.

Each `stonithd` receives the original `stonithd`'s `STONITH_OP_QUERY` broadcast
request via IPC or messaging layer callback, which calls:
* `stonith_command()`, which (for requests) calls
* `handle_request()`, which (for `STONITH_OP_QUERY` from a peer) calls
* `stonith_query()`, which calls
* `get_capable_devices()` with `stonith_query_capable_device_db()` to add
device information to an XML reply and send it. (A message is
considered a reply if it contains `T_STONITH_REPLY`, which is only set
by `stonithd` peers, not clients.)

The original `stonithd` receives all peers' `STONITH_OP_QUERY` replies via IPC
or messaging layer callback, which calls:
* `stonith_command()`, which (for replies) calls
* `handle_reply()` which (for `STONITH_OP_QUERY`) calls
* `process_remote_stonith_query()`, which allocates a new query result
structure, parses device information into it, and adds it to the operation
object. It increments the number of replies received for this operation
and compares it against the expected number of replies (i.e. the number
of active peers); if this is the last expected reply, it calls
* `call_remote_stonith()`, which calculates the timeout and sends
`STONITH_OP_FENCE` request(s) to carry out the fencing. If the target
node has a fencing "topology" (which allows specifications such as
"this node can be fenced either with device A, or devices B and C in
combination"), it will choose the device(s) and send out as many
requests as needed. For each chosen device, it also chooses a peer; a
peer is preferred if it has "verified" access to the desired device,
meaning that the device is "running" on it and thus a monitor
operation is ensuring reachability.
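
The reply bookkeeping in `process_remote_stonith_query()` and the peer preference in `call_remote_stonith()` can be sketched as follows. This is a hedged approximation with invented field names (`replies`, `replies_expected`, `query_results`), not stonithd's real data structures.

```python
def record_query_reply(op, peer, devices):
    """Store one peer's query reply; True means it was the last one expected."""
    op["query_results"][peer] = devices
    op["replies"] += 1
    return op["replies"] >= op["replies_expected"]

def choose_peer(op, device_id):
    """Prefer a peer with "verified" access to the device, i.e. one where the
    device is running and a monitor operation has confirmed reachability."""
    capable = [(peer, devs[device_id])
               for peer, devs in op["query_results"].items()
               if device_id in devs]
    verified = [peer for peer, dev in capable if dev.get("verified")]
    if verified:
        return verified[0]
    return capable[0][0] if capable else None

op = {"replies": 0, "replies_expected": 2, "query_results": {}}
assert not record_query_reply(op, "node1", {"psu1": {"verified": False}})
assert record_query_reply(op, "node2", {"psu1": {"verified": True}})
print(choose_peer(op, "psu1"))  # node2 wins: it has verified access
```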

### Fencing operations

Each `STONITH_OP_FENCE` request goes something like this as of this writing:

The chosen peer `stonithd` receives the `STONITH_OP_FENCE` request via IPC or
messaging layer callback, which calls:
* `stonith_command()`, which (for requests) calls
* `handle_request()`, which (for `STONITH_OP_FENCE` from a peer) calls
* `stonith_fence()`, which calls
* `schedule_stonith_command()` (using supplied device if
`F_STONITH_DEVICE` was set, otherwise the highest-priority capable
device obtained via `get_capable_devices()` with
`stonith_fence_get_devices_cb()`), which adds the operation to the
device's pending operations list and triggers processing.

The chosen peer `stonithd`'s mainloop is triggered and calls
* `stonith_device_dispatch()`, which calls
* `stonith_device_execute()`, which pops off the next item from the device's
pending operations list. If acting as the (internally implemented) watchdog
agent, it panics the node, otherwise it calls
* `stonith_action_create()` and `stonith_action_execute_async()` to call the fencing agent.

The chosen peer `stonithd`'s mainloop is triggered again once the fencing agent returns, and calls
* `stonith_action_async_done()`, which adds the results to an action object, then calls its
* done callback (`st_child_done()`), which calls `schedule_stonith_command()`
for a new device if there are further required actions to execute or if the
original action failed, then builds and sends an XML reply to the original
`stonithd` (via `stonith_send_async_reply()`), then checks whether any
pending actions are the same as the one just executed and merges them if so.
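
The merge step at the end of `st_child_done()` can be illustrated with a small sketch. Assumed here, not taken from the source: operations are plain dictionaries and "the same" means same action and target (the real code compares more state).

```python
def merge_duplicates(finished, pending, result):
    """Give pending duplicates of the finished action its result; return
    (merged, still_pending) so duplicates never hit the device again."""
    merged, rest = [], []
    for op in pending:
        same = (op["action"] == finished["action"]
                and op["target"] == finished["target"])
        (merged if same else rest).append(op)
    for op in merged:
        op["result"] = result
    return merged, rest

pending = [{"action": "off", "target": "node3"},
           {"action": "off", "target": "node4"}]
merged, rest = merge_duplicates({"action": "off", "target": "node3"},
                                pending, 0)
print(len(merged), len(rest))  # 1 1: one duplicate merged, one still pending
```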

### Fencing replies

The original `stonithd` receives the `STONITH_OP_FENCE` reply via IPC or
messaging layer callback, which calls:
* `stonith_command()`, which (for replies) calls
* `handle_reply()`, which calls
* `process_remote_stonith_exec()`, which calls either
`call_remote_stonith()` (to retry a failed operation, or to try the next
device in a topology if appropriate, issuing a new
`STONITH_OP_FENCE` request and proceeding as before) or `remote_op_done()`
(if the operation has definitively failed or succeeded).
* `remote_op_done()` broadcasts the result to all peers.

Finally, all peers receive the broadcast result and call
* `remote_op_done()`, which sends the result to all local clients.
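
The retry-or-finish decision in `process_remote_stonith_exec()` reduces to a few lines of control flow. This sketch uses invented names (`untried_devices`, string return values) purely for illustration.

```python
def handle_fence_reply(op, rc):
    """rc == 0 means the fence agent reported success."""
    if rc == 0:
        return "done:success"           # remote_op_done() broadcasts success
    if op["untried_devices"]:
        op["device"] = op["untried_devices"].pop(0)
        return "retry:" + op["device"]  # new STONITH_OP_FENCE request
    return "done:failed"                # remote_op_done() broadcasts failure

op = {"device": "A", "untried_devices": ["B"]}
print(handle_fence_reply(op, 1))  # retry:B
print(handle_fence_reply(op, 0))  # done:success
```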
135 changes: 108 additions & 27 deletions fencing/commands.c
@@ -53,15 +53,24 @@ GHashTable *topology = NULL;
GList *cmd_list = NULL;

struct device_search_s {
/* target of fence action */
char *host;
/* requested fence action */
char *action;
/* timeout to use if a device is queried dynamically for possible targets */
int per_device_timeout;
/* number of registered fencing devices at time of request */
int replies_needed;
/* number of device replies received so far */
int replies_received;
/* whether the target is eligible to perform requested action (or off) */
bool allow_suicide;

/* private data to pass to search callback function */
void *user_data;
/* function to call when all replies have been received */
void (*callback) (GList * devices, void *user_data);
/* devices capable of performing requested action (or off if remapping) */
GListPtr capable;
};

@@ -173,6 +182,17 @@ get_action_timeout(stonith_device_t * device, const char *action, int default_ti
char buffer[64] = { 0, };
const char *value = NULL;

/* If "reboot" was requested but the device does not support it,
* we will remap to "off", so check timeout for "off" instead
*/
if (safe_str_eq(action, "reboot")
&& is_not_set(device->flags, st_device_supports_reboot)) {
crm_trace("%s doesn't support reboot, using timeout for off instead",
device->id);
action = "off";
}

/* If the device config specified an action-specific timeout, use it */
snprintf(buffer, sizeof(buffer) - 1, "pcmk_%s_timeout", action);
value = g_hash_table_lookup(device->params, buffer);
if (value) {
@@ -1241,6 +1261,38 @@ search_devices_record_result(struct device_search_s *search, const char *device,
}
}

/*
* \internal
* \brief Check whether the local host is allowed to execute a fencing action
*
* \param[in] device Fence device to check
* \param[in] action Fence action to check
* \param[in] target Hostname of fence target
* \param[in] allow_suicide Whether self-fencing is allowed for this operation
*
* \return TRUE if local host is allowed to execute action, FALSE otherwise
*/
static gboolean
localhost_is_eligible(const stonith_device_t *device, const char *action,
const char *target, gboolean allow_suicide)
{
gboolean localhost_is_target = safe_str_eq(target, stonith_our_uname);

if (device && action && device->on_target_actions
&& strstr(device->on_target_actions, action)) {
if (!localhost_is_target) {
crm_trace("%s operation with %s can only be executed for localhost not %s",
action, device->id, target);
return FALSE;
}

} else if (localhost_is_target && !allow_suicide) {
crm_trace("%s operation does not support self-fencing", action);
return FALSE;
}
return TRUE;
}

static void
can_fence_host_with_device(stonith_device_t * dev, struct device_search_s *search)
{
@@ -1258,19 +1310,11 @@ can_fence_host_with_device(stonith_device_t * dev, struct device_search_s *searc
goto search_report_results;
}

if (dev->on_target_actions &&
search->action &&
strstr(dev->on_target_actions, search->action)) {
/* this device can only execute this action on the target node */

if(safe_str_neq(host, stonith_our_uname)) {
crm_trace("%s operation with %s can only be executed for localhost not %s",
search->action, dev->id, host);
goto search_report_results;
}

} else if(safe_str_eq(host, stonith_our_uname) && search->allow_suicide == FALSE) {
crm_trace("%s operation does not support self-fencing", search->action);
/* Short-circuit the query if the local host is not allowed to perform the
* desired action.
*/
if (!localhost_is_eligible(dev, search->action, host,
search->allow_suicide)) {
goto search_report_results;
}

@@ -1423,6 +1467,43 @@ struct st_query_data {
int call_options;
};

/*
* \internal
* \brief Add action-specific attributes to query reply XML
*
* \param[in,out] xml XML to add attributes to
* \param[in] action Fence action
* \param[in] device Fence device
*/
static void
add_action_specific_attributes(xmlNode *xml, const char *action,
stonith_device_t *device)
{
int action_specific_timeout;
int delay_max;

CRM_CHECK(xml && action && device, return);

if (is_action_required(action, device)) {
crm_trace("Action %s is required on %s", action, device->id);
crm_xml_add_int(xml, F_STONITH_DEVICE_REQUIRED, 1);
}

action_specific_timeout = get_action_timeout(device, action, 0);
if (action_specific_timeout) {
crm_trace("Action %s has timeout %dms on %s",
action, action_specific_timeout, device->id);
crm_xml_add_int(xml, F_STONITH_ACTION_TIMEOUT, action_specific_timeout);
}

delay_max = get_action_delay_max(device, action);
if (delay_max > 0) {
crm_trace("Action %s has maximum random delay %dms on %s",
action, delay_max, device->id);
crm_xml_add_int(xml, F_STONITH_DELAY_MAX, delay_max / 1000);
}
}

static void
stonith_query_capable_device_cb(GList * devices, void *user_data)
{
@@ -1432,13 +1513,12 @@ stonith_query_capable_device_cb(GList * devices, void *user_data)
xmlNode *list = NULL;
GListPtr lpc = NULL;

/* Pack the results into data */
/* Pack the results into XML */
list = create_xml_node(NULL, __FUNCTION__);
crm_xml_add(list, F_STONITH_TARGET, query->target);
for (lpc = devices; lpc != NULL; lpc = lpc->next) {
stonith_device_t *device = g_hash_table_lookup(device_list, lpc->data);
int action_specific_timeout;
int delay_max;
const char *action = query->action;

if (!device) {
/* It is possible the device got unregistered while
@@ -1448,24 +1528,25 @@ stonith_query_capable_device_cb(GList * devices, void *user_data)

available_devices++;

action_specific_timeout = get_action_timeout(device, query->action, 0);
dev = create_xml_node(list, F_STONITH_DEVICE);
crm_xml_add(dev, XML_ATTR_ID, device->id);
crm_xml_add(dev, "namespace", device->namespace);
crm_xml_add(dev, "agent", device->agent);
crm_xml_add_int(dev, F_STONITH_DEVICE_VERIFIED, device->verified);
if (is_action_required(query->action, device)) {
crm_xml_add_int(dev, F_STONITH_DEVICE_REQUIRED, 1);
}
if (action_specific_timeout) {
crm_xml_add_int(dev, F_STONITH_ACTION_TIMEOUT, action_specific_timeout);
}

delay_max = get_action_delay_max(device, query->action);
if (delay_max > 0) {
crm_xml_add_int(dev, F_STONITH_DELAY_MAX, delay_max / 1000);
/* If the originating stonithd wants to reboot the node, and we have a
* capable device that doesn't support "reboot", remap to "off" instead.
*/
if (is_not_set(device->flags, st_device_supports_reboot)
&& safe_str_eq(query->action, "reboot")) {
crm_trace("%s doesn't support reboot, using values for off instead",
device->id);
action = "off";
}

/* Add action-specific values if available */
add_action_specific_attributes(dev, action, device);

if (query->target == NULL) {
xmlNode *attrs = create_xml_node(dev, XML_TAG_ATTRS);

@@ -1481,7 +1562,7 @@ stonith_query_capable_device_cb(GList * devices, void *user_data)
}

if (list != NULL) {
crm_trace("Attaching query list output");
crm_log_xml_trace(list, "Add query results");
add_message_xml(query->reply, F_STONITH_CALLDATA, list);
}
stonith_send_reply(query->reply, query->call_options, query->remote_peer, query->client_id);
14 changes: 14 additions & 0 deletions fencing/internal.h
@@ -129,6 +129,20 @@ typedef struct remote_fencing_op_s {

} remote_fencing_op_t;

/*
* Complex fencing requirements are specified via fencing topologies.
* A topology consists of levels; each level is a list of fencing devices.
* Topologies are stored in a hash table by node name. When a node needs to be
* fenced, if it has an entry in the topology table, the levels are tried
* sequentially, and the devices in each level are tried sequentially.
* Fencing is considered successful as soon as any level succeeds;
* a level is considered successful if all its devices succeed.
* Essentially, all devices at a given level are "and-ed" and the
* levels are "or-ed".
*
* This structure is used for the topology table entries.
* Topology levels start from 1, so levels[0] is unused and always NULL.
*/
typedef struct stonith_topology_s {
char *node;
GListPtr levels[ST_LEVEL_MAX];
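
The and/or semantics described in the comment above can be captured in a few lines of Python. This is a sketch, not the C implementation, assuming topology levels are numbered from 1 (so slot 0 is skipped):

```python
def fence_by_topology(levels, try_device):
    """levels: list of device-id lists, index 0 unused (None).
    try_device: callable returning True if fencing with that device worked."""
    for level in levels[1:]:
        # a level succeeds only if every device in it succeeds ("and")
        if level and all(try_device(dev) for dev in level):
            return True  # any successful level finishes the job ("or")
    return False

# Device A is broken; B and C work, so level 1 fails but level 2
# ("devices B and C in combination") succeeds.
working = {"B", "C"}
print(fence_by_topology([None, ["A"], ["B", "C"]], lambda d: d in working))
```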
