rv/monitor: Add safe watchdog monitor
The watchdog is an essential building block for the usage of Linux in
safety-critical systems because it allows the system to be monitored from
an external element - the watchdog hardware, acting as a safety-monitor.

A user-space application controls the watchdog device via the watchdog
interface. This application, hereafter safety_app, enables the watchdog
and periodically pets the watchdog upon correct completion of the
safety-related processing.

If the safety_app, for any reason, stops pinging the watchdog,
the watchdog hardware can put the system into a fail-safe state,
for example by shutting the system down.

Given the importance of the safety_app / watchdog hardware pair,
the interaction between the two also needs some sort of
monitoring. In other words, "who monitors the monitor?"

The safe watchdog (safe_wtd) RV monitor monitors the interaction between
the safety_app and the watchdog device, enforcing the correct sequence of
events that leads the system to a safe state.

Furthermore, the safety_app can monitor the RV monitor by collecting the
events generated by the RV monitor itself via the tracing interface, thereby
closing the monitoring loop with the safety_app.

To reach a safe state, the safe_wtd RV monitor requires the
safety_app to:

	- Open the watchdog device
	- Start the watchdog
	- Set a timeout
	- Ping the watchdog at least once
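
As an illustration of the sequence above, here is a minimal user-space
sketch built on the standard watchdog ioctls from <linux/watchdog.h>.
The function name safety_app_run() and its parameters are hypothetical,
and the mapping from each ioctl to a monitor event (open, start,
set_safe_timeout, ping) is an assumption, not something this patch
confirms.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical safety_app skeleton (illustrative names and parameters).
 * Returns 0 on success and -1 if any step fails.
 */
static int safety_app_run(const char *dev_path, int timeout_s, int pings)
{
	int options = WDIOS_ENABLECARD;
	int ret = -1;
	int fd;
	int i;

	/* 1. Open the watchdog device. */
	fd = open(dev_path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* 2. Start the watchdog (it may already be running after open). */
	if (ioctl(fd, WDIOC_SETOPTIONS, &options) < 0)
		goto out;

	/* 3. Set a timeout; the kernel writes back the value it applied. */
	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s) < 0)
		goto out;

	/* 4. Ping at least once, then after each round of processing. */
	for (i = 0; i < pings; i++) {
		/* ... safety-related processing would go here ... */
		if (ioctl(fd, WDIOC_KEEPALIVE, 0) < 0)
			goto out;
		sleep(1);	/* assumes timeout_s is well above one second */
	}
	ret = 0;
out:
	/* Closing does not necessarily stop a running watchdog. */
	close(fd);
	return ret;
}
```

A caller would pass the device node of the monitored watchdog, for
example safety_app_run("/dev/watchdog0", 10, 5); the call fails with -1
if the device cannot be opened.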

The RV monitor also guards against some undesired actions, for example,
other threads touching the watchdog.

The monitor also has a set of options, enabled via kernel command
line/module options. They are:

	- watchdog_id: the device id to monitor (default 0).
	- dont_stop: once enabled, do not allow the RV monitor to be stopped
		(default off).
	- safe_timeout: define the maximum safe value that a user-space
		application can set as the watchdog timeout
		(default unlimited).
	- check_timeout: after every ping, check that the time left in the
		watchdog is less than or equal to the last timeout set
		for the watchdog. It only works for watchdog devices that
		provide the get_timeleft() function (default off).
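
The two timeout-related options boil down to simple comparisons. The
sketch below is not the monitor's code: the struct, the function names,
and the use of 0 to represent "unlimited" are all assumptions made for
illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the module options. */
struct safe_wtd_opts {
	unsigned int safe_timeout;	/* seconds; 0 means unlimited (assumed) */
	bool check_timeout;
};

/* safe_timeout: accept a requested timeout only if it is within the limit. */
static bool timeout_is_safe(const struct safe_wtd_opts *o,
			    unsigned int requested)
{
	return o->safe_timeout == 0 || requested <= o->safe_timeout;
}

/*
 * check_timeout: after a ping, the time left must be less than or equal
 * to the last timeout that was set. Always passes when the option is off.
 */
static bool ping_is_consistent(const struct safe_wtd_opts *o,
			       unsigned int timeleft,
			       unsigned int last_timeout)
{
	if (!o->check_timeout)
		return true;
	return timeleft <= last_timeout;
}
```

With safe_timeout set to 30, a request for 30 seconds passes and a
request for 31 seconds is rejected; with check_timeout on, a time left
of 12 seconds after a ping is inconsistent with a last timeout of 10.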

For further information, please refer to:
	Documentation/trace/rv/watchdog-monitor.rst

The monitor specification was developed together with Gabriele Paoloni,
in the context of the Linux Foundation Elisa Project.

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Daniel Bristot de Oliveira authored and intel-lab-lkp committed Feb 14, 2022
1 parent 6e5afc9 commit 4f1503ea00e1acc20ca50288f9493c562133a6e8
Showing 5 changed files with 480 additions and 0 deletions.
@@ -30,6 +30,15 @@ config RV_MON_WWNR
	  illustrates the usage of per-task monitor. The model is
	  broken on purpose: it serves to test reactors.

config RV_MON_SAFE_WTD
	tristate "Safety watchdog"
	help
	  Enable safe_wtd. This monitor observes the interaction
	  between a user-space safety monitor and a watchdog device.

	  For further information see:
	  Documentation/trace/rv/safety-monitor.rst

config RV_REACTORS
	bool "Runtime verification reactors"
	default y if RV
@@ -3,6 +3,7 @@
obj-$(CONFIG_RV) += rv.o
obj-$(CONFIG_RV_MON_WIP) += monitor_wip/wip.o
obj-$(CONFIG_RV_MON_WWNR) += monitor_wwnr/wwnr.o
obj-$(CONFIG_RV_MON_SAFE_WTD) += monitor_safe_wtd/safe_wtd.o
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
obj-$(CONFIG_RV_REACT_PANIC) += reactor_panic.o
@@ -0,0 +1,84 @@
enum states_safe_wtd {
	init = 0,
	closed_running,
	closed_running_nwo,
	nwo,
	opened,
	opened_nwo,
	reopened,
	safe,
	safe_nwo,
	set,
	set_nwo,
	started,
	started_nwo,
	stopped,
	state_max
};

enum events_safe_wtd {
	close = 0,
	nowayout,
	open,
	other_threads,
	ping,
	set_safe_timeout,
	start,
	stop,
	event_max
};

struct automaton_safe_wtd {
	char *state_names[state_max];
	char *event_names[event_max];
	char function[state_max][event_max];
	char initial_state;
	char final_states[state_max];
};

struct automaton_safe_wtd automaton_safe_wtd = {
	.state_names = {
		"init",
		"closed_running",
		"closed_running_nwo",
		"nwo",
		"opened",
		"opened_nwo",
		"reopened",
		"safe",
		"safe_nwo",
		"set",
		"set_nwo",
		"started",
		"started_nwo",
		"stopped"
	},
	.event_names = {
		"close",
		"nowayout",
		"open",
		"other_threads",
		"ping",
		"set_safe_timeout",
		"start",
		"stop"
	},
	/*
	 * Transition table: rows follow enum states_safe_wtd, columns follow
	 * enum events_safe_wtd. -1 marks an event not allowed in that state.
	 */
	.function = {
		{ -1, nwo, opened, init, -1, -1, -1, -1 }, /* init */
		{ -1, closed_running_nwo, reopened, closed_running, -1, -1, -1, -1 }, /* closed_running */
		{ -1, closed_running_nwo, started_nwo, closed_running_nwo, -1, -1, -1, -1 }, /* closed_running_nwo */
		{ -1, nwo, opened_nwo, nwo, -1, -1, -1, -1 }, /* nwo */
		{ init, -1, -1, -1, -1, -1, started, -1 }, /* opened */
		{ nwo, -1, -1, -1, -1, -1, started_nwo, -1 }, /* opened_nwo */
		{ closed_running, -1, -1, -1, -1, set, -1, opened }, /* reopened */
		{ closed_running, -1, -1, -1, safe, -1, -1, stopped }, /* safe */
		{ closed_running_nwo, -1, -1, -1, safe_nwo, -1, -1, -1 }, /* safe_nwo */
		{ -1, -1, -1, -1, safe, -1, -1, -1 }, /* set */
		{ -1, -1, -1, -1, safe_nwo, -1, -1, -1 }, /* set_nwo */
		{ closed_running, -1, -1, -1, -1, set, -1, stopped }, /* started */
		{ closed_running_nwo, -1, -1, -1, -1, set_nwo, -1, -1 }, /* started_nwo */
		{ init, -1, -1, -1, -1, -1, -1, -1 }, /* stopped */
	},
	.initial_state = init,
	.final_states = { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
};
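
To see how the transition table is meant to be read, the following
stand-alone sketch replays a sequence of events from the initial state.
It is not part of the patch: the enum constants are renamed with ST_ and
EV_ prefixes to avoid clashing with libc names such as open and close,
the table is stored as signed char so that -1 survives on platforms
where plain char is unsigned, and replay() is a hypothetical helper.

```c
#include <stddef.h>

/* Prefixed copies of the patch's enums, in the same order. */
enum st {
	ST_INIT, ST_CLOSED_RUNNING, ST_CLOSED_RUNNING_NWO, ST_NWO, ST_OPENED,
	ST_OPENED_NWO, ST_REOPENED, ST_SAFE, ST_SAFE_NWO, ST_SET, ST_SET_NWO,
	ST_STARTED, ST_STARTED_NWO, ST_STOPPED, ST_MAX
};

enum ev {
	EV_CLOSE, EV_NOWAYOUT, EV_OPEN, EV_OTHER_THREADS, EV_PING,
	EV_SET_SAFE_TIMEOUT, EV_START, EV_STOP, EV_MAX
};

#define INV (-1)	/* event not allowed in this state */

/* Columns: close, nowayout, open, other_threads, ping, set_safe_timeout, start, stop */
static const signed char transitions[ST_MAX][EV_MAX] = {
	[ST_INIT]               = { INV, ST_NWO, ST_OPENED, ST_INIT, INV, INV, INV, INV },
	[ST_CLOSED_RUNNING]     = { INV, ST_CLOSED_RUNNING_NWO, ST_REOPENED, ST_CLOSED_RUNNING, INV, INV, INV, INV },
	[ST_CLOSED_RUNNING_NWO] = { INV, ST_CLOSED_RUNNING_NWO, ST_STARTED_NWO, ST_CLOSED_RUNNING_NWO, INV, INV, INV, INV },
	[ST_NWO]                = { INV, ST_NWO, ST_OPENED_NWO, ST_NWO, INV, INV, INV, INV },
	[ST_OPENED]             = { ST_INIT, INV, INV, INV, INV, INV, ST_STARTED, INV },
	[ST_OPENED_NWO]         = { ST_NWO, INV, INV, INV, INV, INV, ST_STARTED_NWO, INV },
	[ST_REOPENED]           = { ST_CLOSED_RUNNING, INV, INV, INV, INV, ST_SET, INV, ST_OPENED },
	[ST_SAFE]               = { ST_CLOSED_RUNNING, INV, INV, INV, ST_SAFE, INV, INV, ST_STOPPED },
	[ST_SAFE_NWO]           = { ST_CLOSED_RUNNING_NWO, INV, INV, INV, ST_SAFE_NWO, INV, INV, INV },
	[ST_SET]                = { INV, INV, INV, INV, ST_SAFE, INV, INV, INV },
	[ST_SET_NWO]            = { INV, INV, INV, INV, ST_SAFE_NWO, INV, INV, INV },
	[ST_STARTED]            = { ST_CLOSED_RUNNING, INV, INV, INV, INV, ST_SET, INV, ST_STOPPED },
	[ST_STARTED_NWO]        = { ST_CLOSED_RUNNING_NWO, INV, INV, INV, INV, ST_SET_NWO, INV, INV },
	[ST_STOPPED]            = { ST_INIT, INV, INV, INV, INV, INV, INV, INV },
};

/*
 * Replay n events starting from ST_INIT. Returns the final state, or -1
 * as soon as an event is not allowed in the current state.
 */
static int replay(const enum ev *events, size_t n)
{
	int state = ST_INIT;
	size_t i;

	for (i = 0; i < n; i++) {
		state = transitions[state][events[i]];
		if (state < 0)
			return -1;
	}
	return state;
}
```

Replaying open, start, set_safe_timeout, ping ends in the safe state,
matching the sequence required by the commit message. Pinging right
after start (before a timeout is set) and an other_threads event while
the device is open are both rejected.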
