TritonDataCenter/node-mooremachine

mooremachine

Short version

This is a framework for organising your async node.js code as Moore Finite State Machines. FSMs can be easier to reason about and debug than implicit state kept in callbacks and objects, leading to more correct code.

Introduction

It’s widely known that if you want to sequence some series of asynchronous actions in node, you should use a library like vasync or async to do so — they let you define a series of callbacks to be run in order, like this:

async.series([
    function (cb) { thing.once('something', cb) },
    function (cb) { anotherthing.get(key, function () { cb(null, blah); }); }
]);

This lets you define sequential actions with asynchronous functions. However, if you need more complex logic within this structure, it rapidly becomes limiting. It is difficult, for example, to create a loop: you have to improvise one by nesting some form of loop within a series call and a second layer of callbacks.

Another problem arises if you want to define multiple ways to return from one of these functions in the async series, e.g. if there are separate error and result paths:

    function (cb) {
        thing.once('error', cb);
        thing.once('success', function (result) { cb(null, result); });
    }

While one such additional path is manageable, things quickly become very complex.

Instead, let us think of each of the callbacks in such an async sequence as being states of a finite state machine. With async.series we are limited to defining only edges that progress forwards through the list. If, instead, we could define whatever edges we like, we could construct conditional logic and loops across async boundaries. If we had some way to "gang" the callbacks set up in each state together, they could all be disconnected at state exit and avoid the need for complex logic to deal with out-of-state events.

This library provides a framework for dealing with just such an async finite state machine.
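The idea of "ganging" callbacks together so that they can all be disconnected at state exit can be sketched with nothing but Node's EventEmitter. This is only an illustration of the concept, not mooremachine's implementation: the ListenerGang helper, its lg_registered field and the 'success' / 'failure' event names are all invented for the example.

```javascript
var mod_events = require('events');

/*
 * Hypothetical helper (not part of mooremachine): remembers every
 * listener registered through it so that all of them can be removed
 * with a single call.
 */
function ListenerGang() {
    this.lg_registered = [];
}

ListenerGang.prototype.on = function (emitter, event, cb) {
    emitter.on(event, cb);
    this.lg_registered.push({ emitter: emitter, event: event, cb: cb });
};

ListenerGang.prototype.disconnect = function () {
    this.lg_registered.forEach(function (r) {
        r.emitter.removeListener(r.event, r.cb);
    });
    this.lg_registered = [];
};

var thing = new mod_events.EventEmitter();
var seen = [];

var gang = new ListenerGang();
gang.on(thing, 'success', function (result) {
    seen.push('success:' + result);
});
gang.on(thing, 'failure', function (reason) {
    seen.push('failure:' + reason);
});

thing.emit('success', 'first');  /* handled: still "in the state" */
gang.disconnect();               /* "state exit": both handlers removed */
thing.emit('success', 'late');   /* ignored: nothing is listening */

console.log(seen);
```

Only the first event ends up in seen. The second one arrives after the gang has been disconnected, which is the out-of-state case that would otherwise need extra guarding logic in every callback.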

Moore machines

A Moore machine (as opposed to a Mealy machine) is an FSM whose outputs depend solely on the present state, and not on any other inputs. Moore machines are considered to be a simpler approach than Mealy machines, and easier to reason about.

In our analogy, of course, our state machine does not have distinct outputs (since really we are using it to run arbitrary code). If we consider the FSM’s "outputs" as the total set of side-effects it has on the program’s state, however, we can interpret a Moore machine as being an FSM where code only runs on the entry to a new state, and all other events can only serve to cause state transitions.
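To make this interpretation concrete, here is a small synchronous sketch written without the library. All side effects live in per-state entry functions, and events do nothing except select the next state. The state names, the transition table and the log array are made up for illustration.

```javascript
/*
 * Illustrative sketch only: a synchronous Moore-style machine. The
 * state names, the 'log' array and the transition table are invented
 * for this example.
 */
var log = [];

/* Side effects live in the entry functions, one per state. */
var entry = {
    stopped: function () { log.push('entered stopped'); },
    connecting: function () { log.push('entered connecting'); },
    connected: function () { log.push('entered connected'); }
};

/* Events carry no code of their own: they only name the next state. */
var transitions = {
    stopped: { start: 'connecting' },
    connecting: { connect: 'connected', error: 'stopped' },
    connected: { close: 'stopped' }
};

var current;

function enter(state) {
    current = state;
    entry[state]();
}

function handle(event) {
    var next = transitions[current][event];
    if (next !== undefined)
        enter(next);
}

enter('stopped');
handle('start');
handle('bogus');     /* not an edge out of 'connecting': ignored */
handle('connect');

console.log(current, log);
```

Because code runs only in the entry functions, the full behaviour of the machine can be read off from two tables: what each state does on entry, and which events lead out of it.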

Example

In this example we’ll create an FSM called ThingFSM. It’s a typical network client, which wants to make a TCP connection to something and talk to it. It also wants to delay/backoff and retry on failure.

var mod_mooremachine = require('mooremachine');
var mod_util = require('util');
var mod_net = require('net');
var mod_assert = require('assert');

function ThingFSM() {
    /* Some variables we'll use later. */
    this.tf_sock = undefined;
    this.tf_lastError = undefined;
    /* Go to our initial state, 'stopped'. */
    mod_mooremachine.FSM.call(this, 'stopped');
}
mod_util.inherits(ThingFSM, mod_mooremachine.FSM);

/* This function runs when we enter state 'stopped'. */
ThingFSM.prototype.state_stopped = function (S) {
    /*
     * This event handler will go away automatically if we
     * leave the current state.
     */
    S.on(this, 'startAsserted', function () {
        S.gotoState('connecting');
    });
};

/*
 * A common idiom with mooremachine is to have "signal" functions like
 * this one, which are called by something external to the FSM.
 *
 * They assert about the current state and then emit an event on 'this',
 * which triggers a state transition.
 *
 * This pattern ensures that all possible paths out of a given state
 * remain within the state_x() function and can be easily enumerated.
 */
ThingFSM.prototype.start = function () {
    mod_assert.ok(this.isInState('stopped'),
        'ThingFSM must be stopped to call start()');
    this.emit('startAsserted');
};

ThingFSM.prototype.state_connecting = function (S) {
    var self = this;
    /*
     * In this state, we take an action upon state entry (we try to
     * open a new socket connection).
     */
    this.tf_sock = mod_net.connect(...);
    /*
     * And we have two possible ways out of this state. Whichever
     * one happens first will automatically tear down the event
     * handlers for the others, making sure we don't leak any
     * handlers or have unexpected transitions later.
     */
    S.on(this.tf_sock, 'connect', function () {
        S.gotoState('connected');
    });
    S.on(this.tf_sock, 'error', function (err) {
        /*
         * Stash the error so we can do something with it later.
         * It's generally fine to store things on 'self' inside an
         * event handler like this, but *actions* should only be taken
         * upon state entry.
         */
        self.tf_lastError = err;
        S.gotoState('error');
    });
};

ThingFSM.prototype.state_error = function (S) {
    /* Take action: destroy the socket. */
    if (this.tf_sock !== undefined)
        this.tf_sock.destroy();
    this.tf_sock = undefined;
    /* Print an error, do something, check # of retries... */

    /* Retry the connection in 5 seconds */
    S.timeout(5000, function () {
        S.gotoState('connecting');
    });
};

ThingFSM.prototype.state_connected = function (S) {
    /* ... */
};

API documentation

FSM

Inheriting from the FSM class

Implementations of a state machine should inherit from mod_mooremachine.FSM, using mod_util.inherits. The only compulsory methods that the subprototype must implement are the state callbacks.

mod_mooremachine.FSM(initialState)

Constructor. Must be called by the constructor of the subprototype.

Parameters:

  • initialState: String, name of the initial state the FSM will enter at startup

FSM#state_name(stateHandle)

State entry functions. These run exactly once, at entry to the new state. They should take any actions associated with the state and set up any callbacks that can cause transition out of it.

The stateHandle argument is a handle giving access to functions that should be used to set up events that can lead to a state transition. It provides replacements for EventEmitter#on, setTimeout, and other mechanisms for async event handling, which are automatically torn down as soon as the FSM leaves its current state. This prevents erroneous state transitions from a dangling callback left behind by a previous state.

It is permissible to call stateHandle.gotoState() immediately within the state_ function.

Take care when emitting events or making synchronous calls within a state_ function. If the handler of the event or the callee can call back into the FSM, or emit an event itself that causes the FSM to transition, then this happens synchronously within the state entry function, and the results may be undesirable. It is highly recommended to emit any events within a setImmediate() callback.
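The effect of deferring an emit can be seen in a stdlib-only sketch. Here enterState() stands in for the body of a state_ function, and the 'ready' event and order array are invented for the example. Inside a real state entry function, the documented stateHandle.immediate() would be the equivalent that is also cleared on leaving the state.

```javascript
var mod_events = require('events');

var order = [];
var emitter = new mod_events.EventEmitter();

emitter.on('ready', function () {
    order.push('listener ran');
});

/* Stand-in for the body of a state entry function. */
function enterState() {
    order.push('entry started');
    /* Defer the emit so listeners cannot run in the middle of entry. */
    setImmediate(function () {
        emitter.emit('ready');
    });
    order.push('entry finished');
}

enterState();
```

The listener only runs after enterState() has returned, so anything it does (including calling back into the emitting object) happens once entry is complete.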

Parameters:

  • stateHandle, an Object, instance of mod_mooremachine.FSMStateHandle

FSM#allStateEvent(name)

Adds an "all-state event". Should be called in the constructor for an FSM subclass. Any registered all-state event must have a handler registered on it after any state transition. This allows you to enforce that particular events must be handled in every state of the FSM.

Parameters:

  • name: String, name of the event

FSM#getState()

Returns a String, full current state of the FSM (including sub-state).

FSM#isInState(state)

Tests whether the FSM is in the given state, or any sub-state of it.

Parameters:

  • state: String, state to test for

Returns a Boolean.

Events

FSM-derived subclasses provide one EventEmitter event: 'stateChanged'. This event fires after every state transition, and has a single argument (a String, the name of the new state).

It is important to note that 'stateChanged' always fires on the next tick after the actual transition has occurred. There is no guarantee that the FSM is still in the state you received an event for.

The #isInState() method is useful to check the current state after you have received a notification.

The reason 'stateChanged' emission is not immediate is to prevent interacting FSMs from re-entering each other's state transition functions, which would make it impossible to enforce post-conditions on the transition (e.g. the checking of allStateEvents).


FSM state handles

FSMStateHandle#gotoState(state)

Transitions the FSM into the given new state. Can only be called once per state handle.

Parameters:

  • state: a String, name of state to transition into

FSMStateHandle#gotoStateOn(emitter, event, state)

Transitions the FSM into the given new state when emitter emits event. The registered callback will be removed if the FSM moves out of the current state. This is convenient shorthand for writing:

S.on(emitter, event, function () {
    S.gotoState(state);
});

FSMStateHandle#gotoStateTimeout(timeoutMs, state)

Transitions the FSM into the given new state after timeoutMs milliseconds have elapsed. The timer is cleared if the FSM moves out of the current state. This is convenient shorthand for writing:

S.timeout(timeoutMs, function () {
    S.gotoState(state);
});

FSMStateHandle#on(emitter, event, cb)

Works like EventEmitter#on: equivalent to emitter.on(event, cb) but registers the callback for removal as soon as the FSM moves out of the current state.

FSMStateHandle#immediate(cb)

Equivalent to setImmediate(cb), but registers the timer for clearing as soon as the FSM moves out of the current state.

FSMStateHandle#timeout(timeoutMs, cb)

Equivalent to setTimeout(cb, timeoutMs), but registers the timer for clearing as soon as the FSM moves out of the current state.

Returns the timer handle.

FSMStateHandle#callback(cb)

Wraps an arbitrary callback function in such a way that calling it once the FSM has left the current state is a no-op.

It’s recommended to try to avoid using this if you can (see Signal functions below for a possible alternative), but sometimes it is necessary.

Parameters:

  • cb: a Function

Returns a Function that takes the same arguments as cb.
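The behaviour described for callback() can be pictured with a small closure. This is a sketch of the idea only, not the library's code: makeGuard(), wrap() and leave() are hypothetical names, with leave() standing in for the FSM moving out of the state.

```javascript
/*
 * Hypothetical stand-in for a state handle: wrap() returns guarded
 * callbacks, leave() marks the state as exited.
 */
function makeGuard() {
    var active = true;
    return {
        wrap: function (cb) {
            return function () {
                if (!active)
                    return undefined;
                return cb.apply(this, arguments);
            };
        },
        leave: function () {
            active = false;
        }
    };
}

var results = [];
var guard = makeGuard();
var onReply = guard.wrap(function (err, value) {
    results.push(value);
});

onReply(null, 'in state');   /* runs: state is still current */
guard.leave();
onReply(null, 'too late');   /* no-op: state has been left */

console.log(results);
```

The wrapped function takes the same arguments as the original, so it can be handed to any API that expects a plain callback.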

FSMStateHandle#interval(intervalMs, cb)

Equivalent to setInterval(cb, intervalMs), but registers the timer for clearing as soon as the FSM moves out of the current state.

Returns the timer handle.

FSMStateHandle#validTransitions(possibleStates)

Should be called from a state entry function. Sets the list of valid transitions that are possible out of the current state. Any attempt to transition the FSM out of the current state to a state not on this list (using gotoState()) will throw an error.

Calling validTransitions more than once on the same state handle is an error.

Parameters:

  • possibleStates: Array of String, names of valid states
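The checks described for validTransitions() amount to something like the following sketch. It is written without the library, so makeHandle() and the error messages are assumptions made for illustration and will not match what mooremachine actually throws.

```javascript
/*
 * Hypothetical miniature of a state handle that enforces a list of
 * valid transitions. Error messages are invented for this sketch.
 */
function makeHandle(currentState) {
    var valid;   /* stays undefined until validTransitions() is called */
    return {
        validTransitions: function (states) {
            if (valid !== undefined)
                throw new Error('validTransitions already called');
            valid = states;
        },
        gotoState: function (state) {
            if (valid !== undefined && valid.indexOf(state) === -1) {
                throw new Error('Invalid transition: ' +
                    currentState + ' => ' + state);
            }
            return state;
        }
    };
}

var handle = makeHandle('connecting');
handle.validTransitions(['connected', 'error']);

var outcome;
try {
    handle.gotoState('stopped');
    outcome = 'allowed';
} catch (e) {
    outcome = 'rejected';
}

console.log(outcome);
```

Declaring the list at the top of a state entry function doubles as documentation: a reader can see every state reachable from this one without scanning the handlers below it.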


Signal functions

One of the key goals of the mooremachine framework is to keep all possible transitions out of a state together inside the state entry function.

This allows analysis and reasoning about the FSM’s movement between states without having to refer to many parts of the code at once, reducing the likelihood that possible transitions can go unnoticed.

When FSMs must receive input from outside in order to determine where to transition to next, the standard method is to do so via event emitter events.

Sometimes it is desirable for an external component to call a method on the FSM to signal it in some way, rather than emitting an event. This can also be useful for loosening the coupling between the components.

When this is desired, the recommended pattern is to use a "signal function":

ThingFSM.prototype.state_stopped = function (S) {
    S.on(this, 'startAsserted', function () {
        S.gotoState('connecting');
    });
};

ThingFSM.prototype.start = function () {
    mod_assert.ok(this.isInState('stopped'),
        'ThingFSM must be stopped to call start()');
    this.emit('startAsserted');
};

The two key components of a signal function are:

  • It emits an event on this, typically named verbAsserted.

  • It must either assert about the current state before emitting, or emit only events registered as "all state events".


Sub-states

It is possible to create a "sub-state" with mooremachine FSMs, which "inherits from" its parent state. For example:

ThingFSM.prototype.state_connected = function (S) {
    S.on(this.tf_sock, 'close', function () {
        S.gotoState('closed');
    });
    if (workAvailable)
        S.gotoState('connected.busy');
    else
        S.gotoState('connected.idle');
};

ThingFSM.prototype.state_connected.busy = function (S) {
    this.tf_sock.ref();
    /* ... */
    S.on(this.tf_work, 'finished', function () {
        S.gotoState('connected');
    });
};

ThingFSM.prototype.state_connected.idle = function (S) {
    this.tf_sock.unref();
    S.on(this, 'workAvailable', function () {
        S.gotoState('connected.busy');
    });
};

All event handlers that are set up in the 'connected' state entry function are kept when entering 'connected.busy' or 'connected.idle'. When changing from 'connected.busy' to 'connected.idle', the handlers set up in that sub-state are torn down, but those originating from 'connected' are kept.

While in a sub-state of 'connected', fsm.isInState('connected') will continue to evaluate to true. Separate 'stateChanged' events will be emitted for each sub-state entered.

Once a handle is used to transition to an unrelated state (e.g. 'closed' in the example), all handlers are torn down (from both the parent state and sub-state) as usual before entering the new state.
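The isInState() behaviour for sub-states described above can be modelled as a comparison on dotted state names. The helper below is a hypothetical sketch that assumes sub-state names are always the parent name followed by a dot, as in the example. It is not taken from the library.

```javascript
/* Hypothetical helper mirroring the documented isInState() semantics. */
function isInStateSketch(current, state) {
    return (current === state ||
        current.indexOf(state + '.') === 0);
}

var checks = [
    isInStateSketch('connected.busy', 'connected'),  /* sub-state of it */
    isInStateSketch('connected', 'connected'),       /* the state itself */
    isInStateSketch('connected', 'connected.busy'),  /* parent only */
    isInStateSketch('connecting', 'connect')         /* shared prefix only */
];

console.log(checks);
```

The last case shows why the comparison includes the dot: 'connecting' merely shares a prefix with 'connect' and is not a sub-state of it.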


DTrace support

Mooremachine has support for DTrace probes using dtrace-provider (and libusdt). The following probes are provided under the moorefsm$pid provider:

  • create-fsm(char *klass, char *id) — fired at the creation of a new FSM instance. The klass argument contains the string name of the constructor of the FSM sub-class. The id argument contains a short randomly generated string that should be unique to this FSM as long as <~6M instances of this class exist in the program (it consists of 64 random bits, base64-encoded, so about a 1/1M chance of collision at 6M instances).

  • transition-start(char *klass, char *id, char *oldState, char *newState) — fired at the beginning of an FSM transitioning to a new state.

  • transition-end(char *klass, char *id, char *oldState, char *newState) — fired at the end of an FSM transitioning to a new state.

For example:

dtrace -Zc 'node thingfsm.js' -n '
    moorefsm$target:::transition-start
    /copyinstr(arg0) == "ThingFSM"/
    {
        printf("%s => %s", copyinstr(arg2), copyinstr(arg3));
    }'

When used on the ThingFSM above, this might output:

CPU     ID                    FUNCTION:NAME
  4   8216 transition-start:transition-start undefined => stopped
  4   8216 transition-start:transition-start stopped => connecting
  4   8216 transition-start:transition-start connecting => error

This will list all the transitions of ThingFSM instances.

Another example (as a D-script file):

uint64_t timeIn[string];

moorefsm$target:::transition-start
/copyinstr(arg0) == "SocketMgrFSM" && copyinstr(arg2) != "undefined"/
{
    this->id = copyinstr(arg1);
    this->state = copyinstr(arg2);
    this->entryTime = timeIn[this->id];
    this->exitTime = timestamp;
    this->time = (this->exitTime - this->entryTime) / 1000000;
    @timeInState[this->state] = quantize(this->time);
}

moorefsm$target:::transition-end
/copyinstr(arg0) == "SocketMgrFSM"/
{
    this->id = copyinstr(arg1);
    timeIn[this->id] = timestamp;
}

This reports on the number of milliseconds spent in each state by all SocketMgrFSM instances in the process.

The output from this could look like:

$ dtrace -Zc 'node test.js' -s script.d
...
  error
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@                     1
               1 |                                         0
               2 |@@@@@@@@@@@@@@@@@@@@                     1
               4 |                                         0

  backoff
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@                     1
               1 |                                         0
               2 |                                         0
               4 |                                         0
               8 |                                         0
              16 |                                         0
              32 |                                         0
              64 |@@@@@@@@@@@@@@@@@@@@                     1
             128 |                                         0

  connected
           value  ------------- Distribution ------------- count
               2 |                                         0
               4 |@@@@@@@@@@@                              2
               8 |@@@@@@                                   1
              16 |                                         0
              32 |                                         0
              64 |                                         0
             128 |                                         0
             256 |@@@@@@                                   1
             512 |@@@@@@@@@@@@@@@@@                        3
            1024 |                                         0

  connecting
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@                       4
               1 |@@@@                                     1
               2 |                                         0
               4 |                                         0
               8 |                                         0
              16 |                                         0
              32 |                                         0
              64 |@@@@                                     1
             128 |                                         0
             256 |@@@@                                     1
             512 |@@@@                                     1
            1024 |@@@@                                     1
            2048 |                                         0

It’s generally safe enough to use only the id of the FSM as a key in an associative array or aggregation in DTrace, even when tracing multiple processes. This only becomes a problem if you expect to have more than a few million FSMs running at the same time on a system (in which case you can add the pid and class to the key alongside the id).
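The collision figure quoted for the 64-bit id can be checked with the standard birthday approximation. The function name below is made up for this sketch, and the approximation is only reasonable while the resulting probability is small.

```javascript
/*
 * Birthday approximation: with n ids drawn uniformly from 2^bits
 * values, P(at least one collision) is roughly n^2 / 2^(bits + 1)
 * while that value is small.
 */
function collisionProbability(n, bits) {
    return (n * n) / Math.pow(2, bits + 1);
}

var p = collisionProbability(6e6, 64);
console.log(p.toExponential(2));
```

For 6 million ids this comes out at just under one in a million, which agrees with the estimate given for the create-fsm probe's id argument.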