Application state management with triggers that span both urgent and non-urgent states to account for more graceful recoveries

asilvas/happy-feet

Happy Feet

Application state management with triggers that span both urgent and non-urgent states to account for more graceful and staggered recoveries.

State Management

Sometimes healthy isn't so straightforward. Intermediate states can help fill the void.

const happy = require('happy-feet')();

// by default, happy.state === happy.STATE.HAPPY

// sometimes apps want to differentiate between unhealthy and not-yet-healthy...
happy.state = happy.STATE.STARTING;

// app is now running
happy.state = happy.STATE.HAPPY;

// something went wrong, but we're still operational.
// eventual escalation to UNHAPPY
happy.state = happy.STATE.WARN;

// other times, maybe you just want your own custom state -- who are we to judge?
happy.state = 'BLACK_HOLE';

// DANGER WILL ROBINSON!
happy.state = happy.STATE.UNHAPPY;

// once the state is set to UNHAPPY it is irreversible; future state changes are ignored
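The comments above describe one rule that is easy to miss: UNHAPPY is terminal. The class below is a hypothetical model of that documented behavior, not happy-feet's actual implementation. `StickyState` is an illustrative name, and the string values of the `STATE` constants are assumed to match their names.

```javascript
// Assumed string values; the real constants live on happy.STATE.
const STATE = Object.freeze({
  STARTING: 'STARTING',
  HAPPY: 'HAPPY',
  WARN: 'WARN',
  UNHAPPY: 'UNHAPPY',
});

// Hypothetical model of the documented rule: once UNHAPPY, later assignments are ignored.
class StickyState {
  constructor(initial = STATE.HAPPY) {
    this._state = initial;
  }

  get state() {
    return this._state;
  }

  set state(next) {
    if (this._state === STATE.UNHAPPY) return; // terminal state: ignore the change
    this._state = next;
  }
}

const s = new StickyState();
s.state = STATE.STARTING;
s.state = 'BLACK_HOLE'; // custom states are accepted as-is
s.state = STATE.UNHAPPY;
s.state = STATE.HAPPY;  // ignored
console.log(s.state);   // UNHAPPY
```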

Usage

The core functionality of happy-feet lies in the state property of your happy-feet instance. This state may be set automatically by happy-feet as it monitors your process, or set explicitly by your application. One of the provided middlewares can reflect the state at a web service URL, so that supervising processes monitoring this health indicator know when to take corrective action.

You may also decide to manually use the happy-feet API to roll your own state management logic:

const happy = require('happy-feet')({ /* optional options */ });

// happy.state === happy.STATE.HAPPY by default

if (happy.state !== happy.STATE.HAPPY) {
  // do something about it
}

There are three built-in values that have a special meaning to happy-feet:

| State | Meaning |
| --- | --- |
| `happy.STATE.HAPPY` | This indicates that your service is in a healthy state. |
| `happy.STATE.WARN` | This means your service is having problems. The health check page will reflect an OK status, but eventually the state will automatically transition to UNHAPPY. |
| `happy.STATE.UNHAPPY` | This signals that your service instance should be considered ready to be terminated/restarted due to being in a bad state. |

You can also use custom states if you're rolling your own health status implementation.

Automatic state change triggers come in two kinds: soft limits and hard limits. A soft limit triggers the WARN state (which still reports a successful status) to signal that the process will eventually need a restart, while a hard limit triggers the UNHAPPY state, signalling immediate urgency.
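To make the soft/hard distinction concrete, here is a sketch of how a single metric reading could map to a state. `stateForMetric` is a hypothetical helper, not part of happy-feet, and treating a reading equal to a limit as crossing it (`>=`) is an assumption.

```javascript
// Hypothetical helper: classify one metric reading against optional soft and hard limits.
// An undefined limit is treated as disabled, like the "Disabled by default" options.
function stateForMetric(value, softLimit, hardLimit) {
  if (hardLimit !== undefined && value >= hardLimit) return 'UNHAPPY'; // immediate urgency
  if (softLimit !== undefined && value >= softLimit) return 'WARN';    // restart eventually
  return 'HAPPY';
}

// Event loop delay in ms, using the recommended limits of 150 (soft) and 500 (hard).
console.log(stateForMetric(120, 150, 500)); // HAPPY
console.log(stateForMetric(200, 150, 500)); // WARN
console.log(stateForMetric(650, 150, 500)); // UNHAPPY

// One uncaught exception with the default soft limit of 1 and no hard limit.
console.log(stateForMetric(1, 1, undefined)); // WARN
```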

The automatic transition from the WARN state to UNHAPPY happens after a random delay, which is configurable with the escalationSoftLimitMin and escalationSoftLimitMax settings. The randomness keeps every member of a service cluster from being taken down at the same time when a fault affects the whole cluster.
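The randomized escalation can be sketched as drawing a delay between the two settings. This assumes a uniform draw, which the README does not specify; `randomEscalationDelay` is a hypothetical name.

```javascript
// Hypothetical sketch: draw an escalation delay (seconds) uniformly from [min, max).
// Defaults mirror escalationSoftLimitMin (60) and escalationSoftLimitMax (600).
function randomEscalationDelay(minSeconds = 60, maxSeconds = 600) {
  return minSeconds + Math.random() * (maxSeconds - minSeconds);
}

// Five instances entering WARN at the same moment get different escalation times,
// so they reach UNHAPPY (and get restarted) one after another rather than together.
const delays = Array.from({ length: 5 }, () => randomEscalationDelay());
console.log(delays.map((d) => `${Math.round(d)}s`).join(', '));
```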

Here's a full list of configuration options for happy-feet:

| Option | Type | Default | Info |
| --- | --- | --- | --- |
| `escalationSoftLimitMin` | number | `60` | Minimum time (in seconds) before a WARN state may be escalated to an UNHAPPY state. |
| `escalationSoftLimitMax` | number | `600` | Maximum time (in seconds) before a WARN state may be escalated to an UNHAPPY state. |
| `uncaughtExceptionSoftLimit` | number | `1` | Number of uncaught exceptions before WARN state. |
| `uncaughtExceptionHardLimit` | number | `undefined` | Number of uncaught exceptions before UNHAPPY state. Disabled by default. |
| `unhandledRejectionSoftLimit` | number | `undefined` | Number of unhandled rejections before WARN state. Disabled by default. |
| `unhandledRejectionHardLimit` | number | `undefined` | Number of unhandled rejections before UNHAPPY state. Disabled by default. |
| `rssSoftLimit` | number | `undefined` | Memory Resident Set Size (in bytes) before WARN state. Disabled by default. |
| `rssHardLimit` | number | `undefined` | Memory Resident Set Size (in bytes) before UNHAPPY state. Disabled by default. |
| `heapSoftLimit` | number | `undefined` | Total heap size (in bytes) before WARN state. Disabled by default. |
| `heapHardLimit` | number | `undefined` | Total heap size (in bytes) before UNHAPPY state. Disabled by default. |
| `eventLoopSoftLimit` | number | `undefined` | Event loop delay (in ms) before WARN state. Disabled by default. Recommended value of 150 or higher. |
| `eventLoopHardLimit` | number | `undefined` | Event loop delay (in ms) before UNHAPPY state. Disabled by default. Recommended value of 500 or higher. |
| `timeLimitMin` | number | `undefined` | Minimum time (in seconds) before UNHAPPY state. Disabled by default. Useful for periodic application restarts. Both `timeLimitMin` and `timeLimitMax` must be set to use this feature. |
| `timeLimitMax` | number | `undefined` | Maximum time (in seconds) before UNHAPPY state. Disabled by default. Useful for periodic application restarts. Both `timeLimitMin` and `timeLimitMax` must be set to use this feature. |
| `logger` | `{ warn, error }` | `console` | Logging interface to use when state changes occur. |
| `gracePeriod` | number | `300` (5 min) | Time (in seconds) before any threshold can trigger an unhealthy state change, to ease startup pains. |
| `logOnUnhappy` | boolean | `true` | If enabled, every state check is logged while the state is not HAPPY, to help troubleshoot state changes. |
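For reference, here is an options object that uses only the settings listed above. The specific numbers are illustrative choices (only the event-loop values come from the table's recommendations); the commented `require` line shows where the object would be passed.

```javascript
const MB = 1024 * 1024;
const HOUR = 60 * 60;

// Illustrative happy-feet options; every key comes from the table above.
const options = {
  escalationSoftLimitMin: 60,      // WARN may escalate no sooner than 60s...
  escalationSoftLimitMax: 600,     // ...and no later than 600s
  uncaughtExceptionSoftLimit: 1,   // the default soft limit
  uncaughtExceptionHardLimit: 5,   // illustrative hard limit
  unhandledRejectionSoftLimit: 10, // illustrative soft limit
  rssSoftLimit: 512 * MB,          // bytes
  rssHardLimit: 1024 * MB,         // bytes
  eventLoopSoftLimit: 150,         // ms, the recommended minimum
  eventLoopHardLimit: 500,         // ms, the recommended minimum
  timeLimitMin: 12 * HOUR,         // periodic restart no sooner than 12h...
  timeLimitMax: 24 * HOUR,         // ...and no later than 24h (both must be set)
  gracePeriod: 300,                // seconds before thresholds may trigger
  logger: console,                 // any object with warn() and error()
  logOnUnhappy: true,
};

// With the package installed:
// const happy = require('happy-feet')(options);
```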

A happy-feet instance has this interface:

| Property | Type | Info |
| --- | --- | --- |
| `state` | string | Get/set the current state of your process. |
| `updateState(state, reason, code)` | function | Does the same thing as setting the `state` property, but also lets you log a reason for the change. The `code` argument lets you identify the reason for a transition programmatically (see events below). |
| `destroy()` | function | Frees the instance. |

Additionally, each happy-feet instance is an EventEmitter. It emits a change event with the following parameters whenever its state changes:

| Parameter | Type | Info |
| --- | --- | --- |
| `state` | string | The new state. |
| `reason` | string | The textual description of the change. |
| `code` | string | Any value for custom implementations, or `manual` if the state was changed through assignment. For automatic transitions, one of `uncaughtExceptions`, `unhandledRejections`, `memory`, `eventLoop`, or `escalation`. |
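Since each instance is an EventEmitter, reacting to transitions is a matter of subscribing to `change`. To stay runnable without the package, this sketch uses a plain `EventEmitter` as a stand-in and emits the events by hand; the reason strings are hypothetical.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a happy-feet instance; with the package installed you would use
// const happy = require('happy-feet')();
const happy = new EventEmitter();

const transitions = [];

// Listener signature follows the table above: (state, reason, code).
happy.on('change', (state, reason, code) => {
  transitions.push({ state, reason, code });
  if (code === 'escalation') {
    console.error(`escalated to ${state}: ${reason}`);
  }
});

// Simulated transitions; a real instance emits these itself.
happy.emit('change', 'WARN', 'RSS above soft limit', 'memory');
happy.emit('change', 'UNHAPPY', 'WARN escalated', 'escalation');

console.log(transitions.map((t) => `${t.code}:${t.state}`).join(' -> '));
// memory:WARN -> escalation:UNHAPPY
```

With a real instance, a manual call such as `happy.updateState(happy.STATE.WARN, 'DB pool exhausted', 'dbPool')` should reach the same listener with `code === 'dbPool'`; that reason and code are made up for illustration.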

Connect Usage

If you already have a Connect API, attach a handler like so:

const happyConnect = require('happy-feet/connect');
const handler = happyConnect({ /* options */ }, { /* optional happy options */ });

app.use(handler);

// Manually change the state
handler.happy.state = handler.happy.STATE.WARN; 

| Option | Type | Default | Info |
| --- | --- | --- | --- |
| `url` | string | `"/_health"` | Route to return health status. Defaults to common Kubernetes healthcheck route. |
| `method` | string | `"GET"` | Method required to return health status. |
| `errorStatus` | number | `500` | Status code returned if in UNHAPPY state. |
| `status` | object | `{}` | A collection of custom overrides for status responses based on individual states. |
| `status[STATE].statusCode` | number | `200` / `500` | Status code returned for the given state. |
| `status[STATE].body` | string | `${STATE}` | Body message returned for the given state. By default `${STATE}` will be returned verbatim. |
| `status[STATE].contentType` | string | `text/plain` | Content type to respond with. |

| Return Property | Type | Info |
| --- | --- | --- |
| `happy` | Happy | Instance of Happy. |
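The `status` overrides combine with the defaults in the table roughly as follows. This is a hypothetical sketch of the documented defaults, not the middleware's source: returning `200` for every state other than UNHAPPY (including custom states) is an assumption, and `resolveHealthResponse` is an illustrative name.

```javascript
// Hypothetical resolution of a health response from the documented options.
function resolveHealthResponse(state, options = {}) {
  const errorStatus = options.errorStatus ?? 500;
  const override = (options.status ?? {})[state] ?? {};
  return {
    // A per-state override wins; otherwise errorStatus for UNHAPPY, 200 for the rest.
    statusCode: override.statusCode ?? (state === 'UNHAPPY' ? errorStatus : 200),
    // By default the state name itself is the body.
    body: override.body ?? state,
    contentType: override.contentType ?? 'text/plain',
  };
}

// Example options as they might be passed to happyConnect().
const connectOptions = {
  url: '/_health',
  errorStatus: 503,
  status: {
    WARN: { body: '{"state":"WARN"}', contentType: 'application/json' },
  },
};

for (const state of ['HAPPY', 'WARN', 'UNHAPPY']) {
  const { statusCode, contentType, body } = resolveHealthResponse(state, connectOptions);
  console.log(state, statusCode, contentType, body);
}
```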

Express Usage

If you already have an Express API, attach a handler like so:

const happyExpress = require('happy-feet/express');

const handler = happyExpress({ /* options */ }, { /* optional happy options */ });
app.get('/_health', handler);

// Manually change the state
handler.happy.state = handler.happy.STATE.WARN; 

| Option | Type | Default | Info |
| --- | --- | --- | --- |
| `errorStatus` | number | `500` | Status code returned if in UNHAPPY state. |
| `status` | object | `{}` | A collection of custom overrides for status responses based on individual states. |
| `status[STATE].statusCode` | number | `200` / `500` | Status code returned for the given state. |
| `status[STATE].body` | string | `${STATE}` | Body message returned for the given state. By default `${STATE}` will be returned verbatim. |
| `status[STATE].contentType` | string | `text/plain` | Content type to respond with. |

| Return Property | Type | Info |
| --- | --- | --- |
| `happy` | Happy | Instance of Happy. |

API Usage

If your service does not expose an API, this helper can expose your health check for you.

const happyApi = require('happy-feet/api');

const handler = happyApi({ /* options */ }, { /* optional happy options */ });

| Option | Type | Default | Info |
| --- | --- | --- | --- |
| `port` | number | `80` | HTTP port to bind to. |
| `url` | string | `"/_health"` | Route to return health status. Defaults to common Kubernetes healthcheck route. |
| `method` | string | `"GET"` | Method required to return health status. |
| `errorStatus` | number | `500` | Status code returned if in UNHAPPY state. |
| `status` | object | `{}` | A collection of custom overrides for status responses based on individual states. |
| `status[STATE].statusCode` | number | `200` / `500` | Status code returned for the given state. |
| `status[STATE].body` | string | `${STATE}` | Body message returned for the given state. By default `${STATE}` will be returned verbatim. |
| `status[STATE].contentType` | string | `text/plain` | Content type to respond with. |

| Return Property | Type | Info |
| --- | --- | --- |
| `happy` | Happy | Instance of Happy. |
| `server` | `http.Server` | Instance of `http.Server`. |

Advanced Healthchecks

The main advantage of Advanced Healthchecks over the default escalation threshold is that a centralized system can coordinate restarts in a more graceful and controlled manner without impact to availability.

If your system for monitoring health checks can distinguish between catastrophic (remove from the load balancer immediately) and unhealthy (some undesirable event or threshold was hit; still operational, but should eventually be restarted), happy.STATE.WARN (reported with an HTTP statusMessage of WARN) can be leveraged to handle rolling restarts gracefully in whatever fashion you deem acceptable. By default, WARN states are automatically escalated to UNHAPPY after a period of time, so that unresolved problems do not turn into mass concurrent crashes that impact customers and the availability of your services.
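On the monitoring side, that distinction might look like the following sketch. It assumes the default responses (an error status such as 500 for UNHAPPY, and a 200 with an HTTP status message of WARN for WARN). `classifyHealth`, `pickRestartBatch` and the action names are hypothetical, and restarting at most `maxConcurrent` warned instances per round is just one possible rolling policy.

```javascript
// Hypothetical supervisor policy for one health-check response.
function classifyHealth(statusCode, statusMessage) {
  if (statusCode >= 500) return 'remove-now';           // catastrophic: pull from the load balancer
  if (statusMessage === 'WARN') return 'restart-later'; // operational, restart when convenient
  if (statusCode >= 200 && statusCode < 300) return 'healthy';
  return 'unknown';
}

// Rolling restart: restart at most maxConcurrent WARN instances per round.
function pickRestartBatch(instances, maxConcurrent) {
  return instances
    .filter((i) => classifyHealth(i.statusCode, i.statusMessage) === 'restart-later')
    .slice(0, maxConcurrent)
    .map((i) => i.id);
}

const fleet = [
  { id: 'a', statusCode: 200, statusMessage: 'OK' },
  { id: 'b', statusCode: 200, statusMessage: 'WARN' },
  { id: 'c', statusCode: 500, statusMessage: 'Internal Server Error' },
  { id: 'd', statusCode: 200, statusMessage: 'WARN' },
];

console.log(pickRestartBatch(fleet, 1)); // [ 'b' ]
```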
