kendru edited this page Sep 29, 2014 · 2 revisions

Note that the current version is a standalone set-up and does not include clustering functionality.

Design Goals

Tourbillon is designed to be a highly-available scheduling and workflow service. It supports scheduling tasks with a 1-second resolution and must not be prone to outages for any significant length of time. The system is designed to run as a cluster to guarantee availability and ensure some level of partition tolerance.

Concepts

Workflow

An entity that provides the template for creating any number of concrete Jobs. A client should be able to create a relatively small number of workflows and many jobs from each. A workflow defines the states, the transitions between them, and the default subscribers.

Job

A concrete instance of a workflow that holds its own state and manages its own subscribers. While jobs are built from a workflow as a template, they can add or remove subscribers from what the original workflow specified.
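The relationship between a workflow and its jobs can be sketched as follows. This is a minimal illustration, not Tourbillon's actual data model: the class names, field names, and the `create_job`, `subscribe`, `unsubscribe`, and `fire` methods are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Workflow:
    """Template for jobs. All field names here are hypothetical."""
    states: tuple
    transitions: dict          # (state, event name) -> next state
    default_subscribers: dict  # event name -> tuple of subscriber labels
    initial_state: str

    def create_job(self):
        # Copy the defaults so each job manages its own subscribers
        # without affecting the workflow or sibling jobs.
        subs = {event: list(names)
                for event, names in self.default_subscribers.items()}
        return Job(workflow=self, state=self.initial_state, subscribers=subs)


@dataclass
class Job:
    """Concrete instance of a workflow that holds its own state."""
    workflow: Workflow
    state: str
    subscribers: dict = field(default_factory=dict)

    def subscribe(self, event, name):
        self.subscribers.setdefault(event, []).append(name)

    def unsubscribe(self, event, name):
        self.subscribers.get(event, []).remove(name)

    def fire(self, event):
        """Apply the transition for `event`, if one exists from the current
        state, and return the subscribers that should be notified."""
        key = (self.state, event)
        if key in self.workflow.transitions:
            self.state = self.workflow.transitions[key]
        return list(self.subscribers.get(event, []))


order = Workflow(
    states=("pending", "shipped"),
    transitions={("pending", "ship"): "shipped"},
    default_subscribers={"ship": ("email",)},
    initial_state="pending",
)
first, second = order.create_job(), order.create_job()
second.subscribe("ship", "webhook")  # only this job gains the extra subscriber
```

Copying the default subscribers at creation time is one way to let a job diverge from its template; other designs, such as storing only the differences, would work as well.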

Event

An action that can trigger a state change on a job and may be observed by a subscriber. Most events fire immediately and only once, but an event can also fire after a delay or repeatedly at a regular interval. While events can be used in conjunction with jobs, they can also appear to function as stand-alone entities that provide scheduled and recurring events. We use the term "appear" because internally, a one-state job is created for scheduled and recurring events.

Although clients interact directly with scheduled and recurring events, events are also an implementation detail of jobs. A number of mechanisms could be used behind the scenes to drive a job's finite state machine, but using events in the implementation of jobs reduces the number of concepts and lets us re-use code.
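To make delayed and recurring events concrete, here is a minimal sketch of a schedule with whole-second timestamps. It is not Tourbillon's implementation: the `Event` and `Schedule` names, the heap-based queue, and the `due` method are assumptions chosen for illustration, and the one-state job that backs a stand-alone event is not modeled.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    name: str
    at: int                         # whole-second time of the next firing
    interval: Optional[int] = None  # seconds between firings; None = fire once


class Schedule:
    """Hypothetical in-memory queue of pending events, ordered by time."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so that heap entries never compare Event objects.
        self._counter = itertools.count()

    def add(self, event):
        if event.interval is not None and event.interval <= 0:
            raise ValueError("interval must be a positive number of seconds")
        heapq.heappush(self._heap, (event.at, next(self._counter), event))

    def due(self, now):
        """Pop every event due at or before `now` and return their names.
        Recurring events are re-queued for their next firing."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, event = heapq.heappop(self._heap)
            fired.append(event.name)
            if event.interval is not None:
                event.at += event.interval
                self.add(event)
        return fired


schedule = Schedule()
schedule.add(Event("once", at=5))              # delayed, fires a single time
schedule.add(Event("tick", at=2, interval=3))  # recurring every 3 seconds
```

Calling `schedule.due(now)` once per second with the current time would match the 1-second resolution described in the design goals.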

Subscriber

A subscriber is something that listens for a specific event. It is managed by a job and can be attached to or removed from an event at any point. There are different types of subscribers that perform different types of actions, such as sending an email or calling a webhook. For a complete list of subscriber types, see Subscribers.
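One way to picture subscriber types is a registry that maps each type to the function performing its action. The sketch below is an assumption for illustration only: the `HANDLERS` registry, the `handler` decorator, the dictionary shapes, and the example addresses are hypothetical, and the handlers return a description of the action instead of sending anything.

```python
from typing import Callable, Dict, List

# Hypothetical registry: subscriber type -> function that performs the action.
HANDLERS: Dict[str, Callable[[dict, dict], str]] = {}


def handler(kind: str):
    """Decorator that registers a function as the handler for `kind`."""
    def register(fn):
        HANDLERS[kind] = fn
        return fn
    return register


@handler("email")
def send_email(config: dict, event: dict) -> str:
    # A real worker would hand this to a mail transport.
    return f"email to {config['to']}: {event['name']}"


@handler("webhook")
def call_webhook(config: dict, event: dict) -> str:
    # A real worker would issue an HTTP request here.
    return f"POST {config['url']}: {event['name']}"


def notify(subscribers: List[dict], event: dict) -> List[str]:
    """Run the handler matching each subscriber's type, in order."""
    return [HANDLERS[s["type"]](s["config"], event) for s in subscribers]


results = notify(
    [
        {"type": "email", "config": {"to": "ops@example.com"}},
        {"type": "webhook", "config": {"url": "https://example.com/hook"}},
    ],
    {"name": "ship"},
)
```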

Components

Workers

Workers are processes that perform subscriber actions, such as sending emails, calling HTTP endpoints, etc.

API servers

API servers accept incoming HTTP requests and perform CRUD actions for Workflows, Jobs, Events, and Subscribers. They do not perform any actual work tasks.

Clustering

Leader Election

When a worker joins a cluster, it looks to see if a master is registered. If not, it elects itself the new master and starts polling for events that need to be executed. It also registers itself as a worker with the lowest priority so that it will only perform work if no other workers are connected to the cluster. Any worker that joins a cluster with an existing master server will register itself as a worker.
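The join procedure above can be sketched as follows. The `Cluster` class is an in-memory stand-in for whatever shared registry a real cluster would use, and the priority constants and function names are assumptions. In a real deployment the check for an existing master and the registration would need to be a single atomic operation, which this sketch does not attempt.

```python
LOWEST_PRIORITY = 0   # used by the master, which works only as a last resort
NORMAL_PRIORITY = 1   # used by ordinary workers


class Cluster:
    """Hypothetical in-memory registry shared by all nodes."""

    def __init__(self):
        self.master = None
        self.workers = {}  # node id -> priority (higher is preferred)


def join(cluster, node_id):
    """Register `node_id` with the cluster and return its role."""
    if cluster.master is None:
        # No master registered: this node elects itself, and also registers
        # as the lowest-priority worker.
        cluster.master = node_id
        cluster.workers[node_id] = LOWEST_PRIORITY
        return "master"
    cluster.workers[node_id] = NORMAL_PRIORITY
    return "worker"


def pick_worker(cluster):
    """Choose the registered worker with the highest priority."""
    return max(cluster.workers, key=cluster.workers.get)


cluster = Cluster()
first_role = join(cluster, "node-a")
solo_choice = pick_worker(cluster)    # the master is the only worker so far
second_role = join(cluster, "node-b")
later_choice = pick_worker(cluster)   # an ordinary worker is now preferred
```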

To keep the master from being a single point of failure, each worker pings the master at regular intervals. If the master does not respond, the workers hold an election once at least half of them agree that the master is unresponsive. The details of the election algorithm have not yet been decided. The new master immediately registers itself and begins polling for events.
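The condition that triggers an election can be expressed as a small predicate. This is a sketch under assumptions: the `should_hold_election` name and the shape of `reports` are hypothetical, and the election itself is left out because its algorithm is still undecided.

```python
def should_hold_election(reports, total_workers):
    """Return True once at least half of the workers report the master as
    unresponsive.

    `reports` maps a worker id to True if that worker's most recent ping to
    the master failed. Workers that have not reported count as not agreeing.
    """
    suspicious = sum(1 for failed in reports.values() if failed)
    # Integer form of suspicious / total_workers >= 1/2, avoiding division.
    return total_workers > 0 and 2 * suspicious >= total_workers


two_of_four = should_hold_election(
    {"a": True, "b": False, "c": False, "d": True}, total_workers=4
)
one_of_three = should_hold_election(
    {"a": True, "b": False, "c": False}, total_workers=3
)
```

With an even number of workers, "at least half" can be satisfied on both sides of a network partition at once, so a strict majority (`2 * suspicious > total_workers`) may be worth considering when the election algorithm is chosen.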

If the old master was not completely disabled and becomes responsive again after a new master has been elected, a locking mechanism ensures that only one server polls for events at a time. When an election is held, the lock is taken from the current master and given to the new master. When the old master finds that it cannot acquire the lock, it registers itself as a worker.
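The lock handover can be sketched with a single-holder lock. The `PollLock` class and the `poll_once` function are assumptions made for illustration; a real cluster would need a lock held in shared, fault-tolerant storage, not in one process's memory.

```python
class PollLock:
    """Hypothetical lock that allows one node at a time to poll for events."""

    def __init__(self):
        self.holder = None

    def acquire(self, node_id):
        """Succeed if the lock is free or already held by `node_id`."""
        if self.holder in (None, node_id):
            self.holder = node_id
            return True
        return False

    def transfer(self, new_holder):
        """Called when an election completes: the lock moves from the current
        master to the newly elected one."""
        self.holder = new_holder


def poll_once(lock, node_id):
    """Return the role a node should act in for this polling cycle."""
    # A node that cannot acquire the lock steps down and acts as a worker.
    return "master" if lock.acquire(node_id) else "worker"


lock = PollLock()
before = poll_once(lock, "old-master")       # lock is free, so it polls
lock.transfer("new-master")                  # election result is applied
after_old = poll_once(lock, "old-master")    # the recovered node steps down
after_new = poll_once(lock, "new-master")
```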