Fault Tolerance
Building fault-tolerant programs with Celluloid is a bit counterintuitive. This page explains the building blocks you need to understand to take full advantage of Celluloid's approach to fault tolerance, and then how to wire them together.
For starters, we suggest you acquaint yourself with Linking, Supervisors, and Supervision Groups. These are the foundation of fault-tolerant programs. However, they are only part of Celluloid's fault tolerance story. This page explains how to use these primitives to build fault-tolerant programs, even if you're too lazy to click all those links and read the associated pages.
Many Celluloid programs will start off looking something like this:
```ruby
require 'celluloid/autostart'

class Itchy
  include Celluloid

  def bite(thing)
    # ...
  end
end

class Scratchy
  include Celluloid

  def initialize
    # Looked up once and memoized: this handle goes stale if Itchy crashes
    @itchy = Actor[:itchy]
  end

  def bite(thing)
    @itchy.bite thing
  end
end

Celluloid::Actor[:itchy] = Itchy.new
Celluloid::Actor[:scratchy] = Scratchy.new
```
However, there's a problem with this approach: the actor referenced as `Actor[:itchy]` can crash. This means that when we call `Scratchy#bite`, there's a chance a `DeadActorError` can occur, especially since we've memoized `@itchy` as an instance variable and aren't looking it up using `Actor[:itchy]` every time. Fixing these errors in a deterministic way seems hard! How can we write code that is free of race conditions and can prevent these `DeadActorError`s from occurring?
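To make the stale-handle problem concrete, here is a minimal sketch in plain Ruby that runs without the Celluloid gem. `ToyActor`, `REGISTRY`, `MemoizingScratchy` and `LookupScratchy` are hypothetical names, the `DeadActorError` defined here is only a stand-in for Celluloid's, and the "restart" is simulated by registering a fresh object under the same name.

```ruby
# Stand-in for Celluloid's DeadActorError, so the sketch runs without the gem
class DeadActorError < StandardError; end

# Hypothetical toy "actor": a plain object with an alive flag
class ToyActor
  def initialize(name)
    @name = name
    @alive = true
  end

  def crash!
    @alive = false
  end

  def bite(thing)
    raise DeadActorError, "#{@name} is dead" unless @alive
    "#{@name} bit #{thing}"
  end
end

# Hypothetical name => actor map, loosely playing the role of Celluloid::Actor[]
REGISTRY = {}

class MemoizingScratchy
  def initialize
    @itchy = REGISTRY[:itchy] # looked up once, like the original Scratchy
  end

  def bite(thing)
    @itchy.bite(thing)
  end
end

class LookupScratchy
  def bite(thing)
    REGISTRY[:itchy].bite(thing) # looked up again on every call
  end
end

REGISTRY[:itchy] = ToyActor.new("itchy")
memo = MemoizingScratchy.new
fresh = LookupScratchy.new

# Simulate a crash, then a replacement being registered under the same name
REGISTRY[:itchy].crash!
REGISTRY[:itchy] = ToyActor.new("itchy")

puts fresh.bite("tail") # itchy bit tail
begin
  memo.bite("tail")
rescue DeadActorError => e
  puts "stale handle: #{e.message}" # stale handle: itchy is dead
end
```

Looking the actor up on every call avoids the stale handle, but it does not remove the race: the actor can still crash between the lookup and the call.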
The Celluloid answer: give up on determinism. Let's take the pessimistic view that any part of our program can crash at any time. We need a way not only to tolerate `DeadActorError`s, but also to minimize how often they occur in practice. This is the "Let It Crash" philosophy espoused by Erlang.
To handle `DeadActorError`s properly, we need to build programs that are resilient to them.
In our previous example we had two actors: `Itchy` and `Scratchy`. The latter, `Scratchy`, depends on the former, `Itchy`.
To make our program more fault-tolerant, we need to write `Scratchy` a little differently than before:
```ruby
class Scratchy
  include Celluloid

  def initialize
    @itchy = Actor[:itchy]
    # Link to Itchy: if Itchy crashes, Scratchy crashes with it
    link @itchy
  end

  def bite(thing)
    @itchy.bite thing
  end
end
```
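The effect of `link @itchy` can be modelled with a toy, bidirectional link graph. This is a sketch under assumptions, not Celluloid's implementation: `LinkedActor`, `crash!` and `exit_reason` are hypothetical, and a "crash" is just a flag flipped on a plain Ruby object rather than a real actor exiting.

```ruby
# Toy model of bidirectional links (hypothetical; not Celluloid's implementation)
class LinkedActor
  attr_reader :name, :exit_reason

  def initialize(name)
    @name = name
    @alive = true
    @links = []
    @exit_reason = nil
  end

  def alive?
    @alive
  end

  # Links are bidirectional: each side records the other
  def link(other)
    add_link(other)
    other.add_link(self)
  end

  # Crashing kills this actor and propagates the exit to every linked actor
  def crash!(reason)
    return unless @alive # already dead: stops propagation around link cycles
    @alive = false
    @exit_reason = reason
    @links.each { |peer| peer.crash!("linked to #{@name}, which exited: #{reason}") }
  end

  protected

  def add_link(other)
    @links << other unless @links.include?(other)
  end
end

itchy = LinkedActor.new("itchy")
scratchy = LinkedActor.new("scratchy")
scratchy.link(itchy) # mirrors `link @itchy` in Scratchy#initialize

itchy.crash!("boom")
puts scratchy.alive?      # false
puts scratchy.exit_reason # linked to itchy, which exited: boom
```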
Next, we need to supervise `Itchy` and `Scratchy` using Celluloid Supervision Groups:
```ruby
class MyGroup < Celluloid::SupervisionGroup
  # Declaration order is start order: Itchy first, since Scratchy depends on it
  supervise Itchy, as: :itchy
  supervise Scratchy, as: :scratchy
end

MyGroup.run
```
We've basically done two things differently from the original example:

- `Scratchy` is now linked directly to `Itchy`, so that if `Itchy` crashes, `Scratchy` will die too. This should mostly eliminate `DeadActorError`s that arise because `Scratchy` has a stale handle to `Itchy`. Hopefully `Scratchy` dies before it can even use the stale handle.
- `Itchy` and `Scratchy` are now supervised, so when they crash, they get restarted. `Itchy` is listed first in the `SupervisionGroup`, before `Scratchy`, which depends on `Itchy`. This way we ensure that they are started in the correct dependency order. It also means that when we shut them down, `Scratchy` will be shut down first.
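The ordering guarantees in the second point can be sketched with a hypothetical `ToySupervisor` in plain Ruby. It is not Celluloid's `SupervisionGroup`; it only shows why declaring `Itchy` before `Scratchy` matters: children start in declaration order, each factory can see the children already running, and shutdown runs in reverse.

```ruby
# Hypothetical toy supervisor: illustrates ordering only, not Celluloid's API
class ToySupervisor
  attr_reader :log

  def initialize
    @specs = []   # [name, factory] pairs, in declaration order
    @running = {} # name => started child
    @log = []
  end

  def supervise(name, &factory)
    @specs << [name, factory]
  end

  # Start children in declaration order; each factory sees those already running
  def start
    @specs.each do |name, factory|
      @running[name] = factory.call(@running.dup)
      @log << "start #{name}"
    end
  end

  # Shut children down in reverse declaration order, so dependents stop first
  def shutdown
    @specs.reverse_each do |name, _factory|
      @running.delete(name)
      @log << "stop #{name}"
    end
  end
end

sup = ToySupervisor.new
sup.supervise(:itchy) { |_running| "itchy" }
# Scratchy's factory needs Itchy to exist already, like Actor[:itchy] in initialize
sup.supervise(:scratchy) { |running| "scratchy -> #{running.fetch(:itchy)}" }
sup.start
sup.shutdown
p sup.log # ["start itchy", "start scratchy", "stop scratchy", "stop itchy"]
```

If the two `supervise` calls were swapped, `running.fetch(:itchy)` would raise `KeyError`, the toy equivalent of `Scratchy` looking up an `Itchy` that has not been started yet.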
To better ensure that actors in the same `SupervisionGroup` are restarted in the same dependency order in which they were started, Celluloid should support "restart strategies" à la Erlang and Akka:
http://www.erlang.org/doc/design_principles/sup_princ.html#id71212
Celluloid presently supports only the "one for one" restart strategy: if a particular actor dies, only that actor is restarted.
However, to ensure that all services affected by a crash boot back up correctly, it could also support a "one for all" strategy, where the entire `SupervisionGroup` is restarted whenever any member of the group crashes.
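The difference between the two strategies can be sketched as a hypothetical `StrategySupervisor` that only records what it would do. It is plain Ruby, not a Celluloid API; the `:one_for_one` and `:one_for_all` names are borrowed from Erlang, and stopping the survivors in reverse order before restarting everyone in declaration order is an assumption loosely modelled on Erlang's `one_for_all`.

```ruby
# Hypothetical toy model of two restart strategies; not a Celluloid API
class StrategySupervisor
  attr_reader :log

  def initialize(strategy, names)
    unless %i[one_for_one one_for_all].include?(strategy)
      raise ArgumentError, "unknown strategy: #{strategy.inspect}"
    end
    @strategy = strategy
    @names = names # declaration (and therefore start) order
    @log = []
  end

  # Invoked when the named child crashes; records the resulting actions
  def child_crashed(name)
    @log << "#{name} crashed"
    if @strategy == :one_for_one
      @log << "start #{name}" # only the crashed child comes back
    else
      # Assumption (Erlang-style): stop the survivors in reverse start order...
      @names.reverse_each { |n| @log << "stop #{n}" unless n == name }
      # ...then restart the whole group in declaration order
      @names.each { |n| @log << "start #{n}" }
    end
  end
end

one = StrategySupervisor.new(:one_for_one, %i[itchy scratchy])
one.child_crashed(:itchy)
p one.log # ["itchy crashed", "start itchy"]

all = StrategySupervisor.new(:one_for_all, %i[itchy scratchy])
all.child_crashed(:itchy)
p all.log # ["itchy crashed", "stop scratchy", "start itchy", "start scratchy"]
```

Under `:one_for_all`, `Itchy` is always back before `Scratchy` restarts, so `Scratchy#initialize` would see the new `Itchy` when it calls `Actor[:itchy]`.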
Always feel free to:

- Visit the #celluloid channel on freenode.
- Post a bug report or feature request.
- Ask questions on our mailing list.