Pools

Hinrik Örn Sigurðsson edited this page Sep 12, 2013 · 12 revisions

Celluloid provides a generalized pool mechanism to all classes that include Celluloid via the pool class method. Let's say we have the following class that includes Celluloid:

class MyWorker
  include Celluloid

  def add_one(number)
    # roflscale computation goes here
    number + 1
  end
end

To create a pool of workers, call the pool class method on your worker class where you would otherwise call the new method:

pool = MyWorker.pool

The pool variable now holds a proxy object which will delegate incoming method calls, async calls, and futures to a pool of MyWorker cells. By default, the pool method will make one MyWorker cell for each CPU core available in the system. This means you can fire whatever work you want at the pool and it will automatically parallelize across all available CPU cores (see notes on GIL below).

A pool also works with the Registry. It can be registered just like a normal actor.

Celluloid::Actor[:a_pool] = MyWorker.pool
i_am_two = Celluloid::Actor[:a_pool].add_one(1)

The pool method also accepts options:

pool = MyWorker.pool(size: 100, args: [:foo, :bar, :baz])

The available options are:

  • size: number of workers in the pool (minimum 2). Defaults to Celluloid.cores (number of CPU cores in the system) or 2 on single-core systems
  • args: arguments to pass along to MyWorker.new when creating a worker

To execute a method within the pool, use:

i_am_three = pool.add_one(2)

The pool will automatically check out and delegate the requested method to a worker.

Using a synchronous method call is fine if you have many actors accessing the pool concurrently. However, if you would like to use the worker pool to execute work in parallel, you can't use synchronous calls, as by design they block until they complete.

To load the pool up with jobs to do, we must use futures or async calls.

Futures can be used to retrieve the value of a given method after it's been executed:

(0..10).to_a.map { |n| pool.future.add_one(n) }

Async calls can be used for "fire and forget" work processing:

(0..10).to_a.each { |n| pool.async.add_one(n) }

Gotcha: Don't make pools inside workers!

Never ever do anything like this:

class MyWorker
  include Celluloid

  def do_something
    # Making a pool inside a pool!
    pool = MyWorker.pool
    pool.do_something
  end
end

pool = MyWorker.pool
pool.do_something

Using MyWorker.pool within MyWorker will result in an unbounded explosion of worker threads.

Got GIL?

If you are using JRuby or Rubinius, Celluloid will distribute work being done in Ruby code across multiple CPU cores.

However, if you are using MRI/YARV, this isn't the case, because this interpreter has a global interpreter lock which prevents parallel execution of Ruby threads. This is fine if you are doing parallel I/O, but it is not the case if you are trying to do parallel computation.

Fault tolerance

Unhandled exceptions within a worker will crash a worker, and the exception will also be raised in the caller. However, you don't need to worry about the worker crashing. The pool will automatically start new workers to replace ones that crash. This means that things like stale persistent network connections can be tolerated when they cause crashes in workers, because new workers will make new connections when they are spawned to replace crashed workers.

Shutdown

Pools are automatically terminated when they are garbage collected. You don't need to explicitly cleanup.