Andromeda is a ultra light weight multicore stream processing framework based on a small data flow DSL in ruby
Switch branches/tags
Nothing to show
Clone or download
Pull request Compare This branch is even with moviepilot:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
lib
spec
yard_extensions
.gitignore
.rvmrc
AUTHORS
CHANGELOG.md
Gemfile
Gemfile.lock
LICENSE.txt
README.md
ROADMAP.md
Rakefile
andromeda.gemspec

README.md

andromeda

Andromeda is a light weight framework for complex event processing on multicore architectures. Andromeda users construct networks of plans that are interconnected via endpoint spots, describe how plans are scheduled onto threads, and process data by feeding data events to the resulting structure.

It currently comes without tests but the core architectures is stable (i.e. the concepts have been fleshed out).

Example

Below is an example that writes events to a file and reads them back such that the JSON gets parsed in parallel, to give an idea of what it does:

require 'andromeda'

# Enter scope 'Andromeda' in irb
cb Andromeda

# Write Cmd instances to a log file, nothing fancy here
w = Cmd::Writer.new path: '/tmp/some_file'
w << :open
w << (Cmd::Cmd.new_input :test)
w << (Cmd::Cmd.new_input :test, weight: 40)
w << (Cmd::Cmd.new_input :test, height: 20)
w << :close

r = Cmd::Reader.new path: '/tmp/some_file'
p = Cmd::Parser.new
t = Kit::Tee.new
s = Sync::Bracket.new
# Connect the processing steps (Plans)
s >> r >> p >> t
# Enfore reader to run on a separate single thread
r.guide = Guides.single
# Set multicore processing behaviour to parse Cmd's in parallel
p.guide = Guides.shared_pool
# Set logger to execute in sending thread (i.e. Parser)
t.guide = Guides.local
# Start reading and wait till processing finishes
s << :start
# t will log to a Logger.new(STDERR) by default

There is much more, dig the source, luke!

Note All active development happens on the devel branch, cf. boggle/devel, too.

Installation

gem install andromeda

Requirements

Any ruby that has working atomic and threadpool gems should do.

Effectively, that is rubinius, jruby and mri ruby (if the provided threading of mri ruby is enough for your purpose).

Online Docs

Docs for the latest released gems are to be found in:

http://rubydoc.info/gems/andromeda

Overview

Key Concepts: Spots, Plans, Guides, and Tracks

Andromeda works by sending data as events over a network of interconnected event handler endpoints (called spots). Each spot is implemented in a container object that is called it's plan. A plan can contain multiple spots, either in the form of event handling method spots (on_name methods of the plan) or as attribute spots that point to spots in other plans. Each plan has a default entry spot, a default exit spot, and an optional spot attribute called errors for signaling exceptions. Plans are connected with each other by assigning spot references to a plan's spot attributes.

Event handling is initiated by sending data to a plan's start spot (a special spot that encapsulates the plan's entry spot). Sendin data to a spot is called method spot activation.

During processing, andromeda distinguised between two kinds of state, plan state and tag state. Plan state is the state of the concret plan instance that contains an event handling method spot prior to its activation. Tag state is state that gets passed along between spots as a side-effect of event handling.

Each plan is associated to a guide. First, guides control if and how plan instances are copied (or locked) prior to method spot activation to ensure isolated state access. Secondly, guides assign each method spot activation to a track that describes how and where (on which thread) it actually gets executed.

Out of the box andromeda supports various guides: single thread (per plan or globally shared), thread pool (per plan or globally shared), execution in current thread, and spawning of a new thread per data event.

To sum up, plans are factory objects that describe the instantiation of concrete data processing networks as guided by their associated guide objects and according to the rules of the underlying, executing tracks.

Quick Usage Example

class MyPlan < Andromeda::Plan
  spot_attr :a, :b
  spot_meth :alternative

  def data_key(name, data) ; data end

  def on_enter(key, val)
    exit << val
  end

  def on_alternative(key, val)
    return (a << val) if key == :a
    return (b << val) if key == :b
    signal_error ArgumentError.new("Unknown key: #{data}")
  end
end

p = MyPlan.new
p.guide = Andromeda::Guides.shared_pool
p >> Andromeda::Kit::Tee.new(nick: :red)
p.a = Andromeda::Kit::Tee.new(nick: :green)
p.b = Andromeda::Kit::Tee.new(nick: :blue)

p << :a # logs to :red
p << :b # logs to :red
p.alternative << :a # logs to :green
p.alternative << :b # logs to :blue
p << :c # logs error

Event handling details

Data processing starts when a data object is submitted to a spot. Processing happens mainly in two steps, preprocessing in the sending thread, and actual execution (processing) on the target track.

Preprocessing

During preprocessing, the data object may be mapped, split into a key for routing and an actual data value, it may be used to modify the set of associated tags, and finally get filtered out before sending. Furthermore, the key may be used to switch the target spot name and track label. All of these steps are optional and aim to push preprocessing fucntionality to the sending thread to avoid unneccesary thread context switches.

Please consult the documentation of class Plan to discover the preprocessing methods that are available for overloading in subclasses.

Execution/Processing

Prior to execution, the plan's guide selects a track for spot activation, packs the plan (i.e. copies/freezes/locks it's plan state as necessary), and optionally modifies the associated incoming tags. Finally, the method spot gets activated by calling the spot's method on the packed plan inside the chosen track with the accumulated tags (plan tags and incoming tags).

Tags

Each method activation is associated with a set of tags (a hash) that contains optional parameters. The tags may be modified by the spot method and are passed on whenever a spot method activates another.

Andromeda provides a small set of reserved default tags that should not be overwritten:

  • tags[:name] final target spot name
  • tags[:scope] an Atom::Region instance that is used to wait for completion of processing (cf. below)
  • tags[:label] the label passed on to the guide to select the track for execution (usually identical to name)
  • tags[:mark] used for xor-mark based tracking of event flow

Wating for event handling completion

Waiting for event handling completion may be achieved by utilizing a special wrapper plan (cf. Sync::Bracket). This is implemented using an atomic counter (cf. Atom::Region).

Performance

Andromeda's event handling mechanism is powerful but associated with some performance overhead due to the associated state management. It was written for using it with larger events (i.e. array slices) that user plans iterate over and is not intended for the processing of massively many small events. YMMV.

Correctness

Andromeda provides guides to ensure that state is only accessed by a single thread or that it's state is locked apropriately otherwise. However this only works if you assign correct guides to your plans. Please read and understand the documentation of the various available guides to make sure that no unintended concurrent access of plans takes place.

Alternatively, look at the provided plan implementations for example code.

Remarks

Inspiration

Andromeda takes inspiration from several existing approaches / techniques in the area of concurrent programming.

  • actor model: state encapsulation
  • event processing: preprocessing in sender thread, large events
  • libdispatch: abstracting over used queues / thread pool
  • join calculus: Sync::Sync

Status

Alpha at best.