GitHub - bmizerany/nanite: Self assembling map/reduce agent system

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
bin		bin
config		config
docs		docs
examples		examples
lib		lib
spec		spec
.gitignore		.gitignore
LICENSE		LICENSE
README		README
Rakefile		Rakefile
TODO		TODO
async_rack_front.ru		async_rack_front.ru

Repository files navigation

Nanite: self assembling fabric of ruby daemons

=============================================================

A Nanite system has two types of components. There are nanite agents, these are the
daemons where you implement your code functionality as actors. And then there are
mappers. 

Mappers are the control nodes of the system. There can be any number of mappers, these
typically run inside of your merb or rails app running on the thin webserver(eventmachine is needed)
But you can also run command line mappers with a shell into the system.

Each nanite agent sends a Ping to the mapper exchange every @ping_time seconds. All of
the mappers are subscribed to this exchange and so they all get a copy of the Ping with your 
status update. If the mappers do not get a ping from a certain agent within a timeout @ping_time
the mappers will remove said agent from any dispatch. When the agent comes back online or gets less
busy it will readvertise itself to the mapper exchange therefore adding itself back to the dispatch.
This makes for a very nice self-healing cluster of worker processes and a cluster of front end mapper
processes.

So in your nanite's  you have any number of actor classes. these are like controllers
in rails or merb and this is where you implement your own functionality. These methods
that you 'expose' are advertised to the mappers like so:

Mock#list => /mock/list
Foo#bar   => /foo/bar
etc..

actors look like so:

class Foo < Nanite::Actor
  
  expose :bar
    
  def bar(payload)
    "got payload: #{payload}"
  end
end

Nanite::Dispatcher.register(Foo.new)


Every agent advertises its status every time it pings the mapper cluster.
the default status that is advertised is the load average as a float. This 
is used for the default request dispatching based on least loaded server.

You can change what is advertised as status to anything you want that is 
comparable(<=>) by doing this in your agents init.rb file:

Nanite.status_proc = lambda { MyApp.some_statistic_indicating_load }

This proc will be recalled every @ping_time and sent to the mappers.

This is the 'fitness function' for selcting the least loaded nanite. You can
dream up any scheme of populating this function so the mappers can select the
right nanites based on their status.



==============================================================================
Quick note about security:

nanite security is handled with rabbitmq usernames/passwords as well as vhosts.
but anything in one vhost and talk to anything on the same vhost. So you generally
want one vhost per app space. 

Nanite is a new way of thinking about building cloud ready web applications. Having 
a scalable message queueing backend with all the discovery and dynamic load based
dispatch that Nanite has is a very scalable way to construct web application backends.


==============================================================================
Make sure you have erlang installed

Get amqp and get a rabbitmq server up and running in one terminal
$ sudo gem install eventmachine
$ git clone git://github.com/tmm1/amqp.git
$ cd amqp && rake gem && sudo gem install amqp-<version>.gem

RabbitMQ Setup:
You'll need mercurial http://www.selenic.com/mercurial/wiki/index.cgi/BinaryPackages

Mac OS X
$ cd ~/<your root src dir>
$ mkdir rabbit (this will be <RABBIT> hereunder)
$ hg clone http://hg.rabbitmq.com/rabbitmq-server/
$ hg clone http://hg.rabbitmq.com/rabbitmq-codegen/
$ cd /usr/local/lib/erlang/lib
$ ln -s <RABBIT>/rabbitmq-server rabitmq-server-<version>
* Add <RABBIT>/scripts to your $PATH

TODO: Linux (I'm told it's easier here)

$ cd rabbit/rabbitmq-server
$ make run

==============================================================

Lets run a script to setup 15 agent accounts with a vhost and 
permissions setup properly as well as mapper account.
(rabbitmq broker must be running before you run this script)

$ cd nanite
$ ./bin/rabbitconf

now run a few nanite agents

$ cd nanite/examples/myagent
$ nanite

$ cd nanite/examples/foo
$ nanite


now you will need to run a mapper to make requests from. You can run this stad alone 
to get a shell or you can instantiate a mapper from within your merb/rails app as needed

$ ./bin/nanite-mapper -i 
>> Nanite.request('/mock/list') {|res| p res }

By default this will dispatch to the agent with the lowest reported load average. 

There are a few other selectors as well:
# run this request on *all* agents that expose the /foo/bar Foo#bar actor
>> Nanite.request('/foo/bar', 'hi', :selector => :all) {|res| p res }

# run this request on one random agent that expose the /whatever/hello Whatever#hello actor
>> Nanite.request('/whatever/hello', 42, :selector => :random) {|res| p res }

You can create your own selectors based on arbitrary stuff you put in status from your agents
see mapper.rb for examples of how least_loaded, all and random are implemented.

you can run as many mappers as you want, they will all be hot masters

The calls are asyncronous. meaning the block you pass to Nanite.request is not run
until the response from the agent(s) have returned. So keep that in mind. Should you
need to poll from an ajax web app for results you should have your block stuff the results in 
the database for any web front end to pick up with the next poll:

#merb controller

def kickoffjob(jobtype, payload)
  token = Nanite.request(jobtype, payload, :least_loaded) do |res|
    # remember this block is called async later when the agent responds
    job = PendingJobs.first :ident => res.keys.first
    job.result = res.values.first
    job.done!
  end
  # so this happens before the block
  PendingJobs.create :ident => token
  {:status => 'pending', :token => token}.to_json
end    

def poll(token)
  job = PendingJobs.first :ident => token
  if job.done?
    {:status => 'done',
     :response => job.result,
     :token => token}.to_json 
  else
    {:status => 'pending', :token => token}.to_json
  end    
end


Have fun!


=====================================================================
ROADMAP

1. ** DONE **  Add config option to allow for JSON packets rather then marshaled ruby objects for multi 
language interop.

2. ** DONE ** Add timouts for requests that expect results.

3. Codify more patterns that make sense in this environment.

4. Now that I know where I'm going with this specs are sorely lacking :(

5. profit?