Desaster Job Queue Manager

desaster is a job queue manager, primarily inspired by resque and its web frontend, resque-web.

Command-line Syntax

desaster, Job Queueing Manager, version 0.1.0 [http://github.com/trapni/desaster/]
Copyright (c) 2012 by Christian Parpart <trapni@gentoo.org>
Licensed under GPL-3 [http://gplv3.fsf.org/]

 usage: desaster [-a BIND_ADDR] [-b BROADCAST_ADDR] [-p PORT] [-k GROUP_KEY]

 -?, -h, --help                 prints this help
 -a, --bind-address=IPADDR      local IP address to bind to [0.0.0.0]
 -b, --broadcast-address=IPADDR remote IP/multicast/broadcast address to announce to [255.255.255.255]
 -p, --port=NUMBER              port number for receiving/sending packets [2691]
 -k, --key=GROUP_KEY            cluster-group shared key [default]
 -s, --standalone               do not broadcast for peering with cluster

To run desaster in standard cluster mode, just run it without any arguments: desaster. It will then auto-peer with other desaster nodes within the same subnet.

To run desaster in standalone-mode, just run desaster -s.
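
As an illustration of the syntax above, the following is a minimal sketch of how these options could be parsed with POSIX `getopt_long`. The `Options` struct and the `parseOptions` function are hypothetical names introduced here, not taken from the desaster sources; only the flags and their defaults come from the help text.

```cpp
#include <getopt.h>

#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical settings holder; the defaults mirror the help text above.
struct Options {
    std::string bindAddress = "0.0.0.0";
    std::string broadcastAddress = "255.255.255.255";
    int port = 2691;
    std::string groupKey = "default";
    bool standalone = false;
    bool showHelp = false;
};

// Fills `out` from argv. Returns false if the port is not a number in 1..65535.
// Unknown options are treated like -? and simply request the help text.
bool parseOptions(int argc, char* argv[], Options& out) {
    assert(argc >= 1 && argv != nullptr);
    static const option longOptions[] = {
        {"help", no_argument, nullptr, 'h'},
        {"bind-address", required_argument, nullptr, 'a'},
        {"broadcast-address", required_argument, nullptr, 'b'},
        {"port", required_argument, nullptr, 'p'},
        {"key", required_argument, nullptr, 'k'},
        {"standalone", no_argument, nullptr, 's'},
        {nullptr, 0, nullptr, 0}
    };
    optind = 1;  // restart scanning in case getopt was used before
    int c;
    while ((c = getopt_long(argc, argv, "?ha:b:p:k:s", longOptions, nullptr)) != -1) {
        switch (c) {
            case 'a': out.bindAddress = optarg; break;
            case 'b': out.broadcastAddress = optarg; break;
            case 'k': out.groupKey = optarg; break;
            case 's': out.standalone = true; break;
            case 'p': {
                char* end = nullptr;
                long value = std::strtol(optarg, &end, 10);
                if (end == optarg || *end != '\0' || value < 1 || value > 65535)
                    return false;
                out.port = static_cast<int>(value);
                break;
            }
            default: out.showHelp = true; break;  // -h, -?, or an unknown option
        }
    }
    return true;
}
```

With this sketch, `desaster -s --port=4000` would yield `standalone == true` and `port == 4000`, while all other fields keep the defaults listed in the help text.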

Software Implementation

Dependencies

UI Requirements

  • queue management
    • queues should be managed via the UI
    • dynamically attach/detach queues to/from workers
    • true and pluggable queueing system (SFQ, HTB, ...)
    • timed job-requeuing blacklist (rejecting every job identity that has been enqueued/processed just recently)
    • list queue entries
      • searchable, sortable, filterable, paginated
      • perform actions on selected jobs
  • worker management:
    • assign jobs to different workers/queues as needed
  • job management:
    • ETA property (r/o) for queued jobs
    • kill/relocate/re-role jobs
  • dashboard:
    • live update feature (button on-top, or even by default)
  • failed queue:
    • searchable, sortable, filterable, paginated, groupable (e.g. by originating queue)
    • automatic retry by default with exponential backoff
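
The "automatic retry with exponential backoff" requirement for the failed queue can be sketched as a small delay function. This is only an illustration: the name `retryDelay`, the 5-second base, and the one-hour cap are assumptions made here, not values defined by Desaster.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helper: delay before the next retry of a failed job.
// attempt 0 is the first retry; the delay doubles with every further attempt
// and is clamped to `cap`. base and cap are illustrative defaults only.
std::chrono::seconds retryDelay(unsigned attempt,
                                std::chrono::seconds base = std::chrono::seconds(5),
                                std::chrono::seconds cap = std::chrono::seconds(3600)) {
    assert(base.count() > 0 && cap >= base);
    std::chrono::seconds delay = base;
    // Stop doubling once the cap is reached, which also avoids overflow.
    for (unsigned i = 0; i < attempt && delay < cap; ++i)
        delay *= 2;
    return std::min(delay, cap);
}
```

With these assumed defaults, the first retries would wait 5, 10, 20, and 40 seconds, and no retry would ever wait longer than one hour.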

Backend Requirements

  • easily scalable by just adding new nodes that auto-negotiate with the existing cluster.
  • maximum fault tolerance for the designated scheduler master: any worker node can take over scheduling for the cluster if the current master becomes unreachable.
  • on-demand spawning from within the Rails environment, but it must also be possible to start it as a dedicated service, e.g. via systemd or a SysV init script.
  • binary process upgrades
  • zero configuration: at least the (1..n) cluster should be able to run without any configuration.
  • worker CPU/memory resource monitoring

desaster-web

desaster-web is the dedicated daemon, possibly written in Ruby/Sinatra, to provide access to the backend scheduler and all worker nodes.

desaster

desaster is the dedicated daemon, written in C++11, to run on every worker node.

This daemon should run only once per host. It can be started as a system service or, in some rare cases, on demand by, for example, your Rails application that needs this functionality.

This service will:

  • if this node is the designated scheduler master:
    • schedule all tasks and pass them to the workers.
    • propagate any scheduler state changes to all other nodes in the cluster.
  • if this node is NOT the designated scheduler master:
    • receive scheduler state changes from the designated scheduler master.
    • forward incoming tasks to the designated scheduler master.
  • receive incoming tasks from the designated scheduler master and perform them locally.
  • start multiple auto-scaling child worker processes.
    • guard the resource usage (CPU and memory) of these worker processes.
  • log worker resource usage and load distribution so that the web frontend can generate nice-looking graphs.
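
The role-dependent handling of incoming tasks described above can be summarised as a small decision function. This is a sketch under assumptions: the types `Node`, `Origin`, and `Action` and the function `routeTask` are hypothetical and only model the three cases from the list.

```cpp
#include <cassert>
#include <string>

// Hypothetical local view: this node's id and the id of the current master.
struct Node {
    std::string selfId;
    std::string masterId;
};

// Where a task came from: a client submitting it, or the master assigning it.
enum class Origin { Client, Master };

// What the local daemon does with the task.
enum class Action { Schedule, ForwardToMaster, ExecuteLocally };

Action routeTask(const Node& node, Origin origin) {
    assert(!node.masterId.empty());  // a master must be known before routing
    if (origin == Origin::Master)
        return Action::ExecuteLocally;  // assigned by the master: run it here
    const bool isMaster = (node.selfId == node.masterId);
    return isMaster ? Action::Schedule : Action::ForwardToMaster;
}
```

A task submitted to a non-master node is therefore forwarded once, scheduled by the master, and finally executed on whichever node the master assigns it to.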

Implementation

A single-threaded scheduling system using libev as the event loop (timers are implemented via ev::timer).

Modules

Shell

Executes shell commands, forking on demand.
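
A minimal sketch of "forking on demand" with plain POSIX calls is shown below. The function name `runShellJob` is hypothetical, and the blocking `waitpid` is a simplification for illustration; inside a single-threaded event loop the child's exit would more likely be observed asynchronously.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>

// Runs `command` through /bin/sh -c in a forked child process.
// Returns the child's exit status, or -1 if forking or waiting failed
// or the child was terminated by a signal.
int runShellJob(const std::string& command) {
    assert(!command.empty());
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        // Child: replace the process image with the shell.
        execl("/bin/sh", "sh", "-c", command.c_str(), static_cast<char*>(nullptr));
        _exit(127);  // only reached if exec itself failed
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)  // retry only when interrupted by a signal
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The returned exit status is what a job's success or failure would be derived from, for example to decide whether the job moves to the failed queue.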

Cron

Executes shell commands on any shell worker node at a given time/period, just like UNIX cronjobs but enqueued and scheduled within the Desaster cluster.

This module depends on the Shell module, as it only schedules the tasks that the Shell module then performs.

At a later stage, we might also want to schedule Ruby method invocations, which should be kept in mind when implementing this module.

Ruby

Executes Ruby methods, pre-forking workers and communicating over shared file descriptors (pipes / unnamed sockets) to pass jobs and their response status.

This module should effectively be able to spawn Rack, and thus Rails, applications.
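
The pre-forking model with an unnamed socket between parent and worker can be sketched as follows. Everything here is an assumption for illustration: the wire format (one newline-terminated job name, answered by a single status byte), the names `Worker`, `spawnWorker`, `submitJob`, and `stopWorker`, and the `performJob` stand-in, which takes the place of invoking an actual Ruby method.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <string>

// Handle of a pre-forked worker: its pid and the parent's end of the socket.
struct Worker {
    pid_t pid;
    int fd;
};

// Hypothetical stand-in for invoking a Ruby method; only the job "fail" fails.
static bool performJob(const std::string& job) { return job != "fail"; }

// Child side: read newline-terminated jobs, answer each with '0' or '1'.
[[noreturn]] static void workerLoop(int fd) {
    std::string job;
    char c;
    while (read(fd, &c, 1) == 1) {
        if (c != '\n') { job += c; continue; }
        char status = performJob(job) ? '0' : '1';
        if (write(fd, &status, 1) != 1) break;
        job.clear();
    }
    _exit(0);  // the parent closed its end (or an I/O error occurred)
}

// Forks one worker ahead of time; returns {-1, -1} on failure.
Worker spawnWorker() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return Worker{-1, -1};
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return Worker{-1, -1};
    }
    if (pid == 0) {
        close(sv[0]);
        workerLoop(sv[1]);
    }
    close(sv[1]);
    return Worker{pid, sv[0]};
}

// Sends one job and blocks until the worker reports its status.
bool submitJob(const Worker& w, const std::string& job) {
    assert(job.find('\n') == std::string::npos);
    const std::string line = job + "\n";
    if (write(w.fd, line.data(), line.size()) != static_cast<ssize_t>(line.size()))
        return false;
    char status = 0;
    return read(w.fd, &status, 1) == 1 && status == '0';
}

// Closing the socket makes the worker's read() return 0, so it exits.
void stopWorker(Worker& w) {
    close(w.fd);
    waitpid(w.pid, nullptr, 0);
}
```

Because the worker is forked before any job arrives, submitting a job costs only one write and one read on the shared socket rather than a fork per job.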

HTTP

For Desaster, an HTTP request is nothing more than a job to be executed, so this framework is a natural fit for fairly load-balancing incoming HTTP requests across a large cluster.

This worker receives HTTP requests, possibly terminates SSL, and possibly passes them to the designated scheduler master, which then schedules them and passes them on to the actual backend worker.

For our particular use case, on a backend node we might optimize the communication path even further by combining it with the Ruby glue code that actually handles our Rack/Passenger Rails requests.

API Bindings

You communicate with the Desaster cluster via TCP/IP; in addition, we provide at least an access API for Ruby (1.8 and 1.9 compatible) and C++.

References