Rector coordinates parallelized jobs that generate metrics or other data together

Rector

Rector coordinates a set of jobs spawned through a mechanism like Resque (though any job manager will do). If you can split a task into parallel jobs, and those jobs each generate metrics, statistics, or other data that must be combined afterwards, Rector might be for you.

Requirements

  • Ruby >= 1.9.2 (or 1.9 mode of JRuby or Rubinius)

Configuration

Rector currently supports Redis as a backend for job coordination and data storage.

Redis Server

Rector.configure do |c|
  c.redis = Redis.new(:host => "10.0.1.1", :port => 6380)
end

Job Creation (Master)

Rector requires one process to be designated as the "master" process. This is usually the same process that spawns the worker jobs.

job = Rector::Job.new

# e.g., processing files in parallel
files.each do |file|
  worker = job.workers.create

  # e.g., using Resque for job management; Rector doesn't really care
  Resque.enqueue(ProcessFileJob, worker.id, file)
end

# wait for all the workers to complete
job.join

# get aggregated data from all the jobs
job.data.each do |word, count|
  puts "#{word} was seen #{count} times across all files"
end

job.cleanup
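
To make the aggregation step concrete, here is a minimal sketch in plain Ruby of how per-worker word counts could be combined into a single total. It does not use Rector or Redis. The `merge_counts` helper and the sample hashes are hypothetical, and summing the values under each key is an assumption made for this example, not a description of Rector's internals.

```ruby
# Hypothetical helper, not part of Rector: combines per-worker count hashes
# by summing the values stored under the same key.
def merge_counts(worker_results)
  worker_results.reduce({}) do |totals, counts|
    # The block is only called for keys present in both hashes.
    totals.merge(counts) { |_word, a, b| a + b }
  end
end

# Sample data standing in for what two workers might have recorded.
worker_a = { "ruby" => 2, "redis" => 1 }
worker_b = { "ruby" => 1, "resque" => 4 }

totals = merge_counts([worker_a, worker_b])
totals.each do |word, count|
  puts "#{word} was seen #{count} times across all files"
end
```

In the master example above, that combined view is what `job.data` is described as providing once `job.join` returns.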

Job Processing (Workers)

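class ProcessFileJob
  # Resque needs a queue name on the job class; :word_count is an illustrative choice
  @queue = :word_count
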
  def self.perform(worker_id, file)
    worker = Rector::Worker.new(worker_id)

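    # Split on runs of non-word characters and drop empty strings
    # (String#empty? is core Ruby; blank? would require ActiveSupport).
    words = File.read(file).split(/\W+/)
    words.reject(&:empty?).each do |word|
      worker.data[word] ||= 0
      worker.data[word] += 1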
    end

    worker.finish
  end
end
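
The counting logic inside `perform` can be tried out on its own, without Rector, Resque, or Redis. The sketch below pulls it into a hypothetical `count_words` helper (not part of Rector) that tallies into a plain Hash. It uses `empty?`, which is core Ruby, so it runs without ActiveSupport.

```ruby
# Hypothetical helper, not part of Rector: counts how often each word
# appears in a string, using a plain Hash in place of worker.data.
def count_words(text)
  counts = Hash.new(0)
  # Splitting on runs of non-word characters can leave a leading empty
  # string when the text starts with punctuation or whitespace.
  text.split(/\W+/).reject(&:empty?).each do |word|
    counts[word] += 1
  end
  counts
end

counts = count_words("the cat and the hat")
counts.each { |word, count| puts "#{word}: #{count}" }
```

A worker could keep the same loop and write into `worker.data` instead of a local Hash, as the example above does.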