Commits on Sep 10, 2014
  1. Rename JobsForeman to WorkUnitsDispatcher

    As pointed out by @knowtheory: the JobsForeman isn't really
    a Foreman, nor does it have anything to do with Jobs.  It deals
    with Work Units exclusively.
    
    There are only two hard things in Computer Science:
    cache invalidation and naming things. -- Phil Karlton
    nathanstitt committed Sep 10, 2014
  2. JobsForeman class to distribute units to workers

    Handles distributing work_units to nodes, either
    periodically every few seconds, or in response
    to a signal being sent via the "distribute!" method
    nathanstitt committed Sep 10, 2014
  3. Event-driven distribution with timer backup

    Trigger job distribution immediately when an
    event occurs instead of relying on the server
    to perform it every X seconds.
    
    The distribution should still occur periodically
    even if no events occur so it can distribute jobs
    that have been triggered from external sources.
    nathanstitt committed Sep 10, 2014
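
The combination described above (run immediately on an event, fall back to a timer otherwise) can be sketched with a condition variable that is waited on with a timeout. This is a minimal illustration, not CloudCrowd's implementation: the `PeriodicDispatcher` class, its `interval` argument, and the `stop` method are assumptions made for the example; only the `distribute!` method name comes from the commit messages.

```ruby
require 'thread'

# Hypothetical dispatcher: runs the given block when signalled via
# distribute!, or after `interval` seconds if no signal arrives.
class PeriodicDispatcher
  def initialize(interval, &work)
    @interval = interval
    @work     = work
    @mutex    = Mutex.new
    @signal   = ConditionVariable.new
    @pending  = false
    @running  = true
    @thread   = Thread.new { run_loop }
  end

  # Request a distribution pass as soon as possible.
  def distribute!
    @mutex.synchronize do
      @pending = true
      @signal.signal
    end
  end

  # Shut the background thread down and wait for it to exit.
  def stop
    @mutex.synchronize do
      @running = false
      @signal.signal
    end
    @thread.join
  end

  private

  def run_loop
    loop do
      @mutex.synchronize do
        # Sleep until signalled, or until the interval elapses as a backup.
        # A spurious wake-up only causes an early, harmless pass.
        @signal.wait(@mutex, @interval) if @running && !@pending
        return unless @running
        @pending = false
      end
      # The work runs outside the lock so distribute! never blocks on it.
      @work.call
    end
  end
end
```

Several signals that arrive while a pass is in progress collapse into a single follow-up pass, which keeps a busy server from queuing up redundant distribution runs.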
Commits on Sep 9, 2014
  1. Prepend current timestamp to logged output

    Intended to allow meaningful analysis of logfiles
    to figure out which types of jobs are taking an
    abnormally long time.
    nathanstitt committed Sep 9, 2014
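
One way to get a timestamp in front of every logged line is a custom formatter on Ruby's standard `Logger`. The sketch below is an assumption about how this could be done, not the code from the commit; the `timestamped_logger` helper and the bracketed ISO 8601 format are chosen only for illustration.

```ruby
require 'logger'
require 'time'

# Hypothetical helper: returns a Logger that prefixes each message
# with the UTC time at which it was logged.
def timestamped_logger(io)
  logger = Logger.new(io)
  logger.formatter = proc do |_severity, time, _progname, msg|
    "[#{time.getutc.iso8601}] #{msg}\n"
  end
  logger
end
```

With timestamps on consecutive lines, the duration of a job can be recovered by subtracting the start line's time from the finish line's time.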
Commits on Aug 5, 2014
  1. Add beta to version

    nathanstitt committed Aug 5, 2014
  2. Notify on any exceptions

    nathanstitt committed Aug 5, 2014
  3. Distribute jobs periodically in background thread

    When the server is heavily loaded, the distribute task gets called
    too often, leading to DB connection exhaustion and too much
    network chatter.
    nathanstitt committed Aug 5, 2014
Commits on Jun 26, 2014
  1. Consolidate VERSION in one location

    This way it can't get out of sync.
    nathanstitt committed Jun 26, 2014
  2. Set version to 0 if schema table query fails

    Fixes issue #45.
    
    Ideally we'd figure out if the query failed because the table didn't
    exist or because the connection was bad.
    
    That approach isn't feasible here since CloudCrowd is database agnostic
    and each DB adapter throws a different exception.
    
    This will at least give a somewhat sane error message rather than
    blowing up like we currently do.
    nathanstitt committed Jun 26, 2014
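
The fallback described above amounts to rescuing any error from the schema query and reporting version 0. A minimal sketch, assuming a hypothetical `schema_version` helper whose block stands in for the adapter-specific query:

```ruby
# Hypothetical helper: the block performs the real schema-table query.
# Each DB adapter raises its own exception class, so the rescue is
# deliberately broad and does not try to tell the causes apart.
def schema_version
  yield.to_i
rescue StandardError => e
  warn "Could not read schema version (#{e.class}: #{e.message}); assuming 0"
  0
end
```

The trade-off is the one the commit message names: a missing table and a bad connection both end up as version 0.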
Commits on Jun 5, 2014
  1. Use ActiveRecord 3.x compatible syntax

    find_or_create_by was introduced in version 4.  Reported in #44
    nathanstitt committed Jun 5, 2014
  2. Wait for thread to fire callback on test.

    This was intermittently failing because the thread might not have fired the callback stub by the time we tested.
    nathanstitt committed Jun 5, 2014
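
A common way to remove this kind of race from a test is to have the callback push onto a `Queue` and let the test block on `pop` with a timeout. The helper names below (`notify_async`, `wait_for_callback`) are hypothetical and only illustrate the pattern; they are not taken from the CloudCrowd test suite.

```ruby
require 'timeout'

# Hypothetical code under test: fires the callback from another thread.
def notify_async(callback)
  Thread.new { callback.call(:complete) }
end

# Hands the caller a callback, then blocks until it has fired or the
# timeout expires. Returns whatever the callback was called with.
def wait_for_callback(seconds = 2)
  fired = Queue.new
  yield ->(status) { fired << status }
  Timeout.timeout(seconds) { fired.pop }
end
```

A test would then assert on the value returned by `wait_for_callback { |cb| notify_async(cb) }` rather than checking a stub right after starting the thread.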
  3. Use Thread.new vs CloudCrowd.defer in node

    Worker nodes do not require DB access.  Since the defer call wraps a connection pool, its use was causing excessive
    connections and errors if the node's configuration didn't have a database set up.
    
    Reported in issue #44
    nathanstitt committed Jun 5, 2014
  4. A testing Gemfile with ActiveRecord locked to 3.2

    Can be used as: BUNDLE_GEMFILE=./Gemfile-ar32 bundle exec rake test
    nathanstitt committed Jun 5, 2014
Commits on May 6, 2014
  1. derp

    knowtheory committed May 6, 2014
  2. Upgrade to minitest.

    knowtheory committed May 6, 2014
  3. Typo

    knowtheory committed May 6, 2014
  4. update gitignore.

    knowtheory committed May 6, 2014
Commits on Apr 24, 2014
  1. Log the exception received so it's easier to debug

    Otherwise nodes just seem to mysteriously vanish
    nathanstitt committed Apr 24, 2014
  2. Don't distribute tasks immediately upon check-in

    Oftentimes a node isn't prepared to immediately receive a new connection once it checks in,
    causing an Errno::ECONNRESET exception.
    nathanstitt committed Apr 24, 2014
Commits on Apr 23, 2014
  1. Use a back-off timeout for periodic node check-ins

    This way the nodes won't hammer the server if it's too overloaded to respond
    nathanstitt committed Apr 23, 2014
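
A back-off of this kind is often implemented as a delay that doubles with each consecutive failure, up to a cap. The commit message does not say which schedule was used, so the `check_in_delay` helper and its `base` and `max` values are assumptions for illustration only.

```ruby
# Hypothetical back-off schedule: the delay in seconds doubles with each
# consecutive failed check-in and is capped at `max`.
def check_in_delay(failures, base: 5, max: 300)
  [base * (2**failures), max].min
end
```

A node would reset its failure count to zero after the first successful check-in, which returns it to the base interval.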
  2. WorkUnit.distribute_to_nodes inside defer block

    This way the server will complete the request as soon as possible before
    it takes the time to parcel out work to the nodes.
    nathanstitt committed Apr 23, 2014
  3. require 'pathname'

    Even though it's part of the std lib, it's not included by default.  Its absence made the "crowd" command unusable.
    nathanstitt committed Apr 23, 2014
  4. Rescue RequestTimeout before other exceptions

    Since RestClient::RequestTimeout inherits from RestClient::RequestFailed it was being caught by the later rescue block
    and re-raised.  This caused the server to be unable to process any new work if one of the first nodes in the list ever
    went away.
    
    When that happened, the server would attempt to distribute work to the missing node, encounter a RestClient::RequestTimeout,
    and re-raise it, so WorkUnit#distribute_to_nodes would never complete.
    nathanstitt committed Apr 23, 2014
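
Ruby checks `rescue` clauses from top to bottom and runs the first one whose class matches, so a clause for a subclass has to come before the clause for its parent. The sketch below illustrates that rule with stand-in exception classes; `RequestFailed`, `RequestTimeout`, and `send_to_node` here are local, hypothetical definitions, not the RestClient classes or CloudCrowd's code.

```ruby
# Stand-ins that mirror the inheritance described in the commit message.
class RequestFailed < StandardError; end
class RequestTimeout < RequestFailed; end

# Hypothetical send step: the block performs the actual request.
def send_to_node
  yield
  :sent
rescue RequestTimeout
  # The subclass is handled first, so an unreachable node is skipped
  # instead of aborting the whole distribution pass.
  :node_unreachable
rescue RequestFailed
  # Any other failed request is still treated as an error.
  raise
end
```

If the two clauses were swapped, the `RequestFailed` clause would match timeouts as well and the timeout-specific branch would never run.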
Commits on Apr 17, 2014
  1. Let's cut a release.

    knowtheory committed Apr 17, 2014
  2. Let's call it a beta.

    knowtheory committed Apr 17, 2014
  3. New pre.

    knowtheory committed Apr 7, 2014
  4. Fix exploding nodes.

    knowtheory committed Apr 7, 2014
  5. Tear out right_aws

    knowtheory committed Apr 7, 2014
Commits on Apr 14, 2014
  1. Remove after_block for clearing active_connections

    The task's already performed by ActiveRecord::ConnectionAdapters::ConnectionManagement
    nathanstitt committed Apr 14, 2014
  2. Fix threading with ActiveRecord connection pool

    Add a CloudCrowd.defer method that sets up a new thread and ensures that
    ActiveRecord connections are returned to the connection pool upon exit
    nathanstitt committed Apr 14, 2014
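
The idea behind such a defer helper is to run the block on a new thread and release the thread's connection in an `ensure` clause, so it goes back to the pool even when the block raises. The sketch below shows the pattern with a hypothetical `FakePool` so it runs without a database; it is not the actual `CloudCrowd.defer`. With ActiveRecord of that era, the release step would presumably be a call such as `ActiveRecord::Base.clear_active_connections!`.

```ruby
# Hypothetical stand-in for a connection pool; it only counts checkouts.
class FakePool
  attr_reader :checked_out

  def initialize
    @checked_out = 0
    @lock = Mutex.new
  end

  def checkout
    @lock.synchronize { @checked_out += 1 }
  end

  def release
    @lock.synchronize { @checked_out -= 1 }
  end
end

# Runs the block on a new thread and always hands the connection back,
# whether the block finishes normally or raises.
def defer(pool)
  Thread.new do
    pool.checkout
    begin
      yield
    ensure
      pool.release
    end
  end
end
```

Without the `ensure`, every failed background task would leave one connection checked out, and the pool would eventually run dry.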
Commits on Apr 6, 2014
Commits on Apr 5, 2014