Peas in a #862

Merged 84 commits on Apr 18, 2013
Commits on Mar 18, 2013
  1. Turn on some debug

    temas committed Mar 18, 2013
  2. Make friendsPump.vpump asynchronous

    kristjan committed Mar 18, 2013
    This is blocking up the whole pipeline.
  3. Fix missing async callback

    kristjan committed Mar 18, 2013
  4. Log the exit from an inject

    temas committed Mar 18, 2013
  5. Actually give us a string

    temas committed Mar 18, 2013
  6. Fix the logic here

    temas committed Mar 18, 2013
  7. Trying to find the stuck bits

    temas committed Mar 18, 2013
  8. Stray code

    temas committed Mar 18, 2013
  9. Simplify firstRun check

    kristjan committed Mar 18, 2013
    This construction is silly and angers lint.
  10. Fix missing cbEach in dMap.pump

    kristjan committed Mar 18, 2013
    Didn't catch all our paths. This was causing a hang during the pipeline.
  11. Fix numbering in pipeline logging

    kristjan committed Mar 18, 2013
    1-based counting was confusing and we were talking about the wrong pumps. Now
    the `injector` is 0.
  12. isArray... my friend

    temas committed Mar 18, 2013
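The "Fix missing cbEach in dMap.pump" commit above describes a common class of bug: one code path returns without calling its per-item callback, so the surrounding loop never finishes and the pipeline hangs. The sketch below is a generic illustration of that failure mode and its fix, not the actual `dMap.pump` code. `forEachAsync` and `pump` are hypothetical stand-ins, and the entry shape is assumed.

```javascript
// Minimal stand-in for an async "for each" helper (hypothetical):
// `done` fires only after every item's callback has been called.
function forEachAsync(items, iterator, done) {
  let remaining = items.length;
  if (remaining === 0) return done();
  items.forEach(item => {
    iterator(item, () => {
      remaining -= 1;
      if (remaining === 0) done();
    });
  });
}

// Hypothetical pump: keeps the `data` of every usable entry.
function pump(entries, done) {
  const kept = [];
  forEachAsync(entries, (entry, cbEach) => {
    // The early exit must still call cbEach. A bare `return` here would
    // leave `remaining` above zero forever and `done` would never run.
    if (!entry || !entry.data) return cbEach();
    kept.push(entry.data);
    cbEach();
  }, () => done(kept));
}
```

Calling `pump([{ data: 1 }, null, { data: 2 }], cb)` passes `[1, 2]` to `cb`. With the early `cbEach()` removed, `cb` would never be called.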
Commits on Mar 19, 2013
  1. More specific ijod timings

    temas committed Mar 19, 2013
  2. Redo Foursquare checkins synclet

    kristjan committed Mar 19, 2013
    Another recursive monstrosity destroyed.
    
    The strategy here is to start at `now` on the first run and use the earliest
    known checkin time with the API's `beforeTimestamp` to page backwards. Meantime,
    we keep track on the side of the latest checkin we've seen. When we hit either
    the end of history or a checkin we've seen before, we clear the cursor, set the
    "we've seen this" time to the latest known checkin, and start over.
    
    Since it looks more and more like large chunks of data hose the pipeline, I
    modulated page size based on whether it's the first sync or not. At first sync,
    we want to move as fast as we can, so we max out Foursquare's API at 250. Later
    on, it's unlikely you've checked in 100 places since we last synced, so we can
    drop to the smaller page size and save ourselves effort.
  3. Synchronize addData

    temas committed Mar 19, 2013
  4. We don't have an err here

    temas committed Mar 19, 2013
  5. Missed a callback removal

    temas committed Mar 19, 2013
  6. Bit more sync

    temas committed Mar 19, 2013
  7. No ijod events right now

    temas committed Mar 19, 2013
  8. make the facebook personal feed page back in time

    jparkrr committed Mar 18, 2013
    This is done by saving the oldest item on the page and passing it back
    as 'until' on the next iteration. 'Since' is always passed in so we
    page back until the last time we synced, so existing profiles will
    need their configs reset.
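The paging strategy from the "Redo Foursquare checkins synclet" commit above can be sketched as a small state machine. This is a minimal sketch under assumptions, not the synclet itself: only `beforeTimestamp` and the 250 limit come from the commit message. The function names, the config fields (`cursor`, `latest`, `seenUntil`), the smaller page size of 100, and the `createdAt` field on checkins are all hypothetical. It also assumes `beforeTimestamp` is exclusive, so a page never repeats the checkin at the cursor.

```javascript
const FIRST_SYNC_LIMIT = 250; // Foursquare's maximum, per the commit message
const LATER_LIMIT = 100;      // assumed value for the "smaller page size"

// Build the request parameters for the next page (hypothetical helper).
function requestParams(config, now) {
  const firstSync = !config.seenUntil;
  return {
    limit: firstSync ? FIRST_SYNC_LIMIT : LATER_LIMIT,
    // Start at `now`, then page backwards from the earliest known checkin.
    beforeTimestamp: config.cursor || now
  };
}

// Fold one page of checkins into the config and decide whether to go on.
function advance(config, checkins) {
  const seenUntil = config.seenUntil || 0;
  const fresh = checkins.filter(c => c.createdAt > seenUntil);
  const times = fresh.map(c => c.createdAt);
  // Track, on the side, the latest checkin seen during this pass.
  const latest = Math.max(config.latest || 0, ...times);
  const hitEnd = checkins.length === 0;
  const hitKnown = fresh.length < checkins.length;
  if (hitEnd || hitKnown) {
    // Clear the cursor and move the "we've seen this" time forward.
    return { config: { seenUntil: Math.max(seenUntil, latest) }, data: fresh, done: true };
  }
  return {
    config: { seenUntil: seenUntil, latest: latest, cursor: Math.min(...times) },
    data: fresh,
    done: false
  };
}
```

Keeping the state transition free of I/O makes the "start over" behavior easy to test without touching the API.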
Commits on Mar 20, 2013
  1. Merge branch 'master' into peas-in-a

    kristjan committed Mar 20, 2013
    * master:
      More package.json updates
      Updates for package.json
      add optional tumblr_params for posting
      No more logger.verbose
      Add localLevel, logger.vital
      Improvements to logger
      prefix all logger.warn calls with WARNING.
      Don't crash testSynclet when missing a config
      Be super cautious with the pipeline
    
    Conflicts:
    	lib/podClient.js
  2. Update Foursquare checkins test

    kristjan committed Mar 20, 2013
    New URLs getting hit
Commits on Mar 21, 2013
  1. Bail early when things fail

    temas committed Mar 21, 2013
Commits on Mar 22, 2013
  1. remove extra comments

    jparkrr committed Mar 22, 2013
  2. small changes for clarity

    jparkrr committed Mar 22, 2013
Commits on Mar 25, 2013
Commits on Mar 26, 2013
  1. Page through Facebook Page data

    kristjan committed Mar 22, 2013
    This one is rare for users, but if they have a lot of pages, it was recursing.
    The `getAdministeredPages` function felt like an extraneous layer, but as I
    remove the other uses of `getPages`, I suspect there will be a useful pattern to
    consolidate.
  2. Initialize sandboxed configs in testSynclet

    kristjan committed Mar 22, 2013
    This matches what `taskman` does, which is to initialize empty objects in the
    case that any config data is absent. This avoids us having to check or set
    `pi.config` at the beginning of every synclet.
  3. Make Facebook photos page

    kristjan committed Mar 26, 2013
    I didn't make albums page on the assumption that very few people will have
    *that* many of them. We also haven't paged any FQL yet.
    
    I wanted to just update the album's `since` time as we gathered photos, but it
    interacts poorly with `limit`. Basically, `limit` is applied first, so on the
    second call, we get nothing back.
    
    Instead, I parse the `after` out of Facebook's returned paging URL and use that
    on the next call. I wanted to use the whole URL they give back, but it doesn't
    include an `access_token` and this way fit better with existing code like
    `apiUrl`.
  4. Update Facebook synclet tests

    kristjan committed Mar 26, 2013
    Just mashing around URLs for Fakeweb.
    
    I'm checking `album.since` with `_.isNumber` as opposed to just `if
    (album.since)` because we set it to `0` on the first run and otherwise it
    wouldn't be included.
  5. Merge branch 'optimize-facebook' into peas-in-a

    kristjan committed Mar 26, 2013
    * optimize-facebook:
      Update Facebook synclet tests
      Make Facebook photos page
      Initialize sandboxed configs in testSynclet
      Page through Facebook Page data
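The two Facebook photos commits above describe pulling the `after` cursor out of the returned paging URL and including `since` even when it is `0`. A minimal sketch of both ideas follows. It assumes the paging URL carries `after` as a query parameter. `afterCursor`, `buildPhotosUrl`, and the `state` fields are hypothetical names, not the synclet's `apiUrl`. The `typeof` check stands in for `_.isNumber` to keep the block dependency-free.

```javascript
// Extract the `after` cursor from a paging URL, or null if there is none.
function afterCursor(nextUrl) {
  if (!nextUrl) return null;
  return new URL(nextUrl).searchParams.get('after');
}

// Rebuild the request ourselves so the access token is always present
// (the commit notes the returned paging URL does not include one).
function buildPhotosUrl(albumId, accessToken, state) {
  const url = new URL('https://graph.facebook.com/' + albumId + '/photos');
  url.searchParams.set('access_token', accessToken);
  url.searchParams.set('limit', String(state.limit));
  // A plain `if (state.since)` would drop the first-run value of 0.
  if (typeof state.since === 'number') {
    url.searchParams.set('since', String(state.since));
  }
  if (state.after) url.searchParams.set('after', state.after);
  return url.toString();
}
```

On each page, the caller would store `afterCursor(response.paging.next)` in its state and stop when it comes back `null`. The `response.paging.next` shape is likewise an assumption here.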
Commits on Mar 27, 2013
  1. Separate pods' external and listening ports

    kristjan committed Mar 27, 2013
    In production, we want the pods' API hosts up on some port (say, 8070), but the
    load balancer listening for HTTPS on 443, so this lets us split those up.
    `listenPort` is the one the server will spin up on, and `port` is the one the
    client uses.
  2. Up number of workers used by profile sync script

    kristjan committed Mar 27, 2013
    We've had good results this high. Probably something we want configurable later.
  3. Make sync-profiles load everything in the DB

    kristjan committed Mar 27, 2013
    Since we're testing pods more holistically now, it's easier to do this than
    build a file to load. We may want to accept a file later to do specific sets,
    and all manner of other options, but for now it's simple.
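The port split from the "Separate pods' external and listening ports" commit above can be sketched as follows. Only the `port` and `listenPort` names and the example values 8070 and 443 come from the commit message. The config shape, the host name, the helper names, and the fallback to `port` when `listenPort` is unset are assumptions.

```javascript
// Hypothetical pod config: clients go through the load balancer on 443,
// while the pod's API server itself binds to 8070.
const podConfig = { host: 'pod1.example.com', port: 443, listenPort: 8070 };

// The client only ever sees the external port.
function clientUrl(pod) {
  return 'https://' + pod.host + ':' + pod.port;
}

// The server binds to listenPort. Falling back to `port` (an assumption)
// keeps a single-port development setup working.
function serverPort(pod) {
  return pod.listenPort || pod.port;
}
```

A server would then call something like `server.listen(serverPort(podConfig))`, while the client builds its requests from `clientUrl(podConfig)`.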
Commits on Mar 28, 2013
  1. Add backlog reporting script

    kristjan committed Mar 28, 2013
    This asks `taskman` what the current backlog looks like and reports it to
    statsd. If you want to run it a while, put it in a tmux session somewhere.
  2. Make sync-profiles script take a pod number

    kristjan committed Mar 28, 2013
    Now that there is more than one pod, we need to be able to specify where to kick
    the sync.
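For the backlog reporting script above, the statsd side amounts to sending gauge lines of the form `name:value|g` over UDP. The sketch below shows that formatting and a send helper. The backlog shape (service name mapped to a count), the metric prefix, and both function names are assumptions, since the commit does not say what `taskman` returns.

```javascript
// Turn a backlog summary into statsd gauge lines, e.g. "backlog.facebook:12|g".
function gaugeLines(prefix, backlog) {
  return Object.keys(backlog).sort().map(name => {
    return prefix + '.' + name + ':' + backlog[name] + '|g';
  });
}

// Send the lines to a statsd daemon in a single UDP datagram
// (statsd accepts several metrics separated by newlines).
function sendGauges(lines, host, port, done) {
  const dgram = require('dgram');
  const socket = dgram.createSocket('udp4');
  socket.send(Buffer.from(lines.join('\n')), port, host, err => {
    socket.close();
    done(err);
  });
}
```

Run on an interval, for example `setInterval` around a backlog query followed by `sendGauges(lines, 'localhost', 8125, cb)`, this would give the continuous reporting the commit describes. Port 8125 is the usual statsd default.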
Commits on Apr 3, 2013
  1. Catch errors in Facebook photos synclet

    kristjan committed Apr 3, 2013
    When I rewrote this, I forgot to watch for incoming errors. I hope it's a real
    error causing `photos` to be null, but just in case I've checked that explicitly
    too.
  2. Fix testSynclet when there's no return data

    kristjan committed Apr 3, 2013
    It was assuming the presence of `data` in the synclet's return value, but that's
    not always there.
    
    The script now also prints out the details of any exceptions that occur, because
    that's a little useful.
  3. Fix infinite requeue of shutterfly photos

    kristjan committed Apr 3, 2013
    Now if the albums synclet never succeeds (maybe thanks to bad auth), we won't
    immediately reenqueue the photos synclet forever. After three tries (30 seconds
    in a clear queue), it'll treat the lack of albums as an error and back off
    appropriately.
  4. Make testSynclet handle no data returned

    kristjan committed Apr 3, 2013
    Thought we would have caught this earlier, but sometimes when there's no data at
    all returned from the synclet, this would bail.
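The bounded retry from the "Fix infinite requeue of shutterfly photos" commit above can be sketched as a small decision function. The three tries come from the commit message. The 10-second delay is inferred from "30 seconds in a clear queue" and may not match the real scheduling. The counter name `albumWaits` and the returned action shape are hypothetical.

```javascript
const MAX_TRIES = 3;      // from the commit message
const RETRY_SECONDS = 10; // assumed: 30 seconds spread over three tries

// Decide what the photos synclet should do given the albums it can see.
function decide(albums, config) {
  if (albums && albums.length > 0) {
    // Albums exist: sync photos and reset the wait counter.
    return { action: 'sync', config: { albumWaits: 0 } };
  }
  const waits = config.albumWaits || 0;
  if (waits < MAX_TRIES) {
    // Give the albums synclet another chance to succeed.
    return {
      action: 'requeue',
      delaySeconds: RETRY_SECONDS,
      config: { albumWaits: waits + 1 }
    };
  }
  // Still nothing: report an error so the scheduler backs off normally.
  return {
    action: 'error',
    error: new Error('no albums after ' + MAX_TRIES + ' tries'),
    config: { albumWaits: 0 }
  };
}
```

Storing the counter in the synclet's config is one way to let the count survive between runs.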
Commits on Apr 8, 2013
  1. Print the final config after testSynclet runs

    kristjan committed Apr 5, 2013
    A lot of the time, this last state has a useful and important difference from
    the paging states, so this output is nice.
  2. Print total entries fetched by testSynclet

    kristjan committed Apr 8, 2013
    Just handy to have at the end there for basic consistency checking.
  3. Don't crash testSynclet when nothing's returned

    kristjan committed Apr 8, 2013
    Synclets don't really have to return anything at all, but `testSynclet` would
    bail looking for `data.config`. Now it's forgiving.
  4. Merge branch 'improve-testSynclet' into peas-in-a

    kristjan committed Apr 8, 2013
    * improve-testSynclet:
      Don't crash testSynclet when nothing's returned
      Print total entries fetched by testSynclet
      Print the final config after testSynclet runs
  5. Split FB likes into separate synclets

    kristjan committed Apr 4, 2013
    Putting them together doesn't really help anything, but it does conflate their
    errors (which I don't think were being properly reported before). Splitting them
    up isolates them for easier management and debugging.
  6. Remove recursion in Facebook url_likes

    kristjan committed Apr 5, 2013
    Now the synclet yields while paging. This requires a little more state
    management as we try to notice URLs we've already seen, but it's not bad.
  7. Remove recursion from Facebook stream_likes

    kristjan committed Apr 8, 2013
    Facebook returns arbitrarily different batches of Likes depending on page size,
    so to lose as few as possible, we're using a larger page size here. If the
    amount of data coming back in the Posts becomes too much, we can cut them out
    using field expansion:
    https://developers.facebook.com/docs/reference/api/field_expansion/.
  8. Remove recursion from Facebook page_likes

    kristjan committed Apr 8, 2013
    This will be the only likes synclet that pages forward in time unless we decide
    it's important to reverse it. The reason being it's the only source of likes
    that lets us page based on timestamp, and it's easier, more readable, and less
    likely to have bugs if we just use that to run one direction.
  9. Merge branch 'better-like-it' into peas-in-a

    kristjan committed Apr 8, 2013
    * better-like-it:
      Remove recursion from Facebook page_likes
      Remove recursion from Facebook stream_likes
      Remove recursion in Facebook url_likes
      Split FB likes into separate synclets
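The `testSynclet` commits above (forgiving a missing return value, printing the final config, and counting entries) can be sketched as one tolerant summary step. The result shape is an assumption: a `config` object plus a `data` object mapping keys to arrays of entries. `summarize` and `totalEntries` are hypothetical names, not the script's own.

```javascript
// Summarize one synclet run without assuming anything was returned.
function summarize(result) {
  const data = (result && result.data) || {};
  const config = (result && result.config) || {};
  let count = 0;
  Object.keys(data).forEach(key => {
    // Only arrays count as entry lists; anything else is ignored.
    if (Array.isArray(data[key])) count += data[key].length;
  });
  return { config: config, count: count };
}

// Add up the entries fetched across all pages of a run.
function totalEntries(results) {
  return results.reduce((sum, result) => sum + summarize(result).count, 0);
}
```

The final config to print would then be `summarize(lastResult).config`, which is an empty object when the synclet returned nothing at all.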
Commits on Apr 9, 2013
  1. fix merge conflict

    quartzjer committed Apr 9, 2013
Commits on Apr 15, 2013
  1. fix conflict

    quartzjer committed Apr 15, 2013
Commits on Apr 16, 2013
Commits on Apr 17, 2013
  1. need dummy commit

    quartzjer committed Apr 17, 2013
  2. fix merge conflict

    quartzjer committed Apr 17, 2013
Commits on Apr 18, 2013
  1. revert to sync

    quartzjer committed Apr 18, 2013
  2. remove dmap pump

    quartzjer committed Apr 18, 2013
  3. log these

    quartzjer committed Apr 18, 2013