
Bulk processing for workers #82

Closed
mauricio wants to merge 2 commits

Conversation

@mauricio (Collaborator) commented Feb 7, 2014

The rationale behind this is that sometimes you have workers that publish work in units but would be better served by taking the work in batches.

The best example of this is solr/elasticsearch integration: most of the time you will want to publish every change to them, but it's much more efficient to process those changes in batches, since both offer batch/bulk update interfaces. So, instead of incurring the request cost for every single document, you pay it once for every 10 documents.

Other than that, SQS itself has comparatively high latency, so taking in several messages per request is much better than taking in just one.

This is mostly a proof of concept, but I have hacked a solution to do this before and it would be nice to have the queue supporting this out of the box.
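
For context, fetching a batch from SQS is a single API call. Here is a minimal sketch using the current aws-sdk-sqs gem (which postdates this PR; the queue URL is a placeholder):

    require 'aws-sdk-sqs'

    sqs = Aws::SQS::Client.new(region: 'us-east-1')
    queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/example' # placeholder

    # One request returns up to 10 messages, so the SQS round-trip
    # latency is paid once per batch instead of once per message.
    resp = sqs.receive_message(
      queue_url: queue_url,
      max_number_of_messages: 10,
      wait_time_seconds: 10 # long polling also cuts down on empty receives
    )

    resp.messages.each do |message|
      # process/bulk-index the message body here, then acknowledge it
      sqs.delete_message(queue_url: queue_url, receipt_handle: message.receipt_handle)
    end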

mauricio added a commit to mauricio/qu that referenced this pull request Feb 7, 2014
This is already part of bkeepers#82, but given that it still needs a lot of work, it would be nice to have this in beforehand. If `true` is not returned here, it will never see that the services are actually running and won't make all specs run on `unattended_spec`.
@jnunemaker (Collaborator)

This one is going to take me a bit to review.

@mauricio (Collaborator, Author) commented Feb 8, 2014

The build will stay broken for a bit until iain/fake_sqs#9 goes in.

@mauricio (Collaborator, Author)

@jnunemaker ping

jnunemaker self-assigned this Feb 16, 2014

module Qu
  module Backend
    # Internal: Backend that wraps all backends with instrumentation.
    class Instrumented < Base
      extend Forwardable
      include Wrapper
Inline review comment from a collaborator on this diff:

I like this fine.

@jnunemaker (Collaborator)

Overall, I love the idea of making batching possible. I also love all the hard work you put in. It is far easier for me to critique than to fully work out the solution.

The thing this pull is really missing is a good abstraction for batches. A few spots are tied directly to SQS functionality rather than to batching in general.

As I think about it more, I almost wonder if the right abstraction is just a BatchPayload and a new backend for any queue that can handle batch operations. For example, here is some pseudo ruby for SQS and batching:

class SQSBatch
  class BatchPayload
    attr_reader :payloads

    def initialize(payloads)
      @payloads = payloads
    end

    def perform
      payloads.each(&:perform)
    end
  end

  def pop(queue_name = 'default')
    payloads = sqs.receive_messages(limit: 10).map { |message|
      create_payload(message)
    }

    BatchPayload.new(payloads)
  end
end

I don't doubt that this will require some thought on how to ensure that batch payloads have the same semantics as a single payload, but I think that is probably the route I would want to take. It avoids qu itself having any batch knowledge (though it may need a slimming-down of the stock payload interface), which I like because it means the abstraction in qu is correct: it is flexible enough to work with anything that can pop and push.

Does any of this make sense, or am I just blabbering?

@jnunemaker (Collaborator)

There are several places where qu assumes the payload has certain methods, such as `queue`, `klass`, etc. Those are what we would need to abstract a bit so qu can work with either single or batch payloads.
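
For illustration, one way to abstract that is to delegate the attributes qu reads to the first payload in the batch, so a batch duck-types as a single payload. This is a sketch under the assumption that all messages in a batch share a queue and job class; qu's actual payload surface may differ:

    require 'forwardable'

    # Sketch only: make a batch quack like a single payload by delegating
    # the attributes qu reads (queue, klass) to the first member. Assumes
    # every message in a batch shares the same queue and job class.
    class BatchPayload
      extend Forwardable

      def_delegators :first_payload, :queue, :klass

      attr_reader :payloads

      def initialize(payloads)
        @payloads = payloads
      end

      def perform
        payloads.each(&:perform)
      end

      private

      def first_payload
        payloads.first
      end
    end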

@mauricio (Collaborator, Author)

WOW.

@jnunemaker ok, gonna work on this feedback and thanks for taking a look at this!

It's important to have another set of eyes on it, since it introduces a somewhat different way of processing stuff.

@jnunemaker (Collaborator)

> WOW.

Hope that is a good wow!

@mauricio (Collaborator, Author)

@jnunemaker that's a much comments, very feedback wow, hahaha.

@jnunemaker (Collaborator)

Lol.

@mauricio (Collaborator, Author)

@jnunemaker the main problem I see with this:

    def perform
      payloads.each(&:perform)
    end

is that it wouldn't allow me to send the payloads as a single entity (in the case of batch indexing, for instance).

The job would have to receive all the items at once so it could process them in one shot rather than one by one.

@jnunemaker (Collaborator)

> is that it wouldn't allow me to send the payloads as a single entity (in the case of batch indexing, for instance).

Well, what I mean is more that the payload could be the abstraction. We could even allow people to swap in their own payload mechanism. Then you could have a batch payload that "does the right thing".
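
For illustration, "doing the right thing" could mean the batch payload hands every item to the job in one call when the job opts in. In this sketch, `perform_batch` and the `args` accessor are hypothetical, not part of qu's API:

    # Sketch only: perform_batch and args are assumed, not qu's actual API.
    class BatchPayload
      attr_reader :payloads

      def initialize(payloads)
        @payloads = payloads
      end

      def perform
        klass = payloads.first.klass

        if klass.respond_to?(:perform_batch)
          # Hand the whole batch to the job in one call, e.g. a single
          # bulk request to solr/elasticsearch.
          klass.perform_batch(payloads.map(&:args))
        else
          # Job doesn't opt in: fall back to one-at-a-time processing.
          payloads.each(&:perform)
        end
      end
    end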

@mauricio (Collaborator, Author)

Makes sense, gonna dig more into this and build something to look at how it would be used.

@mauricio (Collaborator, Author)

This is for another PR, but the current worker implementation could take more than one job at a time if the runner handled it. I need to write more tests for the forking runner to see this in action, and to flesh out a thread-pool-based runner as well; both would allow the main worker to run more than one job at a time. A rough sketch of the thread-pool idea follows below.
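
As a minimal standalone sketch of that thread-pool idea (the `run`/`stop` interface is an assumption, not qu's actual runner API):

    # Sketch of a thread-pool-based runner; the run/stop interface is
    # assumed, not qu's actual runner API.
    class ThreadPoolRunner
      def initialize(size = 4)
        @jobs    = Queue.new
        @threads = Array.new(size) do
          Thread.new do
            # Queue#pop blocks until work arrives; nil is the shutdown sentinel.
            while (payload = @jobs.pop)
              begin
                payload.perform
              rescue => e
                warn "job failed: #{e.message}"
              end
            end
          end
        end
      end

      # Hand a payload to the pool; a free thread picks it up.
      def run(payload)
        @jobs << payload
      end

      # Signal shutdown and wait for in-flight jobs to finish.
      def stop
        @threads.size.times { @jobs << nil }
        @threads.each(&:join)
      end
    end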

@jnunemaker (Collaborator)

👍 to small iterations on the worker and payload rather than a big PR. Big PRs are a lot harder to digest.

@jnunemaker (Collaborator)

Closing this due to staleness. Feel free to pick it back up someday.

jnunemaker closed this Jun 14, 2014
@mauricio (Collaborator, Author)

Sure, still haven't found a nice way to implement this :(
