Support for pipelining #10

Merged
merged 18 commits into PerlRedis:master on Mar 11, 2012

Contributor

arc commented Mar 4, 2012

See also the existing issue #3.

This pull request provides minimal support for pipelining. The API looks like this:

my @responses = $redis->pipeline(sub {
    # many requests here
});
# Now @responses are the gathered responses, including unthrown errors.

The "unthrown errors" bit seems important to me: even if one command yielded an error status, you can still look at the responses to all the other commands.

My measurements are noisier than I'd prefer, but for batches of 1000 requests with status-only replies, I see execution time in pipelined mode at about 60–70% of the execution time in synchronous mode. Put another way: amortised per-request time on a fairly old Linux machine running an old, threaded vendor Perl goes from ≥64 µs (with encoding => undef) to ≥41 µs. Combining pipelining with my AUTOLOAD cache code from pull request #9 reduces the minimum time to 36 µs.
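
For reference, the numbers come from something along these lines (a sketch of the benchmark, not the exact script I ran):

use Time::HiRes qw(gettimeofday tv_interval);

my $n  = 1000;
my $t0 = [gettimeofday];
$redis->pipeline(sub {
    $redis->set("key:$_" => 'x') for 1 .. $n;   # status-only replies
});
printf "amortised per-request time: %.1f µs\n",
    tv_interval($t0) / $n * 1_000_000;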

There's also an additional commit which slightly improves transaction handling; it's implemented using pipelining's ability to suppress the exceptions which are ordinarily raised for error statuses, so again, this lets you look at the responses to all the commands which did succeed. (This feature is the itch I'm really scratching; from my point of view, the pipelining support is less important, though the speedup is pleasing nonetheless.)

My APIs for both pipelining and transaction execution are to some extent straw men; if you think there's a better approach, I'd love to hear your ideas, and I'd be happy to rework my changes to accommodate them.

Owner

melo commented Mar 5, 2012

Hi,

I mentioned in the previous release notes posted on the Redis ML that the next release would include pipelining, so thank you for tackling the issue.

I'll look at the internals of your implementation and reuse as much as possible, but the API I want to implement doesn't require a pipeline() method or a special mode.

My current design is this: to enable pipelining, add a coderef callback as the last parameter of any API call.

That's it. If Redis sees a coderef as the last parameter, it sends the command to the server, sticks the coderef in an internal queue, and returns. When the response parser detects a full response, it takes the first item from that queue and calls it with the response. Any exceptions detected can be sent as a second argument.

This allows you to mix pipelined and non-pipelined commands. You could:

$redis->incr('i', sub { ... });
$redis->get('a', sub { ... });
$redis->get('b');

This would pipeline the first two, send the third, and then wait for the answer to the third command. In effect, this makes sure all the previous commands' responses have completed.

A special $redis->wait_all_responses would allow you to make sure all responses were accounted for.

A side benefit of this API is that it would make MULTI/EXEC easy to support, given that in those sequences the answers are also delayed.
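
Roughly, the internals I'm imagining look like this (method and attribute names here are only illustrative, not existing Redis.pm internals):

sub __send_with_callback {
    my ($self, $cmd, @args) = @_;
    my $cb = pop @args;                  # trailing coderef supplied by the caller
    $self->__send_command($cmd, @args);  # write the command, don't wait for a reply
    push @{ $self->{__pending} }, $cb;   # remember who wants this response
    return;
}

# Called by the response parser whenever a complete reply has been read;
# replies arrive in order, so the queue is strictly FIFO.
sub __dispatch_reply {
    my ($self, $reply, $error) = @_;
    my $cb = shift @{ $self->{__pending} };
    $cb->($reply, $error);               # any error is passed as the second argument
}

# wait_all_responses would simply keep reading replies until __pending is empty.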

What do you think?

Contributor

arc commented Mar 5, 2012

Hi, Pedro. I hadn't realised you were announcing releases on the Redis mailing list — I'll clearly have to subscribe.

Your API proposal sounds good to me. I'll see what I can come up with by way of an implementation, and submit a new set of changes today or tomorrow.

arc added some commits Mar 5, 2012

@arc arc Factor out new __with_reconnect method fb5f5bd
@arc arc Bug fix: apply reconnect logic to KEYS and INFO 4054b4d
@arc arc Bug fix: forbid PING and SHUTDOWN in SUBSCRIBE mode 2c3b57a
@arc arc Refactor: don't conflate subscription commands with standard commands d8a1856
@arc arc Document scalar-context KEYS return
It's tested for in t/01-basic.t, so I assume it's deliberate.
6a61b12
@arc arc Rename __try_reconnect to __throw_reconnect
Since it never tries to do anything, but always throws an exception, the new
name is more descriptive of what it does.
f859ef4
@arc arc Refactor to use __run_cmd where possible 3b1c2e8
@arc arc Remove final argument in __read_response, __read_response_r
There are no remaining callers that use it.
5cc8b55
@arc arc Always call __read_response_r as a method ba03498
@arc arc Pipelining support
Most command methods now take an optional trailing coderef; if a coderef is
supplied, we don't wait for the command's response, but merely schedule the
coderef to be called once the response has been read.
ad922d7
Contributor

arc commented Mar 7, 2012

Hi, Pedro. Sorry that took me a little longer than expected, but I've updated this pull request to use the approach you suggest. (The code from my original pull request is still available as arc@18a81d8 if you want to compare it.)

This new branch builds on my AUTOLOAD-caching branch.

I've included tests, but I haven't yet updated the documentation, because I want to confirm that this looks good to you first. If you like this patch set, I'll add documentation and resubmit.

In this code, QUIT, SHUTDOWN, and PING are always executed synchronously, on the assumption that pipelining them is never going to be useful; but let me know if you disagree, and I'll look at making them pipelinable. For that matter, I'd be happy to tweak any of my changes and resubmit if you'd prefer them to be implemented differently.

Finally, I've also included fixes for a couple of bugs I spotted by inspection while working on this:

  • KEYS and INFO weren't honouring the reconnect setting
  • PING and SHUTDOWN were incorrectly accepted in subscriber mode

jzawodn commented Mar 8, 2012

Just wanted to drop in and say I like the direction this is headed. Thanks for the work on this!

Owner

melo commented Mar 8, 2012

Hey @arc, excellent stuff!

I'm currently migrating one of my sites to a new server, which should be done by Friday. Do you mind waiting until the weekend for me to merge this?

Again, thanks!

Contributor

arc commented Mar 8, 2012

Sure, that's great. I'll work on getting some documentation submitted today or tomorrow.

I'm also making a couple of clarity and performance improvements. This current version has a best-case per-request cost about 15–20 µs (30–50%) longer on my test platform than my original version in arc@18a81d8ea2be8b76de3c9187dae76d9b21553089. That seems to be entirely due to my new __with_reconnect method, which adds a method call and a closure call to the fast path. I'm now looking for a way to remove that additional overhead, without unduly harming maintainability.
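
The fast path I have in mind looks roughly like this (a sketch, not the committed code; __std_cmd here just stands for whatever dispatcher ends up calling __run_cmd):

sub __std_cmd {
    my ($self, @args) = @_;
    if ($self->{reconnect}) {
        # reconnect enabled: keep the wrapper that retries after a dropped connection
        return $self->__with_reconnect(sub { $self->__run_cmd(@args) });
    }
    # no reconnect: call __run_cmd directly, avoiding the extra method call
    # and closure allocation on every request
    return $self->__run_cmd(@args);
}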

arc added some commits Mar 8, 2012

@arc arc Refactor __run_cmd as __queue_cmd
- Takes an additional argument indicating whether nested errors should be
  collected (so callers don't have to manually fix up the relevant data
  structure)

- Renamed to better reflect its purpose
dea5540
@arc arc Add tests for pipelined INFO and KEYS
Since they're the two pipeline-capable commands that have unusual return
values, it seems particularly valuable to test them.
89241f4
@arc arc Factor out new __run_cmd method
Unlike the previous method of the same name, this one actually sends a
request and enqueues a response handler.
8c6f8c8
@arc arc Refactor: inline __queue_cmd in its only caller 9fc61b4
@arc arc Refactor: save one level of closure invocation 9c18872
@arc arc Performance enhancement in non-reconnect mode
If we aren't in reconnect mode, using __with_reconnect adds a method call
and a closure invocation to every request, no matter how trivial.  This
change adds a fast path for directly calling __run_cmd in the three relevant
locations.  In my tests, this saves up to a third of the best-case amortised
per-request execution time.
05a2973
@arc arc Documentation for pipelining 1a9371b
Contributor

arc commented Mar 8, 2012

Hi. These changes add documentation, as promised, and match performance with my original pipelining code (at least to within the precision of my measurements). I think the code's also rather cleaner than the slower version.

Let me know if you'd like anything else from me before you merge.

Thanks.

melo merged commit 157cb35 into PerlRedis:master Mar 11, 2012

Owner

melo commented Mar 11, 2012

Hey @arc, I've merged your code and will release on Monday (I don't want to do it over the weekend; one of my kids is sick, so I ended up with less free time than expected).

The code looks very good; I only have one lingering doubt that I want to discuss with you (and with @jzawodn, given that he is listening in :) ). It's about the MULTI() API. Right now, it returns the list of responses for each command we sent.

I was wondering: when pipelining, if the commands inside MULTI/EXEC are called with a callback, I expect those answers will not be present in the exec() return value. Is that correct?

The relevant part of the documentation is this: arc/perl-redis@1a9371b#L0R1016

As I said, I have little time right now; Monday morning at the office I'll do a quick test to see whether it works like that, as that would be my expected behaviour, but I wanted your opinion first.

Please disregard this if that is the actual current behavior :)

Contributor

arc commented Mar 11, 2012

Hi. Sure, releasing on Monday sounds great, and I hope your kid feels better soon.

As for EXEC, here's more detail on how it works. When nothing's pipelined, MULTI/EXEC work the way they do in 1.926:

$redis->multi;              # +OK
$redis->set(foo => 'bar');  # +QUEUED
$redis->get('baz');         # +QUEUED
$redis->exec;               # ['OK', $value_of_baz]

where, in protocol terms, the response to the EXEC looks like this:

*2
+OK
$12
value of baz

This also means that an exception is thrown if there's an error executing any of the commands in the transaction:

$redis->multi;               # +OK
$redis->set(foo => 'bar');   # +QUEUED
$redis->lpush(foo => 'bar'); # +QUEUED
$redis->exec;                # dies when decoding LPUSH reply

That is, the EXEC itself gets a multi-bulk reply looking like this:

*2
+OK
-ERR Operation against a key holding the wrong kind of value

In the current implementation, using pipelining for the individual commands doesn't affect any of that; it changes only (a) whether we wait for the response before returning, and (b) how the protocol-level replies are provided to the caller. For example:

$redis->multi(sub {});             # ('OK', undef)
$redis->set(foo => 'bar', sub {}); # ('QUEUED', undef)
$redis->get('baz', sub {});        # ('QUEUED', undef)
$redis->exec(sub { ... });         # ([  ['OK', undef],
                                   #     [$value_of_baz, undef]  ],
                                   #  undef)

The MULTI and the individual transactional commands never produce an interesting return, just +OK and +QUEUED respectively; so using a no-op callback to trigger pipelining on them is the obvious approach. But the EXEC produces all the replies, so its callback will (almost always) need to look at the data.

The same holds true when part of the transaction produces an error:

$redis->multi(sub {});               # ('OK', undef)
$redis->set(foo => 'bar', sub {});   # ('QUEUED', undef)
$redis->lpush(foo => 'bar', sub {}); # ('QUEUED', undef)
$redis->exec(sub { ... });           # ([  ['OK', undef],
                                     #     [undef, 'ERR…']  ],
                                     #  undef)

And mixing and matching pipelined and synchronous commands follows the same rules:

$redis->multi(sub {});       # cb:  ('OK', undef)
$redis->set(foo => 'bar');   # ret: 'QUEUED'
$redis->get('baz', sub {});  # cb:  ('QUEUED', undef)
$redis->exec;                # ret: ['OK', $value_of_baz]

I'm not quite sure I understand what you're asking, though. I think you're suggesting it would be better if, for a pipelined in-transaction command, the callback were invoked with the data gathered from the ultimate EXEC, rather than the intermediate placeholder +QUEUED response. That does sound better for users, to be honest, though a little painful to implement. Is that what you had in mind?

Owner

melo commented Mar 12, 2012

Yes to the last paragraph, that is what I had in mind.

This morning at the office I'll fix #11 and prepare the release. At that point I'll also look at how hard it would be to implement what you describe in your last paragraph.

I want to cut a release today, and I don't want to rush this, so for now I'll probably just document that you cannot use callbacks during MULTI/EXEC, and that all the responses will be available in the EXEC response.

This gives us more time to see whether it's feasible to implement it in the way that (IMHO) makes more sense to the end user. The QUEUED responses are important for the client library but irrelevant to the programmer, and using a callback to collect the errors later would make more sense to me.
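
In other words, the behaviour I'd eventually like to see would be something like this (hypothetical, not what this release does):

$redis->multi(sub {});
$redis->set(foo => 'bar', sub { my ($ok, $err) = @_; ... });    # eventually called with
$redis->lpush(foo => 'bar', sub { my ($val, $err) = @_; ... }); # the real per-command
$redis->exec(sub {});                                           # EXEC results, not +QUEUED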

After this release I would also like to take your last comment and paste it into the POD section for the transaction stuff, given that others might have the same doubts about how it all works. :)

Given that I want to fix the test suite to use Test::TCP, convert to Dist::Zilla, and rip out the encoding stuff :), I plan to have at least two or three more releases before putting 2.0 out there.

Again, thanks for your work.

PS: and yes, the kids are better - a classic three-day bug

melo referenced this pull request Mar 12, 2012

Closed

multi()/exec() + pipelining #17

Owner

melo commented Mar 12, 2012

Moved the discussion to a new issue, #17.

Contributor

arc commented Mar 12, 2012

Sounds like a pretty decent plan, and feel free to copy anything I've said here into the documentation.

I've got an idea about how to implement the smarter EXEC handling, but it's got some nasty edge cases, so maybe you'll come up with something better. If you want someone to review anything you work on for #17, I'd gladly take a look.

Owner

melo commented Mar 12, 2012

FYI: release 1.950 uploaded to CPAN
