Support for pipelining #10

Merged
merged 18 commits into PerlRedis:master on Mar 11, 2012

Contributor

arc commented Mar 4, 2012

See also the existing issue #3.

This pull request provides minimal support for pipelining. The API looks like this:

my @responses = $redis->pipeline(sub {
    # many requests here
});
# Now @responses are the gathered responses, including unthrown errors.

The "unthrown errors" bit seems important to me: even if one command yielded an error status, you can still look at the responses to all the other commands.

My measurements are noisier than I'd prefer, but for batches of 1000 requests with status-only replies, I see execution time in pipelined mode at about 60–70% of the execution time in synchronous mode. Put another way: amortised per-request time on a fairly old Linux machine running an old, threaded vendor Perl goes from ≥64 µs (with encoding => undef) to ≥41 µs. Combining pipelining with my AUTOLOAD cache code from pull request #9 reduces the minimum time to 36 µs.
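
For reference, the numbers come from something along these lines (a sketch of the benchmark, not the exact script I ran):

use Time::HiRes qw(gettimeofday tv_interval);

my $n  = 1000;
my $t0 = [gettimeofday];
$redis->pipeline(sub {
    $redis->set("key:$_" => 'x') for 1 .. $n;   # status-only replies
});
printf "amortised per-request time: %.1f µs\n",
    tv_interval($t0) / $n * 1_000_000;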

There's also an additional commit which slightly improves transaction handling; it's implemented using pipelining's ability to suppress the exceptions which are ordinarily raised for error statuses, so again, this lets you look at the responses to all the commands which did succeed. (This feature is the itch I'm really scratching; from my point of view, the pipelining support is less important, though the speedup is pleasing nonetheless.)

My APIs for both pipelining and transaction execution are to some extent straw men; if you think there's a better approach, I'd love to hear your ideas, and I'd be happy to rework my changes to accommodate them.

Owner

melo commented Mar 5, 2012

Hi,

I mentioned in the previous release notes posted on the Redis ML that the next release would include pipelining, so thank you for tackling the issue.

I'll look at the internals of your implementation and reuse as much as possible, but the API I want to implement doesn't require a pipeline() method or a special mode.

My current design is this: to enable pipelining, add a coderef callback as the last parameter of any API call.

That's it. If Redis sees a coderef as the last parameter, it sends the command to the server, sticks the coderef in an internal queue, and returns. When the response parser detects a full response, it takes the first item from that queue and calls it with the response. Any exceptions detected can be sent as a second argument.

This allows you to mix pipelined and non-pipelined commands. You could:

$redis->incr('i', sub { ... });
$redis->get('a', sub { ... });
$redis->get('b');

This would pipeline the first two, send the third, and then wait for the answer to the third command. In effect, this makes sure all the previous commands' responses have completed.

A special $redis->wait_all_responses would allow you to make sure all responses were accounted for.

A side benefit of this API is that it would make MULTI/EXEC easy to support, given that in those sequences the answers are also delayed.
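
Roughly, the internals I'm imagining look like this (method and attribute names here are only illustrative, not existing Redis.pm internals):

sub __send_with_callback {
    my ($self, $cmd, @args) = @_;
    my $cb = pop @args;                  # trailing coderef supplied by the caller
    $self->__send_command($cmd, @args);  # write the command, don't wait for a reply
    push @{ $self->{__pending} }, $cb;   # remember who wants this response
    return;
}

# Called by the response parser whenever a complete reply has been read;
# replies arrive in order, so the queue is strictly FIFO.
sub __dispatch_reply {
    my ($self, $reply, $error) = @_;
    my $cb = shift @{ $self->{__pending} };
    $cb->($reply, $error);               # any error is passed as the second argument
}

# wait_all_responses would simply keep reading replies until __pending is empty.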

What do you think?

Contributor

arc commented Mar 5, 2012

Hi, Pedro. I hadn't realised you were announcing releases on the Redis mailing list — I'll clearly have to subscribe.

Your API proposal sounds good to me. I'll see what I can come up with by way of an implementation, and submit a new set of changes today or tomorrow.

arc added some commits Mar 5, 2012

@arc arc Factor out new __with_reconnect method fb5f5bd
@arc arc Bug fix: apply reconnect logic to KEYS and INFO 4054b4d
@arc arc Bug fix: forbid PING and SHUTDOWN in SUBSCRIBE mode 2c3b57a
@arc arc Refactor: don't conflate subscription commands with standard commands d8a1856
@arc arc Document scalar-context KEYS return
It's tested for in t/01-basic.t, so I assume it's deliberate.
6a61b12
@arc arc Rename __try_reconnect to __throw_reconnect
Since it never tries to do anything, but always throws an exception, the new
name is more descriptive of what it does.
f859ef4
@arc arc Refactor to use __run_cmd where possible 3b1c2e8
@arc arc Remove final argument in __read_response, __read_response_r
There are no remaining callers that use it.
5cc8b55
@arc arc Always call __read_response_r as a method ba03498
@arc arc Pipelining support
Most command methods now take an optional trailing coderef; if a coderef is
supplied, we don't wait for the command's response, but merely schedule the
coderef to be called once the response has been read.
ad922d7
Contributor

arc commented Mar 7, 2012

Hi, Pedro. Sorry that took me a little longer than expected, but I've updated this pull request to use the approach you suggest. (The code from my original pull request is still available as arc@18a81d8 if you want to compare it.)

This new branch builds on my AUTOLOAD-caching branch.

I've included tests, but I haven't yet updated the documentation, because I want to confirm that this looks good to you first. If you like this patch set, I'll add documentation and resubmit.

In this code, QUIT, SHUTDOWN, and PING are always executed synchronously, on the assumption that pipelining them is never going to be useful; but let me know if you disagree, and I'll look at making them pipelinable. For that matter, I'd be happy to tweak any of my changes and resubmit if you'd prefer them to be implemented differently.

Finally, I've also included fixes for a couple of bugs I spotted by inspection while working on this:

  • KEYS and INFO weren't honouring the reconnect setting
  • PING and SHUTDOWN were incorrectly accepted in subscriber mode

jzawodn commented Mar 8, 2012

Just wanted to drop in and say I like the direction this is headed. Thanks for the work on this!

Owner

melo commented Mar 8, 2012

Hey @arc, excellent stuff!

I'm currently migrating one of my sites to a new server, which should be done by Friday. Do you mind waiting until the weekend for me to merge this?

Again, thanks!

Contributor

arc commented Mar 8, 2012

Sure, that's great. I'll work on getting some documentation submitted today or tomorrow.

I'm also making a couple of clarity and performance improvements. This current version has a best-case per-request cost about 15–20 µs (30–50%) longer on my test platform than my original version in arc@18a81d8ea2be8b76de3c9187dae76d9b21553089. That seems to be entirely due to my new __with_reconnect method, which adds a method call and a closure call to the fast path. I'm now looking for a way to remove that additional overhead, without unduly harming maintainability.
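
The fast path I have in mind looks roughly like this (a sketch, not the committed code; __std_cmd here just stands for whatever dispatcher ends up calling __run_cmd):

sub __std_cmd {
    my ($self, @args) = @_;
    if ($self->{reconnect}) {
        # reconnect enabled: keep the wrapper that retries after a dropped connection
        return $self->__with_reconnect(sub { $self->__run_cmd(@args) });
    }
    # no reconnect: call __run_cmd directly, avoiding the extra method call
    # and closure allocation on every request
    return $self->__run_cmd(@args);
}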

arc added some commits Mar 8, 2012

@arc arc Refactor __run_cmd as __queue_cmd
- Takes an additional argument indicating whether nested errors should be
  collected (so callers don't have to manually fix up the relevant data
  structure)

- Renamed to better reflect its purpose
dea5540
@arc arc Add tests for pipelined INFO and KEYS
Since they're the two pipeline-capable commands that have unusual return
values, it seems particularly valuable to test them.
89241f4
@arc arc Factor out new __run_cmd method
Unlike the previous method of the same name, this one actually sends a
request and enqueues a response handler.
8c6f8c8
@arc arc Refactor: inline __queue_cmd in its only caller 9fc61b4
@arc arc Refactor: save one level of closure invocation 9c18872
@arc arc Performance enhancement in non-reconnect mode
If we aren't in reconnect mode, using __with_reconnect adds a method call
and a closure invocation to every request, no matter how trivial.  This
change adds a fast path for directly calling __run_cmd in the three relevant
locations.  In my tests, this saves up to a third of the best-case amortised
per-request execution time.
05a2973
@arc arc Documentation for pipelining 1a9371b
Contributor

arc commented Mar 8, 2012

Hi. These changes add documentation, as promised, and match performance with my original pipelining code (at least to within the precision of my measurements). I think the code's also rather cleaner than the slower version.

Let me know if you'd like anything else from me before you merge.

Thanks.

melo merged commit 157cb35 into PerlRedis:master Mar 11, 2012

Owner

melo commented Mar 11, 2012

Hey @arc, I've merged your code and will release on Monday (I don't want to do it over the weekend; one of my kids is sick, so I ended up with less free time than expected).

The code looks very good; I only have one lingering doubt that I want to discuss with you (and with @jzawodn, given that he is listening in :) ). It's about the MULTI() API. Right now, it returns the list of responses for each command we sent.

I was wondering: when pipelining, if the commands inside MULTI/EXEC are called with a callback, I expect those answers will not be present in the exec() return value. Is that correct?

The relevant part of the documentation is this: arc/perl-redis@1a9371b#L0R1016

As I said, I have little time right now; Monday morning at the office I'll do a quick test to see whether it works like that, as that would be my expected behaviour, but I wanted your opinion first.

Please disregard this if that is the actual current behavior :)

Contributor

arc commented Mar 11, 2012

Hi. Sure, releasing on Monday sounds great, and I hope your kid feels better soon.

As for EXEC, here's more detail on how it works. When nothing's pipelined, MULTI/EXEC work the way they do in 1.926:

$redis->multi;              # +OK
$redis->set(foo => 'bar');  # +QUEUED
$redis->get('baz');         # +QUEUED
$redis->exec;               # ['OK', $value_of_baz]

where, in protocol terms, the response to the EXEC looks like this:

*2
+OK
$12
value of baz

This also means that an exception is thrown if there's an error executing any of the commands in the transaction:

$redis->multi;               # +OK
$redis->set(foo => 'bar');   # +QUEUED
$redis->lpush(foo => 'bar'); # +QUEUED
$redis->exec;                # dies when decoding LPUSH reply

That is, the EXEC itself gets a multi-bulk reply looking like this:

*2
+OK
-ERR Operation against a key holding the wrong kind of value

In the current implementation, using pipelining for the individual commands doesn't affect any of that; it changes only (a) whether we wait for the response before returning, and (b) how the protocol-level replies are provided to the caller. For example:

$redis->multi(sub {});             # ('OK', undef)
$redis->set(foo => 'bar', sub {}); # ('QUEUED', undef)
$redis->get('baz', sub {});        # ('QUEUED', undef)
$redis->exec(sub { ... });         # ([  ['OK', undef],
                                   #     [$value_of_baz, undef]  ],
                                   #  undef)

The MULTI and the individual transactional commands never produce an interesting return, just +OK and +QUEUED respectively; so using a no-op callback to trigger pipelining on them is the obvious approach. But the EXEC produces all the replies, so its callback will (almost always) need to look at the data.

The same holds true when part of the transaction produces an error:

$redis->multi(sub {});               # ('OK', undef)
$redis->set(foo => 'bar', sub {});   # ('QUEUED', undef)
$redis->lpush(foo => 'bar', sub {}); # ('QUEUED', undef)
$redis->exec(sub { ... });           # ([  ['OK', undef],
                                     #     [undef, 'ERR…']  ],
                                     #  undef)

And mixing and matching pipelined and synchronous commands follows the same rules:

$redis->multi(sub {});       # cb:  ('OK', undef)
$redis->set(foo => 'bar');   # ret: 'QUEUED'
$redis->get('baz', sub {});  # cb:  ('QUEUED', undef)
$redis->exec;                # ret: ['OK', $value_of_baz]

I'm not quite sure I understand what you're asking, though. I think you're suggesting it would be better if, for a pipelined in-transaction command, the callback were invoked with the data gathered from the ultimate EXEC, rather than the intermediate placeholder +QUEUED response. That does sound better for users, to be honest, though a little painful to implement. Is that what you had in mind?

Owner

melo commented Mar 12, 2012

Yes to the last paragraph, that is what I had in mind.

This morning at the office I'll fix #11 and prepare the release. At that point I'll also look at how hard it would be to implement what you describe in your last paragraph.

I want to cut a release today, and I don't want to rush this, so for now I'll probably just document that you cannot use callbacks during MULTI/EXEC, and that all the responses will be available in the EXEC response.

This gives us more time to see whether it's feasible to implement it in the way that (IMHO) makes more sense to the end user. The QUEUED responses are important for the client library but irrelevant to the programmer, and using a callback to collect the errors later would make more sense to me.
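
In other words, the behaviour I'd eventually like to see would be something like this (hypothetical, not what this release does):

$redis->multi(sub {});
$redis->set(foo => 'bar', sub { my ($ok, $err) = @_; ... });    # eventually called with
$redis->lpush(foo => 'bar', sub { my ($val, $err) = @_; ... }); # the real per-command
$redis->exec(sub {});                                           # EXEC results, not +QUEUED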

After this release I would also like to take your last comment and paste it into the POD section for the transaction stuff, given that others might have the same doubts about how it all works. :)

Given that I want to fix the test suite to use Test::TCP, convert to Dist::Zilla, and rip out the encoding stuff :), I plan to have at least two or three more releases before putting 2.0 out there.

Again, thanks for your work.

PS: and yes, the kids are better - a classic three-day bug

melo referenced this pull request Mar 12, 2012

Closed

multi()/exec() + pipelining #17

Owner

melo commented Mar 12, 2012

Moved the discussion to a new issue, #17.

Contributor

arc commented Mar 12, 2012

Sounds like a pretty decent plan, and feel free to copy anything I've said here into the documentation.

I've got an idea about how to implement the smarter EXEC handling, but it's got some nasty edge cases, so maybe you'll come up with something better. If you want someone to review anything you work on for #17, I'd gladly take a look.

Owner

melo commented Mar 12, 2012

FYI: release 1.950 uploaded to CPAN
