Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for pipelining #10

Merged
merged 18 commits into from Mar 11, 2012
Merged

Support for pipelining #10

merged 18 commits into from Mar 11, 2012

Conversation

arc
Copy link
Contributor

@arc arc commented Mar 4, 2012

See also the existing issue #3.

This pull request provides minimal support for pipelining. The API looks like this:

my @responses = $redis->pipeline(sub {
    # many requests here
});
# Now @responses are the gathered responses, including unthrown errors.

The "unthrown errors" bit seems important to me: even if one command yielded an error status, you can still look at the responses to all the other commands.

My measurements are noisier than I'd prefer, but for batches of 1000 requests with status-only replies, I see execution time in pipelined mode about 60% or 70% of the execution time in synchronous mode. Put another way: amortised per-request time on a fairly old Linux machine, running an old, threaded vendor Perl goes from ≥64 µs (with encoding => undef) to ≥41 µs. Combining pipelining with my AUTOLOAD cache code from pull request #9 reduces the minimum time to 36 µs.

There's also an additional commit which slightly improves transaction handling; it's implemented using pipelining's ability to suppress the exceptions which are ordinarily raised for error statuses, so again, this lets you look at the responses to all the commands which did succeed. (This feature is the itch I'm really scratching; from my point of view, the pipelining support is less important, though the speedup is pleasing nonetheless.)

My APIs for both pipelining and transaction execution are to some extent straw men; if you think there's a better approach, I'd love to hear your ideas, and I'd be happy to rework my changes to accommodate them.

@melo
Copy link
Member

melo commented Mar 5, 2012

Hi,

I mentioned in the previous release notes posted on the Redis ML that the next release would include pipeline so I thank you for tackling the issue.

I'll look at the internals of your implementation and reuse as most as possible, but the API I want to implement doesn't require a pipeline() method nor a special mode.

My current design is this: to enable pipeline, add a coderef callback as the last parameter on any API call.

That is it. If Redis sees the codeRef as the last parameter, it sends the command to the servers, sticks the coderef in a internal queue, and returns. The response parser, when a full response is detected, takes the first item from that queue and calls it with the response. Any exceptions detected can be send in a second argument.

This allows you to mix pipeline and non-pipelined commands. You could:

$redis->incr('i', sub { ... });
$redis->get('a', sub { ... });
$redis->get('b');

This would pipeline the first two, send the third, and then way for the answer to the third command. This would in effect make sure all the previous command responses would complete.

A special $redis->wait_all_responses would allow you to make sure all responses were accounted for.

A side-benefit of this API would be to enable support for MULTI/EXEC easily, given that in those sequences, the answer is also delayed.

What do you think?

@arc
Copy link
Contributor Author

arc commented Mar 5, 2012

Hi, Pedro. I hadn't realised you were announcing releases on the Redis mailing list — I'll clearly have to subscribe.

Your API proposal sounds good to me. I'll see what I can come up with by way of an implementation, and submit a new set of changes today or tomorrow.

arc added 10 commits March 7, 2012 16:07
It's tested for in t/01-basic.t, so I assume it's deliberate.
Since it never tries to do anything, but always throws an exception, the new
name is more descriptive of what it does.
There are no remaining callers that use it.
Most command methods now take an optional trailing coderef; if a coderef is
supplied, we don't wait for the command's response, but merely schedule the
coderef to be called once the response has been read.
@arc
Copy link
Contributor Author

arc commented Mar 7, 2012

Hi, Pedro. Sorry that took me a little longer than expected, but I've updated this pull request to use the approach you suggest. (The code from my original pull request is still available as arc/perl-redis@18a81d8 if you want to compare it.)

This new branch builds on my AUTOLOAD-caching branch.

I've included tests, but I haven't yet updated the documentation, because I want to confirm that this looks good to you first. If you like this patch set, I'll add documentation and resubmit.

In this code, QUIT, SHUTDOWN, and PING are always executed synchronously, on the assumption that pipelining them is never going to be useful; but let me know if you disagree, and I'll look at making them pipelinable. For that matter, I'd be happy to tweak any of my changes and resubmit if you'd prefer them to be implemented differently.

Finally, I've also included fixes for a couple of bugs I spotted by inspection while working on this:

  • KEYS and INFO weren't honouring the reconnect setting
  • PING and SHUTDOWN were incorrectly accepted in subscriber mode

@jzawodn
Copy link

jzawodn commented Mar 8, 2012

Just wanted to drop in and say I like the direction this is headed. Thanks for the work on this!

@melo
Copy link
Member

melo commented Mar 8, 2012

Hey @arc, excellent stuff!

I'm currently migrating one of my sites to a new server, should be done by Friday. Do you mind waiting for the weekend to merge this?

Again, thanks!

@arc
Copy link
Contributor Author

arc commented Mar 8, 2012

Sure, that's great. I'll work on getting some documentation submitted today or tomorrow.

I'm also making a couple of clarity and performance improvements. This current version has a best-case per-request cost about 15–20 µs (30–50%) longer on my test platform than my original version in arc/perl-redis@18a81d8. That seems to be entirely due to my new __with_reconnect method, which adds a method call and a closure call to the fast path. I'm now looking for a way to remove that additional overhead, without unduly harming maintainability.

arc added 7 commits March 8, 2012 17:56
- Takes an additional argument indicating whether nested errors should be
  collected (so callers don't have to manually fix up the relevant data
  structure)

- Renamed to better reflect its purpose
Since they're the two pipeline-capable commands that have unusual return
values, it seems particularly valuable to test them.
Unlike the previous method of the same name, this one actually sends a
request and enqueues a response handler.
If we aren't in reconnect mode, using __with_reconnect adds a method call
and a closure invocation to every request, no matter how trivial.  This
change adds a fast path for directly calling __run_cmd in the three relevant
locations.  In my tests, this saves up to a third of the best-case amortised
per-request execution time.
@arc
Copy link
Contributor Author

arc commented Mar 8, 2012

Hi. These changes add documentation, as promised, and match performance with my original pipelining code (at least to within the precision of my measurements). I think the code's also rather cleaner than the slower version.

Let me know if you'd like anything else from me before you merge.

Thanks.

melo added a commit that referenced this pull request Mar 11, 2012
@melo melo merged commit 157cb35 into PerlRedis:master Mar 11, 2012
@melo
Copy link
Member

melo commented Mar 11, 2012

Hey @arc, I've merged your code and will release monday (don't want to do it over the weekend, one of my kids is sick so I ended up with less free time than expected).

The code looks very good, I only have one lingering doubt that I want to discuss with you (and with @jzawodn given that he is listening in :) ). Its about the MULTI() API. Right now, it returns the list of responses for each command we sent.

I was wondering if, when in pipeline() mode, if commands inside MULTI/EXEC are called with a callback I expect those answer will not be present in the exec() return value, correct?

The relevant part of the documentation is this: arc@1a9371b#L0R1016

As I said, I have little time right now, monday morning at the office I'll do a quick test to see if it works like that, as it would be my expected behavior, but I wanted to have your opinion.

Please disregard this if that is the actual current behavior :)

@arc
Copy link
Contributor Author

arc commented Mar 11, 2012

Hi. Sure, releasing on Monday sounds great, and I hope your kid feels better soon.

As for EXEC, here's more detail on how it works. When nothing's pipelined, MULTI/EXEC work the way they do in 1.926:

$redis->multi;              # +OK
$redis->set(foo => 'bar');  # +QUEUED
$redis->get('baz');         # +QUEUED
$redis->exec;               # ['OK', $value_of_baz]

where, in protocol terms, the response to the EXEC looks like this:

*2
+OK
$12
value of baz

This also means that an exception is thrown if there's an error executing any of the commands in the transaction:

$redis->multi;               # +OK
$redis->set(foo => 'bar');   # +QUEUED
$redis->lpush(foo => 'bar'); # +QUEUED
$redis->exec;                # dies when decoding LPUSH reply

That is, the EXEC itself gets a multi-bulk reply looking like this:

*2
+OK
-ERR Operation against a key holding the wrong kind of value

In the current implementation, using pipelining for the individual commands doesn't affect any of that, only (a) whether we wait for the response before returning, and (b) how the replies in the protocol are provided to the caller. For example:

$redis->multi(sub {});             # ('OK', undef)
$redis->set(foo => 'bar', sub {}); # ('QUEUED', undef)
$redis->get('baz', sub {});        # ('QUEUED', undef)
$redis->exec(sub { ... });         # ([  ['OK', undef],
                                   #     [$value_of_baz, undef]  ],
                                   #  undef)

The MULTI and the individual transactional commands never produce an interesting return, just +OK and +QUEUED respectively; so using a no-op callback to trigger pipelining on them is the obvious approach. But the EXEC produces all the replies, so its callback will (almost always) need to look at the data.

The same holds true when part of the transaction produces an error:

$redis->multi(sub {});               # ('OK', undef)
$redis->set(foo => 'bar', sub {});   # ('QUEUED', undef)
$redis->lpush(foo => 'bar', sub {}); # ('QUEUED', undef)
$redis->exec(sub { ... });           # ([  ['OK', undef],
                                     #     [undef, 'ERR…']  ],
                                     #  undef)

And mixing and matching pipelined and synchronous commands follows the same rules:

$redis->multi(sub {});       # cb:  ('OK', undef)
$redis->set(foo => 'bar');   # ret: 'QUEUED'
$redis->get('baz', sub {});  # cb:  ('QUEUED', undef)
$redis->exec;                # ret: ['OK', $value_of_baz]

I'm not quite sure I understand what you're asking, though. I think you're suggesting it would be better if, for a pipelined in-transaction command, the callback were invoked with the data gathered from the ultimate EXEC, rather than the intermediate placeholder +QUEUED response. That does sound better for users, to be honest, though a little painful to implement. Is that what you had in mind?

@melo
Copy link
Member

melo commented Mar 12, 2012

Yes to the last paragraph, that is what I had in mind.

This morning at the office I'll fix #11 and prepare the release. At that time I'll try and see the difficulties of implementing your last paragraph.

I want to cut a release today, and I don't want to rush this so I'll probably document that for now, during MULTI/EXEC, you cannot use callbacks, and all the responses will be available in the EXEC response.

This allows us more time to see if its feasible to implement it in the way that (IMHO) makes more sense to the end user. The QUEUED responses are important for the client lib, but irrelevant to the programmer, and using a callback to collect the errors later would make more sense to me.

After this release I would also like to use your last comment and paste it into the POD section for the transaction stuff, given that others might have the same doubts about how it all works. :)

Given that I want to fix the test suite to use Test::TCP, and I also want to convert to Dist::Zilla, and I want to rip out the encoding stuff, :) I plan to have at least 2 or 3 releases more before putting 2.0 out there.

Again, thanks for your work.

PS: and yes, the kids are better - classic three days bug

@melo melo mentioned this pull request Mar 12, 2012
@melo
Copy link
Member

melo commented Mar 12, 2012

Moved the discussion to a new issue, #17.

@arc
Copy link
Contributor Author

arc commented Mar 12, 2012

Sounds like a pretty decent plan, and feel free to copy anything I've said here into the documentation.

I've got an idea about how to implement the smarter EXEC handling, but it's got some nasty edge cases, so maybe you'll come up with something better. If you want someone to review anything you work on for #17, I'd gladly take a look.

@melo
Copy link
Member

melo commented Mar 12, 2012

FYI: release 1.950 uploaded to CPAN

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants