
fix(server): Adjust batching behavior to reduce network latency on MULTI blocks #1777

Merged 2 commits into main on Sep 4, 2023

Conversation

@royjacobson (Contributor) commented on Aug 31, 2023:

  1. Add a Yield() call before executing the last command in the async queue when needed.
  2. Allow the receive buffer to grow when needed.
  3. Improve debugging logs for batching behavior.

The whole 'triggering the bug' process looks like this:

  1. We process a large packet with multiple short commands inside a MULTI block.
  2. We read the commands into a relatively small buffer (size 256).
  3. The ParseRedis loop reads commands from the first part of the packet and pushes them into the dispatch queue.
  4. The dispatch fiber processes all the commands in the dispatch queue quickly and without preemption, because it's inside a MULTI block and all commands return immediately. After emptying the queue, it calls SetBatchMode(false), which causes the response buffer to be flushed after the next command is executed.
  5. The rest of the commands are parsed and executed, but the second (small) response packet is not sent until the client ACKs the first one, because of Nagle's algorithm.
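
To make the fix concrete, below is a minimal, self-contained sketch of the idea behind change (1). It is not the actual Dragonfly code: `Command`, `ExecuteCommand()`, the bare `std::deque`, and the `std::this_thread::yield()` stand-in are assumptions for illustration; the real dispatch loop runs in a helio fiber and yields with `ThisFiber::Yield()`, as in the snippet discussed in the review below.

```cpp
#include <deque>
#include <iostream>
#include <string>
#include <thread>

struct Command {
  std::string payload;
};

static std::deque<Command> dispatch_q_;

// Stand-in for real command execution; in Dragonfly the reply is appended to a
// batched response buffer while batch mode is in effect.
static void ExecuteCommand(const Command& cmd) {
  std::cout << "executing: " << cmd.payload << "\n";
}

static void DispatchLoop() {
  bool queue_was_larger_than_1 = false;
  while (!dispatch_q_.empty()) {
    queue_was_larger_than_1 |= dispatch_q_.size() > 1;

    // Before executing the *last* queued command, yield once so the producer
    // (parser) fiber gets a chance to push the commands it parsed from the rest
    // of the packet. Without this, batch mode is switched off too early, the
    // next reply goes out as a separate small TCP segment, and Nagle's algorithm
    // holds that segment back until the client ACKs the previous one.
    if (dispatch_q_.size() == 1 && queue_was_larger_than_1) {
      std::this_thread::yield();  // the real code yields the fiber instead
    }

    Command cmd = std::move(dispatch_q_.front());
    dispatch_q_.pop_front();
    ExecuteCommand(cmd);
  }
}

int main() {
  for (int i = 0; i < 4; ++i)
    dispatch_q_.push_back({"SET k" + std::to_string(i) + " v"});
  DispatchLoop();
}
```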

Close #1285

@romange (Collaborator) commented on Aug 31, 2023:

A small correction: SetBatchMode does not flush anything by itself, but the subsequent reply will indeed trigger the flush.

Can you please add here how you reproduced the scenario with increased latency?

@royjacobson (Contributor, Author) commented:

I reproduced it by running the benchmark provided in the issue (`addr=redis://10.0.101.178:6378/7 go test -count=1 -v ./pkg/meta/... -run=TestDgfAndRedis`).
The latency bug manifests after running it twice. (Probably due to some offset randomness or something?)

Comment on lines 804 to 807
// As a small optimization, if the queue was never larger than 1, skip the Yield() call.
if (dispatch_q_.size() == 1 && queue_was_larger_than_1) {
ThisFiber::Yield();
}
Contributor:

With `memtier_benchmark --pipeline=2` you'll yield a lot 🤔 But it's still more effective, right?

Contributor:

It would also be more efficient to yield only if we didn't yield on the previous iteration (i.e., the transaction was run with inline scheduling).

Collaborator:

Yes, we need helio machinery for this. Moreover, the io_uring API allows learning whether there is more data pending in the network buffer after the last Recv call (IORING_CQE_F_SOCK_NONEMPTY, available since kernel 5.19). I do not think I implemented support for this.
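
For reference (not part of this PR), here is a rough liburing sketch of how that flag can be inspected after a recv completion. `sockfd` is assumed to be an already-connected socket, the ring is assumed to be initialized elsewhere, and error handling is omitted.

```cpp
#include <liburing.h>
#include <stdio.h>

// Returns true if the kernel reports that the socket still has unread data after
// this recv completes (IORING_CQE_F_SOCK_NONEMPTY, kernel >= 5.19). That answers
// the "is there more pending after the last Recv" question without an extra syscall.
bool RecvAndCheckPending(io_uring* ring, int sockfd, char* buf, size_t len) {
  io_uring_sqe* sqe = io_uring_get_sqe(ring);
  io_uring_prep_recv(sqe, sockfd, buf, len, /*flags=*/0);
  io_uring_submit(ring);

  io_uring_cqe* cqe;
  io_uring_wait_cqe(ring, &cqe);

  int received = cqe->res;  // bytes read, or -errno on failure
  bool more_pending = (cqe->flags & IORING_CQE_F_SOCK_NONEMPTY) != 0;
  io_uring_cqe_seen(ring, cqe);

  printf("received %d bytes, socket %s\n", received,
         more_pending ? "still has data" : "is drained");
  return more_pending;
}
```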

Contributor (Author):

It does trigger quite a few more Yields; I'm not sure what the performance impact is.

Contributor (Author):

Updated to use the new helio epoch interface. I looked at the logs with memtier_benchmark and the JuiceFS benchmark, and it looks quite good: roughly 1 yield every 10 commands.
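
Roughly, the epoch-based heuristic can be pictured like the sketch below. `CurrentFiberEpoch()` and `YieldFiber()` are hypothetical stand-ins (the PR relies on helio's interface for this), stubbed out here only so the snippet compiles; the point is the "yield before the last command only if we did not preempt recently" check.

```cpp
#include <cstdint>
#include <deque>

struct Command {};
void ExecuteCommand(const Command&) {}

// Hypothetical stand-ins: a real fiber runtime bumps a per-thread counter on
// every fiber switch; yielding is one way such a switch happens.
static uint64_t g_epoch = 0;
uint64_t CurrentFiberEpoch() { return g_epoch; }
void YieldFiber() { ++g_epoch; }

void DispatchLoop(std::deque<Command>& dispatch_q) {
  uint64_t last_epoch = CurrentFiberEpoch();
  while (!dispatch_q.empty()) {
    // Yield before the last queued command only if no fiber switch happened
    // since the previous iteration (i.e. the previous command ran inline,
    // without preemption). If we already preempted, the producer fiber had its
    // chance to push more commands, so another yield would only add overhead.
    bool preempted_recently = CurrentFiberEpoch() != last_epoch;
    if (dispatch_q.size() == 1 && !preempted_recently) {
      YieldFiber();
    }
    last_epoch = CurrentFiberEpoch();

    Command cmd = dispatch_q.front();
    dispatch_q.pop_front();
    ExecuteCommand(cmd);
  }
}
```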

// command in the queue and let the producer fiber push more commands if it wants
// to.
// As a small optimization, if the queue was never larger than 1, skip the Yield() call.
if (dispatch_q_.size() == 1 && queue_was_larger_than_1) {
Collaborator:

Can it be that we wake up on a non-empty queue and it does not have size > 1? I thought we only push to dispatch_q if we have more than 1 request.

Contributor:

At least all pubsub/monitor messages go over the dispatch queue, and there we can wake up for a single element.

@royjacobson merged commit f94c4be into main on Sep 4, 2023
10 checks passed
@royjacobson deleted the batching_async_queue_fix branch on September 4, 2023, 12:29
kostasrim added a commit that referenced this pull request Sep 8, 2023
Successfully merging this pull request may close these issues.

Dragonfly is about 10x slower than Redis when used by JuiceFS