Optimize batch index ACK performance #988

Merged

Conversation

BewareMyPower

Motivation

Currently, when `EnableBatchIndexAck` is true, the ACK performance is very poor. There are two main reasons:

  1. Acknowledgment by list is not supported. This means that even if N MessageIDs are grouped, N ACK requests are still sent.
  2. The implementation of the ACK grouping tracker is wrong. Given a batch of N messages, when batch index ACK is enabled, each MessageID is cached. However, after all N MessageIDs have arrived, the current implementation never clears them.

Modifications

  • Add a `func(id []*pb.MessageIdData)` callback to the ACK grouping tracker. When flushing individual ACKs, construct the slice and wrap it into a single `CommandAck` directly.
  • Refactor the ACK grouping tracker implementation around these two data structures (see the sketch after this list):
    • `pendingAcks`: caches the non-batched MessageIDs.
    • `pendingBatchAcks`: caches the batched MessageIDs; once all messages of a batch have been added, remove them from it and add the equivalent non-batched MessageID to `pendingAcks`.
  • Add `TestBatchIndexAckAllMessages` to verify the new behavior.
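
To make the two structures concrete, here is a minimal sketch of the refactored tracker, assuming entries are keyed by ledger id and entry id; all names and types are illustrative, not the actual fields in pulsar-client-go:

```go
package tracker

import "sync"

// position identifies an entry by ledger id and entry id (illustrative key type).
type position struct {
	ledgerID uint64
	entryID  uint64
}

// tracker is a simplified stand-in for the ACK grouping tracker.
type tracker struct {
	mu sync.Mutex
	// Non-batched MessageIDs waiting to be flushed as one ACK list.
	pendingAcks map[position]struct{}
	// For each batched entry, the batch indexes seen so far.
	pendingBatchAcks map[position]map[int32]struct{}
}

func newTracker() *tracker {
	return &tracker{
		pendingAcks:      make(map[position]struct{}),
		pendingBatchAcks: make(map[position]map[int32]struct{}),
	}
}

// addBatched records one batch index. Once every index of a batch has been
// added, the entry is removed from pendingBatchAcks and moved into
// pendingAcks as a plain non-batched MessageID, which is what clears the
// cached per-message state that the old implementation kept around.
func (t *tracker) addBatched(pos position, batchIdx, batchSize int32) {
	t.mu.Lock()
	defer t.mu.Unlock()
	seen, ok := t.pendingBatchAcks[pos]
	if !ok {
		seen = make(map[int32]struct{})
		t.pendingBatchAcks[pos] = seen
	}
	seen[batchIdx] = struct{}{}
	if int32(len(seen)) == batchSize {
		delete(t.pendingBatchAcks, pos)
		t.pendingAcks[pos] = struct{}{}
	}
}
```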

After this change, the ACK order can no longer be guaranteed, so the acknowledged MessageIDs are sorted in `ack_grouping_tracker_test.go`.
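
For reference, this is roughly what "acknowledgment by list" looks like at the protocol level: the flushed slice travels in a single `CommandAck` instead of N separate requests. The proto field names follow Pulsar's `PulsarApi.proto`, but treat the snippet as a sketch rather than the exact code in this patch. Since the pending positions come out of a map, their order is unspecified, which is why the test has to sort the acknowledged MessageIDs.

```go
// Assumed imports (paths may differ):
//   pb    "github.com/apache/pulsar-client-go/pulsar/internal/pulsar_proto"
//   proto "github.com/golang/protobuf/proto"

// wrapIntoCommandAck wraps an already-built slice of MessageIdData into one
// CommandAck for individual acknowledgment.
func wrapIntoCommandAck(consumerID uint64, ids []*pb.MessageIdData) *pb.CommandAck {
	return &pb.CommandAck{
		ConsumerId: proto.Uint64(consumerID),
		AckType:    pb.CommandAck_Individual.Enum(),
		MessageId:  ids, // the whole list is sent in a single request
	}
}
```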


BewareMyPower commented Mar 9, 2023

Performance test setup

Run a Pulsar 2.11.0 standalone locally with `acknowledgmentAtBatchIndexLevelEnabled=true`.

Build the perf binary:

cd perf
go build

Run the consumer in terminal 1:

./perf consume --profile \
    -s sub \
    --enable-batch-index-ack \
    my-topic

Run the producer in a second terminal; the target throughput is about 20 MB/s (200,000 msg/s × 105 bytes ≈ 21 MB/s):

./perf produce \
  --batching-num-messages=10 \
  --size 105 \
  --rate 200000 \
  my-topic

Test result

Before this patch

INFO[13:31:17.584] Stats - Publish rate: 150038.6 msg/s -  120.2 Mbps -
                                Latency ms: 50%   3.1 -95%  13.0 - 99%  48.9 - 99.9%  48.9 - max   48.9
INFO[13:31:27.581] Stats - Publish rate: 179909.8 msg/s -  144.1 Mbps -
                                Latency ms: 50%   2.4 -95%   8.7 - 99%  26.2 - 99.9%  26.2 - max   26.2
INFO[13:31:37.581] Stats - Publish rate: 188957.1 msg/s -  151.4 Mbps -
                                Latency ms: 50%   2.3 -95%   7.8 - 99%  15.9 - 99.9%  15.9 - max   15.9
INFO[13:31:11.592] Stats - Consume rate: 22956.0 msg/s -   18.4 Mbps
INFO[13:31:21.592] Stats - Consume rate: 46805.3 msg/s -   37.5 Mbps
INFO[13:31:31.593] Stats - Consume rate: 47494.7 msg/s -   38.0 Mbps
INFO[13:31:41.594] Stats - Consume rate: 48100.3 msg/s -   38.5 Mbps
INFO[13:31:51.592] Stats - Consume rate: 59705.8 msg/s -   47.8 Mbps
INFO[13:32:01.597] Stats - Consume rate: 61694.0 msg/s -   49.4 Mbps
INFO[13:32:11.592] Stats - Consume rate: 58100.2 msg/s -   46.5 Mbps
INFO[13:32:21.592] Stats - Consume rate: 60818.2 msg/s -   48.7 Mbps
INFO[13:32:31.596] Stats - Consume rate: 60681.5 msg/s -   48.6 Mbps

The consumer cannot catch up with the producer even though the produce rate is only about 20 MB/s, so consumption falls back to catch-up reads and the broker reads messages directly from BookKeeper.

After this patch

INFO[16:19:40.725] Stats - Publish rate: 152244.3 msg/s -  122.0 Mbps -
                                Latency ms: 50%   3.4 -95%  11.8 - 99%  63.5 - 99.9%  63.5 - max   63.5
INFO[16:19:50.725] Stats - Publish rate: 176077.8 msg/s -  141.1 Mbps -
                                Latency ms: 50%   2.6 -95%   6.3 - 99%  19.3 - 99.9%  19.3 - max   19.3
INFO[16:20:00.725] Stats - Publish rate: 134651.8 msg/s -  107.9 Mbps -
                                Latency ms: 50%   3.2 -95%   9.7 - 99%  26.6 - 99.9%  26.6 - max   26.6
INFO[16:20:10.725] Stats - Publish rate: 147599.8 msg/s -  118.2 Mbps -
                                Latency ms: 50%   2.9 -95%   6.7 - 99%  15.8 - 99.9%  15.8 - max   15.8
INFO[16:20:20.725] Stats - Publish rate: 146333.1 msg/s -  117.2 Mbps -
                                Latency ms: 50%   2.9 -95%   7.2 - 99%  22.8 - 99.9%  22.8 - max   22.8
INFO[16:20:30.725] Stats - Publish rate: 146170.7 msg/s -  117.1 Mbps -
                                Latency ms: 50%   2.9 -95%   7.3 - 99%  19.3 - 99.9%  19.3 - max   19.3
INFO[16:20:40.726] Stats - Publish rate: 145344.2 msg/s -  116.4 Mbps -
                                Latency ms: 50%   2.9 -95%   6.9 - 99%  16.8 - 99.9%  16.8 - max   16.8
INFO[16:20:50.724] Stats - Publish rate: 143197.0 msg/s -  114.7 Mbps -
                                Latency ms: 50%   3.0 -95%   7.7 - 99%  20.7 - 99.9%  20.7 - max   20.7
INFO[16:21:00.724] Stats - Publish rate: 148160.2 msg/s -  118.7 Mbps -
                                Latency ms: 50%   2.9 -95%   7.4 - 99%  17.6 - 99.9%  17.6 - max   17.6
INFO[16:19:38.663] Stats - Consume rate: 114350.6 msg/s -   91.6 Mbps
INFO[16:19:48.663] Stats - Consume rate: 183788.1 msg/s -  147.2 Mbps
INFO[16:19:58.663] Stats - Consume rate: 139771.1 msg/s -  112.0 Mbps
INFO[16:20:08.663] Stats - Consume rate: 141662.6 msg/s -  113.5 Mbps
INFO[16:20:18.663] Stats - Consume rate: 146116.1 msg/s -  117.1 Mbps
INFO[16:20:28.663] Stats - Consume rate: 144571.8 msg/s -  115.8 Mbps
INFO[16:20:38.662] Stats - Consume rate: 144215.5 msg/s -  115.5 Mbps
INFO[16:20:48.662] Stats - Consume rate: 143453.2 msg/s -  114.9 Mbps
INFO[16:20:58.663] Stats - Consume rate: 147724.5 msg/s -  118.3 Mbps
INFO[16:21:08.663] Stats - Consume rate: 148170.2 msg/s -  118.7 Mbps

Flame Graph

Before this patch:

[Two flame graph screenshots]

We can see that `internalSendRequest` takes about 40% of the total time, which in turn causes the `runEventLoop` method to take 8.87% of the total.

After this patch:

[Flame graph screenshot]

We can see that `internalSendRequest` now takes only 1.26% of the total. The bottleneck has shifted to the performance of the ACK grouping tracker itself.

Conclusion

From my observation, the backlog sometimes still grows slowly, and once consumption falls back to catch-up reads, performance can degrade quickly in my local environment. Still, this is a big improvement, and it can be pushed further by adjusting the ACK grouping options.
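
As a follow-up for anyone experimenting with that tuning, the sketch below shows where the ACK grouping knobs would be set when creating a consumer. The `EnableBatchIndexAcknowledgment` and `AckGroupingOptions` fields (with `MaxSize`/`MaxTime`) are written from memory of the Go client's `ConsumerOptions` and may not match the current API exactly, so check the client source before copying it.

```go
package main

import (
	"log"
	"time"

	"github.com/apache/pulsar-client-go/pulsar"
)

func main() {
	client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://localhost:6650"})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Larger MaxSize / MaxTime values mean fewer, bigger ACK requests,
	// trading acknowledgment latency for throughput.
	consumer, err := client.Subscribe(pulsar.ConsumerOptions{
		Topic:                          "my-topic",
		SubscriptionName:               "sub",
		EnableBatchIndexAcknowledgment: true,
		AckGroupingOptions: &pulsar.AckGroupingOptions{
			MaxSize: 1000,                   // flush after this many pending ACKs
			MaxTime: 100 * time.Millisecond, // or after this much time has passed
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()
}
```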

@BewareMyPower BewareMyPower marked this pull request as draft March 9, 2023 06:08
### Motivation

Currently, when `EnableBatchIndexAck` is true, the ACK performance is
very poor. There are two main reasons:
1. Acknowledgment by list is not supported. This means that even if N
   MessageIDs are grouped, there are still N ACK requests to send.
2. The implementation of the ACK grouping tracker is wrong. Given a batch
   of N messages, when batch index ACK is enabled, each MessageID is
   cached. However, after all N MessageIDs have arrived, the current
   implementation does not clear them.

### Modifications
- Add a `func(id []*pb.MessageIdData)` callback to the ACK grouping tracker.
  When flushing individual ACKs, construct the slice and wrap it into a
  single `CommandAck` directly.
- Refactor the implementation of the ACK grouping tracker:
  - Do not save each MessageID instance; instead, use the ledger id and
    the entry id as the key of `pendingAcks`.
  - Release the mutex before calling the ACK functions.
- Add `TestTrackerPendingAcks` to verify the list of MessageIDs to ACK.

After this change, the ACK order can no longer be guaranteed, so the
acknowledged MessageIDs are sorted in `ack_grouping_tracker_test.go`.
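
A small sketch of the "release the mutex before calling the ACK functions" point above, reusing the illustrative tracker and assumed imports from the earlier sketches: the pending positions are drained while the lock is held, but the callback that actually sends the ACK runs outside the critical section, so a slow network send cannot block goroutines that are adding new MessageIDs.

```go
// flush drains pendingAcks under the lock, then invokes the ACK callback
// (the func([]*pb.MessageIdData) added in this patch) without holding it.
func (t *tracker) flush(ackFn func([]*pb.MessageIdData)) {
	t.mu.Lock()
	ids := make([]*pb.MessageIdData, 0, len(t.pendingAcks))
	for pos := range t.pendingAcks {
		ids = append(ids, &pb.MessageIdData{
			LedgerId: proto.Uint64(pos.ledgerID),
			EntryId:  proto.Uint64(pos.entryID),
		})
		delete(t.pendingAcks, pos)
	}
	t.mu.Unlock() // release before the potentially slow network call

	if len(ids) > 0 {
		ackFn(ids) // e.g. wraps the slice into a single CommandAck and sends it
	}
}
```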
@BewareMyPower BewareMyPower force-pushed the bewaremypower/batch-index-ack-perf branch from 0e77afd to a8289cf on March 9, 2023 08:32
@BewareMyPower BewareMyPower marked this pull request as ready for review March 9, 2023 08:32
@BewareMyPower BewareMyPower merged commit 352c463 into apache:master Mar 10, 2023
@BewareMyPower BewareMyPower deleted the bewaremypower/batch-index-ack-perf branch March 10, 2023 01:57