Transparent batching of ZK operations #13043

Merged
merged 11 commits into from
Dec 2, 2021

merlimat
Contributor

Motivation

It is more efficient to send operations to ZK in batches, using the ZooKeeper.multi() operation instead of individual calls.
This reduces the RPC traffic between the ZK client and the servers, and it also reduces the number of write transactions, since each multi() corresponds to a single ZK transaction containing multiple write operations.

Modifications

  • Added a base class AbstractBatchedMetadataStore which contains the batching logic
  • Added configurable batching parameters
  • Used 2 different transaction sets, one for read operations and the other just for write operations
  • Added code for a (very) basic benchmark
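
The batching idea described above can be illustrated with a minimal, self-contained sketch. This is not the code of AbstractBatchedMetadataStore: the class name `OpBatcher`, the `sink` callback (standing in for one multi() call per batch), and the constructor shape are assumptions made for illustration. Only the names maxOperations, maxDelayMillis and flushInProgress are borrowed from the snippet reviewed in this PR. Operations are queued, then flushed either when enough of them have accumulated or when a timer fires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical sketch of a batcher; NOT the PR's AbstractBatchedMetadataStore.
final class OpBatcher<T> implements AutoCloseable {
    private final Queue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean flushInProgress = new AtomicBoolean(false);
    private final ScheduledExecutorService timer;
    private final int maxOperations;
    // Assumption: the sink stands in for a single multi() call per batch.
    private final Consumer<List<T>> sink;

    OpBatcher(int maxOperations, long maxDelayMillis, Consumer<List<T>> sink) {
        this.maxOperations = maxOperations;
        this.sink = sink;
        // Daemon thread so an unclosed batcher does not keep the JVM alive.
        this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "op-batcher-flush");
            t.setDaemon(true);
            return t;
        });
        // Time-based trigger: flush whatever is queued every maxDelayMillis.
        timer.scheduleAtFixedRate(this::flush, maxDelayMillis, maxDelayMillis,
                TimeUnit.MILLISECONDS);
    }

    void add(T op) {
        queue.add(op);
        // Count-based trigger: flush as soon as a full batch is available.
        if (queue.size() >= maxOperations) {
            flush();
        }
    }

    void flush() {
        // Only one flush at a time; concurrent callers simply return.
        if (!flushInProgress.compareAndSet(false, true)) {
            return;
        }
        try {
            List<T> batch = new ArrayList<>();
            T op;
            while ((op = queue.poll()) != null) {
                batch.add(op);
                if (batch.size() == maxOperations) {
                    sink.accept(batch); // one "transaction" per full batch
                    batch = new ArrayList<>();
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(batch); // trailing partial batch
            }
        } finally {
            flushInProgress.set(false);
        }
    }

    @Override
    public void close() {
        timer.shutdownNow();
        flush(); // drain anything still queued
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = new ArrayList<>();
        try (OpBatcher<Integer> batcher = new OpBatcher<>(3, 60_000L, batches::add)) {
            for (int i = 1; i <= 7; i++) {
                batcher.add(i);
            }
        }
        System.out.println(batches); // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

In this sketch seven operations reach the sink as three batches instead of seven individual calls. A real store would also have to complete a future per operation and handle per-operation failures, which the sketch leaves out.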

Preliminary results

These results come from the included basic benchmark, making requests to the embedded ZK running on a laptop:

image

image

@merlimat merlimat added the type/feature The PR added a new feature or issue requested a new feature label Nov 30, 2021
@merlimat merlimat added this to the 2.10.0 milestone Nov 30, 2021
@merlimat merlimat self-assigned this Nov 30, 2021
@merlimat merlimat added doc-required Your PR changes impact docs and you will update later. and removed doc-label-missing labels Nov 30, 2021
@apache apache deleted a comment from github-actions bot Nov 30, 2021
Contributor

@eolivelli eolivelli left a comment

Great work!

import org.testng.annotations.Test;

@Slf4j
public class MetadataBenchmark extends MetadataStoreTest {
Contributor

How many resources does this test use? How long does it take to complete?
Does it make sense to keep this running on CI?

I believe that if the test is not heavyweight we can keep it, but I wanted to double-check with you.

Contributor Author

It just takes a couple of seconds for each type and it doesn't take many resources, just keeping 2 threads busy.

@lhotari
Member

lhotari commented Nov 30, 2021

Awesome!

Member

@lhotari lhotari left a comment

Nice work! One minor comment about exception handling.

batchOperation(Collections.singletonList(op));
return;
}
if (queue.size() > maxOperations && flushInProgress.compareAndSet(false, true)) {
Contributor

Will we not trigger a flush when we reach the size limit? It looks like the current implementation splits the queue into multiple batches when flushing. If we already have enough data for a batch, it should be flushed right away; as written, those operations will wait for maxDelayMillis.

Contributor

The change LGTM; I just left a minor comment. I don't think it is a big problem here, but it would simplify the implementation.

Contributor Author

You're right, there are several optimizations that are still possible in the batching logic :) I'd say to continue in subsequent PRs.
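
To make the splitting discussed in this thread concrete, here is a small hedged sketch of cutting a list of queued payloads into batches under two limits. It is not taken from the PR: `BatchSplitter`, `split`, and the `maxBytes` parameter are hypothetical names, and treating the "size limit" as a byte budget is an assumption. Only maxOperations echoes the reviewed snippet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and the byte-size limit are illustrative assumptions.
final class BatchSplitter {
    private BatchSplitter() {}

    // Splits payloads into consecutive batches so that each batch holds at most
    // maxOperations items and, where possible, at most maxBytes of data.
    // A single payload larger than maxBytes is placed in a batch of its own.
    static List<List<byte[]>> split(List<byte[]> payloads, int maxOperations, int maxBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (byte[] payload : payloads) {
            boolean wouldOverflow = current.size() >= maxOperations
                    || currentBytes + payload.length > maxBytes;
            if (!current.isEmpty() && wouldOverflow) {
                batches.add(current); // close the current batch
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(payload);
            currentBytes += payload.length;
        }
        if (!current.isEmpty()) {
            batches.add(current); // trailing partial batch
        }
        return batches;
    }
}
```

With limits applied at split time like this, a flush can be triggered as soon as either limit is reached instead of waiting for the delay timer, which is the kind of follow-up optimization mentioned in the reply above.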

Contributor

@Jason918 Jason918 left a comment

LGTM

@merlimat merlimat merged commit 0497f67 into apache:master Dec 2, 2021
@merlimat merlimat deleted the zk-batching branch December 2, 2021 02:57
@Anonymitaet
Member

On the doc side, I've discussed this with @codelipenghui; the following changes will be made after 2.9.0 is released:

  1. Add parameters to the Pulsar configuration page.

  2. Add descriptions and two images in this PR (result comparison) to ZooKeeper.

@merlimat any thoughts? Thanks

fxbing pushed a commit to fxbing/pulsar that referenced this pull request Dec 19, 2021
* Transparent batching of ZK operations

* Addressed comments

* Handle default switch case

* Fixed issues in MockZookeeper with usage of multi()

* Wrap Throwable with MetadataStoreException

* Fixed getChildren in MockZookeeper

* Handle cases in which ZK is failing the partial request, but with error=OK

* Fixed test to wait for put result

* Fixed test that was not waiting on put future to complete
@Anonymitaet Anonymitaet added doc-complete Your PR changes impact docs and the related docs have been already added. and removed doc-required Your PR changes impact docs and you will update later. labels Jan 24, 2022
Labels
doc-complete Your PR changes impact docs and the related docs have been already added. type/feature The PR added a new feature or issue requested a new feature
6 participants