Skip to content

ZOOKEEPER-3437: Improve sync throttling on a learner master#995

Closed
jhuan31 wants to merge 4 commits intoapache:masterfrom
jhuan31:ZOOKEEPER-3437
Closed

ZOOKEEPER-3437: Improve sync throttling on a learner master#995
jhuan31 wants to merge 4 commits intoapache:masterfrom
jhuan31:ZOOKEEPER-3437

Conversation

@jhuan31
Copy link
Copy Markdown

@jhuan31 jhuan31 commented Jun 20, 2019

No description provided.

@mayawang
Copy link
Copy Markdown

Thank you for working on this!
Do we have design documentation on this issue other than the contents in JIRA? It will be really helpful to put some of the technical/design decisions in description for reviewers to learn more about the contexts.

I left a comment based on my understanding. :)

private static final Logger LOG = LoggerFactory.getLogger(LearnerMaster.class);

// Throttle when there are too many concurrent snapshots being sent to observers
private static final String MAX_CONCURRENT_SNAPSYNCS = "zookeeper.leader.maxConcurrentSnapSyncs";
Copy link
Copy Markdown

@mayawang mayawang Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this configuration be shared between leader and observer master? If so, I think there are a few assumptions that worth a closer look.

  1. Leader and Followers are running on machines with similar capability (CPU, network bandwidth). This assumption is usually true.
  2. ObserverMasters and Observers are also running on machines with similar capabilities (CPU, network bandwidth). This assumption may not be true. E.g. If we deploy some Observers in a different data center than the ObserverMaster, then the effective network bandwidth between ObserverMaster and Observer could be significantly smaller than the bandwidth between Leader and Followers. In this case, we might want to set a different maxConcurrentSnapSyncs for ObserverMaster than Leader.

Do those assumptions make sense here?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is, we don't know what the appropriate thresholds are until we roll the feature out and then tune the settings. If it turns out that we do need different settings for Leader and ObserverMaster, we can add it, instead of introducing a setting that might never be used. What do you think? (BTW, in our deployment, Leader and Followers can be in different data centers too.)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the details! I think we can keep this configuration just one parameter for now. I'd love to learn the actual system behavior once this is used in production.

return maxConcurrentSnapSyncs;
}

public void setMaxConcurrentSnapSyncs(int maxConcurrentSnapSyncs) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not seen any usage of setMaxConcurrentSnapSyncs, and I am not sure how we can set different values for Leader vs ObserverMaster. How do we plan to use setMaxConcurrentSnapSyncs?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called in FollwerBean and LeaderBean. No, we can't set different values for Leader and ObserverMaster. We did have a discussion whether to support different settings for Leader and ObserverMaster. Since we don't see a need for different settings and since we have too many flags/settings already, we decide to go with one setting for both Leader and ObserverMaster

@jhuan31
Copy link
Copy Markdown
Author

jhuan31 commented Jul 1, 2019

We don't have any design docs :( But you just remind me that I need to update the ZooKeeper Administrator's Guide :)

public void shutdown() {
// Send the packet of death
try {
queuedPackets.clear();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unrelated to this patch, but it looks the right thing to do. Just want to make sure this was not accidentally introduced.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! It is actually added in this patch to handle diff sync throttling. Diff txns are queued in syncFollower() before the throttle. Later when we decide that this diff sync should be throttled, the txns are in the queues already so we need to remove them.


/**
* Utility class to limit the number of concurrent syncs from a leader to
* observers and followers. {@link LearnerHandler} objects should call
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might add "or syncs from a follower to observers".

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@hanm
Copy link
Copy Markdown
Contributor

hanm commented Jul 3, 2019

update the ZooKeeper Administrator's Guide

Yes it'll be great to include the update as part of this patch.

Copy link
Copy Markdown
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits (and after doc is updated).

Copy link
Copy Markdown
Contributor

@lvfangmin lvfangmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jhuan31, this looks good me.

I'll merge it back to master by end of this week if there is no objection.

Copy link
Copy Markdown
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as well.

but I left a very few minor comments, please fix them


syncLimitCheck.start();
// sync ends when NEWLEADER-ACK is received
syncThrottler.endSync();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this endSync() call to be put in a finally block?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

endSync() is called in the finally block line 720-722, if not called earlier.

}

@Test(expected = SyncThrottleException.class)
public void testTryWithResourceThrottle() throws Exception {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not very clear that we are testing that the second call to beginSync is the one that throws the exception
I suggest to use an explicit catch block and not use the 'expected' attribute

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


final LearnerSyncThrottler throttler =
new LearnerSyncThrottler(numThreads, syncType);
ExecutorService threadPool = Executors.newFixedThreadPool(numThreads);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please shutdown this pool in a finally block

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@eolivelli
Copy link
Copy Markdown
Contributor

retest maven build

@asfgit asfgit closed this in f38905e Jul 25, 2019
stickyhipp pushed a commit to stickyhipp/zookeeper that referenced this pull request Aug 2, 2019
Author: Jie Huang <jiehuang@fb.com>

Reviewers: Michael Han <lhan@twitter.com>, Enrico Olivelli <eolivelli@apache.org>, Fangmin Lyu <fangmin@apache.org>

Closes apache#995 from jhuan31/ZOOKEEPER-3437
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Aug 31, 2022
Author: Jie Huang <jiehuang@fb.com>

Reviewers: Michael Han <lhan@twitter.com>, Enrico Olivelli <eolivelli@apache.org>, Fangmin Lyu <fangmin@apache.org>

Closes apache#995 from jhuan31/ZOOKEEPER-3437
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Sep 3, 2022
Author: Jie Huang <jiehuang@fb.com>

Reviewers: Michael Han <lhan@twitter.com>, Enrico Olivelli <eolivelli@apache.org>, Fangmin Lyu <fangmin@apache.org>

Closes apache#995 from jhuan31/ZOOKEEPER-3437
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants