[ZOOKEEPER-1177] Add the memory optimized watch manager for concentrate watches scenario #590
Conversation
Force-pushed d4f996f to 04fad10 (compare).
@hanm the code Findbugs complained about is all correct and expected:
@lvfangmin In which case you need to add exceptions to
@lvfangmin Interesting. Where can we find a benchmark which shows how this PR improves the memory?
Based on an internal benchmark, this may save more than 95% of memory usage in the concentrated watches scenario. I'm working on adding some micro benchmarks. In the original Jira, @phunt also added some basic tests, which should give you an idea of how much memory it's going to save. The current HashMap based watch manager uses lots of memory due to the overhead of storing each entry (32 bytes per entry), the object references, and the duplicated path strings. Using a BitSet without a reverse index reduces that overhead and makes it more memory efficient.
There is a PR on the Jira from a few years ago which uses a BitMap as well, but it sacrificed performance when triggering watches; this patch improves on that and uses lazy clean up for watchers whose sessions have closed.
This is going to be a huge improvement for Accumulo, for example, which is a heavy user of watchers. I'm going to allocate some capacity to review these new patches.
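The storage layout lvfangmin describes can be sketched in plain Java. This is an illustration only, with names of my own invention rather than the patch's code: a HashMap-based manager keeps a set of watcher object references per path (each entry carrying map-entry and reference overhead), while a bit-based manager assigns each watcher a small integer id once and keeps one BitSet per path, so N watches on a path cost roughly N bits.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch (not the patch's code) contrasting the two layouts.
public class WatchStorageSketch {
    // HashMap layout: path -> set of watcher references (32+ bytes per entry).
    static final Map<String, Set<Object>> byReference = new HashMap<>();
    // BitSet layout: path -> bits indexed by a per-watcher integer id.
    static final Map<String, BitSet> byBit = new HashMap<>();

    static void addByBit(String path, int watcherId) {
        byBit.computeIfAbsent(path, p -> new BitSet()).set(watcherId);
    }

    static boolean containsByBit(String path, int watcherId) {
        BitSet bits = byBit.get(path);
        return bits != null && bits.get(watcherId);
    }
}
```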
Force-pushed 04fad10 to 88bfdab (compare).
Added a JMH micro benchmark for the watch manager.
Here are more results about the throughput/latency of the WatchManager:
You can try the following command to run the micro benchmark:
@maoling @anmolnar hope this gives you a more vivid comparison between the old and new watch manager implementations.
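The JMH benchmark itself and its run command are not reproduced above, so here is a stdlib-only sketch (hypothetical class name, no JMH dependency) of the kind of comparison such a micro benchmark runs: timing the hot add operation against a HashSet versus a BitSet.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Not the JMH code from the patch -- a minimal stdlib-only timing sketch.
// Real measurements should use JMH to avoid JIT/warmup pitfalls.
public class WatchBenchSketch {
    // Runs op `iterations` times and returns the elapsed nanoseconds.
    static long timeAdds(Runnable op, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            op.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>();
        BitSet bits = new BitSet();
        int n = 1_000_000;
        long setNanos = timeAdds(() -> set.add(42), n);
        long bitNanos = timeAdds(() -> bits.set(42), n);
        System.out.println("HashSet: " + setNanos + " ns, BitSet: " + bitNanos + " ns");
    }
}
```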
@lvfangmin Please take a look at my initial feedback on the patch. Didn't have time to review the testing side, but it generally looks good.
@nkalmar There's a new project being added here: bench. Please take a quick look at it and advise on what would be the best place to put it in terms of the Maven migration. Thanks.
build.xml
Outdated
@@ -119,6 +119,7 @@ xmlns:cs="antlib:com.puppycrawl.tools.checkstyle.ant">
<property name="test.java.classes" value="${test.java.build.dir}/classes"/>
<property name="test.src.dir" value="${src.dir}/java/test"/>
<property name="systest.src.dir" value="${src.dir}/java/systest"/>
<property name="bench.src.dir" value="${src.dir}/java/bench"/>
I think this new dir should be added to the classpath of the eclipse task too.
Will do.
src/java/main/org/apache/zookeeper/server/watch/WatchManagerOptimized.java
@Override
public void removeWatcher(Watcher watcher) {
    Integer watcherBit;
    addRemovePathRWLock.writeLock().lock();
Do you need to acquire write lock here?
I need the exclusive lock with addWatch, otherwise addWatch may still add a dead watch which won't be cleaned up by the WatchCleaner once it has started cleaning.
In which case you need to move the addDeadWatcher() call inside the critical block.
Missed this comment last time. What we need here is: as long as we called addDeadWatcher, no watches related to this dead watcher will be added. Code executing after line 136 means the watcher has been marked as stale; after we release this lock, any in-flight addWatcher for this dead watcher will be rejected, so it's guaranteed that when we call addDeadWatcher there is no race condition between removing and adding watches.
And I need to move addDeadWatcher out of the locking block, since the WatchCleaner might block on it to avoid OOM issues if the cleaner cannot keep up with cleaning the dead watchers.
Where do you mark the watcher as stale inside the critical block? It only calls a getter on the BitIdMap, right?
In the caller of removeWatcher, which is NIOServerCnxn.close and NettyServerCnxn.close.
When the cnxn is closed, it sets stale before calling removeCnxn on zkServer, which calls this function sequentially; if we grabbed this lock, it means the cnxn has been marked as stale.
We could explicitly setStale here as well, but I don't think that's necessary.
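The locking split under discussion can be sketched as follows. The class and member names here are illustrative, not the patch's: concurrent addWatch callers share the read lock (the inner structure has its own lock, mirroring BitHashSet), while removeWatcher takes the write lock so no add can interleave with marking a watcher dead.

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged sketch of the add/remove locking scheme; names are mine.
public class LockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final BitSet live = new BitSet();

    void addWatch(int watcherId) {
        lock.readLock().lock();              // many adders may run concurrently
        try {
            synchronized (live) {            // per-structure lock, as BitHashSet has
                live.set(watcherId);
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    void removeWatcher(int watcherId) {
        lock.writeLock().lock();             // exclusive vs. all adders
        try {
            synchronized (live) {
                live.clear(watcherId);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isLive(int watcherId) {
        synchronized (live) {
            return live.get(watcherId);
        }
    }
}
```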
src/java/main/org/apache/zookeeper/server/watch/WatchManagerOptimized.java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WatchManagerFactory {
A quick javadoc would be awesome here.
import java.util.Set;

public interface DeadWatcherListener {
Please add a few words of javadoc here.
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BitMap<T> {
I think a short javadoc similar to BitHashSet's would be useful here.
src/java/main/org/apache/zookeeper/server/watch/WatchManagerOptimized.java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.proto.ReplyHeader;

public class DumbWatcher extends ServerCnxn {
Please consider using mockito.
I agree that for unit tests a mock object is easier to maintain than a stub, but I also need this DumbWatcher in the micro benchmark; I'll put this class somewhere in the code so the micro benchmark and unit test can share it.
@lvfangmin Docs for the new caching parameter
On the new project bench:
I'm not sure we need a top level module for this bench test (although more could come in the future...)
From my side, this can stay in an org.apache.zookeeper project, but then it should go to org.apache.zookeeper.bench in my opinion, and the directory structure should be src/java/org/apache/zookeeper/bench/**
It will be moved with the maven migration anyway (it can't stay in java/bench/org/..
@lvfangmin Can you please move the directory, and possibly also create a new package "bench" for it, under zookeeper?
Otherwise, looking at the patch, really great job! Thanks!
Thanks @nkalmar for the suggestion on the bench project position; I was following the same directory layout as src/java/systest for now. Do you think we can move them together later? I'm not against moving it now. If we want to move now, just to confirm: the directory is src/java/org/apache/zookeeper/bench/ without a main folder, right?
systest will go to src/test/java; from my side, you can put the bench in org.apache.zookeeper.test.system. Thinking about it, that's a pretty good place. No need to create a main directory or anything, I will move all the files anyway. Just move the files amongst the others in systest. (Hopefully there's no package level dependency, I didn't check.) But there's a chance you will have to rebase if this PR cannot be merged before the movement of all the remaining files as the directory refactor's last step. Sorry about that in advance... I'm not going to be the most popular with that PR :(
Update based on the comments:
@nkalmar I moved the bench to src/test/java/bench, which seems more reasonable to me; let me know if you think that's not a good position based on your plan.
Force-pushed 4043700 to f212cfd (compare).
Resolved conflicts with the latest code on master.
Force-pushed f212cfd to c9962c9 (compare).
@lvfangmin Thanks. Sorry for the delay. I'd like to check one more thing before accepting. Bear with me please.
@lvfangmin Just to wrap up the difference between this and the original 6-year-old patch on Jira: you've added
Is that correct?
@anmolnar Thanks for reviewing, take your time. Here are the main differences between this version and the old one on the Jira:
Cool, thanks for the summary. It's very useful for the records.
+1 approved
Refer to this link for build results (access rights to CI server needed):
@@ -0,0 +1,12 @@
package org.apache.zookeeper;
This file is missing the Apache license header. This triggers a -1 in the last jenkins build.
Will add it.
@anmolnar I'll sign this off by the end of next Monday if there are no other issues.
Some nitpicks on spelling, a couple of questions, and some comments around code I find suspicious.
import org.apache.zookeeper.server.ServerCnxn;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
Remove all imports here except these three, since the rest were not used (my guess is this file was copy pasted?):
import java.io.PrintWriter;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
public boolean addWatch(String path, Watcher watcher);

/**
 * Checks the specified watcher exists for the given path
nit: missing full stop at end of sentence.
public boolean containsWatcher(String path, Watcher watcher);

/**
 * Removes the specified watcher for the given path
nit: missing full stop at end of sentence.
/**
 * Distribute the watch event for the given path, but ignore those
 * supressed ones.
spell check: suppressed instead of supressed
 * @return the watchers have been notified
 */
public WatcherOrBitSet triggerWatch(
        String path, EventType type, WatcherOrBitSet supress);
similar spelling issue for supress
/**
 * Interface used to process the dead watchers related to closed cnxns.
 */
public interface DeadWatcherListener {
Would be good to rename this to IDeadWatchListner, which makes it obvious this is an interface. We already do this for IWatchManager.
<ulink url="https://issues.apache.org/jira/browse/ZOOKEEPER-1179">ZOOKEEPER-1179</ulink> The new watcher
manager WatchManagerOptimized will clean up the dead watchers lazily, this config is used to decide how
many thread is used in the WatcherCleaner. More thread usually means larger clean up throughput. The
default value is 2, which is good enough even for heavy and continuous session closing/receating cases.</para>
closing/recreating
try {
    if (deadWatchers.size() < watcherCleanThreshold) {
        int maxWaitMs = (watcherCleanIntervalInSeconds +
            r.nextInt(watcherCleanIntervalInSeconds / 2 + 1)) * 1000;
Is there a particular reason for choosing this versus, say, exponential backoff?
Cleaning up dead watches on a large-watch ensemble is heavy work, which might affect performance, so we add jitter to make sure we don't do the lazy clean up at the same time on all the servers in the ensemble.
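The jittered wait in the snippet quoted above can be factored into a small helper (illustrative wrapper name): each server waits the base interval plus up to half of it again, so cleanup passes across the ensemble spread out instead of firing simultaneously.

```java
import java.util.Random;

// Illustrative helper reproducing the jitter arithmetic from the snippet.
public class JitterSketch {
    // Returns a wait in [intervalSeconds, 1.5 * intervalSeconds] seconds, in ms.
    static int maxWaitMs(int intervalSeconds, Random r) {
        return (intervalSeconds + r.nextInt(intervalSeconds / 2 + 1)) * 1000;
    }
}
```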
deadWatchers.clear();
int total = snapshot.size();
LOG.info("Processing {} dead watchers", total);
cleaners.schedule(new WorkRequest() {
indentation issue on this line...
synchronized (this) {
    // Snapshot of the current dead watchers
    final Set<Integer> snapshot = new HashSet<Integer>(deadWatchers);
    deadWatchers.clear();
Is there a particular reason to copy the deadWatchers, clear it immediately, and then do the work, instead of just operating on deadWatchers directly and only clearing it after the work is done? I assume the motivation was to free deadWatchers earlier so we can pipeline the work: adding more dead watchers while the previous round of cleaning is in progress. But it looks like new dead watchers will block on totalDeadWatchers, which will only be reset after the previous dead watchers were cleaned up.
Cleaning the dead watchers needs to go through all the current watches, which is pretty heavy and may take a second if there are millions of watches; that's why we're doing lazy batch clean up here. We don't want to block addDeadWatcher, which is called from Cnxn.close, while we're doing the clean-up work.
The totalDeadWatchers is used to avoid OOM when the watcher cleaner cannot catch up (we haven't seen this problem even in heavy reconnecting scenarios); it is suggested to set it to something like 1000 * watcherCleanThreshold. The watcherCleanThreshold controls the batch size when doing the clean up; there is a trade-off between GC-ing the dead watcher memory and the time complexity of cleaning up, so we cannot set it too large.
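The snapshot-and-clear handoff described in this exchange can be sketched as follows (class and method names are hypothetical): the caller-facing set is emptied immediately so addDeadWatcher never blocks behind a slow cleanup pass, while totalDeadWatchers keeps counting the batch still in flight and is only decremented once the heavy scan finishes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of the pipeline pattern; not the patch's WatcherCleaner.
public class DeadWatcherSketch {
    private final Set<Integer> deadWatchers = new HashSet<>();
    final AtomicInteger totalDeadWatchers = new AtomicInteger(0);

    synchronized void addDeadWatcher(int watcherBit) {
        if (deadWatchers.add(watcherBit)) {
            totalDeadWatchers.incrementAndGet();
        }
    }

    Set<Integer> takeSnapshot() {
        synchronized (this) {
            Set<Integer> snapshot = new HashSet<>(deadWatchers);
            deadWatchers.clear();   // new dead watchers can queue immediately
            return snapshot;
        }
    }

    void finishBatch(Set<Integer> snapshot) {
        // ... the heavy scan over all current watches would happen here ...
        totalDeadWatchers.addAndGet(-snapshot.size());
    }
}
```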
Thanks @hanm for the detailed review; I have made comments to address your concerns, let me know if anything is unclear.
Meanwhile, I'll remove the unused imports and correct the typos as you suggested.
if (elementBit == null) {
    return false;
}
return elementBits.get(elementBit);
BitSet.get is O(1); checking the cache may actually be more expensive.
The HashSet is used to optimize iterating: for example, if there is a single element in this BitHashSet but its bit index is very large, without the HashSet we would need to go through all the words before returning that element, which is not efficient.
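A minimal sketch of the BitSet-plus-HashSet composition being discussed here (illustrative, not the actual BitHashSet): the BitSet answers contains() in O(1) on the bit index, while the HashSet lets iteration visit only the set bits instead of scanning every word of the BitSet.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hedged sketch of the composition; the real class also handles locking.
public class BitHashSetSketch implements Iterable<Integer> {
    private final BitSet elementBits = new BitSet();
    private final Set<Integer> cache = new HashSet<>();

    boolean add(int elementBit) {
        if (elementBits.get(elementBit)) {
            return false;
        }
        elementBits.set(elementBit);
        cache.add(elementBit);
        return true;
    }

    boolean contains(int elementBit) {
        return elementBits.get(elementBit);   // O(1), no word scan
    }

    @Override
    public Iterator<Integer> iterator() {
        return cache.iterator();              // visits only the set bits
    }
}
```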
 * iterate through this set.
 */
@Override
public Iterator<Integer> iterator() {
It's used in triggerWatch with a for-each iterator.
Integer bit = getBit(value);
if (bit != null) {
    return bit;
}
This BitMap is used by WatchManagerOptimized.watcherBitIdMap, which stores the watcher-to-bit mapping.
add might be called a lot if the same client connection is watching thousands or even millions of nodes, while remove is only called once, when the session is closed; that's why we optimized add to check with the read lock first, but use the write lock directly in remove.
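The read-then-write asymmetry described here is essentially a double-checked pattern; a hedged sketch with identifiers of my own choosing: add() first looks up an existing bit under the cheap read lock and only escalates to the write lock to allocate a new one, re-checking after escalation because another thread may have won the race between the two locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the BitMap add() locking, not the patch's code.
public class BitMapSketch<T> {
    private final Map<T, Integer> value2Bit = new HashMap<>();
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private int nextBit = 0;

    int add(T value) {
        rwLock.readLock().lock();
        try {
            Integer bit = value2Bit.get(value);
            if (bit != null) {
                return bit;                // common case: already mapped
            }
        } finally {
            rwLock.readLock().unlock();
        }
        rwLock.writeLock().lock();
        try {
            // re-check: another thread may have added it between the locks
            Integer bit = value2Bit.get(value);
            if (bit == null) {
                bit = nextBit++;
                value2Bit.put(value, bit);
            }
            return bit;
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}
```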
 * limitations under the License.
 */

package org.apache.zookeeper.server.watch;
At the beginning, when we added this class, it was bound to Watcher, but not anymore after refactoring; we can move this to server.util. I'll do that.
 */
public class BitHashSet implements Iterable<Integer> {

    static final long serialVersionUID = 6382565447128283568L;
Previously it used inheritance instead of composition with HashSet; at that time we added this serialVersionUID and didn't remove it after changing to composition. Will remove it.
// this is will slow down the socket packet processing and
// the adding watches in the ZK pipeline.
while (maxInProcessingDeadWatchers > 0 && !stopped &&
        totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
I think this should be maxInProcessingDeadWatchers != -1 && totalDeadWatchers.get() >= maxInProcessingDeadWatchers. Otherwise we'll always wait on totalDeadWatchers if the user uses the default configuration value of maxInProcessingDeadWatchers.
@hanm I'm not sure I understand this correctly; the default value of maxInProcessingDeadWatchers is -1, in which case it will skip checking totalDeadWatchers. Am I missing anything?
Oops, did not see maxInProcessingDeadWatchers > 0. We are good here.
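The guard the two reviewers just settled can be isolated as a small helper (illustrative name): a non-positive maxInProcessingDeadWatchers disables the wait entirely, covering the default -1 case, while a positive value stalls callers whenever too many dead watchers are still being cleaned.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative extraction of the backpressure condition from the loop above.
public class BackpressureSketch {
    static boolean shouldWait(int maxInProcessingDeadWatchers,
                              boolean stopped,
                              AtomicInteger totalDeadWatchers) {
        return maxInProcessingDeadWatchers > 0 && !stopped
                && totalDeadWatchers.get() >= maxInProcessingDeadWatchers;
    }
}
```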
import org.apache.zookeeper.server.ServerCnxn;
import org.apache.zookeeper.server.util.BitMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
couple of unused imports here
 * Optimized in memory and time complexity, compared to WatchManager, both the
 * memory consumption and time complexity improved a lot, but it cannot
 * efficiently remove the watcher when the session or socket is closed, for
 * majority usecase this is not a problem.
nit: use case
    return null;
}

int triggeredWatches = 0;
we can remove this - it's assigned later but never used.
We had metrics for this; I removed them because #580 was still in review. I'll add those metrics back since that patch has been merged.
I knew it was the metrics. It's fine to leave this variable and add the metrics in another patch, since this patch is already big enough and almost ready to land.
@Override
public WatcherOrBitSet triggerWatch(
        String path, EventType type, WatcherOrBitSet supress) {
nit - suppress
@Override
public boolean addWatch(String path, Watcher watcher) {
    boolean result = false;
    addRemovePathRWLock.readLock().lock();
Is the purpose of using a read lock here to optimize for addWatch-heavy workloads? Would be good to add a comment here about why a read lock was chosen instead of a write lock.
Yes, it's used to improve the read throughput; creating a new watcher bit and adding it to the BitHashSet has its own lock to minimize the lock scope. I'll add some comments here.
if (watchers == null) {
    watchers = new BitHashSet();
    BitHashSet existingWatchers = pathWatches.putIfAbsent(path, watchers);
    if (existingWatchers != null) {
Is this check necessary because we are using a read lock here, so it's possible for another thread to modify the pathWatches while we are here?
That's correct; read requests are processed concurrently in the CommitProcessor worker service, so multiple threads might add to pathWatches while we're holding the read lock. That's why we need this check here.
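The race and its putIfAbsent resolution can be sketched as follows (a plain BitSet stands in for BitHashSet, and the names are illustrative): two threads holding the read lock can both miss pathWatches and build a fresh set, and whichever one loses putIfAbsent must adopt the winner's set so no watch lands in an orphaned copy.

```java
import java.util.BitSet;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch of the putIfAbsent race-check, not the patch's addWatch.
public class AddWatchSketch {
    final ConcurrentHashMap<String, BitSet> pathWatches = new ConcurrentHashMap<>();

    void addWatch(String path, int watcherBit) {
        BitSet watchers = pathWatches.get(path);
        if (watchers == null) {
            watchers = new BitSet();
            BitSet existing = pathWatches.putIfAbsent(path, watchers);
            if (existing != null) {
                watchers = existing;   // another reader won the race
            }
        }
        synchronized (watchers) {      // the real BitHashSet has its own lock
            watchers.set(watcherBit);
        }
    }
}
```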
@Override
public boolean containsWatcher(String path, Watcher watcher) {
    BitHashSet watchers = pathWatches.get(path);
Would be good to add a comment here regarding why no synchronization is required here.
Thanks @lvfangmin for the detailed reply.
Refer to this link for build results (access rights to CI server needed):
LGTM, thanks @lvfangmin. Will commit today after running it through jenkins a few times.
Got a "green" build (minus the known failed test).
Merged to master. Great work @lvfangmin!
Is it worth it to backport this to 3.4? @hanm @lvfangmin
@vivekpatani Sorry, but 3.4 is end of life.
…e watches scenario

The current HashSet based WatcherManager will consume more than 40GB memory when creating 300M watches. This patch optimized the memory and time complexity for the concentrated watches scenario; compared to WatchManager, both the memory consumption and time complexity improved a lot. I'll post more data later with micro benchmark results.

Changes made compared to WatchManager:
* Only keep the path to watches map
* Use BitSet to save the memory used to store watches
* Use ConcurrentHashMap and ReadWriteLock instead of synchronized to reduce lock retention
* Lazily clean up the closed watchers

Author: Fangmin Lyu <allenlyu@fb.com>
Reviewers: Andor Molnár <andor@apache.org>, Norbert Kalmar <nkalmar@yahoo.com>, Michael Han <hanm@apache.org>

Closes apache#590 from lvfangmin/ZOOKEEPER-1177