improves zoocache performance #5040
Conversation
ZooCache currently has zero contention among threads for reads of cached data. It achieved this with snapshots of the data. However, that approach has two problems for updates. First, there is a global lock for writes. Second, writes are expensive because the snapshots used for reads must be recomputed, which can lead to O(N^2) behavior for a large series of rapid small updates.

This change removes the global update lock and the snapshots and replaces them with a ConcurrentHashMap. It also replaces the three maps that existed with a single ConcurrentHashMap whose value is a compound object representing the data that was in the three maps. This single compound value allows removal of the global lock, which existed to keep the three maps in sync for a given path. Now all of a path's data is stored in a single value in the map and can be safely updated with the compute function on ConcurrentHashMap, which only allows one thread to update a given path at a time. Updates to different paths can proceed in parallel, and snapshots no longer need to be recomputed on update.

This change should maintain read performance similar to the snapshots because ConcurrentHashMap does not block reads for writes: if a concurrent compute operation is in progress when a read happens, the read returns the previously computed value without blocking. This is the same behavior that ZooCache used to have. So hopefully this change has similar read performance as before and much better update performance.
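To illustrate the new concurrency model, here is a minimal sketch of the pattern described above. All names here (PathCacheSketch, ZcNode, the fields) are hypothetical illustrations, not the actual classes in this PR:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class PathCacheSketch {

  // Hypothetical compound value holding what the old three maps tracked
  // for a single path: the node's data and its children.
  private static class ZcNode {
    final byte[] data;
    final List<String> children;

    ZcNode(byte[] data, List<String> children) {
      this.data = data;
      this.children = children;
    }
  }

  // A single map replaces the three maps; all cached state for a path
  // lives in one value.
  private final ConcurrentHashMap<String,ZcNode> cache = new ConcurrentHashMap<>();

  // compute() runs atomically per key, so only one thread can update a
  // given path at a time, while updates to different paths proceed in
  // parallel. No global lock is needed to keep data and children in sync.
  void cacheData(String path, byte[] data) {
    cache.compute(path, (p, old) -> new ZcNode(data, old == null ? null : old.children));
  }

  // Reads never take a lock. If a compute() for this path is in progress,
  // get() returns the previously completed value instead of blocking,
  // similar to how the old snapshots served reads.
  byte[] getData(String path) {
    ZcNode node = cache.get(path);
    return node == null ? null : node.data;
  }
}
```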
Noticed that ZooCache may not be properly caching the absence of a node in the getChildren case, but did not want to change that in this PR. Will open a follow-on issue. For this change I wanted to maintain the exact same behavior and only change the concurrency model. Will open an issue about back-porting the changes to 2.1.0 or 3.1.0 if they do not cause problems in 4.0.0 testing.
dlmarion left a comment
This looks ok, but I think we should resolve the TODOs before merging.
```java
try {
  final ZooKeeper zooKeeper = getZooKeeper();
  List<String> children = null;
  // TODO zookeeper may never return null
```
What does this TODO mean? Are you suggesting that ZooKeeper.getChildren may only return an empty list and never return null?
I need to look into this. I think the code is assuming that null is returned when a node does not exist. However, when a node does not exist the javadocs say that a KeeperException.NoNodeException is thrown. So I think the code is attempting to cache non-existence but is not actually doing that. I need to remove this TODO and open an issue prior to merging.
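For reference, a minimal sketch of what detecting non-existence looks like given that ZooKeeper.getChildren throws rather than returning null. This is an illustration of the ZooKeeper API behavior discussed here, not the actual ZooCache code; the helper name is made up:

```java
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class GetChildrenSketch {
  // Per the ZooKeeper javadocs, getChildren throws
  // KeeperException.NoNodeException when the node does not exist;
  // for an existing node with no children it returns an empty list,
  // never null.
  static List<String> childrenOrNull(ZooKeeper zooKeeper, String path)
      throws KeeperException, InterruptedException {
    try {
      return zooKeeper.getChildren(path, null);
    } catch (KeeperException.NoNodeException e) {
      // The node does not exist. Returning null here lets a caller
      // cache non-existence explicitly, instead of relying on
      // getChildren returning null (which it never does).
      return null;
    }
  }
}
```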
These changes preserve the existing, possibly incorrect, behavior since I was not sure whether it was incorrect.
Opened #5047
removed the TODOs in 2f2a424
Wrote the following test to experiment with these changes from a performance perspective.

```java
package org.apache.accumulo.test.zookeeper;
import static java.nio.charset.StandardCharsets.UTF_8;
import static org.apache.accumulo.harness.AccumuloITBase.ZOOKEEPER_TESTING_SERVER;
import java.io.File;
import java.util.List;
import java.util.UUID;
import org.apache.accumulo.core.Constants;
import org.apache.accumulo.core.fate.zookeeper.ZooCache;
import org.apache.accumulo.core.fate.zookeeper.ZooReaderWriter;
import org.apache.accumulo.core.fate.zookeeper.ZooUtil;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;
@Tag(ZOOKEEPER_TESTING_SERVER)
public class ZooCacheIT {

  private static ZooKeeperTestingServer szk = null;
  private static ZooReaderWriter zk = null;
  private static final String ZK_ROOT = "/accumulo/" + UUID.randomUUID();

  @TempDir
  private static File tempDir;

  @BeforeAll
  public static void setup() throws Exception {
    szk = new ZooKeeperTestingServer(tempDir);
    zk = szk.getZooReaderWriter();
    zk.mkdirs(ZK_ROOT + Constants.ZTABLES);
  }

  @AfterAll
  public static void teardown() throws Exception {
    szk.close();
  }

  @Test
  public void testLotsOfNodes() throws Exception {
    // create 30K nodes in ZooKeeper
    for (int i = 0; i < 30_000; i++) {
      zk.putPersistentData(ZK_ROOT + Constants.ZTABLES + "/" + i, "i".getBytes(UTF_8),
          ZooUtil.NodeExistsPolicy.FAIL);
    }

    var zc = new ZooCache(zk, null);
    List<String> children = zc.getChildren(ZK_ROOT + Constants.ZTABLES);
    Assertions.assertEquals(30_000, children.size());

    // time repeated read passes; the first pass populates the cache via
    // ZK RPCs, subsequent passes are served entirely from the cache
    for (int i = 0; i < 20; i++) {
      long t1 = System.currentTimeMillis();
      for (var child : children) {
        zc.get(ZK_ROOT + Constants.ZTABLES + "/" + child);
      }
      long t2 = System.currentTimeMillis();
      System.out.println(i + " : " + (t2 - t1) + "ms");
    }
  }
}
```

When running against these changes, the first read from ZK took 22s to do the 30K RPCs to ZK. When running this test against the code that existed before these changes, it took 57s to read the 30K entries the first time. So the extra CPU overhead was causing it to take about 2.6 times longer (57s / 22s ≈ 2.6).
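To connect the 2.6x difference to the O(N^2) claim in the description: as a rough illustration (an assumption, not a measurement), if under the old model each of the 30K first-time cache inserts rebuilt a snapshot proportional to the current cache size, the total snapshot work would be on the order of N(N+1)/2 ≈ 4.5 × 10^8 operations for N = 30,000, whereas the new model does O(N) work, one compute() per path. That is consistent with the old code's first pass being CPU-bound well beyond the cost of the 30K RPCs themselves.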