Conversation

@keith-turner
Contributor

ZooCache currently has zero contention among threads for reads of cached data. It did this with snapshots of the data. However, it has two problems for updates. First, there is a global lock for writes. Second, writes are expensive because the snapshots for reads must be recomputed, which can lead to O(N^2) behavior for a large series of rapid small updates (each update rebuilds a snapshot whose size grows with the cache, so N small updates can do work proportional to N^2).

This change removes the global update lock and the snapshots and replaces them with a ConcurrentHashMap. It also replaces the three maps that existed with a single ConcurrentHashMap whose values are compound objects representing the data that was in the three maps. This single compound value allows removal of the global lock, which existed to keep the three maps in sync for a given path. Now all of a path's data is stored in a single value in the map and can be safely updated with ConcurrentHashMap's compute function, which allows only one thread to update a given path at a time.

This change should maintain read performance similar to the snapshots because ConcurrentHashMap does not block reads for writes: if a concurrent compute operation is happening when a read occurs, the read returns the previously computed value without blocking. This is the same behavior that ZooCache used to have. So hopefully this change has read performance similar to before and much better update performance. Updates to different paths can now proceed in parallel, and the snapshots no longer need to be computed on update.
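
As a rough sketch of the pattern described above, the following shows a per-path compound value updated through ConcurrentHashMap.compute. The ZcData class and its fields are illustrative placeholders, not the actual ZooCache types.

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class PathCacheSketch {

  // Hypothetical compound value holding the data and children that were previously
  // spread across three separate maps keyed by path.
  private record ZcData(byte[] data, List<String> children) {}

  private final ConcurrentHashMap<String,ZcData> cache = new ConcurrentHashMap<>();

  // Reads never block: if another thread is inside compute() for this path, get()
  // returns the previously stored value instead of waiting.
  public byte[] getData(String path) {
    ZcData zcd = cache.get(path);
    return zcd == null ? null : zcd.data();
  }

  // compute() allows only one thread at a time to update a given path, so no global
  // lock is needed and updates to different paths can proceed in parallel.
  public void setData(String path, byte[] newData) {
    cache.compute(path, (p, old) -> new ZcData(newData, old == null ? null : old.children()));
  }
}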

@keith-turner keith-turner added this to the 4.0.0 milestone Nov 7, 2024
@keith-turner
Contributor Author

Noticed that ZooCache may not be properly caching the absence of a node in the getChildren case, but did not want to change that in this PR. Will open a follow-on issue. For this change I wanted to maintain the exact same behavior and only change the concurrency model.

Will open an issue about back-porting the changes to 2.1.0 or 3.1.0 if they do not cause problems in 4.0.0 testing.

Contributor

@dlmarion dlmarion left a comment

This looks ok, but I think we should resolve the TODOs before merging.

try {
  final ZooKeeper zooKeeper = getZooKeeper();
  List<String> children = null;
  // TODO zookeeper may never return null
Contributor

What does this TODO mean? Are you suggesting that ZooKeeper.getChildren may only return an empty list and never return null?

Contributor Author

I need to look into this. I think the code is assuming that null is returned when a node does not exist. However, the javadocs say that a KeeperException with error code NONODE is thrown when no node with the given path exists. So I think the code is attempting to cache non-existence but is not actually doing that. I need to remove this TODO and open an issue prior to merging.
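
For reference, a minimal sketch of the ZooKeeper behavior in question, assuming a connected org.apache.zookeeper.ZooKeeper handle; the helper method is purely illustrative and not part of ZooCache:

import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class GetChildrenSketch {
  // Returns the node's children, an empty list for an existing node with no children,
  // or null when the node does not exist.
  static List<String> childrenOrNull(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    try {
      // getChildren returns an empty list for a childless node; it does not return null.
      return zk.getChildren(path, null);
    } catch (KeeperException.NoNodeException e) {
      // Non-existence is signaled by this exception rather than a null return, so code
      // that only checks the return value for null will never observe a missing node.
      return null;
    }
  }
}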

Contributor Author

These changes preserve the existing, possibly incorrect, behavior since I was not sure whether it was incorrect or not.

Contributor Author

Opened #5047

Contributor Author

Removed the TODOs in 2f2a424

@keith-turner
Contributor Author

Wrote the following test to experiment with these changes from a performance perspective.

package org.apache.accumulo.test.zookeeper;

import static java.nio.charset.StandardCharsets.UTF_8;
import static org.apache.accumulo.harness.AccumuloITBase.ZOOKEEPER_TESTING_SERVER;

import java.io.File;
import java.util.List;
import java.util.UUID;

import org.apache.accumulo.core.Constants;
import org.apache.accumulo.core.fate.zookeeper.ZooCache;
import org.apache.accumulo.core.fate.zookeeper.ZooReaderWriter;
import org.apache.accumulo.core.fate.zookeeper.ZooUtil;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;

@Tag(ZOOKEEPER_TESTING_SERVER)
public class ZooCacheIT {
  private static ZooKeeperTestingServer szk = null;
  private static ZooReaderWriter zk = null;
  private static final String ZK_ROOT = "/accumulo/" + UUID.randomUUID();

  @TempDir
  private static File tempDir;

  @BeforeAll
  public static void setup() throws Exception {
    szk = new ZooKeeperTestingServer(tempDir);
    zk = szk.getZooReaderWriter();
    zk.mkdirs(ZK_ROOT + Constants.ZTABLES);
  }

  @AfterAll
  public static void teardown() throws Exception {
    szk.close();
  }

  @Test
  public void testLotsOfNodes() throws Exception {
    // Create 30,000 table nodes in ZooKeeper.
    for (int i = 0; i < 30_000; i++) {
      zk.putPersistentData(ZK_ROOT + Constants.ZTABLES + "/" + i, "i".getBytes(UTF_8),
          ZooUtil.NodeExistsPolicy.FAIL);
    }

    var zc = new ZooCache(zk, null);

    List<String> children = zc.getChildren(ZK_ROOT + Constants.ZTABLES);
    Assertions.assertEquals(30_000, children.size());

    // Read every child 20 times; the first pass populates the cache from ZooKeeper and
    // subsequent passes are served from the cache.
    for (int i = 0; i < 20; i++) {
      long t1 = System.currentTimeMillis();
      for (var child : children) {
        zc.get(ZK_ROOT + Constants.ZTABLES + "/" + child);
      }
      long t2 = System.currentTimeMillis();
      System.out.println(i + " : " + (t2 - t1) + "ms");
    }

  }
}

When running this against these changes, I saw the following. Assuming the first pass is reading from ZK, it is taking about 22s to do 30K RPCs to ZK.

0 : 21593ms
1 : 14ms
2 : 13ms
3 : 14ms
4 : 15ms
5 : 6ms
6 : 6ms
7 : 7ms
8 : 13ms
9 : 14ms
10 : 6ms
11 : 9ms
12 : 6ms
13 : 6ms
14 : 6ms
15 : 5ms
16 : 5ms
17 : 6ms
18 : 6ms
19 : 6ms

When running this test against the code that existed before these changes, I saw the following, where it takes 57s to read the 30K entries the first time. So the extra CPU overhead (every cache miss updates the cache, and in the old code every update recomputed the snapshots) is causing the first pass to take about 2.6 times longer.

0 : 56883ms
1 : 11ms
2 : 11ms
3 : 11ms
4 : 11ms
5 : 9ms
6 : 9ms
7 : 8ms
8 : 6ms
9 : 5ms
10 : 6ms
11 : 5ms
12 : 9ms
13 : 6ms
14 : 5ms
15 : 5ms
16 : 5ms
17 : 5ms
18 : 5ms
19 : 6ms

@keith-turner keith-turner merged commit d26a913 into apache:main Nov 8, 2024
@keith-turner keith-turner deleted the zoocache_performance branch November 8, 2024 17:16