ZkStateReader: cache LazyCollectionRef (SOLR-8327) #294

slackhappy · 2017-12-19T18:22:48Z

SOLR-10524 introduced zk state update batching, with
a default interval of 2 seconds. That opens
the door for a simple, time-based cache on the read side
to address the issue described in SOLR-8327.

slackhappy · 2017-12-19T18:50:02Z

solr/core/src/java/org/apache/solr/cloud/Overseer.java

@@ -68,7 +68,7 @@
  public static final String QUEUE_OPERATION = "operation";

  // System properties are used in tests to make them run fast
-  public static final int STATE_UPDATE_DELAY = Integer.getInteger("solr.OverseerStateUpdateDelay", 2000);  // delay between cloud state updates
+  public static final int STATE_UPDATE_DELAY = ZkStateReader.STATE_UPDATE_DELAY;


I moved this constant so that ZkStateReader can access the setting, but left an alias here for locality.

slackhappy · 2017-12-19T18:52:30Z

solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java

    }

    @Override
-    public DocCollection get() {
+    public synchronized DocCollection get() {


I thought synchronized here would provide a good enough performance increase without the complexity of other approaches.
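For readers following along, here is a minimal sketch of the idea under discussion: a synchronized getter that reuses a value until a time-to-live expires. It is not the code from this PR. `TimedCachedRef`, its constructor parameters, and the injectable clock are hypothetical names chosen for illustration, and the 2000 ms figure in the usage note simply mirrors the default batching interval mentioned in the description.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazily loaded reference; not Solr's LazyCollectionRef.
class TimedCachedRef<T> {
  private final Supplier<T> loader;     // assumed to be the expensive fetch, e.g. a ZooKeeper read
  private final long ttlNanos;          // how long a loaded value may be reused
  private final LongSupplier nanoClock; // System::nanoTime in real use; injectable for tests

  private T cached;
  private long loadedAtNanos;
  private boolean loaded;

  TimedCachedRef(Supplier<T> loader, long ttlMillis, LongSupplier nanoClock) {
    this.loader = loader;
    this.ttlNanos = ttlMillis * 1_000_000L;
    this.nanoClock = nanoClock;
  }

  // synchronized: concurrent callers wait for one fetch instead of each starting their own.
  synchronized T get() {
    long now = nanoClock.getAsLong();
    // Subtract rather than compare timestamps directly, so the check survives counter wrap-around.
    if (!loaded || now - loadedAtNanos >= ttlNanos) {
      cached = loader.get();
      loadedAtNanos = now;
      loaded = true;
    }
    return cached;
  }
}
```

With a TTL of 2000 ms and `System::nanoTime` as the clock, repeated `get()` calls inside the two-second window return the same object and trigger a single load. The trade-off is that a slow load blocks every caller on the same lock.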

dragonsinth · 2017-12-19T20:19:39Z

This approach seems fine to me. Remind me why we use nanoTime vs. the normal clock? I'm sure you're right; I just want to refresh my brain.

elyograg · 2017-12-19T20:51:13Z

Java programs are migrating to nanoTime instead of currentTimeMillis for elapsed time because many people have found that the latter will go backwards on occasion. It is not monotonic.

nanoTime should be far less likely to go backwards. That behavior has still been observed in the wild, but it should be rare. Supposedly nanoTime is monotonic if the OS properly supports a monotonic clock. There's a lot of info out there about it:

https://www.google.com/search?q=java+nanotime+monotonic

The fact that nanoTime might produce elapsed times with greater accuracy than one millisecond is a bonus.
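As a small illustration of the point above, the sketch below measures elapsed time with `System.nanoTime()`. The class name `ElapsedTime` and the helper `hasElapsed` are made up for this example. The one detail worth copying is that the two readings are subtracted rather than compared directly, which is what the `System.nanoTime` Javadoc recommends because the counter may overflow.

```java
import java.util.concurrent.TimeUnit;

// Illustrative helper; the names here are not from the PR.
class ElapsedTime {

  // True once at least intervalMillis have passed between the two nanoTime readings.
  // The subtraction stays correct even if the counter wrapped between the readings.
  static boolean hasElapsed(long startNanos, long nowNanos, long intervalMillis) {
    return nowNanos - startNanos >= TimeUnit.MILLISECONDS.toNanos(intervalMillis);
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.nanoTime();
    Thread.sleep(50);
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println("elapsed ms: " + elapsedMs);
    System.out.println("2s elapsed? " + hasElapsed(start, System.nanoTime(), 2000));
  }
}
```

Note that nanoTime values are only meaningful as differences within one JVM; they are not wall-clock timestamps.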

chatman · 2017-12-19T21:58:24Z

Seems like there are some test failures due to this change:

[junit4] Tests with failures [seed: DE1D5337E38D2C32]:
[junit4] - org.apache.solr.cloud.TestPullReplica.testRemoveAllWriterReplicas
[junit4] - org.apache.solr.cloud.TestPullReplica.testAddRemovePullReplica
[junit4] - org.apache.solr.cloud.CollectionTooManyReplicasTest.testAddTooManyReplicas
[junit4] - org.apache.solr.cloud.CollectionsAPIDistributedZkTest.addReplicaTest
[junit4] - org.apache.solr.cloud.DeleteShardTest.testDirectoryCleanupAfterDeleteShard
[junit4] - org.apache.solr.cloud.TestCloudRecovery.corruptedLogTest
[junit4] - org.apache.solr.cloud.TestCloudRecovery.leaderRecoverFromLogOnStartupTest
[junit4] - org.apache.solr.cloud.TestUtilizeNode.test
[junit4] - org.apache.solr.cloud.TestTlogReplica.testAddRemoveTlogReplica

Haven't looked into them yet, though.

slackhappy · 2017-12-20T00:17:46Z

I'll look into the test failures. I actually didn't mean to create the PR yet 😳

Limits the scope of the change to SOLR-8327 specifically by adding an optional, default-false option to getCollectionOrNull that allows a cached value to be used. Currently only HttpSolrCall uses it.
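To make the shape of that opt-in concrete, here is a rough sketch, not the actual patch. `CollectionLookup`, the `String` state type, the `fetch` function, and the fixed two-second TTL are all assumptions for illustration; only the idea of a boolean flag that defaults to false comes from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical reader; signatures are illustrative and do not mirror ZkStateReader.
class CollectionLookup {
  private static final long TTL_NANOS = 2_000_000_000L; // assumed: matches the 2 s batching default

  private static final class Entry {
    final String state;
    final long loadedAtNanos;
    Entry(String state, long loadedAtNanos) {
      this.state = state;
      this.loadedAtNanos = loadedAtNanos;
    }
  }

  private final Function<String, String> fetch; // assumed remote read; returns null if absent
  private final Map<String, Entry> cache = new ConcurrentHashMap<>();

  CollectionLookup(Function<String, String> fetch) {
    this.fetch = fetch;
  }

  // Existing callers keep the uncached behavior: the flag defaults to false.
  String getCollectionOrNull(String name) {
    return getCollectionOrNull(name, false);
  }

  // Callers that tolerate slightly stale state opt in explicitly.
  String getCollectionOrNull(String name, boolean allowCached) {
    if (allowCached) {
      Entry e = cache.get(name);
      if (e != null && System.nanoTime() - e.loadedAtNanos < TTL_NANOS) {
        return e.state;
      }
    }
    String fresh = fetch.apply(name);
    if (fresh != null) {
      cache.put(name, new Entry(fresh, System.nanoTime()));
    } else {
      cache.remove(name); // ConcurrentHashMap cannot store null values
    }
    return fresh;
  }
}
```

Keeping the default at false means only call sites that explicitly pass true can observe stale state, which limits the blast radius of the change.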

slackhappy · 2017-12-20T20:52:04Z

I updated my PR to target SOLR-8327 more specifically, and got the tests to pass. I think a smarter approach like that used by CloudSolrClient would be great. My understanding of the change in SOLR-10524 is that even the smartest, fastest updates of ZooKeeper data won't match the real-world state of the cluster in many situations, such as replica state changes, because those will be batched. Still, a smarter approach would certainly narrow that gap as much as possible, in addition to reducing the amount of state fetching.

ctargett · 2019-01-19T00:23:24Z

https://issues.apache.org/jira/browse/SOLR-8327 was released in Solr 7.3, so this PR can be closed.

ZkStateReader: cache LazyCollectionRef

e7c6a67

slackhappy changed the title from "ZkStateReader: cache LazyCollectionRef" to "ZkStateReader: cache LazyCollectionRef (SOLR-8327)" on Dec 19, 2017

ZkStateReader: optional caching of LazyCollectionRef

4cf3305

ctargett closed this Jan 19, 2019
