ZkStateReader: cache LazyCollectionRef (SOLR-8327) #294
SOLR-10524 introduced ZooKeeper state update batching, with a default interval of 2 seconds. That opens the door for a simple, time-based cache on the read side to address the issue described in SOLR-8327.
@@ -68,7 +68,7 @@
   public static final String QUEUE_OPERATION = "operation";

   // System properties are used in tests to make them run fast
-  public static final int STATE_UPDATE_DELAY = Integer.getInteger("solr.OverseerStateUpdateDelay", 2000); // delay between cloud state updates
+  public static final int STATE_UPDATE_DELAY = ZkStateReader.STATE_UPDATE_DELAY;
Moved so that I could access this setting in ZkStateReader, but left an alias here for locality.
   }

   @Override
-  public DocCollection get() {
+  public synchronized DocCollection get() {
I thought synchronized here would provide a good enough performance increase without the complexity of other approaches.
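For readers following along, here is a minimal sketch of the idea under discussion: a synchronized getter that reuses the last loaded value until a time-to-live has elapsed. The class `TimedCachedRef`, its `loader` supplier, and the `ttlMillis` parameter are hypothetical names for illustration only; this is not the code in the PR or in ZkStateReader.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical stand-in for a lazily loaded reference with a time-based cache. */
class TimedCachedRef<T> {
  private final Supplier<T> loader; // assumption: wraps the expensive read (e.g. a ZooKeeper fetch)
  private final long ttlNanos;      // how long a loaded value may be reused
  private long lastLoadNanos;       // System.nanoTime() reading taken at the last load
  private boolean loaded = false;
  private T cached;

  TimedCachedRef(Supplier<T> loader, long ttlMillis) {
    this.loader = loader;
    this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
  }

  /** synchronized so concurrent callers share a single reload instead of each fetching. */
  synchronized T get() {
    long now = System.nanoTime();
    // Compare the difference of two nanoTime readings, never the raw values.
    if (!loaded || now - lastLoadNanos > ttlNanos) {
      cached = loader.get();
      lastLoadNanos = now;
      loaded = true;
    }
    return cached;
  }
}
```

The trade-off in this sketch is that callers block on one another while a reload is in flight, which is the simplicity-over-throughput choice described in the comment above.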
This approach seems fine to me. Remind me why we use nanoTime vs. the normal clock? I'm sure you're right; I just want to refresh my brain.
Java programs are migrating to nanoTime instead of currentTimeMillis for elapsed time because many people have found that the latter will go backwards on occasion: it is not monotonic. nanoTime should be far less likely to go backwards. That undesirable behavior has been observed in the wild, but should be rare. Supposedly nanoTime is monotonic if the OS properly supports a monotonic clock. There's a lot of info out there about it: https://www.google.com/search?q=java+nanotime+monotonic The fact that nanoTime might produce elapsed times with greater accuracy than one millisecond is a bonus.
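As an illustration of the point above, an elapsed-time check with nanoTime might look like the following sketch. The class and method names (`ElapsedTime`, `hasExpired`) are hypothetical. The one detail that matters is subtracting the two readings before comparing, which the `System.nanoTime()` Javadoc recommends because the raw values can overflow.

```java
import java.util.concurrent.TimeUnit;

class ElapsedTime {
  /**
   * Returns true once at least timeoutMillis have elapsed between two
   * System.nanoTime() readings (startNanos taken first, nowNanos later).
   */
  static boolean hasExpired(long startNanos, long nowNanos, long timeoutMillis) {
    // Subtract first, then compare: the difference stays correct even if
    // the raw nanoTime values wrap around Long.MAX_VALUE.
    return nowNanos - startNanos >= TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
  }
}
```

A caller would record `long start = System.nanoTime();` once and later call `ElapsedTime.hasExpired(start, System.nanoTime(), 2000)`.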
Seems like there are some test failures due to this change: [junit4] Tests with failures [seed: DE1D5337E38D2C32]: Haven't looked into them yet, though.
I'll look into the test failures. I actually didn't mean to create the PR yet 😳
Limits the scope of the change to SOLR-8327 specifically by adding an optional, default-false option to getCollectionOrNull that allows a cached value to be used; currently only HttpSolrCall uses it.
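The opt-in shape described in this commit message can be sketched roughly as below. Everything here is an assumption for illustration: `StateLookupSketch`, `Ref`, and `register` are hypothetical, the state is modeled as a plain `String`, and the time-based expiry is left out to keep the focus on the default-false flag. It does not reproduce the actual Solr signatures.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class StateLookupSketch {
  /** Hypothetical per-collection reference that remembers the last fetched value. */
  static class Ref {
    private final Supplier<String> fetch; // assumption: the expensive state read
    private String last;

    Ref(Supplier<String> fetch) {
      this.fetch = fetch;
    }

    synchronized String get(boolean allowCached) {
      // Fetch fresh state unless the caller opted in and a value is already held.
      if (!allowCached || last == null) {
        last = fetch.get();
      }
      return last;
    }
  }

  private final Map<String, Ref> refs = new ConcurrentHashMap<>();

  void register(String name, Supplier<String> fetch) {
    refs.put(name, new Ref(fetch));
  }

  /** Existing callers keep the old behavior: always fetch fresh state. */
  String getCollectionOrNull(String name) {
    return getCollectionOrNull(name, false);
  }

  /** Callers that tolerate slightly stale state pass allowCached = true. */
  String getCollectionOrNull(String name, boolean allowCached) {
    Ref ref = refs.get(name);
    return ref == null ? null : ref.get(allowCached);
  }
}
```

Defaulting the flag to false means no existing call site changes behavior; only a caller that explicitly passes true sees cached state.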
I updated my PR to target SOLR-8327 more specifically, and got the tests to pass. I think a smarter approach like that used by CloudSolrClient would be great. My understanding of the change in SOLR-10524 is that even the smartest/fastest updates of ZooKeeper data won't match the real-world state of the cluster in many situations, such as replica state changes, because those will be batched. Still, a smarter approach would narrow that gap as much as possible, in addition to reducing the amount of state fetching.
https://issues.apache.org/jira/browse/SOLR-8327 was released in Solr 7.3, so this PR can be closed.
#294) LUCENE-10098: add note/link to GermanAnalyzer for decompounding nouns. We can't do this out of box with the analyzer, due to incompatible licenses. But we can make it easy on the user to do this, by linking to repo that has sample code, documentation, and the required data files.