Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16615: Reinstate Jersey app-per-configset #1314

Merged

Conversation

gerlowskija
Copy link
Contributor

@gerlowskija gerlowskija commented Jan 25, 2023

https://issues.apache.org/jira/browse/SOLR-16615

Description

Currently, our Jersey integration involves creating an 'ApplicationHandler' class for each SolrCore. These classes are expensive and their creation can have a noticeable cumulative effect at startup time on nodes hosting many many cores. (See SOLR-16531 for more details.)

Solution

This PR addresses this problem by allowing ApplicationHandler's to be shared across multiple SolrCores, as long as each of those cores use the same configset. To do this we introduce JerseyAppHandlerCache, which takes over ownership of all core-level ApplicationHandler's. AH's are ref-counted so that they can be removed from the cache and closed when no longer needed by any SolrCore's.

A previous attempt at this functionality was committed but then reverted due to test failures. This new PR reinstates the functionality after fixing a few bugs and addressing some additional review feedback.

Tests

See diff for new JUnit tests.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

Previously reverted due to a large bug that crept in while resolving
some merge conflicts, this PR adds the app-per-configset code back.

Still TODO
  - address David's addtl review feedback on the original PR
@gerlowskija
Copy link
Contributor Author

(Credit to Kevin for pointing this out). Anyone familiar with the previous PR see what has changed since then with this link

@dsmiley
Copy link
Contributor

dsmiley commented Jan 25, 2023

I don't think the ApplicationHandler needs to be RefCount'ed because there is no intrinsic closing work to do (unlike SolrIndexSearcher & SolrCore, say). RefCount'ing is no fun -- it's awkward; not an API I reach for unless there's no better option. We just don't want to retain them in memory; that's all. There are already utilities for this -- CaffeineCache with weak references.

Still need to address a minor race condition around concurrent
inc/decref calls for JerseyAppHandlerCache entries.
@gerlowskija
Copy link
Contributor Author

I don't think the ApplicationHandler needs to be RefCount'ed because there is no intrinsic closing work to do (unlike SolrIndexSearcher & SolrCore, say). RefCount'ing is no fun -- it's awkward; not an API I reach for unless there's no better option. We just don't want to retain them in memory; that's all. There are already utilities for this -- CaffeineCache with weak references.

I'm not super familiar with CaffeineCache, but I'll look into it. Thanks for the suggestion @dsmiley

Two initial questions I'd have:

  1. Does it bring along any sort of overhead? I see it's tracking a bunch of metrics and historical data for cache entries, computing RAM usage estimates, etc. Does that stuff have any non-negligible cost? Seemed reasonable to ask as performance is the driving motivation for this PR...
  2. I understand the drive of switching from RefCounted to WeakReference. But doesn't that get us what we need right there? What does switching the enclosing data structure from ConcurrentHashMap to CaffeineCache get us - we don't need (or want?) cache-eviction or any of the other stuff that CC would bring, unless I'm missing some reason we'd need to evict? Feel like I'm missing some aspect of what you're suggesting...

@dsmiley
Copy link
Contributor

dsmiley commented Jan 25, 2023

  1. I believe its overhead is minimal; we use it already in Solr for many of our caches on the query path. We layer on our own metrics :-)
  2. CHM doesn't have weak values natively -- Caffeine supports this option. It's an internal detail, not something we wrap ourselves, so is similar to understand/review.

Allows a JerseyAppHandlerCache to present a much cleaner interface by
eliminating the need to RefCount entirely.  By removing RefCounting,
also fixes a race condition that David S pointed out between when items
were being fetched from a ConcurrentMap and when they were inc/decref'd.
@gerlowskija
Copy link
Contributor Author

Alright - perfect. This ended up being much much nicer I think!

I ended up using Caffeine.newBuilder directly instead of Solr's CaffeineCache wrapper. (CaffeineCache doesn't expose Caffeine's native support for weak-references afaict, and comes with a good bit of other baggage that we don't need here.). But it allows us to get rid of the ref-counting and all that by using "weak reference eviction"

In eliminating ref-counting, it also clears up the race condition that David mentioned here, so a win-win all around.

@gerlowskija
Copy link
Contributor Author

gerlowskija commented Jan 26, 2023

Thanks for the remaining fixes Houston. It LGTM, and I think addresses all of David's outstanding comments from the previous PR?

Running check locally now - would love to get this merged before the weekend!

Comment on lines 1191 to 1193
public String effectiveId(String configSetName) {
return configSetName + "-" + getName() + "-" + znodeVersion + "-" + rootDataHashCode;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels off. "Generates a String ID to represent the ConfigSet". No... this is on SolrConfig; it should be an ID to represent the ID of this SolrConfig. I'm skeptical configSetName even needs to be a part of this. Sure in practice it will will be by-configSet but as a by-product of configSets having different configs. If there is a reason ApplicationHandler needs to be by ConfigSet even if there may be identical solrconfig's (which I could imagine being so) then I think it's the caller of this method to concatenate the configSet name. Or build the configSet's identity into an existing field to reference so that it's not passed in (I less prefer).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the documentation was meant to say SolrConfig, not ConfigSet.

I'm skeptical configSetName even needs to be a part of this.

I'm also skeptical, but saw no harm in adding it.

then I think it's the caller of this method to concatenate the configSet name.

I did this originally then added the configSetName here. In general if we don't see a use case of sharing SolrConfigs between configSets, then I don't really see a need to make that a possibility. @gerlowskija can weigh in, but I don't really want to open that rats nest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical configSetName even needs to be a part of this.

I think it should be a part of the ID.

we don't see a use case of sharing SolrConfigs between configSets
agree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with it being as a part of the ID so long as configSet is a field in the SolrConfig so that it's more clearly a part of its identity. Adding as a parameter on this method is suspicious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical configSetName even needs to be a part of this

I didn't have a single specific reason in mind. It just seemed prudent from an "abundance of caution" perspective. Maybe some bug will crop up where we'd want to eliminate AH sharing, maybe a paranoid admin running a multi-tenant service would want to ensure sharing doesn't happen, etc.

That said I don't feel strongly about where the configset name gets added. I'll move it closer to the actual caching as suggested.

solr/core/src/java/org/apache/solr/core/SolrCore.java Outdated Show resolved Hide resolved
* <p>ApplicationHandler creation is expensive; caching these objects allows them to be shared by
* multiple cores with the same configuration.
*/
public class JerseyAppHandlerCache {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class now seems useless -- and that's a good thing :-). Less to name & document -- can be done so more succinctly in-context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerlowskija WDYT?

@@ -34,15 +34,15 @@
* performed on tbhis gives a new copy of the object with the changed value
*/
public class ConfigOverlay implements MapSerializable {
private final int znodeVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason why this rename was necessary ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because its no longer necessarily the znode version, it could be the file "version" for a file-system based configSet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a znode with version 0 is deleted and recreated quickly then it comes back with the same version and this missed updates before

solr/core/src/java/org/apache/solr/core/CoreContainer.java Outdated Show resolved Hide resolved
@@ -178,8 +179,12 @@ private class ResourceProvider implements Function<String, InputStream> {
ZkSolrResourceLoader.ZkByteArrayInputStream zkin =
(ZkSolrResourceLoader.ZkByteArrayInputStream) in;
zkVersion = zkin.getStat().getVersion();
hash = Objects.hash(zkin.getStat().getCtime(), zkVersion, overlay.getZnodeVersion());
hash = Objects.hash(zkin.getStat().getCtime(), zkVersion, overlay.getVersion());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is zkin.getStat().getCtime() used to compute hash? isn't zkVersion not enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, but that was the behavior before the PR. I'm unsure whether the hash computed here is even used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that a re-created file (removed then added) will have its zkVersion reset to 0. So incorporating CTime guards against this.

Comment on lines 1191 to 1193
public String effectiveId(String configSetName) {
return configSetName + "-" + getName() + "-" + znodeVersion + "-" + rootDataHashCode;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical configSetName even needs to be a part of this.

I think it should be a part of the ID.

we don't see a use case of sharing SolrConfigs between configSets
agree

.computeIfAbsent(
effectiveSolrConfigId,
() -> {
log.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add log.isDebugEnabled()`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Note we have checks that can tell when it's helpful and when it isn't. It isn't helpful in this circumstance because the interpolated arguments are plain references -- not method calls that might do expensive stuff. The log.isDebugEnabled is verbose IMO for too little gain.

@gerlowskija
Copy link
Contributor Author

Alright, I think I addressed everyone's comments? Lmk what you think when you guys get a chance...

@HoustonPutman
Copy link
Contributor

Wait this definitely deserves a CHANGES.txt entry if it doesn't already have one

@gerlowskija
Copy link
Contributor Author

I'm kindof on the fence about a CHANGES.txt entry. Why would a user ever care about this? What would they do with the information? (i.e. Yes this fixes a perf regression, but it's one that was never released.)

I'll draft up an entry just mentioning that the Jersey AppHandlers are now shared where possible - I'll push that up shortly and then merge this so that Jenkins can test away over the weekend. If folks have thoughts about the CHANGES.txt entry or additional review, happy to address that all.

@gerlowskija gerlowskija merged commit 73c938a into apache:main Jan 30, 2023
@gerlowskija gerlowskija deleted the SOLR-16615-reinstate-app-per-configset2 branch January 30, 2023 10:27
gerlowskija added a commit that referenced this pull request Feb 4, 2023
Previously reverted due to a large bug that crept in while resolving
some merge conflicts, this PR adds the app-per-configset code back.

Jersey 'ApplicationHandlers' are now shared by cores in the same JVM
with the same configset.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants