Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with SPI components (e.g., AnalysisSPILoader) when refreshing from different classloader and closing previous classloader #13101

Open
AndreyBozhko opened this issue Feb 13, 2024 · 4 comments
Assignees
Labels

Comments

@AndreyBozhko
Copy link
Contributor

AndreyBozhko commented Feb 13, 2024

Description

Applications like Solr allow users to reference token filters, tokenizers, etc. by their SPI names - and Lucene provides utility methods to discover and load these services.

The discovery relies on AnalysisSPILoader#reload(ClassLoader) method which scans the provided classloader's classpath, and loads and caches available services.

However, if this method is called again (with a different classloader), it makes no attempt to invalidate the cache of available services.

This means that if:

  • the cache already contains a mapping "myservice" -> org.example.MyServiceFactory, where MyServiceFactory class was defined by classloader1,
  • the reload method is called again with classloader2, so that
    • "myservice" is discovered again
    • MyServiceFactory class is defined again (but by classloader2)

then:

  • the AnalysisSPILoader cache will retain the MyServiceFactory class defined by classloader1 (and not classloader2).

This behavior places restrictions on how the applications can reload the SPI - and it does not work well with how Solr manages the lifecycle of its classloaders.

Specific example

Basically, whenever Solr adds jars to classpath at runtime, it replaces its current classloader with a new instance of URLClassLoader, closes the old classloader, and reloads the SPI.

So this may lead to a situation where the AnalysisSPILoader cache keeps a reference to MyServiceFactory class that was defined by classloader1 (which is now closed because Solr closed it and switched to classloader2).

As a result, when the application code retrieves MyServiceFactory by its SPI name, and the factory executes code which requires loading some other class from classloader1's classpath - the loading fails because Solr already invoked ((URLClassLoader) classloader1).close().

And from users's perspective, this issue manifests in rather cryptic ClassNotFoundException.

Version and environment details

Lucene 9.8.0

@uschindler uschindler self-assigned this Feb 14, 2024
@uschindler
Copy link
Contributor

This is a general problem of Service providers in Java.

As Lucene only has a single instance in the parent classloader (Lucene is part of Solr's classloader) it only has a single lookup to find analysis components.

Your proposal will cause the same problem if other cores. If you unload a core and it removes it's analysis components, the anal is in another core will fail from that point on.

To fix this correctly, Solr would need to have a separate Lucene impl per core. This is undoable. In general the per core classpaths in Solr are strange. Codecs, PostingsFormats, DocvaluesFormats and analysis components should be loaded in same classloader like Solr and Lucene itself.

There is no solution to this well known problem.

@AndreyBozhko
Copy link
Contributor Author

AndreyBozhko commented Feb 15, 2024

Thanks for taking a look @uschindler - I have to admit that I didn't fully grasp your comment about having a separate Lucene per core.

My original understanding was that for the example I described, the loading and reloading of SPI occurs during the Solr node (and not core) startup. So very early in Solr node's lifecycle, it would load all Lucene classes (makes total sense that these are per-node), and also load other libraries (user-provided plugins) shared at the node level.

But even if we only deal with shared classes, users can still encounter a ClassNotFoundException similar to the example above.

@HoustonPutman Please correct me if there's something that I misinterpreted about the issue from our initial discussion.

@AndreyBozhko
Copy link
Contributor Author

I do realize that the issue title may have sounded too prescriptive - I certainly didn't mean that. But I would definitely like to have a discussion on what path forward (if any) makes the most sense.

I realize that the feedback may be along the lines of "The behavior of Lucene's AnalysisSPILoader is set in stone, and Solr should just work around that".

Or, as a hypothetical alternative, Lucene could expose a way to invalidate the SPI cache, and applications like Solr may choose to use that (having full control, but also full responsibility over what they're doing).

Or perhaps there are other ways to go about this.

What are your thoughts?

@uschindler
Copy link
Contributor

uschindler commented Feb 15, 2024

Hi,
I think you should also open an issue in Apache Solr, because basically the problem is more on Solr's side. As said before th

I also thought yesterday evening about the same ide: Allowing to clear the cache. The problem with that is the following:

I could think of another idea:

  • Add another method or boolean parameter to SPI lookup classes (there are many: Codec, PostingsFormat, ....) to enforce cleaning the cache before refresh. Actually the whole thing must be somehow synchronized - I haven't looked at the code for long time.
  • Solr calls that method with the extra "clear=true" parameter.

I could live with that, but I have to state: It is just a workaround for something that is generally broken in Solr. JARs/plugins with Analysis components should be installed in the uber Solr classloader and not per core. This has already brought various problems with ICU plugin and hundreds of support issues of people that have no idea where to place the JAR file and which config files to change. It also caused hazard if different versions of same SPI components lurking in different cores. This would also cause major hazards here.

I tend to think that Solr should "deprecate" the per-Core classloader and have the "lib/" syntax not in solrconfig.xml but in the solr.xml that boots the Solr framework. This is why I think there should be a separate Solr issue.

The current behaviour of the existing method should not be changed, as it works like other SPI components seen in Java's core framework. All of them cache the loaded SPIs (XML components, PNG image loaders, filesystems,....). Lucene only has the option to find "new ones" not yet discovered (to add ones suddenly dropped in or for plugins running in separate classloaders like Elasticserach/Opensearch has). But those plugins should never be unloaded. Therefore removing no longer visible SPIs from the cache is against the usual conventions in Java world.

P.S.: The same problems you see with reloading web applications in Tomcat. That's the reason why nobody does that anymore (it was cool at around 2005, but never worked). Solr's classloader per core has the same problem like Tomcat.

@HoustonPutman @noblepaul I am not familar with recent plugin architecture: Do we have "Server level plugins in Solr already"? I'd really suggest to make sure that people can't load analysis or codec/postingsformat/... (+ soon to be coming vector similarities) SPI plugins for individual core via solrconfig.xml anymore, but have to install those plugins top-level in Solr's main class loader - hopefully with the new plugin architecture. I hope the plugin infrastructure allows that.

@uschindler uschindler changed the title AnalysisSPILoader should refresh cached service when a different classloader is used Problem with SPI comonents (e.g., AnalysisSPILoader) when refreshing from different classloader and closing previous classloader Feb 15, 2024
@uschindler uschindler changed the title Problem with SPI comonents (e.g., AnalysisSPILoader) when refreshing from different classloader and closing previous classloader Problem with SPI components (e.g., AnalysisSPILoader) when refreshing from different classloader and closing previous classloader Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants