New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-7442] Add option for using child-first classloader for loading user code #4554
[FLINK-7442] Add option for using child-first classloader for loading user code #4554
Conversation
@StephanEwen Could you please have a look since this |
I took only a very brief look at this, but I am not totally sure whether the We have a working version of a ChildFirstClassLoader here, why not use that? Is that implementation suboptimal? Line 78 in fa11845
We should also never reference the System Classloader directly, it breaks all embedded setups or service architecture (OSGI) setups where you end up with hierarchical class loaders. My feeling is also that this does many changes that may not be necessary, like change the setup of the client, packaged program, etc. Passing the configuration through everything makes this change rather involved. I was wondering if it is not sufficient to simply let the TaskManager pass this as a flag to the library cache manager. Then you would not need to pass configs everywhere - the only ever config access is by the TaskManager or Task when it creates the classloader, and the config is available there anyways. Concerning class loader setup on the client - not sure if we should change this in the same PR. This is probably much less critical (the main method does not instantiate many of the dependencies) and that part changes so heavily with flip-6 already. Various setups may not even have separate classloaders on the client anyways, but everything is in the app class loader there. |
The system class loader stuff was leftover from an earlier version, I didn't mean to have that in there. In the first version I copied the Line 78 in fa11845
However, when I added the end-to-end tests I noticed that Akka was not working correctly anymore because it couldn't find some configuration stuff. (I'm running that again so that I can post the actual error messages) I also changed the class loader code on the client because I thought people might be just as likely to use their clashing dependencies there as in operators. In that case they would suffer from the same problems. We're, after all, also creating a user-code class loader here. I'm happy to be convinced otherwise, though. |
This is the exception I got with the class loader from https://github.com/apache/flink/blob/fa11845b926f8371e9cee47775ca0e48176b686e/flink-contrib/flink-statebackend-rocksdb/src/test/java/org/apache/flink/contrib/streaming/state/RocksDbMultiClassLoaderTest.java:
This is on the client but just seeing it makes me believe that that class loader could cause problems. |
Ah, I see - that is probably related to reading resources from a jar file (like the config), methods like I think that particular bug you encountered there is almost an argument to not change the logic on the client, yet. Since we would run the entire Flink code through that classloader as well, the implications are trickier than in the runtime task, where we only instantiate the user code functions. |
Probably, yes. I also don't know what other stuff this could break, though. For example, I don't know if queryable state, which starts netty (and Akka) stuff will still work with this. Unfortunately we don't have any end-to-end tests for that. |
I think what you mentioned is one more reason to not use this in too many places for now, but only inside the TaskManager / Tasks. Let's introduce that as a tool that users can use to resolve conflicts and gather some feedback before we pull that into client / queryableStateClient / etc... |
Yes, I created an alternative PR for that: #4564 My only worry is that all the tests in that PR would also pass with the other child-first class loader implementation from the RocksDB test, meaning that we don't actually have coverage for something that I discovered by having the class loader in the client. If we're fine with that I will close this PR and merge the other one once it's reviewed. 👌 |
What is the purpose of the change
This PR introduces a new core option (
classloader.resolve-order: child-first
) that allows using a child-first class loader for user code. The default is still to use a parent-first class loader.This also does a minor refactoring in the way the blob manager retrieves the cleanup interval. It's now also read from the
Configuration
, since we already have theConfiguration
for the class loader settings.Brief change log
Configuration
thought to all places where we previously created a user class loaderVerifying this change
This PR introduces new end-to-end tests that verify the new feature in a complete Flink workflow, including starting the program using
bin/flink run
.Does this pull request potentially affect one of the following parts:
This affects class loader, which is quite important to get right.
Documentation