[FLINK-24558][API/DataStream]make parent ClassLoader variable which c… #17521
Conversation
…reate by ClassLoaderFactory. Relevant issue: https://issues.apache.org/jira/projects/FLINK/issues/FLINK-24558
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community review your pull request. Automated checks: last check on commit 4f842b9 (Tue Oct 19 12:03:33 UTC 2021). Mention the bot in a comment to re-run the automated checks.
Please see the Pull Request Review Guide for a full explanation of the review process. The bot is tracking the review progress through labels, which are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
@baisui1981 checkstyle failed. You can run …
Also, I'd like to know who can be cc'ed for API/DataStream related changes now?
cc @AHeise from the mailing list discussion.
I have fixed it.
Hey, thank you very much for your contribution.
You can ping me. But looking at the approach, this is not really DataStream.
I don't think it is a good idea to give users direct access to this part of the code. It just yet again increases the API surface, and for a very important internal mechanism that we need to be able to change at a whim.

Furthermore, I don't see yet how this would solve the issue at hand. The proposed interface provides no differentiating factor that could be used to create different classloaders for each task (like the task ID). Even then, the classloader is shared across different tasks running on the same TM, so it must behave the same way? Given that they all have access to the same jars, I'm curious how the behavior is supposed to be different in the first place.

All in all, I think this needs way more discussion.
Hi @zentol, thanks for your reply. We are building a data center product based on Flink, expecting to integrate various third-party components provided to Flink. In my project I have written a class TISFlinClassLoaderFactory by extending the newly introduced interface of …
Thanks @zentol for the reminder. Maybe I should keep the parent classloader unchanged, because, as you say, it is shared across different tasks running on the same TM. Instead, the differentiating factor that can be used to create different classloaders is the plugin inventory stored in the jar manifest, which is submitted from the Flink client side as a parameter:

```java
public BlobLibraryCacheManager.ClassLoaderFactory buildServerLoaderFactory(
        FlinkUserCodeClassLoaders.ResolveOrder classLoaderResolveOrder,
        String[] alwaysParentFirstPatterns,
        @Nullable Consumer<Throwable> exceptionHandler,
        boolean checkClassLoaderLeak) {
    return new BlobLibraryCacheManager.DefaultClassLoaderFactory(
            classLoaderResolveOrder, alwaysParentFirstPatterns, exceptionHandler, checkClassLoaderLeak) {
        @Override
        public URLClassLoader createClassLoader(URL[] libraryURLs) {
            try {
                PluginManager pluginManager = TIS.get().getPluginManager();
                if (libraryURLs.length != 1) {
                    throw new IllegalStateException(
                            "length of libraryURLs must be 1, but now is: " + libraryURLs.length);
                }
                for (URL lib : libraryURLs) {
                    try (JarInputStream jarReader = new JarInputStream(lib.openStream())) {
                        Manifest manifest = jarReader.getManifest();
                        Attributes pluginInventory = manifest.getAttributes("plugin_inventory");
                        if (pluginInventory == null) {
                            throw new IllegalStateException(
                                    "plugin inventory can not be empty in lib: " + lib);
                        }
                        for (Map.Entry<Object, Object> pluginDesc : pluginInventory.entrySet()) {
                            pluginManager.dynamicLoadPlugin(String.valueOf(pluginDesc.getKey()));
                        }
                    }
                }
                return new TISChildFirstClassLoader(
                        pluginManager.uberClassLoader,
                        libraryURLs,
                        this.getParentClassLoader(),
                        this.alwaysParentFirstPatterns,
                        this.classLoadingExceptionHandler);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    };
}
```

@zentol, how about it? Could you give me some suggestions? Thanks.
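A customized factory builder like this would typically be discovered via ServiceLoader, as the PR description proposes. A minimal, hedged sketch of that lookup pattern (the `Builder` interface and names here are illustrative, not the exact PR interface):

```java
import java.util.Iterator;
import java.util.ServiceLoader;

public class FactoryDiscovery {
    // Hypothetical SPI mirroring the proposed ClassLoaderFactoryBuilder
    // (names are placeholders for illustration only).
    public interface Builder {
        String name();
    }

    static class DefaultBuilder implements Builder {
        public String name() { return "default"; }
    }

    // Load a user-supplied Builder registered under
    // META-INF/services/FactoryDiscovery$Builder, falling back to the
    // built-in default when no provider is on the classpath.
    static Builder lookup() {
        Iterator<Builder> it = ServiceLoader.load(Builder.class).iterator();
        return it.hasNext() ? it.next() : new DefaultBuilder();
    }

    public static void main(String[] args) {
        // No provider is registered in this sketch, so the default wins.
        System.out.println(lookup().name());
    }
}
```

Since the lookup degrades to the default factory when no provider file exists, the extension point is opt-in and existing deployments are unaffected.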
But the user-jar is the same for all tasks. You can't use a manifest in said jar for differentiating which classes should now be loaded / made accessible. You either a) need the CL factory to create a task-specific classloader (which you can't, because the jars are the same and CLs are shared) or b) need the cooperation of the user-code to load classes in a specific way.

We are painfully aware of the dependency conflicts between connectors; there are some ideas floating around to make connectors themselves work as plugins such that they each have their own classloader, but I'm not sure where exactly we are at on that front. I think we'd rather focus on that than introduce a separate mechanism for effectively the same use-case that would also be more difficult to use.
I must admit that I still haven't fully understood the use case and the solution. Each application bundles the connectors. Say application A is using Kafka 2.4 and application B is using Kafka 2.6. I can run both applications in the same cluster without any issues afaik, as both are executed in separate classloaders. That has been the state of Flink for a couple of versions already.

With your proposal, application A may now use multiple classloaders, but it can still only bundle one Kafka version. So I don't see the added value. We would need to add something like a jar-in-jar approach, where several Kafka versions are bundled. That could be implemented in the application jar entirely.

Then the question is: is it worth it? How often does one specific application have conflicting connector versions? For most use cases, it does not seem to matter. There are very large Flink cluster installations where the current approach seems to be sufficient. I'm probably overlooking something basic here.
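The per-application isolation mentioned above relies on child-first classloading: each application's loader looks in its own jars first and only falls back to the parent for classes it does not carry. A minimal sketch of that delegation pattern (not Flink's actual FlinkUserCodeClassLoaders implementation; the pattern list is an assumption for illustration):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of child-first delegation: application jars win over the parent,
// except for classes matching "always parent-first" patterns, which stay
// shared so core types remain compatible across loaders.
public class ChildFirstSketch extends URLClassLoader {
    private final String[] alwaysParentFirstPatterns;

    public ChildFirstSketch(URL[] urls, ClassLoader parent, String[] alwaysParentFirstPatterns) {
        super(urls, parent);
        this.alwaysParentFirstPatterns = alwaysParentFirstPatterns;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                // Parent-first patterns (e.g. "java.") always delegate upward.
                for (String pattern : alwaysParentFirstPatterns) {
                    if (name.startsWith(pattern)) {
                        return super.loadClass(name, resolve);
                    }
                }
                try {
                    c = findClass(name); // child (application jars) first
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, resolve); // fall back to parent
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        ChildFirstSketch cl = new ChildFirstSketch(
                new URL[0], ChildFirstSketch.class.getClassLoader(), new String[] {"java."});
        // With no child jars, both paths resolve through the parent.
        System.out.println(cl.loadClass("java.lang.String") == String.class);
    }
}
```

Two applications each built this way can carry different Kafka versions without interfering, because each version is resolved inside its own loader.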
@zentol Please forgive my negligence, I didn't explain the process clearly. The code is as below (FlinkTaskNodeController.java):

```java
Manifest manifest = new Manifest();
Map<String, Attributes> entries = manifest.getEntries();
Attributes attrs = new Attributes();
attrs.put(new Attributes.Name(collection.getName()), String.valueOf(timestamp));
// put Flink job name
entries.put(TISFlinkCDCStart.TIS_APP_NAME, attrs);
```

When the job is submitted to the server side, my customized extension point implementation, the method ClassLoaderFactoryBuilder.buildServerLoaderFactory, will extract the parameter from the submitted jar manifest, then pull the plugin bundles from the TIS plugin repository over HTTP, and initialize the PluginManager, which is responsible for class loading.
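The client-side write and server-side read of the manifest can be exercised end to end. A self-contained sketch of the round trip, assuming placeholder attribute names rather than the actual TIS constants:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ManifestRoundTrip {
    public static void main(String[] args) throws Exception {
        // Client side: store the app name / timestamp under a named
        // manifest section ("plugin_inventory" and "my_app" are placeholders).
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        Attributes attrs = new Attributes();
        attrs.put(new Attributes.Name("my_app"), "20211019");
        manifest.getEntries().put("plugin_inventory", attrs);

        ByteArrayOutputStream jarBytes = new ByteArrayOutputStream();
        new JarOutputStream(jarBytes, manifest).close();

        // Server side: read the named attributes back from the jar stream,
        // as the factory in buildServerLoaderFactory would.
        try (JarInputStream jarReader =
                new JarInputStream(new ByteArrayInputStream(jarBytes.toByteArray()))) {
            Attributes inventory = jarReader.getManifest().getAttributes("plugin_inventory");
            System.out.println(inventory.getValue("my_app"));
        }
    }
}
```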
@zentol @AHeise Thanks to both of you for having understood what I want to express. Driven by this mission, the issue mentioned above was found in the process of building this product. Because building a Flink job driven by a user-defined DSL in a production environment and submitting it is fully automated, problems such as dependency conflicts between connectors found in the process cannot be solved by manual intervention; they can only be solved by a classloader isolation facility. This is one of the ways I thought of to solve this kind of problem. For Flink, only a new extension point is added, which has no effect on the existing functions of Flink. Regarding the OCP principle, I think this is a good way to implement it, and it adds a new implementation option for users like me.
Sorry for coming back so late: I still don't quite understand how the flow is supposed to look. Let's say I have a user.jar that through the DSL depends on flink-kafka and flink-kinesis, both of which use incompatible Guava versions. Now with your approach, the same user.jar is loaded through 2 classloaders to access Kafka and Kinesis. Did I get that right?

What I fail to understand is how the incompatible versions of Guava are put into the user.jar. If I just shade without relocation, then one Guava version simply wins by overwriting the files. So you need to relocate Guava into kafka.guava and kinesis.guava. But at that point, I don't see the need for separate CLs anymore. So I'm missing a piece of information here.
Thanks @AHeise for your reply. Sorry, I had not explained a detail before. For better illustration, I drew a diagram as below. There is a prerequisite for the processing flow: we need to build a plugin repository based on the HTTP protocol, and the user needs to deploy the plugins to the repository in advance. There are four steps in the process:
I'm sorry, but that implementation sounds really unsafe and will only work for the simplest of use-cases. It's now a coin toss as to whether the correct Guava version will be loaded. Actually, if we take an application cluster, it is equivalent to just adding the connectors to … These kinds of problems are exactly why we don't want to expose this to users.
@zentol, I must admit that this situation exists, but whether this is a safety consideration can be left to the … For Flink, all it needs to do is add an …
Hi @baisui1981, would you mind checking the CI result 'src/main/java/org/apache/flink/runtime/execution/librarycache/ClassLoaderFactoryBuilder.java:[22,8] (imports) UnusedImports: Unused import: org.apache.flink.runtime.rpc.FatalErrorHandler.'? Thx.
@RocMarshal thanks for your reminder. I'm sorry for not having removed this unused import out of carelessness. I have fixed it.
@baisui1981 Thanks for the update. |
Thanks for your reminder. I will add some test cases for the change and fix the checkstyle errors.
I have made the fixes. Thanks for your reviews @RocMarshal.
@RocMarshal @zentol @AHeise PTAL, thx.
@RocMarshal @zentol @AHeise could you please give me some suggestions on how to continue processing this PR?
Will not be supported.
What is the purpose of the change

Make the server-side classloader created by BlobLibraryCacheManager.DefaultClassLoaderFactory pluggable, in order to make the parent classloader of ChildFirstClassLoader variable.

Brief change log

Introduce a new interface ClassLoaderFactoryBuilder which can be extended by the user; an instance of ClassLoaderFactoryBuilder is instantiated via ServiceLoader. In DefaultClassLoaderFactory, if a customized ClassLoaderFactoryBuilder instance can be loaded by the ServiceLoader, then class loading is delegated to it.

Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:

The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)

Documentation