KAFKA-14654: Connector classes should statically initialize with plugin classloader #13165

gharris1727 · 2023-01-25T18:05:35Z

The scanPluginPath -> getPluginDesc -> versionFor code path instantiates connectors in order to evaluate their version() method. This is the first call to initialize these classes, and so performs static initialization, which may be sensitive to the Thread Context Classloader. Currently the TCCL is just the app class loader, which may prevent the connector from discovering isolated resources.

Instead, add the loader swap used in getServiceLoaderPluginDesc to getPluginDesc as well, so that it covers both the service-loaded and reflections-loaded classes and, in particular, initializes connectors with the correct TCCL.

Also add SamplingTestPlugin::allInstances which enables asserting the TCCL used for the initial constructor and version calls.
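
The effect described above can be illustrated with a minimal, self-contained sketch. It is not code from this PR: the class, method, and resource names are hypothetical, and a temporary directory stands in for an isolated plugin location. It shows a lookup that goes through the thread context classloader (as static initializers in some libraries do) failing under the default loader and succeeding once the plugin's loader is swapped in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; none of these names come from Kafka Connect.
final class TcclResourceSketch {

    // Looks a resource up through the thread context classloader (TCCL),
    // the way code running during static initialization might.
    static boolean visibleViaTccl(String resource) {
        return Thread.currentThread().getContextClassLoader().getResource(resource) != null;
    }

    // Builds a loader over a temp directory that holds one resource,
    // standing in for an isolated plugin (the directory is not cleaned up).
    static URLClassLoader isolatedLoader(String resource) {
        try {
            Path dir = Files.createTempDirectory("plugin");
            Files.write(dir.resolve(resource), new byte[] {1});
            return new URLClassLoader(new URL[] {dir.toUri().toURL()},
                    TcclResourceSketch.class.getClassLoader());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the lookup with `loader` as the TCCL, then restores the previous TCCL.
    static boolean visibleWithSwap(ClassLoader loader, String resource) {
        Thread thread = Thread.currentThread();
        ClassLoader saved = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return visibleViaTccl(resource);
        } finally {
            thread.setContextClassLoader(saved);
        }
    }

    public static void main(String[] args) {
        String resource = "kafka-14654-sketch-plugin-only.txt";
        ClassLoader plugin = isolatedLoader(resource);
        System.out.println("visible without swap: " + visibleViaTccl(resource));
        System.out.println("visible with swap:    " + visibleWithSwap(plugin, resource));
    }
}
```

The try/finally keeps the swap scoped to the single call, so the caller's TCCL is unchanged afterwards.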

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

.../runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java

mukkachaitanya

Thanks @gharris1727! LG, with a minor non-blocking suggestion.

connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/Plugins.java

C0urante

Thanks Greg, this looks good for the most part.

I'm wondering if we can get better coverage for DelegatingClassLoader::scanPluginPath. Right now we verify in PluginsTest::newConnectorShouldInstantiateWithPluginClassLoader that if we've initialized a Plugins instance, and we invoke Plugins::newConnector, the constructor for that connector is called with the correct context classloader. But it seems like this isn't very powerful since, if the constructor is invoked multiple times, the last invocation's classloader will be recorded--so in this case, we're really testing Plugins::newConnector and not the instantiations that are performed during plugin discovery.

C0urante · 2023-05-23T19:27:47Z

connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/LoaderSwap.java

+    public static LoaderSwap use(ClassLoader loader) {
+        ClassLoader savedLoader = compareAndSwapLoaders(loader);
+        try {
+            return new LoaderSwap(savedLoader);
+        } catch (Throwable t) {
+            compareAndSwapLoaders(savedLoader);
+            throw t;
+        }
+    }


Adding static logic that invokes compareAndSwapLoaders is difficult to test, which was the motivation for KAFKA-14346. Can we try not to re-introduce that kind of static logic?

This is not re-introducing the static logic, it is just refactoring to eliminate the open-ended Plugins.compareAndSwap* methods.

This method is only called in two places: by DelegatingClassLoader.scanPluginPath (before scanning is finished) and Plugins.withClassLoader (after scanning is finished).

I've dropped the visibility and made the DCL call-site mock-able.

This does still involve static logic for classloader swapping, though. And the comment about internal use doesn't seem very helpful since the way we use that term ("internal") has to do with public vs. private API; it's not really clear to people that (or why) they shouldn't just upgrade the visibility to public.

Ultimately I'd prefer to see this logic duplicated in two places (DelegatingClassLoader::withClassLoader and Plugins::withClassLoader) rather than introduce a new API that might be misused in the future.

I've reverted this change and left Plugins.compareAndSwapLoader unchanged.
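
For readers following this thread, here is a rough sketch of what an instance-level, scoped swap could look like. It is an illustration only: SketchPlugins and ScopedSwap are hypothetical names, and this is not the LoaderSwap class or the Plugins API from the PR. The swap is created through an instance method, so a test can substitute the owning object, and the previous loader is restored when the try-with-resources block exits.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch; not the Plugins or LoaderSwap classes from the PR.
final class SketchPlugins {

    // Handle that restores the saved thread context classloader on close().
    static final class ScopedSwap implements AutoCloseable {
        private final ClassLoader saved;

        private ScopedSwap(ClassLoader saved) {
            this.saved = saved;
        }

        @Override
        public void close() {
            Thread.currentThread().setContextClassLoader(saved);
        }
    }

    // Instance method rather than a static helper: installs `loader` as the
    // TCCL and returns a handle that puts the previous loader back.
    ScopedSwap withClassLoader(ClassLoader loader) {
        Thread thread = Thread.currentThread();
        ClassLoader saved = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        return new ScopedSwap(saved);
    }

    public static void main(String[] args) {
        SketchPlugins plugins = new SketchPlugins();
        ClassLoader pluginLoader = new URLClassLoader(new URL[0], SketchPlugins.class.getClassLoader());
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        try (ScopedSwap swap = plugins.withClassLoader(pluginLoader)) {
            System.out.println("inside:  " + (Thread.currentThread().getContextClassLoader() == pluginLoader));
        }
        System.out.println("outside: " + (Thread.currentThread().getContextClassLoader() == before));
    }
}
```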

.../runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java

C0urante · 2023-05-23T19:42:24Z

connect/runtime/src/main/java/org/apache/kafka/connect/cli/AbstractConnectCli.java

-        plugins.compareAndSwapWithDelegatingLoader();
-        T config = createConfig(workerProps);
-        log.debug("Kafka cluster ID: {}", config.kafkaClusterId());
+        try (LoaderSwap loaderSwap = plugins.withClassLoader(plugins.delegatingLoader())) {


This is actually incorrect; we want the delegating loader to remain the classloader even after this method exits (normally or exceptionally).

I understand that this is a change in semantics, but that change is intentional. After this method completes, operations should not require the delegating loader and should be performed via the Connect handle. That handle only has methods for starting, stopping, and interacting with the REST API, all of which should internally handle setting the context classloader when appropriate.

The reason that I'm changing this is that I think the open-ended swap methods are an anti-pattern, and lead to unexpected behavior later in the caller thread.

Also here is some of the context for this change: #13165 (comment)

Since the elimination of compareAndSwap is technically unrelated to the title change, it could be moved out to its own PR. Let me know if you'd like me to separate the two changes.

I think it's brittle to change the context classloader back. Currently there's no additional logic that requires it, but we have a choice between adding the potential for bugs related to the context classloader and not adding it.

I get that the approach on trunk requires special treatment for integration tests, but since that's already a solved problem, I'd prefer to keep things as they are, especially since it's preferable to keep the risk in the testing portions of the code base over the main parts.

I've reverted this change.
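
The two semantics debated in this thread can be contrasted in a small sketch. The names below are hypothetical and only mirror the shape of the calls discussed; this is not the Connect startup code. An open-ended swap leaves the new loader installed on the calling thread after the method returns, while a scoped swap restores the previous one.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch; the method names only mirror the shape of the calls discussed here.
final class OpenEndedSwapSketch {

    // Open-ended swap: installs `loader` as the TCCL and returns the previous
    // one. Nothing restores it unless the caller does so explicitly.
    static ClassLoader compareAndSwap(ClassLoader loader) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        if (previous != loader) {
            thread.setContextClassLoader(loader);
        }
        return previous;
    }

    // Stand-in for a startup method that swaps and never swaps back.
    static void startOpenEnded(ClassLoader delegating) {
        compareAndSwap(delegating);
    }

    // Stand-in for a startup method that scopes the swap to its own body.
    static void startScoped(ClassLoader delegating) {
        ClassLoader previous = compareAndSwap(delegating);
        try {
            // ... work that needs the delegating loader would go here ...
        } finally {
            compareAndSwap(previous);
        }
    }

    public static void main(String[] args) {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        ClassLoader delegating = new URLClassLoader(new URL[0], original);

        startScoped(delegating);
        System.out.println("scoped restores caller's loader: "
                + (Thread.currentThread().getContextClassLoader() == original));

        startOpenEnded(delegating);
        System.out.println("open-ended keeps delegating loader: "
                + (Thread.currentThread().getContextClassLoader() == delegating));

        // A caller that needs the original loader back has to restore it itself.
        Thread.currentThread().setContextClassLoader(original);
    }
}
```

With the open-ended form, a caller that shares the thread (a test harness, for instance) has to save and restore the original loader on its own.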

C0urante · 2023-05-23T20:02:28Z

...ect/runtime/src/test/java/org/apache/kafka/connect/util/clusters/EmbeddedConnectCluster.java

-    // we should keep the original class loader and set it back after connector stopped since the connector will change the class loader,
-    // and then, the Mockito will use the unexpected class loader to generate the wrong proxy instance, which makes mock failed
-    private final ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();


I think we need to keep this since the change to AbstractConnectCli::startConnect is incorrect.

I disagree. I think that this is a symptom of the open-ended context classloader swap having unintended downstream effects. The existing fix is adequate, but is mostly addressing the symptom rather than the problem.

(Discussed above)

Signed-off-by: Greg Harris <greg.harris@aiven.io>

gharris1727 · 2023-05-23T23:08:09Z

> I'm wondering if we can get better coverage for DelegatingClassLoader::scanPluginPath. Right now we verify in PluginsTest::newConnectorShouldInstantiateWithPluginClassLoader that if we've initialized a Plugins instance, and we invoke Plugins::newConnector, the constructor for that connector is called with the correct context classloader. But it seems like this isn't very powerful since, if the constructor is invoked multiple times, the last invocation's classloader will be recorded--so in this case, we're really testing Plugins::newConnector and not the instantiations that are performed during plugin discovery.

Yeah, this is a blind spot in the existing tests. The "sampling" paradigm requires an instance of the object in order to perform the assertions, and the scanPluginPath implementation throws away the objects that it creates. The test does not and cannot assert that the TCCL is correct for the first version() call, for example.

In this specific case the regression test is still sensitive, because the static initialization happens when the plugin constructor is first called (not when the Class<?> object is created). This means that we can assert the TCCL used in the first constructor via the staticClassloader inspection.

I think the alternative would involve mocking/spying part of the scanPluginPath (such as versionFor), or keeping track of instantiated objects in SamplingTestPlugins, both of which seem messy, and would make this harder to refactor in the near future. Do you think this should be addressed now, or can it wait until the plugin path scanning refactor has landed?
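
The point about static initialization timing can be checked with a small sketch. It is a sketch under assumptions: LazyPlugin is a hypothetical stand-in for a connector class, and the helper names are invented for illustration. Obtaining the Class<?> object with initialize set to false does not run the static initializer, while the first constructor call does, and the initializer records whichever TCCL is active at that moment.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch; LazyPlugin stands in for a connector class.
final class StaticInitTimingSketch {
    static volatile boolean initialized = false;
    static volatile ClassLoader staticInitLoader = null;

    static final class LazyPlugin {
        static {
            // Runs once, when the class is first initialized.
            initialized = true;
            staticInitLoader = Thread.currentThread().getContextClassLoader();
        }
    }

    // Obtains the Class object without initializing it (initialize = false).
    static Class<?> loadWithoutInit() {
        try {
            return Class.forName(StaticInitTimingSketch.class.getName() + "$LazyPlugin", false,
                    StaticInitTimingSketch.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Instantiates the class with `loader` as the TCCL; the first
    // instantiation is what triggers static initialization.
    static Object instantiateWith(ClassLoader loader, Class<?> klass) {
        Thread thread = Thread.currentThread();
        ClassLoader saved = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return klass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        } finally {
            thread.setContextClassLoader(saved);
        }
    }

    public static void main(String[] args) {
        ClassLoader pluginLoader =
                new URLClassLoader(new URL[0], StaticInitTimingSketch.class.getClassLoader());
        Class<?> klass = loadWithoutInit();
        System.out.println("initialized after load:        " + initialized);
        instantiateWith(pluginLoader, klass);
        System.out.println("initialized after constructor: " + initialized);
        System.out.println("static init saw plugin loader: " + (staticInitLoader == pluginLoader));
    }
}
```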

C0urante · 2023-05-24T15:03:58Z

That's a good point about the static initialization taking place directly before the constructor right now, but it's possible that other logic either directly from the Connect framework or from the Reflections library can cause static initialization to take place earlier than that.

I was thinking we could statically track context classloader instances for the SamplingConnector class across instantiations of that class, and then perform assertions on all of those instances about the correct classloader being set. This wouldn't give us perfect coverage across all plugin (or plugin discovery) types, but would at least harden us against changes to plugin discovery logic.

I have a local draft of this that I'd be happy to share if it's too much work. It's certainly not as clean as the existing logic but the tradeoff of coverage for cleanliness is worth it IMO.
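
One way to picture the static tracking idea is the sketch below. It is not the draft mentioned above or the test plugin code in the PR: TrackedConnector, scanAndDiscard, and the other names are hypothetical. Each instance records the TCCL seen by its constructor and registers itself in a static list, so instances created and thrown away during discovery can still be inspected afterwards.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the sampling test plugins from the PR.
final class InstanceTrackingSketch {

    static final class TrackedConnector {
        private static final List<TrackedConnector> INSTANCES =
                Collections.synchronizedList(new ArrayList<>());

        // Captured before the constructor body runs.
        private final ClassLoader constructorLoader =
                Thread.currentThread().getContextClassLoader();

        TrackedConnector() {
            INSTANCES.add(this);
        }

        ClassLoader constructorLoader() {
            return constructorLoader;
        }

        // Snapshot of every instance created so far, including discarded ones.
        static List<TrackedConnector> allInstances() {
            synchronized (INSTANCES) {
                return new ArrayList<>(INSTANCES);
            }
        }

        static void reset() {
            INSTANCES.clear();
        }
    }

    // Creates a connector with `loader` as the TCCL and restores the TCCL afterwards.
    static TrackedConnector construct(ClassLoader loader) {
        Thread thread = Thread.currentThread();
        ClassLoader saved = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return new TrackedConnector();
        } finally {
            thread.setContextClassLoader(saved);
        }
    }

    // Simulates discovery: the instance is created and immediately dropped.
    static void scanAndDiscard(ClassLoader loader) {
        construct(loader);
    }

    static boolean allConstructedWith(ClassLoader expected) {
        for (TrackedConnector connector : TrackedConnector.allInstances()) {
            if (connector.constructorLoader() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ClassLoader pluginLoader =
                new URLClassLoader(new URL[0], InstanceTrackingSketch.class.getClassLoader());
        scanAndDiscard(pluginLoader);
        construct(pluginLoader);
        System.out.println("instances recorded:     " + TrackedConnector.allInstances().size());
        System.out.println("all used plugin loader: " + allConstructedWith(pluginLoader));
    }
}
```

Because the list is static, the assertion covers the first (discovery-time) instance as well as later ones, which a single sampled instance cannot do.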

gharris1727 · 2023-05-24T17:04:23Z

@C0urante I added a static list to all of the Sampling plugins that allows us to inspect the classloader used for all method calls to all instances of each plugin type. This should now perform the assertions you were describing.

Signed-off-by: Greg Harris <greg.harris@aiven.io>

C0urante

LGTM, thanks Greg!

gharris1727 added 4 commits January 25, 2023 09:59

Change SamplingTestPlugin to an interface so it can be mixed-in

d280853

Signed-off-by: Greg Harris <greg.harris@aiven.io>

Add test for classloading isolation with connector

330e1b1

Signed-off-by: Greg Harris <greg.harris@aiven.io>

KAFKA-14654: Connector classes should statically initialize with plug…

b2bfa30

…in classloader Signed-off-by: Greg Harris <greg.harris@aiven.io>

fixup: typo in javadoc

44ea45a

Signed-off-by: Greg Harris <greg.harris@aiven.io>

C0urante added the connect label Jan 30, 2023

Merge remote-tracking branch 'upstream/trunk' into kafka-14654-connec…

8694833

…tor-tccl

gharris1727 mentioned this pull request Mar 28, 2023

KAFKA-14863: Hide plugins with invalid constructors during plugin discovery #13467

Merged

3 tasks

mukkachaitanya reviewed Apr 10, 2023

.../runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/DelegatingClassLoader.java (outdated)

gharris1727 added 2 commits April 10, 2023 10:39

Replace all usages of compareAndSwapLoaders with safer LoaderSwap

0453005

Signed-off-by: Greg Harris <greg.harris@aiven.io>

Merge remote-tracking branch 'upstream/trunk' into kafka-14654-connec…

9e24304

…tor-tccl

mukkachaitanya approved these changes Apr 11, 2023

connect/runtime/src/main/java/org/apache/kafka/connect/runtime/isolation/Plugins.java (outdated)

C0urante reviewed May 23, 2023

fixup: hide LoaderSwap.use and do fine-grained swapping during scanning

a42b0fb

Signed-off-by: Greg Harris <greg.harris@aiven.io>

gharris1727 requested a review from C0urante May 23, 2023 23:08

gharris1727 added 4 commits May 24, 2023 09:19

Revert "fixup: hide LoaderSwap.use and do fine-grained swapping durin…

f773f83

…g scanning" This reverts commit a42b0fb.

Revert "Replace all usages of compareAndSwapLoaders with safer Loader…

86bfbb9

…Swap" This reverts commit 0453005.

add fine-grained swapping with compareAndSwapLoaders

df56290

Signed-off-by: Greg Harris <greg.harris@aiven.io>

Add instance tracking to test plugins

7d8011f

Signed-off-by: Greg Harris <greg.harris@aiven.io>

fixup: revert unnecessary outer swap

2768f78

Signed-off-by: Greg Harris <greg.harris@aiven.io>

C0urante approved these changes May 25, 2023

C0urante merged commit dc00832 into apache:trunk May 25, 2023
1 check failed
