Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-14654: Connector classes should statically initialize with plugin classloader #13165

Merged
merged 13 commits into from May 25, 2023

Conversation

gharris1727
Copy link
Contributor

@gharris1727 gharris1727 commented Jan 25, 2023

The scanPluginPath -> getPluginDesc -> versionFor code path instantiates connectors in order to evaluate their version() method. This is the first call to initialize these classes, and so performs static initialization, which may be sensitive to the Thread Context Classloader. Currently the TCCL is just the app class loader, which may prevent the connector from discovering isolated resources.

Instead, add the loader swap in getServiceLoaderPluginDesc to getPluginDesc, in order to cover both the service-loaded and reflections-loaded classes, and in particular, initialize connectors with the correct TCCL.

Also add SamplingTestPlugin::allInstances which enables asserting the TCCL used for the initial constructor and version calls.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Signed-off-by: Greg Harris <greg.harris@aiven.io>
Signed-off-by: Greg Harris <greg.harris@aiven.io>
…in classloader

Signed-off-by: Greg Harris <greg.harris@aiven.io>
Signed-off-by: Greg Harris <greg.harris@aiven.io>
Copy link
Contributor

@mukkachaitanya mukkachaitanya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gharris1727! LG, with a minor non-blocking suggestion.

Copy link
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Greg, this looks good for the most part.

I'm wondering if we can get better coverage for DelegatingClassLoader::scanPluginPath. Right now we verify in PluginsTest::newConnectorShouldInstantiateWithPluginClassLoader that if we've initialized a Plugins instance, and we invoke Plugins::newConnector, the constructor for that connector is called with the correct context classloader. But it seems like this isn't very powerful since, if the constructor is invoked multiple times, the last invocation's classloader will be recorded--so in this case, we're really testing Plugins::newConnector and not the instantiations that are performed during plugin discovery.

Comment on lines 27 to 35
public static LoaderSwap use(ClassLoader loader) {
ClassLoader savedLoader = compareAndSwapLoaders(loader);
try {
return new LoaderSwap(savedLoader);
} catch (Throwable t) {
compareAndSwapLoaders(savedLoader);
throw t;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding static logic that invokes compareAndSwapLoaders is difficult to test, which was the motivation for KAFKA-14346. Can we try not to re-introduce that kind of static logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not re-introducing the static logic, it is just refactoring to eliminate the open-ended Plugins.compareAndSwap* methods.

This method is only called in two places: by DelegatingClassLoader.scanPluginPath (before scanning is finished) and Plugins.withClassLoader (after scanning is finished).

I've dropped the visibility and made the DCL call-site mock-able.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does still involve static logic for classloader swapping, though. And the comment about internal use doesn't seem very helpful since the way we use that term ("internal") has to do with public vs. private API; it's not really clear to people that (or why) they shouldn't just upgrade the visibility to public.

Ultimately I'd prefer to see this logic duplicated in two places (DelegatingClassLoader::withClassLoader and Plugins::withCLassLoader) rather than introduce a new API that might be misused in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted this change and left Plugins.compareAndSwapLoader unchanged.

plugins.compareAndSwapWithDelegatingLoader();
T config = createConfig(workerProps);
log.debug("Kafka cluster ID: {}", config.kafkaClusterId());
try (LoaderSwap loaderSwap = plugins.withClassLoader(plugins.delegatingLoader())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually incorrect; we want the delegating loader to remain the classloader even after this method exits (normally or exceptionally).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is a change in semantics, but that change is intentional. After this method completes, operations should not require the delegating loader and should be performed via the Connect handle. That handle only has methods for starting, stopping, and interacting with the REST API, all of which should internally handle setting the context classloader when appropriate.

The reason that I'm changing this is that I think the open-ended swap methods are an anti-pattern, and lead to unexpected behavior later in the caller thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here is some of the context for this change: #13165 (comment)

Since the elimination of compareAndSwap is technically unrelated to the title change, it could be moved out to it's own PR. Let me know if you'd like me to separate the two changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's brittle to change the context classloader back. Currently there's no additional logic that requires it, but we have a choice between adding the potential for bugs related to the context classloader and not adding it.

I get that the approach on trunk requires special treatment for integration tests, but since that's already a solved problem, I'd prefer to keep things as they are, especially since it's preferable to keep the risk in the testing portions of the code base over the main parts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted this change.

Comment on lines 92 to 94
// we should keep the original class loader and set it back after connector stopped since the connector will change the class loader,
// and then, the Mockito will use the unexpected class loader to generate the wrong proxy instance, which makes mock failed
private final ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep this since the change to AbstractConnectCli::startConnect is incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. I think that this is a symptom of the open-ended context classloader swap having unintended downstream effects. The existing fix is adequate, but is mostly addressing the symptom rather than the problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Discussed above)

Signed-off-by: Greg Harris <greg.harris@aiven.io>
@gharris1727
Copy link
Contributor Author

I'm wondering if we can get better coverage for DelegatingClassLoader::scanPluginPath. Right now we verify in PluginsTest::newConnectorShouldInstantiateWithPluginClassLoader that if we've initialized a Plugins instance, and we invoke Plugins::newConnector, the constructor for that connector is called with the correct context classloader. But it seems like this isn't very powerful since, if the constructor is invoked multiple times, the last invocation's classloader will be recorded--so in this case, we're really testing Plugins::newConnector and not the instantiations that are performed during plugin discovery.

Yeah this is a blind-spot in the existing tests. The "sampling" paradigm requires an instance of the object in order to perform the assertions, and the scanPluginPath implementation throws away the objects that it creates. The test does not and cannot assert that the TCCL is correct for the first version() call, for example.

In this specific case the regression test is still sensitive, because the static initialization happens when the plugin constructor is first called (not when the Class<?> object is created). This means that we can assert the TCCL used in the first constructor via the staticClassloader inspection.

I think the alternative would involve mocking/spying part of the scanPluginPath (such as versionFor), or keeping track of instantiated objects in SamplingTestPlugins, both of which seem messy, and would make this harder to refactor in the near future. Do you think this should be addressed now, or can it wait until the plugin path scanning refactor is landed?

@gharris1727 gharris1727 requested a review from C0urante May 23, 2023 23:08
@C0urante
Copy link
Contributor

C0urante commented May 24, 2023

That's a good point about the static initialization taking place directly before the constructor right now, but it's possible that other logic either directly from the Connect framework or from the Reflections library can cause static initialization to take place earlier than then.

I was thinking we could statically track context classloader instances for the SamplingConnector class across instantiations of that class, and then perform assertion on all of those instances about the correct classloader being set. This wouldn't give us perfect coverage across all plugin (or plugin discovery) types, but would at least harden us against changes to plugin discovery logic.

I have a local draft of this that I'd be happy to share if it's too much work. It's certainly not as clean as the existing logic but the tradeoff of coverage for cleanliness is worth it IMO.

@gharris1727
Copy link
Contributor Author

@C0urante I added a static list to all of the Sampling plugins that allow us to inspect the classloader used for all method calls to all instances of each plugin type. This should now perform the assertions you were describing.

Signed-off-by: Greg Harris <greg.harris@aiven.io>
Copy link
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Greg!

@C0urante C0urante merged commit dc00832 into apache:trunk May 25, 2023
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants