Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAMZA-2498: [Job coordinator isolation] Classloader isolation for SamzaApplication.describe execution in job coordinator #1337

Closed
wants to merge 2 commits into from

Conversation

cameronlee314
Copy link
Contributor

Feature: https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation#SEP-24:Cluster-basedJobCoordinatorDependencyIsolation-HandlingSamzaApplication.describe

Changes:

  1. Add a method IsolatingClassLoaderFactory.buildSamzaApplicationClassLoader to construct a classloader for loading the SamzaApplication implementation for an app.
  2. If isolation is enabled, then call buildSamzaApplicationClassLoader to load the SamzaApplication implementation in the job coordinator.

Tests:
Deployed wikipedia-feed application (uses low-level API, has custom input descriptor, uses Kafka output descriptor) and wikipedia-parser application (uses low-level API, Kafka descriptors) in split deployment mode with some debug logs, and verified the following:

  1. TaskApplicationDescriptorImpl and KafkaSystemDescriptor came from the framework infrastructure classloader.
  2. WikipediaSystemDescriptor (custom descriptor) came from the application classloader.
  3. WikipediaFeedTaskApplication (impl of SamzaApplication) came from the newly built application describe classloader.
  4. Lambda which was used for the TaskFactory in WikipediaFeedTaskApplication came from the newly built application describe classloader, and it implemented the StreamTaskFactory interface which was loaded by the framework API classloader.
  5. Verified expected system-stream metadata from custom system (WikipediaSystemFactory)
    Deployed wikipedia-parser application (uses low-level API, Kafka descriptors) and verified the classloaders using debug logs.

API/usage changes: None

Comment on lines +50 to +51
* If {@code canApplyFrameworkIsolationIfEnabled} is true and isolation is enabled (in the config), then this will
* build the {@link SamzaApplication} using a special isolating classloader. Otherwise, this will use the current
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need two levels (this boolean & the config) to determine the behavior? Can we not infer from the config and then use the appropriate loader?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we only support using this classloader on the job coordinator. So we can't use this new flow on the processing containers nor RemoteApplicationRunner yet. Once we support split deployment on the processing containers and remove the need from RemoteApplicationRunner, then we can remove this flag. I will update the doc with that clarification.

// almost everything should come from the parent classloader
.addDelegatePreferredClassPredicate(new GlobMatcher("*"))
// blacklisting the SamzaApplication class from the parent so that it gets loaded by this classloader
.addBlacklistedClassPredicate(new GlobMatcher(samzaApplicationClassName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that by only blacklisting samzaApplicationClass, we end up class loading some of the dependencies of user code through parent class loader assuming both parent and application class loader are capable of classing loading that class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that is a good catch. I need to think through what the impact of that is.

@cameronlee314
Copy link
Contributor Author

This approach does not properly satisfy the isolation requirements. I will open up a separate PR with a different solution.

@cameronlee314 cameronlee314 deleted the jc_describe branch November 17, 2021 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants