Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-6606] Create checkpoint hook with user classloader #3933

Closed
wants to merge 1 commit into from

Conversation

EronWright
Copy link
Contributor

  • wrap calls to MasterTriggerRestoreHook (and its factory) such that the user classloader is applied

- wrap calls to MasterTriggerRestoreHook (and its factory) such that the user classloader is applied
@rmetzger
Copy link
Contributor

I haven't worked on this part of the code before, but I believe this code is +1 to merge.

@tillrohrmann
Copy link
Contributor

Thanks for your contribution @EronWright. I'm not quite sure whether I understand which problem we are trying to solve here.

I think by deserializing the MasterTriggerRestoreHook.Factories with the user code class loader in ExecutionGraphBuilder.java:253, we support user code hooks. Given that the Factory is a user defined class, then it should get the user code class loader set as its ClassLoader. Thus, it should also be able to load a user defined MasterTriggerRestoreHook class. And the latter can only be a user defined class if the factory is user defined.

But I might be overlooking something here. Maybe you can give me some more details about the PR.

@EronWright
Copy link
Contributor Author

@tillrohrmann the issue is with code that uses the thread's context classloader (TCCL), that may execute in Factory.create or later in the hook methods. For example, the Pravega connector uses grpc, which uses the TCCL during initialization (see ManagedChannelProvider).

Looking at other areas in Flink where usercode is called, I see that the TCCL is consistently set. For example, see InputFormatVertex, ExecutionJobVertex, and Task.

@rmetzger
Copy link
Contributor

Till and I had an offline discussion about this.
What you can do in your user code to set the TCCL properly is the following:

Thread.currentThread().setContextClassLoader(this.getClass().getClassLoader());

this points in user code to a class loaded through the user code classlaoder. So this.getClass().getClassLoader() is the CL you need to set for the thread.

@EronWright
Copy link
Contributor Author

That's the sort of workaround that Android apps are forced to use. Why not continue the practice of setting the TCCL in transitions from system to user code?

@rmetzger
Copy link
Contributor

Lets see what Till says.
In my opinion, both approaches are fine. What you are proposing makes things on Flink's side a bit more complicated, but the behavior is consistent with other places where user code is executed.

Till's proposal (so get the CL via the class) keeps our code straightforward, but users have to know it.

@tillrohrmann
Copy link
Contributor

I agree with @EronWright that it would be nice to offer all user code the a similar environment. This will become especially important if a MasterTriggerRestoreHook which is contained in the system class loader tries to load user code classes dynamically.

I'm wondering whether to follow the approach of the InputFormatVertex and ExecutionJobVertex to set the TCCL explicitly when calling the hooks. This could happen in MasterHooks#triggerMasterHooks for example. The advantage would be that we don't have to reset the TCCL for each hook. On the other hand, it harbours danger that we will forget to set the TCCL when calling the hooks from somewhere else in the future.

public String getIdentifier() {
Thread thread = Thread.currentThread();
ClassLoader originalClassLoader = thread.getContextClassLoader();
thread.setContextClassLoader(userClassLoader);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can move the setContextClassLoader into the try body. Just to be on the safe side that whenever something happens we will reset the original class loader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagee with that approach due to lack of symmetry. The try..finally idiom is typically written as:

open();
try { ... } finally { close(); }

In other words, the close statement is needed only if the open statement completes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True.

@EronWright
Copy link
Contributor Author

EronWright commented May 18, 2017

Originally I pursued the path suggested by @tillrohrmann of setting the TCCL at the sites where the hook is invoked, but found that the classloader wasn't readily available. I found it more expedient to create a wrapper that eagerly captured the classloader. This also improved testability.

@tillrohrmann
Copy link
Contributor

Yes that is true. Thanks for your work and the clarification @EronWright. Merging this PR.

asfgit pushed a commit that referenced this pull request May 19, 2017
…Hook

- wrap calls to MasterTriggerRestoreHook (and its factory) such that the user classloader is applied

This closes #3933.
@asfgit asfgit closed this in 2ad0816 May 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants