
[FLINK-3713] [clients, runtime] Use user code class loader when disposing savepoints #2083

Closed
wants to merge 3 commits

Conversation

uce
Contributor

@uce commented Jun 8, 2016

Disposing savepoints via the JobManager fails for state handles or descriptors that contain user classes (for example, custom folding state or RocksDB handles).

With this change, when disposing a savepoint the user has to either provide the job ID of a running job (in order to use that job's user code class loader) or provide the job JARs.

This version breaks the API as the CLI now requires either a JobID or a JAR. I think this is reasonable, because the current approach only works for a subset of the available state variants.

We can port this back for 1.0.4 and make the JobID or JAR arguments optional. What do you think?

I've tested this with a job using RocksDB state, both while the job was running and after it had terminated. This does not work with the current 1.0.3 version.

Ideally, we will get rid of the whole disposal business when we make savepoints properly self-contained. I'm going to open a JIRA issue with a proposal to do so soon.
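
For illustration, here is a minimal Java sketch (not the actual Flink code; all class and method names are hypothetical) of why disposal needs the user code class loader: the classes referenced by a savepoint, such as a custom fold function or the RocksDB backend classes, can only be resolved from the job JARs.

    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.List;

    // Illustrative sketch only; class and method names are hypothetical.
    public class UserCodeClassLoaderSketch {

        // Build a class loader that can resolve user classes from the job JARs.
        static ClassLoader fromJobJars(List<File> jobJars) throws Exception {
            URL[] urls = new URL[jobJars.size()];
            for (int i = 0; i < jobJars.size(); i++) {
                urls[i] = jobJars.get(i).toURI().toURL();
            }
            // Delegate to the parent class loader for Flink/JDK classes and to
            // the JAR URLs for user classes.
            return new URLClassLoader(urls, UserCodeClassLoaderSketch.class.getClassLoader());
        }

        // Disposal can only succeed if classes referenced by the savepoint
        // (e.g. a custom fold function or the RocksDB backend) resolve here.
        static void checkResolvable(ClassLoader userCodeLoader, String userClassName)
                throws ClassNotFoundException {
            Class.forName(userClassName, false, userCodeLoader);
        }
    }

Without the job JARs, the Class.forName call above is exactly where disposal fails with a ClassNotFoundException.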

@StephanEwen
Contributor

It seems very clumsy to dispose savepoints like that. If that is needed for a fix currently, then I guess we have to live with that.

It would make sense to make the jar and jobid optional, so that non-custom cases can dispose savepoints in the simple way.
Then, we could add the RocksDB state backend jar to the flink distribution lib folder, so that it is always available and would not need a jar or jobid for disposal. That would give the least user friction...

@uce
Contributor Author

uce commented Jun 11, 2016

Full ack, but the problem is not just RocksDB, but also savepoints that reference a user class in the state descriptor, including FS snapshots. So if you configure a file backend and use folding or reducing state, you run into the issue as well.

What about the following: make it optional and fail with a proper error message in case of a missing user code class. SubtaskState (previously StateForTask) currently only logs the exception when it fails to dispose the state snapshots. We can change this to rethrow, and then we can give a proper error message on failed savepoint disposal.
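
As a hedged illustration of that proposal (not the actual SubtaskState code; the state handle interface below is just a stand-in), the rethrow could look roughly like this:

    // Minimal sketch of the proposed error propagation; the stand-in interface
    // only mirrors the discardState() call and is not Flink's actual API.
    public class DisposalErrorSketch {

        interface StateHandleStandIn {
            void discardState() throws Exception;
        }

        static void discardOrExplain(StateHandleStandIn handle) throws Exception {
            try {
                handle.discardState();
            } catch (Exception e) {
                // Rethrow instead of only logging, so the failure reaches the user
                // with a hint about the missing user code class.
                if (e instanceof ClassNotFoundException
                        || e.getCause() instanceof ClassNotFoundException) {
                    throw new Exception("Could not dispose savepoint state because a user "
                            + "class is missing. Please provide the job JAR(s) when "
                            + "disposing the savepoint.", e);
                }
                throw e;
            }
        }
    }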

Syntax: savepoint [OPTIONS] <Job ID>
"savepoint" action options:
-d,--dispose <savepointPath> Disposes an existing savepoint.
-m,--jobmanager <host:port> Address of the JobManager (master) to which
Contributor

What happened to the jobmanager option?

Contributor Author

I think it should still be there, but let me check again.

@tillrohrmann
Contributor

Good work @uce :-)

I agree that it is currently a bit clumsy to discard savepoints of jobs which are no longer running. From the user's perspective it should be as easy as possible.

I also think that it would be a good idea to add the RocksDB jar to the flink-dist.jar, since all serious users are using it. Furthermore, I think it would be a good idea to not only log possible exceptions in SubtaskState but to communicate them back to the user. Thus, letting them bubble up and handling them at the JobManager level would be the right way to go.

I had some minor comments concerning test coverage and the way the job jars are constructed.

@uce
Contributor Author

uce commented Jun 27, 2016

Thanks for the review. I will propagate the errors, make the job ID/JAR arguments optional, and try to simplify parts of the CLI as you suggested.

Regarding including the RocksDB jar in dist, I think that should be handled as a separate issue.

uce added 2 commits July 6, 2016 14:16
…sing savepoint

Disposing savepoints via the JobManager fails for state handles or descriptors
that contain user classes (for example, custom folding state or RocksDB handles).

With this change, when disposing a savepoint the user has to either provide the
job ID of a running job (in order to use that job's user code class loader) or
provide the job JARs.

This version breaks the API as the CLI now requires either a JobID or a JAR. I think
this is reasonable, because the current approach only works for a subset of the
available state variants.
@uce
Contributor Author

uce commented Jul 6, 2016

I've addressed the comments:

  • Moved the JAR uploading to a helper:

    // Retrieve blob server address and upload JARs to server
    uploadJarFiles(ActorGateway, FiniteDuration, List<Path>);
    // Upload JARs to server 
    uploadJarFiles(InetSocketAddress, List<Path>);

    I did not remove the JobGraph upload-JARs method; it now calls these helpers. I decided against removing it, because it's called in multiple places with the same pattern (upload, get BLOB keys, set BLOB keys); a hedged usage sketch follows after this list.

  • I extract the libraries as you suggested (no need for the job graph now)

  • Removed the job ID variant, as it was overloading disposal too much, and stuck to the JAR variant, which is now optional (no script API break)

  • State disposal errors are now propagated, and a failed disposal with a ClassNotFoundException gives a hint to provide the JAR file

  • I've added an automated test with custom KV state in ClassLoaderITCase
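
For clarity, the following is a hedged sketch of the two-overload helper pattern from the first bullet; all names are placeholders and only model the call structure (resolve the blob server address via the JobManager, or upload directly to a known address, and return the BLOB keys).

    import java.net.InetSocketAddress;
    import java.nio.file.Path;
    import java.util.List;

    // Illustrative sketch only; none of these names are Flink APIs.
    public class JarUploadSketch {

        // Stand-in for the gateway used to ask the JobManager for the blob server address.
        interface JobManagerGatewayStandIn {
            InetSocketAddress requestBlobServerAddress() throws Exception;
        }

        // Stand-in for the blob client doing the actual upload, returning BLOB keys.
        interface BlobUploaderStandIn {
            List<String> upload(InetSocketAddress blobServer, List<Path> jars) throws Exception;
        }

        // Overload 1: retrieve the blob server address from the JobManager, then delegate.
        static List<String> uploadJarFiles(JobManagerGatewayStandIn jobManager,
                                           BlobUploaderStandIn uploader,
                                           List<Path> jars) throws Exception {
            InetSocketAddress blobServer = jobManager.requestBlobServerAddress();
            return uploadJarFiles(blobServer, uploader, jars);
        }

        // Overload 2: upload the JARs to a known blob server address.
        static List<String> uploadJarFiles(InetSocketAddress blobServer,
                                           BlobUploaderStandIn uploader,
                                           List<Path> jars) throws Exception {
            return uploader.upload(blobServer, jars);
        }
    }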

@uce
Contributor Author

uce commented Jul 12, 2016

If there are no objections, I would like to merge this.

@asfgit closed this in b67e508 Jul 18, 2016
@uce deleted the 3713-dispose_savepoint branch August 2, 2016 09:16