
[FLINK-12729][state-processor-api] Add state reader for consuming non-partitioned operator state #8615

Closed
wants to merge 6 commits

Conversation

sjwiesman
Contributor

What is the purpose of the change

This is the initial PR for FLIP-43 adding the functionality to read non-partitioned operator state from a state snapshot.
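
For context, a rough sketch of how reading non-partitioned operator state is expected to look from the user's side, based on FLIP-43. The concrete class and method names here are assumptions and may differ from what is merged in this PR; the path and backend are placeholders.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;

public class ReadOperatorStateExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Load an existing savepoint or checkpoint from its path.
        ExistingSavepoint savepoint =
                Savepoint.load(env, "hdfs://path/to/savepoint", new MemoryStateBackend());

        // Read list-style (non-partitioned) operator state registered under
        // the given operator uid and state name.
        DataSet<Integer> listState =
                savepoint.readListState("my-operator-uid", "buffered-elements", Types.INT);

        // print() triggers execution of the batch job.
        listState.print();
    }
}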

Brief changelog

  • The first two commits expose internal APIs for use outside of their packages, but mark them as internal-use only.
  • The third commit adds the operator state reader functionality

Verifying this change

This change added tests and can be verified as follows:

  • Unit and IT tests

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
    (Full documentation coming in the keyed state reader PR)

@sjwiesman sjwiesman changed the title Flink 12729 [FLINK-12729][state-processor-api] Add state reader for consuming non-partitioned operator state Jun 4, 2019
@flinkbot
Collaborator

flinkbot commented Jun 4, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ✅ 1. The [description] looks good.
  • ✅ 2. There is [consensus] that the contribution should go into Flink.
  • ❗ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@sjwiesman
Contributor Author

@flinkbot attention @tzulitai

@tzulitai
Contributor

tzulitai commented Jun 5, 2019

@flinkbot approve description
@flinkbot approve consensus

@tzulitai tzulitai left a comment

Thanks for the great work @sjwiesman!
The comments I have are mostly minor, with only two more important ones: about classloaders, and about whether the work of reading the savepoint metadata can be done earlier and only once.

One thing to probably think about (maybe can be done as a follow-up PR):
Is the naming of Savepoint and ExistingSavepoint still sensible here, now that we renamed this series of work as "State Processor API"?

Otherwise, I have tried this manually and it works for both types of backends; I haven't discovered any problems so far. +1 to merge this once the comments are addressed.

ExistingSavepoint(ExecutionEnvironment env, String path, StateBackend stateBackend) {
this.env = env;
this.existingSavepoint = path;
this.stateBackend = stateBackend;
Contributor

nit: missing Preconditions.checkNotNull checks
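
For illustration, the constructor with the suggested checks could look roughly like this (a sketch only; the error messages are placeholders):

ExistingSavepoint(ExecutionEnvironment env, String path, StateBackend stateBackend) {
    // Fail fast with a descriptive message instead of a late NullPointerException.
    this.env = Preconditions.checkNotNull(env, "The execution environment must not be null");
    this.existingSavepoint = Preconditions.checkNotNull(path, "The savepoint path must not be null");
    this.stateBackend = Preconditions.checkNotNull(stateBackend, "The state backend must not be null");
}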

String uid,
String name,
TypeInformation<T> typeInfo,
TypeSerializer<T> serializer) {
Contributor

As a follow-up, we can probably think about a variant where the user simply passes in a TypeSerializer and no TypeInformation. In that case, could we just wrap the given serializer in a "dummy" type info?
I'm not entirely sure which methods of the TypeInformation are used by the batch processing API.
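
As an illustration of that serializer-only variant, a wrapper along these lines could expose a fixed TypeSerializer through the TypeInformation interface. The class name is hypothetical, and the constant arity/field counts are a guess at what the batch API would tolerate:

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.TypeSerializer;

/** Hypothetical TypeInformation that always hands back a user-supplied serializer. */
class SerializerBackedTypeInformation<T> extends TypeInformation<T> {

    private final TypeSerializer<T> serializer;
    private final Class<T> typeClass;

    SerializerBackedTypeInformation(TypeSerializer<T> serializer, Class<T> typeClass) {
        this.serializer = serializer;
        this.typeClass = typeClass;
    }

    @Override public boolean isBasicType() { return false; }
    @Override public boolean isTupleType() { return false; }
    @Override public int getArity() { return 1; }
    @Override public int getTotalFields() { return 1; }
    @Override public Class<T> getTypeClass() { return typeClass; }
    @Override public boolean isKeyType() { return false; }

    @Override
    public TypeSerializer<T> createSerializer(ExecutionConfig config) {
        // Ignore the ExecutionConfig and return a copy of the given serializer.
        return serializer.duplicate();
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof SerializerBackedTypeInformation
                && ((SerializerBackedTypeInformation<?>) obj).serializer.equals(serializer);
    }

    @Override public int hashCode() { return serializer.hashCode(); }
    @Override public boolean canEqual(Object obj) { return obj instanceof SerializerBackedTypeInformation; }
    @Override public String toString() { return "SerializerBackedTypeInformation(" + typeClass.getName() + ")"; }
}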

*/
public BroadcastStateInputFormat(String savepointPath, String uid, MapStateDescriptor<K, V> descriptor) {
super(savepointPath, uid, true);
this.descriptor = descriptor;
Contributor

nit: missing Preconditions.checkNotNull() check.

Contributor

Please check all other constructors for this problem; I think the issue occurs in multiple of them.

*
* @param <OT> The type of the input.
*/
abstract class OperatorStateInputFormat<OT> extends SavepointInputFormat<OT, OperatorStateInputSplit> {
Contributor

Missing @Internal annotation on this class.
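
That is, something like:

// requires: import org.apache.flink.annotation.Internal;
@Internal
abstract class OperatorStateInputFormat<OT> extends SavepointInputFormat<OT, OperatorStateInputSplit> {
    // existing implementation unchanged
}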

* @throws IOException If the savepoint path is invalid or the uid does not exist
*/
OperatorState getOperatorState() throws IOException {
final Savepoint savepoint = SavepointLoader.loadSavepoint(savepointPath);
Contributor

I've been thinking about whether this operation could be done earlier, when loading an ExistingSavepoint, and then maintained by the ExistingSavepoint class. Not entirely sure though, as that would require access to the DFS from the client side; I'm not sure how feasible that is in practice.

Not too big a deal, as this shouldn't be a heavy workload.
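
For illustration, a rough sketch of resolving the metadata once on the client and handing the result to the input formats. OperatorIDGenerator.fromUid is assumed here as the uid-to-OperatorID mapping used by this PR; names may differ:

// Load and resolve the savepoint metadata once, on the client.
Savepoint savepoint = SavepointLoader.loadSavepoint(savepointPath);

OperatorID operatorId = OperatorIDGenerator.fromUid(uid); // assumed uid -> OperatorID helper
OperatorState operatorState = savepoint.getOperatorStates().stream()
        .filter(state -> state.getOperatorID().equals(operatorId))
        .findFirst()
        .orElseThrow(() -> new IllegalArgumentException(
                "Savepoint does not contain state for uid: " + uid));

// The resolved OperatorState could then be handed to each OperatorStateInputFormat,
// instead of re-reading the metadata inside every format.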

.resolveCheckpointPointer(savepointPath);

try (DataInputStream stream = new DataInputStream(location.getMetadataHandle().openInputStream())) {
return Checkpoints.loadCheckpointMetadata(stream, NullClassLoader.INSTANCE);
Contributor

In the future, once we add type information of state to the savepoint metadata file, I don't think this NullClassLoader will be correct anymore. We would potentially need the actual user classloader, since the metadata would then contain user classes (e.g., user-implemented TypeSerializerSnapshots).

Contributor Author

There was one place where I couldn't get access to the user classloader, but since it wasn't being used I decided to go this route. However, if we do the reading on the client, the current thread's context class loader should be the user class loader, which makes this easier.
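
For reference, the client-side variant would then simply pass the context class loader where NullClassLoader.INSTANCE is used today (a sketch, assuming the loading happens where the context class loader is the user code class loader):

try (DataInputStream stream =
        new DataInputStream(location.getMetadataHandle().openInputStream())) {
    // On the client, the context class loader is the user code class loader.
    return Checkpoints.loadCheckpointMetadata(
            stream, Thread.currentThread().getContextClassLoader());
}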

@sjwiesman
Contributor Author

Thanks for the review @tzulitai!

I've addressed all the comments including moving the savepoint loader to run once on the client.

tzulitai added a commit to tzulitai/flink that referenced this pull request Jun 25, 2019
@asfgit asfgit closed this in 88d2e3c Jun 26, 2019
zentol pushed a commit to zentol/flink that referenced this pull request Jun 26, 2019