Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-35025][Runtime/State] Abstract stream operators for async state processing #24657

Merged
merged 3 commits into from Apr 17, 2024

Conversation

Zakelly
Copy link
Contributor

@Zakelly Zakelly commented Apr 12, 2024

What is the purpose of the change

As part of the async execution model of disaggregated state management, this PR gives the basic definition of StreamingOperator integrated with async execution.

The philosophy behind this PR is we are trying NOT to invade the previous code path. The most strict thing is that it MUST NOT affect any hot path without this feature.

By the way, the type parameter of record around AsyncExecutionController is erased, since the stream operator does not have the knowledge of record type, and the AEC does not need it.

Brief change log

  • Erase the type parameter of record around AsyncExecutionController
  • Define the AbstractAsyncStateStreamOperator and AbstractAsyncStateStreamOperatorV2 corresponding to the AbstracStreamOperator and AbstracStreamOperatorV2 respectively.
  • Wire the record processing logic with new introduced abstract stream operators in RecordProcessorUtils.

Verifying this change

This change around AEC is already covered by existing tests. Basic UTs for new operators are added.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Copy link
Collaborator

flinkbot commented Apr 12, 2024

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Copy link
Contributor

@yunfengzhou-hub yunfengzhou-hub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Left some comments as below.

*/
@Internal
@SuppressWarnings("rawtypes")
public abstract class AbstractAsyncStateStreamOperator<OUT> extends AbstractStreamOperator<OUT>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the async functionalities an abstract class would prevent the operator from extending from other abstract classes. For example, a RichMapFunction should be a subclass of AbstractUdfStreamOperator, and since it has extended from this abstract class, it cannot extend from AbstractAsyncStateStreamOperator at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The async processing starts variant at AbstractStreamOperator and AbstractStreamOperatorV2. My brief idea is, if some operator need integrate with the async processing, they should change to extends AbstractAsyncStateStreamOperator instead of AbstractStreamOperator. Moreover, the AbstractAsyncStateStreamOperator is fully compatible with AbstractStreamOperator, if we make AbstractAsyncStateStreamOperator#isAsyncStateProcessingEnabled configurable to false, everything is not changed compared with AbstractStreamOperator. That's what this PR trying to introduce.

Back to the AbstractUdfStreamOperator, it is a subclass of AbstractStreamOperator. We have two options, the first of which is a new introduced identical class AbstractAsyncStateUdfStreamOperator extending AbstractAsyncStateStreamOperator, the second is to make AbstractUdfStreamOperator extend AbstractAsyncStateStreamOperator. I'd prefer the former, we can discuss this in following PR.


@Override
@SuppressWarnings("unchecked")
public final <T> ThrowingConsumer<StreamRecord<T>, Exception> getRecordProcessor(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of introducing setAsyncKeyedContextElement, postProcessElement and getRecordProcessor, how about the following implementation?

@Override
public void processElement(StreamRecord<T> record){
        lastProcessContext =
                asyncExecutionController.buildContext(
                        record.getValue(), keySelector.getKey(record.getValue()));
        lastProcessContext.retain();
        asyncExecutionController.setCurrentContext(lastProcessContext);

        super.processElement(record);

        lastProcessContext.release();

        // getRecordProcessor is removed, since the implementation in RecordProcessUtils should be enough
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The processElement will be overriden by subclasses, not defined by the superclass. We should leave the processElement abstract and ship our implementation around this.


@Override
public final boolean isAsyncStateProcessingEnabled() {
// TODO: Read from config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to configure isAsyncStateProcessingEnabled globally on the whole Flink job instead of at operators' granularity, and in that case this method would be unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be and will be read from the global config. All operators will have the same option here.

Copy link
Contributor

@fredia fredia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zakelly Thanks for the PR, I left some comments, PTAL

@Zakelly
Copy link
Contributor Author

Zakelly commented Apr 15, 2024

@fredia @yunfengzhou-hub Thanks a lot for your review. I have made another commit to introduce the element order of processElement as FLIP-425 said. This is strictly internal and not exposed to users for now.

Copy link
Contributor

@fredia fredia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zakelly Thanks for the update, I left some minor comments.

Copy link
Contributor

@fredia fredia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, overall LGTM.

Copy link
Contributor

@yunfengzhou-hub yunfengzhou-hub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the PR. Left some minor comments as below.

@Zakelly Zakelly force-pushed the f35025 branch 2 times, most recently from f2107a2 to 576ad6a Compare April 17, 2024 04:03
Copy link
Contributor

@yunfengzhou-hub yunfengzhou-hub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! The changes on the stream operators LGTM.

@Zakelly
Copy link
Contributor Author

Zakelly commented Apr 17, 2024

Thanks @fredia and @yunfengzhou-hub for your detailed review! Really appreciate it. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants