Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-30533][runtime] SourceOperator#emitNext() should push records to DataOutput in a while loop #21576

Merged
merged 1 commit into from Jan 4, 2023

Conversation

lindong28
Copy link
Member

@lindong28 lindong28 commented Dec 29, 2022

What is the purpose of the change

Improve Flink performance by reducing the average depth of the call stack needed to produce a record.

Note that this optimization is currently disabled for SourceOperator executed by either MultipleInputStreamTask or AbstractTwoInputStreamTask.

Brief change log

Updated SourceOperator#emitNext() to push records to the given DataOutput in a while loop. The loop will break when any of the following conditions are met:

  • The underlying source reader returns DataInputStatus#MORE_AVAILABLE
  • The mailbox used by the StreamTask has at least one mail.
  • SourceOperator is executed by either MultipleInputStreamTask or AbstractTwoInputStreamTask.

For the example program provided below, the following 4 function calls will be removed from the call stack needed to produce a record for most records.

  • StreamTask#processInput()
  • StreamOneInputProcessor#processInput()
  • StreamTaskSourceInput#emitNext()
  • SourceOperator#convertToInternalStatus()

Verifying this change

env.fromSequence(1, 1000000000L).map(x -> x).addSink(new DiscardingSink<>());

Here are the benchmark results obtained by running the above program with parallelism=1 and object re-use enabled. The results are averaged across 5 runs for each setup.

Prior to the proposed change, the average execution time is 46.1 sec with std=5.1 sec.
After the proposed change, the average execution time is 33.3 sec with std=0.9 sec.
The proposed change increases throughput by 38.4%.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Copy link
Collaborator

flinkbot commented Dec 29, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@lindong28
Copy link
Member Author

@flinkbot run azure

@lindong28 lindong28 force-pushed the FLINK-30533 branch 3 times, most recently from df885c0 to d26585f Compare December 30, 2022 09:48
@lindong28 lindong28 changed the title [FLINK-30533][runtime] IteratorSourceReaderBase#pollNext() should push records to ReaderOutput in a while loop [FLINK-30533][runtime] SourceOperator#emitNext() should push records to DataOutput in a while loop Dec 31, 2022
@lindong28
Copy link
Member Author

@xintongsong @zhuzhurk Can you review this PR? Thanks!

Copy link
Contributor

@xintongsong xintongsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening the PR, @lindong28. The changes LGTM. I have only 1 minor comment. +1 for merging.

Copy link
Contributor

@zhuzhurk zhuzhurk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR! @lindong28
I'm a bit uncertain about how much this change will affect the input selection of multi-input operators. Currently it switches inputs/sources each record. Looks to me this change may cause it to drain one input first before moving to the next input, when inputs are fast enough.

@lindong28
Copy link
Member Author

lindong28 commented Jan 3, 2023

@xintongsong @zhuzhurk Thanks for the review!

@zhuzhurk After reading through related source code, I believe this PR won't affect the behavior of multi-input operators, due to the following reasons:

  • PR only changes the behavior of SourceOperator#emitNext(...).
  • SourceOperator will only be executed by SourceOperatorStreamTask (see this and this)
  • SourceOperatorStreamTask only uses StreamOneInputProcessor (see this), which keeps processing the same input in a loop.

Does this answer your question?

@lindong28
Copy link
Member Author

@zhuzhurk You are right. Currently for SQL program executed in batch mode, the program might create a MultipleInputStreamTask that reads from multiple SourceOperator.

I agree it might potentially cause performance for MultipleInputStreamTask. More discussion and benchmarks are needed to understand whether we should make this optimization for MultipleInputStreamTask.

I have updated PR to exclude MultipleInputStreamTask and AbstractTwoInputStreamTask from this optimization. Can you take another look?

@lindong28 lindong28 force-pushed the FLINK-30533 branch 2 times, most recently from 027772f to d58e84c Compare January 4, 2023 10:07
Copy link
Contributor

@zhuzhurk zhuzhurk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments.
The change looks good to me except for one last minor comment.

…to DataOutput in a while loop

This optimization is currently disabled for TwoInputStreamTask and MultipleInputStreamTask.
@lindong28 lindong28 merged commit fb2722c into apache:master Jan 4, 2023
swuferhong pushed a commit to swuferhong/flink that referenced this pull request Jan 6, 2023
…to DataOutput in a while loop

This optimization is currently disabled for TwoInputStreamTask and MultipleInputStreamTask.

This closes apache#21576.
chucheng92 pushed a commit to chucheng92/flink that referenced this pull request Feb 3, 2023
…to DataOutput in a while loop

This optimization is currently disabled for TwoInputStreamTask and MultipleInputStreamTask.

This closes apache#21576.
akkinenivijay pushed a commit to krisnaru/flink that referenced this pull request Feb 11, 2023
…to DataOutput in a while loop

This optimization is currently disabled for TwoInputStreamTask and MultipleInputStreamTask.

This closes apache#21576.
@lindong28 lindong28 deleted the FLINK-30533 branch April 3, 2023 05:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants