[BEAM-5464] Use BATCH_FORCED as the default ExecutionMode for batch pipeline #6897

Merged
asfgit merged 1 commit into apache:master on Nov 1, 2018

@angoenka angoenka commented Oct 31, 2018

Use the BATCH_FORCED ExecutionMode for Flink batch pipelines to avoid a Flink scheduling deadlock.
Beam merges the chained ProcessBundles in ExecutionStage, so this should not add much overhead.
This also introduces a parameter to change the batch execution mode if needed.

Upstream JIRA for reference: https://issues.apache.org/jira/browse/FLINK-10672
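
For readers who want to see how such a parameter could be consumed, here is a minimal, dependency-free sketch. It assumes the usual Beam convention that a getter named getExecutionModeForBatch maps to a command-line flag named --executionModeForBatch. The class BatchModeFlagSketch and its nested enum are hypothetical stand-ins (the enum only mirrors the constant names of org.apache.flink.api.common.ExecutionMode); this is not the code added by the PR.

```java
// Hypothetical sketch: reads the batch execution mode from command-line flags.
final class BatchModeFlagSketch {

  // Stand-in mirroring the constant names of Flink's ExecutionMode enum.
  enum ExecutionMode { PIPELINED, PIPELINED_FORCED, BATCH, BATCH_FORCED }

  // Assumed flag name, derived from the getter getExecutionModeForBatch().
  static final String PREFIX = "--executionModeForBatch=";

  // Returns the mode named by the flag, or the fallback if the flag is absent.
  // An unknown mode name makes Enum.valueOf throw IllegalArgumentException.
  static ExecutionMode parse(String[] flags, ExecutionMode fallback) {
    for (String flag : flags) {
      if (flag.startsWith(PREFIX)) {
        return ExecutionMode.valueOf(flag.substring(PREFIX.length()));
      }
    }
    return fallback;
  }
}
```

With flags such as --runner=FlinkRunner --executionModeForBatch=BATCH_FORCED, the sketch returns BATCH_FORCED; without the flag it returns whatever fallback the caller supplies.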

@angoenka
Contributor Author

cc: @mxm @robertwb

Contributor

@mxm mxm left a comment

Thanks for the fix! It appears there is a scheduling bug for large pipelined jobs. It's great to have a workaround.

@Description(
    "Flink mode for data exchange for batch pipeline. "
        + "Reference {@link org.apache.flink.api.common.ExecutionMode}")
@Default.Enum("BATCH_FORCED")
Contributor

I'm a bit hesitant to change the default value here. This is also used by the non-portable FlinkRunner and the default is PIPELINED. We haven't heard from anyone having issues with the batch execution. I'd leave this at the Flink default until we have found out the exact issue.

Contributor

It would also be OK with me to add a link to a JIRA issue so that this can be investigated further.

Contributor Author

Sounds good, I will update the default.
Flink JIRA for reference: https://issues.apache.org/jira/browse/FLINK-10672

@Description(
    "Flink mode for data exchange for batch pipeline. "
        + "Reference {@link org.apache.flink.api.common.ExecutionMode}")
@Default.Enum("BATCH_FORCED")
ExecutionMode getExecutionModeForBatch();
Contributor

Can we add a test in PipelineOptionsTest?

Contributor Author

Sure

Contributor Author

The Default.Enum test is done in ProxyInvocationHandlerTest.java.
Please let me know if you are referring to some other test.

Contributor

I meant adding a test of the default values to PipelineOptionsTest. I'm adding it in the merge commit.
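
To illustrate what a default-value test checks, the following is a small, self-contained sketch of how an enum default declared by annotation can be resolved through a dynamic proxy. Everything here (DefaultEnumSketch, the DefaultEnum annotation, BatchOptions) is a hypothetical stand-in written for this note; it is not Beam's ProxyInvocationHandler or its @Default.Enum annotation, and it only approximates the idea. The default shown is PIPELINED, matching the value the thread settles on.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical sketch: resolves annotation-declared enum defaults via a proxy.
final class DefaultEnumSketch {

  // Stand-in mirroring the constant names of Flink's ExecutionMode enum.
  enum ExecutionMode { PIPELINED, PIPELINED_FORCED, BATCH, BATCH_FORCED }

  // Stand-in for an enum-default annotation; holds the constant name as text.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface DefaultEnum {
    String value();
  }

  // Options interface with a single getter and its declared default.
  interface BatchOptions {
    @DefaultEnum("PIPELINED")
    ExecutionMode getExecutionModeForBatch();
  }

  // Looks up the annotation on the getter and converts its text to the enum constant.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static Object defaultFor(Method method) {
    DefaultEnum annotation = method.getAnnotation(DefaultEnum.class);
    if (annotation == null || !method.getReturnType().isEnum()) {
      throw new IllegalStateException("No enum default declared on " + method.getName());
    }
    return Enum.valueOf((Class<Enum>) method.getReturnType(), annotation.value());
  }

  // Builds an options instance whose getters all return their declared defaults.
  static BatchOptions withDefaults() {
    return (BatchOptions) Proxy.newProxyInstance(
        BatchOptions.class.getClassLoader(),
        new Class<?>[] {BatchOptions.class},
        (proxy, method, args) -> defaultFor(method));
  }
}
```

A default-value test in this style would simply build the options without any arguments and assert that getExecutionModeForBatch() returns the expected constant.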

Contributor

Should this be getFlinkExecutionModeForBatch, as it seems rather Flink-specific?

Contributor

Considering the global namespace of the option names, that would be preferable. However, no other option has a Flink prefix so far. Perhaps we need to come up with a way to scope option names?

@angoenka angoenka force-pushed the tfx_add_execution_mode_option branch from 514f46a to 329e51f Compare October 31, 2018 17:42
@asfgit asfgit merged commit 329e51f into apache:master Nov 1, 2018
asfgit pushed a commit that referenced this pull request Nov 1, 2018
Contributor

@robertwb robertwb left a comment

Just to clarify, the resulting PR just enables the option, rather than setting it as the title states, right?

@mxm
Contributor

mxm commented Nov 5, 2018

> Just to clarify, the resulting PR just enables the option, rather than setting it as the title states, right?

Yes, it just makes it possible to set the option. The default remains unchanged (PIPELINED). We didn't want to change the behavior of the legacy FlinkRunner, and we want to investigate the issue further. As a next step we could set BATCH_FORCED for the portable Runner.
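
The possible next step mentioned above could look roughly like the sketch below: an explicit user choice always wins, and only the fallback differs between the portable and the legacy runner. This is purely illustrative. BatchModeDefaultsSketch, effectiveMode, and the portableRunner flag are hypothetical names, the enum is a stand-in for Flink's ExecutionMode, and nothing like this was merged as part of this PR.

```java
import java.util.Optional;

// Hypothetical sketch: picks a batch execution mode per runner flavor.
final class BatchModeDefaultsSketch {

  // Stand-in mirroring the constant names of Flink's ExecutionMode enum.
  enum ExecutionMode { PIPELINED, PIPELINED_FORCED, BATCH, BATCH_FORCED }

  // An explicitly configured mode is returned unchanged. Otherwise the portable
  // runner falls back to BATCH_FORCED and the legacy runner keeps PIPELINED.
  static ExecutionMode effectiveMode(boolean portableRunner, Optional<ExecutionMode> explicit) {
    if (explicit.isPresent()) {
      return explicit.get();
    }
    return portableRunner ? ExecutionMode.BATCH_FORCED : ExecutionMode.PIPELINED;
  }
}
```

Keeping the user's explicit choice ahead of any runner-specific fallback means the legacy FlinkRunner behavior stays untouched unless someone opts in.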
