[FLINK-20153] Add documentation for BATCH execution mode #14114

aljoscha · 2020-11-18T09:13:36Z

This adds documentation for the new BATCH execution mode. We also explain STREAMING execution mode because there is no central page that explains the basic behavior, so far.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public(Evolving): no
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
The S3 file system connector: no

Documentation

Documentation only.

flinkbot · 2020-11-18T09:16:22Z

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 70975eb (Wed Nov 18 09:16:21 UTC 2020)

Warnings:

Documentation files were touched, but no .zh.md files: Update Chinese documentation or file Jira ticket.
This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

❓ 1. The [description] looks good.
❓ 2. There is [consensus] that the contribution should go into Flink.
❓ 3. Needs [attention] from.
❓ 4. The change fits into the overall [architecture].
❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
@flinkbot approve all to approve all aspects
@flinkbot approve-until architecture to approve everything until architecture
@flinkbot attention @username1 [@username2 ..] to require somebody's attention
@flinkbot disapprove architecture to remove an approval you gave earlier

docs/dev/datastream_execution_mode.md

flinkbot · 2020-11-18T09:55:35Z

CI report:

48066f6 UNKNOWN
b2d734e UNKNOWN
919ff00 UNKNOWN
4d1d0d1 UNKNOWN
5fb8fb7 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run travis re-run the last Travis build
@flinkbot run azure re-run the last Azure build

aljoscha · 2020-11-18T11:37:48Z

Thanks for the speedy review! And I really appreciate the suggestions that I can just apply right here in GitHub.

I pushed a commit that should address most comments.

tillrohrmann

Thanks for creating this documentation PR @aljoscha. I had a couple of minor comments.

aljoscha · 2020-11-23T13:03:00Z

I addressed more comments and also added the important considerations section by Dawid. Could you please take another look?

aljoscha · 2020-11-23T13:27:16Z

I now also added sections by Klou and a section about state backends. Now all the content is theoretically in.

kl0u

Thanks for the work, @aljoscha. I left some comments in the PR. Feel free to integrate whichever you agree with.

sjwiesman

+1 to @kl0u's comments, plus a few others.

sjwiesman · 2020-11-23T15:58:33Z

docs/dev/datastream_execution_mode.md

+In the batch world though, we believe that such use-cases do not make much
+sense, as the input (both the elements and the control stream) are static and
+known in advance.


So what's the recommendation, to load the dataset in open?

I'm afraid so. But it's not that nice. We do want to add proper support for broadcast input in the next release, though.
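
For readers wondering what "load the dataset in open" could look like, here is a minimal plain-Java sketch of the pattern. It deliberately does not use the Flink API: `EnrichingMapper`, its `open` and `map` methods, and the country lookup table are hypothetical stand-ins that only mimic the lifecycle of a rich function (one `open` call before any record, then one `map` call per record). The table is hard-coded where a real job would presumably read a file or a database.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a rich map function; NOT a Flink class.
final class EnrichingMapper {
    // Static "control" data, loaded once instead of arriving as a broadcast stream.
    private Map<String, String> countryByCode;

    // Called once, before any record is processed.
    void open() {
        countryByCode = new HashMap<>();
        // Assumption: a real job would load this from a file or database here.
        countryByCode.put("DE", "Germany");
        countryByCode.put("FR", "France");
    }

    // Called once per record; enriches the record from the preloaded table.
    String map(String code) {
        if (countryByCode == null) {
            throw new IllegalStateException("open() must be called before map()");
        }
        return code + "=" + countryByCode.getOrDefault(code, "unknown");
    }

    public static void main(String[] args) {
        EnrichingMapper mapper = new EnrichingMapper();
        mapper.open();
        List<String> enriched = Stream.of("DE", "FR", "XX")
                .map(mapper::map)
                .collect(Collectors.toList());
        System.out.println(enriched);
    }
}
```

The drawback mentioned in the reply shows up even in this toy version: every parallel instance of the function would load its own full copy of the dataset, which is why proper broadcast input support is the nicer long-term answer.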

tillrohrmann

I had one more comment about choosing the BATCH vs. the STREAMING execution mode.

tillrohrmann · 2020-11-23T16:14:13Z

docs/dev/datastream_execution_mode.md

+As a rule of thumb, you should be using `BATCH` execution mode when your program
+is bounded because this will be more efficient. You have to use `STREAMING`
+execution mode when your program is unbounded because only this mode is general
+enough to be able to deal with continuous data streams.


Here it sounds as if it does not really matter whether to choose BATCH or STREAMING for a bounded job from a correctness perspective. However, the FileSink won't commit the in-progress files at the end of the program when using the STREAMING execution mode. It might be worthwhile to document this behaviour somewhere.

Yes, this is unfortunate. However, the fact that we cannot take checkpoints once at least one task has finished, which in turn means that we can't get a "final" checkpoint, has been a feature/bug of DataStream execution since the beginning. I wouldn't document it here, but we can think about adding it to a general "caveats" section. I'm sure there are other corner cases that are worth documenting 😅
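
To make the rule of thumb from the quoted hunk concrete, here is a small plain-Java model of the decision. `ModeChooser`, `Mode`, and `choose` are hypothetical names for illustration and not Flink classes. The sketch assumes that a program counts as bounded only when every one of its sources is bounded.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the rule of thumb; NOT Flink's own classes.
final class ModeChooser {
    enum Mode { BATCH, STREAMING }

    // Assumption: a program is bounded only if all of its sources are bounded.
    static Mode choose(List<Boolean> sourceIsBounded) {
        boolean allBounded = sourceIsBounded.stream().allMatch(b -> b);
        return allBounded ? Mode.BATCH : Mode.STREAMING;
    }

    public static void main(String[] args) {
        // Two bounded sources: BATCH is the efficient choice.
        System.out.println(choose(Arrays.asList(true, true)));
        // One unbounded source: only STREAMING can handle it.
        System.out.println(choose(Arrays.asList(true, false)));
    }
}
```

In Flink itself the mode is selected through the `execution.runtime-mode` setting or `StreamExecutionEnvironment.setRuntimeMode(...)`. As the FileSink remark above shows, running a bounded job in `STREAMING` mode is not purely a performance question, so the sketch captures only the efficiency side of the rule.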

tillrohrmann · 2020-11-23T16:30:31Z

docs/dev/datastream_execution_mode.md

+This is possible because inputs are bounded.  This pushes the cost more towards
+the recovery, but makes the regular processing cheaper, because it avoids
+checkpoints.


The last paragraph of the failure recovery section reads as if BATCH execution improves the overall execution time of jobs, but here it reads a bit differently. Concretely, it reads as if BATCH recoveries are more costly than STREAMING recoveries.

Good catch! @dawidwys what was the original intention here?

My intention was to briefly recap the batch failure recovery model. For that I actually reused the description from: https://ci.apache.org/projects/flink/flink-docs-master/concepts/stateful-stream-processing.html#state-and-fault-tolerance-in-batch-programs

With the description in the failure recovery section, we can probably drop the first paragraph and start with the second one:

It is important to remember that because there are no checkpoints, as described above, certain ...

BTW, shall we also update it in stateful-stream-processing.html#state-and-fault-tolerance-in-batch-programs? It is not easy to tell which model recovers "faster", as it very much depends on the state size, the number of records to replay, the number of tasks to recover, etc.

Oh boy, this text was first added in 2016: https://github.com/apache/flink/blame/b04d51a129a3341887e7a0866557c9871f58e94c/docs/concepts/concepts.md. I copied it to the current concepts section from there. That's not at all up-to-date anymore.

That section needs an overhaul or should be removed because it's also misleading for DataSet programs or Table/SQL batch programs.

For this documentation here I think we can go with @dawidwys's suggestion and just drop that paragraph because I added some text about that above.

dawidwys

+1 from my side for these changes

aljoscha · 2020-11-24T15:15:57Z

I believe I addressed all comments. Please take another look. If there's no objection I would merge this by tomorrow because this PR/discussion is growing a bit unwieldy. Anything else we want to add we can still add later.

aljoscha · 2020-11-25T14:18:35Z

Thanks for the reviews! I merged this now.

aljoscha requested review from alpinegizmo, sjwiesman, kl0u and dawidwys November 18, 2020 09:13

rmetzger added the review=description? label Nov 18, 2020

alpinegizmo reviewed Nov 18, 2020

dawidwys reviewed Nov 18, 2020

rmetzger added the component=API/DataStream label Nov 18, 2020

sjwiesman reviewed Nov 18, 2020

tillrohrmann reviewed Nov 20, 2020

kl0u reviewed Nov 23, 2020

sjwiesman reviewed Nov 23, 2020

tillrohrmann reviewed Nov 23, 2020

dawidwys approved these changes Nov 24, 2020

aljoscha and others added 5 commits November 24, 2020 17:11

[FLINK-20153] Add documentation for BATCH execution mode

0504bce

This adds documentation for the new `BATCH` execution mode. We also explain `STREAMING` execution mode because there is no central page that explains the basic behavior, so far.

[FLINK-20153] Add important considerations in execution mode docs

2660fad

[FLINK-20153] Describe time behaviour in execution mode docs

70c7463

[FLINK-20153] Add glossary entry for runtime execution mode

3c4905b

[FLINK-20302] Recommend DataStream API with BATCH execution mode in DataSet docs

f363079

aljoscha force-pushed the flink-20153-batch-documentation branch from 4d1d0d1 to f363079 Compare November 24, 2020 16:11

fixup! add license to svg

5fb8fb7

aljoscha closed this Nov 25, 2020

aljoscha deleted the flink-20153-batch-documentation branch November 26, 2020 12:56
