-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-6755][CLI] Support manual checkpoints triggering #23679
Conversation
Hi, @pnowojski @masteryhx Would you please take a look? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Although I'm not the best person to review CLI changes, the change looks mostly good to me. I've left a couple of minor comments.
flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java
Outdated
Show resolved
Hide resolved
flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java
Outdated
Show resolved
Hide resolved
flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontendParser.java
Outdated
Show resolved
Hide resolved
If you want to trigger a full checkpoint while the job periodically triggering incremental checkpoints, | ||
please use the `--full` option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rephrase it to sth like:
By default this command triggers the same type of checkpoint that is configured for the given job. If you want override this behaviour, you can force either full or incremental checkpoint to be performed via
--full
or--incremental
flags respectively.
(code example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some digging, I found when a job is configured to run with rocksdb and incremental checkpoint disabled, the --incremental
flag could not take effect. That is because the RocksNativeFullSnapshotStrategy
does not take CheckpointOptions
into account when taking async snapshots. So since only --full
flag works when incremental checkpoint enabled, I only describe this usage here. But I'm ok with your phrasing.
Also, we should reconsider the incremental file-sharing chain in this senario. How does a manually triggered incremental checkpoint reuse the files from previous full checkpoints, which exist in chk-x
instead of shared
directory? I think we need a more flexible checkpoint directory layout. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhhh, ok. Then yes, this description works better.
But in that case, what happens when user selects --incremental
flag with configured HashMapStateBackend
or full RocksDB
? In that case the CLI command should ideally fail (exception probably would have to be thrown from somewhere around CheckpointCoordinator
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User selects --incremental
flag in this case will trigger a full checkpoint. This behavior align with the REST API. I think a possible way is to remove the --incremental
flag (or just remove it from the help messages and documents). After we really support triggering a incremental checkpoint with configured full checkpointing state backend, we add this flag back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, let's remove --incremental
flag from the CLI. Adding some runtime check somewhere would make also sense anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll create another ticket adding some runtime check and remove the INCREMENTAL flag from REST API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you post a JIRA ticket here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, here: https://issues.apache.org/jira/browse/FLINK-33498 . I will start working on this recently.
If you want to trigger a full checkpoint while the job periodically triggering incremental checkpoints, | ||
please use the `--full` option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you squash the latest two fixup commits into the first two commits:
[[FLINK-6755] Support manual checkpoints triggering from CLI](https://github.com/apache/flink/pull/23679/commits/c563fd15191137aafc548b0a19fc09a28833aff5)
@[Zakelly](https://github.com/Zakelly/flink/commits?author=Zakelly)
Zakelly committed 2 days ago
[[FLINK-6755] Docs about manual checkpoints triggering from CLI](https://github.com/apache/flink/pull/23679/commits/bd404c8dad2c78dc5106fde39e3f81a689b90c85)
? LGTM otherwise :)
Already done. Many thanks! |
flink-clients/src/main/java/org/apache/flink/client/cli/CheckpointOptions.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks! Merged. |
What is the purpose of the change
Since FLINK-24280, we have supported manual checkpoint triggering from REST API. This PR wire command-line interface and rest endpoint, supporting manual checkpoints triggering from CLI.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation