Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-32514] Support configuring checkpointing interval during process backlog #22931

Merged
merged 1 commit into from
Aug 24, 2023

Conversation

yunfengzhou-hub
Copy link
Contributor

What is the purpose of the change

This pull request adds support that a Flink job can change its checkpointing interval depending on the process backlog status of the job.

Brief change log

  • Adds configuration execution.checkpointing.interval-during-backlog
  • Adds public method SplitEnumeratorContext#setIsProcessingBacklog
  • Makes HybridSource reports processing backlog depending on source stages
  • Supports changing checkpoint interval based on process backlog in CheckpointCoordinator

Verifying this change

This change added tests and can be verified as follows:

  • Added integration tests for changing checkpointing intervals.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs & JavaDocs

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 3, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@yunfengzhou-hub yunfengzhou-hub force-pushed the FLINK-32514 branch 2 times, most recently from 57bf840 to 581e548 Compare July 3, 2023 10:49
@yunfengzhou-hub yunfengzhou-hub marked this pull request as ready for review July 3, 2023 10:50
@yunfengzhou-hub
Copy link
Contributor Author

@flinkbot run azure

@yunfengzhou-hub
Copy link
Contributor Author

@lindong28 could you please take a look at this PR?

Copy link
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Left some comments below.

@yunfengzhou-hub yunfengzhou-hub force-pushed the FLINK-32514 branch 3 times, most recently from bc7f391 to 4fe5d1b Compare July 20, 2023 02:16
@yunfengzhou-hub
Copy link
Contributor Author

@flinkbot run azure

1 similar comment
@yunfengzhou-hub
Copy link
Contributor Author

@flinkbot run azure

Copy link
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Left some comments below.

@yunfengzhou-hub yunfengzhou-hub force-pushed the FLINK-32514 branch 3 times, most recently from 17f9129 to 1039b03 Compare July 21, 2023 08:27
Copy link
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Left some comments below.

@yunfengzhou-hub yunfengzhou-hub force-pushed the FLINK-32514 branch 2 times, most recently from c521c4e to f0b9322 Compare July 24, 2023 08:22
@yunfengzhou-hub yunfengzhou-hub force-pushed the FLINK-32514 branch 6 times, most recently from 2dde6a8 to 3a28ae2 Compare July 31, 2023 05:42
Copy link
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Left some comments below.

@yunfengzhou-hub yunfengzhou-hub force-pushed the FLINK-32514 branch 4 times, most recently from 0725f05 to 10e471b Compare July 31, 2023 09:03
@yunfengzhou-hub
Copy link
Contributor Author

@flinkbot run azure

Copy link
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Left some comments below.

@yunfengzhou-hub yunfengzhou-hub force-pushed the FLINK-32514 branch 2 times, most recently from 18a7adc to d6c7573 Compare July 31, 2023 11:00
Copy link
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Left some comments below.

@yunfengzhou-hub yunfengzhou-hub force-pushed the FLINK-32514 branch 3 times, most recently from 91c04e6 to 5da3732 Compare August 2, 2023 00:42
@lindong28
Copy link
Member

Thanks for the update. LGTM.

@lindong28 lindong28 merged commit 615e824 into apache:master Aug 24, 2023

context.lazyInitialize(globalFailureHandler, mainThreadExecutor);
context.lazyInitialize(globalFailureHandler, mainThreadExecutor, checkpointCoordinator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lindong28 @yunfengzhou-hub Is initializing the OperatorCoordinatorHolder twice done deliberately here? 🤔

Copy link
Member

@lindong28 lindong28 Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XComp Thanks for catching this. No, we didn't do it intentionally. I believe it is introduced when Yunfeng rebased the PR. I didn't catch this issue because I didn't go over the entire PR end-to-end very carefully when there is only minor remaining comments in the last 2 rounds of review.

It seems that the extra invocation of lazyInitialize() would not introduce any visible performance or correctness issue. Maybe one of us can fix it in our next PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarification. Doing it as a hotfix commit in one other PR makes sense because of the reasons you mentioned. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants