[FLINK-17016][runtime] Integrate pipelined region scheduling #13284

Merged

@zhuzhurk zhuzhurk commented Aug 31, 2020

What is the purpose of the change

This PR integrates pipelined region scheduling into DefaultScheduler.
A new config option, "jobmanager.scheduler.scheduling-strategy", controls whether to use the new "region" scheduling or the "legacy" eager/lazy-from-sources scheduling.
This change is based on #13181 and #13321.
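
As a rough illustration of how a string-valued option like this might be resolved, here is a minimal, self-contained sketch. It is not Flink's actual implementation: the class `SchedulingStrategyResolver`, the enum `Strategy`, and the method `resolve` are hypothetical names, and the config is modeled as a plain `Map` rather than Flink's configuration classes. Only the key and the two values "region" and "legacy" come from the description above. Falling back to "legacy" when the key is absent is an assumption, based on the later note in this conversation that region scheduling is not enabled by default in this PR.

```java
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a simplified stand-in, not one of Flink's classes. */
final class SchedulingStrategyResolver {

    /** Config key quoted in the PR description. */
    static final String KEY = "jobmanager.scheduler.scheduling-strategy";

    /** The two values named in the PR description. */
    enum Strategy { REGION, LEGACY }

    /**
     * Resolves the strategy from a flat key/value config.
     * Assumption: a missing key falls back to LEGACY.
     */
    static Strategy resolve(Map<String, String> config) {
        String raw = config.get(KEY);
        if (raw == null) {
            return Strategy.LEGACY;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "region":
                return Strategy.REGION;
            case "legacy":
                return Strategy.LEGACY;
            default:
                // Reject unknown values instead of silently picking a strategy.
                throw new IllegalArgumentException(
                    "Unsupported value for " + KEY + ": " + raw);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of()));              // prints LEGACY
        System.out.println(resolve(Map.of(KEY, "region"))); // prints REGION
    }
}
```

Rejecting unknown values up front is a design choice in this sketch; it surfaces a typo in the option as an error at startup rather than as an unexpected scheduling mode.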

Brief change log

  • Change scheduling strategy for ScheduleMode.LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST
  • Fix tests that broke due to insufficient slots

Verifying this change

  • Added integration tests

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 6e68f6b (Mon Aug 31 09:35:10 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The bot tracks review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Aug 31, 2020

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@zhuzhurk
Contributor Author

zhuzhurk commented Sep 1, 2020

The failed e2e test "Elasticsearch (v6.3.1) sink" is a known issue (FLINK-19093) that is unrelated to this change.
Re-run the tests.

Contributor

@azagrebin azagrebin left a comment

Thanks for opening the PR, @zhuzhurk.
I think it looks good. I left some smaller comments.

@zhuzhurk zhuzhurk force-pushed the FLINK_17016_enable_pipelined_scheduling branch 2 times, most recently from 21bf743 to e5a5f7d Compare September 2, 2020 11:30
@zhuzhurk zhuzhurk changed the title [FLINK-17016][runtime] Change blink planner batch jobs to run with pipelined region scheduling [FLINK-17016][runtime] Enable pipelined region scheduling Sep 3, 2020
@zhuzhurk zhuzhurk force-pushed the FLINK_17016_enable_pipelined_scheduling branch 6 times, most recently from 9ab00d3 to e95e0ab Compare September 8, 2020 09:58
@zhuzhurk zhuzhurk force-pushed the FLINK_17016_enable_pipelined_scheduling branch from e95e0ab to acc8ba0 Compare September 9, 2020 08:27
Contributor

@azagrebin azagrebin left a comment

Thanks for addressing the comments, @zhuzhurk.
The PR looks good to me, with only minor comments. There are test failures, though.

@@ -35,7 +35,7 @@
 import static org.junit.Assert.assertThat;

 /**
- * Tests for {@link DefaultSchedulerFactory}.
+ * Tests for {@link DefaultSchedulerComponents}.
  */
 public class DefaultSchedulerFactoryTest extends TestLogger {
Contributor

Suggested change
-public class DefaultSchedulerFactoryTest extends TestLogger {
+public class DefaultSchedulerComponentsFactoryTest extends TestLogger {

Contributor Author

done.

@@ -35,7 +35,7 @@
 import static org.junit.Assert.assertThat;

 /**
- * Tests for {@link DefaultSchedulerFactory}.
+ * Tests for {@link DefaultSchedulerComponents}.
Contributor

Suggested change
- * Tests for {@link DefaultSchedulerComponents}.
+ * Tests for {@link DefaultSchedulerComponents#createSchedulerComponents}.

Contributor Author

done.

import java.util.function.Consumer;

/**
* Components to create a {@link DefaultScheduler}.
Contributor

Suggested change
- * Components to create a {@link DefaultScheduler}.
+ * Components to create a {@link DefaultScheduler} which depend on the configured {@link JobManagerOptions#SCHEDULING_STRATEGY}.

Contributor Author

done.

@zhuzhurk zhuzhurk force-pushed the FLINK_17016_enable_pipelined_scheduling branch from 95fb9ba to b50f961 Compare September 10, 2020 08:45
@zhuzhurk zhuzhurk changed the title [FLINK-17016][runtime] Enable pipelined region scheduling [FLINK-17016][runtime] Integrate pipelined region scheduling Sep 10, 2020
@zhuzhurk
Contributor Author

The commits to "enabled pipelined region scheduling by default" have been dropped.
We will enable pipelined region scheduling by default in a separate task, FLINK-19189, after we have verified that the new scheduling works well.

@zhuzhurk zhuzhurk force-pushed the FLINK_17016_enable_pipelined_scheduling branch from b50f961 to e62b168 Compare September 10, 2020 09:00
…tegy

It can be enabled via config option "jobmanager.scheduler.scheduling-strategy=region"
@zhuzhurk zhuzhurk force-pushed the FLINK_17016_enable_pipelined_scheduling branch from e62b168 to a687e9a Compare September 10, 2020 12:34
Contributor

@azagrebin azagrebin left a comment

Thanks for addressing my comments, @zhuzhurk. LGTM.

@zhuzhurk
Contributor Author

Thanks for reviewing! @azagrebin
Merging.

@zhuzhurk zhuzhurk merged commit 4df2295 into apache:master Sep 11, 2020
4 participants