[FLINK-5040] [jobmanager] Set correct input channel types with eager scheduling #2783

uce · 2016-11-10T16:33:36Z

With eager deployment, the locations of all intermediate streams/partitions are already known when a consumer of an intermediate stream/partition is scheduled. Nonetheless, we saw tasks with "unknown input channels" that were updated lazily at runtime. The cause was an incorrect producer execution state check, which required the producers to be in RUNNING or DEPLOYING state when the consumer input channels were created. The 2nd commit changes this check.
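To illustrate the kind of check involved, here is a minimal sketch, not the actual Flink code. All class, enum, and method names are hypothetical, and the set of states accepted by the relaxed check is an assumption made for illustration; the description above only states that requiring RUNNING or DEPLOYING was too strict.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; names and the RELAXED state set are assumptions,
// not taken from the Flink code base.
class InputChannelTypeSketch {

    enum ProducerState { CREATED, SCHEDULED, DEPLOYING, RUNNING, FINISHED }

    enum ChannelType { LOCAL, REMOTE, UNKNOWN }

    // The strict check described above: only these producer states
    // produced a resolved (known) input channel.
    static final Set<ProducerState> STRICT =
            EnumSet.of(ProducerState.DEPLOYING, ProducerState.RUNNING);

    // Assumed relaxed check: with eager scheduling the producer location
    // is already assigned, so earlier and later states are accepted too.
    static final Set<ProducerState> RELAXED =
            EnumSet.of(ProducerState.SCHEDULED, ProducerState.DEPLOYING,
                       ProducerState.RUNNING, ProducerState.FINISHED);

    // Returns UNKNOWN if the producer location is missing or its state is
    // not accepted; otherwise LOCAL or REMOTE depending on the hosts.
    static ChannelType resolve(ProducerState state,
                               String producerHost,
                               String consumerHost,
                               Set<ProducerState> accepted) {
        if (producerHost == null || !accepted.contains(state)) {
            return ChannelType.UNKNOWN;
        }
        return producerHost.equals(consumerHost) ? ChannelType.LOCAL : ChannelType.REMOTE;
    }
}
```

In this sketch, a producer that is scheduled on a known host but not yet deploying yields an UNKNOWN channel under the strict set and a REMOTE (or LOCAL) channel under the relaxed one, which mirrors the symptom described above.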

The 1st commit reverts a bogus fix made as part of FLINK-3232. That "fix" did not actually fix anything; instead, it doubled the number of schedule or update consumers messages we sent.

Furthermore, the 3rd commit changes the initial and max partition request backoff to 100 ms and 10 s, respectively. These numbers were hard-coded before. As a safety net for very slow deployments, the values can now be changed via the config. In practice, no user should need to change these config values.
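As a rough illustration of how an initial and a max backoff interact, here is a minimal sketch. The class and method names are hypothetical, and the doubling strategy is an assumption; the description above only fixes the two bounds (100 ms and 10 s).

```java
// Hypothetical sketch of a bounded partition request backoff.
// Doubling between retries is an assumption made for illustration.
class PartitionRequestBackoffSketch {

    private final int initialMs;
    private final int maxMs;
    private int currentMs; // 0 means no retry has happened yet

    PartitionRequestBackoffSketch(int initialMs, int maxMs) {
        if (initialMs <= 0 || maxMs < initialMs) {
            throw new IllegalArgumentException("need 0 < initialMs <= maxMs");
        }
        this.initialMs = initialMs;
        this.maxMs = maxMs;
    }

    // Advances to the next backoff. Returns false once the max backoff
    // has already been used, signalling that the caller should give up.
    boolean increase() {
        if (currentMs == 0) {
            currentMs = initialMs;
            return true;
        }
        if (currentMs < maxMs) {
            // Use long arithmetic so doubling cannot overflow an int.
            currentMs = (int) Math.min(2L * currentMs, (long) maxMs);
            return true;
        }
        return false;
    }

    int currentMs() {
        return currentMs;
    }
}
```

With the values from this change, `new PartitionRequestBackoffSketch(100, 10_000)` would step through 100, 200, 400, 800, 1600, 3200, 6400, and 10000 ms before `increase()` returns false.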

uce mentioned this pull request Nov 10, 2016

[backport] [FLINK-5040] [jobmanager] Set correct input channel types with eager scheduling #2784

Closed

uce force-pushed the eager_deployment branch from 9d186d9 to 96b2ee2 Compare November 10, 2016 17:40

uce added 3 commits November 10, 2016 22:59

Revert "[FLINK-3232] [runtime] Add option to eagerly deploy channels"

0d2e8b2

The reverted commit did not really fix anything, but hid the problem by brute force, sending many more schedule or update consumers messages.

[FLINK-5040] [jobmanager] Set correct input channel types with eager …

2742d5c

…scheduling

[FLINK-5040] [taskmanager] Adjust partition request backoffs

ca797d9

The back offs were hard coded before, which would have made it impossible to react to any potential problems with them.

uce force-pushed the eager_deployment branch from 96b2ee2 to ca797d9 Compare November 10, 2016 22:08

uce closed this Nov 11, 2016

uce deleted the eager_deployment branch February 16, 2017 09:27

rmetzger added the component=Runtime/Coordination label Mar 14, 2019
