[FLINK-9102][FLIP6] disable queued scheduling for JobGraph in Flip6LocalStreamEnvironment #5781

sihuazhou · 2018-03-28T12:24:55Z

What is the purpose of the change

When we start a cluster locally with a fixed number of TMs and build a stream job with Flip6LocalStreamEnvironment, we should disable queued scheduling for the JobGraph.

Brief change log

disable queued scheduling for JobGraph in Flip6LocalStreamEnvironment

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
The S3 file system connector: (no)

Documentation

no

sihuazhou · 2018-03-28T12:25:08Z

CC: @tillrohrmann

tillrohrmann · 2018-03-28T12:38:50Z

Hi @sihuazhou, I'm not sure whether this is possible, because the TMs might not be registered when you submit a job. Thus, we might need queued scheduling as well.
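To make the concern concrete, here is a minimal toy model of the race, not Flink code. `ToySlotPool`, `requestSlot` and `registerTaskManager` are hypothetical names invented for this sketch. It only shows that a request made before any TM has registered fails at once when queuing is disabled, while a queued request is served once a TM shows up.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Toy model (hypothetical, not Flink's SlotPool): slot requests either fail fast or wait in a queue. */
final class ToySlotPool {
    private int freeSlots = 0;
    private final Queue<CompletableFuture<String>> pending = new ArrayDeque<>();

    /** Requests one slot. Without queuing, the request fails if no slot is free right now. */
    CompletableFuture<String> requestSlot(boolean allowQueued) {
        if (freeSlots > 0) {
            freeSlots--;
            return CompletableFuture.completedFuture("slot");
        }
        CompletableFuture<String> future = new CompletableFuture<>();
        if (allowQueued) {
            pending.add(future); // wait for a TM to register
        } else {
            future.completeExceptionally(new IllegalStateException("no slot available"));
        }
        return future;
    }

    /** A TM registers (possibly after job submission) and offers its slots to pending requests first. */
    void registerTaskManager(int slots) {
        freeSlots += slots;
        while (freeSlots > 0 && !pending.isEmpty()) {
            freeSlots--;
            pending.poll().complete("slot");
        }
    }

    public static void main(String[] args) {
        ToySlotPool pool = new ToySlotPool();
        // The job is submitted before any TM has registered.
        CompletableFuture<String> eager = pool.requestSlot(false);
        CompletableFuture<String> queued = pool.requestSlot(true);
        pool.registerTaskManager(1);
        System.out.println("eager failed: " + eager.isCompletedExceptionally());
        System.out.println("queued fulfilled: " + (queued.isDone() && !queued.isCompletedExceptionally()));
    }
}
```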

sihuazhou · 2018-03-28T12:44:55Z

Hi @tillrohrmann, you are right, that seems to be impossible... but I think maybe we should at least fail the job immediately when the job's parallelism exceeds the max slots of the fixed cluster. What do you think?
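Roughly, the check I have in mind would look like the sketch below. `ParallelismCheck` and `checkFitsCluster` are hypothetical names, not existing Flink code, and the sketch makes two assumptions: the total slot count of the fixed cluster is known at submission time, and all vertices share slots, so the required slot count equals the largest vertex parallelism.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical fail-fast check; assumes the cluster's total slot count is known up front. */
final class ParallelismCheck {

    /** Largest parallelism among the job's vertices (equals required slots under full slot sharing). */
    static int maxParallelism(List<Integer> vertexParallelisms) {
        int max = 0;
        for (int p : vertexParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    /** Throws immediately if the job can never fit into the fixed cluster. */
    static void checkFitsCluster(List<Integer> vertexParallelisms, int totalSlots) {
        int required = maxParallelism(vertexParallelisms);
        if (required > totalSlots) {
            throw new IllegalArgumentException(
                "Job needs " + required + " slots but the cluster only has " + totalSlots);
        }
    }

    public static void main(String[] args) {
        checkFitsCluster(Arrays.asList(2, 4), 4); // fits, no exception
        try {
            checkFitsCluster(Arrays.asList(2, 8), 4);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```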

tillrohrmann · 2018-03-28T13:59:10Z

We don't know a priori how many TaskExecutors have been started by the startup script. It could also be the case that some of them will be started manually at a later point in time.

sihuazhou · 2018-03-28T14:25:54Z

Hi @tillrohrmann, it's a bit sticky. Do you have any idea for the current situation? That is, when we submit a job and its parallelism exceeds the max slots of the fixed cluster, we have to wait until the timeout (default 5 min)... This does not seem nice for the user, and if users submit the job in detached mode, they may even think that the job is running successfully.
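A small illustration of the waiting behaviour, as a sketch rather than Flink's actual implementation: `SlotRequestTimeoutDemo` and `awaitSlot` are hypothetical names, and the demo uses a 100 ms timeout in place of the 5 minute default. A request that can never be fulfilled is only reported as failed after the full timeout has elapsed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo: a pending slot request that nobody fulfills fails only after the timeout. */
final class SlotRequestTimeoutDemo {

    /** Blocks until the request completes or the timeout expires; returns true only if a slot arrived. */
    static boolean awaitSlot(CompletableFuture<String> request, long timeoutMillis) {
        try {
            request.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false; // timed out, or the request itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // No TM ever offers a slot, so this future is never completed.
        CompletableFuture<String> neverFulfilled = new CompletableFuture<>();
        long start = System.nanoTime();
        boolean gotSlot = awaitSlot(neverFulfilled, 100);
        long waitedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("got slot: " + gotSlot + ", waited about " + waitedMillis + " ms");
    }
}
```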

tillrohrmann · 2018-03-28T15:18:14Z

This is a good question and so far I couldn't find a good solution for it. We could have different slot timeouts for standalone vs non-standalone. But this might be a bit tricky to maintain. Moreover, this would be a kind of auto-magic, which in general should be avoided.

sihuazhou · 2018-03-28T15:33:01Z

Thanks a lot for your reply, closing this PR since it's invalid.

disable queued LocalStreamEnvironment.

343ffbb

sihuazhou closed this Mar 28, 2018

sihuazhou deleted the disableQueuedSchedulingForFlip6LocalStreamEnvironment branch June 7, 2018 02:43

rmetzger added the component=Runtime/Coordination label Mar 18, 2019
