[FLINK-9102][FLIP6] disable queued scheduling for JobGraph in Flip6LocalStreamEnvironment #5781

sihuazhou
Contributor

What is the purpose of the change

When we start a cluster locally with a fixed number of TMs and build a streaming job with Flip6LocalStreamEnvironment, we should disable queued scheduling for the JobGraph.

Brief change log

  • Disable queued scheduling for the JobGraph in Flip6LocalStreamEnvironment

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • no

@sihuazhou
Contributor Author

CC: @tillrohrmann

@tillrohrmann
Contributor

Hi @sihuazhou, I'm not sure whether this is possible, because the TMs might not be registered when you submit a job. Thus, we might need queued scheduling as well.

@sihuazhou
Contributor Author

Hi @tillrohrmann, you are right, that seems to be impossible... but I think maybe we should at least fail the job immediately when the job's parallelism exceeds the max slots of the fixed cluster. What do you think?
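
To make the proposal concrete, here is a minimal sketch of such a fail-fast check. It is not Flink code: `SlotCapacityCheck`, `InsufficientSlotsException`, and both methods are hypothetical names, and the sketch assumes that the total slot count of the cluster is known at submission time and that all operators share one slot sharing group, so the job needs as many slots as its highest operator parallelism.

```java
/** Hypothetical fail-fast check for a fixed-size cluster; not Flink API. */
final class SlotCapacityCheck {

    /** Hypothetical exception: the job can never fit into the fixed cluster. */
    static final class InsufficientSlotsException extends RuntimeException {
        InsufficientSlotsException(String message) {
            super(message);
        }
    }

    /** Total slots of a fixed cluster: TaskManagers times slots per TaskManager. */
    static int totalSlots(int numTaskManagers, int slotsPerTaskManager) {
        return Math.multiplyExact(numTaskManagers, slotsPerTaskManager);
    }

    /**
     * Rejects the job right away if its highest operator parallelism exceeds
     * the slots the cluster can ever offer (assumption: one slot sharing group).
     */
    static void checkParallelism(int maxParallelism, int totalSlots) {
        if (maxParallelism > totalSlots) {
            throw new InsufficientSlotsException(
                "Job needs " + maxParallelism + " slots, but the cluster has only " + totalSlots);
        }
    }

    public static void main(String[] args) {
        int slots = totalSlots(2, 2);
        checkParallelism(4, slots); // fits, returns normally
        try {
            checkParallelism(5, slots); // can never be scheduled
        } catch (InsufficientSlotsException e) {
            System.out.println("Rejected at submission: " + e.getMessage());
        }
    }
}
```

The check only helps when the slot count really is fixed and known up front; if TaskManagers can still register after submission, the comparison is made against a number that may grow.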

@tillrohrmann
Contributor

tillrohrmann commented Mar 28, 2018

We don't know a priori how many TaskExecutors have been started by the startup script. It could also be the case that some of them will be started manually at a later point in time.

@sihuazhou
Contributor Author

Hi @tillrohrmann, it's a bit sticky. Do you have any ideas for the current situation? That is, when we submit a job and its parallelism exceeds the max slots of the fixed cluster, we have to wait until the timeout (default 5 min). This is not so nice for the user, and if a user submits the job in detached mode, they may even think that the job is running successfully.
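
The waiting behavior described here can be illustrated with a toy model. This is not Flink's scheduler: `QueuedSlotRequestDemo` and its methods are hypothetical, and a plain `CompletableFuture` stands in for a queued slot request that no TaskManager ever fulfills. A short timeout is used so the demo finishes quickly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of a queued slot request; not Flink code. */
final class QueuedSlotRequestDemo {

    /**
     * Returns a future standing in for a queued slot request that nobody
     * fulfills. It fails with a TimeoutException once the timeout elapses.
     */
    static CompletableFuture<String> requestSlot(long timeout, TimeUnit unit) {
        CompletableFuture<String> pendingSlot = new CompletableFuture<>();
        return pendingSlot.orTimeout(timeout, unit); // requires Java 9+
    }

    /** Blocks until the request resolves; true if it failed with a timeout. */
    static boolean timedOut(CompletableFuture<String> request) {
        try {
            request.join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> request = requestSlot(100, TimeUnit.MILLISECONDS);
        // Right after submission there is no failure to observe yet.
        System.out.println("done right after submission: " + request.isDone());
        // The failure only surfaces once the timeout has elapsed.
        System.out.println("failed with timeout: " + timedOut(request));
    }
}
```

A client that submits and detaches before the timeout elapses never sees the failure, which is the concern raised above.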

@tillrohrmann
Contributor

This is a good question and so far I couldn't find a good solution for it. We could have different slot timeouts for standalone vs non-standalone. But this might be a bit tricky to maintain. Moreover, this would be a kind of automagic behavior, which in general should be avoided.
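
For illustration, the "different slot timeouts" idea could look roughly like the sketch below. All names are hypothetical and not Flink API. The 5 minute value is the default mentioned earlier in this thread; the 30 second standalone value is an arbitrary assumption.

```java
import java.time.Duration;

/** Hypothetical per-mode slot request timeouts; not Flink API. */
final class SlotRequestTimeouts {

    /** Illustrative deployment modes. */
    enum DeploymentMode { STANDALONE, RESOURCE_MANAGED }

    /** Default mentioned in the discussion. */
    static final Duration DEFAULT_TIMEOUT = Duration.ofMinutes(5);

    /** Assumed shorter value for standalone clusters; the number is arbitrary. */
    static final Duration STANDALONE_TIMEOUT = Duration.ofSeconds(30);

    /** Picks the slot request timeout based on how the cluster was deployed. */
    static Duration forMode(DeploymentMode mode) {
        switch (mode) {
            case STANDALONE:
                return STANDALONE_TIMEOUT;
            case RESOURCE_MANAGED:
                return DEFAULT_TIMEOUT;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (DeploymentMode mode : DeploymentMode.values()) {
            System.out.println(mode + " -> " + forMode(mode));
        }
    }
}
```

The sketch also shows the maintenance concern: the timeout now silently depends on the deployment mode, and every new mode needs its own branch.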

@sihuazhou
Contributor Author

Thanks a lot for your reply. Closing this PR since it's invalid.

@sihuazhou sihuazhou closed this Mar 28, 2018
@sihuazhou sihuazhou deleted the disableQueuedSchedulingForFlip6LocalStreamEnvironment branch June 7, 2018 02:43