
Add Scheduler method to compute target cores and memory #2258

Closed
wants to merge 1 commit into from


mrocklin
Member

This moves logic from Adaptive onto the main Scheduler and is a small
tentative step towards restructuring deployment.

cc @jcrist
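To make the idea concrete, here is a minimal sketch of the kind of computation such a Scheduler method might perform. The function name, its inputs (an estimate of queued work in seconds and per-worker memory usage), and the heuristic itself are assumptions for illustration, not the code in this PR.

```python
import math
from typing import Sequence, Tuple


def target_cores_and_memory(
    queued_seconds: float,
    worker_memory_used: Sequence[int],
    target_duration: float = 5.0,
    memory_fraction: float = 0.6,
) -> Tuple[int, int]:
    """Return (cores, memory_bytes) the cluster should have.

    Hypothetical heuristic, not the PR's implementation.
    """
    # CPU: enough cores to finish the queued work within target_duration seconds
    cores = math.ceil(queued_seconds / target_duration)
    # Memory: enough total memory that current usage is memory_fraction of it
    memory = math.ceil(sum(worker_memory_used) / memory_fraction)
    return cores, memory


if __name__ == '__main__':
    # 12 s of queued work, two workers each holding 1 GiB
    print(target_cores_and_memory(12.0, [2**30, 2**30], memory_fraction=0.5))
```

With these inputs the sketch asks for 3 cores and 4 GiB (4294967296 bytes).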

Member

@guillaumeeb guillaumeeb left a comment


@mrocklin is this related to #2235? Not sure I understand why you would want to make target_workers a handler. In your proposed design, the adaptive intelligence is still running alongside the scheduler, isn't it?

You are planning things still further than this, with a non-Python scheduler and so on?

@@ -957,7 +958,8 @@ def __init__(
             'heartbeat_worker': self.heartbeat_worker,
             'get_task_status': self.get_task_status,
             'get_task_stream': self.get_task_stream,
-            'register_worker_callbacks': self.register_worker_callbacks
+            'register_worker_callbacks': self.register_worker_callbacks,
+            'target_workers': self.target_workers,
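The handlers dictionary in the diff maps operation names to methods, which is what lets another process invoke target_workers remotely. Below is a simplified, in-process sketch of that dispatch pattern. MiniScheduler and handle_message are hypothetical names, and the returned values are placeholders rather than a real computation; this is not distributed's actual Server class.

```python
import asyncio


class MiniScheduler:
    """Hypothetical stand-in for a server that exposes methods via a handlers table."""

    def __init__(self):
        # Map operation names (as they would arrive over the wire) to bound methods
        self.handlers = {
            'identity': self.identity,
            'target_workers': self.target_workers,
        }

    def identity(self):
        return {'type': 'MiniScheduler'}

    async def target_workers(self):
        # Placeholder values; a real scheduler would derive these from its state
        return {'cores': 8, 'memory': 16 * 2**30}

    async def handle_message(self, msg):
        # Look up the handler named by 'op' and call it with the remaining keys
        msg = dict(msg)
        handler = self.handlers[msg.pop('op')]
        result = handler(**msg)
        # Handlers may be plain functions or coroutines
        if asyncio.iscoroutine(result):
            result = await result
        return result


if __name__ == '__main__':
    s = MiniScheduler()
    print(asyncio.run(s.handle_message({'op': 'target_workers'})))
```

Registering a method in the table is all it takes to make it callable by name from outside, which is why the review below asks whether that exposure is intended.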
Member


Why make this a handler method that can be called remotely?

@mrocklin
Member Author

mrocklin commented Oct 2, 2018

Not sure I understand why you would want to make target_workers a handler. In your proposed design, adaptive intelligence is still running alongside the scheduler, isn't it?

The decision of "how many workers should the cluster have" is best made by the scheduler, which has all of the information necessary to make this decision.

However, the actual creation and destruction of the workers may happen in a separate process (ClusterManager), and so it may want to ask the scheduler for the target number of workers.
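On the ClusterManager side, the targets returned by the scheduler still have to be translated into concrete worker counts. A minimal sketch, assuming every worker has a fixed number of cores and a fixed memory limit; workers_needed and scaling_plan are hypothetical helpers, not part of any existing API.

```python
import math
from typing import Tuple


def workers_needed(target_cores: int, target_memory: int,
                   cores_per_worker: int, memory_per_worker: int) -> int:
    """Smallest worker count that satisfies both the CPU and the memory target."""
    by_cores = math.ceil(target_cores / cores_per_worker)
    by_memory = math.ceil(target_memory / memory_per_worker)
    return max(by_cores, by_memory)


def scaling_plan(target_workers: int, current_workers: int) -> Tuple[str, int]:
    """Return the action a cluster manager would take and how many workers it affects."""
    delta = target_workers - current_workers
    if delta > 0:
        return ('launch', delta)
    if delta < 0:
        return ('close', -delta)
    return ('noop', 0)


if __name__ == '__main__':
    # 10 cores and 40 GiB requested; each worker has 4 cores and 8 GiB
    n = workers_needed(10, 40 * 2**30, 4, 8 * 2**30)
    print(n, scaling_plan(n, current_workers=2))
```

Taking the maximum of the CPU-based and memory-based counts means neither target goes unmet, at the cost of possibly over-provisioning the other resource.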

You are planning things still further than this

There is still plenty of work to do, but I don't anticipate working on this topic personally in the next month.

with a non-Python scheduler and so on?

I'm not sure I understand what you mean here by a non-Python scheduler.

@guillaumeeb
Member

However, the actual creation and destruction of the workers may happen in a separate process (ClusterManager), and so it may want to ask the scheduler for the target number of workers.

From previous exchanges, I had the impression that the Scheduler should ask the ClusterManager to scale to a given number of cores, rather than the ClusterManager asking how many workers to launch. But your approach may be the correct one; I'm not sure. It depends on where the Adaptive logic must run.

I'm not sure I understand what you mean here by a non-Python scheduler.

In another issue, IIRC, you mentioned the possibility of the Scheduler not being written in Python in the future.

@mrocklin
Member Author

mrocklin commented Oct 2, 2018

From previous exchanges, I had the impression that the Scheduler should ask the ClusterManager to scale to a given number of cores, rather than the ClusterManager asking how many workers to launch. But your approach may be the correct one; I'm not sure. It depends on where the Adaptive logic must run.

Yeah, I don't have a strong opinion here. My guess, though, is that it will be easier for a ClusterManager to contact the Scheduler than the other way around. The scheduler already runs a server and we're accustomed to contacting it. If we can avoid setting up another server for the ClusterManager, that sounds ideal. The cluster manager will also start the scheduler, so it will probably know the address to contact it.
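A minimal sketch of this pull model, assuming a toy newline-delimited JSON protocol (not distributed's actual wire format): only the scheduler listens on a socket, and the cluster manager, which started the scheduler and therefore already knows its address, connects as a plain client. The handler table, message shape, and returned numbers are all hypothetical.

```python
import asyncio
import json


async def handle_client(reader, writer):
    # Hypothetical scheduler side: one JSON message per line, dispatched by 'op'
    handlers = {'target_workers': lambda: {'cores': 8, 'memory': 16 * 2**30}}
    msg = json.loads(await reader.readline())
    reply = handlers[msg['op']]()
    writer.write((json.dumps(reply) + '\n').encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ask_scheduler(host, port, op):
    # ClusterManager side: a plain client; it runs no server of its own
    reader, writer = await asyncio.open_connection(host, port)
    writer.write((json.dumps({'op': op}) + '\n').encode())
    await writer.drain()
    reply = json.loads(await reader.readline())
    writer.close()
    await writer.wait_closed()
    return reply


async def main():
    # The cluster manager starts the scheduler, so it knows the address up front
    server = await asyncio.start_server(handle_client, '127.0.0.1', 0)
    host, port = server.sockets[0].getsockname()[:2]
    async with server:
        return await ask_scheduler(host, port, 'target_workers')


if __name__ == '__main__':
    print(asyncio.run(main()))
```

Because the connection always originates from the cluster manager, no second listening socket, firewall rule, or address discovery is needed on its side.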

@guillaumeeb
Member

Yep, you are right; that's one important design decision for #2235:

  • The ClusterManager will also run the adaptive logic.
  • The Scheduler must provide a remote method to report the resources it needs.
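The two decisions above could look roughly like the following: the ClusterManager owns the adaptive loop, and the scheduler is reached only through a remote call. Here query_target is a hypothetical coroutine standing in for that remote method, and the minimum/maximum bounds and polling interval are assumptions for illustration.

```python
import asyncio


class ClusterManager:
    """Hypothetical ClusterManager that runs the adaptive logic itself."""

    def __init__(self, query_target, minimum=0, maximum=10):
        self.query_target = query_target  # coroutine standing in for a remote scheduler call
        self.minimum = minimum
        self.maximum = maximum
        self.workers = 0
        self.history = []  # every worker count we scaled to, for inspection

    async def scale(self, n):
        # A real manager would submit or cancel jobs here
        self.workers = n
        self.history.append(n)

    async def adapt_once(self):
        # Ask the scheduler what it needs, then clamp to the allowed range
        target = await self.query_target()
        n = max(self.minimum, min(self.maximum, target))
        if n != self.workers:
            await self.scale(n)

    async def run(self, interval=0.01, iterations=3):
        for _ in range(iterations):
            await self.adapt_once()
            await asyncio.sleep(interval)


if __name__ == '__main__':
    targets = iter([3, 50, 0])

    async def fake_target():
        return next(targets)

    cm = ClusterManager(fake_target, minimum=1, maximum=10)
    asyncio.run(cm.run())
    print(cm.history)  # [3, 10, 1]
```

Keeping the bounds in the ClusterManager means the scheduler only reports what it needs, while deployment-specific limits stay with the component that actually launches workers.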

@jrbourbeau
Member

Closing, as I think this was implemented elsewhere.

@jrbourbeau jrbourbeau closed this Nov 7, 2019

3 participants