Specify a cluster scheduler listener port? #355

Closed
hawk-sf opened this issue Oct 16, 2019 · 8 comments · Fixed by #384
Labels
bug Something isn't working

@hawk-sf

hawk-sf commented Oct 16, 2019

I'm trying to use dask-jobqueue on an SGE cluster, and by default our worker nodes cannot communicate back to the login node. The admins have set aside a small range of ports for us to use for this, but I can't seem to specify a port or port range for the scheduler to listen on. I see that dask.distributed's LocalCluster accepts a scheduler_port argument, but SGECluster does not seem to honor the keyword (though it doesn't complain):

cluster = SGECluster(queue='short.q',
                     cores=10,
                     memory='20GB',
                     walltime='00:20:00',
                     scheduler_port=8804)

cluster.scheduler shows the scheduler still being assigned a random port (always around 40,000):

<Scheduler: "tcp://169.230.126.85:39008" processes: 0 cores: 0>

Is there a way to configure this?

Thanks.
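For anyone checking the same symptom, here is a minimal sketch of how one might verify whether the scheduler landed on a permitted port. The helper names and the 8800-8810 range are assumptions made up for illustration (the thread does not state the actual range); only the address format comes from the output above.

```python
from urllib.parse import urlparse

# Hypothetical port range opened by the cluster admins (an assumption,
# the real range is not given in this thread).
ALLOWED_PORTS = range(8800, 8811)


def scheduler_port(address: str) -> int:
    """Extract the TCP port from an address such as 'tcp://host:port'."""
    port = urlparse(address).port
    if port is None:
        raise ValueError(f"no port found in address: {address!r}")
    return port


def port_is_allowed(address: str, allowed=ALLOWED_PORTS) -> bool:
    """Return True if the scheduler address uses a port in the allowed range."""
    return scheduler_port(address) in allowed
```

With a live cluster, `port_is_allowed(cluster.scheduler_address)` would be the natural call; the address strings shown in this issue work just as well for a dry run.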

@lesteve
Member

lesteve commented Oct 17, 2019

I can reproduce this and it seems like a regression in 0.7.0. It works as it should with 0.6.3.

from dask_jobqueue import SGECluster

cluster = SGECluster(queue='short.q',
                     cores=10,
                     memory='20GB',
                     walltime='00:20:00',
                     scheduler_port=8804)

print(cluster.scheduler_address)

Output with 0.6.3:

tcp://192.168.0.11:8804

Output with 0.7.0 (random port):

tcp://192.168.0.11:35257

My previous mental model was that FooCluster **kwargs were forwarded until they reach the underlying LocalCluster cluster, but there is no underlying LocalCluster anymore with the SpecCluster refactor in #307.

This needs to be looked at. @mrocklin, maybe you have a super rough guesstimate of how hard this would be to fix? I am hoping this is a silly mistake that we did not catch, and that it should be reasonably straightforward to figure out where the **kwargs fail to be forwarded to the right place.

At the moment I don't have the bandwidth to investigate, but a PR would certainly be welcome!
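To illustrate the failure mode described above (a keyword accepted without complaint but never reaching the scheduler), here is a toy sketch. None of these classes exist in dask-jobqueue or distributed; `ToyScheduler`, `SwallowingCluster`, and `ForwardingCluster` are hypothetical stand-ins, and the real code paths are more involved.

```python
class ToyScheduler:
    """Hypothetical stand-in for a scheduler; port=0 means 'pick a random port'."""

    def __init__(self, port=0):
        self.port = port


class SwallowingCluster:
    """Accepts arbitrary **kwargs but never forwards them to the scheduler."""

    def __init__(self, cores=1, **kwargs):
        self.cores = cores
        # kwargs (including scheduler_port) are silently dropped here.
        self.scheduler = ToyScheduler()


class ForwardingCluster:
    """Forwards scheduler_port explicitly and rejects unknown keywords."""

    def __init__(self, cores=1, scheduler_port=0, **kwargs):
        if kwargs:
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
        self.cores = cores
        self.scheduler = ToyScheduler(port=scheduler_port)
```

In this sketch, `SwallowingCluster(scheduler_port=8804)` raises no error yet leaves the port at its default, which matches the behaviour reported with 0.7.0; the forwarding variant also fails loudly on misspelled keywords.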

@hawk-sf
Author

hawk-sf commented Oct 18, 2019

Downgrading to 0.6.3 did the trick for me. Thanks. I'd certainly be willing to take a look at getting **kwargs forwarded correctly and putting in a PR; is this something you'd want a non-core contributor working on? Are these the only contribution guidelines: https://jobqueue.dask.org/en/latest/develop.html?

@mrocklin
Member

The relevant lines to affect are here:

scheduler = {
    "cls": Scheduler,  # Use local scheduler for now
    "options": {
        "protocol": protocol,
        "interface": interface,
        "host": host,
        "dashboard_address": dashboard_address,
        "security": security,
    },
}

You want to make sure that the options dict here includes "port": .... These values get passed directly to the dask.distributed.Scheduler constructor (see that class's docstring for more information). For a more generic solution we might consider accepting a scheduler_options= keyword that accepted a dictionary of options?
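A minimal sketch of the more generic idea, assuming a `scheduler_options` dictionary is merged over the defaults. The function name `build_scheduler_spec`, its default values, and the placeholder `Scheduler` class are assumptions for illustration only; this is not the actual dask-jobqueue implementation, and the real `Scheduler` would come from dask.distributed.

```python
class Scheduler:
    """Placeholder for distributed.Scheduler so the sketch runs on its own."""


def build_scheduler_spec(
    protocol="tcp://",
    interface=None,
    host="",
    dashboard_address=":8787",
    security=None,
    scheduler_options=None,
):
    """Build a scheduler spec dict, letting user options override defaults."""
    options = {
        "protocol": protocol,
        "interface": interface,
        "host": host,
        "dashboard_address": dashboard_address,
        "security": security,
    }
    # User-supplied options (e.g. {"port": 8804}) win over the defaults.
    options.update(scheduler_options or {})
    return {"cls": Scheduler, "options": options}


spec = build_scheduler_spec(scheduler_options={"port": 8804})
```

Merging a dictionary this way avoids adding one cluster keyword per scheduler argument, at the cost of letting callers override defaults such as `host` without validation.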

@lesteve lesteve added the bug Something isn't working label Oct 19, 2019
@lesteve
Member

lesteve commented Oct 19, 2019

Thanks. I'd certainly be willing to take a look at getting **kwargs forwarded correctly and putting in a PR; is this something you'd want a non-core contributor working on?

Sure, help would be more than welcome on this! This is a regression in 0.7 and it would be nice to get it fixed.

Are these the only contribution guidelines: jobqueue.dask.org/en/latest/develop.html?

Yes (there is a link to the dask contributing guidelines with more material). There was also some talk about adding a CONTRIBUTING.md in dask/community#17.

In any case I am not sure what you mean by "only", but if you spot possible improvements in the contributing documentation, feel free to open a separate PR about this as well!

@guillaumeeb
Member

See also dask/dask-kubernetes#196 for something similar.

@guillaumeeb
Member

@hawk-sf would you still be willing to do something about this?

@hawk-sf
Author

hawk-sf commented Feb 25, 2020

Yes, thanks for the prompt. This had slipped down the priority list, but I'll have time to start at the end of this week.

@lesteve
Member

lesteve commented Mar 4, 2020

FYI I have a PR in #384.


4 participants