
dask-jobqueue binder #276

Open
lesteve opened this issue May 24, 2019 · 6 comments
Labels
documentation Documentation-related sprint A good issue to tackle during a sprint

Comments

@lesteve
Member

lesteve commented May 24, 2019

The idea is to have a binder setup with a toy cluster so that people can play with dask-jobqueue a bit without having to set it up on their cluster.

Our SLURM CI setup uses a single Dockerfile; maybe this image could be used for a binder.

Binder allows you to use a Dockerfile:
https://github.com/binder-examples/minimal-dockerfile

Questions:

  • How does this idea work in practice? Is 1-2 GB of RAM enough for a toy cluster?
  • If I use binder.pangeo.io, does it work better? (There seems to be more RAM available on pangeo.io.)
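To get a feel for whether 1-2 GB is enough, here is a rough back-of-the-envelope budget. Every figure below is an assumption for illustration (not a measurement of the actual image), and the process list is a guess at what runs in a single-pod toy SLURM setup:

```python
# Rough memory budget for a toy SLURM cluster inside a single binder pod.
# All figures are assumptions for illustration, not measurements.
overhead_mb = {
    "jupyter + notebook kernel": 300,
    "slurmctld + slurmd + munge": 150,
    "mysqld (slurmdbd backend)": 200,
    "dask scheduler": 150,
}
n_workers = 2
worker_limit_mb = 256  # hypothetical per-worker memory limit

total_mb = sum(overhead_mb.values()) + n_workers * worker_limit_mb
print(f"estimated total: {total_mb} MB")  # → 1312 MB, under a 2 GB pod
```

Under these assumptions a couple of small workers fit in a 2 GB pod, but not much more.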

If this idea works, we could think about what kind of notebooks to create (related to #253).

@lesteve lesteve added the sprint A good issue to tackle during a sprint label May 24, 2019
@lesteve
Member Author

lesteve commented May 24, 2019

I am going to try to do this and see how far I can push it.

@lesteve
Member Author

lesteve commented May 28, 2019

I have some proof of concept here:
https://github.com/lesteve/test-binder

Here is the binder link:
https://mybinder.org/v2/gh/lesteve/test-binder/master

For now there is a single notebook, simple.ipynb. Comments are more than welcome @willirath @guillaumeeb!

Full disclosure: I have seen some sporadic problems with the processes supervised by supervisord (mostly, mysqld does not start correctly, for a reason I have not yet figured out ...). I think we can probably find a work-around for this.

@guillaumeeb
Member

Thanks @lesteve! This is nice!

I had trouble getting the binder to start; I needed to launch it 4 times... I don't know why. Then the mysqld daemon was not started, but thanks to your first cell I could start it easily.

I think the idea works, and RAM may not be a limitation for some simple examples. There may be more on the Pangeo binder, but I am not sure this will make a big difference if we don't use separate pods for the workers.

The first question that came to my mind is: how is using SLURMCluster different from LocalCluster? That's the beauty of Dask: just swap LocalCluster for SLURMCluster and the rest of the code stays the same. So what examples specific to dask-jobqueue can we set up?

  • Is LocalCluster able to use the adaptive logic?
  • Should we show the arguments specific to a job queuing system, like local-directory or the memory resources?
  • Should we add an HPC-like example: a Monte Carlo simulation, like computing Pi?
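A Monte Carlo Pi estimate is embarrassingly parallel, which makes it a natural fit for the futures interface. A minimal stdlib-only sketch of such an example is below; it uses `concurrent.futures.ProcessPoolExecutor` as a local stand-in, since dask's `Client.submit`/`result` calls have the same shape (the dask variant is shown in comments and assumes a `client` connected to a cluster):

```python
import random
from concurrent.futures import ProcessPoolExecutor

def sample_pi(n, seed):
    """Count how many of n random points fall inside the unit quarter-circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    n_tasks, n_samples = 8, 100_000
    # With dask-jobqueue, the same pattern would be:
    #   futures = [client.submit(sample_pi, n_samples, seed) for seed in range(n_tasks)]
    #   hits = sum(f.result() for f in futures)
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(sample_pi, n_samples, seed) for seed in range(n_tasks)]
        hits = sum(f.result() for f in futures)
    pi_estimate = 4 * hits / (n_tasks * n_samples)
    print(f"pi is roughly {pi_estimate:.3f}")
```

Swapping the executor for a dask `Client` backed by a SLURMCluster is exactly the kind of one-line change the example could highlight.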

@lesteve
Member Author

lesteve commented Feb 28, 2020

I had another look at this, tweaked it a bit, and it seems to be working better than my last attempt (not sure why ...). So maybe it is worth revisiting?

https://mybinder.org/v2/gh/lesteve/test-binder/master?filepath=simple.ipynb

For me the main point would be a quick intro to dask-jobqueue:

  • creating the cluster + client
  • cluster.scale
  • simple examples of Dask DataFrame, delayed, and futures
  • cluster.job_script
  • looking at the logs created by the workers
  • mentioning the dashboard
  • mentioning the different ways to tweak the submission script: queue, walltime, job_extra, env_extra, etc.
  • referring to the Dask documentation for more details on Dask, mentioning that SLURMCluster and LocalCluster are interchangeable
  • referring to the local cluster docs for more details
  • maybe more things that I have missed
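On the cluster.job_script point: that method just renders a batch script from the cluster's arguments, so the notebook could show what each keyword (queue, walltime, job_extra, env_extra) contributes to the script. The helper below is a hypothetical stdlib sketch of that rendering, not dask-jobqueue's actual template; only the `#SBATCH` directive syntax and the `dask-worker` CLI are real:

```python
def make_job_script(queue, walltime, memory, cores, job_extra=(), env_extra=()):
    """Hypothetical sketch of the batch script SLURMCluster.job_script() renders.

    The real template lives inside dask-jobqueue; the directive names here
    follow standard SLURM #SBATCH syntax.
    """
    lines = [
        "#!/usr/bin/env bash",
        f"#SBATCH -p {queue}",            # queue / partition
        f"#SBATCH -t {walltime}",         # walltime
        f"#SBATCH --mem={memory}",        # memory per job
        f"#SBATCH --cpus-per-task={cores}",
    ]
    lines += [f"#SBATCH {extra}" for extra in job_extra]  # e.g. "--qos=debug"
    lines += list(env_extra)                              # e.g. "module load python"
    lines.append(f"dask-worker tcp://scheduler:8786 --nthreads {cores}")
    return "\n".join(lines)

print(make_job_script("normal", "00:30:00", "2GB", 2, job_extra=["--qos=debug"]))
```

In the notebook, comparing this mental model against the real `cluster.job_script()` output would make the submission-script tweaking options concrete.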

Comments more than welcome!

@willirath
Collaborator

That's great news! I'll have a look.

@willirath
Collaborator

Looks a lot more stable. Scaling the cluster up and down doesn't seem to break the Slurm scheduler anymore.

@lesteve lesteve added the documentation Documentation-related label Mar 5, 2020