
Greedy cluster client #36

Merged (7 commits, Feb 26, 2019)
Conversation

@ian-r-rose (Collaborator)

Work towards having all active notebooks and consoles get references to a client for the currently active Dask cluster. The current code injected into a kernel is

import dask; from dask.distributed import Client
dask.config.set({'scheduler-address': '${model.scheduler_address}'})
client = Client()

@ian-r-rose (Collaborator, Author)

Fixes #35

@mrocklin (Member)

Oooh, nice

[screenshot]

@mrocklin (Member)

Confirmed that this also works nicely when I restart. Pretty slick @ian-r-rose !

My guess is that we don't want this behavior on by default at first. Thoughts on the right way to manage configuration?

@mrocklin (Member)

For context: @ian-r-rose and I were talking about ways to enable administrators to create clusters that users never have to interact with directly.

For example, I think that the combination of this PR along with #37 will allow Pangeo or example.dask.org users to just start with import xarray or import iris and not think about starting Dask clusters or Dask clients at all. Hopefully we can hide that boilerplate from them.

cc @jacobtomlinson @yuvipanda @jhamman

@mrocklin (Member)

To be more explicit

[screenshot]

@ian-r-rose (Collaborator, Author)

I agree that this behavior should not be on by default. I think the best way to approach it would be via the JupyterLab settings system.

@ian-r-rose (Collaborator, Author)

Okay, I think this is in reasonable shape. The current behavior:

  • There is a setting in the JupyterLab settings system called greedyClusterClient, which defaults to false. If it is true, the behavior in this PR is enabled. It should handle live changes to that setting. I am open to suggestions for better names/descriptions for this setting.
  • There is an item in the settings menu and command palette to toggle this new behavior (Greedy Dask Client). Again, I am looking for ideas for a better short, descriptive name for this behavior.
  • When a new notebook or console is created, the code for connecting to the active cluster is injected.
  • When a kernel restarts, the client connection code is reinjected for all open notebooks and consoles.
  • When a new cluster is selected, the client connection code is reinjected for all open notebooks and consoles.
  • I have made it so that at least one cluster in the listing is always selected.
  • The currently active cluster in the listing is stored in the state database, so it should be remembered upon page refresh.

@ian-r-rose changed the title from "[WIP] Greedy cluster client" to "Greedy cluster client" on Dec 29, 2018
@jhamman (Member)

jhamman commented Dec 29, 2018

@ian-r-rose - this looks really cool. I'll try to give it a test spin next week. Also pinging @jacobtomlinson and @niallrobinson for comment since this is something they have been interested in.

@ian-r-rose (Collaborator, Author)

@jhamman Did you get a chance to play with this?

@mrocklin (Member)

mrocklin commented Jan 9, 2019

I suggest the name auto-start rather than greedy.

Is there a way for us to control this in a Jupyter config file? If so, what does that config file look like?

@mrocklin (Member)

mrocklin commented Jan 9, 2019

Also, can we maybe drop the term Cluster from the name? Maybe add Dask (if things aren't already namespaced).

@ian-r-rose (Collaborator, Author)

ian-r-rose commented Jan 9, 2019

Thanks for the suggestions @mrocklin. If I understand correctly, you are suggesting something like

  1. Name the setting autoStartClient
  2. Label the menu item Auto-Start Dask Client

Is that right?

This is driven by the client-side, so it can't be controlled in the server-side jupyter_notebook_config.py. That setting, however, is stored as a JSON file on disk, so it can be seeded into a docker image if needed.
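For instance, seeding that file could look something like the following shell sketch. The settings directory and file name are assumptions based on JupyterLab's user-settings layout, and the setting name follows the autoStartClient rename proposed above; adjust both for your deployment.

```shell
# Sketch only: the settings path and file name are assumptions based on
# JupyterLab's user-settings layout; check your deployment's actual paths.
SETTINGS_DIR="$HOME/.jupyter/lab/user-settings/dask-labextension"
mkdir -p "$SETTINGS_DIR"
cat > "$SETTINGS_DIR/plugin.jupyterlab-settings" <<'EOF'
{
    "autoStartClient": true
}
EOF
cat "$SETTINGS_DIR/plugin.jupyterlab-settings"
```

A RUN line doing the same in a Dockerfile would bake the default into the image without touching the start script.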

@mrocklin (Member)

mrocklin commented Jan 9, 2019

I'm questioning the use of both the terms cluster and client. These seem to be specific to how dask works. I suspect that from a novice user's perspective they just want "Dask" to start. They may not know about clusters or clients or whatnot.

@ian-r-rose (Collaborator, Author)

Sure, that makes sense to me. The setting name is a bit more targeted towards experts and admins, so I might advocate being a bit more specific there and retaining "Client." The menu item is much more user-facing, so removing "Client" works for me.

Regarding including "dask": the setting name is already namespaced by being in the Dask settings, but the menu label should have "Dask" in it, as you suggest.

@mrocklin (Member)

mrocklin commented Jan 9, 2019 via email

@rabernat

Ping @jbusecke for a review of this. Julius is a postdoc at Princeton and a heavy user of pangeo on HPC. He mentioned that he was interested in this extension. Perhaps he can take this PR for a spin and give some feedback.

@jbusecke

Oh, this looks super sweet! I will definitely try to take this for a spin on the HPCs I am working on in the next few days!

@jhamman (Member)

jhamman commented Jan 11, 2019

I set up a test binder with this branch today: Binder

Question: it sounds like this feature is meant to be off by default. How do I turn it on?

@ian-r-rose (Collaborator, Author)

Hi @jhamman, you can toggle it in the Settings menu under "Auto-Start Dask".

@ian-r-rose (Collaborator, Author)

Although I am having a hard time with that binder link because clusters are not starting for some reason.

@jhamman (Member)

jhamman commented Jan 11, 2019

The logs I'm getting on the binder are:

[W 04:12:24.249 LabApp] object KubeCluster can't be used in 'await' expression
[I 04:12:25.435 LabApp] Client sent subprotocols: ['']
[I 04:12:25.436 LabApp] Trying to establish websocket connection to ws://127.0.0.1:8787/individual-progress/ws?bokeh-protocol-version=1.0&bokeh-session-id=CVEoIFIvycC2DEGuFlS9jd9MsdPjBoirUFNwSAjkrT6n
[I 04:12:25.440 LabApp] Websocket connection established to ws://127.0.0.1:8787/individual-progress/ws?bokeh-protocol-version=1.0&bokeh-session-id=CVEoIFIvycC2DEGuFlS9jd9MsdPjBoirUFNwSAjkrT6n
[I 04:12:25.725 LabApp] Client sent subprotocols: ['']
[I 04:12:25.726 LabApp] Trying to establish websocket connection to ws://127.0.0.1:8787/individual-task-stream/ws?bokeh-protocol-version=1.0&bokeh-session-id=Ku90IbfLbhki6h46zNHhJCXdTA1Ge0Jvjzc1uVUSDxgL
[I 04:12:25.729 LabApp] Websocket connection established to ws://127.0.0.1:8787/individual-task-stream/ws?bokeh-protocol-version=1.0&bokeh-session-id=Ku90IbfLbhki6h46zNHhJCXdTA1Ge0Jvjzc1uVUSDxgL

should I be working off development versions of dask/distributed/dask-kubernetes?
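For what it's worth, that first log line is just the TypeError CPython raises when await is applied to an object that doesn't implement the awaitable protocol. A minimal reproduction with a stand-in class (hypothetical, not the real KubeCluster):

```python
import asyncio

class NotAwaitable:
    """Stand-in for a cluster class that does not support ``await``."""

async def main():
    try:
        await NotAwaitable()  # neither a coroutine nor defines __await__
    except TypeError as exc:
        return str(exc)

print(asyncio.run(main()))
# → object NotAwaitable can't be used in 'await' expression
```

The fix on the cluster side is to make construction awaitable, which is what the dask-kubernetes work discussed below is about.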

@mrocklin (Member)

mrocklin commented Jan 11, 2019 via email

@mrocklin (Member)

dask/dask-kubernetes#116 may work? Untested

@mrocklin (Member)

You'll also need distributed master

@jhamman (Member)

jhamman commented Jan 11, 2019

I've switched the above binder to use the local cluster (for now) and to use distributed/master. Things still don't seem to be working so I'll keep investigating as time allows.

you can toggle it in the Settings menu under "Auto-Start Dask"

Is it possible to set this from outside the lab environment? Ideally, I could set the default value from the start script in my binder config.

@ian-r-rose (Collaborator, Author)

@jhamman Yes, you can set it from outside of lab. The settings file is a JSON file on disk, so you can seed the docker image with that in place (I don't think the start script should even be necessary). I'll take a crack at it in your test branch today sometime.

@ian-r-rose (Collaborator, Author)

Here is a commit that auto-sets the setting in your binder example. It seems to work well, but there is still something broken with the cluster config...

@mrocklin (Member)

Just to update this PR I think that we should get async-kubernetes working in dask-kubernetes, and then revisit this.

It would also be good to test this with dask-jobqueue and dask-yarn. Mostly this makes the request that clusters can be started and stopped asynchronously. I suspect that this is already true for Dask-Jobqueue, but it would be good to test in practice (cc @guillaumeeb). I don't know the current status in dask-yarn (cc @jcrist )

@ian-r-rose (Collaborator, Author)

No rush on this, happy to let people kick the tires.

That being said, isn't the async cluster starting already in master (#37)?

@mrocklin (Member)

mrocklin commented Jan 23, 2019 via email

@guillaumeeb (Member)

Hi,

Not sure what is meant by

Mostly this makes the request that clusters can be started and stopped asynchronously

Unfortunately I have not had the time to test dask-labextension at all yet. I'm a little behind on this; we're still using the standard Notebook with our JupyterHub at CNES (due to a probably-not-complicated problem with JLab that I need to work on).

This looks interesting, though. I'm often launching several clusters from several notebooks, so if I could use only one, and automatically, that would be great! I'm putting this on my to-do list, but don't wait on me to merge...

@ian-r-rose (Collaborator, Author)

Hi @guillaumeeb, thanks for sharing your use-case! I'd like to hear how this feature works for you when you get the chance to try it.

Not sure what is meant by

Mostly this makes the request that clusters can be started and stopped asynchronously

This extension expects that different cluster implementations (e.g. LocalCluster or KubeCluster) can be started without blocking the main event loop. This means, for instance, that JupyterLab startup time won't be delayed by setting up any initial clusters: they can finish in their own time and be populated when they are ready.

We got a little bit ahead of ourselves by expecting that, however, since there is not a critical mass of implementers who have made cluster startup async yet. So now there is a bit of catch-up going on to make sure that works.
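The awaitable-startup pattern being asked of cluster implementations can be sketched with a toy class. This is purely illustrative (real Dask clusters do it via an asynchronous=True flag, and the internals differ), but it shows how a constructor call can become awaitable so the event loop stays free during slow startup:

```python
import asyncio

class ToyCluster:
    """Toy sketch of an awaitable cluster: ``cluster = await ToyCluster()``.

    Illustrative only -- not dask-labextension's actual code.
    """

    def __init__(self):
        self.started = False

    async def _start(self):
        # Simulate slow startup work without blocking the event loop.
        await asyncio.sleep(0.01)
        self.started = True
        return self

    def __await__(self):
        return self._start().__await__()

async def main():
    cluster = await ToyCluster()   # event loop stays free during startup
    return cluster.started

print(asyncio.run(main()))
# → True
```

A synchronous constructor that blocks until workers are up would instead stall everything sharing the loop, which is why the extension asks for the awaitable form.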

@mrocklin (Member)

mrocklin commented Feb 9, 2019

Note that I got dask-kubernetes mostly working asynchronously if anyone wants to give it a shot

pip install git+https://github.com/dask/dask-kubernetes@dev

@mrocklin (Member)

mrocklin commented Feb 9, 2019

Where by "I" I actually mean "Yuvi and I"

@guillaumeeb (Member)

guillaumeeb commented Feb 11, 2019

So I finally managed to deploy extensions into JupyterLab behind my corporate proxy. I'm just beginning to test dask-labextension. Before I try this PR, I just wanted to know if my environment was working as intended. Typical steps I take right now are:

  1. Start a cluster inside my notebook
  2. Indicate scheduler URL in dask-labextension left panel
  3. Open views on my lab environment

Using Pangeo on binder, the notebook was started with an already defined layout: how do I do that?
I was under the impression that once a cluster was started, the dask-labextension views were automatically connected to it. Isn't that true? Is the second step above always needed?

@ian-r-rose (Collaborator, Author)

Hi @guillaumeeb, this functionality is intended to work with clusters managed by the extension (rather than ones launched in your notebook). It currently works with LocalCluster, and (possibly) KubeCluster. Basically, the cluster implementation needs to satisfy the Cluster interface and be startable asynchronously (i.e. cluster = await MyCluster(*args)). We got a bit ahead of ourselves in requiring the async startup here, so it's possible that your deployment doesn't yet fit that usage.

If those preconditions are met, then all new notebooks should get a client for that cluster auto-injected into the current python kernel session (when the option is turned on).

@guillaumeeb (Member)

and be possible to start asynchronously (i.e. cluster = await MyCluster(*args))

Okay, that helps a lot to know what is needed here.

and (possibly) KubeCluster

Why possibly? I personally maintain dask-jobqueue, which provides implementations of Cluster; I'm not sure about the asynchronous part. Are these the only two requirements? How do we launch a cluster with the extension? I guess I should use the New button, but nothing happens when I click it in my setup.

Maybe I should rather open another issue to discuss the questions from my previous comment on standard dask-labextension behavior, not really related to this PR?

@ian-r-rose (Collaborator, Author)

and be possible to start asynchronously (i.e. cluster = await MyCluster(*args))

Okay, that helps a lot to know what is needed here.

and (possibly) KubeCluster

Why possibly?

I say possibly because @mrocklin and @yuvipanda just put a bunch of work into updating it, but I don't think it has been tested in this context yet.

I personally maintain dask-jobqueue, which provides implementations of Cluster; I'm not sure about the asynchronous part. Are these the only two requirements? How do we launch a cluster with the extension? I guess I should use the New button, but nothing happens when I click it in my setup.

The function to launch a new cluster is defined here:

import importlib

import dask
# (Excerpt from the extension's server code; Cluster is imported elsewhere
# in that module.)

async def make_cluster(configuration: dict) -> Cluster:
    # Instantiate the cluster class named in the dask config, awaiting its startup.
    module = importlib.import_module(dask.config.get('labextension.factory.module'))
    Cluster = getattr(module, dask.config.get('labextension.factory.class'))
    cluster = await Cluster(*dask.config.get('labextension.factory.args'),
                            **dask.config.get('labextension.factory.kwargs'),
                            asynchronous=True)
    configuration = dask.config.merge(
        dask.config.get('labextension.default'),
        configuration,
    )
    adaptive = None
    if configuration.get('adapt'):
        adaptive = cluster.adapt(**configuration.get('adapt'))
    elif configuration.get('workers') is not None:
        cluster.scale(configuration.get('workers'))
    return cluster, adaptive
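Based on the config keys this function reads, the cluster class could presumably be selected via dask's config system; a YAML fragment along these lines (schema inferred from the getters above, so treat it as a sketch) would point the extension at a different cluster implementation:

```yaml
# Sketch inferred from the dask.config.get calls above -- verify against
# the extension's shipped defaults before relying on it.
labextension:
  factory:
    module: "dask.distributed"
    class: "LocalCluster"
    args: []
    kwargs: {}
  default:
    workers: null
    adapt: null
```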

The intention is that those are the only two requirements (implements Cluster and is awaitable), but we likely have some kinks to work out to make it widely usable. Are there any errors in the notebook logs when you click the "New" button?

Maybe I should rather open another issue to discuss the questions from my previous comment on standard dask-labextension behavior, not really related to this PR?

That would be great, thanks @guillaumeeb.

@mrocklin (Member)

This is in. My apologies for the long delay @ian-r-rose !

@michaelaye

It is unclear to me where to get the seemingly hashed Dask dashboard URL that is visible in the screenshots above. I tried simply setting it to http://127.0.0.1:8787/status, after which all the buttons lit up, and clicking on them opened a new jlab view, but no content was displayed. I guess I'm missing something? The default browser display of the status page works fine for me.
