## Working with Dask on Cheyenne

Here is a guide to using dask on Cheyenne, at least the way I use it, along with some tools that make it friendlier.

### Dask installation
I'll assume you have an Anaconda environment set up on Cheyenne with dask and dask_jobqueue installed. If your environment doesn't have these, here is how to install them:
>```conda install -c conda-forge dask dask-jobqueue```
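
If you want to confirm that both packages are visible to the Python in your environment, a quick check along these lines can help. This is only a minimal sketch using the standard library; `missing_packages` is a hypothetical helper name I made up for illustration, not part of dask.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Note the import names: the conda package dask-jobqueue is imported as dask_jobqueue.
    missing = missing_packages(["dask", "dask_jobqueue"])
    print("missing:", missing if missing else "none")
```

If either name is reported as missing, rerun the conda command above in the same environment you use for JupyterLab.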

### dask_labextension
Next, I'll use the really nice dask-labextension tool for JupyterLab. It lets you monitor the progress of your computation from within Jupyter. It takes a bit of setup, but you only need to do it once. I'll walk through it here.

dask_labextension requires nodejs. By default, conda installs a version that is too old, but you can make it install the right one by typing:
>```conda install -c conda-forge nodejs --repodata-fn=repodata.json```

You can then install dask_labextension by typing:
>```conda install -c conda-forge dask-labextension```

You then need to enable the extension in JupyterLab, which is achieved by typing:
>```jupyter labextension install dask-labextension```  
>```jupyter serverextension enable dask_labextension```  

Note that the first command has a hyphen in dask-labextension, while the second uses an underscore.

Finally, we need to edit one config file so that the dask dashboard is accessible through the same ssh tunnel as JupyterLab.
This is achieved by editing the file `~/.config/dask/distributed.yaml`. Find the following section:
```

#   ###################  
#   # Bokeh dashboard #  
#   ###################  

#   dashboard:
#     link: "http://{host}:{port}/status"
#     export-tool: False

```
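then uncomment the `dashboard:` line and edit the `link:` line beneath it, so that the section reads:
```

#   ###################  
#   # Bokeh dashboard #  
#   ###################  

    dashboard:
      link: "/proxy/{port}/status"
#     export-tool: False

```
For the setting to take effect, the parent keys must be uncommented as well: `dashboard:` here, and the top-level `distributed:` key near the start of the file.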
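
To see what this change does, it helps to look at how the link template gets filled in. The sketch below is only an illustration using plain string formatting; `expand_link` is a hypothetical helper, and the host and port are the example values from later in this guide, not something dask requires.

```python
def expand_link(template, host, port):
    """Fill in a dashboard link template (illustrative helper, not a dask function)."""
    # str.format ignores keyword arguments that the template does not use,
    # so the proxied template simply drops the host.
    return template.format(host=host, port=port)


default_link = expand_link("http://{host}:{port}/status", "10.12.205.28", 8787)
proxied_link = expand_link("/proxy/{port}/status", "10.12.205.28", 8787)

print(default_link)  # address on the compute node, not reachable through the tunnel
print(proxied_link)  # path relative to the JupyterLab server, reachable through the tunnel
```

The proxied form is relative to the JupyterLab address, which is why it works through the same ssh tunnel.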

### Using dask-labextension in JupyterLab

Now that it's installed, dask-labextension will appear as a button in the vertical toolbar on the far left of JupyterLab.
If you click on it, it will pull up a pane like this:

<img src="dask_pane.png" width ="250" >

To start a dask cluster on Casper, do the following:

```
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(project='UWAS0052')
cluster.scale(10)

from dask.distributed import Client
client = Client(cluster)
client
```

The output of the last line looks like this:

```
Client   Scheduler: tcp://10.12.205.28:38199   Dashboard: http://10.12.205.28:8787/status
Cluster  Workers: 0   Cores: 0   Memory: 0 B
```


You can adjust the number of workers (and therefore CPUs) by changing the number passed to `cluster.scale()`.

Then, in the search bar at the top of the dask labextension pane, enter:
```
http://localhost:8888/proxy/8787/status
```
Note that 8787 is the default port on which dask tries to serve the dashboard. If that port is already occupied by another user, dask will pick a different one. In that case, simply replace 8787 with the number before `/status` in the URL next to **Dashboard** in the client output above.
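
If you would rather not work out the address by hand, something like the following sketch can build it from the dashboard URL. `proxy_url` is a hypothetical helper I'm using for illustration, and it assumes JupyterLab is reachable at `localhost:8888` as in the address above.

```python
from urllib.parse import urlsplit


def proxy_url(dashboard_link, jupyter_port=8888):
    """Turn a dashboard URL into the address to paste into the dask pane."""
    port = urlsplit(dashboard_link).port
    if port is None:
        raise ValueError(f"no port found in {dashboard_link!r}")
    # Assumes JupyterLab is tunnelled to localhost on jupyter_port.
    return f"http://localhost:{jupyter_port}/proxy/{port}/status"


print(proxy_url("http://10.12.205.28:8787/status"))
```

In a notebook, you could pass `client.dashboard_link` instead of typing the URL, provided that link is still in the default `http://host:port/status` form.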

Once you enter the address, the buttons that were previously grey should turn orange.

You can then open various monitoring panes from the orange buttons. The most useful (to me at least) are the TASK STREAM, PROGRESS, and MEMORY (WORKER) panes.