<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg"
     align="right"
     width="30%"
     alt="Dask logo">

# Distributed - spread your data and computation across a cluster

As we covered at the beginning, Dask can run work on multiple machines using the distributed scheduler.

Until now we have actually been using the distributed scheduler for our work, but just on a single machine.

When we instantiate a `Client()` object with no arguments, it attempts to locate a Dask cluster by checking your local Dask config and environment variables for connection information. If none is specified, it creates an instance of `LocalCluster` and uses that.

## Local Cluster

Let's explore the `LocalCluster` object ourselves and see what it is doing.

In [None]:
from dask.distributed import LocalCluster, Client

In [None]:
cluster = LocalCluster()
cluster

Creating a cluster object starts a Dask scheduler and a number of Dask workers. If no arguments are specified, it autodetects the number of CPU cores and the amount of memory on your system and creates workers to fill them appropriately.
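
To size the cluster yourself rather than rely on autodetection, `LocalCluster` accepts explicit arguments. The following is a minimal sketch: the values (two workers, one thread each, 1 GiB per worker) are arbitrary choices for illustration, and `processes=False` and `dashboard_address=None` are used only to keep this throwaway example lightweight.

```python
from dask.distributed import Client, LocalCluster

# Illustrative values only; pick sizes that match your machine.
with LocalCluster(
    n_workers=2,             # number of workers to start
    threads_per_worker=1,    # threads inside each worker
    memory_limit="1GiB",     # memory limit per worker
    processes=False,         # run workers as threads in this process
    dashboard_address=None,  # skip the dashboard for this short-lived cluster
) as cluster, Client(cluster) as client:
    n_workers = len(cluster.workers)
    # Submit a single task to confirm the cluster accepts work.
    total = client.submit(sum, [1, 2, 3]).result()

print(n_workers, total)
```

Using the cluster and client as context managers closes both when the block exits.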

Our cluster object has attributes and methods that give us information about the cluster. For instance, the `get_logs()` method returns the log output from the scheduler and all the workers.

In [None]:
cluster.get_logs()

We can also access the URL at which the Dask dashboard is hosted.

In [None]:
cluster.dashboard_link

For Dask to use our cluster, we still need to create a `Client` object. Since we have already created a cluster, we can pass it directly to the client.

In [None]:
client = Client(cluster)
client

In [None]:
client.close()
cluster.close()

## Remote clusters via SLURM
Perlmutter uses SLURM as its job scheduler, so we create the cluster with `SLURMCluster` from `dask_jobqueue`.

In [None]:
from dask_jobqueue import SLURMCluster


cluster = SLURMCluster(
    cores=2,  # cores per job
    memory="1GB",  # memory per job
    walltime="01:00:00",  # maximum runtime of each job
    job_extra_directives=["--qos=shared", "-C cpu"],  # extra #SBATCH options
)

Let's get the URL of the Dask dashboard:

In [None]:
cluster.dashboard_link

And port-forward it to our own computer:
```
ssh -N -L 8000:localhost:8787 dnoll@perlmutter
```

And take a look at it in our local browser:
```
http://localhost:8000/status
```
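
The dashboard is not always on port 8787: if that port is already taken, Dask picks another one, so the remote port in the `ssh` command should match `cluster.dashboard_link`. The sketch below builds the command from a dashboard URL. The helper name `ssh_forward_command` is hypothetical, and it assumes, as in the command above, that the scheduler runs on the node you log in to.

```python
from urllib.parse import urlparse


def ssh_forward_command(dashboard_link, login, local_port=8000):
    """Build an ssh port-forward command for a dashboard URL (hypothetical helper)."""
    remote_port = urlparse(dashboard_link).port
    if remote_port is None:
        raise ValueError(f"no port found in {dashboard_link!r}")
    # Forward local_port on our machine to the dashboard port on the login node.
    return f"ssh -N -L {local_port}:localhost:{remote_port} {login}"


# Example URL standing in for the value of cluster.dashboard_link.
command = ssh_forward_command("http://127.0.0.1:8787/status", "dnoll@perlmutter")
print(command)
```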

### Scale the Cluster

With some cluster managers it is possible to scale the cluster dynamically.

You can do this explicitly using the `cluster.scale` method:
```
cluster.scale(jobs=10)  # Launch 10 jobs
```
or by specifying the total number of cores
```
cluster.scale(cores=48)
```
or the total amount of memory
```
cluster.scale(memory="200 GB")
```

Alternatively, the cluster can scale dynamically using `cluster.adapt`; see the [Dask documentation on adaptive deployments](https://docs.dask.org/en/latest/adaptive.html) for more.
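
For the explicit `cluster.scale` calls, it helps to estimate how many SLURM jobs a core or memory target implies. The sketch below is a rough back-of-the-envelope calculation, not the library's own code: the function `jobs_needed` is hypothetical, and it assumes each job provides the `cores=2` and `memory="1GB"` configured above and that requests are rounded up to whole jobs.

```python
import math


def jobs_needed(cores=None, memory_gb=None, cores_per_job=2, memory_gb_per_job=1.0):
    """Estimate the number of jobs needed to reach a core or memory target.

    Hypothetical helper; assumes whole jobs and rounding up.
    """
    n = 0
    if cores is not None:
        n = max(n, math.ceil(cores / cores_per_job))
    if memory_gb is not None:
        n = max(n, math.ceil(memory_gb / memory_gb_per_job))
    return n


print(jobs_needed(cores=48))       # jobs implied by a 48-core target
print(jobs_needed(memory_gb=200))  # jobs implied by a 200 GB target
```

With only 1 GB per job, a memory target grows the job count much faster than a core target does, which is worth checking before submitting to a shared queue.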

In [None]:
cluster.scale(jobs=10)

### Connect a client and compute
Next, let's connect a client to this cluster:

In [None]:
from dask.distributed import Client

client = Client(cluster)
client

Then we run a small computation on it:

In [None]:
import time


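def function(x):
    time.sleep(3)  # simulate a slow task
    return x + 1

# Submit one task per input; map returns futures immediately
futures = client.map(function, list(range(10)))
# Block until all tasks have finished and collect their results
results = client.gather(futures)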
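
Dask's futures interface extends the one in Python's standard-library `concurrent.futures`, so the map-then-gather pattern can be tried out without a cluster. The sketch below is a local stand-in, not Dask itself: it uses a `ThreadPoolExecutor` and a shorter sleep, and shows that ten independent tasks finish in far less time than running them one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def function(x):
    time.sleep(0.1)  # shorter sleep than in the cell above, to keep this quick
    return x + 1


with ThreadPoolExecutor(max_workers=10) as pool:
    start = time.perf_counter()
    # Submitting returns futures immediately, like client.map
    futures = [pool.submit(function, x) for x in range(10)]
    # Waiting on each future in order collects the results, like client.gather
    results = [future.result() for future in futures]
    elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f} s")
```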

## Close your Dask Cluster
Afterwards, we scale the cluster down and close both the client and the cluster.

In [None]:
cluster.scale(0)
client.close()
cluster.close()

## More info

You can find more info and some 'official scripts' for Dask on Perlmutter at https://docs.nersc.gov/analytics/dask/ and more examples at https://gitlab.com/NERSC/nersc-notebooks/-/tree/main/perlmutter/dask.