# Dask HPC - lite version

## Documentation
**Recommended YouTube video:** [Matthew Rocklin](https://www.youtube.com/watch?v=FXsgmwpRExM&t=121s)

Dask website: [https://www.dask.org](https://www.dask.org)

In particular, see the [section on deploying Dask on HPC systems](https://docs.dask.org/en/stable/deploying-hpc.html).

## Installation

See the [Dask website](https://docs.dask.org/en/stable/install.html) for installation instructions (conda or mamba recommended).

## Interactive dask on Leonardo or Galileo

_On Leonardo_

Log in, activate the RAPIDS+Dask environment and launch JupyterLab on the server side:

```bash
ssh login.leonardo.cineca.it          ## log on to Leonardo (e.g. from a WSL bash shell)
host=$(hostname)                      ## record the login node's hostname (needed for the tunnel)
conda activate rapids-23.08           ## activate the RAPIDS + Dask environment
jupyter-lab --no-browser --ip "$host" ## start JupyterLab without a browser, listening on this host
```

_On local PC_

Set up an SSH tunnel for Jupyter (port 8888) and the Dask dashboard (port 8787), changing the login node below as appropriate:

```bash
## assumes the Leonardo session landed on login01
ssh -N -L 8787:login01.leonardo.local:8787 -L 8888:login01.leonardo.local:8888 login.leonardo.cineca.it
```
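
Because the login node can differ between sessions, it may help to build the tunnel command from the hostname noted on Leonardo. The following is only a sketch: `tunnel_command` is a hypothetical helper, and the `leonardo.local` domain and the two ports are simply taken from the command above.

```python
import shlex


def tunnel_command(login_node,
                   gateway="login.leonardo.cineca.it",
                   domain="leonardo.local",
                   ports=(8787, 8888)):
    """Return the ssh port-forwarding command as a list of arguments.

    Hypothetical helper: defaults mirror the command shown above.
    """
    args = ["ssh", "-N"]  # -N: forward ports only, run no remote command
    for port in ports:
        # forward local <port> to the same port on the chosen login node
        args += ["-L", f"{port}:{login_node}.{domain}:{port}"]
    args.append(gateway)
    return args


# Print the command for a session that landed on login01
print(shlex.join(tunnel_command("login01")))
```

The printed string can be pasted into a local shell; replace `"login01"` with the hostname reported on Leonardo.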

Open a browser window with the following address:
```
localhost:8888/lab
```

JupyterLab is now ready to use.

--------------------------------------------------------------------------

### Jupyter-lab session

```python
# Import dask and related modules.
import numpy as np
import pandas as pd
import dask
import dask.dataframe as dd
import dask.array as da
import dask.bag as db
```

```python
# On Leonardo and other HPC systems, Dask needs an interface to the batch scheduler (SLURM)
from distributed import Client
from dask_jobqueue import SLURMCluster
```

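```python
# Next step is to start a SLURM cluster;
# here we define the characteristics of each worker job
cluster = SLURMCluster(cores=1,                 # cores per job
                       processes=1,             # worker processes per job
                       memory="16GB",           # memory per job
                       account="cin_staff",     # SLURM account to charge
                       walltime="00:30:00",     # maximum job duration
                       interface="ib0",         # network interface used by the workers
                       job_extra_directives=['--tasks-per-node=1']
                       )
```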

```python
# Define the client; from now on all calculations are submitted through it
client = Client(cluster)

# Check that the generated SLURM job script is valid with job_script()
print(cluster.job_script())
```


```python
# Add more workers (i.e. SLURM jobs)
cluster.scale(2)
```


```python
# Now do calculations - example
x = da.random.random((30000, 30000), chunks=(1000, 1000))
print(x)
x = x.persist()            # start computing x and keep the result distributed across the workers
y = (x + x.T) - x.mean(axis=0)
y = y.persist()            # same for y: computed in the background, kept in worker memory
```
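
Before persisting arrays of this size, it can be worth checking how much memory they need compared with what the workers offer. The arithmetic below is a rough sketch for the array in the example, assuming 8-byte `float64` elements (the dtype returned by `da.random.random`); it ignores temporaries created while `y` is being computed.

```python
import math

shape = (30000, 30000)   # array shape used in the example above
chunks = (1000, 1000)    # chunk shape used in the example above
itemsize = 8             # bytes per element, assuming float64

# number of chunks = product of chunks along each axis
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
chunk_bytes = math.prod(chunks) * itemsize   # size of one chunk
total_bytes = math.prod(shape) * itemsize    # size of the whole array

print(n_chunks)            # 900
print(chunk_bytes / 1e6)   # 8.0 (MB per chunk)
print(total_bytes / 1e9)   # 7.2 (GB for the whole array)
```

With both `x` and `y` persisted, that is roughly 14.4 GB of results, to be compared with the two 16 GB workers requested above.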


> Remember to check the dashboard at `http://127.0.0.1:8787/status`