# SlurmCluster on JUWELS

 
## Configuring and running a Slurm Cluster

 - Here we use a client for clusters on JUWELS that uses Dask as its backend.
 - The client runs commands inside Singularity containers. The path to the Singularity image must be provided through the environment variable whose name is stored in `a6.parallel.slurm.SINGULARITY_IMAGE_ENV_VAR`, e.g.:
   ```Python
   import os

   import a6

   # The constant holds the name of the environment variable that stores the image path.
   os.environ[a6.parallel.slurm.SINGULARITY_IMAGE_ENV_VAR] = "<path to Singularity image file>"
   ```
   The executable used by the Dask workers can also be overridden.
   For example, a virtual environment can be activated and its Python executable used instead:
   ```Python
   client = a6.parallel.slurm.JuwelsClient(
       ...
       extra_job_commands=[". /<path to venv>/bin/activate"],
       python_executable=["python3"],
   )
   ```
 - Dask converts the configuration into a batch script, which can be inspected via `JuwelsClient.job_script`.
 - The `port` argument of `JuwelsClient` defines the port used for the dask dashboard. The dashboard can be accessed via `https://jupyter-jsc.fz-juelich.de/user/<user_name>/<lab_name>/proxy/<port>/status`.
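
As a small illustration of the dashboard address described above, the following sketch assembles the URL from its parts. The helper `dashboard_url` and the example values are hypothetical and not part of `a6`; only the URL pattern is taken from the text.

```python
def dashboard_url(user_name: str, lab_name: str, port: int) -> str:
    """Build the Jupyter-JSC proxy URL of the Dask dashboard (hypothetical helper)."""
    return (
        "https://jupyter-jsc.fz-juelich.de"
        f"/user/{user_name}/{lab_name}/proxy/{port}/status"
    )


# Placeholder values; replace them with your own user name, lab name and port.
url = dashboard_url("user1", "mylab", 8787)
print(url)
```

The `port` passed here should be the same value that was given to `JuwelsClient`.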
 
## Running a distributed program

 - By default, the `JuwelsClient` instance is initialized with 2 workers running on one node. The context manager `with client.scale(...):` scales the cluster to the desired capacity. In the notebook, evaluating `client.cluster` in a cell displays a widget that can also be used to scale the cluster.
 - Dask evaluates lazily: building the task graph and executing it are separate steps. In your notebook, you can define any function that operates on a `dask.array`. Calling the function with arguments returns the graph; `func(*args, **kwargs).compute()` triggers the actual execution. If a cluster is configured and requested in a context via `with client.scale(...)`, Dask distributes the execution.
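
The separation of graph building and graph execution can be sketched with plain `dask.array`, independent of `a6`. The function `variance` and the array shape are illustrative assumptions. Without a cluster, this runs on Dask's local scheduler.

```python
import dask.array as da


def variance(x):
    """Build a graph that centers `x` and averages the squared deviations."""
    centered = x - x.mean()
    return (centered**2).mean()


# Graph building: nothing is computed yet, `lazy` is still a dask array.
x = da.ones((1000, 1000), chunks=(250, 250))
lazy = variance(x)

# Graph execution: compute() runs the graph and returns a concrete value.
result = lazy.compute()
print(type(lazy).__name__, float(result))
```

Inside a `with client.scale(...)` block, the same `compute()` call would be the point at which work is sent to the workers.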
 

## Running on JUWELS Booster

When running on JUWELS Booster, i.e. the `booster` partition, the client needs an extra argument to request resources, since JUWELS Booster has only GPU nodes. Each GPU node in the JUWELS Booster features four NVIDIA A100 GPUs. The number of requested GPUs must be specified with the `--gres=gpu:X` option, with `X` between 1 and 4. Example for 1 GPU:
```Python
client = a6.parallel.slurm.JuwelsClient(
    ...
    extra_slurm_options=["--gres=gpu:1"],
)
```
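
To guard against requesting more GPUs than a node provides, the option string can be generated and checked beforehand. This is a minimal sketch: `gres_option` is a hypothetical helper, not part of `a6`, and the default of four GPUs per node follows the description above.

```python
def gres_option(n_gpus: int, gpus_per_node: int = 4) -> str:
    """Return the Slurm `--gres` option for `n_gpus` GPUs (hypothetical helper)."""
    if not 1 <= n_gpus <= gpus_per_node:
        raise ValueError(
            f"n_gpus must be between 1 and {gpus_per_node}, got {n_gpus}"
        )
    return f"--gres=gpu:{n_gpus}"


# The resulting list could be passed as `extra_slurm_options`.
extra_slurm_options = [gres_option(2)]
print(extra_slurm_options)
```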
 

In [None]:
import functools
import os
import itertools

import numpy as np

import a6

a6.utils.log_to_stdout()

os.environ[
    a6.parallel.slurm.SINGULARITY_IMAGE_ENV_VAR
] = "/p/scratch/deepacf/emmerich1/jupyter-a6/jupyter-kernel.sif"

data_path = "/p/scratch/deepacf/maelstrom_data/4cast-application6/ml/temperature_level_128_daily_averages_2017_2020.nc"

In [None]:
queue = "batch"
project = "deepacf"

client = a6.parallel.slurm.JuwelsClient(
    queue=queue,
    project=project,
    cores=64,
    processes=4,
    memory=64,
    walltime="01:00:00",
)

In [None]:
variance_ratio = [None]
n_clusters = [29]
use_varimax = [True]

method = functools.partial(
    a6.benchmark.wrap_benchmark_method_with_logging(a6.pca_and_kmeans),
    data_path,
)
arguments = itertools.product(variance_ratio, n_clusters, use_varimax)

In [None]:
with client.scale(workers=4):
    results = client.execute(method, arguments)
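

The parameter grid and the partially applied method used above can be illustrated with the standard library alone. In this sketch, `run` is a hypothetical stand-in for the wrapped `a6.pca_and_kmeans`, its signature is an assumption, and it is also assumed that each argument tuple is unpacked into the remaining positional parameters.

```python
import functools
import itertools


def run(data_path, variance_ratio, n_clusters, use_varimax):
    """Hypothetical stand-in for the wrapped `a6.pca_and_kmeans` call."""
    return (data_path, variance_ratio, n_clusters, use_varimax)


# Bind the first positional argument, as done with `data_path` above.
method = functools.partial(run, "data.nc")

# One tuple per combination: 2 * 2 * 1 = 4 parameter sets.
# itertools.product returns a one-shot iterator, so it is stored as a list here.
arguments = list(itertools.product([None, 0.95], [29, 30], [True]))

# Assumption: each tuple is unpacked into the remaining positional arguments.
results = [method(*args) for args in arguments]
print(len(results), results[0])
```

Adding more values to any of the input lists grows the grid multiplicatively, which is worth keeping in mind when choosing the number of workers.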