# SlurmCluster on E4

Available systems:

- Arm Frontend: 172.18.16.70 (tlnode01.e4red)
- Intel Frontend: 172.18.19.216 (ilnode01.e4red)
- AMD Frontend: 172.18.16.79 (alnode01.e4red)

## Measuring power consumption

On E4, power consumption can be measured, but only `icnode01` is connected to a power meter. To run a job on this host, pass the `--nodelist icnode01` flag to Slurm. The `/opt/share/sw/zes/power.py` script, which has to be run with `sudo`, writes the power measurements to a CSV file; the `-f` flag specifies the output path. Start the script before submitting the job: it records the power consumption on the fly while the job runs.

```commandline
sudo /opt/share/sw/zes/power.py -f <path to outfile>.csv
```
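
Once a run has finished, the CSV file can be post-processed, for example to estimate the total energy a job consumed. The following is only a minimal sketch: the column layout of the file written by `power.py` is not documented here, so the column names `timestamp` (seconds) and `power` (watts) are assumptions, and the helper name `energy_joules` is hypothetical. Adjust them to the actual header of your file.

```python
import csv
import io


def energy_joules(csv_file, time_column="timestamp", power_column="power"):
    """Integrate power over time with the trapezoidal rule.

    ASSUMPTION: the column names are hypothetical; check the header of the
    file written by power.py and adapt them accordingly.
    """
    reader = csv.DictReader(csv_file)
    samples = [(float(row[time_column]), float(row[power_column])) for row in reader]
    energy = 0.0
    # Add the area of the trapezoid between each pair of consecutive samples.
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


# Synthetic values for illustration only, not real measurements.
example = io.StringIO("timestamp,power\n0,100\n1,110\n2,120\n")
print(energy_joules(example))
```

For a real measurement, open the file with `open(path, newline="")` and pass the file object to the function.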

## Account-partition options

The command `sacctmgr show user <user name> withassoc` lists all account-partition combinations that the given user can use with Slurm.
  - `*-hw` partitions are designed for short jobs (<=1h) and have 2 nodes.
  - `*-lw` partitions are designed for long jobs (1h-3d) and have 4 nodes.
  - The `ice-nc` partition is designed for power consumption measurement and has 2 nodes.
  - `*-builder` partitions are designed for very small jobs that require only few resources, such as compiling a program or compressing an archive. Each job can request no more than 6 cores and 16GB of memory.
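
To work with these combinations programmatically, the `sacctmgr` output can be parsed. This sketch assumes that `sacctmgr` is on the `PATH` and is called with `--parsable2`, `--noheader`, and `format=Account,Partition`, so that it prints one `account|partition` pair per line. The helper names `parse_associations` and `list_associations` are hypothetical.

```python
import subprocess


def parse_associations(output):
    """Parse pipe-delimited `account|partition` lines into a list of tuples."""
    pairs = []
    for line in output.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 2:
            # Skip empty or malformed lines.
            continue
        pairs.append((fields[0], fields[1]))
    return pairs


def list_associations(user):
    """Query Slurm for the account-partition combinations of `user`.

    ASSUMPTION: sacctmgr is available on the PATH of the frontend node.
    """
    result = subprocess.run(
        [
            "sacctmgr", "--parsable2", "--noheader",
            "show", "user", user, "withassoc",
            "format=Account,Partition",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_associations(result.stdout)
```

Calling `list_associations("<user name>")` on a frontend node would then return the pairs as a list of tuples.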

Partitions:
  - Intel architecture: `casc-hw`, `casc-lw`, and `ice-nc` (for power consumption measurement)
  - AMD architecture: `mil-hw` and `mil-lw`

Accounts:
  - The name of the project account is `maelstrom`.
  - The name of the account that can be used with the `*-builder` partitions is `builder`.
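
The partition rules above can be captured in a small helper. This is a sketch under the assumption that the choice depends only on the architecture and the expected runtime; the function name `choose_partition` and the architecture keys `"intel"` and `"amd"` are hypothetical, and the `ice-nc` and `*-builder` partitions are deliberately left out.

```python
def choose_partition(architecture, runtime_hours):
    """Return a partition name for an architecture and an expected runtime.

    Short jobs (<=1h) go to `*-hw`, longer jobs (up to 3 days) to `*-lw`.
    """
    prefixes = {"intel": "casc", "amd": "mil"}
    if architecture not in prefixes:
        raise ValueError(f"unknown architecture: {architecture!r}")
    if runtime_hours <= 0 or runtime_hours > 72:
        raise ValueError("runtime must be between 0 and 72 hours (3 days)")
    suffix = "hw" if runtime_hours <= 1 else "lw"
    return f"{prefixes[architecture]}-{suffix}"


print(choose_partition("intel", 0.5))
print(choose_partition("amd", 24))
```

The returned name can be passed as the `queue` argument of `E4Client`, together with `project="maelstrom"`.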
  
 
## Configuring and running a Slurm Cluster

 - The examples here use `E4Client`, a client for the clusters on E4 that uses dask as its backend.
 - The client runs commands inside Singularity containers. The path to the Singularity image has to be set in the environment variable whose name is stored in `lifetimes.parallel.slurm.SINGULARITY_IMAGE_ENV_VAR`, e.g.
   ```Python
   import os

   import lifetimes

   os.environ[lifetimes.parallel.slurm.SINGULARITY_IMAGE_ENV_VAR] = "<path to Singularity image file>"
   ```
   The executable of the dask workers can also be overridden.
   For example, a virtual environment can be activated and its Python executable used instead:
   ```Python
   client = lifetimes.parallel.slurm.E4Client(
       ...
       extra_job_commands=[". /<path to venv>/bin/activate"],
       python_executable=["python3"],
   )
   ```
 - Dask converts the whole configuration into a batch script, which can be inspected via `E4Client.job_script`.
 - The `port` argument of `E4Client` defines the port used for the dask dashboard. The dashboard can be accessed via `https://<host ip>:<port>`.
 
## Running a distributed program

 - By default, the `E4Client` instance is initialized with 2 workers running on one node. The `with client.scale(...):` context manager scales the cluster to the desired capacity. In a notebook, evaluating `client.cluster` in a cell displays a widget that can also be used to scale the cluster.
 - Dask supports lazy evaluation: graph building and graph execution are separated. In a notebook, you can define any function that computes on a `dask.array`. Calling the function with arguments only builds the graph; `func(*args, **kwargs).compute()` triggers the actual execution. If a cluster is configured and requested in a context via `with client.scale(...)`, dask distributes the execution.
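
The separation of graph building and execution can be seen in a small stand-alone example. This sketch uses only the public `dask.array` API and runs on dask's default local scheduler, so it does not need a cluster; the function name `squared_deviation_sum` is hypothetical.

```python
import dask.array as da


def squared_deviation_sum(x):
    """Build a graph that sums the squared deviations from the mean."""
    return ((x - x.mean()) ** 2).sum()


# Eight values split into two chunks of four.
x = da.arange(8, chunks=4)

# Calling the function only builds the graph; nothing is computed yet.
graph = squared_deviation_sum(x)
print(type(graph))

# compute() triggers the actual execution and returns a concrete value.
result = graph.compute()
print(result)
```

Here `graph` is still a lazy `dask.array.Array`; only `compute()` produces the number.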

In [None]:
import functools
import itertools
import os

import numpy as np

import lifetimes

lifetimes.utils.log_to_stdout()

os.environ[lifetimes.parallel.slurm.SINGULARITY_IMAGE_ENV_VAR] = "/home/femmerich/.singularity/jupyter-kernel.sif"

data_path = "/data/maelstrom/a6/temperature_level_128_daily_averages_2017_2020.nc"

In [None]:
client = lifetimes.parallel.slurm.E4Client(
    queue="ice-nc",
    project="maelstrom",
    cores=16,
    memory=64,
    processes=4,
    walltime="02:00:00",
    extra_slurm_options=["--nodelist icnode01"],
)

In [None]:
variance_ratio = [None]
n_clusters = [29]
use_varimax = [True]

method = functools.partial(
    lifetimes.benchmark.wrap_benchmark_method_with_logging(lifetimes.pca_and_kmeans),
    data_path,
)

arguments = itertools.product(variance_ratio, n_clusters, use_varimax)

In [None]:
with client.scale(workers=1):
    results = client.execute(method, arguments)

In [None]:
results
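
The cells above combine `functools.partial` with `itertools.product` to build a parameter grid. The following stand-alone sketch shows the pattern with a stand-in function. `fake_method` and the placeholder path are hypothetical, and the sketch assumes that each tuple of the product is unpacked into the remaining positional arguments, which may differ from how `client.execute` actually calls the method.

```python
import functools
import itertools


def fake_method(data_path, variance_ratio, n_clusters, use_varimax):
    """Hypothetical stand-in for a benchmark method; echoes its arguments."""
    return {
        "data_path": data_path,
        "variance_ratio": variance_ratio,
        "n_clusters": n_clusters,
        "use_varimax": use_varimax,
    }


# Bind the first positional argument, as done with `data_path` above.
method = functools.partial(fake_method, "/path/to/data.nc")

# The Cartesian product yields one tuple per parameter combination.
arguments = itertools.product([None, 0.9], [10, 29], [True])

# ASSUMPTION: each tuple is unpacked into the remaining positional arguments.
results = [method(*args) for args in arguments]
print(len(results))
```

Note that `itertools.product` returns an iterator that is exhausted after one pass, so it has to be recreated before it can be used for a second run.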