# Using `f3dasm` on a High-Performance Cluster Computer

Your `f3dasm` workflow can be seamlessly translated to a high-performance computing cluster.  
The advantage is that you can distribute the experiments across the nodes of the cluster and run them in parallel.  
This is especially useful when you have a large number of experiments to run.

> This example has been tested on the following high-performance computing cluster systems:
> 
> - The [hpc06 cluster of Delft University of Technology](https://hpcwiki.tudelft.nl/index.php/Main_Page), using the [TORQUE resource manager](https://en.wikipedia.org/wiki/TORQUE).
> - The [DelftBlue: TU Delft supercomputer](https://www.tudelft.nl/dhpc/system), using the [SLURM resource manager](https://slurm.schedmd.com/documentation.html).
> - The [OSCAR compute cluster from Brown University](https://docs.ccv.brown.edu/oscar/getting-started), using the [SLURM resource manager](https://slurm.schedmd.com/documentation.html).


```python
from time import sleep

import numpy as np

from f3dasm import HPC_JOBID, ExperimentData
from f3dasm.design import make_nd_continuous_domain
```

We will create the following data-driven process:

- Create a 20D continuous `Domain`.
- Sample from the domain using a Latin-hypercube sampler.
- Evaluate the samples in parallel on multiple nodes with a data generation function, here the `"Ackley"` function from the benchmark functions.
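
To make the three steps concrete, here is a minimal sketch in plain NumPy that does not use `f3dasm` at all. The helper names `latin_hypercube` and `ackley` are hypothetical, the sampler is a basic stratified design, and the function is the textbook Ackley formula with the usual constants (a = 20, b = 0.2, c = 2π). The built-in `f3dasm` sampler and benchmark function may differ in details such as input scaling or offsets.

```python
import numpy as np


def latin_hypercube(n_samples, bounds, rng):
    """Draw a basic Latin-hypercube sample within per-dimension bounds.

    Hypothetical helper for illustration; not the f3dasm sampler.
    """
    bounds = np.asarray(bounds, dtype=float)
    n_dim = bounds.shape[0]
    # One random point inside each of n_samples equal-width strata of [0, 1)
    strata = (np.arange(n_samples)[:, None]
              + rng.random((n_samples, n_dim))) / n_samples
    # Shuffle the strata independently in every dimension
    unit = rng.permuted(strata, axis=0)
    lower, upper = bounds[:, 0], bounds[:, 1]
    return lower + unit * (upper - lower)


def ackley(x, a=20.0, b=0.2, c=2.0 * np.pi):
    """Textbook Ackley function, evaluated row by row (minimum 0 at the origin)."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    term1 = -a * np.exp(-b * np.sqrt(np.sum(x**2, axis=1) / d))
    term2 = -np.exp(np.sum(np.cos(c * x), axis=1) / d)
    return term1 + term2 + a + np.e


rng = np.random.default_rng(seed=0)
bounds = np.tile([0.0, 1.0], (20, 1))      # 20D unit hypercube
samples = latin_hypercube(n_samples=10, bounds=bounds, rng=rng)
values = ackley(samples)                   # one output per sample
print(samples.shape, values.shape)
```

On a cluster, only the last step (the function evaluations) is worth distributing, which is why the sampling is done once and the evaluation is split over the nodes.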

<div style="text-align: center;">
    <img src="../../img/f3dasm-workflow-example-cluster.png" alt="Block" title="Block" width="60%" />
</div>

We want to ensure that the sampling is done only once, and that the data generation is performed in parallel.  
Therefore, we can divide the different nodes into two categories:

- The first node (`f3dasm.HPC_JOBID == 0`) will be the **master** node, responsible for creating the design-of-experiments and sampling (the `create_experimentdata` function).


```python
def create_experimentdata():
    """Design of Experiment"""
    # Create a domain object
    domain = make_nd_continuous_domain(
        bounds=np.tile([0.0, 1.0], (20, 1)), dimensionality=20)

    # Create the ExperimentData object
    data = ExperimentData(domain=domain)

    # Sampling from the domain
    data.sample(sampler='latin', n_samples=10)

    # Store the data to disk
    data.store()
```

- All the other nodes (`f3dasm.HPC_JOBID > 0`) will be **process** nodes, which will retrieve the `ExperimentData` from disk and proceed directly to the data generation function.

<div style="text-align: center;">
    <img src="../../img/f3dasm-workflow-cluster-roles.png" alt="Block" title="Block" width="60%" />
</div>

```python
def worker_node():
    """Data Generation"""
    # Load the ExperimentData from disk
    data = ExperimentData.from_file(project_dir='.')

    # Use the data generator to evaluate the initial samples
    data.evaluate(data_generator='Ackley', mode='cluster')
```


The entry point of the script can now check the job ID of the current node and decide whether to create the experiment data or to run the data generation function:

```python
if __name__ == '__main__':
    # Check the job ID of the current node
    if HPC_JOBID is None:
        # If the job ID is None, do not run anything
        pass

    elif HPC_JOBID == 0:
        # Master node: create the experiment data, then start evaluating
        create_experimentdata()
        worker_node()
    elif HPC_JOBID > 0:
        # Stagger the start of the process nodes to avoid race conditions
        sleep(HPC_JOBID)
        worker_node()
```
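
Staggering the start times with `sleep` lowers the chance that two nodes touch the stored data at the same moment, but it does not rule it out. A more explicit guard is a lock file. The sketch below is a generic illustration of that idea and is not taken from `f3dasm`. The name `file_lock` and the lock path are hypothetical, and the example says nothing about how `mode='cluster'` coordinates nodes internally. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists. Guarantees on network file systems can vary.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def file_lock(path, timeout=10.0, poll=0.05):
    """Hold an exclusive lock represented by the existence of `path`.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Creation fails if the file exists, so only one process succeeds
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll)
    try:
        yield
    finally:
        # Release the lock so that other processes can proceed
        os.close(fd)
        os.remove(path)


with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "experiment_data.lock")
    with file_lock(lock_path):
        # A node would read, update and write the shared data here
        held = os.path.exists(lock_path)
    released = not os.path.exists(lock_path)
print(held, released)
```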

## Running the Program

You can run the workflow by submitting the bash script to the HPC queue.  
Make sure you have [miniconda3](https://docs.anaconda.com/free/miniconda/index.html) installed on the cluster and that you have created a conda environment (in this example named `f3dasm_env`) with the necessary packages.

### TORQUE Example

---

```bash
#!/bin/bash
# Torque directives (#PBS) must always be at the start of a job script!
#PBS -N ExampleScript
#PBS -q mse
#PBS -l nodes=1:ppn=12,walltime=12:00:00

# Make sure I'm the only one that can read my output
umask 0077

# The PBS_JOBID looks like 1234566[0].
# With the following line, we extract the PBS_ARRAYID, the part inside the brackets []:
PBS_ARRAYID=$(echo "${PBS_JOBID}" | sed -n 's/^[^[]*\[\([0-9]*\)\].*$/\1/p')

module load use.own
module load miniconda3
cd $PBS_O_WORKDIR

# Activate my conda environment:
source activate f3dasm_env

# Limit the number of threads
export OMP_NUM_THREADS=12

# If the PBS_ARRAYID is unset or empty, set it to None
if [ -z "${PBS_ARRAYID}" ]; then
    PBS_ARRAYID=None
fi

# Execute my Python program with the jobid flag
python main.py --jobid=${PBS_ARRAYID}
```

---

### SLURM Example

---

```bash
#!/bin/bash -l

#SBATCH -J "ExampleScript"                # Name of the job
#SBATCH --get-user-env                    # Set environment variables

#SBATCH --partition=compute
#SBATCH --time=12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=1
#SBATCH --mem=0
#SBATCH --account=research-eemcs-me
#SBATCH --array=0-2

source activate f3dasm_env

# Execute my Python program with the jobid flag
python3 main.py --jobid=${SLURM_ARRAY_TASK_ID}
```

---

Submit the script as an array job.  
The following commands submit an array job with 3 jobs, in which `f3dasm.HPC_JOBID` takes the values 0, 1, and 2.

### TORQUE Example

```bash
qsub -t 0-2 pbsjob.sh
```

### SLURM Example

```bash
sbatch --array 0-2 pbsjob.sh
```
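
The `--jobid` flag in the job scripts is what carries the array index into the Python program. As a rough sketch of how such a flag can be turned into a node role, the example below uses `argparse`. The functions `parse_jobid` and `role_for` are hypothetical and only mirror the branching of the entry point shown earlier. They are not the way `f3dasm` itself populates `HPC_JOBID`.

```python
import argparse


def parse_jobid(argv):
    """Return the job ID as an int, or None if it is absent, empty or 'None'."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--jobid", default=None)
    # Ignore any other arguments the script may receive
    args, _ = parser.parse_known_args(argv)
    if args.jobid in (None, "", "None"):
        return None
    return int(args.jobid)


def role_for(jobid):
    """Map a job ID to the role used in this example."""
    if jobid is None:
        return "idle"
    return "master" if jobid == 0 else "process"


# Array indices 0-2 as submitted above, plus the two cases without a job ID
for argv in (["--jobid=0"], ["--jobid=1"], ["--jobid=2"], ["--jobid=None"], []):
    print(argv, "->", role_for(parse_jobid(argv)))
```

With the array range `0-2`, index 0 takes the master role and indices 1 and 2 take the process role.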