# RADICAL-Cybertools Tutorial

RADICAL-Cybertools support the execution of ensemble applications at scale on high performance computing (HPC) platforms. Ensemble applications enable the use of diverse algorithms to coordinate the execution of up to $10^6$ tasks on all the processors (CPU/GPU) of an HPC machine. This type of application is common in biophysical systems, climate science, seismology, and polar science. RADICAL-Cybertools address the challenges of scale, diversity, and reliability.

Adaptive ensembles are a particularly interesting type of ensemble application in which adaptivity is used to determine the behavior of the application at runtime. For example, many biomolecular sampling algorithms are formulated as adaptive: replica-exchange, Expanded Ensemble, etc. Introducing adaptivity has improved simulation efficiency by up to a factor of three, but implementing adaptive ensemble applications is challenging due to the complexity of the required algorithms.

## RADICAL-EnsembleToolkit (EnTK)

RADICAL-Cybertools offer [RADICAL-EnsembleToolkit (EnTK)](https://radicalentk.readthedocs.io/en/stable/index.html), a workflow engine specifically designed to support the execution of (adaptive) ensemble applications at scale on HPC platforms. EnTK allows users to separate adaptive logic from simulation/analysis code, while abstracting away the issues of resource management and runtime execution coordination.

EnTK exposes a simple application programming interface (API), implemented in Python, with two (Pythonic) collections of objects and three classes:
* Set: contains objects that have no relative order with each other
* Sequence/List: contains objects that have a linear order, i.e., object 'i' depends on object 'i-1'
* Task: description of the kernel to execute
* Stage: set of Tasks, i.e. all tasks of a stage may execute concurrently
* Pipeline: sequence of Stages, i.e., Stage 2 may only commence after Stage 1 completes

Thus, in EnTK an ensemble application is described as a set of pipelines, in which each pipeline has a sequence/list of stages, and each stage has a set of tasks. The following figure shows an example of an ensemble application in which tasks are represented by arrows:

![](images/pst-model.jpg "Application model of RADICAL-EnsembleToolkit: pipeline, stage and task")
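
The ordering rules of this model can be sketched in plain Python, without EnTK itself. In the sketch below, tasks are just illustrative strings, a stage is a Python `set`, and a pipeline is a Python `list`. `direct_dependencies` is a hypothetical helper (not part of the EnTK API) that lists, for each task, the tasks of the preceding stage that must complete first.

```python
# Plain-Python stand-in for the pipeline/stage/task model.
# No EnTK objects are used; task names are illustrative strings.

def direct_dependencies(pipeline):
    """Map each task to the tasks that must finish before it may start.

    `pipeline` is a list of stages (ordered); each stage is a set of
    task names (unordered, free to run concurrently).
    """
    deps = {}
    previous_stage = set()
    for stage in pipeline:
        for task in stage:
            # a task waits for every task of the previous stage
            deps[task] = set(previous_stage)
        previous_stage = stage
    return deps


pipeline = [
    {'s1.t1'},            # stage 1: one task
    {'s2.t1', 's2.t2'},   # stage 2: two tasks, no order between them
    {'s3.t1'},            # stage 3: one task
]

deps = direct_dependencies(pipeline)
for task in sorted(deps):
    print(task, '<-', sorted(deps[task]))
```

Tasks of the same stage never appear in each other's dependency sets, which is what allows them to run concurrently.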

## Preparing the Execution Environment

As we will be executing this tutorial within a Jupyter notebook, we install EnTK directly into the notebook kernel via `pip`; we could equally use `conda`.

<div class="alert alert-block alert-info"><b>Note:</b> We "mute" the output of the cell with `%%capture capt` to not pollute the notebook output.</div>

Depending on the execution environment, you may want to use the Spack package or the container provided by Exaworks SDK, or load the module provided by the administrators of the high performance computing (HPC) platform on which you are executing this tutorial.

In [None]:
%%capture capt

%pip install radical.entk

Currently, EnTK and its runtime system RADICAL-Pilot require a RabbitMQ and a MongoDB server. Those servers need to be deployed and made publicly available before using EnTK. Here we set the access parameters for the servers:

In [None]:
import os

rmq_host = os.environ.get('RMQ_SERVER')
rmq_port = os.environ.get('RMQ_PORT')
rmq_name = os.environ.get('RMQ_NAME')
rmq_pswd = os.environ.get('RMQ_PSWD')

mdb_host = os.environ.get('MDB_SERVER')
mdb_port = os.environ.get('MDB_PORT')
mdb_name = os.environ.get('MDB_NAME')
mdb_pswd = os.environ.get('MDB_PSWD')
mdb_dtbs = os.environ.get('MDB_DTBS')

%env RADICAL_PILOT_DBURL=mongodb://$mdb_name:$mdb_pswd@$mdb_host:$mdb_port/$mdb_dtbs
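
Outside a notebook, the `%env` magic is not available. A minimal sketch of the same step in plain Python follows. `build_dburl` is a hypothetical helper that assumes the `MDB_*` variables used above. It percent-encodes the user name and password, which MongoDB connection strings require when these contain reserved characters such as `@` or `:`. The values in `example_env` are placeholders.

```python
from urllib.parse import quote_plus

MDB_KEYS = ('MDB_NAME', 'MDB_PSWD', 'MDB_SERVER', 'MDB_PORT', 'MDB_DTBS')


def build_dburl(env):
    """Build a MongoDB URL from the MDB_* entries of the mapping `env`."""
    missing = [key for key in MDB_KEYS if not env.get(key)]
    if missing:
        raise KeyError('missing environment variables: %s' % ', '.join(missing))

    return 'mongodb://%s:%s@%s:%s/%s' % (
        quote_plus(env['MDB_NAME']),   # user name, percent-encoded
        quote_plus(env['MDB_PSWD']),   # password, percent-encoded
        env['MDB_SERVER'],
        env['MDB_PORT'],
        env['MDB_DTBS'],
    )


# placeholder values, for illustration only
example_env = {
    'MDB_NAME': 'user',
    'MDB_PSWD': 'p@ss',
    'MDB_SERVER': 'db.example.org',
    'MDB_PORT': '27017',
    'MDB_DTBS': 'entk',
}
print(build_dburl(example_env))

# with the real variables exported in the shell, one would instead run:
#   import os
#   os.environ['RADICAL_PILOT_DBURL'] = build_dburl(os.environ)
```

Failing early on a missing variable avoids a URL that silently contains the string `None`.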

# Example 1: Ensemble of Simulation Pipelines

The following example application shows the execution of a simple ensemble of simulations. Each ensemble member is in itself a pipeline of three different stages:

1. generate a random seed as input data
2. evolve a model based on that input data via a set of concurrent tasks
3. derive a common metric across the model results

Similar patterns are frequently found in molecular dynamics simulation workflows. For the purpose of this tutorial, the stages are:

- random seed  : create a random number
- evolve model : N tasks computing n'th power of the input
- common metric: sum over all 'model' outputs

The final results are then staged back and printed on STDOUT.

The following image offers a representation of the application we are going to code and then run for Example 1.

![](images/entk_example1_app.png "Example 1: Ensemble application with 2 pipelines.")

The two pipelines execute concurrently and, as per EnTK API definitions, the stages inside each pipeline execute sequentially. Importantly, when a stage contains **multiple** tasks, all those tasks can execute concurrently, assuming that enough resources are available. Given a set of resources, EnTK always executes the ensemble application with the highest possible degree of concurrency but, when not enough resources are available, the tasks of a stage may be executed sequentially. All this is transparent to the user, who is left free to focus on the ensemble algorithm without having to deal with parallelism and resource management.
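
As a rough illustration of this trade-off, the sketch below estimates how many sequential batches a stage needs on a given number of cores. It is back-of-the-envelope arithmetic, not EnTK's scheduler: it assumes that every task uses exactly one core and that all tasks take the same time. `n_batches` is a hypothetical helper.

```python
import math


def n_batches(n_tasks, n_cores):
    """Number of sequential batches needed to run `n_tasks` one-core tasks."""
    if n_cores < 1:
        raise ValueError('at least one core is required')
    return math.ceil(n_tasks / n_cores)


# a stage with 10 tasks on 10 cores can run fully concurrently ...
print(n_batches(10, 10))
# ... while on 2 cores the same stage is split into several batches
print(n_batches(10, 2))
```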

First, we import the EnTK Python module into our application so that we can use its API.

In [None]:
import radical.entk as re

The following function generates a single simulation pipeline, i.e., a new ensemble member. The pipeline structure consists of the three stages described above.

In [None]:
def generate_pipeline(uid):

    # all tasks in this pipeline share the same sandbox
    sandbox = uid

    # first stage: create 1 task to generate a random seed number
    t1 = re.Task()
    t1.executable = '/bin/sh'
    t1.arguments  = ['-c', 'od -An -N1 -i /dev/random']
    t1.stdout     = 'random.txt'
    t1.sandbox    = sandbox

    s1 = re.Stage()
    s1.add_tasks(t1)

    # second stage: create 10 tasks to compute the n'th power of that number
    s2 = re.Stage()
    n_simulations = 10
    for i in range(n_simulations):
        t2 = re.Task()
        t2.executable = '/bin/sh'
        t2.arguments  = ['-c', 'echo "$(cat random.txt) ^ %d" | bc' % i]
        t2.stdout     = 'power.%03d.txt' % i
        t2.sandbox    = sandbox
        s2.add_tasks(t2)

    # third stage: compute sum over all powers
    t3 = re.Task()
    t3.executable = '/bin/sh'
    t3.arguments  = ['-c', 'cat power.*.txt | paste -sd+ | bc']
    t3.stdout     = 'sum.txt'
    t3.sandbox    = sandbox

    # download the result while renaming to get unique files per pipeline
    t3.download_output_data = ['sum.txt > %s.sum.txt' % uid]

    s3 = re.Stage()
    s3.add_tasks(t3)

    # assemble the three stages into a pipeline and return it
    p = re.Pipeline()
    p.add_stages(s1)
    p.add_stages(s2)
    p.add_stages(s3)

    return p
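
To make the arithmetic of the three stages concrete, the following sketch reproduces it in plain Python for a given seed. It is not how EnTK runs the tasks. `expected_sum` is a hypothetical helper that can serve as a sanity check on the staged results, assuming the seed is a non-negative integer.

```python
def expected_sum(seed, n_simulations=10):
    """Sum of seed**i for i in 0 .. n_simulations-1, mirroring stages 2 and 3."""
    # stage 2: one power per task, exponents 0 .. n_simulations - 1
    powers = [seed ** i for i in range(n_simulations)]
    # stage 3: sum over all powers
    return sum(powers)


print(expected_sum(2))   # 2**0 + 2**1 + ... + 2**9
```

Note that the exponents start at 0 because the loop in the second stage uses `range(n_simulations)`.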

Now we write the ensemble application. We create an EnTK application manager, which executes our ensemble.

In [None]:
appman = re.AppManager(hostname=rmq_host, 
                       port=rmq_port, 
                       username=rmq_name,
                       password=rmq_pswd)

We assign a resource request description to the application manager using three mandatory keys: target resource, walltime, and number of CPUs:

In [None]:
appman.resource_desc = {
    'resource': 'local.localhost',
  # 'resource': 'local.localhost_flux',
    'walltime': 10,
    'cpus'    : 2
}
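
Because all three keys are mandatory, it can be convenient to check a description before handing it to the application manager. The sketch below is one way to do that, independent of any validation EnTK performs itself. `missing_keys` is a hypothetical helper, not part of the EnTK API.

```python
MANDATORY_KEYS = ('resource', 'walltime', 'cpus')


def missing_keys(resource_desc):
    """Return the mandatory keys that are absent from `resource_desc`."""
    return [key for key in MANDATORY_KEYS if key not in resource_desc]


# an incomplete description, for illustration: 'cpus' is not set
desc = {'resource': 'local.localhost', 'walltime': 10}
print(missing_keys(desc))
```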

We create an ensemble of n simulation pipelines:

In [None]:
n_pipelines = 10
ensemble = set()
for cnt in range(n_pipelines):
    ensemble.add(generate_pipeline(uid='pipe.%03d' % cnt))

We assign the workflow to the application manager, then run the ensemble and wait for completion:

In [None]:
appman.workflow = ensemble
appman.run()

We check the results that were staged back:

In [None]:
for cnt in range(n_pipelines):
    # read the staged-back sum of each pipeline; `with` closes the file
    with open('pipe.%03d.sum.txt' % cnt) as fin:
        result = int(fin.read())
    print('%3d -- %25d' % (cnt, result))
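
If a pipeline fails, its result file may never be staged back, and the loop above would stop at the first missing file. A more tolerant variant is sketched below. `collect_results` is a hypothetical helper that assumes the `pipe.NNN.sum.txt` naming used in this example and reports missing files as `None`.

```python
from pathlib import Path


def collect_results(directory, n_pipelines):
    """Return {pipeline index: int result, or None if the file is missing}."""
    results = {}
    for cnt in range(n_pipelines):
        path = Path(directory) / ('pipe.%03d.sum.txt' % cnt)
        if path.is_file():
            # int() ignores the surrounding whitespace and trailing newline
            results[cnt] = int(path.read_text())
        else:
            results[cnt] = None
    return results


# look for the results of 10 pipelines in the current directory
for cnt, result in collect_results('.', 10).items():
    print('%3d -- %s' % (cnt, 'missing' if result is None else result))
```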