# Tutorial

In this tutorial, we will learn how to deploy `RiD-kit` with `dflow` and `Kubernetes` and run a simple case of alanine dipeptide.


## Installation of `dflow` and `rid-kit`
With the power of `dflow`, users can easily monitor the whole workflow of RiD tasks and dispatch their tasks to various computational resources. Before you use it, you should have `dflow` installed on your host computer (your PC or a remote server).

It is necessary to emphasize that the computational nodes and the monitor nodes are separated. With `dflow`, you can deploy `dflow` and `rid` on your PC and run expensive computations on other resources (like `Slurm` and Cloud Platform) without any further effort.

Detailed instructions for installing `dflow` are provided on its [GitHub page](https://github.com/deepmodeling/dflow#Installdflow). The prerequisites of `dflow` are `Docker` and `Kubernetes`; their main pages ([Docker](https://docs.docker.com/engine/install/) & [Kubernetes](https://kubernetes.io/docs/tasks/tools/)) explain how to install them. In addition, the `dflow` repo provides easy-install shell scripts at [dflow/scripts](https://github.com/deepmodeling/dflow/tree/master/scripts) to install `Docker`, `Kubernetes` and `dflow` and to set up port-forwarding.

Here, we use the easy-install scripts provided by `dflow` to install these dependencies. Download the scripts from [dflow/scripts](https://github.com/deepmodeling/dflow/tree/master/scripts) and run them **as a regular user, without root privileges**:


> **Note:**
> 
> Don't try to run `minikube` with root privileges, otherwise an error may occur:
> 
> `Exiting due to DRV_AS_ROOT: The "docker" driver should not be used with root privileges.`

In [1]:
# for users in China, please use `-cn.sh` version to accelerate the installation process.
! chmod 755 install-linux-cn.sh
! ./install-linux-cn.sh

[INFO] Found docker executable at /usr/bin/docker
[INFO] Found minikube binary at /usr/local/bin/minikube
[INFO] Minikube has been started
--2022-08-05 21:05:16--  https://raw.githubusercontent.com/deepmodeling/dflow/master/manifests/quick-start-postgres-stable-cn.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... ^C


Otherwise, you can run `pip install pydflow` and follow its instructions manually.

A further step, needed to configure the `argo` service, is to run:

In [None]:
! kubectl create ns argo
! kubectl apply -n argo -f https://raw.githubusercontent.com/deepmodeling/dflow/master/manifests/quick-start-postgres.yaml

Now `Docker` and `minikube` should be installed properly. Run the command below to check the status. For `minikube`, wait until all components report `Running`. This may take a couple of minutes.

In [1]:
! minikube status

minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
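
The readiness check described above can also be sketched in Python. This is a minimal sketch that assumes the plain-text layout of the sample output shown here; `parse_minikube_status` and `is_ready` are hypothetical helper names, not part of `minikube` or `rid-kit`.

```python
def parse_minikube_status(text):
    """Collect the `key: value` lines of a `minikube status` report into a dict."""
    status = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            status[key.strip()] = value.strip()
    return status


def is_ready(status):
    """Return True when host, kubelet and apiserver all report Running."""
    return all(status.get(key) == "Running" for key in ("host", "kubelet", "apiserver"))


# Sample text copied from the output above.
sample = """minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
"""

print(is_ready(parse_minikube_status(sample)))
```

In practice the text could come from capturing the output of `minikube status`, and the check could be repeated until it succeeds.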



### Installation of RiD-kit

Now we install `rid-kit` on the host machine. To meet the minimum requirements, the following third-party Python packages should be installed:

* tensorflow-cpu or gpu
* mdtraj
* numpy 
* scikit-learn

These are also listed in `rid-kit/requirements.txt`. Then change directory to the `rid-kit` repo and run:

In [9]:
# the rid-kit repo path
! cd .. && pip install .

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing /mnt/vepfs/yanze/dflow_project/rid-kit
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rid
  Building wheel for rid (setup.py) ... done
  Created wheel for rid: filename=rid-1.1.dev195+gb743cd0-py3-none-any.whl size=77609 sha256=9f325b928e7c8f74cab0bda043a026949c5a427e0a7db1832043b42667f31707
  Stored in directory: /home/yanze/.cache/pip/wheels/ec/59/93/934a1323bb160c606bf1000cac32eae183a198305cb06cb1b0
Successfully built rid
Installing collected packages: rid
  Attempting uninstall: rid
    Found existing installation: rid 1.1.dev195+gb743cd0
    Uninstalling rid-1.1.dev195+gb743cd0:
      Successfully uninstalled rid-1.1.dev195+gb743cd0
Successfully installed rid-1.1.dev195+gb743cd0
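
After the installation, one way to confirm that the prerequisites and `rid` itself are visible to Python is a quick import check. This is a minimal sketch, not part of `rid-kit`: `missing_packages` is a hypothetical helper, and the import names are assumptions (in particular, `scikit-learn` is imported as `sklearn`, and both TensorFlow variants are imported as `tensorflow`).

```python
import importlib.util

# Package name -> import name (assumed mapping for the packages listed above).
REQUIRED = {
    "tensorflow": "tensorflow",
    "mdtraj": "mdtraj",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "rid": "rid",
}


def missing_packages(required):
    """Return the package names whose import name cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


# An empty list means every prerequisite was found in this environment.
print(missing_packages(REQUIRED))
```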


## Configuration of Computational Environment

In the RiD workflow, `dflow` sends computation tasks to resources that have the proper environment configured.

There are four main modules and several workflow steps in the RiD procedure, and each module or step needs a different environment:

* `Exploration/Sampling`:  `Gromacs`, `PLUMED2` modified by `DeepFE.cpp`, `TensorFlow` C++ interface. (prefer GPU)
* `Selection`:  `TensorFlow` Python interface.
* `Labeling`:  `Gromacs`, `PLUMED2`. (prefer GPU)
* `Training`:  `TensorFlow` Python interface. (prefer GPU)
* `Workflow steps`:  `Python`.

`dflow` supports different resources including `Slurm` clusters, `K8S` local machines and `Cloud Server`.

* For `Slurm`, configure the computational environments on your `Slurm` cluster following the installation instructions. With `dflow`, `rid-kit` sends tasks to `Slurm` nodes from the host machine remotely, without manually logging in to the cluster.
* For local resources, just use the docker images we have built. No further manual configuration is needed. We also provide the `Dockerfile` of our images to enable flexible modification.
* For `Cloud Server`, like `Lebesgue`, use public images; no further manual configuration is needed.

## Prepare the machine configuration JSON

`rid-kit` uses a `JSON` file to manage resources. In `machine.json`, define your own `resources` and dispatch `tasks` to them.

Generally, we would like to run low-cost tasks on CPU nodes or locally and submit high-cost tasks to `Slurm` or the cloud. So a `machine.json` may look like:

```JSON
{
    "resources": {
        "local_k8s": {
            "template_config" : {
                "image": "dp-rid-dflow:tf262-pytorch1.11.0-cuda11.3", 
                "image_pull_policy" : "IfNotPresent"
            }
        },
        "remote_slurm": {
            "executor": {
                "type": "slurm",
                "host": "",
                "port": 22,
                "username": "",
                "password": "",
                "header": [
                    "#!/bin/bash",
                    "#SBATCH --partition GPU_2080Ti",
                    "#SBATCH -N 1",
                    "#SBATCH --ntasks-per-node 8",
                    "#SBATCH -t 120:0:0",
                    "#SBATCH --gres=gpu:1",
                    "source your_rid_env"
                ]
            }
        }
    },

    "tasks": {
        "prep_exploration_config": "local_k8s",
        "run_exploration_config": "remote_slurm",
        "prep_label_config": "local_k8s",
        "run_label_config": "remote_slurm",
        "prep_select_config": "local_k8s",
        "run_select_config": "local_k8s",
        "prep_data_config": "local_k8s",
        "run_train_config": "remote_slurm",
        "workflow_steps_config": "local_k8s"
    }
}
```

* Under the key `resources`, you define your own resource types. The resource names and the number of resources are up to you.
* Under the key `tasks`, you assign the resources you have defined to the tasks of RiD. Do not change the task names in `tasks`, as they are fixed in the code.

## Get Started!

We assume you have learned the basics of reinforced dynamics, which we won't describe again here.

Users can monitor workflows from a browser UI. To enable that, you should forward the ports of `argo` and `minio`. This can be done with `rid port-forward`.

In [14]:
! rid port-forward

2022-08-06 19:34:09 | INFO | rid.entrypoint.server | Port "agro-server" has been launched and running.
2022-08-06 19:34:09 | INFO | rid.entrypoint.server | Port "minio-server" has been launched and running.
2022-08-06 19:34:09 | INFO | rid.entrypoint.server | Port "minio-ui" has been launched and running.


In this case, we explore the phase space of alanine dipeptide. Prepare your initial conformation files in `.gro` format, a topology file in `.top` format and a configuration file `rid.json`. For convenience, we have prepared one at `rid/template/rid.json`.
Remember to also provide your own forcefield files. Collect all these files into a directory and pass its path to `rid-kit` with the flag `-i`.

A minimal case is prepared in `rid-kit/tests/data/000`. Then run `rid submit`:

In [13]:
! rid submit -i ../tests/data/000 -c ../rid/template/rid.json -m /mnt/vepfs/yanze/dflow_project/test_dflow/template/machine.json

2022-08-06 19:29:53 | INFO | rid.entrypoint.main | Preparing RiD ...
Workflow has been submitted (ID: reinforced-dynamics-cx4rd)
2022-08-06 19:30:17 | INFO | rid.entrypoint.main | The task is displayed on "https://127.0.0.1:2746".
2022-08-06 19:30:17 | INFO | rid.entrypoint.main | Artifacts (Files) are listed on "https://127.0.0.1:9001".


The INFO messages indicate that this task has been submitted successfully. Record this workflow ID as we may use it later.
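
If the output of `rid submit` is captured as text, the workflow ID can be pulled out with a regular expression. This is a minimal sketch that assumes the message format shown in the output above; `extract_workflow_id` is a hypothetical helper, not part of `rid-kit`.

```python
import re


def extract_workflow_id(output):
    """Return the workflow ID from `rid submit` output, or None if absent."""
    match = re.search(r"Workflow has been submitted \(ID: ([\w-]+)\)", output)
    return match.group(1) if match else None


# Sample lines copied from the submission output above.
sample = """2022-08-06 19:29:53 | INFO | rid.entrypoint.main | Preparing RiD ...
Workflow has been submitted (ID: reinforced-dynamics-cx4rd)
"""

print(extract_workflow_id(sample))  # reinforced-dynamics-cx4rd
```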

Visit the URLs given in the last two lines; all workflows and their corresponding files are listed in the UI.

The command line is also supported. Run `rid ls` to list your workflows and their status.

In [15]:
! rid ls

2022-08-06 19:35:45 | INFO | rid.entrypoint.cli | 

	Reinforced Dynamics Workflow

NAME                        STATUS    AGE   DURATION   PRIORITY
reinforced-dynamics-cx4rd   Running   5m    5m         0
reinforced-dynamics-pi65n   Running   8h    8h         0
reinforced-dynamics-yk3wl   Failed    8h    2m         0
reinforced-dynamics-bsc7j   Failed    9h    31m        0



`rid-kit` is based on `dflow`, `argo` and `minikube`, so more complex and flexible management of workflows can be achieved with their command lines, such as `kubectl get pods -n argo` and `argo show`.
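
To pick out the failed workflows from a captured `rid ls` listing, the table can be filtered by its `STATUS` column. This is a minimal sketch that assumes the five-column layout shown above; `failed_workflows` is a hypothetical helper, not part of `rid-kit`.

```python
def failed_workflows(listing):
    """Return the names of workflows whose STATUS column reads Failed."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have five columns: NAME STATUS AGE DURATION PRIORITY.
        if len(fields) == 5 and fields[1] == "Failed":
            names.append(fields[0])
    return names


# Sample rows copied from the `rid ls` output above.
sample = """NAME                        STATUS    AGE   DURATION   PRIORITY
reinforced-dynamics-cx4rd   Running   5m    5m         0
reinforced-dynamics-pi65n   Running   8h    8h         0
reinforced-dynamics-yk3wl   Failed    8h    2m         0
reinforced-dynamics-bsc7j   Failed    9h    31m        0
"""

print(failed_workflows(sample))
```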

For failed tasks, you may want to remove them or resubmit them from the failed steps.

For `remove`:

In [None]:
# rid rm task-ID
! rid rm reinforced-dynamics-bsc7j 

For `resubmit`, to modify and continue a workflow:

In [None]:
! rid resubmit -i your_dir -c path_to_rid.json -m path_to_machine.json Workflow-ID