In this tutorial, we will learn how to deploy RiD-kit in your own Kubernetes environment and run a simple case of alanine dipeptide. With the power of `dflow`, users can easily monitor the whole workflow of RiD tasks and dispatch them to various computational resources. Before you use it, you should have `dflow` installed on your host computer (your PC or a remote server).

It is worth emphasizing that the computational nodes and the monitor node are separated: with `dflow`, you can deploy `dflow` and `rid` on your PC and run expensive computations on other resources (such as Slurm clusters and cloud platforms) without any further effort.
Instructions for installing `dflow` are provided in detail on its GitHub page. The prerequisites of `dflow` are Docker and Kubernetes; their main pages (Docker & Kubernetes) explain how to install them. Besides, the `dflow` repo also provides easy-install shell scripts in `dflow/scripts` that install Docker, Kubernetes, and `dflow` and set up port-forwarding.
Instead of using the easy-install scripts, it is recommended to configure your own minikube environment. Installing minikube is easy: just follow the Minikube page for your machine architecture. Then use minikube to start a k8s environment. This can be done by simply typing `minikube start`, but since running rid-kit requires a relatively large amount of resources, it is recommended to allocate more resources when executing `minikube start`, as follows.
Note: don't try to run `minikube` with root privileges, otherwise an error may occur:

```
Exiting due to DRV_AS_ROOT: The "docker" driver should not be used with root privileges.
```
```bash
# choose the location of the minikube environment
export MINIKUBE_HOME=~/.minikube
# allocate enough memory based on your machine in case of high parallelism
minikube start --cpus 8 --memory 8192mb --kubernetes-version=1.23.9 --image-mirror-country='cn'
# if your machine uses a shared-memory storage system, mount the storage path instead
# (change the minio host path accordingly)
minikube start --cpus 8 --memory 8192mb --kubernetes-version=1.23.9 --mount --mount-string="/path_on_your_machine:/data2" --image-mirror-country='cn'
```
A further step to configure the argo service is to run:

```bash
kubectl create ns argo
kubectl apply -n argo -f https://raw.githubusercontent.com/deepmodeling/dflow/master/manifests/quick-start-postgres.yaml
```
If you want to mount your local path into the minio path, change the minio host path in the yaml file:

```yaml
- hostPath:
    path: /data2/minio
    type: DirectoryOrCreate
```
Now you should have Docker and minikube installed properly. Run the following commands to check their status. For minikube, you should wait until all servers are running; this may take a couple of minutes.
```bash
# check minikube status
minikube status
# check k8s status in argo namespace
kubectl get pods -n argo
```
Now we install rid-kit on the host machine. To meet the minimum requirements, the following third-party Python packages should be installed:
- tensorflow (CPU or GPU build)
- mdtraj
- numpy
- scikit-learn
- pydflow
- dpdata
- parmed
- dpdispatcher
- lbg
- matplotlib
These are also listed in `rid-kit/requirements.txt`. Then change directory to the rid-kit repo and run:

```bash
# inside the rid-kit repo
git checkout dflow
pip install setuptools_scm
pip install .
```
In the RiD workflow, `dflow` helps send computation tasks to resources with the proper environment configured. There are four main modules and several workflow steps in the RiD procedure, and each module or step needs a different environment:
- Exploration/Sampling: Gromacs, PLUMED2 modified by `DeepFE.cpp`, Tensorflow C++ interface. (prefers GPU)
- Selection: Tensorflow Python interface.
- Labeling: Gromacs, PLUMED2. (prefers GPU)
- Training: Tensorflow Python interface. (prefers GPU)
- Workflow steps: Python.
`dflow` supports different resources, including Slurm clusters, local K8S machines, and cloud servers.
- For Slurm, configure computational environments on your Slurm cluster following the instructions in Environment settings. With `dflow`, rid-kit sends tasks to Slurm nodes from the host machine remotely, without manually logging in to the cluster.
- For local resources, just use the docker images we have built. No further manual configuration is needed. We also provide the `Dockerfile` of our images to enable flexible modification.
- For cloud servers, like Bohrium, use the public images; no further manual configuration is needed.
We highly recommend using Bohrium for computation, since no environment installation is required; you only need to use the public images. Specifically, the public images for rid-kit are the following: `registry.dp.tech/public/pkufjhdocker/rid-gmx-exploration:stable` for Exploration, `registry.dp.tech/public/pkufjhdocker/rid-gmx-plumed:stable` for Labeling, `registry.dp.tech/public/pkufjhdocker/rid-gmx-tf:stable` for Selection, `registry.dp.tech/public/pkufjhdocker/rid-tf-gpu:stable` for Training, and `registry.dp.tech/public/pkufjhdocker/rid-tf-cpu:stable` for the other steps.
rid-kit uses a JSON file to manage resources. In `machine.json`, define your own `resources` and dispatch `tasks` to them.
- In the key `resources`, you define your own resource types. Resource names and their numbers are up to you.
- In the key `tasks`, you distribute the resources you have defined to the tasks of RiD. Do not change the task names in `tasks`, as they are fixed in the code.
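As a rough illustration of this two-key layout (the field names and task names below are hypothetical placeholders, not the exact schema — consult the templates in `rid/template/` for the authoritative structure), a `machine.json` has the general shape:

```json
{
    "resources": {
        "local_cpu": {
            "comment": "a resource name you choose yourself"
        },
        "remote_gpu": {
            "comment": "another custom resource, e.g. a Slurm or Bohrium node"
        }
    },
    "tasks": {
        "some_fixed_task_name": "local_cpu",
        "another_fixed_task_name": "remote_gpu"
    }
}
```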
Generally, we would like to run low-cost tasks on CPU nodes or locally, and submit high-cost tasks to Slurm or the cloud.
If submitting to a mixed Slurm and local environment, a `machine.json` example is in `rid/template/machine_slurm_k8s.json`.

If submitting to the Bohrium cloud platform, a `machine.json` example is in `rid/template/machine_bohrium.json`.

If submitting to a mixed Bohrium and local platform, a `machine.json` example is in `rid/template/machine_bohrium_k8s.json`.

If submitting to a Slurm environment without a k8s environment, a `machine.json` example is in `rid/template/machine_slurm_local.json`.
If you submit jobs to a Slurm environment, you will have to compile the computation environment on the Slurm server yourself. On the Bohrium cloud platform, the computation images are all set up and no extra compilation is needed, so we recommend submitting jobs to a cloud environment like Bohrium.
For more details about setting the machine resources, check Rid machine resource.
If you want to use gromacs as the sampler, `rid/template/rid_gmx_dih.json` is an example.

If you want to use gromacs with a DeepMD potential as the sampler, `rid/template/rid_gmx_dp.json` is an example.

If you want to use lammps with a DeepMD potential as the sampler, `rid/template/rid_lmp_dp.json` is an example.

For more details about setting the rid configuration, check Rid configuration.
We assume you have learned the basics of reinforced dynamics, which we won't describe again here.
Users can monitor workflows from the browser UI. To enable that, you should forward the ports of argo and minio. This can be achieved by `rid port-forward`:

```bash
rid port-forward
```
In this case, we try to explore the phase space of alanine dipeptide. Prepare your initial conformation files in `.gro` format, a topology file in `.top` format, and a configuration file `rid.json`. For convenience, we have prepared one at `rid/template/rid.json`.

Remember to also provide your own force field files. Collect all these files into a directory and feed its path to rid-kit via the flag `-i`.
A minimal case is prepared in `rid-kit/tests/data/000`. After configuring your `machine.json`, run `rid submit`:

```bash
rid submit -i ./tests/data/000 -c ./rid/template/rid_gmx_dih.json -m ./rid/template/machine_bohrium_k8s.json
```
You can specify the workflow name by providing WORKFLOW_ID after `-d`, for example:

```bash
rid submit -i ./tests/data/000 -c ./rid/template/rid_gmx_dih.json -m ./rid/template/machine_bohrium_k8s.json -d ala-dipeptide-1
```

Note that the workflow ID may only contain lowercase alphanumeric characters and the special character "-".
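If you want to pre-check a candidate ID against this naming rule before submitting, a minimal sketch is shown below (rid itself may enforce further Kubernetes naming constraints, such as requiring the name to start and end with an alphanumeric character; this only checks the allowed character set stated above):

```python
import re

def is_valid_workflow_id(name: str) -> bool:
    """Return True if name contains only lowercase alphanumerics and '-'."""
    return re.fullmatch(r"[a-z0-9-]+", name) is not None

print(is_valid_workflow_id("ala-dipeptide-1"))  # True
print(is_valid_workflow_id("Ala_Dipeptide"))    # False: uppercase and underscore
```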
An INFO message indicates that the task has been submitted successfully. Record this workflow ID, as we may use it later. Visit the URL given by the last two lines; all workflows and their corresponding files are listed in the UI.
Command-line tools are also supported. Run `rid ls` to list your workflows and their status:

```bash
rid ls
```
rid-kit is based on `dflow`, argo, and minikube, so more complex and flexible management of workflows can be achieved through their command-line tools, like `kubectl get pods -n argo` and `argo show`.
For failed tasks, you may want to remove them or resubmit them from the failed steps.

To remove a workflow:

```bash
# rid rm WORKFLOW-ID
rid rm reinforced-dynamics-bsc7j
```
To resubmit, modifying and continuing the workflow:

```bash
# suppose the original workflow ID is OLD_ID
rid resubmit -i your_dir -c path_to_rid.json -m path_to_machine.json OLD_ID -d NEW_ID
```
If you want to resubmit from a particular iteration and step:

```bash
rid resubmit -i your_dir -c path_to_rid.json -m path_to_machine.json OLD_ID -t ITERATION-ID -p STEP-KEY -d NEW_ID
```
The Workflow-ID is something like `reinforced-dynamics-jq4jn`, as it appears in the argo UI. If the workflow is archived, its name will appear as something like `a8463748-e15a-4f2c-882b-bfd981a76dac`. If you want to resubmit an archived workflow, you have to provide the archived ID rather than the initial ID. The ITERATION-ID is simply the n-th iteration the workflow has executed. The STEP-KEY in rid includes the following steps: `prep-exploration`, `run-exploration`, `prep-select`, `run-select`, `prep-label`, `run-label`, `label-stats`, `collect-data`, `merge-data`, `train`, `model-devi`.