# Coarse-grained MD Simulation Data Provenance with ``aiida-gromacs``

## Introduction

### Who is this tutorial for?
This tutorial is aimed at molecular dynamics practitioners who want to keep track of each step used to build and simulate biomolecular systems.

### Why not just use a script to keep a record of simulation steps?
As simulation protocols vary between practitioners, it is often difficult to ascertain how a simulation was performed to generate the final dynamics of biomolecules.

### How can we track complex simulation protocols in a FAIR way?
In this tutorial, we will set up and run a coarse-grained simulation of the active state GLP-1R protein via the ``aiida-gromacs`` plugin. We will show how commands are written and how the provenance of each command is recorded in real time.


## Software requirements

* [``martinize2``](https://pypi.org/project/vermouth/) is used to convert from atomistic to coarse-grained structures. 

* A modified [``insane``](https://github.com/Tsjerk/Insane) script is used to build the coarse-grained system, which is available here XXX.

* [GROMACS](https://www.gromacs.org/) is required to perform molecular dynamics simulations.

* [``aiida-gromacs``](https://aiida-gromacs.readthedocs.io/en/latest/user_guide/installation.html#plugin-installation) is used to keep track of all the commands used to setup and perform the simulation.

## AiiDA under the hood

We use ``aiida-gromacs``, a plugin for the AiiDA software, to track commands run on the CLI. Here's a brief description of what's going on under the hood when running ``aiida-gromacs``:
* AiiDA uses a [PostgreSQL](https://www.postgresql.org) database to store all data produced and the links between input and output files for each command run. Each submitted command is termed a process in AiiDA. 
* Communication between submitted processes is handled with [RabbitMQ](https://www.rabbitmq.com/), and submitted processes are managed by a daemon process that runs in the background.
* AiiDA has a built-in CLI utility called ``verdi``, which we will use to check the status of each submitted process before moving on to the next step:

In [None]:
! verdi process list -a

A successfully finished process will exit with code [0]. 
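The same check can also be made from Python. The sketch below is illustrative only: `summarise` and `check_process` are hypothetical helper names, and it assumes the standard AiiDA process node attributes `is_finished_ok`, `is_finished`, `is_terminated` and `exit_status`. The printed examples use stand-in objects with made-up PKs so that the logic can be read without a configured AiiDA profile.

```python
from types import SimpleNamespace


def summarise(node):
    """Return a one-line status for a process node (or a stand-in with the same attributes)."""
    if node.is_finished_ok:
        return f"PK {node.pk}: finished successfully (exit code 0)"
    if node.is_finished:
        return f"PK {node.pk}: finished with non-zero exit code {node.exit_status}"
    if node.is_terminated:
        return f"PK {node.pk}: terminated without finishing (excepted or killed)"
    return f"PK {node.pk}: not terminated yet"


def check_process(pk):
    """Load a process from the AiiDA database and summarise it.

    Requires AiiDA to be installed and a profile to be configured.
    """
    from aiida import load_profile, orm

    load_profile()
    return summarise(orm.load_node(pk))


# Stand-in objects (hypothetical PKs) that mimic the attributes read above.
ok = SimpleNamespace(pk=1, is_finished_ok=True, is_finished=True,
                     is_terminated=True, exit_status=0)
failed = SimpleNamespace(pk=2, is_finished_ok=False, is_finished=True,
                         is_terminated=True, exit_status=300)
print(summarise(ok))
print(summarise(failed))
```

With a working profile, `check_process(<PK>)` would report on a real process, using the PK shown by `verdi process list -a`.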

## Building a coarse-grained system from an atomic structure

We have provided a starting atomic structure of the active state of the GPCR protein. 

TODO: Add details for how starting structure was created.

1. We use Martinize2 to coarse-grain the atomistic structure and produce a GROMACS topology file

In [None]:
%cd GLP1R_coarse-grained_files/martinize
! genericMD --code martinize2@localhost --command "-f GPCRdb_active_refined_opm.pdb -o GPCRdb_active_refined_opm.top -x GPCRdb_active_refined_opm.cg.pdb -ff martini3001 -nt -elastic -p backbone -maxwarn 1 -mutate HSD:HIS -mutate HSP:HIH -ignh -cys auto -scfix" \
--inputs GPCRdb_active_refined_opm.pdb \
--outputs GPCRdb_active_refined_opm.top --outputs GPCRdb_active_refined_opm.cg.pdb --outputs molecule_0.itp
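The `genericMD` calls in this tutorial all share one shape: a `--code` label, a quoted `--command` string, and one `--inputs` or `--outputs` flag per file. As a rough sketch of that pattern, the hypothetical helper `generic_md_command` below assembles such a command line in Python. The `--command` string in the example is deliberately shortened and is not the full set of Martinize2 options used above.

```python
import shlex


def generic_md_command(code, command, inputs=(), outputs=()):
    """Assemble a genericMD command line as one shell-quoted string."""
    args = ["genericMD", "--code", code, "--command", command]
    # Each tracked file gets its own --inputs / --outputs flag.
    for path in inputs:
        args += ["--inputs", path]
    for path in outputs:
        args += ["--outputs", path]
    return shlex.join(args)


cmd = generic_md_command(
    code="martinize2@localhost",
    # Shortened for illustration; see the full option list in the cell above.
    command="-f GPCRdb_active_refined_opm.pdb -o GPCRdb_active_refined_opm.top",
    inputs=["GPCRdb_active_refined_opm.pdb"],
    outputs=["GPCRdb_active_refined_opm.top", "molecule_0.itp"],
)
print(cmd)
```

Listing every file explicitly is what lets the provenance graph link the outputs of one step to the inputs of the next.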

2. Next, we use a custom insane Python script to embed the protein into a lipid bilayer and solvate the system

In [None]:
! cp GPCRdb_active_refined_opm.cg.pdb GPCRdb_active_refined_opm.top molecule_0.itp ../insane
%cd ../insane
! genericMD --code python@localhost --command "insane_custom.py -f GPCRdb_active_refined_opm.cg.pdb -o solvated_insane.gro -p system.top -pbc rectangular -box 18,18,17 -l POPC:25 -l DOPC:25 -l POPE:8 -l DOPE:7 -l CHOL:25 -l DPG3:10 -u POPC:5 -u DOPC:5 -u POPE:20 -u DOPE:20 -u CHOL:25 -u POPS:8 -u DOPS:7 -u POP2:10 -sol W" \
--inputs insane_custom.py --inputs GPCRdb_active_refined_opm.cg.pdb \
--outputs solvated_insane.gro --outputs system.top

3. We can include the `insane_custom.py` script in our provenance database using:

In [None]:
! genericMD --code bash@localhost --command 'cat insane_custom.py' --inputs insane_custom.py --outputs insane_custom.py

4. Once the topology file is created, we need to include all the itp files that contain the force field parameters used to describe entity interactions. We use the `sed` command to edit the `system.top` file directly on the command line, and we submit this command via `aiida-gromacs` as with the previous commands

In [None]:
sed_command1='sed -i -e "1 s/^/#include \\"toppar\/martini_v3.0.0.itp\\"\\n#include \\"toppar\/martini_v3.0.0_ions_v1.itp\\"\\n#include \\"toppar\/martini_v3.0.0_solvents_v1.itp\\"\\n#include \\"toppar\/martini_v3.0.0_phospholipids_v1.itp\\"\\n#include \\"martini_v3.0_sterols_v1.0.itp\\"\\n#include \\"POP2.itp\\"\\n#include \\"molecule_0.itp\\"\\n#include \\"gm3_final.itp\\"\\n/" '\
'-e "s/Protein/molecule_0/" '\
'-e "s/#include \\"martini.itp\\"/\\n/" system.top'
! genericMD --code bash@localhost --command '{sed_command1}' --inputs system.top --outputs system.top

5. We also need to edit the `molecule_0.itp` file generated from the Martinize2 step to include positional restraints on the coarse-grained beads

In [None]:
sed_command2='sed -i -e "s/1000 1000 1000/POSRES_FC    POSRES_FC    POSRES_FC/g" '\
'-e "s/#ifdef POSRES/#ifdef POSRES\\n#ifndef POSRES_FC\\n#define POSRES_FC 1000.00\\n#endif/" '\
'molecule_0.itp'
! genericMD --code bash@localhost --command '{sed_command2}' --inputs molecule_0.itp --outputs molecule_0.itp
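To make the effect of this `sed` edit easier to see, here is a minimal Python sketch that applies the same two substitutions to a small text fragment. The function name `parametrise_posres` and the sample fragment are illustrative assumptions, not the actual contents of `molecule_0.itp`, and this sketch is not the command recorded in the provenance.

```python
def parametrise_posres(itp_text, default_fc="1000.00"):
    """Apply the two substitutions from the sed command to a block of itp text."""
    # First expression: every hard-coded force-constant triple becomes a macro triple.
    text = itp_text.replace(
        "1000 1000 1000", "POSRES_FC    POSRES_FC    POSRES_FC"
    )
    # Second expression: give POSRES_FC a default value inside the POSRES block.
    # sed's s/// without the g flag changes the first match on each line.
    guard = (
        "#ifdef POSRES\n"
        "#ifndef POSRES_FC\n"
        f"#define POSRES_FC {default_fc}\n"
        "#endif"
    )
    lines = [line.replace("#ifdef POSRES", guard, 1) for line in text.split("\n")]
    return "\n".join(lines)


# Hypothetical fragment for illustration only.
sample = (
    "[ position_restraints ]\n"
    "#ifdef POSRES\n"
    "    1    1    1000 1000 1000\n"
    "    3    1    1000 1000 1000\n"
    "#endif\n"
)
result = parametrise_posres(sample)
print(result)
```

After the edit, the restraint force constant defaults to 1000.00 but can, in principle, be overridden by defining `POSRES_FC` through the `define` option of an `.mdp` file.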

6. Ions need to be added to the system, so we first construct the GROMACS `.tpr` binary file that contains the system configuration, topology and input parameters for the next step. We use the `gmx_grompp` command (note the underscore), which is a wrapper command to run `gmx` via `aiida-gromacs`

In [None]:
! cp molecule_0.itp solvated_insane.gro system.top ../gromacs
%cd ../gromacs
! gmx_grompp -f ions.mdp -c solvated_insane.gro -p system.top -o ions.tpr

7. The `gmx_genion` command is then used to add the ions to the system. As the `genion` command requires interactive user inputs, we can provide these in an additional text file via the `--instructions` argument. Each interactive response can be provided on a new line in the input text file. In this example, we replace solvent `W` with ions

In [None]:
! gmx_genion -s ions.tpr -o solvated_ions.gro -p system.top -pname NA -nname CL -conc 0.15 -neutral true --instructions inputs_genion.txt
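As a rough illustration of the instructions file format, the sketch below writes one response per line. The helper name `write_instructions` and the example file name are hypothetical, and the single response `W` is an assumption based on the solvent group being replaced here; the `inputs_genion.txt` file supplied with the tutorial may differ. The example writes to a temporary directory so that it does not overwrite any tutorial files.

```python
import tempfile
from pathlib import Path


def write_instructions(path, responses):
    """Write one interactive response per line and return the file path."""
    path = Path(path)
    path.write_text("".join(f"{response}\n" for response in responses))
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Assumed response: select the solvent group W when genion prompts for a group.
    example = write_instructions(Path(tmp) / "inputs_genion_example.txt", ["W"])
    content = example.read_text()

print(repr(content))
```

A command that prompts several times would simply list its responses in order, one per line.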

8. Lastly, we will use `gmx_make_ndx` to create new index groups for the membrane and solute constituents

In [None]:
! gmx_make_ndx -f solvated_ions.gro -o index.ndx --instructions inputs_index.txt

We can view the provenance graph of these processes, which shows how the inputs and outputs of each process are connected to other processes. To save the provenance graph of all finished processes, replace the primary key (PK) value in the command below (705 in this example) with that of the most recently run process.

In [None]:
! verdi node graph generate 705

At the end of a project, the AiiDA database can be saved as an AiiDA archive file (sqlite/zip format) for long-term storage and to share your data and provenance with others.

In [None]:
! verdi archive create --all archive.aiida

We hope to share further tutorials on loading, querying and displaying data from AiiDA archives. Watch this space!