# Recording Data Provenance with ``aiida-gromacs``

This tutorial follows the first six steps of Justin Lemkul’s [lysozyme tutorial](http://www.mdtutorials.com/gmx/lysozyme/). We will not explain each step in detail, as this is covered on Justin’s webpage, but we link to each page and show the equivalent ``aiida-gromacs`` command.

Please note that the slight differences between the commands in this tutorial and those in Justin Lemkul’s are simply down to the way we record provenance, which requires non-interactive input to the GROMACS tools.

At each of the steps below, you should run ``verdi`` (the command line interface utility for AiiDA) to view the status of the submitted process before moving on to the next step. You do this with:

In [2]:
! verdi process list -a

PK    Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------

Total results: 0

Report: last time an entry changed state: 10h ago (at 21:08:53 on 2023-10-10)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


In this notebook, AiiDA, the ``aiida-gromacs`` plugin and the tools they depend on are pre-installed. Here is a brief description of the tools used:
* AiiDA uses a [PostgreSQL](https://www.postgresql.org) database to store all data produced and the links between input and output files for each command run. Each submitted command is termed a process in AiiDA.
* Communication between submitted processes is handled by [RabbitMQ](https://www.rabbitmq.com/), and the submitted processes themselves are handled by a daemon process that runs in the background.
* ``aiida-gromacs`` requires an installation of [GROMACS](https://www.gromacs.org/) and the path to where it is installed.
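
Outside this pre-configured notebook, these tools have to be available on your own machine. As a quick sanity check, the minimal sketch below looks up executables on the ``PATH``. The helper ``find_tools`` is hypothetical, and the executable names are assumptions: a GROMACS build may, for example, install ``gmx_mpi`` rather than ``gmx``.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your installation.
    for name, path in find_tools(["verdi", "gmx", "gmx_pdb2gmx"]).items():
        print(f"{name:12s} {path or 'NOT FOUND'}")
```

This only confirms that the executables can be found. For a fuller report on the AiiDA profile, database, RabbitMQ and daemon, ``verdi status`` is the more informative check.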

1. We will start from the [pdb2gmx](http://www.mdtutorials.com/gmx/lysozyme/01_pdb2gmx.html) step of Justin’s tutorial, using the equivalent ``aiida-gromacs`` Python wrapper ``gmx_pdb2gmx``:

In [3]:
! gmx_pdb2gmx -f gromacs_files/1AKI_clean.pdb -ff oplsaa -water spce -o 1AKI_forcefield.gro -p 1AKI_topology.top -i 1AKI_restraints.itp



A process that finished successfully exits with code 0, which the process list displays as ``Finished [0]``.
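
To make the same check from Python rather than by eye, here is a minimal sketch that parses the text printed by ``verdi process list -a``. The function names and the sample text are illustrative and not part of AiiDA or ``aiida-gromacs``; the sample simply mirrors the table layout shown in this tutorial, and the parser assumes columns are separated by at least two spaces.

```python
import re

# Sample text in the layout printed by `verdi process list -a`.
SAMPLE = """\
  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
 611  12s ago    Pdb2gmxCalculation   ⏹ Finished [0]
 619  3s ago     EditconfCalculation  ⏹ Finished [0]
"""


def parse_process_table(text):
    """Extract (pk, label, state, exit_code) tuples from the table text."""
    rows = []
    for line in text.splitlines():
        fields = re.split(r"\s{2,}", line.strip())
        # Data rows start with an integer PK; headers, rules and reports do not.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        # The state column looks like "⏹ Finished [0]"; the code is optional.
        match = re.search(r"([A-Za-z]+)(?:\s*\[(\d+)\])?", fields[3])
        if match is None:
            continue
        state, code = match.group(1), match.group(2)
        rows.append(
            (int(fields[0]), fields[2], state, int(code) if code is not None else None)
        )
    return rows


def all_finished_ok(rows):
    """True only if there is at least one row and every row is Finished [0]."""
    return bool(rows) and all(
        state == "Finished" and code == 0 for _, _, state, code in rows
    )


if __name__ == "__main__":
    print(all_finished_ok(parse_process_table(SAMPLE)))
```

In the notebook, the text could be captured with ``subprocess.run(["verdi", "process", "list", "-a"], capture_output=True, text=True).stdout``. Parsing terminal output is fragile, so for anything beyond a quick check the exit status stored on each process node through AiiDA's Python API is likely the better source.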

In [4]:
! verdi process list -a

  PK  Created    Process label       Process State    Process status
----  ---------  ------------------  ---------------  ----------------
 611  4s ago     Pdb2gmxCalculation  ⏹ Finished [0]

Total results: 1

Report: last time an entry changed state: 3s ago (at 07:51:32 on 2023-10-11)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


2. Next we will create the [box and then solvate](http://www.mdtutorials.com/gmx/lysozyme/03_solvate.html):

In [5]:
! gmx_editconf -f 1AKI_forcefield.gro -center 0 -d 1.0 -bt cubic -o 1AKI_newbox.gro



In [6]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
 611  12s ago    Pdb2gmxCalculation   ⏹ Finished [0]
 619  3s ago     EditconfCalculation  ⏹ Finished [0]

Total results: 2

Report: last time an entry changed state: 2s ago (at 07:51:41 on 2023-10-11)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


In [7]:
! gmx_solvate -cp 1AKI_newbox.gro -cs spc216.gro -p 1AKI_topology.top -o 1AKI_solvated.gro



In [8]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
 611  18s ago    Pdb2gmxCalculation   ⏹ Finished [0]
 619  9s ago     EditconfCalculation  ⏹ Finished [0]
 625  3s ago     SolvateCalculation   ⏹ Finished [0]

Total results: 3

Report: last time an entry changed state: 2s ago (at 07:51:48 on 2023-10-11)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


3. & 4. Then preprocess the topology and add [ions](http://www.mdtutorials.com/gmx/lysozyme/04_ions.html) to neutralise the system:

In [9]:
! gmx_grompp -f gromacs_files/ions.mdp -c 1AKI_solvated.gro -p 1AKI_topology.top -o 1AKI_ions.tpr



In [10]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
 611  25s ago    Pdb2gmxCalculation   ⏹ Finished [0]
 619  16s ago    EditconfCalculation  ⏹ Finished [0]
 625  9s ago     SolvateCalculation   ⏹ Finished [0]
 633  2s ago     GromppCalculation    ⏹ Finished [0]

Total results: 4

Report: last time an entry changed state: 1s ago (at 07:51:55 on 2023-10-11)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


In [11]:
! gmx_genion -s 1AKI_ions.tpr -p 1AKI_topology.top -pname NA -nname CL -neutral true -o 1AKI_solvated_ions.gro



In [12]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
 611  31s ago    Pdb2gmxCalculation   ⏹ Finished [0]
 619  21s ago    EditconfCalculation  ⏹ Finished [0]
 625  15s ago    SolvateCalculation   ⏹ Finished [0]
 633  8s ago     GromppCalculation    ⏹ Finished [0]
 639  2s ago     GenionCalculation    ⏹ Finished [0]

Total results: 5

Report: last time an entry changed state: 2s ago (at 07:52:00 on 2023-10-11)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


5. & 6. Then preprocess the topology again and [minimise](http://www.mdtutorials.com/gmx/lysozyme/05_EM.html) the system:

In [13]:
! gmx_grompp -f gromacs_files/min.mdp -c 1AKI_solvated_ions.gro -p 1AKI_topology.top -o 1AKI_minimised.tpr



In [14]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
 611  37s ago    Pdb2gmxCalculation   ⏹ Finished [0]
 619  28s ago    EditconfCalculation  ⏹ Finished [0]
 625  21s ago    SolvateCalculation   ⏹ Finished [0]
 633  14s ago    GromppCalculation    ⏹ Finished [0]
 639  9s ago     GenionCalculation    ⏹ Finished [0]
 647  2s ago     GromppCalculation    ⏹ Finished [0]

Total results: 6

Report: last time an entry changed state: 1s ago (at 07:52:07 on 2023-10-11)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


In [15]:
! gmx_mdrun -s 1AKI_minimised.tpr -c 1AKI_minimised.gro -e 1AKI_minimised.edr -g 1AKI_minimised.log -o 1AKI_minimised.trr



In [18]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
 611  1m ago     Pdb2gmxCalculation   ⏹ Finished [0]
 619  54s ago    EditconfCalculation  ⏹ Finished [0]
 625  47s ago    SolvateCalculation   ⏹ Finished [0]
 633  40s ago    GromppCalculation    ⏹ Finished [0]
 639  35s ago    GenionCalculation    ⏹ Finished [0]
 647  28s ago    GromppCalculation    ⏹ Finished [0]
 653  22s ago    MdrunCalculation     ⏹ Finished [0]

Total results: 7

Report: last time an entry changed state: 9s ago (at 07:52:25 on 2023-10-11)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


We can view the provenance graph of these processes, which shows how the inputs and outputs of each process are connected to other processes. To save the provenance graph of all finished processes, replace the primary key value ``<PK>`` in the command below with that of the most recently run process (653 in the run shown here).

In [19]:
! verdi node graph generate <PK>

Success: Output written to `653.dot.pdf`


At the end of a project, the AiiDA database can be saved as an AiiDA archive file (sqlite/zip format) for long-term storage and to share your data and provenance with others.

In [21]:
! verdi archive create --all lysozyme_tutorial.aiida

Report: 
Archive Parameters
--------------------  -----------------------
Path                  lysozyme_tutorial.aiida
Version               main_0001
Compression           6

Inclusion rules
----------------------------  -----
Computers/Nodes/Groups/Users  All
Computer Authinfos            False
Node Comments                 True
Node Logs                     True

Report: Validating Nodes
Report: Creating archive with:
---------  ---
users        1
computers    2
nodes      137
links       62
---------  ---
Report: Finalizing archive creation...
Report: Archive created successfully
Success: wrote the export archive file to lysozyme_tutorial.aiida


We hope to share further tutorials on loading, querying and displaying data from AiiDA archives. Watch this space!
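
In the meantime, a first look inside the archive needs nothing beyond the Python standard library. The sketch below assumes the archive is a standard zip container, as the "sqlite/zip format" description above suggests; ``list_archive_members`` is a hypothetical helper, and the names it prints depend on the AiiDA version that wrote the archive.

```python
import zipfile


def list_archive_members(path, limit=10):
    """Return up to `limit` member names from a zip-based archive file."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip container")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()[:limit]


if __name__ == "__main__":
    # File name taken from the `verdi archive create` command in this tutorial.
    for name in list_archive_members("lysozyme_tutorial.aiida"):
        print(name)
```

This only lists what the container holds. To work with the data and provenance themselves, the archive would be loaded into an AiiDA profile with ``verdi archive import lysozyme_tutorial.aiida``.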