# Recording Data Provenance with ``aiida-gromacs``

This tutorial follows the first six steps of Justin Lemkul’s [lysozyme tutorial](http://www.mdtutorials.com/gmx/lysozyme/). We will not explain each step, as the explanations can be found on Justin’s webpage, but we link to each page and show the equivalent ``aiida-gromacs`` command.

<center><img src="lysozyme_files/images/lemkul.png" alt="Lysozyme structure" width="50%" /></center>

Please also note that the slight differences between the commands in this tutorial and those in Justin Lemkul’s are simply down to the way we record provenance, which requires non-interactive input to the GROMACS tools.

At each of the steps below, run ``verdi`` (the command-line interface for AiiDA) to check the status of the submitted process before moving on to the next step:

In [10]:
! verdi process list -a

PK    Created    Process label    Process State    Process status
----  ---------  ---------------  ---------------  ----------------

Total results: 0

Report: last time an entry changed state: never
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


In this notebook, AiiDA, the ``aiida-gromacs`` plugin and the tools they depend on are pre-installed. Here is a brief description of the tools used:
* AiiDA uses a [PostgreSQL](https://www.postgresql.org) database to store all data produced and the links between input and output files for each command run. Each submitted command is termed a process in AiiDA.
* Communication between submitted processes is handled by [RabbitMQ](https://www.rabbitmq.com/), and the submitted processes themselves are managed by a daemon that runs in the background.
* ``aiida-gromacs`` requires an installation of [GROMACS](https://www.gromacs.org/) and the path to where it is installed.

1. We will start from the [pdb2gmx](http://www.mdtutorials.com/gmx/lysozyme/01_pdb2gmx.html) step of Justin’s tutorial, using the equivalent ``aiida-gromacs`` Python wrapper ``gmx_pdb2gmx``:

In [11]:
! gmx_pdb2gmx -f lysozyme_files/inputs/1AKI_clean.pdb -ff oplsaa -water spce -o 1AKI_forcefield.gro -p 1AKI_topology.top -i 1AKI_restraints.itp



A process that finishes successfully exits with code 0, which ``verdi process list`` shows as ``Finished [0]``.

In [12]:
! verdi process list -a

  PK  Created    Process label       Process State    Process status
----  ---------  ------------------  ---------------  ----------------
   5  3s ago     Pdb2gmxCalculation  ⏹ Finished [0]

Total results: 1

Report: last time an entry changed state: 2s ago (at 13:50:38 on 2024-06-12)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


2. & 3. Next we will create the [box and then solvate](http://www.mdtutorials.com/gmx/lysozyme/03_solvate.html):

In [13]:
! gmx_editconf -f 1AKI_forcefield.gro -center 0 -d 1.0 -bt cubic -o 1AKI_newbox.gro



In [14]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
   5  11s ago    Pdb2gmxCalculation   ⏹ Finished [0]
  14  3s ago     EditconfCalculation  ⏹ Finished [0]

Total results: 2

Report: last time an entry changed state: 2s ago (at 13:50:46 on 2024-06-12)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


In [15]:
! gmx_solvate -cp 1AKI_newbox.gro -cs spc216.gro -p 1AKI_topology.top -o 1AKI_solvated.gro



In [16]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
   5  21s ago    Pdb2gmxCalculation   ⏹ Finished [0]
  14  13s ago    EditconfCalculation  ⏹ Finished [0]
  21  4s ago     SolvateCalculation   ⏹ Finished [0]

Total results: 3

Report: last time an entry changed state: 3s ago (at 13:50:55 on 2024-06-12)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


4. & 5. Then add [ions](http://www.mdtutorials.com/gmx/lysozyme/04_ions.html) to neutralise the system after preprocessing the topology:

In [17]:
! gmx_grompp -f lysozyme_files/inputs/ions.mdp -c 1AKI_solvated.gro -p 1AKI_topology.top -o 1AKI_ions.tpr



In [18]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
   5  36s ago    Pdb2gmxCalculation   ⏹ Finished [0]
  14  28s ago    EditconfCalculation  ⏹ Finished [0]
  21  19s ago    SolvateCalculation   ⏹ Finished [0]
  30  3s ago     GromppCalculation    ⏹ Finished [0]

Total results: 4

Report: last time an entry changed state: 2s ago (at 13:51:11 on 2024-06-12)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


In [19]:
! gmx_genion -s 1AKI_ions.tpr -p 1AKI_topology.top -pname NA -nname CL -neutral true -o 1AKI_solvated_ions.gro



In [20]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
   5  44s ago    Pdb2gmxCalculation   ⏹ Finished [0]
  14  36s ago    EditconfCalculation  ⏹ Finished [0]
  21  27s ago    SolvateCalculation   ⏹ Finished [0]
  30  11s ago    GromppCalculation    ⏹ Finished [0]
  38  3s ago     GenionCalculation    ⏹ Finished [0]

Total results: 5

Report: last time an entry changed state: 2s ago (at 13:51:19 on 2024-06-12)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


6. & 7. Then [minimise](http://www.mdtutorials.com/gmx/lysozyme/05_EM.html) the system after preprocessing the topology:

In [21]:
! gmx_grompp -f lysozyme_files/inputs/min.mdp -c 1AKI_solvated_ions.gro -p 1AKI_topology.top -o 1AKI_minimised.tpr



In [22]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
   5  55s ago    Pdb2gmxCalculation   ⏹ Finished [0]
  14  47s ago    EditconfCalculation  ⏹ Finished [0]
  21  38s ago    SolvateCalculation   ⏹ Finished [0]
  30  22s ago    GromppCalculation    ⏹ Finished [0]
  38  14s ago    GenionCalculation    ⏹ Finished [0]
  47  3s ago     GromppCalculation    ⏹ Finished [0]

Total results: 6

Report: last time an entry changed state: 2s ago (at 13:51:30 on 2024-06-12)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


In [23]:
! gmx_mdrun -s 1AKI_minimised.tpr -c 1AKI_minimised.gro -e 1AKI_minimised.edr -g 1AKI_minimised.log -o 1AKI_minimised.trr



In [25]:
! verdi process list -a

  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
   5  1m ago     Pdb2gmxCalculation   ⏹ Finished [0]
  14  1m ago     EditconfCalculation  ⏹ Finished [0]
  21  59s ago    SolvateCalculation   ⏹ Finished [0]
  30  43s ago    GromppCalculation    ⏹ Finished [0]
  38  35s ago    GenionCalculation    ⏹ Finished [0]
  47  24s ago    GromppCalculation    ⏹ Finished [0]
  54  15s ago    MdrunCalculation     ⏹ Finished [0]

Total results: 7

Report: last time an entry changed state: 5s ago (at 13:51:48 on 2024-06-12)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.


We can view the provenance graph of these processes, which shows how the inputs and outputs of each process are connected to other processes. To save the provenance graph of all finished processes, replace ``54`` in the command below with the primary key (PK) of your most recently run process.

In [26]:
! verdi node graph generate 54

Success: Output written to `54.dot.pdf`


The graph should look something like this:

<center><img src="lysozyme_files/images/54.dot.png" alt="Provenance graph" width="90%" /></center>
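The PK passed to the graph command was read off the process list by hand. A small sketch of doing this programmatically follows, assuming the listing text has been captured as a string and that the most recently created process has the largest PK. The helpers ``latest_pk`` and ``graph_command`` are hypothetical names introduced for illustration.

```python
import re

# A process row starts with an integer PK followed by the "created" column.
PK_RE = re.compile(r"^\s*(\d+)\s+\S+ ago\b", re.MULTILINE)


def latest_pk(process_list_text):
    """Return the largest PK in the listing, or None if it has no rows."""
    pks = [int(pk) for pk in PK_RE.findall(process_list_text)]
    return max(pks) if pks else None


def graph_command(pk):
    """Build the argument list for the graph command used in this tutorial."""
    return ["verdi", "node", "graph", "generate", str(pk)]


# The last two rows of the final listing in this tutorial.
LISTING = """\
  PK  Created    Process label        Process State    Process status
----  ---------  -------------------  ---------------  ----------------
  47  24s ago    GromppCalculation    ⏹ Finished [0]
  54  15s ago    MdrunCalculation     ⏹ Finished [0]
"""

print(graph_command(latest_pk(LISTING)))
```

The returned list can be passed directly to ``subprocess.run``, which avoids editing the PK by hand each time the workflow is repeated.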

To list the command run in each process, together with the input files it used and the output files it produced, we can use:

In [27]:
! verdi data provenance show


Step 1.
	command: gmx pdb2gmx -f 1AKI_clean.pdb -ff oplsaa -water spce -o 1AKI_forcefield.gro -p 1AKI_topology.top -i 1AKI_restraints.itp 
	executable: gmx
	input files: 
		1AKI_clean.pdb
	output files: 
		pdb2gmx.out
		1AKI_forcefield.gro
		1AKI_topology.top
		1AKI_restraints.itp

Step 2.
	command: gmx editconf -f 1AKI_forcefield.gro -center 0 -d 1.0 -bt cubic -o 1AKI_newbox.gro 
	executable: gmx
	input files: 
		1AKI_forcefield.gro <-- from Step 1.
	output files: 
		editconf.out
		1AKI_newbox.gro

Step 3.
	command: gmx solvate -cp 1AKI_newbox.gro -cs spc216.gro -p 1AKI_topology.top -o 1AKI_solvated.gro 
	executable: gmx
	input files: 
		1AKI_topology.top <-- from Step 1.
		1AKI_newbox.gro <-- from Step 2.
	output files: 
		solvate.out
		1AKI_solvated.gro
		1AKI_topology.top

Step 4.
	command: gmx grompp -f ions.mdp -c 1AKI_solvated.gro -p 1AKI_topology.top -o 1AKI_ions.tpr 
	executable: gmx
	input files: 
		1AKI_solvated.gro <-- from Step 3.
		1AKI_topology.top <-- from Step 3.
		io
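The ``<-- from Step N.`` markers in the listing above record which earlier step produced each input file. As a rough sketch, this text can be parsed into a small dependency map. The parser ``parse_provenance`` is hypothetical and assumes only the layout visible above (``Step N.`` headers, ``command:``, ``input files:`` and ``output files:`` sections). The sample text is copied from Steps 1 to 3, with tabs replaced by spaces.

```python
import re

STEP_RE = re.compile(r"^Step (\d+)\.$")
# An input line is a file name, optionally followed by "<-- from Step N."
INPUT_RE = re.compile(r"^(?P<name>\S+)(?:\s+<-- from Step (?P<step>\d+)\.)?$")


def parse_provenance(text):
    """Map step number -> command, inputs (file -> source step) and outputs."""
    steps = {}
    current = None
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        header = STEP_RE.match(line)
        if header:
            current = {"command": None, "inputs": {}, "outputs": []}
            steps[int(header.group(1))] = current
            section = None
        elif current is None or not line or line.startswith("executable:"):
            continue
        elif line.startswith("command:"):
            current["command"] = line[len("command:"):].strip()
        elif line.startswith("input files:"):
            section = "inputs"
        elif line.startswith("output files:"):
            section = "outputs"
        elif section == "inputs":
            match = INPUT_RE.match(line)
            if match:
                source = match.group("step")
                # None means the file was supplied by the user, not by a step.
                current["inputs"][match.group("name")] = (
                    int(source) if source else None
                )
        elif section == "outputs":
            current["outputs"].append(line)
    return steps


SAMPLE = """\
Step 1.
    command: gmx pdb2gmx -f 1AKI_clean.pdb -ff oplsaa -water spce -o 1AKI_forcefield.gro -p 1AKI_topology.top -i 1AKI_restraints.itp
    executable: gmx
    input files:
        1AKI_clean.pdb
    output files:
        pdb2gmx.out
        1AKI_forcefield.gro
        1AKI_topology.top
        1AKI_restraints.itp

Step 2.
    command: gmx editconf -f 1AKI_forcefield.gro -center 0 -d 1.0 -bt cubic -o 1AKI_newbox.gro
    executable: gmx
    input files:
        1AKI_forcefield.gro <-- from Step 1.
    output files:
        editconf.out
        1AKI_newbox.gro

Step 3.
    command: gmx solvate -cp 1AKI_newbox.gro -cs spc216.gro -p 1AKI_topology.top -o 1AKI_solvated.gro
    executable: gmx
    input files:
        1AKI_topology.top <-- from Step 1.
        1AKI_newbox.gro <-- from Step 2.
    output files:
        solvate.out
        1AKI_solvated.gro
        1AKI_topology.top
"""

for number, step in parse_provenance(SAMPLE).items():
    print(number, step["inputs"])
```

Each step's ``inputs`` dictionary then shows at a glance which files came from an earlier step and which were supplied from outside the workflow.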

At the end of a project, the AiiDA database can be saved as an AiiDA archive file (sqlite/zip format) for long-term storage and for sharing your data and provenance with others.

In [28]:
! verdi archive create --all lysozyme_tutorial.aiida

Report:
Archive Parameters
--------------------  -----------------------
Path                  lysozyme_tutorial.aiida
Version               main_0001
Compression           6

Inclusion rules
----------------------------  -----
Computers/Nodes/Groups/Users  All
Computer Authinfos            False
Node Comments                 True
Node Logs                     True

Report: Validating Nodes
Report: Creating archive with:
---------  --
users       1
computers   1
nodes      62
links      70
---------  --
Report: Finalizing archive creation...
Report: Archive created successfully
Success: wrote the export archive file to lysozyme_tutorial.aiida


We hope to share further tutorials on loading, querying and displaying data from AiiDA archives. Watch this space!