# Structural biology - Practical day 02

## Preparation of input files for MD simulations

Document written by [Adrián Diaz](mailto:adrian.diaz@vub.be) & [David Bickel](mailto:david.bickel@vub.be).

The principal information needed to start the simulation is the **protein structure**. In the following steps, this structure is processed to obtain a realistic representation of its natural environment.

1. Parse the input structure
2. Define the periodic boundary box
3. Solvate the system (add water molecules)
4. Add ions (0.15 M NaCl solution)

***

Let's first make sure you have all the required modules loaded:
* GROMACS/2021.3-foss-2021a
* Panedr/0.7.0-foss-2021a


Now, copy your top-ranked AlphaFold model to the current working directory by executing the following lines in a terminal.

In [2]:
%%bash
# Copy the top ranked AlphaFold2 model from day 1 (edit the command to match your file)
MYPROTEIN="protein"
cp ../../1_alphafold/output/...  $MYPROTEIN.pdb



***

## Reading the input PDB ##

In this step, the input structure is parsed and split into **two parts**:

1. **Topology** (Containing information about atom types, bonds, angles, ...)
2. **Coordinates** (x-, y-, and z-coordinates of each atom)

The topology and the coordinates store different types of information. Every MD simulation needs both a topology file and a coordinate file as input.

> ***Why are these two types of information stored in separate files?***
>
> Think about what happens, and what changes, while a simulation is running.

Enter the command below in a terminal. A dialog (+ the course instructor) will guide you through the generation process of the MD input files.

In [None]:
%%bash
# PDB2GMX
# =======
#
# Parses a PDB file and generates topology and coordinate files from it, to run 
# MD simulations.
#
# Parameters:
#   -f <file.pdb>   The input PDB file.
#   -o <file.gro>   The output coordinate file (Gromos87 format)
#   -p <file.top>   The output topology file.
#   -ter            Select the terminus types interactively.
#   -ignh           Ignore hydrogens in the PDB; pdb2gmx adds them itself.

gmx pdb2gmx \
    -f $MYPROTEIN.pdb \
    -o $MYPROTEIN.4gmx.gro \
    -p $MYPROTEIN.top \
    -ter -ignh
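
To get a feeling for what the coordinate file contains, it can help to read one by hand. The sketch below parses the fixed-width GRO layout (title line, atom count, one line per atom, box vectors on the last line). The three-atom water in `EXAMPLE_GRO` and the function name `parse_gro` are made up for illustration; they are not output of the command above.

```python
# Minimal GRO reader (positions only). The example data is hypothetical.
EXAMPLE_GRO = """\
Single water molecule (hypothetical example)
    3
    1SOL     OW    1   0.126   0.164   0.168
    1SOL    HW1    2   0.190   0.099   0.211
    1SOL    HW2    3   0.177   0.244   0.140
   1.86206   1.86206   1.86206
"""


def parse_gro(text):
    """Return (title, atoms, box) from the text of a GRO file."""
    lines = text.splitlines()
    title = lines[0].strip()
    n_atoms = int(lines[1])
    atoms = []
    for line in lines[2:2 + n_atoms]:
        # GRO uses fixed-width columns: 5+5+5+5 characters, then 3 x 8 for x, y, z (nm)
        atoms.append({
            "resid": int(line[0:5]),
            "resname": line[5:10].strip(),
            "name": line[10:15].strip(),
            "index": int(line[15:20]),
            "xyz": (float(line[20:28]), float(line[28:36]), float(line[36:44])),
        })
    # The last line holds the box vectors (nm)
    box = tuple(float(value) for value in lines[2 + n_atoms].split())
    return title, atoms, box


title, atoms, box = parse_gro(EXAMPLE_GRO)
for atom in atoms:
    print(atom["resname"], atom["name"], atom["xyz"])
print("box:", box)
```

Note that the file holds positions and a box, but no bonds, charges or atom types. Those live in the topology.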

In [None]:
import glob
import nglview as nv

grofile = glob.glob("*.4gmx.gro").pop()

view = nv.show_structure_file(grofile, default_representation=False)
view._remote_call("setSize", target="Widget", args=["800px", "600px"])
view.add_cartoon("protein", color_scheme="atomindex")
# view.add_surface("protein", opacity=0.15)
view.center() # Center and zoom molecule
view

***

## Setting up periodic boundary conditions ##

In this step, **periodic boundary conditions (PBC)** are applied to the protein by adding a periodic boundary box around it.
PBC are a 'mathematical trick' to simulate proteins dissolved in bulk water, while in reality only simulating the tiniest droplet of water.

> ***What is next to the droplet of water?***
>
> * nothing... (a vacuum?! But then the water would just disperse and expose the protein to the vacuum, too!)
> * a wall... (That would keep the water molecules in the droplet. But every time they bounce back after crashing into the wall, they no longer follow normal diffusion.)
> * more water... (We really do not want to simulate millions of water molecules.)

PBC solve that issue in an elegant way. The droplet is surrounded by infinite copies of itself. Therefore, from the protein's perspective, it is in an infinite solution of water, with copies of itself (or biochemically speaking, a 0.1 M solution ;-)). This has the cartoon-like side effect that if an atom exits the box on one side, it automatically re-enters the same box on the opposite side.

<img src="https://isaacs.sourceforge.net/phys/images/these-seb/pbc-seb.png" alt="Periodic boundary conditions" width="250" />
<br>
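
The two consequences of PBC described above (atoms re-entering on the opposite side, and every atom seeing the nearest periodic copy of its neighbours) can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: it uses a rectangular box, whereas the box built in this tutorial is a dodecahedron. The function names are hypothetical, and GROMACS handles all of this internally.

```python
import math


def wrap_position(position, box):
    """Put a position back into the box: leaving on one side means entering on the other."""
    return tuple(x % length for x, length in zip(position, box))


def minimum_image_distance(a, b, box):
    """Distance between a and the nearest periodic copy of b (rectangular box only)."""
    squared = 0.0
    for xa, xb, length in zip(a, b, box):
        delta = xb - xa
        # Shift by whole box lengths until the separation is at most half a box
        delta -= length * round(delta / length)
        squared += delta * delta
    return math.sqrt(squared)


box = (5.0, 5.0, 5.0)  # hypothetical box edge lengths in nm

# An atom that drifted out of the box in x and y re-enters on the opposite side
wrapped = wrap_position((5.3, -0.2, 2.0), box)
print(wrapped)

# Two atoms near opposite faces are close neighbours through the boundary
distance = minimum_image_distance((0.1, 0.0, 0.0), (4.9, 0.0, 0.0), box)
print(distance)
```

In the second example, the two atoms are 4.8 nm apart inside the box, but only about 0.2 nm apart across the periodic boundary.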

> ***What does the flag `-d 1.2` in the command below mean?***
>
> Use `gmx editconf -h` to find out.

<br>

> ***What shape does our periodic boundary box have? Why was that shape chosen?***
>
> Use `gmx editconf -h` to find out. Discuss.

In [None]:
%%bash
# EDITCONF
# =======
#
# Modifies a structure file to include periodic boundary conditions.
#
# Parameters:
#   -f <file.gro>   The input structure.
#   -o <file.gro>   The output structure.
#   -bt             ... ?
#   -d              ... ?

gmx editconf \
    -f $MYPROTEIN.4gmx.gro \
    -o $MYPROTEIN.4gmx.box.gro \
    -bt dodecahedron \
    -d 1.2
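
Once you have looked up both flags, the following sketch may help to put numbers on the discussion. It assumes, following the GROMACS documentation, that for cubic, dodecahedron and octahedron boxes the periodic image distance is the solute diameter plus twice the `-d` value, and it uses the volume factors listed in the GROMACS manual for each box type. The 5 nm diameter is a hypothetical value, not a measurement of your protein.

```python
import math

# Box volume as a multiple of d^3, where d is the periodic image distance
VOLUME_FACTORS = {
    "cubic": 1.0,
    "octahedron": 4.0 * math.sqrt(3.0) / 9.0,  # truncated octahedron, about 0.770
    "dodecahedron": math.sqrt(2.0) / 2.0,      # rhombic dodecahedron, about 0.707
}


def image_distance(solute_diameter_nm, padding_nm):
    """Distance between periodic images for a given solute diameter and padding."""
    return solute_diameter_nm + 2.0 * padding_nm


def box_volume(box_type, image_distance_nm):
    """Volume (nm^3) of a box of the given type with the given image distance."""
    return VOLUME_FACTORS[box_type] * image_distance_nm ** 3


HYPOTHETICAL_DIAMETER_NM = 5.0  # assumption for illustration only
d = image_distance(HYPOTHETICAL_DIAMETER_NM, 1.2)

for box_type in VOLUME_FACTORS:
    print(f"{box_type:>13}: {box_volume(box_type, d):7.1f} nm^3")
```

All three boxes keep periodic copies of the solute the same distance apart, but they differ in the amount of water needed to fill them.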

***

## Solvating the system ##

Finally, we fill all empty space in our periodic boundary box with water molecules.

In [None]:
%%bash 
# SOLVATE
# =======
#
# Add water molecules to the system
#
# Parameters:
#   -cp <file.gro>      The input structure.
#   -cs <water.gro>     The structure to use for the solvent (i.e., water).
#   -o  <file.gro>      The output structure with the solvent molecules.
#   -p  <file.top>      The topology file. This is overwritten to include the
#                       water molecules.

gmx solvate \
    -cp $MYPROTEIN.4gmx.box.gro \
    -cs spc216.gro \
    -o $MYPROTEIN.4gmx.box.slv.gro \
    -p $MYPROTEIN.top
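
Because `gmx solvate` rewrites the topology, it is worth checking that the `[ molecules ]` section at the end of the `.top` file now lists the added water. The sketch below shows one way to read that section in Python. The topology excerpt, its molecule names and counts are invented for illustration, and `read_molecules` is a hypothetical helper; to use it on your own file, pass the text of your `.top` file instead.

```python
# Hypothetical tail of a topology file after solvation (names and counts made up)
EXAMPLE_TOP_TAIL = """\
[ system ]
; Name
Protein in water

[ molecules ]
; Compound        #mols
Protein_chain_A     1
SOL              9876
"""


def read_molecules(topology_text):
    """Return the (name, count) entries of the [ molecules ] section, in file order."""
    entries = []
    in_section = False
    for raw_line in topology_text.splitlines():
        line = raw_line.split(";", 1)[0].strip()  # drop comments
        if not line or line.startswith("#"):
            continue  # skip blank lines and preprocessor directives
        if line.startswith("["):
            # A new section starts; remember whether it is the one we want
            in_section = line.strip("[] ").lower() == "molecules"
            continue
        if in_section:
            name, count = line.split()
            entries.append((name, int(count)))
    return entries


molecules = read_molecules(EXAMPLE_TOP_TAIL)
for name, count in molecules:
    print(f"{name:<20}{count:>8}")
```

A list is returned rather than a dictionary, because the same molecule name may appear on more than one line and the order of the entries has to match the order of the molecules in the coordinate file.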

Let's have a look at the solvated structure.

> ***What does the shape of the periodic boundary box look like?***
>
> Visualize the structure, and compare what you see to what you generated before.

In [None]:
import glob
import nglview as nv

grofile = glob.glob("*.4gmx.box.slv.gro").pop()

view = nv.show_structure_file(grofile, default_representation=False)
view._remote_call("setSize", target="Widget", args=["800px", "600px"])
view.add_cartoon("protein", color_scheme="residueindex")
view.add_representation("line", selection="water")
view.add_surface("protein", opacity=0.15)
view.remove_spacefill()
view.center() # Center and zoom molecule
view

***

## Adding ions ##

Normally, physiological processes do not take place in distilled water. Therefore, it makes sense to use physiological ion concentrations in the simulation as well.
Moreover, most proteins are charged. However, in nature, there is no charge without a counter charge. Thus, we can use ions to neutralize our system.
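
As a rough sanity check, the expected number of ions can be estimated from the box volume: a concentration in mol/L times Avogadro's number gives ions per litre, and one litre is 10^24 nm^3. The sketch below adds extra counter ions on top of the salt to cancel the protein's net charge. It is a back-of-the-envelope estimate, not a reimplementation of `gmx genion`, so the counts reported by GROMACS may differ slightly. The box volume and net charge are hypothetical values.

```python
AVOGADRO = 6.02214076e23   # particles per mol
NM3_PER_LITRE = 1e24       # 1 L = 1 dm^3 = 10^24 nm^3


def salt_pairs(concentration_molar, box_volume_nm3):
    """Number of NaCl pairs needed to reach the given concentration in the box."""
    return round(concentration_molar * AVOGADRO * box_volume_nm3 / NM3_PER_LITRE)


def ion_counts(concentration_molar, box_volume_nm3, net_charge):
    """Estimate (n_sodium, n_chloride) for a neutral system at the given salt concentration."""
    pairs = salt_pairs(concentration_molar, box_volume_nm3)
    n_sodium = pairs + max(-net_charge, 0)   # extra Na+ for a negatively charged protein
    n_chloride = pairs + max(net_charge, 0)  # extra Cl- for a positively charged protein
    return n_sodium, n_chloride


# Hypothetical system: 1000 nm^3 box, protein with a net charge of -3
n_sodium, n_chloride = ion_counts(0.15, 1000.0, -3)
print(f"Na+: {n_sodium}, Cl-: {n_chloride}")
```

The estimate uses the full box volume, including the part occupied by the protein, so it slightly overestimates the amount of salt relative to the volume of water alone.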

In [None]:
%%bash

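# GROMPP
# ======
#
# Combines run parameters (.mdp), structure and topology into a run input
# file (.tpr), which genion needs as input.

gmx grompp \
    -f add_ions.mdp \
    -c $MYPROTEIN.4gmx.box.slv.gro \
    -p $MYPROTEIN.top \
    -o add_ions.tpr

# GENION
# ======
#
# Replaces solvent molecules with ions and updates the topology accordingly.
# When asked which molecules should be replaced by ions, select SOL (solvent molecules).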
gmx genion \
    -s add_ions.tpr \
    -p $MYPROTEIN.top \
    -o $MYPROTEIN.4gmx.box.slv.ion.gro \
    -np 0 -pname NA -pq 1 \
    -nn 0 -nname CL -nq -1 \
    -conc 0.15 -neutral

In [None]:
import glob
import nglview as nv

grofile = glob.glob("*.4gmx.box.slv.ion.gro").pop()

view = nv.show_structure_file(grofile, default_representation=False)
view._remote_call("setSize", target="Widget", args=["800px", "600px"])
view.add_cartoon("protein", color_scheme="residueindex")
view.add_representation("line", selection="water")
view.add_representation("spacefill", selection="ion")
# view.add_surface("protein", opacity=0.15)
view.center() # Center and zoom molecule
view

***

## Conclusion

With that, the preparation of the structural model is complete.
We have:

* ... generated molecular dynamics parameters for our protein complex
* ... solvated the protein complex in water under periodic boundary conditions
* ... added ions to the water to emulate a 0.15 M sodium chloride solution