# Molecular Dynamics: Protein-Ligand

Before continuing, you should have:

- installed GROMACS
- put CHARMM36 force field in the correct location

In [None]:
import os

os.environ['PATH'] = '/usr/local/gromacs/bin:' + os.environ['PATH']

In [None]:
from pathlib import Path
from matplotlib.ticker import ScalarFormatter
import pandas as pd
from matplotlib import pyplot as plt

In [None]:
PROTEIN_CLS = "RAS"
PROTEIN_SID = "FM13"
LIGAND_ID = "GDP"
COMPLEX_ID = f"{PROTEIN_SID}-{LIGAND_ID}"
COMPLEX_PDB = Path(f"../docking/wd/{PROTEIN_CLS}/{COMPLEX_ID}/{COMPLEX_ID}.docked.01.pdb")

# Create a new working directory!
WD = Path(f"wd-md2/{PROTEIN_CLS}/{COMPLEX_ID}")
WD.mkdir(parents=True, exist_ok=True)

# TOOLS: Change the path accordingly
GROMACS_PATH = "~/WS/ProtMatch/gromacs"

We then need to prepare the protein topology and the ligand topology separately.

1. If you start with a PDB file of a complex, you may want to strip out the crystal waters and unwanted ligands.
   Replace UNL in files with your correct ligand name.

In [None]:
!cp {COMPLEX_PDB} {WD}/{COMPLEX_PDB.name}
COMPLEX_PDB = Path(COMPLEX_PDB.name)

LIGAND_PDB = Path(f"{LIGAND_ID}.pdb")
PROTEIN_PDB = Path(f"{PROTEIN_SID}.pdb")

# ligand
!cd {WD} && grep UNL {COMPLEX_PDB} > {LIGAND_PDB}
# !cd {WD} && grep {LIGAND_ID} {COMPLEX_PDB} > {LIGAND_PDB}
# protein
!cd {WD} && grep -v UNL {COMPLEX_PDB} > {PROTEIN_PDB}.tmp
# !cd {WD} && grep -v HETATM {COMPLEX_PDB} > {PROTEIN_PDB}.tmp
!cd {WD} && grep -v CONECT {PROTEIN_PDB}.tmp > {PROTEIN_PDB}
!cd {WD} && rm -f {PROTEIN_PDB}.tmp

del COMPLEX_PDB
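The `grep` calls above match `UNL` anywhere on a line. As a hedged alternative, the sketch below splits the complex on the residue-name field (PDB columns 18-20) instead. The function name `split_complex_pdb`, the choice to drop `HOH` waters, and the demo records are assumptions made for illustration, not part of the original workflow.

```python
def split_complex_pdb(lines, ligand_resname="UNL"):
    """Split PDB lines into (protein_lines, ligand_lines) by residue name."""
    protein, ligand = [], []
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            resname = line[17:20].strip()  # PDB columns 18-20 hold the residue name
            if resname == ligand_resname:
                ligand.append(line)
            elif resname != "HOH":  # assumption: crystal waters are named HOH and dropped
                protein.append(line)
        elif record != "CONECT":  # CONECT records are dropped, as in the grep version
            protein.append(line)
    return protein, ligand


# Hypothetical records, only to show the behaviour.
demo = [
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM    2  O   HOH A 201       1.000   2.000   3.000  1.00  0.00           O",
    "HETATM    3  C1  UNL B   1       4.000   5.000   6.000  1.00  0.00           C",
    "CONECT    3    4",
    "END",
]
protein_lines, ligand_lines = split_complex_pdb(demo)
print(len(protein_lines), len(ligand_lines))
```

To use it on real files, read the complex with `Path(...).read_text().splitlines()` and write each returned list back with `"\n".join(...)`.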

2. Or, if you have already prepared separate PDB files for the protein and its ligand, just specify the paths.

    If you are simulating generated structures, make sure you have
    - predicted the sequence of the generated structure
    - modelled and added side chains to the structure

In [None]:
# PROTEIN_PDB = Path(f"{PROTEIN_SID}.pdb")  # must be a Path: later cells use PROTEIN_PDB.stem

## Protein Topology with CHARMM36 Force Field

In [None]:
PROTEIN_GRO = Path(f"{PROTEIN_SID}.gro")
PROTEIN_TOP = Path(f"{PROTEIN_SID}.top")
FF_PATH = Path("../../gromacs/charmm36-jul2022.ff")
!cp -r {FF_PATH} {WD}
!cd {WD} && gmx pdb2gmx -f {PROTEIN_PDB} -o {PROTEIN_GRO} -p {PROTEIN_TOP} -water tip3p -ff charmm36-jul2022 -ignh
# For any errors, check https://manual.gromacs.org/2021.4/user-guide/run-time-errors.html.
# If a fatal error is caused by a hydrogen atom, that atom may not be defined in the force field.
# Consider removing the atom from the PDB, or (not recommended) ignore all hydrogen atoms with `-ignh`.
# If atom C1 is not found in building block 1MET while combining tdb and rtp, you may need `-ter`.
# See more at https://gromacs.bioexcel.eu/t/newest-charmm36-port-for-gromacs/868/11.

You may want to move the protein and its ligand(s) to the origin before continuing.

## Ligand Topology

Refer to http://www.mdtutorials.com/gmx/complex/.

1. Add hydrogen atoms and convert <ligand>.sdf to <ligand>.mol2, using tools like openbabel or Avogadro.
2. Fix the residue names and numbers.
   Use sort_mol2_bonds.pl (http://www.mdtutorials.com/gmx/complex/Files/sort_mol2_bonds.pl) to sort bonds in ascending order.
3. Generate the ligand topology with the CGenFF server (https://app.cgenff.com/homepage).
   The CGenFF server returns a topology in the form of a CHARMM "stream" file (.str).
   Save its contents into a file called <ligand>.str.
4. Convert CHARMM stream file to GROMACS format.
   https://github.com/Lemkul-Lab/cgenff_charmm2gmx
   (You may need python 3.5 to run this script)

In [None]:
SORTMOL2 = "../../gromacs/sort_mol2_bonds.pl"
!cp {SORTMOL2} {WD}
!cd {WD} && perl sort_mol2_bonds.pl {LIGAND_ID}.mol2 {LIGAND_ID}.fix.mol2

In [None]:
CHARMM2GMX = "../../gromacs/cgenff_charmm2gmx_py3_nx2.py"
!cp {CHARMM2GMX} {WD}
# In working directory, run the following with Python 3.5
# python cgenff_charmm2gmx_py3_nx2.py GDP GDP.mol2 GDP.str charmm36-jul2022.ff

## Build the Complex

In [None]:
LIGAND_GRO = f"{LIGAND_ID}.gro"
# change *_ini.pdb accordingly
!cd {WD} && gmx editconf -f {LIGAND_ID.lower()}_ini.pdb -o {LIGAND_GRO}

In [None]:
COMPLEX_GRO = "complex.gro"
!cd {WD} && cp {PROTEIN_GRO} {COMPLEX_GRO}

See http://www.mdtutorials.com/gmx/complex/02_topology.html for details.
1. Copy <protein>.gro to a new file complex.gro.
2. Copy the coordinate section of <ligand>.gro and paste it into complex.gro, below the last protein atom and above the box vectors.
3. Increase the atom count on the second line of complex.gro by the number of ligand atoms.
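The manual copy-and-paste steps above can also be scripted. This is a minimal sketch, assuming both files follow the standard .gro layout (title line, atom count, one line per atom, box vectors). The function name `merge_gro` and the demo coordinates are hypothetical. Like the manual procedure, it does not renumber the ligand atoms.

```python
def merge_gro(protein_lines, ligand_lines, title="Protein-ligand complex"):
    """Append the ligand atoms to the protein atoms of a .gro file."""
    protein_atoms = protein_lines[2:-1]  # skip title, atom count, and box line
    ligand_atoms = ligand_lines[2:-1]
    # Sanity check: the declared atom counts must match the atom lines found.
    if int(protein_lines[1]) != len(protein_atoms):
        raise ValueError("protein atom count does not match its atom lines")
    if int(ligand_lines[1]) != len(ligand_atoms):
        raise ValueError("ligand atom count does not match its atom lines")
    n_atoms = len(protein_atoms) + len(ligand_atoms)
    box = protein_lines[-1]  # keep the box vectors of the protein file
    return [title, f"{n_atoms:5d}", *protein_atoms, *ligand_atoms, box]


# Hypothetical two-atom protein and one-atom ligand, only to show the layout.
protein = [
    "Protein",
    "    2",
    "    1MET      N    1   1.000   1.000   1.000",
    "    1MET     CA    2   1.100   1.000   1.000",
    "   5.00000   5.00000   5.00000",
]
ligand = [
    "Ligand",
    "    1",
    "    1GDP     C1    1   2.000   2.000   2.000",
    "   0.00000   0.00000   0.00000",
]
merged = merge_gro(protein, ligand)
print("\n".join(merged))
```

With real files, read each input with `(WD / name).read_text().splitlines()` and write the result to `complex.gro` with a trailing newline.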

## Build the Topology
1. Insert line `#include "<ligand>.itp"` into <protein>.top after the position restraint file is included.

   ```
   ; Include Position restraint file
    #ifdef POSRES
    #include "posre.itp"
    #endif
    
    ; Include ligand topology <-- ADD
    #include "<ligand>.itp" <-- ADD
    
    ; Include water topology
    #include "./charmm36-jul2022.ff/tip3p.itp"
    ```
   
2. At the top of <protein>.top, right after the force field include, insert an #include statement for the ligand parameters (<ligand>.prm):

   ```
   ; Include forcefield parameters
    #include "./charmm36-jul2022.ff/forcefield.itp"
    
    ; Include ligand parameters <-- ADD
    #include "<ligand>.prm" <-- ADD
    
    [ moleculetype ]
    ; Name            nrexcl
    Protein_chain_A     3
   ```

3. At the end of <protein>.top, add the ligand to the `[ molecules ]` directive:

   ```
   [ molecules ]
   ; Compound        #mols
   Protein_chain_A     1
   GDP                 1 <-- ADD
   ```
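The three topology edits above can be applied in one pass. The sketch below is a hedged illustration, not the tutorial's method: it assumes the topology contains the `forcefield.itp` include and the `; Include water topology` comment exactly as shown in the snippets above, and that `[ molecules ]` is the last directive in the file. The function name `add_ligand_to_topology` and the demo topology are hypothetical.

```python
def add_ligand_to_topology(top_text, ligand_file_stem, ligand_molname):
    """Return topology text with the ligand .prm/.itp includes and molecule entry added."""
    out = []
    for line in top_text.rstrip().splitlines():
        # Edit 1: ligand topology goes just before the water topology include.
        if line.startswith("; Include water topology"):
            out += ["; Include ligand topology", f'#include "{ligand_file_stem}.itp"', ""]
        out.append(line)
        # Edit 2: ligand parameters go right after the force field include.
        if line.startswith("#include") and line.rstrip().endswith('forcefield.itp"'):
            out += ["", "; Include ligand parameters", f'#include "{ligand_file_stem}.prm"']
    # Edit 3: assumes [ molecules ] is the last directive, so the entry is appended.
    out.append(f"{ligand_molname:<20}1")
    return "\n".join(out) + "\n"


# Shortened, hypothetical topology used only to exercise the function.
demo_top = """\
; Include forcefield parameters
#include "./charmm36-jul2022.ff/forcefield.itp"

[ moleculetype ]
; Name            nrexcl
Protein_chain_A     3

; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif

; Include water topology
#include "./charmm36-jul2022.ff/tip3p.itp"

[ molecules ]
; Compound        #mols
Protein_chain_A     1
"""
new_top = add_ligand_to_topology(demo_top, "gdp", "GDP")
print(new_top)
```

Run it only once per topology, since a second pass would insert the includes again. Inspect the result before passing it to `gmx grompp`.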

## Setting Up the Simulation Box and Solvating the System

In [None]:
BOX_GRO = "box.gro"
!cd {WD} && gmx editconf -f {COMPLEX_GRO} -o {BOX_GRO} -c -d 1.5 -bt octahedron

In [None]:
SOLV_GRO = "solv.gro"
!cd {WD} && gmx solvate -cp {BOX_GRO} -cs spc216.gro -p {PROTEIN_TOP} -o {SOLV_GRO}

## Adding Ions

In [None]:
IONS_TPR = "ions.tpr"
MIN_SD_MDP = Path(GROMACS_PATH) / "min_sd.mdp"
!cd {WD} && gmx grompp -f {MIN_SD_MDP} -c {SOLV_GRO} -p {PROTEIN_TOP} -o {IONS_TPR} -maxwarn 1

In [None]:
SOLV_IONS_GRO = "solv_ions.gro"
!cd {WD} && echo SOL | gmx genion -s {IONS_TPR} -o {SOLV_IONS_GRO} -conc 0.15 -pname NA -nname CL -neutral -p {PROTEIN_TOP}

## Energy Minimization

In [None]:
EM_TPR = "em.tpr"
!cd {WD} && gmx grompp -v -f {MIN_SD_MDP} -c {SOLV_IONS_GRO} -p {PROTEIN_TOP} -o {EM_TPR}

In [None]:
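# Note: -c writes the minimized coordinates to {SOLV_IONS_GRO}, so later steps start from the minimized structure.
!cd {WD} && gmx mdrun -v -deffnm {Path(EM_TPR).stem} -c {SOLV_IONS_GRO} -gpu_id 0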

In [None]:
EM_EDR = "em.edr"
EM_XVG = "em.xvg"
# Term 11 is assumed to be the potential energy; confirm against the list gmx energy prints.
!cd {WD} && echo "11" | gmx energy -f {EM_EDR} -o {EM_XVG} -xvg none

fig, ax = plt.subplots(figsize=(3, 3))
df = pd.read_csv(WD / EM_XVG, sep=r'\s+', header=None, names=['step', 'energy'])
ax.plot(df["step"], df["energy"], color="black")
ax.set_xlabel("step")
ax.set_ylabel("energy (kJ/mol)")
ax.yaxis.set_major_formatter(ScalarFormatter(useMathText=True))
ax.ticklabel_format(style='sci', axis='y', scilimits=(0, 0))
plt.grid(False)
plt.show()

## Restraining the Ligand

1. Generate a position restraint topology for the ligand. 
   First, create an index group for the ligand that contains only its non-hydrogen atoms:

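In [ ]:
# Deliberate stop: halts "Run All" here because the next cell needs manual input.
raise RuntimeError("Stop: run the next cell manually.")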

In [None]:
RES_LIGAND_NDX = f"{LIGAND_ID}.res.ndx"
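# Selection to type at the make_ndx prompt: all atoms except hydrogens, then quit.
options = "0 & ! a H*\nq\n"
# make_ndx is interactive, so this only prints the command; run it in a terminal inside WD.
!cd {WD} && echo gmx make_ndx -f {LIGAND_GRO} -o {RES_LIGAND_NDX}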
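If you prefer not to type the selection by hand, the prompt input can be piped in from Python. This is a rough sketch, assuming `gmx` is on `PATH`. The helper names `build_make_ndx_call` and `run_make_ndx` are hypothetical, and only the first one is executed in the demo, so nothing here calls GROMACS until you do so yourself.

```python
import subprocess


def build_make_ndx_call(gro, ndx):
    """Return the gmx make_ndx command and the text to feed to its prompt."""
    cmd = ["gmx", "make_ndx", "-f", str(gro), "-o", str(ndx)]
    stdin_text = "0 & ! a H*\nq\n"  # same keystrokes as in the interactive session
    return cmd, stdin_text


def run_make_ndx(gro, ndx, workdir="."):
    """Run gmx make_ndx non-interactively inside workdir (requires GROMACS)."""
    cmd, stdin_text = build_make_ndx_call(gro, ndx)
    return subprocess.run(
        cmd, input=stdin_text, text=True, capture_output=True, cwd=workdir, check=True
    )


cmd, stdin_text = build_make_ndx_call("GDP.gro", "GDP.res.ndx")
print(" ".join(cmd))
```

Calling `run_make_ndx(LIGAND_GRO, RES_LIGAND_NDX, workdir=WD)` would then replace the manual step. The returned object's `stdout` and `stderr` hold the group list, which is worth reading before choosing a group for `genrestr`.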

2. Execute the genrestr module and select this newly created index group.

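In [ ]:
# Deliberate stop: check the group list in the index file before running the next cell.
raise RuntimeError("Stop: confirm the group number, then run the next cell manually.")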

In [None]:
RES_LIGAND_ITP = f"{LIGAND_ID.lower()}.res.itp"
options = "2"  # Group 2 (Ligand) here; use the number of the heavy-atom group created above if it differs
!cd {WD} && echo {options} | gmx genrestr -f {LIGAND_GRO} -n {RES_LIGAND_NDX} -o {RES_LIGAND_ITP} -fc 1000 1000 1000

3. Include this information in the topology <protein>.top.
   To restrain the ligand whenever the protein is also restrained, add the following lines to the topology in the location indicated:
   ```
    ; Include ligand topology
    #include "gdp.itp"
    
    ; Ligand position restraints <--
    #ifdef POSRES <--
    #include "gdp.res.itp" <--
    #endif <--
    
    ; Include water topology
    #include "./charmm36-jul2022.ff/tip3p.itp"
   ```
   
## Thermostats

Do not couple every species in your system to its own thermostat.

Since <ligand> and the protein are bound very tightly, it is best to treat them as a single temperature-coupling group.

In [None]:
THERMOSTATS_NDX = "thermostats.ndx"
# Typical options: Protein | <ligand>
!cd {WD} && echo gmx make_ndx -f {SOLV_IONS_GRO} -o {THERMOSTATS_NDX} # remove echo

Set `tc-grps = Protein_<ligand> Water_and_ions` in the .mdp files used for the runs below.
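Before running `grompp`, it can help to confirm that an .mdp file really uses the merged coupling groups. The sketch below is a small, hedged checker: `parse_mdp` and `check_tc_groups` are hypothetical helpers, and the `tau_t` and `ref_t` values in the demo fragment are illustrative only. It relies on two GROMACS conventions: dashes and underscores are interchangeable in option names, and `tau_t` and `ref_t` need one value per coupling group.

```python
def parse_mdp(text):
    """Parse 'key = value' lines of an .mdp file into a dict (comments start with ';')."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise option names so tc_grps and tc-grps map to the same key.
        params[key.strip().replace("_", "-")] = value.strip()
    return params


def check_tc_groups(mdp_text, expected):
    """True if tc-grps equals expected and tau-t/ref-t have one value per group."""
    params = parse_mdp(mdp_text)
    groups = params.get("tc-grps", "").split()
    n_tau = len(params.get("tau-t", "").split())
    n_ref = len(params.get("ref-t", "").split())
    return groups == list(expected) and n_tau == len(groups) == n_ref


# Illustrative fragment; the numeric values are placeholders, not recommendations.
demo_mdp = """
tcoupl   = V-rescale                  ; thermostat
tc_grps  = Protein_GDP Water_and_ions ; two coupling groups
tau_t    = 0.1 0.1
ref_t    = 300 300
"""
ok = check_tc_groups(demo_mdp, ["Protein_GDP", "Water_and_ions"])
print(ok)
```

On a real file, pass `(WD / NVT_MDP).read_text()` as the first argument.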

## NVT Equilibration

In [None]:
NVT_MDP = Path("../../gromacs/nvt.gdp.mdp")
!cp {NVT_MDP} {WD}

NVT_MDP = NVT_MDP.name
NVT_NN = f"{PROTEIN_PDB.stem}_NVT"
NVT_TPR = Path(f"{NVT_NN}.tpr")

!cd {WD} && gmx grompp -v -f {NVT_MDP} -c {SOLV_IONS_GRO} -r {SOLV_IONS_GRO} -p {PROTEIN_TOP} -n {THERMOSTATS_NDX} -o {NVT_TPR}
!cd {WD} && gmx mdrun -v -deffnm {NVT_NN} -s {NVT_TPR} -gpu_id 0

## NPT Equilibration

In [None]:
NPT_MDP = Path("../../gromacs/npt.gdp.mdp")
!cp {NPT_MDP} {WD}

NPT_MDP = NPT_MDP.name
NPT_NN = f"{PROTEIN_PDB.stem}_NPT"
NPT_TPR = Path(f"{NPT_NN}.tpr")

!cd {WD} && gmx grompp -v -f {NPT_MDP} -c {NVT_NN}.gro -t {NVT_NN}.cpt -r {NVT_NN}.gro -p {PROTEIN_TOP} -n {THERMOSTATS_NDX} -o {NPT_TPR}
!cd {WD} && gmx mdrun -v -deffnm {NPT_NN} -s {NPT_TPR} -gpu_id 0

## Production

In [None]:
PROD_MDP = Path("../../gromacs/prod.gdp.mdp")
!cp {PROD_MDP} {WD}

PROD_MDP = PROD_MDP.name
PROD_NN = f"{PROTEIN_PDB.stem}_PROD"
PROD_TPR = Path(f"{PROD_NN}.tpr")

!cd {WD} && gmx grompp -v -f {PROD_MDP} -c {NPT_NN}.gro -t {NPT_NN}.cpt -p {PROTEIN_TOP} -n {THERMOSTATS_NDX} -o {PROD_TPR}
!cd {WD} && gmx mdrun -v -deffnm {PROD_NN} -s {PROD_TPR} -gpu_id 0

In [None]:
# Remove the force field copy made earlier in the working directory.
!rm -rf {WD}/{FF_PATH.name}

In [None]:
MAX_TIME = 200  # Trajectory length in ps (the unit -dump expects); you can get this from the production run above.
(WD / "trajectories").mkdir(exist_ok=True)
for i, t in enumerate(range(0, MAX_TIME, 10)):
    !cd {WD} && printf "System" | gmx trjconv -s {PROD_TPR} -f {PROD_NN}.xtc -o trajectories/trj{i}.pdb -dump {t}