# Tutorial for MS$\lambda$D Small Molecule Ligand Perturbations

This tutorial assumes that you have pyCHARMM and pyALF working within your Conda environment and that the MMTSB toolset is installed and accessible.

### Prior to starting this tutorial:  
- Your protein-ligand complex should be solvated in a periodic box of water (e.g. a cubic box). The FAAH system used in this tutorial has been solvated in TIP3P water with an ionic buffer using the CHARMM-GUI. These files can be found in `../4msld-py-prep/faah_prep/faah_charmmgui`.


- Your congeneric ligand series should have been processed with msld-py-prep to generate the hybrid multiple topology and parameter files needed to work with MS$\lambda$D in CHARMM. A copy of these files can be found in `./faah_prep/msld-py-prep`.

## 1) Create `protein` and `water` MS$\lambda$D working directories

Copy the `template_protein` and `template_water` directories for MS$\lambda$D into new `protein` and `water` directories, respectively.

In [None]:
! cp -r template_protein protein
! cp -r template_water water

## 2) Set up the `prep` directories

First, let's set up the `protein/prep` directory. Copy the `faah_prep/msld-py-prep/build.faah` directory into `protein` as `prep`. The pdb coordinates in these ligand files were aligned with the solvated protein-ligand complex from the CHARMM-GUI before msld-py-prep was originally run.

In [None]:
! cp -r faah_prep/msld-py-prep/build.faah ./protein/prep

Next, we need to generate pdb files for the rest of the protein-ligand complex, specifically for the protein, the water molecules, and the ions. One way to approach this (there are others) is to split the solvated system pdb file (`step2_solvator.pdb`) obtained from the CHARMM-GUI into separate pdb files. For convenience, `step2_solvator.pdb` can be found in `faah_prep/split_complex`. An additional script, `list_segids.py`, is also found in that directory. This script reads an input pdb file and breaks it apart into separate pdb files by segid. (The script assumes it is given a CHARMM-formatted pdb file.) Running this script on `step2_solvator.pdb` provides us with protein (`proa.pdb`), water (`solv.pdb`), and ion (`ions.pdb`) pdb files. Finally, we copy these files into our `protein/prep` directory.

In [None]:
import os
os.chdir('./faah_prep/split_complex') #! cd faah_prep/split_complex
! python list_segids.py step2_solvator.pdb
! cp proa.pdb ../../protein/prep/
! cp solv.pdb ../../protein/prep/
! cp ions.pdb ../../protein/prep/
os.chdir('../../protein') #! cd ../../protein/
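
To illustrate what splitting by segid involves, here is a minimal sketch. It is not the contents of `list_segids.py`; it only assumes that the segid sits in columns 73-76 of each `ATOM`/`HETATM` record, as in CHARMM-formatted pdb files. The function name `split_pdb_by_segid` and the segid names `PROA`, `SOLV`, and `IONS` (inferred from the output file names above) are assumptions made for this example.

```python
from collections import defaultdict


def split_pdb_by_segid(pdb_lines):
    """Group ATOM/HETATM records by the segid found in columns 73-76.

    Illustrative helper only; not the actual list_segids.py.
    """
    segments = defaultdict(list)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            segid = line[72:76].strip()  # columns 73-76, 0-based slice
            segments[segid].append(line)
    return dict(segments)


# Tiny synthetic records: only the record name and the segid columns are
# filled in, which is enough to show the grouping.
demo_lines = [
    "ATOM".ljust(72) + "PROA",
    "ATOM".ljust(72) + "SOLV",
    "HETATM".ljust(72) + "IONS",
    "REMARK this line is ignored",
    "ATOM".ljust(72) + "PROA",
]
demo_segments = split_pdb_by_segid(demo_lines)
for segid, records in demo_segments.items():
    print(f"{segid.lower()}.pdb would receive {len(records)} record(s)")
```

Grouping in memory first keeps the file-writing step separate, so the same function can be used to simply list which segids a pdb file contains.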

Within the `protein` directory, we'll finish setting up the `prep` directory by (manually) editing the following files and moving them into `prep`:

**1) alf_info.py:**
This script contains important variables for running ALF
 - edit the `name` key value to be 'protein' *(this corresponds to the `protein.py` file)*
 - edit the `nsubs` key value to match the number of substituents you're planning to sample with MS$\lambda$D (e.g. 6 for this tutorial)
 - check other key values. In most cases, these will remain unchanged. In some cases, the temperature (`temp`) or the `enginepath` may be different. For using pyCHARMM, the `enginepath` is set to `''`.
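
A quick consistency check on these values can catch a mismatched `name` before a job is submitted. The sketch below assumes the ALF settings are available as a plain Python dictionary with the four keys named above; the helper `check_alf_info` and the placeholder values in the example are hypothetical, and the real `alf_info.py` may hold additional keys or a different layout.

```python
import os

# Keys discussed in this tutorial; the real file may contain more.
REQUIRED_KEYS = ("name", "nsubs", "temp", "enginepath")


def check_alf_info(alf_info, workdir="."):
    """Return a list of human-readable problems (empty if none found).

    Hypothetical helper: checks that the expected keys exist and that
    a '<name>.py' system script is present in workdir.
    """
    problems = []
    for key in REQUIRED_KEYS:
        if key not in alf_info:
            problems.append(f"missing key: {key}")
    name = alf_info.get("name")
    if name is not None:
        script = os.path.join(workdir, f"{name}.py")
        if not os.path.isfile(script):
            problems.append(f"no {name}.py found in {workdir}")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only; use your own settings.
    example = {"name": "protein", "nsubs": [6], "temp": 298.15, "enginepath": ""}
    for problem in check_alf_info(example, workdir="."):
        print(problem)
```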
 
**2) protein.py:**
This pyCHARMM MS$\lambda$D script loads the full alchemical system into CHARMM before minimization or dynamics are run (those functions are called in `msld_flat.py` and `msld_prod.py` as part of running ALF). We therefore consider `protein.py` to be *system specific*, and it must be edited to match the needs of your system. For example, the following variables or sections are often edited:
 - the box and pme-grid sizes (variables named `box` and `pmegrid`, respectively)
 - the `pert` dictionary, which defines which ligand (or protein side-chain) mutations should be sampled. For ligand perturbations, we list the substituent numbers, the ligand segid, and the ligand resid.
 - the sections where the protein, solvent, and ions are read into CHARMM (denoted with `#read in protein` or `#read in solvent and ions` comment lines). Pay special attention to file names and segid names in these sections. In some systems, more than one protein, solvent, or ion file may need to be read in.
 - `image.setup_segment` or `image.setup_residue` lines should be updated. Proteins and ligands are modeled with `image.setup_segment`, and solvent or ions are modeled with `image.setup_residue`. Adjust the segid names as appropriate.
 - the rest of protein.py sets up the BLOCK module within CHARMM. For most ligand perturbations, no edits are required here.
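
When choosing a `pmegrid` value for a given `box` length, a common particle-mesh Ewald convention is to use roughly one grid point per Angstrom and to pick a grid count whose only prime factors are 2, 3, and 5, which suits FFT routines. The sketch below follows that convention; the function names, the 1.0 Angstrom spacing, and the example box lengths are assumptions for illustration, so check the CHARMM documentation and the existing value in `protein.py` before changing anything.

```python
import math


def is_fft_friendly(n, primes=(2, 3, 5)):
    """True if n factors completely into the given small primes."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def suggest_pmegrid(box_length, spacing=1.0):
    """Smallest FFT-friendly grid count giving at most `spacing` per point.

    box_length and spacing are in Angstrom. Illustrative helper only.
    """
    n = max(1, math.ceil(box_length / spacing))
    while not is_fft_friendly(n):
        n += 1
    return n


# Hypothetical box lengths, not the FAAH system's actual dimensions.
for length in (31.0, 79.3, 82.5):
    print(length, "->", suggest_pmegrid(length))
```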

In [None]:
# In this tutorial, these files are already pre-edited for the FAAH example 
# system, and they can be moved directly into prep
! mv alf_info.py prep/
! mv msld_patch.py prep/
! mv protein.py prep/
os.chdir('../') #! cd ../

Next, let's set up the `water/prep` directory. Back in `faah_prep/msld-py-prep`, we need to run `Lg_Solvate.sh`. This script uses the *convpdb.pl* utility from the MMTSB to solvate the ligand (& its substituents) in a cubic box of TIP3P water. All ligand pdb files in `build.faah` are translated into the center of the newly generated solvent box, and all other toppar files are copied over. A `solv_prep` directory is created in this process, and we need to copy it into our `water` MS$\lambda$D working directory. It is also important to record the size of the generated box. This information will be used later when modifying the `water/water.py` script.

In [None]:
os.chdir('./faah_prep/msld-py-prep') # cd ./faah_prep/msld-py-prep
# the 12.0 in the next line specifies the buffer distance from the molecule to the edge of the box
! bash Lg_Solvate.sh 12.0
! cp -r solv_prep ../../water/prep
os.chdir('../../water') # cd ../../water
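
As a rough cross-check on the recorded box size, the edge of a cubic box can be estimated from the largest extent of the ligand plus the buffer on both sides. This is only a sketch under that assumption; the exact size produced by `Lg_Solvate.sh` and *convpdb.pl* may differ, and the value reported by the script is the one to use in `water.py`. The function name and the coordinates below are made up for the example.

```python
def estimate_cubic_box_edge(coords, buffer=12.0):
    """Estimate a cubic box edge (Angstrom) from solute coordinates.

    coords: iterable of (x, y, z) tuples for the solute atoms.
    buffer: distance from the solute to each box face.
    Rough estimate only: largest extent plus a buffer on both sides.
    """
    coords = list(coords)
    if not coords:
        raise ValueError("no coordinates given")
    extents = []
    for axis in range(3):
        values = [atom[axis] for atom in coords]
        extents.append(max(values) - min(values))
    return max(extents) + 2.0 * buffer


# Made-up coordinates for a small molecule, not taken from build.faah.
demo_coords = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0), (5.0, -1.0, 1.0)]
print(estimate_cubic_box_edge(demo_coords, buffer=12.0))
```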

Similar to the steps we took above in `protein`, we'll finish setting up the `prep` directory within `water` by (manually) editing the following files and moving them into `prep`:

**1) alf_info.py:**
This script contains important variables for running ALF
 - edit the `name` key value to be 'water' *(this corresponds to the `water.py` file)*
 - edit the `nsubs` key value to match the number of substituents you're planning to sample with MS$\lambda$D (e.g. 6 for this tutorial)
 - check other key values. In most cases, these will remain unchanged. In some cases, the temperature (`temp`) or the `enginepath` may be different. For using pyCHARMM, the `enginepath` is set to `''`.
 
**2) water.py:**
This is a pyCHARMM MS$\lambda$D script for the unbound-solvated ligand simulation. Similar to `protein.py`, `water.py` should be edited to match the system being modeled. For example, the following variables or sections are often edited:
 - the box and pme-grid sizes (variables named `box` and `pmegrid`, respectively)
 - the `pert` dictionary, which defines which ligand (or protein side-chain) mutations should be sampled. For ligand perturbations, we list the substituent numbers, the ligand segid, and the ligand resid.
 - the sections where the solvent and ions are read into CHARMM (denoted with a `#read in solvent and ions` comment line). Pay special attention to file names and segid names in these sections. In some systems, more than one solvent or ion file may need to be read in. Also note that the lines related to reading in protein files are commented out.
 - `image.setup_segment` or `image.setup_residue` lines should be updated. Ligands are modeled with `image.setup_segment`, and solvent or ions are modeled with `image.setup_residue`. Adjust the segid names as appropriate.
 - the rest of water.py sets up the BLOCK module within CHARMM. For most ligand perturbations, no edits are required here.

In [None]:
# In this tutorial, these files are already pre-edited for the FAAH example 
# system, and they can be moved directly into prep
! mv alf_info.py prep/
! mv msld_patch.py prep/
! mv water.py prep/
os.chdir('../') #! cd ../

## 3) Customize runtime scripts

Now that the `protein/prep` and `water/prep` directories are set up, we're almost ready to start running both ALF/MS$\lambda$D simulations.

However, before we can start ALF, the `runflat.sh`, `runprod.sh`, and `postprocess.sh` files need to correctly load your Conda virtual environment. In each file, edit lines 3 and 4 so that the scripts load your python environment when they are called by `subsetAll.sh`.

`subsetAll.sh` also needs to be edited to reflect your local cluster's queuing system. The current scripts are set up to work with Slurm. In this case, edit lines 4 and 5 to match your cluster's walltime restrictions, partition name, and GPU specifications (if appropriate). If your cluster does not use Slurm, edit lines 4 and 5, as well as all other lines that use the `sbatch` Slurm command, and replace these with commands appropriate for your system.
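
To find every line that mentions `sbatch` before adapting the script to another scheduler, a short search like the following can help. It is a minimal sketch; the helper name `find_sbatch_lines` and the demo script text are hypothetical and do not reproduce the real `subsetAll.sh`.

```python
def find_sbatch_lines(script_text):
    """Return (line_number, line) pairs that mention sbatch.

    The match is case-insensitive so that '#SBATCH' directives are
    reported as well. Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if "sbatch" in line.lower():
            hits.append((number, line.rstrip()))
    return hits


# Hypothetical script text for demonstration; to inspect your own file,
# read it with open("subsetAll.sh").read() instead.
demo_script = "\n".join([
    "#!/bin/bash",
    "echo submitting",
    "sbatch --time=01:00:00 runflat.sh",
    "echo done",
])
for number, line in find_sbatch_lines(demo_script):
    print(f"line {number}: {line}")
```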

After these edits have been made, the ALF/MS$\lambda$D calculations can begin by executing `subsetAll.sh`.

In [None]:
! echo "In both protein and water directories"
! echo "Edit runflat.sh, runprod.sh, postprocess.sh to correctly load your local python virtual environment"
! echo "Edit subsetAll.sh to match your cluster's queuing setup (slurm, etc.)"
! echo "When ready, submit the job by executing the subsetAll.sh script"

## 4) Run ALF/MS$\lambda$D

To submit the jobs, change into each of the `protein` and `water` directories in turn and run:
`bash subsetAll.sh > job.log; more job.log`

In [None]:
os.chdir('./protein')
! bash subsetAll.sh > job.log; more job.log
os.chdir('../water')
! bash subsetAll.sh > job.log; more job.log
os.chdir('../')

## 5) ALF/MS$\lambda$D Output

In the current setup, ALF will perform 113 short (100 ps to 1 ns long) MS$\lambda$D simulations for adaptive flattening. These simulations are performed in directories called `run1`, `run2`, ..., `run113`. Next, a brief 5 ns pre-production run is performed with 5 independent trials (labeled `run114a`, `run114b`, ..., `run114e`). Finally, two full production runs (`run115` and `run116`) are performed with 5 independent trials (`a-e`) for 25 ns and 50 ns, respectively.
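
To check how far a calculation has progressed, the expected directory names can be generated and compared against what exists on disk. This is a sketch only: it assumes the production runs use the same letter suffixes as the pre-production run (`run115a` ... `run116e`), and the helper names are made up for this example.

```python
import os


def expected_run_dirs(n_flat=113, replicated=(114, 115, 116), trials="abcde"):
    """List the run directory names described above.

    n_flat: number of flattening runs (run1 ... run<n_flat>).
    replicated: run numbers carried out as independent trials.
    trials: one letter per independent trial.
    """
    names = [f"run{i}" for i in range(1, n_flat + 1)]
    names += [f"run{i}{t}" for i in replicated for t in trials]
    return names


def missing_run_dirs(workdir, **kwargs):
    """Return the expected run directories not yet present in workdir."""
    return [
        name
        for name in expected_run_dirs(**kwargs)
        if not os.path.isdir(os.path.join(workdir, name))
    ]


if __name__ == "__main__":
    for workdir in ("protein", "water"):
        print(workdir, "missing:", len(missing_run_dirs(workdir)))
```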

Upon completion of all ALF/MS$\lambda$D calculations, free energy differences are computed and reported in `analysis115/Result.txt` and `analysis116/Result.txt`.

## 6) Compute the $\Delta$$\Delta$G(binding) results

<img src="binding_thermodynamic_cycle.png" width="400">

Using the above thermodynamic cycle for calculating relative differences in binding affinities between two or more ligands, our $\Delta$$\Delta$*G*(bind) can be computed by taking the results in `protein/analysis115/Result.txt` and subtracting from them the results in `water/analysis115/Result.txt`.
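
The subtraction itself can be sketched as follows. The layout of `Result.txt` is not described here, so the example starts from values that have already been parsed into `(free energy, uncertainty)` pairs. Combining the uncertainties in quadrature assumes the two simulations have independent errors, which may differ from what `collect_ddg.py` does. The function name and the numbers are illustrative only and are not results from this tutorial.

```python
import math


def ddg_bind(protein, water):
    """Relative binding free energies from bound and unbound results.

    protein, water: lists of (dG, error) pairs, one per substituent,
    in the same order. Returns a list of (ddG, combined_error) pairs,
    with errors combined in quadrature (independent errors assumed).
    """
    if len(protein) != len(water):
        raise ValueError("protein and water results differ in length")
    results = []
    for (dg_p, err_p), (dg_w, err_w) in zip(protein, water):
        ddg = dg_p - dg_w
        err = math.sqrt(err_p ** 2 + err_w ** 2)
        results.append((ddg, err))
    return results


# Made-up values for illustration (units as used in Result.txt).
protein_results = [(0.0, 0.0), (1.5, 0.3)]
water_results = [(0.0, 0.0), (0.5, 0.4)]
for index, (ddg, err) in enumerate(ddg_bind(protein_results, water_results), 1):
    print(f"substituent {index}: ddG(bind) = {ddg:.2f} +/- {err:.2f}")
```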



An example `analysis115` directory can be found in `finished_example`. The `collect_ddg.py` script can be used to calculate the $\Delta$$\Delta$*G*(bind) results.


In [None]:
os.chdir('./finished_example')
! python collect_ddg.py 115
os.chdir('../')

## Finished