# Prepare Mcl-1 for MD simulation.

Here we demonstrate a workflow based on **AmberTools** for preparing systems for MD simulation. **Amber** has a particularly rich ecosystem of utilities for this process, but other simulation packages have their own tools too.

In a couple of places you will also use some third-party utilities (not part of **AmberTools**) to help.

Most of this workshop will be done on the CCP-BioSim Jupyter Hub, but you will also need to 
create a folder on your own laptop to store files you download for visualization with Chimera.

## A. Find a suitable starting structure in the Protein Data Bank.

1. Visit the PDB website: https://www.rcsb.org
2. Type 'mcl-1' into the search box and press return.
   You will get a list of over 200 different entries in the PDB. Deciding which of these provides the best "starting material" for a simulation of Mcl-1 in the apo (unliganded) form is beyond the scope of this workshop, so, for no particular reason, we are going to use entry **5FDR** for this demo.
3. Type '5fdr' into the search box, and click on the same code in the pop-up that appears.
4. Examine the entry. In particular, note the cartoon picture in the top left (titled 'Biological Assembly 1').
5. Click on '<' or '>' to cycle through the *three* other biological assemblies that are part of this PDB entry. They correspond to the other copies of the protein found in the asymmetric unit of the crystal structure. You need to evaluate which of these is going to be the best starting point for your model building.

## B. Analyse and evaluate the PDB entry.

**Do this section in your Jupyter Lab environment**

1. If you have not already done so, open a terminal session in your lab environment, and check you are in the home folder for the workshop.

2. If you type `ls` you will see the file `5fdr.pdb`. This has been downloaded from the PDB website for you. You will also see it in the file browser window.

3. (Optional) Out of curiosity, take a look at the contents of this file. You have some options here:

    1. Use the unix `less` command in your terminal window.
    2. Double-click on the file name in the browser window.

You need to establish which of the four biological assemblies in this PDB entry is going to be the best starting point for your model building. Protein structures in PDB entries are often incomplete: atoms, and sometimes entire amino acids, can be missing because they could not be resolved experimentally. You need to check the completeness of each assembly in the entry.

4. In the terminal window, type:
```
alpha_check -i 5fdr.pdb
```
The `alpha_check` utility finds the [UniProt accession number](https://www.uniprot.org/help/accession_numbers) corresponding to the sequence in each *chain* (biological assembly), then retrieves the predicted 3D structure for that ID from the [AlphaFold structure database](https://alphafold.com/). It then compares the crystal structure with the AlphaFold model and reports on the differences.
   
       
5. Decide which of the four copies of the protein in the crystal structure seems most complete.
   We would suggest this is probably chain A. Do you agree?
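`alpha_check` does the comparison against AlphaFold for you. If you would also like a quick, independent look at where residues are missing, the bash function below is a minimal sketch (the name `pdb_chain_gaps` is hypothetical; it is not part of `alpha_check` or **AmberTools**). It reads the fixed-column `ATOM` records of a PDB file and reports, for each chain, the range of residue numbers present and any jumps in the numbering. It assumes residues are numbered consecutively and ignores insertion codes, and it cannot see residues missing from the chain termini or missing side-chain atoms; those still need the AlphaFold comparison.

```bash
# Hypothetical helper: summarise residue coverage per chain in a PDB file.
# Only ATOM records are read, so HETATM ligands and crystal waters are ignored.
pdb_chain_gaps() {
    awk '
    substr($0, 1, 6) == "ATOM  " {
        chain = substr($0, 22, 1)
        resseq = substr($0, 23, 4) + 0          # residue number, columns 23-26
        key = chain ":" resseq
        if (key in seen) next                   # one entry per residue, not per atom
        seen[key] = 1
        if (!(chain in first)) {                # first residue seen in this chain
            first[chain] = resseq
            order[++nchains] = chain
        } else if (resseq > last[chain] + 1) {  # jump in numbering: residues missing
            gaps[chain] = gaps[chain] " " (last[chain] + 1) "-" (resseq - 1)
        }
        last[chain] = resseq
        count[chain]++
    }
    END {
        for (i = 1; i <= nchains; i++) {
            c = order[i]
            printf "chain %s: residues %d-%d, %d present, gaps:%s\n",
                c, first[c], last[c], count[c], ((c in gaps) ? gaps[c] : " none")
        }
    }' "$1"
}
```

For example, `pdb_chain_gaps 5fdr.pdb` prints one line per chain, so the four copies can be compared side by side.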

## C. Remediate the chain A structure.

Use residues and atoms from the AlphaFold model to fill in any "gaps" in the crystal-structure data.

1. Use the `alpha_fix` utility for this:
```
alpha_fix -i 5fdr.pdb -c A -o 5fdr_A_fixed.pdb
```
   
2. Confirm that missing atoms and residues have now been inserted:
```
alpha_check -i 5fdr_A_fixed.pdb
```
   
3. (Optional) Download a copy of the remediated structure **to your laptop** and open it in *Chimera*. Confirm it is
   as expected (e.g. examine some of the residues that needed remediation).
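To see exactly which whole residues `alpha_fix` added, you can compare the residue lists of the original and remediated files. The function below is a hypothetical sketch (`added_residues` is not part of `alpha_fix`). It assumes the remediated file keeps chain ID `A` and the original residue numbering, and it only detects whole residues, not side-chain atoms added to residues that were already present (the second `alpha_check` run covers those).

```bash
# Hypothetical helper: list residues of one chain that appear in the second
# (remediated) PDB file but not in the first (original) one.
added_residues() {
    local orig=$1 fixed=$2 chain=$3
    awk -v want="$chain" '
    substr($0, 1, 6) != "ATOM  " || substr($0, 22, 1) != want { next }
    {
        key = (substr($0, 23, 4) + 0) " " substr($0, 18, 3)    # e.g. "12 ALA"
        if (FILENAME == ARGV[1]) { before[key] = 1; next }     # first file: original
        if (!(key in before) && !(key in printed)) {           # second file: remediated
            printed[key] = 1
            print key
        }
    }' "$orig" "$fixed" | sort -n
}
```

Run it as `added_residues 5fdr.pdb 5fdr_A_fixed.pdb A`; each output line is a residue number and name, sorted by number.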

## D. Prepare the remediated structure for simulation.

**Back in your Jupyter Lab environment now**

The remediated structure is complete at the heavy-atom level, but contains no hydrogen atoms.
These cannot all be added until the tautomeric (protonation) state of each histidine residue has been chosen.
   
1. Use the **AmberTools** `pdb4amber` tool to analyse the protein structure. With the `--reduce` option it runs the
   `reduce` program, which adds hydrogen atoms and predicts the most likely tautomeric form of each histidine
   from the pattern of hydrogen-bonding interactions each form would make:
```
pdb4amber -i 5fdr_A_fixed.pdb --reduce -o 5fdr_A_prepared.pdb
```
2. A number of output files are produced in addition to `5fdr_A_prepared.pdb`, but we will not discuss them here.
3. Open `5fdr_A_prepared.pdb` in a text editor (or use `less`) and note that the structure now contains hydrogen atoms. Histidine residues that used to have the residue name "HIS" are now named "HID" (hydrogen on Nδ1) or "HIE" (hydrogen on Nε2), depending on which tautomer has been predicted as more likely in each environment; a doubly protonated histidine, if predicted, is named "HIP".
4. (Optional) Download the structure to your laptop and view it in *Chimera*.
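Rather than scrolling through the whole file, you can tally the histidine names with a short awk script. This is a hypothetical helper (`his_states` is not an **AmberTools** command): it counts each residue once, keyed on chain ID and residue number, and lists the residue numbers under each name, so you can see at a glance which tautomer was assigned to each histidine.

```bash
# Hypothetical helper: count histidine residues by name (HIS, HID, HIE, HIP)
# and list their residue numbers.
his_states() {
    awk '
    substr($0, 1, 6) == "ATOM  " && substr($0, 18, 3) ~ /^HI[SDEP]$/ {
        name = substr($0, 18, 3)
        key = substr($0, 22, 5)                  # chain ID + residue number
        if (key in seen) next                    # count each residue once
        seen[key] = 1
        n[name]++
        ids[name] = ids[name] " " (substr($0, 23, 4) + 0)
    }
    END { for (name in n) printf "%s %d:%s\n", name, n[name], ids[name] }' "$1" | sort
}
```

For example: `his_states 5fdr_A_prepared.pdb`. A residue still named `HIS` would be worth a closer look.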

## E. Generate the required input files for AMBER MD simulation from the prepared protein.

This final stage involves adding all remaining components of the simulation system
(the solvent, and possibly ions, to represent the aqueous environment) and
generating the two files that **AMBER** simulations require: one with the coordinates of the
system, the other with the topology and forcefield parameters.
1. Decide on the details of the simulation system to be constructed. In particular, how will the
   aqueous environment be represented? Most commonly the protein is modelled in a periodic
   box of water molecules, so the size and shape of the periodic box must be decided. You must
   also decide whether to add a certain concentration of inorganic salt (typically NaCl) to
   mimic a physiological environment. Finally, the forcefields to use for the simulation must be chosen.
   
2. Here we will:

    1.  Use a truncated octahedral box of water that extends at least 12 Angstroms beyond any protein atom.
    2.  Add enough salt (Na+ and Cl- ions) to ensure the final system is electrically neutral and has an ion molarity of about 0.15 M.
    3.  Use the FF19SB forcefield for the protein, and the OPC model for the water.

3. The system is built using the **AmberTools** `tleap` tool, but writing the input file for `tleap` by hand can be a little awkward, so we will use the `make_leap` utility to generate it:
```
make_leap --inpdb 5fdr_A_prepared.pdb --outinpcrd 5fdr_A.inpcrd \
    --outprmtop 5fdr_A.prmtop --solvate oct \
    --forcefields protein.ff19SB --padding 12.0 \
    --ion_molarity 0.15 > tleap.in
```
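To get a feel for what "about 0.15 M" means in a simulation box, a common back-of-the-envelope estimate multiplies the salt concentration $c$ by Avogadro's number $N_A$ and the volume $V$ occupied by the solvent. This is only a sketch of the arithmetic, not necessarily how `make_leap` computes its ion count; neutralizing the protein's net charge then shifts the balance between Na⁺ and Cl⁻.

$$
N_{\text{pairs}} \approx c\,N_A\,V
= \left(0.15\ \text{mol L}^{-1}\right)\left(6.022\times10^{23}\ \text{mol}^{-1}\right)\left(10^{-27}\ \text{L}\,\text{Å}^{-3}\right) V
\approx 9.0\times10^{-5}\ \text{Å}^{-3}\times V
$$

For an illustrative solvent volume of $2.5\times10^{5}$ Å³ (not the actual size of this box), this gives roughly 23 Na⁺/Cl⁻ pairs.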

4. Now use the **AmberTools** `tleap` tool to process this script and create the *coordinate* and *topology* files that are required for MD simulations using **Amber**:
```
tleap -f tleap.in
```
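It is worth checking that `tleap` finished cleanly before moving on. `tleap` normally appends to a `leap.log` file in the working directory and ends each run with a summary line of the form `Exiting LEaP: Errors = 0; Warnings = 1; Notes = 0.`. The function below is a minimal sketch under that assumption (`check_leap_run` is a hypothetical name): it prints the most recent summary line, fails if any errors were reported, and fails if any of the expected output files is missing or empty.

```bash
# Hypothetical helper: sanity-check a tleap run.
# Assumes the log ends each run with "Exiting LEaP: Errors = N; Warnings = N; Notes = N."
check_leap_run() {
    local log=$1
    shift
    local summary errors f
    # leap.log accumulates runs, so keep only the most recent summary line
    summary=$(grep 'Exiting LEaP' "$log" | tail -n 1)
    if [ -z "$summary" ]; then
        echo "no LEaP summary line found in $log" >&2
        return 1
    fi
    echo "$summary"
    errors=$(printf '%s\n' "$summary" | sed -n 's/.*Errors = \([0-9][0-9]*\).*/\1/p')
    if [ "${errors:-1}" -ne 0 ]; then          # unparseable count is treated as an error
        echo "tleap reported errors; read $log before continuing" >&2
        return 1
    fi
    for f in "$@"; do                          # each output file must exist and be non-empty
        if [ ! -s "$f" ]; then
            echo "missing or empty output file: $f" >&2
            return 1
        fi
    done
}
```

For example: `check_leap_run leap.log 5fdr_A.prmtop 5fdr_A.inpcrd`. Warnings do not make it fail, but they are still worth reading in the log.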
   
5. For visualization purposes, create a PDB format file for the completed system from the
   AMBER-format files generated in the last step. The **AmberTools** `ambpdb` tool does this:
```
ambpdb -p 5fdr_A.prmtop < 5fdr_A.inpcrd > 5fdr_A_amber.pdb
```

6. Download this file **to your laptop** and take a look in *Chimera*.
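Before (or as well as) opening the file in *Chimera*, you can check the solvent and ion counts from the command line. The sketch below is a hypothetical helper (`count_solvent` is not an **AmberTools** command) that assumes the Amber-style residue names `WAT`, `Na+` and `Cl-` appear in columns 18-20 of the PDB file written by `ambpdb`; check a few lines of your file first and adjust the names if they differ. It counts one oxygen per water molecule, so it works for four-point models such as OPC, and one atom per monatomic ion.

```bash
# Hypothetical helper: count water molecules and Na+/Cl- ions in a PDB file.
# Assumes Amber-style residue names (WAT, Na+, Cl-); adjust if your files differ.
count_solvent() {
    awk '
    /^(ATOM  |HETATM)/ {
        res = substr($0, 18, 3)
        atom = substr($0, 13, 4); gsub(/ /, "", atom)   # trimmed atom name
        if (res == "WAT" && atom == "O") n["water"]++   # one oxygen per water
        else if (res == "Na+") n["Na+"]++               # monatomic ions: one atom each
        else if (res == "Cl-") n["Cl-"]++
    }
    END { printf "water %d  Na+ %d  Cl- %d\n", n["water"], n["Na+"], n["Cl-"] }' "$1"
}
```

For example: `count_solvent 5fdr_A_amber.pdb`. The ion counts can then be compared with the 0.15 M target and the protein's net charge.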

## All done!

With the files `5fdr_A.inpcrd` and `5fdr_A.prmtop` prepared, you are ready to start running MD simulations. You are not limited to **Amber**: **OpenMM** can read these files directly, and they can be converted into the formats used by other packages such as **GROMACS**.
