## Setting up Lysozyme to run a standard MD simulation
This notebook guides you through using FESetup to set up a protein simulation and generate output for several simulation engines.


The notebook forms part of the CCPBio-Sim workshop **Tackling Alchemistry with FESetup and Sire SOMD** run on the 10th of April 2018 at the University of Bristol.

*Author: Antonia Mey   
Email: antonia.mey@ed.ac.uk*

**Reading time of the document: xx mins**

### Imports

In [None]:
%pylab inline
import nglview as nv
import BioSimSpace as BSS

## Setting up Lysozyme 
FESetup has a nice feature: it can produce compatible output for different MD engines. It currently supports *NAMD*, *AMBER*, *Gromacs* and *DLPOLY*, provided the executables for each of the simulation packages are actually installed. In the following we will look at how to set up lysozyme. The first part of this tutorial is found in the directory Task01.

In [None]:
cd Task01

Let's start by loading the molecule.

In [None]:
view = BSS.viewMolecules('proteins/181L/protein.pdb')
view

As for the ethane methanol simulations, we have an input file for FESetup. This time it has a new section named `[protein]`, which contains all the relevant information for setting up a protein simulation. Let's have a look at this very basic setup file.

In [None]:
!head -n 30 setup.in

We want to put the protein 181L into a box of length 12, add some solvent, neutralize the box and run a set of minimisation steps. Executing this will generate the output directory `_proteins`, which will contain the minimised protein structure.
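
For orientation, the steps described above map onto a handful of keys in an INI-style input file. The sketch below parses a small example with Python's `configparser`. The key names (`box.length`, `neutralize`, `min.nsteps`) and every value except the molecule name and box length are assumptions made for illustration, so compare them with the output of `head` above rather than treating them as the contents of `setup.in`.

```python
import configparser

# Hypothetical input text: key names and values are assumptions,
# not a copy of the workshop's setup.in.
EXAMPLE_INPUT = """
[protein]
basedir = proteins
molecules = 181L
box.length = 12.0
neutralize = yes
min.nsteps = 100
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_INPUT)
protein_section = parser["protein"]

# Convert the string values into typed Python values
box_length = protein_section.getfloat("box.length")
neutralize = protein_section.getboolean("neutralize")
min_steps = protein_section.getint("min.nsteps")

print(f"molecule: {protein_section['molecules']}")
print(f"box length: {box_length}, neutralize: {neutralize}, minimisation steps: {min_steps}")
```

Reading the file this way is only a convenience for inspecting settings; FESetup itself is still run on the unmodified input file.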

In [None]:
!FESetup setup.in

In [None]:
ls _proteins/181L

This directory now contains your solvated protein ready for an equilibration or other type of simulation run. This is the simplest way of preparing a protein simulation. Of course you can again use 

### Setup with Gromacs
We can use the same setup to run with Gromacs. The only thing that has to be changed is the line specifying the MD engine:   
`
mdengine = gromacs, mdrun
`   
If Gromacs is not installed, the setup will fail. This is currently the case on this server, but we can still run an example file until we get the error. This time we use `182L` for the setup.
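
If you prefer to make this one-line change from Python rather than in an editor, a minimal sketch could look like the following. The helper `set_mdengine` is hypothetical (it is not part of FESetup), and the starting text with `amber, sander` is an assumed example rather than the contents of the workshop file.

```python
import re

def set_mdengine(config_text: str, engine: str, binary: str) -> str:
    """Return config_text with its 'mdengine' line replaced (hypothetical helper)."""
    new_line = f"mdengine = {engine}, {binary}"
    # Match a whole line that starts with 'mdengine =' (leading spaces allowed)
    updated, count = re.subn(
        r"(?m)^[ \t]*mdengine[ \t]*=.*$", lambda _match: new_line, config_text
    )
    if count == 0:
        raise ValueError("no 'mdengine' line found in the input text")
    return updated

# Assumed starting point; only the gromacs line is taken from the text above.
original = "[globals]\nmdengine = amber, sander\n"
print(set_mdengine(original, "gromacs", "mdrun"))
```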

In [None]:
!FESetup setup_gromacs.in

While we have received an error, because Gromacs is not installed and the minimisation cannot be run, we can still have a look in the `_proteins/182L` directory. You will find it contains a `solvated.gro` and a `solvated.top` file, which are the Gromacs coordinate and topology files respectively, and can be used to run your simulation with Gromacs.

In [None]:
ls _proteins/182L

**Task: What would you have to change in `setup.in` to prepare files compatible with NAMD?**

## Setting up Lysozyme with a ligand 
Setting up a single protein isn't necessarily such a complicated task, but setting up a protein and a ligand together might require more effort. The ligand will often lack parameters, so partial charges for the forcefield need to be generated, and doing this by hand is cumbersome. FESetup can help with this. Let's look at Task02, working with both protein and ligand.

### Benzene as a ligand of Lysozyme

In [None]:
cd ../Task02

You now have a `proteins` and a `ligands` directory. The `proteins` directory again contains the structure of 181L, and the `ligands` directory that of benzene.    

**Task: visualise both the protein and the ligand to make sure that the ligand has the right coordinates**

In [None]:
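# Change the placeholder paths below to point at the protein and ligand structure files
import mdtraj as md
from nglview import NGLWidget
protein = md.load('path/to/protein/file')
ligand = md.load('path/to/ligand/file')
# Show both structures in one widget so their relative positions can be checked
view = NGLWidget()
view.add_trajectory(protein)
view.add_trajectory(ligand)
view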
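
Besides looking at the structures, a quick numerical check can confirm that the ligand sits close to the protein. The sketch below computes the smallest protein-ligand distance for made-up toy coordinates; the function `min_distance` and the coordinates are illustrative assumptions, not workshop data.

```python
import math

def min_distance(coords_a, coords_b):
    """Smallest Euclidean distance between any point in coords_a and any point in coords_b."""
    return min(math.dist(a, b) for a in coords_a for b in coords_b)

# Toy coordinates in nm, made up for illustration (not taken from 181L or benzene)
protein_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
ligand_xyz = [(0.3, 0.3, 0.4), (0.4, 0.3, 0.4)]

closest = min_distance(protein_xyz, ligand_xyz)
print(f"closest protein-ligand distance: {closest:.3f} nm")
```

With the objects loaded in the cell above, the first-frame coordinates are available as `protein.xyz[0]` and `ligand.xyz[0]` (in nanometres) and could be passed in instead of the toy lists. A ligand that is tens of nanometres away from every protein atom is a sign that the coordinates do not match the binding site.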

Now we want to create a complex of the ligand and protein and solvate it. For this, another `setup.in` file has been prepared. Let's look at this file a bit more closely:

In [None]:
!cat setup.in

We again have a ligand section, which defines the molecule and also makes sure that benzene will be set up in a solvation box. Then there is the same protein section as before. In addition there is now a complex section, which states that we want to create a complex of 181L and benzene, solvate it in a box and minimise it.

In [None]:
# Insert the code here to run the setup. 
!FESetup setup.in

In [None]:
ls _complexes/181L:benzene/

Now you should be able to locate a directory called `_complexes` that was generated by FESetup. It should contain the complex in vacuum and in solvent.

In [None]:
# Insert code to visualise the complex in vacuum
view = BSS.viewMolecules('_complexes/181L:benzene/vacuum.pdb')
view


**Tip: Play around with the representations in order to be able to view both the ligand and the protein**

### ortho-Xylene as a ligand of Lysozyme
Often you will want to run a simulation with a ligand for which you only have a SMILES string or an SDF file. So how can we go from a SMILES file to a solvated complex? One easy way to do this is using Open Babel.
Take a look at `ligands/ortho-xylene/ligand.smi`. This file contains the following SMILES string: `CC1=C(C)C=CC=C1`. What we need to do is generate a 3D structure.   

The next cell uses Open Babel to convert the input SMILES string to a 3D structure in PDB format.

In [None]:
!obabel -ismi ligands/ortho-xylene/ligand.smi -opdb -O ligands/ortho-xylene/ligand.pdb --gen3D -h

Double check that the ligand molecule has been generated!

Next, there is already a prepared input file for ortho-xylene. Let's just run the setup!

In [None]:
!FESetup setup_xylene.in

Let's have a look at the generated complex for ortho-xylene. There seems to be a problem with the way the ligand is positioned!

In [None]:
view = BSS.viewMolecules('_complexes/181L:ortho-xylene/vacuum.pdb')
view

**Questions: Can you think of ways to align the ligand to the correct binding site of the protein? Write down a few ideas in the next cell:**

**Advanced task**: *Come back to this if you still have time at the end of the tutorial and implement a way to put xylene in an appropriate position in the binding site. Alternatively, the BioSimSpace Workshop will cover an easy way of doing this.*
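
As a starting point for the advanced task, one of the simplest ideas is to translate the new ligand so that its centroid coincides with that of a ligand already known to sit in the binding site. The sketch below shows this for made-up toy coordinates; the function names and coordinates are assumptions for illustration, and the approach deliberately ignores orientation and clashes.

```python
def centroid(coords):
    """Arithmetic mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))

def translate_onto(mobile, reference):
    """Shift 'mobile' so that its centroid coincides with the centroid of 'reference'."""
    c_mobile, c_reference = centroid(mobile), centroid(reference)
    shift = tuple(r - m for r, m in zip(c_reference, c_mobile))
    return [tuple(x + s for x, s in zip(atom, shift)) for atom in mobile]

# Toy coordinates (made up): a reference ligand in the pocket and a misplaced new ligand
reference_ligand = [(1.0, 1.0, 1.0), (1.2, 1.0, 1.0), (1.1, 1.2, 1.0)]
new_ligand = [(5.0, 5.0, 5.0), (5.2, 5.0, 5.0), (5.1, 5.1, 5.2), (5.3, 5.1, 5.0)]

moved_ligand = translate_onto(new_ligand, reference_ligand)
print("reference centroid:", centroid(reference_ligand))
print("moved centroid:    ", centroid(moved_ligand))
```

A translation alone does not guarantee a sensible pose, so the result should still be inspected visually.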

## Running MD of Lysozyme with benzene bound
We can now use the generated complex to run standard MD simulations. This could be done with different simulation software; here we will be using Sire's SOMD. In the directory `Task03/simulation` an input file has already been prepared. SOMD takes standard Amber files as input. Have a look at what has been provided.

In [None]:
cd ../Task03

In [None]:
pwd

We will take a closer look at the config file. `solvated.rst7` and `solvated.parm7` are the coordinate file and topology file respectively. The complex was prepared with an additional NVT and NPT equilibration, which we skipped during Task02 to avoid very long waiting times for the molecular structures to be generated. Preparing these kinds of files on a cluster or a powerful computer should only take a few minutes though. For convenience, please use the files we provided, which are ready for a production simulation.

In [None]:
!cat sim.cfg

The above runs 200 fs of dynamics for the protein-benzene complex in the NPT ensemble. Most directives in the config file should be self-explanatory. Setting the buffered coordinates frequency to 0 means that every single frame is saved, which is fine for a simulation of 100 frames. The executable for running plain MD simulations is `somd` and comes with Sire. Usually it is advisable to set the platform to `CUDA` or `OpenCL`, as `somd` uses `OpenMM` to accelerate integration on a GPU. This cloud server does not have any GPUs available, so we must fall back to the CPU platform.
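
To see how the quoted simulation length follows from the config settings, the arithmetic can be sketched as below. The three values are assumptions chosen to be consistent with the numbers mentioned in the text; they are not read from `sim.cfg`, so check them against the output of the `cat` cell.

```python
# Assumed settings (hypothetical values, compare with your sim.cfg)
nmoves = 50        # MD steps per cycle (assumed)
ncycles = 2        # number of cycles (assumed)
timestep_fs = 2.0  # integration timestep in femtoseconds (assumed)

total_steps = nmoves * ncycles
total_time_fs = total_steps * timestep_fs

print(f"total MD steps: {total_steps}")
print(f"simulated time: {total_time_fs} fs")
```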

In [None]:
!somd -C sim.cfg

As for the simulation, we will run 50 steps of molecular dynamics. 

In [None]:
protein = md.load('traj000000001.dcd', top='solvated.pdb')
view = NGLWidget()
view.add_trajectory(protein)
view

In [None]:
print(protein)

**Advanced Task: Can you compute and plot the RMSD of the benzene ligand over the saved coordinates?**

In [None]:
# Insert code for the advanced task here
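
As a hint for the advanced task, the RMSD between a frame and a reference is the square root of the mean squared atomic displacement. The sketch below implements this for toy coordinates without any superposition; the function name and the coordinates are illustrative assumptions.

```python
import math

def rmsd(frame, reference):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples.

    No superposition is performed, so overall translation and rotation contribute.
    """
    if len(frame) != len(reference):
        raise ValueError("frame and reference must contain the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference))
    return math.sqrt(squared / len(reference))

# Toy two-atom "trajectory" (made up): the second frame is shifted by 0.1 along z
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [reference, [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]]

values = [rmsd(frame, reference) for frame in frames]
print(values)
```

For the real trajectory, MDTraj provides `md.rmsd`, which superposes each frame onto the reference and accepts an `atom_indices` argument to restrict the calculation to the ligand atoms; whether superposition is desirable depends on what you want the RMSD to measure.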



## Setting up ligand perturbations with a protein
If you want to compute relative binding free energies of a set of ligands to a protein using an alchemical approach, you will need to generate a perturbation map. In Task04 you are given a set of similar ligands and need to define a perturbation map for them. Think of what was discussed in the lecture in terms of perturbation maps. You want:
- many cycles, to make sure you can calculate robustness
- easy perturbations, avoid breaking large rings, if needed add intermediates
- always consider forward and backward perturbations. 
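
The criteria above can be checked programmatically before any simulation is run. The sketch below uses three hypothetical ligand labels (`ligA`, `ligB`, `ligC`), expands each edge into a forward and a backward perturbation, and counts the independent cycles in the map. Joining several pairs with commas on one `morph_pairs` line is an assumption about the input syntax, so verify it against the FESetup documentation.

```python
# Hypothetical perturbation map: labels and edges are chosen for illustration only
edges = [("ligA", "ligB"), ("ligB", "ligC"), ("ligC", "ligA")]

def both_directions(edges):
    """Expand every edge into a forward and a backward perturbation."""
    pairs = []
    for a, b in edges:
        pairs.append((a, b))
        pairs.append((b, a))
    return pairs

def count_independent_cycles(edges):
    """Cycle count of the undirected map: edges - nodes + connected components."""
    nodes = sorted({name for edge in edges for name in edge})
    parent = {name: name for name in nodes}

    def find(name):
        # Follow parent links to the representative of this component
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in edges:
        parent[find(a)] = find(b)
    components = len({find(name) for name in nodes})
    return len(edges) - len(nodes) + components

directed = both_directions(edges)
cycles = count_independent_cycles(edges)
morph_line = "morph_pairs = " + ", ".join(f"{a} > {b}" for a, b in directed)

print(morph_line)
print("independent cycles:", cycles)
```

A map with zero independent cycles is a tree, which offers no cycle closure to test the consistency of the computed free energies.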

In [None]:
cd ../Task04

In [None]:
ls images/

We have the following ligands:
- indene
- benzene
- benzofuran
- indole
- o-xylene
- p-xylene

Have a look at their structures:
![foo](images/structures.png)

Tasks:
1. Can you change `molsetup.in` so that all molecules are set up with FESetup and a complex is created with the 181L structure, without running a proper equilibration, to avoid long execution times?
2. Can you suggest a set of perturbations and add them to `morph.in` to generate a sensible perturbation map for assessing relative binding free energies of the six molecules above?
3. Make sure you run FESetup on both `molsetup.in` and `morph.in`, then look at the generated `_perturbations` directory, which can be used for running somd alchemical free energy calculations.

In [None]:
# adjust molsetup.in before running this cell
!FESetup molsetup.in

In [None]:
# adjust morph.in before running this cell
!FESetup morph.in

In [None]:
# Add code to visualise some of your generated output. 




## Advanced topics: explicit mapping 
Sometimes the proposed atom mapping for the morphing is not quite what you want. For this purpose FESetup has a built-in feature that lets you map the atoms explicitly. This may become important for chiral or symmetric molecules, or where you think a different mapping makes more sense than the one proposed by the automatic algorithm. Think of this for example:
![foo](images/mapping.png)
![foo](images/mapping2.png)
Let's look at Task05 and see how we can achieve an explicit atom mapping.

In [None]:
cd ../Task05

You can explicitly state the mapping in the `setup.in` file in this way:   
```
[ligand]
basedir = ligands
morph_pairs = benzene > benzofuran /1=3/2=2  # indices start from 1
```

Or you can use a `.map` file that sits in your ligands base directory to do the mapping. 

The content of the equivalent map file would look like this:   
```
# example mapping file benzene~benzofuran.map in the basedir ligands/
# explicitly map the following atom indexes onto each other
1 3 # this mapping and...
2 2 # ...this one will fix the orientation of benzofuran in space

```
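
To make the relationship between the two notations concrete, the following sketch parses the map-file text shown above and rebuilds the equivalent inline form. The helper functions `parse_map` and `to_inline` are hypothetical and not part of FESetup.

```python
def parse_map(text):
    """Parse map-file text into a dict {index in first ligand: index in second ligand}."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        first, second = line.split()
        mapping[int(first)] = int(second)
    return mapping

def to_inline(mapping):
    """Render a mapping in the inline '/a=b' notation used on the morph_pairs line."""
    return "".join(f"/{a}={b}" for a, b in mapping.items())

MAP_TEXT = """\
# explicitly map the following atom indexes onto each other
1 3 # this mapping and...
2 2 # ...this one will fix the orientation of benzofuran in space
"""

mapping = parse_map(MAP_TEXT)
print(mapping)
print("morph_pairs = benzene > benzofuran " + to_inline(mapping))
```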

**Tasks:**
1. Run FESetup using an explicit mapping of 1=3 and 2=2 in the `setup.in` file.
2. Run FESetup using a map file to do the same kind of mapping, this time attaching to C atoms 3 and 4 of benzene.

Please modify the existing `setup.in` and `ligands/benzene~benzofurane.map` files to complete the task!

In [None]:
# The changes to the files have to be made in the terminal, but feel free to add the code to execute FESetup here




## A little quiz for the end
Please have a quick look at this quiz to see what you have learned from this notebook:
https://goo.gl/forms/hSsPTYxnWvKjWj2B3