# Using ProteinSwap to Investigate Mutation

ProteinSwap is closely related to LigandSwap. Whereas LigandSwap compares two ligands binding to the same protein, ProteinSwap swaps a single ligand between two different proteins. The resulting free energy indicates which of the two proteins binds the ligand more strongly. This is useful for investigating the effect of mutation. For example, you could swap a ligand between the wildtype and mutant protein. If the proteinswap free energy is negative, the mutant binds the ligand more strongly. If it is positive, the mutant binds the ligand more weakly, and is therefore potentially resistant to that ligand.
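As a minimal sketch of the sign convention described above, the hypothetical helper below turns a proteinswap free energy into a statement about the mutant. It assumes the value is the free energy of moving the ligand from the reference (wildtype) protein to the perturbed (mutant) protein, i.e. ΔG_bind(mutant) − ΔG_bind(wildtype), and it treats any value smaller in magnitude than its estimated error as inconclusive. The `error` argument is an illustrative addition, not something proteinswap itself takes.

```python
def interpret_proteinswap(delta_g, error=0.0):
    """Interpret the sign of a proteinswap free energy (hypothetical helper).

    delta_g: free energy of swapping the ligand from the reference (wildtype)
             protein to the perturbed (mutant) protein.
    error:   assumed statistical uncertainty on delta_g, in the same units.
    """
    # A difference no larger than the error bar does not support either conclusion.
    if abs(delta_g) <= error:
        return "no significant difference"
    # Negative: binding to the mutant is more favourable than to the wildtype.
    if delta_g < 0:
        return "mutant binds more strongly"
    # Positive: binding to the mutant is less favourable, a possible sign of resistance.
    return "mutant binds more weakly (potential resistance)"
```

For example, `interpret_proteinswap(1.8, error=0.4)` reports weaker binding to the mutant, while `interpret_proteinswap(0.3, error=0.4)` reports no significant difference.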

Like ligandswap, proteinswap is distributed as part of Sire. The program is found in the `$SIRE/bin` directory, assuming that `$SIRE` has been set to the directory in which Sire is installed. This notebook relies on the proteinswap that comes with version 2018.1 or later of Sire.
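Since everything below depends on `$SIRE` pointing at your Sire installation, the sketch below checks, from Python, that the executable is where this notebook expects it. `find_proteinswap` is a hypothetical helper: it only looks for an executable file at `$SIRE/bin/proteinswap` and does not verify that the installed Sire is version 2018.1 or later.

```python
import os


def find_proteinswap(sire_dir=None):
    """Return the path to $SIRE/bin/proteinswap, or raise if it is missing (hypothetical helper)."""
    # Fall back to the $SIRE environment variable when no directory is given.
    sire_dir = sire_dir if sire_dir is not None else os.environ.get("SIRE")
    if not sire_dir:
        raise RuntimeError("$SIRE is not set; point it at your Sire installation")
    exe = os.path.join(sire_dir, "bin", "proteinswap")
    # The file must exist and be executable by the current user.
    if not (os.path.isfile(exe) and os.access(exe, os.X_OK)):
        raise FileNotFoundError(f"no executable proteinswap at {exe}")
    return exe
```

Calling `find_proteinswap()` with no arguments uses `$SIRE` and returns the full path if everything is in place.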

You use proteinswap in a similar way to ligandswap. Like ligandswap, proteinswap comes with detailed help via `--help` and `--help-config`.

In [1]:
! $SIRE/bin/proteinswap --help

Starting /opt/conda/bin/proteinswap: number of threads equals 2

usage: proteinswap [-h] [--description] [-H] [--author] [--version]
                   [-l [LIGAND]] [-t0 [TOPOLOGY_FILE0]] [-t1 [TOPOLOGY_FILE1]]
                   [-c0 [COORDINATE_FILE0]] [-c1 [COORDINATE_FILE1]]
                   [-C [CONFIG]]
                   [--lambda_values LAMBDA_VALUES [LAMBDA_VALUES ...]]
                   [-n [NUM_ITERATIONS]]

Calculate relative binding free energies using proteinswap

optional arguments:
  -h, --help            show this help message and exit
  --description         Print a complete description of this program.
  -H, --help-config     Get additional help regarding all of the parameters
                        (and their default values) that can be set in the
                        optionally-supplied CONFIG file
  --author              Get information about the authors of this script.
  --version             Get version information about this script.
  -l [LIGAND], --lig

## Understanding the input

Proteinswap calculates the relative binding free energy of a single ligand to two different proteins (the reference and perturbed proteins). As input, it needs:

* An amber-format topology file and coordinate file for the ligand bound to the reference protein, in a cubic periodic box of water.
* An amber-format topology file and coordinate file containing the same ligand bound to the perturbed protein, in a cubic periodic box of water.
* Optionally, a configuration file that overrides the default parameters for the proteinswap calculation.
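Both systems must sit in a cubic periodic box. As a rough pre-flight check, the sketch below reads the box from the final line of an Amber restart file and tests whether the three lengths are equal and all three angles are 90°. It assumes a periodic ASCII (not NetCDF) restart whose last line holds the six box values in fixed-width 12-character fields; on a non-periodic file the last line is coordinates and the result is meaningless. `read_box` and `is_cubic` are hypothetical helpers.

```python
def read_box(rst7_path):
    """Read (a, b, c, alpha, beta, gamma) from the last line of an ASCII Amber restart.

    Assumes a periodic restart whose final line holds six fixed-width (12-character) fields.
    """
    with open(rst7_path) as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    last = lines[-1]
    # Slice by column rather than splitting on whitespace, since wide values can touch.
    return tuple(float(last[i:i + 12]) for i in range(0, 72, 12))


def is_cubic(box, tol=1e-3):
    """True if the three box lengths match and all angles are 90 degrees (within tol)."""
    a, b, c, alpha, beta, gamma = box
    lengths_equal = abs(a - b) <= tol and abs(a - c) <= tol
    right_angles = all(abs(angle - 90.0) <= tol for angle in (alpha, beta, gamma))
    return lengths_equal and right_angles
```

For example, `is_cubic(read_box("reference.rst7"))` would be run on each of the two coordinate files before starting a calculation.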

The proteinswap directory that comes with this notebook contains all of the files that are needed to calculate the relative free energy of the ligand oseltamivir binding to the wildtype and R292K mutant of the protein neuraminidase.

The files in this directory that are needed for proteinswap are:

* An amber-format topology (OWT.top) and coordinate file (OWT.rst7) of oseltamivir (called OSE) bound to wildtype neuraminidase

* An amber-format topology (OR2K.top) and coordinate file (OR2K.rst7) of oseltamivir (also called OSE) bound to the R292K mutant of neuraminidase

As before, these input files are compressed to save disk space. Unpack them now and take a look...

In [2]:
! bunzip2 -k *.rst7.bz2 *.top.bz2; ls

01_running_proteinswap.ipynb   images	      OWT.rst7.bz2
02_understanding_output.ipynb  OR2K.rst7      OWT.top
03_analysis.ipynb	       OR2K.rst7.bz2  OWT.top.bz2
04_interactive_analysis.ipynb  OR2K.top       README.md
05_components.ipynb	       OR2K.top.bz2
example_output.tar.bz2	       OWT.rst7

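Before starting a calculation that can run for days, it is worth confirming that all four unpacked input files are present. A minimal sketch in Python (the notebook's kernel language); `missing_inputs` is a hypothetical helper, not part of Sire.

```python
from pathlib import Path

# Input files used by the proteinswap example in this directory.
REQUIRED_INPUTS = ["OWT.top", "OWT.rst7", "OR2K.top", "OR2K.rst7"]


def missing_inputs(directory=".", required=REQUIRED_INPUTS):
    """Return the names of required input files that are absent from `directory`."""
    folder = Path(directory)
    return [name for name in required if not (folder / name).is_file()]
```

Running `missing_inputs()` in this directory should return an empty list once the `.bz2` files have been unpacked.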

## Running a ProteinSwap Calculation

The general way to run a proteinswap calculation is:

```
$SIRE/bin/proteinswap -t0 reference.top -c0 reference.crd -t1 perturbed.top -c1 perturbed.crd -l NAME -C config
```

where

* reference.top is the name of the topology file containing the reference protein bound to the ligand in a periodic cubic box of water
* reference.crd is the corresponding coordinate file to reference.top
* perturbed.top is the name of the topology file containing the perturbed protein bound to the ligand in a periodic cubic box of water
* perturbed.crd is the corresponding coordinate file to perturbed.top
* NAME is the residue name of one of the residues in the ligand. Note that only one molecule in reference.top and perturbed.top should have this name.
* config is the name of the (optional) config file to control proteinswap.
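Because only one molecule in each topology should carry the residue name passed to `-l`, it can help to check the name before launching. The sketch below counts matching entries in the `RESIDUE_LABEL` section of an Amber topology. It assumes the `%FLAG`/`%FORMAT(20a4)` prmtop layout, and it counts residues rather than molecules, so a count above one means either that several molecules share the name or that the ligand itself contains several residues with that name. `count_residue_label` is a hypothetical helper.

```python
def count_residue_label(prmtop_path, name):
    """Count residues called `name` in the RESIDUE_LABEL section of an Amber topology."""
    labels = []
    in_section = False
    with open(prmtop_path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("%FLAG"):
                # Only collect labels while inside the RESIDUE_LABEL section.
                in_section = line.split()[1:2] == ["RESIDUE_LABEL"]
                continue
            if line.startswith(("%FORMAT", "%COMMENT", "%VERSION")):
                continue
            if in_section:
                # Residue labels are stored as fixed-width 4-character fields.
                labels.extend(line[i:i + 4].strip() for i in range(0, len(line), 4))
    return labels.count(name)
```

For this example, `count_residue_label("OWT.top", "OSE")` and `count_residue_label("OR2K.top", "OSE")` would each be expected to return 1 if OSE is a single-residue ligand that appears once.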

In our case, we could run a proteinswap calculation using the following command:

In [None]:
! $SIRE/bin/proteinswap -t0 OWT.top -c0 OWT.rst7 -t1 OR2K.top -c1 OR2K.rst7 -l OSE

## WARNING

The above command will take *a really long time (days)* and uses a lot of memory (>4 GB). If you are running this notebook in the cloud, the proteinswap calculation will likely be killed. For the purposes of this workshop, stop the calculation yourself by clicking "Kernel | Interrupt" in the menu at the top.
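If you do want to run the full calculation on your own machine, one option is to start it as a background process so that the notebook stays responsive. The sketch below assumes `$SIRE` is set and uses only options listed in the usage line printed by `--help` (`-t0`, `-c0`, `-t1`, `-c1`, `-l` and, optionally, `-C`). The helper names and the log-file name `proteinswap.log` are hypothetical, and launching this way does not reduce the runtime or memory use.

```python
import os
import subprocess


def build_proteinswap_command(ref_top, ref_crd, pert_top, pert_crd, ligand,
                              config=None, sire_dir=None):
    """Assemble the proteinswap command line as a list of arguments (hypothetical helper)."""
    if sire_dir is None:
        sire_dir = os.environ["SIRE"]  # raises KeyError if $SIRE is not set
    cmd = [os.path.join(sire_dir, "bin", "proteinswap"),
           "-t0", ref_top, "-c0", ref_crd,
           "-t1", pert_top, "-c1", pert_crd,
           "-l", ligand]
    if config is not None:
        cmd += ["-C", config]  # optional config file overriding the defaults
    return cmd


def launch_in_background(cmd, log_path="proteinswap.log"):
    """Start `cmd` without blocking, sending stdout and stderr to `log_path`."""
    with open(log_path, "w") as log:
        # The child process keeps its own handle to the log after this file is closed here.
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
```

For example, `proc = launch_in_background(build_proteinswap_command("OWT.top", "OWT.rst7", "OR2K.top", "OR2K.rst7", "OSE"))` starts the calculation, `proc.poll()` returns `None` while it is still running, and `proc.terminate()` stops it.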