# Input model generation 

## Generation of base cWza structure

**PyMOL commands**

* Get raw PDB coordinates for the D4 domain from the Wza crystal structure

Extract the coordinates of the Wza D4 domain from PDB entry `2j58` by selecting residues 345 to 376 with the following commands in the PyMOL console:

```python
fetch 2j58
select MyProtein, resi 345-376
save wza-d4_raw.pdb, MyProtein
```
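PyMOL's `resi 345-376` is a plain residue-number filter, so the same step can be sketched outside PyMOL on the raw PDB text. This is a minimal stdlib sketch, assuming the fixed-column PDB format (residue number in columns 23-26); `extract_residue_range` is a hypothetical helper, not part of this workflow, and unlike `prepare_Wza.py` it does not rename MSE residues.

```python
def extract_residue_range(pdb_lines, first, last):
    """Keep ATOM/HETATM records whose residue number lies in [first, last].

    Roughly mirrors `select MyProtein, resi 345-376`: every chain is kept,
    and the residue number is read from fixed PDB columns 23-26.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resi = int(line[22:26])  # residue sequence number
            if first <= resi <= last:
                kept.append(line)
    return kept
```

For example, `extract_residue_range(open("2j58.pdb").read().splitlines(), 345, 376)` on a locally downloaded copy of the entry.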

Generate the structure of the D4 (transmembrane) domain of Wza

IMPORTANT NOTE 1: This first structure is built with ALA`378 instead of PRO, because the proline ring fixes the backbone phi angle. Using alanine allows the dihedral angles of the missing residues to be set to an alpha-helical conformation.

```bash
pymol prepare_Wza.py -- wza-d4_raw.pdb wza-d4_No-PRO378.pdb
```

IMPORTANT NOTE 2: In chain G, a persistent clash remains between the side chains of TYR-373 and TRP-377.
The only solution to remove this clash is to apply PyMOL's 'Sculpting' function manually, with the TRP side chain selected.
Afterwards, save the structure to a PDB file, re-open it in PyMOL or VMD, and check that the clash did not produce
spurious bonds between TYR and TRP side-chain atoms.
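The check in note 2 can be scripted. PyMOL and VMD typically infer bonds from interatomic distances when a PDB file has no CONECT records, so an unusually short TYR/TRP side-chain contact is what appears as a spurious bond. The sketch below is a hypothetical stdlib helper that reports the closest side-chain heavy-atom pair between two residues of one chain; any cutoff you compare the result against (for example 2.0 Å) is an assumption, not a value from this workflow.

```python
import math

BACKBONE = {"N", "CA", "C", "O", "OXT"}

def sidechain_atoms(pdb_lines, chain, resi):
    """Return (name, (x, y, z)) for the side-chain heavy atoms of one residue."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[21] != chain or int(line[22:26]) != resi:
            continue
        name = line[12:16].strip()
        # Element column (77-78) if present, else first non-digit of the atom name
        element = line[76:78].strip() or name.lstrip("0123456789")[:1]
        if name in BACKBONE or element == "H":
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((name, xyz))
    return atoms

def closest_contact(pdb_lines, chain, resi_a, resi_b):
    """Smallest side-chain/side-chain heavy-atom distance as (distance, atom_a, atom_b)."""
    best = (math.inf, None, None)
    for name_a, xyz_a in sidechain_atoms(pdb_lines, chain, resi_a):
        for name_b, xyz_b in sidechain_atoms(pdb_lines, chain, resi_b):
            d = math.dist(xyz_a, xyz_b)
            if d < best[0]:
                best = (d, name_a, name_b)
    return best
```

For the case above this would be called as `closest_contact(lines, "G", 373, 377)` on the saved PDB.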


```bash
# Mutate ALA`378 in the PDB above back to PRO
pymol -qc mutate_Wza.py -- wza-d4_No-PRO378.pdb 'ALA`378' 'PRO' wza-d4.pdb
```

Generate the PDB of the consensus Wza-D4 structure

Point mutations used to match the consensus sequence:

 * `SER-362 to THR(T)`
 * `MET-367 to LEU(L)`
 * `ARG-376 to THR(T)`

Generate the subsequent PDB files carrying these mutations by running the PyMOL script `mutate_Wza.py` in a terminal:

```bash
pymol -qc mutate_Wza.py -- wza-d4.pdb 'SER`362' 'THR' wza-d4_S362T.pdb
pymol -qc mutate_Wza.py -- wza-d4_S362T.pdb 'MET`367' 'LEU' wza-d4_S362T_M367L.pdb
pymol -qc mutate_Wza.py -- wza-d4_S362T_M367L.pdb 'ARG`376' 'THR' cwza.pdb
```
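Each call reads the previous call's output, so the file names form a chain. The sketch below is a hypothetical helper that builds the same sequence of `pymol -qc` argument lists, naming intermediates `<stem>_S362T.pdb`, `<stem>_S362T_M367L.pdb`, and so on, as above. It only builds the commands; running them, for instance with `subprocess.run(argv, check=True)`, is left as an option.

```python
import os
import shlex

# Three-letter to one-letter codes, used only to name intermediate files
THREE_TO_ONE = {"ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
                "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
                "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
                "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V"}

def mutation_commands(start_pdb, mutations, final_pdb, script="mutate_Wza.py"):
    """Build one `pymol -qc` argument list per point mutation.

    mutations: list of (residue_key, new_resn) pairs, e.g. ("SER`362", "THR").
    Each command reads the previous output; the last one writes final_pdb.
    """
    stem = os.path.splitext(start_pdb)[0]
    commands, current, tags = [], start_pdb, []
    for i, (key, new_resn) in enumerate(mutations):
        old_resn, resi = key.split("`")
        tags.append(THREE_TO_ONE[old_resn] + resi + THREE_TO_ONE[new_resn])
        is_last = i == len(mutations) - 1
        out = final_pdb if is_last else f"{stem}_{'_'.join(tags)}.pdb"
        commands.append(["pymol", "-qc", script, "--", current, key, new_resn, out])
        current = out
    return commands

consensus = [("SER`362", "THR"), ("MET`367", "LEU"), ("ARG`376", "THR")]
for argv in mutation_commands("wza-d4.pdb", consensus, "cwza.pdb"):
    print(shlex.join(argv))  # shell-quoted, so the backtick keys stay intact
```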

**<span style="color:red">NOTE</span>**: Discuss the removal of the C-terminal residues, since their conformation is unclear. In addition, these residues may have drastic effects on MD simulations with an applied electric voltage.

**PyMOL script for preparation of PDB**: `prepare_Wza.py`
```python
#!/usr/bin/env python
from pymol import cmd
import sys

in_protein = sys.argv[1]
out_protein = sys.argv[2]
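# Residues appended after ARG-376: TRP-377, ALA-378 (placeholder for PRO, see note 1)
# and ASN-379, followed by an NHH amide cap (renamed to NH2 below)
to_build = ['trp', 'ala', 'asn']
to_build = to_build + ['nhh']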

cmd.load(in_protein, "MyProtein")

# Alter exceptional selections in PDB
cmd.select("MSEs", "resn MSE")
cmd.alter("MSEs", "type='ATOM' ")
cmd.alter("MSEs", "resn='MET' ")
cmd.delete("MSEs")

cmd.select("SEs", "name SE")
cmd.alter("SEs", "name='SG' ")
cmd.alter("SEs", "elem='S' ")
cmd.delete("SEs")

# Get original chain names
Chains = cmd.get_chains()

# Fix dihedral angles of residues near C-termini
for chn in Chains:

        # Fix Psi dihedral angle of LYS-375 to alpha-helix
        res0 = 375
        atom2 = "/MyProtein//"+chn+"/"+str(res0)+"/N"
        atom3 = "/MyProtein//"+chn+"/"+str(res0)+"/CA"
        atom4 = "/MyProtein//"+chn+"/"+str(res0)+"/C"
        atom5 = "/MyProtein//"+chn+"/"+str(res0+1)+"/N"

        cmd.set_dihedral(atom2,atom3,atom4,atom5, -45)

        # Set Phi dihedral angle of ARG-376 to alpha-helix
        res0 = 376
        atom1 = "/MyProtein//"+chn+"/"+str(res0-1)+"/C"
        atom2 = "/MyProtein//"+chn+"/"+str(res0)+"/N"
        atom3 = "/MyProtein//"+chn+"/"+str(res0)+"/CA"
        atom4 = "/MyProtein//"+chn+"/"+str(res0)+"/C"

        cmd.set_dihedral(atom1,atom2,atom3,atom4, -60)

        # Add missing residues to the C-terminus, after ARG-376
        # NOTE: C-termini are amidated by default. Setting Psi of the last residue
        # requires the N atom of a following residue, which is why the peptides are capped here

        cmd.select("MyAA", "/MyProtein//"+chn+"/ARG`376/C")
        cmd.edit("MyAA")

        for aa in to_build:
                cmd.editor.attach_amino_acid("pk1", aa)
        cmd.unpick()

        # Set dihedral angles of ARG-376 and the new residues (377-379) to alpha-helix
        for res0 in range(376,380):
                atom1 = "/MyProtein//"+chn+"/"+str(res0-1)+"/C"
                atom2 = "/MyProtein//"+chn+"/"+str(res0)+"/N"
                atom3 = "/MyProtein//"+chn+"/"+str(res0)+"/CA"
                atom4 = "/MyProtein//"+chn+"/"+str(res0)+"/C"
                atom5 = "/MyProtein//"+chn+"/"+str(res0+1)+"/N"

                # Set (Phi,Psi) dihedral angles per residue, respectively 
                cmd.set_dihedral(atom1,atom2,atom3,atom4, -60)
                cmd.set_dihedral(atom2,atom3,atom4,atom5, -45)

        # Acetylate N-termini on ALA-345 
        cmd.select("MyAA", "/MyProtein//"+chn+"/ALA`345/N")
        cmd.edit("MyAA")
        cmd.editor.attach_amino_acid("pk1",'ace' )
        cmd.unpick()

# Rename NHH to NH2 (GROMACS format)  
cmd.select("NH2s", "resn NHH")
cmd.alter("NH2s", "resn='NH2'")
cmd.delete("NH2s")

# Remove steric clashes between sidechains

## Select only the side chains of residues 370 to 379 (up to ASN-379)
cmd.select("sidechains", "resi 370:379 and ! bb.")

## Protect rest of the protein from modifications by sculpting
cmd.protect('(not sidechains)')

## Carry out Sculpting for 7000 cycles
cmd.sculpt_activate('MyProtein')
cmd.sculpt_iterate('MyProtein', cycles=7000)
cmd.sculpt_deactivate('MyProtein')
cmd.deprotect()

# Save sculpted structure into output file
cmd.save(out_protein, "MyProtein")
```
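After `prepare_Wza.py` has run, the (phi, psi) values it sets (-60°, -45°) can be verified. Inside PyMOL, `cmd.get_dihedral` with the same four atom selections returns the angle. Outside PyMOL, a minimal stdlib implementation of the signed dihedral (standard IUPAC sign convention) could look like this; the function name is illustrative.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For phi, the four points are C(i-1), N(i), CA(i), C(i); for psi, N(i), CA(i), C(i), N(i+1), matching the atom order used in the script.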

## Mutate cWza crystal structure (PyMOL)

`/home/ba13026/research/bg/Wza_Modeling/L-structures/rosetta/bg_test/cwza_NoWPN/1-refine_structure/1-1_preparation/input`

```bash
pymol -qc mutate_Wza.py -- cwza_NoWPN.pdb 'TYR`29' 'ALA' cwzay373a_NoWPN.pdb
```

> *Input file*: `cwza_NoWPN.pdb`

> *Output file*: `cwzay373a_NoWPN.pdb`

> *Residue to mutate*: Tyrosine 29 (TYR-373 in the original Wza numbering)

> *Mutant residue*: Alanine

**PyMOL script for mutation of Wza-D4 crystal structure**: `mutate_Wza.py`

```python
#!/usr/bin/env python 

from pymol import cmd
import sys

in_protein   = sys.argv[1] # PDB file of protein
to_mutate = sys.argv[2] # Residue to mutate in chain
mutate_to = sys.argv[3] # Mutant residue
out_protein = sys.argv[4] # PDB file output

# Load PDB of protein to mutate
cmd.load(in_protein, 'MyProtein')
# Extract list of chain names
Chains = cmd.get_chains('MyProtein')

# Call Mutagenesis function of Wizard
cmd.wizard("mutagenesis")
cmd.refresh_wizard()
# Set name of residue to mutate to
cmd.get_wizard().set_mode(mutate_to)
cmd.get_wizard().set_hyd("none")

for chain in Chains:
        # Select residue to mutate in chain
        cmd.select("to_mutate","/MyProtein//"+chain+"/"+to_mutate)
        # Allow Wizard to identify selected residue
        cmd.get_wizard().do_select("to_mutate")
        # Generate mutation 
        cmd.get_wizard().apply()
        # Restart selection for next mutation
        cmd.select("to_mutate", 'none')
        
# Close Wizard
cmd.set_wizard()
cmd.delete("to_mutate")

# Sculpt only the side chains within one residue of the mutation site
res = int(to_mutate.split("`")[1])
around_res = str(res-1)+":"+str(res+1)
cmd.select("sidechains", "resi "+around_res+" and ! bb.")
## Protect rest of the protein from modifications by sculpting
cmd.protect('(not sidechains)')

## Carry out Sculpting for 5000 cycles
cmd.sculpt_activate('MyProtein')
cmd.sculpt_iterate('MyProtein', cycles=5000)
cmd.sculpt_deactivate('MyProtein')
cmd.deprotect()

# Save mutated protein
cmd.save(out_protein, "MyProtein")
```
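The sculpting window around the mutated residue depends only on the residue key, so it can be built and checked in isolation. This hypothetical helper produces the `resi N-1:N+1 and ! bb.` selection with an explicit space before `and`, and allows a wider window if needed.

```python
def sculpt_selection(residue_key, window=1):
    """PyMOL selection for side chains within +/- window residues of the mutation.

    residue_key uses the RESN`RESI form passed to mutate_Wza.py, e.g. "TYR`29".
    """
    resi = int(residue_key.split("`")[1])
    # The space before "and" matters: without it the range would end in "30and"
    return f"resi {resi - window}:{resi + window} and ! bb."
```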

## Renumber to Rosetta format

```bash
python2 $ROSETTA38_HOME/tools/protein_tools/scripts/clean_pdb.py input/cwzay373a_NoWPN.pdb ignorechain
```

> *Output file*: `cwzay373a_NoWPN_ignorechain.pdb`
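Judging from the span file in the next step (eight segments over 256 residues, i.e. 32 residues per chain), the `ignorechain` output numbers residues continuously across chains. The sketch below assumes exactly that behaviour and reproduces only the renumbering; it is a hypothetical illustration, not a reimplementation of `clean_pdb.py`, which may also filter or rename records.

```python
def renumber_sequential(pdb_lines):
    """Renumber residues 1..N in file order across all chains.

    A new residue starts whenever (chain, resSeq + insertion code) changes.
    Non-coordinate records (TER, END, ...) are passed through unchanged.
    """
    out, current, counter = [], None, 0
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], line[22:27])  # chain ID, residue number + insertion code
            if key != current:
                current, counter = key, counter + 1
            # Rewrite columns 23-27 (residue number, blank insertion code)
            line = f"{line[:22]}{counter:4d} {line[27:]}"
        out.append(line)
    return out
```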

## Make span file

Span file with one transmembrane segment per chain (8 segments over 256 residues):

```
Rosetta-generated spanfile from SpanningTopology object
8 256
antiparallel
n2c
        14      31
        46      63
        78      95
        110     127
        142     159
        174     191
        206     223
        238     255

```
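The span file repeats one transmembrane segment (positions 14-31 of each 32-residue chain) eight times, shifted by the chain length. A minimal sketch that generates the same layout, assuming identical chains numbered continuously; `make_spanfile` is hypothetical, and the first line is treated here as a free-text title.

```python
def make_spanfile(n_chains, chain_length, span_start, span_end,
                  title="Span file generated for identical chains"):
    """Return span-file text with one TM segment per chain.

    span_start/span_end are 1-based positions of the segment within one chain.
    """
    lines = [title,
             f"{n_chains} {n_chains * chain_length}",  # number of spans, total residues
             "antiparallel",
             "n2c"]
    for k in range(n_chains):
        offset = k * chain_length  # shift the segment into chain k
        lines.append(f"\t{span_start + offset}\t{span_end + offset}")
    return "\n".join(lines) + "\n"
```

`make_spanfile(8, 32, 14, 31)` reproduces the segment table above, from `14 31` to `238 255`.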

## Transform PDB coordinates to membrane coordinates and optimise the embedding

```bash
./mpi_mptransform.sh input/cwzay373a_NoWPN_ignorechain.pdb input/cwzay373a_NoWPN_ignorechain.span
```

> *Output file*: `cwzay373a_NoWPN_ignorechain_0001.pdb`

# Relaxation of crystal structure

## Use fast relax Rosetta app

`sbatch mpi-relax.slurm`

> *Input*: `cwzay373a_NoWPN_ignorechain_0001_clean.pdb`, `cwzay373a_NoWPN_ignorechain_0001_tweaked.span`

> *Output*: relaxed models stored in `output` folder and score file `.sc`

> *Files*: `membrane_relax.xml`, `relax_flags`

**Cluster submission script**:  `mpi-relax.slurm`

```bash
#!/bin/bash -login
#SBATCH -p cpu
#SBATCH --ntasks-per-node=16
#SBATCH -N 2
#SBATCH -t 1-12:30
#SBATCH -A S2.1
#SBATCH -o output/mpi-run_test.log
#SBATCH -e output/error_mpi-run_test.log

module add apps/rosetta/mpi/3.8

mpiexec $ROSETTA38_MPI/rosetta_scripts.mpi.linuxgccrelease -database $ROSETTA38_DB @relax_flags -mpi_tracer_to_file ./logs/run

```

**Rosetta relax app flags**:  `relax_flags`

```bash
-parser:protocol membrane_relax.xml
-in:file:s ./input/cwzay373a_NoWPN_ignorechain_0001_clean.pdb
-ignore_unrecognized_res true
-mp:scoring:hbond
-mp:setup:spanfiles ./input/cwzay373a_NoWPN_ignorechain_0001_tweaked.span
-mp:thickness 20
-mp:visualize:thickness 20
-nstruct 1000
-relax:fast
-relax:jump_move true
-out:path:pdb ./output
-out:file:scorefile relax_merged.sc
-out:path:score ./output
-packing:pack_missing_sidechains 0
```

**Rosetta script for relax app**:  `membrane_relax.xml`

```xml
<ROSETTASCRIPTS>
        <SCOREFXNS>
                <ScoreFunction name="mpframework_smooth_fa_2012" weights="mpframework_smooth_fa_2012.wts"/>
        </SCOREFXNS>
        <MOVERS>
                <AddMembraneMover name="add_memb"/>
                <MembranePositionFromTopologyMover name="init_pos"/>
                <FastRelax name="fast_relax" scorefxn="mpframework_smooth_fa_2012" repeats="8"/>
        </MOVERS>
        <PROTOCOLS>
                <Add mover="add_memb"/>
                <Add mover="init_pos"/>
                <Add mover="fast_relax"/>
        </PROTOCOLS>
        <OUTPUT scorefxn="mpframework_smooth_fa_2012"/>
</ROSETTASCRIPTS>
```
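In `membrane_relax.xml`, each mover run in `<PROTOCOLS>` is referenced by the name declared in `<MOVERS>`. A small stdlib check for that, offered as a hypothetical convenience before submitting a long job; it does not validate mover options, and any mover name Rosetta might predefine would be reported as undeclared.

```python
import xml.etree.ElementTree as ET

def undefined_movers(xml_text):
    """List mover names used in <PROTOCOLS> that are not declared in <MOVERS>."""
    root = ET.fromstring(xml_text)
    movers_section = root.find("MOVERS")
    declared = set()
    if movers_section is not None:
        declared = {m.get("name") for m in movers_section}
    protocols = root.find("PROTOCOLS")
    used = [] if protocols is None else [a.get("mover") for a in protocols.iter("Add")]
    # <Add filter="..."/> entries have no mover attribute and are skipped
    return [name for name in used if name is not None and name not in declared]
```

For example, `undefined_movers(open("membrane_relax.xml").read())` should return an empty list for the script above.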

Rationale:

## Model selection after Rosetta Fast Relax

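From the output score file, extract the `total_score` (second) and `description` (last) columns, skipping the first two lines (the `SEQUENCE:` line and the column header). Sort by total score in descending order and keep the last five lines, i.e. the five lowest-scoring models, with the lowest score printed last.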

```bash
awk 'NR>2{print $2,$NF}' output/relax_merged.sc | sort -nr | tail -5
```
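The same selection can be done by column name rather than position, which keeps working if the set of score terms changes. A minimal sketch, assuming the usual score-file layout (a `SEQUENCE:` line, then `SCORE:`-prefixed header and data rows, with repeated headers allowed); `lowest_scores` is hypothetical, and it returns the best model first, the reverse of the `sort -nr | tail` order.

```python
def lowest_scores(scorefile_text, n=5):
    """Return the n (total_score, description) pairs with the lowest total_score."""
    header, rows = None, []
    for line in scorefile_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SCORE:":
            continue  # skip the SEQUENCE: line and blank lines
        if header is None or fields[1] == "total_score":
            header = fields  # (re)read the column header
            continue
        record = dict(zip(header, fields))
        rows.append((float(record["total_score"]), record["description"]))
    rows.sort()
    return rows[:n]
```

For example, `lowest_scores(open("output/relax_merged.sc").read())`.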

Output:

```
-804.840 cwzay373a_NoWPN_ignorechain_0001_clean_0587
-804.847 cwzay373a_NoWPN_ignorechain_0001_clean_0096
-805.714 cwzay373a_NoWPN_ignorechain_0001_clean_0406
-806.347 cwzay373a_NoWPN_ignorechain_0001_clean_0366
-807.011 cwzay373a_NoWPN_ignorechain_0001_clean_0266
```

Clean the last (lowest-scoring) model, rename the cleaned file to `refined1.pdb`, and finally transform it back to membrane coordinates, optimising the embedding.

```bash

python2 $ROSETTA38_HOME/tools/protein_tools/scripts/clean_pdb.py output/cwzay373a_NoWPN_ignorechain_0001_clean_0266.pdb ignorechain

mv cwzay373a_NoWPN_ignorechain_0001_clean_0266_ignorechain.pdb refined1.pdb

./mpi_mptransform.sh refined1.pdb input/cwzay373a_NoWPN_ignorechain_0001_tweaked.span
```

Output: `refined1_0001.pdb`

# Symmetric docking

## Generate symmetry input files

```bash 
./make_symminputs.sh refined1_0001.pdb output 8
```

Output:
```
refined1_0001_symm.pdb
refined1_0001_INPUT.pdb
refined1_0001_model_AB8.pdb
refined1_0001.kin
output/refined1_0001.c8.symm
```

**Script to generate input files for the MPSymDock app**:  `make_symminputs.sh`

```bash
#!/bin/bash
protein=$1 # PDB file of refined model
outputf=$2 # Output folder
N=$3 # Symmetry order
fname=${protein##*/}  # strip leading directories
fname=${fname%.pdb}   # strip .pdb extension
$ROSETTA38_HOME/main/source/src/apps/public/symmetry/make_symmdef_file.pl -p "$protein" -a A -r 50 -i B:$N > "${outputf}/${fname}.c${N}.symm"
```
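In a C8 assembly, each subunit is related to its neighbour by a rotation of 360°/8 = 45° about the symmetry axis. The sketch illustrates this by generating the eight copies of one point, assuming for illustration that the axis is the z axis; `make_symmdef_file.pl` derives the actual axis from the input structure, so this is not what the Perl script computes.

```python
import math

def cyclic_copies(xyz, n):
    """Copies of a point under C_n symmetry about the z axis (steps of 360/n degrees)."""
    x, y, z = xyz
    copies = []
    for k in range(n):
        a = 2 * math.pi * k / n  # rotation angle of subunit k
        copies.append((x * math.cos(a) - y * math.sin(a),
                       x * math.sin(a) + y * math.cos(a),
                       z))  # the coordinate along the axis is unchanged
    return copies
```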

## Run symmetric docking: MPSymDock app 

`sbatch mpi-symdock1_AB-mu.slurm`

**Cluster submission script**:  `mpi-symdock1_AB-mu.slurm`

```bash
#!/bin/bash -login
#SBATCH -p cpu
#SBATCH --ntasks-per-node=16
#SBATCH -N 2
#SBATCH -t 2-12:30
#SBATCH -A S2.1
#SBATCH -o output/mpi-symdock1_AB-mu.log
#SBATCH -e output/error_mpi-symdock1_AB-mu.log

module add apps/rosetta/mpi/3.8

mpiexec $ROSETTA38_MPI/mp_symdock.mpi.linuxgccrelease -database $ROSETTA38_DB @flags_symdock1_AB-mu
```

**Flags**:  `flags_symdock1_AB-mu`

```bash
cat flags_symdock1_AB-mu
-in:file:s input/refined1_0001_INPUT.pdb
-in:file:native input/cwzay373a_NoWPN_ignorechain_0001.pdb
-ignore_unrecognized_res true
-mp:setup:spanfiles input/chainA.span
-mp:scoring:hbond
-nstruct 1000
-symmetry:symmetry_definition input/refined1_0001.c8.symm
-symmetry:initialize_rigid_body_dofs
-packing:pack_missing_sidechains 0 
-docking:dock_lowres_filter 5.0 10.0
-out:path:pdb ./output
-out:file:scorefile symdock1.sc
-out:path:score ./output
```