# Submodule 3 – Protein Modeling with COOT


## Overview
Once diffraction data have been processed into an interpretable map, model building can begin. Whether a preliminary structure was built with AutoSol from anomalous data or the phases came from molecular replacement with a search model, model building and refinement are the final steps toward a completed structure. For this section we will use a molecular graphics program designed for macromolecular model building called the Crystallographic Object-Oriented Toolkit, better known simply as Coot. It runs on Linux, Windows, and Mac platforms. Running Coot on Windows simply involves downloading and running a WinCoot.exe file. The easiest way to get Coot running on a Mac is to install the Collaborative Computational Project Number 4 (CCP4) suite of programs, which includes several other macromolecular crystallography programs in addition to COOT and integrates well with macOS. Coot can also be installed with the package manager Homebrew, but this is more complicated.

COOT has been available since 2004 and has largely replaced older model-building programs because of its ease of use and stability. Amino acids can be added, deleted, and repositioned to agree with the electron density maps calculated from the X-ray data; the maps are the guide for positioning the atoms in the structure. Each residue is visually inspected and corrected as needed to match the electron density map.

Once the model has been built, it is refined in Phenix. Refinement produces an improved structure and new electron density maps, and the cycle of rebuilding and refinement is repeated until no further improvements to the structure can be made. The final structure is then validated for deposition in the Protein Data Bank (PDB).

## Learning Objectives
- Learn how to use COOT to build models using electron density maps from X-ray diffraction data.
- Use Phenix to refine structures.
- Prepare a final model for deposition into the PDB.

## Prerequisites
- Phenix Account setup 
- Completion of submodules 1 & 2
- Background reading on COOT:
    + [COOT Documentation](https://phenix-online.org/documentation/coot.html)
    + [Emsley et al. (2004)](https://pubmed.ncbi.nlm.nih.gov/15572765/)
    + [Emsley et al. (2010)](https://pubmed.ncbi.nlm.nih.gov/20383002/)

------------
## Activity 1 - Model Building

### Step 1 - Pull data files.
First we will need to pull the Lysozyme_incomplete.pdb, Lysozyme_phases.mtz, and Lysozyme_fasta_sequence.txt files. These can be obtained by running the following code to copy them from a public S3 bucket.

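```python
# Copy every file under the Lysozyme prefix into the current directory.
# `aws s3 cp --recursive` needs both a source and a destination ("." here).
!aws s3 cp s3://submodule3/Lysozyme . --recursive
```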
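Before opening COOT, a quick check that the three files from Step 1 actually arrived can save some confusion. The sketch below is a hypothetical helper (not part of the course material) that assumes the files were copied into the current working directory; change the path if you copied them elsewhere.

```python
from pathlib import Path

# File names listed in Step 1; the directory they were copied to is an assumption
EXPECTED_FILES = ["Lysozyme_incomplete.pdb", "Lysozyme_phases.mtz", "Lysozyme_fasta_sequence.txt"]

def missing_files(directory, names=EXPECTED_FILES):
    """Return the expected file names that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in names if not (folder / name).is_file()]

missing = missing_files(".")
print("All files present" if not missing else f"Missing: {missing}")
```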

### Step 2 - Generate electron density maps. 
**Step 2.1** - Load the .pdb file into COOT by choosing: <br>
    `File -> Open Coordinates` 
+ Electron density maps are calculated using a .mtz file containing diffraction data. 
        
**Step 2.2** - Maps are automatically calculated by selecting: <br>
    `File -> Auto Open MTZ` 

**Step 2.3** - Observe the output. Two maps will appear: one shown in blue, and another shown in green and red. The blue map is called a 2FO-FC map, and the green (positive) and red (negative) densities belong to an FO-FC map, or difference map. The term FO refers to the observed structure-factor amplitudes, which in X-ray crystallography come from the diffraction data collected from the crystal. The term FC refers to the structure factors (amplitudes and phases) calculated from the model that has been built. This calculation is possible because any three-dimensional arrangement of atoms can be converted into amplitudes and phases with a Fourier transform, the same relationship that was used in the other direction to turn amplitudes and phases into maps when the structure was solved. Phases cannot be measured directly, so both maps combine the observed amplitudes with phases calculated from the current model, which at this point is the model that resulted from solving the structure.

The "2" in front of FO means that each map coefficient is twice the observed amplitude minus the calculated amplitude. This is equivalent to an FO map plus an FO-FC difference map, which reduces bias toward the current model. Ultimately, this map displays where electron density is found in the crystal, and it is the map used to build the atomic coordinates of amino acid residues and ligands such as small molecules, ions, and water.

An FO-FC map has less detail because it only shows where the observed and calculated data differ. If the observed and calculated data matched perfectly, this map would be completely empty. Green (positive) density marks features present in the data but missing from the model, and red (negative) density marks parts of the model that the data do not support. This makes the difference map useful for building in areas where the 2FO-FC density alone is hard to interpret. In general, we use the Auto Open MTZ feature to quickly calculate both maps, and the 2FO-FC map is the one used most for model building. <br>
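The map relationships described in Step 2.3 can be made concrete with a toy one-dimensional sketch in pure Python; this is an illustration only, not how COOT or Phenix compute maps. It builds an "observed" density from four Gaussian atoms and a model that is missing one of them, keeps only the observed amplitudes, borrows the phases from the model, and synthesizes both maps with an inverse discrete Fourier transform. The grid size, atom positions, and function names are illustrative assumptions, and real programs work in three dimensions with weighted coefficients (2mFo-DFc and mFo-DFc).

```python
import cmath
import math

N = 64  # grid points along a toy 1D "unit cell"

def density(centers, width=1.5):
    """Toy electron density: one Gaussian 'atom' per center on a periodic grid."""
    rho = []
    for n in range(N):
        total = 0.0
        for c in centers:
            d = min(abs(n - c), N - abs(n - c))  # distance with periodic wrap-around
            total += math.exp(-0.5 * (d / width) ** 2)
        rho.append(total)
    return rho

def dft(values):
    """Naive discrete Fourier transform: density -> complex structure factors."""
    return [sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(values))
            for k in range(N)]

def idft(coeffs):
    """Inverse DFT (Fourier synthesis): map coefficients -> real-valued map."""
    return [(sum(c * cmath.exp(2j * math.pi * k * n / N) for k, c in enumerate(coeffs)) / N).real
            for n in range(N)]

def fourier_maps(true_atoms, model_atoms):
    """Return (2Fo-Fc map, Fo-Fc map, true density) using model phases only."""
    rho_true = density(true_atoms)
    f_obs = dft(rho_true)               # the experiment only gives us |F_obs|
    f_calc = dft(density(model_atoms))  # the model gives amplitudes AND phases
    coef_2fofc, coef_fofc = [], []
    for fo, fc in zip(f_obs, f_calc):
        model_phase = cmath.exp(1j * cmath.phase(fc))
        coef_2fofc.append((2 * abs(fo) - abs(fc)) * model_phase)
        coef_fofc.append((abs(fo) - abs(fc)) * model_phase)
    return idft(coef_2fofc), idft(coef_fofc), rho_true

# The model is missing the atom at grid point 52
map_2fofc, map_fofc, rho = fourier_maps([10, 25, 40, 52], [10, 25, 40])
peak = max(range(N), key=lambda i: map_fofc[i])
print("Largest positive Fo-Fc value at grid point", peak)
```

If the model is complete, the Fo-Fc map is zero everywhere and the 2Fo-Fc map reproduces the true density. With one atom left out, you would expect a positive (green) difference feature near the missing atom, although phase errors from the incomplete model add some noise.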

**Step 2.4** - Once the model and maps have been loaded into COOT, press the `space bar` to go to the first residue at the N-terminus, which is a glutamate at position 7. <br>

**Step 2.5** - Move around the structure in COOT as follows:
+ Press the `space bar` to advance one amino acid residue forward.
+ Hold down `Shift` and press the `space bar` to move back one amino acid residue.
+ Hold down `Control` while moving the mouse to translate the structure in the X and Y directions.
+ Hold down `Control` and the `right mouse button` while moving the mouse to slab in and out. Slabbing moves the clipping planes, increasing or decreasing the thickness of the slice of the model being viewed.

*Note: We can see from the image below that this area has clear density (white arrows) with no model built in yet.*

<center><img src='images/submod_3/submod3_image1.png'
     align='middle'
     width='600'/> </center>
     
Before proceeding, note three particularly useful toolbar buttons that streamline building and viewing a model: the Display Manager (red arrow), Go To Atom (orange arrow), and Go To (Next) Ligand (yellow arrow).

### Step 3 - Navigate Display Manager. 
If we click on *Display Manager*, we see two sections: *Maps* and *Molecules*. In the example shown below, the 2FO-FC map is listed as *Map 1* and has the scroll function enabled. The scroll function lets the user raise or lower the map's contour level with the mouse scroll wheel; the contour level sets which density values are drawn and therefore how much signal versus noise is displayed. Clicking *Properties* shows settings that include the *unit cell* and *symmetry parameters*, and allows other features to be changed, such as *map transparency* and *skeletonization*.

<center><img src='images/submod_3/submod3_image2.png'
     align='middle'
     width='600'/> </center>
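Because maps differ in overall scale, the contour level is usually quoted as a multiple of the map's root-mean-square deviation (rmsd, often written σ), and COOT typically reports the level in these units as well as in absolute density. The sketch below shows the conversion under the assumption that the level is measured from the map mean (mean + n × rmsd); the toy map values and function names are made up for illustration.

```python
import statistics

def contour_threshold(map_values, n_rmsd):
    """Absolute density value for a contour drawn at `n_rmsd` times the map rmsd.

    Assumes the level is defined as mean + n * (population standard deviation).
    """
    mean = statistics.fmean(map_values)
    rmsd = statistics.pstdev(map_values)
    return mean + n_rmsd * rmsd

def fraction_above(map_values, level):
    """Fraction of grid points that fall inside the contoured surface."""
    return sum(v > level for v in map_values) / len(map_values)

# Hypothetical density values sampled on a few grid points
toy_map = [0.1, -0.3, 0.8, 1.9, 2.4, -0.5, 0.0, 0.6, 1.2, -1.1]
for n in (1.0, 1.5, 3.0):
    level = contour_threshold(toy_map, n)
    print(f"{n:.1f} rmsd -> level {level:.2f}, {fraction_above(toy_map, level):.0%} of points above")
```

Raising the level in rmsd units shows less of the map, which is why scrolling up cleans away noise but can also hide weak, genuine density.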

### Step 4 - Enable Skeletonization. 
Turning on the skeleton feature traces lines through the electron density that connect high-density points in the map. These show up as thin yellow lines (white arrows) in both built and unbuilt regions of the structure. The skeleton can be useful for building the model in areas where the electron density is poor, since it shows the general path along which residues and ligands need to be placed.

<center><img src='images/submod_3/submod3_image3.png'
     align='middle'
     width='600'/></center>

### Step 5 - Navigate *Go To Atom* Window.
The Go To Atom window displays the molecule, chain and atom name of every atom in the model. There is an expandable option under each chain that will list every residue and ligand in the structure. This model only has one chain – Chain A – but if multiple chains are present they would be listed here under Chains by Chain ID. There is a dropdown menu next to Molecule that can be used to select the appropriate structure if more than one has been loaded into COOT.

<center><img src='images/submod_3/submod3_image4.png'
     align='middle'
     width='400'/></center>
     
### Step 6 - Navigate *Go To (Next) Ligand*.
This feature does not open another window. It simply centers the view on the first ligand in the model. Clicking it repeatedly cycles through all ligands found in the *.pdb* structure file. This incomplete lysozyme structure has no ligands, but when the ligand is a substrate or inhibitor, this feature takes you straight to the binding site.

### Step 7 - Building into the electron density.
**Step 7.1** - To begin building into the electron density click: <br>
`Calculate -> Model/Fit/Refine` 
- Alternatively you can just press F5. 
- The window that appears has several features that are used to build the structure. 

**Step 7.2** - The structure has clear, unmodeled density for amino acid residues beyond the N-terminus. To build them in, choose: <br>
    `Add Terminal Residue` <br>

**Step 7.3** - Next click on the current `N-terminal residue`. <br>
- This automatically adds an alanine to the model and positions it in the electron density. Alanine is always used when extending the terminus, so if alanine is not the correct residue, it will have to be changed in the next step.

<center><img src='images/submod_3/submod3_image5.png'
     align='middle'
     width='400'/></center>
     
### Step 8 - Adding and validating amino acid residues.
The β-carbon of the newly added alanine points into a lobe of electron density that is larger than its -CH3 side chain can fill. This tells us the residue is not alanine, so it will have to be changed. It is a good idea to keep the sequence of the protein handy to check the correct amino acid while building. The FASTA sequence for this protein is:

<center><img src='images/submod_3/submod3_fasta.png'
     align='middle'
     width='600'/></center>
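Looking up residues by number is easy to get wrong when the numbering shifts (for example with tags, as described below). The sketch below is a hypothetical helper that reads single-record FASTA text and returns the three-letter code for a residue number. The fragment `KVFGRCEL` is used only for illustration; its residues match those named in this activity (Lys1, Phe3, Cys6, Glu7). The `first_resnum` argument is an assumption about how your model is numbered relative to the sequence.

```python
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

def read_fasta(text):
    """Return the sequence from single-record FASTA text (header lines start with '>')."""
    return "".join(line.strip() for line in text.splitlines()
                   if line and not line.startswith(">"))

def residue_at(sequence, resnum, first_resnum=1):
    """Three-letter code of residue `resnum`, given the number of the sequence's first letter."""
    index = resnum - first_resnum
    if not 0 <= index < len(sequence):
        raise IndexError(f"residue {resnum} is outside the sequence")
    return THREE_LETTER[sequence[index]]

sequence = read_fasta(">fragment\nKVFGRCEL\n")
# For the real file: read_fasta(Path("Lysozyme_fasta_sequence.txt").read_text())
print(residue_at(sequence, 7), residue_at(sequence, 6))
# Residues still to be built when walking toward the N-terminus from residue 6
print([residue_at(sequence, n) for n in range(6, 0, -1)])
```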

**Step 8.1** - From this sequence we can see that the correct residue is a cysteine. Click the following button to make another window appear with all the amino acids: <br>
`Mutate & Auto Fit` <br>

**Step 8.2** - Choose CYS (C) and `click on the newly added N-terminal Ala residue (often it requires clicking twice)`, and the residue will automatically be changed to cysteine. Since this map is very clear, the cysteine is placed in the density with the correct orientation. However, in areas where the density is poor, adding or mutating a residue may not position it correctly. We will see examples of this later; when that happens, the residue must be moved manually to the correct atomic positions using the electron density map as a guide. <br>

**Step 8.3** - After adding the cysteine residue, there is still some electron density next to the sulfur atom that is not accounted for. This is because this cysteine forms a disulfide bond (a cystine crosslink), and its partner cysteine has not been built into the structure yet. Once the other Cys residue has been added, both Cys residues will automatically be linked by a disulfide bond.<br>
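Whether two cysteines are linked comes down to the distance between their SG atoms: an S-S bond in a disulfide is about 2.04 Å long, while non-bonded sulfur atoms sit well over 3 Å apart. The sketch below is a hypothetical distance check, not the criterion COOT or Phenix actually use; the 0.2 Å tolerance is an illustrative assumption.

```python
import math

SS_BOND_LENGTH = 2.04  # approximate S-S distance in a disulfide bond, in angstroms
TOLERANCE = 0.2        # hypothetical cutoff; real programs apply their own criteria

def is_disulfide(sg1, sg2, ideal=SS_BOND_LENGTH, tol=TOLERANCE):
    """True if two cysteine SG atoms, given as (x, y, z) in Å, are at disulfide distance."""
    return abs(math.dist(sg1, sg2) - ideal) <= tol

# Hypothetical coordinates: one pair at bonding distance, one pair too far apart
print(is_disulfide((0.0, 0.0, 0.0), (1.2, 1.2, 1.1)))
print(is_disulfide((0.0, 0.0, 0.0), (3.5, 0.0, 0.0)))
```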

**Step 8.4** - <u>Follow this procedure for every remaining residue at the N-terminus until the map is no longer interpretable</u>. Sometimes it is possible to build the structure all the way to the first residue, but the N-terminus is often disordered, so some of these residues must be left out of the final structure. Occasionally, if the protein has an N-terminal tag such as a His-tag or a SUMO fusion, there will be residues before the first residue of the protein that can be built as well. Residue numbers continue to decrease by one as additional N-terminal residues are built, which can produce residue 0 and negatively numbered residues; this numbering keeps track of which residues belong to the protein sequence and which belong to the tag. Build as much of the structure as the density supports, since a more complete model improves the refinement statistics and maps.

<center><img src='images/submod_3/submod3_image6.png'
     align='middle'
     width='400'/></center>
     
**Step 8.5** - Everything should run smoothly until residue 3 is added. If the phenyl ring of Phe3 is not placed properly, it will have to be fixed with the Real Space Refine Zone function. Click on: <br>
`Real Space Refine Zone` in the `Model/Fit/Refine` window.<br>

**Step 8.6** - `Double click` on any atom in the *Phe* residue. COOT will automatically refine the positions of the *Phe* atoms into the density. <br>

**Step 8.7** - If further changes are needed, atoms can be moved to the correct place in the map by `clicking and dragging` each atom into position.<br>

**Step 8.8** - The `Real Space Refine Zone` feature can also be used for larger regions of the model that need adjustment. To do this, `click` on an atom in the *first* residue and then an atom in the *last* residue of the zone. The entire zone between and including those two residues will be refined. This is often necessary to adjust backbone atoms, since neighboring residues affect their φ and ψ angles.

<center><img src='images/submod_3/submod3_image7.png'
     align='middle'
     width='200'/></center>
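Real-space refinement balances two goals: putting atoms where the density is high and keeping bond lengths and angles close to ideal values. The toy one-dimensional sketch below (an illustration of the idea, not COOT's algorithm) moves a single atom bonded to a fixed neighbor at x = 0. The density peak at 2.0 Å, the 1.5 Å ideal bond, the restraint weight, and the step size are all hypothetical values chosen for clarity.

```python
import math

def density(x, peak=2.0, width=0.5):
    """Toy 1D electron density with a single peak at `peak` (Å)."""
    return math.exp(-0.5 * ((x - peak) / width) ** 2)

def target(x, anchor=0.0, ideal_bond=1.5, weight=1.0):
    """Lower is better: reward density at x, penalize deviation from the ideal bond length."""
    fit = -density(x)
    geometry = weight * ((x - anchor) - ideal_bond) ** 2
    return fit + geometry

def refine_position(x, step=0.01, n_steps=2000, h=1e-6):
    """Plain gradient descent on `target` using a central finite-difference gradient."""
    for _ in range(n_steps):
        grad = (target(x + h) - target(x - h)) / (2 * h)
        x -= step * grad
    return x

x_start = 1.0
x_refined = refine_position(x_start)
print(f"start {x_start:.2f} Å -> refined {x_refined:.2f} Å")
```

The refined position lands between the ideal bond length and the density peak; increasing the restraint weight pulls it toward ideal geometry, much as tighter geometry weighting does in real refinement.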
     
**Step 8.9** - The final residue is a *Lys*. Everything except the terminal amine group is oriented correctly. This can be fixed with the *Edit Chi Angles* feature in the *Model/Fit/Refine* window. Click on: <br>
`Edit Chi Angles`, then click on the `Lys` residue. <br>

**Step 8.10** - An *Edit Chi Angles* window pops up listing each rotatable side-chain bond. `Clicking` on each entry highlights the bond to be rotated in the structure. For this example, <u>chi4 must be selected</u> to move the amine group into the density. <br>

**Step 8.11** - Clicking in the window and moving the mouse rotates the side chain about the selected bond. Once the orientation is correct, click: <br>
`Apply` to finalize the change. 
- Sometimes it is necessary to look at the new structure from a different angle to be sure the rotation was done correctly. 
- Additional adjustments are done the same way.

<center><img src='images/submod_3/submod3_image8.png'
     align='middle'
     width='400'/></center>
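Editing a chi angle is a rotation of all atoms beyond a bond about that bond's axis; for lysine, chi4 is the torsion CG-CD-CE-NZ, so rotating it swings NZ about the CD-CE bond. The sketch below shows the geometry with plain Python vectors: a dihedral-angle function and a rotation about a bond axis using Rodrigues' rotation formula. The coordinates in the usage example are hypothetical, not taken from the lysozyme model.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def norm(a):
    return math.sqrt(dot(a, a))

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees for atoms p0-p1-p2-p3 (e.g. CG-CD-CE-NZ for Lys chi4)."""
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    length = norm(b1)
    b1 = [c / length for c in b1]
    v = [b0[i] - dot(b0, b1) * b1[i] for i in range(3)]  # b0 projected off the bond axis
    w = [b2[i] - dot(b2, b1) * b1[i] for i in range(3)]  # b2 projected off the bond axis
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

def rotate_about_bond(point, a, b, angle_deg):
    """Rotate `point` about the axis from atom a to atom b (Rodrigues' rotation formula)."""
    axis = sub(b, a)
    length = norm(axis)
    k = [c / length for c in axis]
    v = sub(point, a)
    t = math.radians(angle_deg)
    kxv, kdotv = cross(k, v), dot(k, v)
    rotated = [v[i] * math.cos(t) + kxv[i] * math.sin(t) + k[i] * kdotv * (1 - math.cos(t))
               for i in range(3)]
    return [a[i] + rotated[i] for i in range(3)]

# Hypothetical atoms: CG, CD, CE on fixed positions and NZ to be rotated about CD-CE
cg, cd, ce, nz = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 1.0]
new_nz = rotate_about_bond(nz, cd, ce, 60.0)
print(f"chi4 before: {dihedral(cg, cd, ce, nz):.1f}, after: {dihedral(cg, cd, ce, new_nz):.1f}")
```

Only the torsion changes: the CE-NZ bond length is unchanged by the rotation, which is why chi-angle editing cannot distort bond geometry.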

---------
## Activity 2 - Model Refinement Using Phenix
Once all the residues have been added to the structure we can refine it using Phenix. 

### Before Starting
Go through the rest of the molecule and build in any missing amino acid residues using the electron density map and FASTA sequence as guides. There is one large section missing, so use the `Add Terminal Residue` feature to fill in the gap. Some residues will not be added with the correct orientation, so use the Real Space Refine Zone function to correct the atomic positions. <br>

#### Additional Question
What is the secondary structure of the newly built region? See if you can tell before adding any residues.

### Step 1. Save the coordinates.
First, save the coordinates with `File -> Save Coordinates`. Coot automatically gives each new *.pdb* file a different name so that no structure is overwritten. The first saved file should be named `Lysozyme_incomplete-coot-0.pdb`.
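After several rounds of building, a folder can hold many saved models, and it is easy to load an older one into Phenix by mistake. The hypothetical helper below picks the file with the highest save number; the `-coot-N.pdb` pattern is inferred from the example file name above and may differ between Coot versions or settings.

```python
import re
from pathlib import Path

# Pattern inferred from names like Lysozyme_incomplete-coot-0.pdb (an assumption)
COOT_SAVE = re.compile(r"-coot-(\d+)\.pdb$")

def latest_coot_save(filenames):
    """Return the file name with the highest '-coot-N' number, or None if there is none."""
    numbered = [(int(m.group(1)), name) for name in filenames if (m := COOT_SAVE.search(name))]
    return max(numbered)[1] if numbered else None

# Compare save numbers numerically, so -coot-10 is newer than -coot-2
print(latest_coot_save(p.name for p in Path(".").glob("*-coot-*.pdb")))
```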

### Step 2. Begin model refinement.
**Step 2.1** - Open Phenix and select `phenix.refine` under `Refinement`. This will open a window where files are added and refinement parameters can be changed. <br>
**Step 2.2** - Enter the *.mtz* file that was used in COOT to make the electron density maps, as well as the `Lysozyme_incomplete-coot-0.pdb` file that has the new residues built in. <br>

<center><img src='images/submod_3/submod3_image9.png'
     align='middle'
     width='600'/> </center>

### Step 3. Select refinement settings.
As before, Phenix will automatically recognize each file and fill in the relevant information. Next, go to the `Refinement settings` tab and toggle on the `Update waters` option. This adds water molecules to the model, which generally improves the refinement. There are many other options to consider, such as noncrystallographic symmetry (NCS) restraints, simulated annealing, and the parameters of the water-picking process. NCS only applies when there is more than one copy of the protein in the asymmetric unit; here there is only one, so it cannot be used.

<center><img src='images/submod_3/submod3_image10.png'
     align='middle'
     width='600'/> </center>

<center><img src='images/submod_3/submod3_image11.png'
     align='middle'
     width='600'/> </center>
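The same refinement can also be launched from a terminal, which is handy for repeating identical runs. The sketch below only builds the command line; it assumes `phenix.refine` is on your PATH and that `ordered_solvent=True` is the command-line parameter behind the GUI's `Update waters` option (with `simulated_annealing=True` as an example extra setting). Check these parameter names against the documentation for your installed Phenix version before relying on them.

```python
import shlex

def phenix_refine_command(model, data, update_waters=True, extra_params=()):
    """Build (but do not run) a phenix.refine command line as a list of arguments."""
    cmd = ["phenix.refine", model, data]
    if update_waters:
        cmd.append("ordered_solvent=True")  # assumed equivalent of 'Update waters'
    cmd.extend(extra_params)                # e.g. ["simulated_annealing=True"]
    return cmd

cmd = phenix_refine_command("Lysozyme_incomplete-coot-0.pdb", "Lysozyme_phases.mtz")
print(shlex.join(cmd))
# To run it with a local Phenix install: subprocess.run(cmd, check=True)
```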

### Step 4. Run refinement.
For now, we will leave everything in the default settings except for updating the waters. Once this is done click on `Run` and `Run now` for refinement to begin. Two windows will appear. The first is a graph tracking the refinement statistics, as shown below:

<center><img src='images/submod_3/submod3_image12.png'
     align='middle'
     width='600'/> </center>

The second window is COOT with the structure and electron density maps generated during the refinement process, as shown below:
<center><img src='images/submod_3/submod3_image13.png'
     align='middle'
     width='600'/> </center>
     
From the refinement window (shown below) we can see that the R-work and R-free values are initially *38.87* and *36.10*, respectively. These values should decrease as refinement proceeds.
<center><img src='images/submod_3/submod3_image14.png'
     align='middle'
     width='600'/> </center>
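R-work and R-free use the same formula, R = Σ| |Fo| − k|Fc| | / Σ|Fo|, over different sets of reflections: R-work over the reflections used in refinement, and R-free over a small test set (typically around 5 to 10% of the data) that refinement never sees, which makes R-free a check against overfitting. The sketch below computes both for made-up amplitudes; it assumes a fixed scale factor k and ignores the bulk-solvent correction and scaling that Phenix applies.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """R = sum(| |Fo| - k*|Fc| |) / sum(|Fo|) for paired amplitudes."""
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def r_work_and_free(f_obs, f_calc, is_free, scale=1.0):
    """Split reflections by their free-set flag and compute R-work and R-free."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], scale)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], scale)
    return r_work, r_free

# Hypothetical amplitudes; the last reflection is flagged as part of the free set
r_work, r_free = r_work_and_free([10, 10, 20, 20], [9, 11, 20, 18], [False, False, False, True])
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```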

### Step 5. Visualize refinement output.
Once the process has completed, the refinement statistics will have improved, and Phenix will have written out refined .pdb and .mtz files. Clicking on `Open in Coot` (see image above) displays the refined structure and maps, which can be used to make further improvements to the model. Rebuilding and refinement are repeated until no further changes to the structure can be made and the refinement statistics stop improving.

Validation of the structure is performed as part of `phenix.refine` as well. The results can be seen under the `MolProbity`, `Real-space correlation` and `Atomic properties` tabs (as shown in the image above). 
- MolProbity checks for local errors in the structure and identifies residues with poor geometry and steric clashes.
- Atomic properties reports B-factors and occupancies. Real-space correlation applies bulk-solvent correction and scaling to calculate a 2FO-FC map and compares it with a map calculated from the model alone.
- All outputs from the validation processes should be inspected before deposition to the PDB, which has its own set of validation parameters that must be satisfied.
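The real-space correlation coefficient is, at its core, a Pearson correlation between the observed map and the model-calculated map, sampled at the same grid points around an atom or residue. Values close to 1 mean the model explains the density well; residues with noticeably low values are worth revisiting in COOT. The sketch below shows only the correlation step, with hypothetical map values; the grid sampling, masking, and scaling that Phenix performs are not included.

```python
import math

def correlation(map_obs, map_model):
    """Pearson correlation between two lists of map values at the same grid points."""
    n = len(map_obs)
    mean_o = sum(map_obs) / n
    mean_m = sum(map_model) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(map_obs, map_model))
    var_o = sum((o - mean_o) ** 2 for o in map_obs)
    var_m = sum((m - mean_m) ** 2 for m in map_model)
    return cov / math.sqrt(var_o * var_m)

# Hypothetical density values around one residue in the observed and model maps
observed = [0.2, 1.1, 2.3, 1.8, 0.4, -0.1]
model = [0.1, 1.0, 2.5, 1.6, 0.5, 0.0]
print(f"RSCC = {correlation(observed, model):.3f}")
```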

-------------
## 📖 Submodule 3 - Test Your Knowledge
Over time it gets easier to recognize amino acid residues by looking at the density alone. Identify which residue corresponds to each map density. Each example has the side chain removed, so look at the map density to determine the correct amino acid side chain. Some residues share similar structures (e.g. Asp and Asn, or Glu and Gln), so it can be difficult to get the correct answer. 


```python
# Import the library used to display the quiz questions
from IPython.display import IFrame
```

#### Question 1
<center><u><b>Q1 Image</b></u>: Lactate dehydrogenase (PDBID: 8AB3); Resolution: 2.62 Å</center>

<center><img src='images/submod_3/quiz/q1.png'
     align='middle'
     width='400'/></center>

```python
# Display Q1
IFrame('quiz_files/submod3/quiz3_1.html', width=600, height=400)
```

#### Question 2
<center><u><b>Q2 Image</b></u>: Haemoglobin (PDBID: 3MJU); Resolution: 3.50 Å</center>
<center><img src='images/submod_3/quiz/q2.png'
     align='middle'
     width='400'/></center>

```python
# Display Q2
IFrame('quiz_files/submod3/quiz3_2.html', width=600, height=400)
```

#### Question 3
<center><u><b>Q3 Image</b></u>: Isocitrate lyase (PDBID: 7EBC); Resolution: 3.50 Å</center>
<center><img src='images/submod_3/quiz/q3.png'
     align='middle'
     width='400'/></center>

```python
# Display Q3
IFrame('quiz_files/submod3/quiz3_3.html', width=600, height=400)
```

#### Question 4
<center><u><b>Q4 Image</b></u>: Rhodopsin (PDBID: 7Q36); Resolution: 2.60 Å</center>
<center><img src='images/submod_3/quiz/q4.png'
     align='middle'
     width='400'/></center>

```python
# Display Q4
IFrame('quiz_files/submod3/quiz3_4.html', width=600, height=400)
```

#### Question 5
<center><u><b>Q5 Image</b></u>: Ras (PDBID: 7O83); Resolution: 2.38 Å</center>
<center><img src='images/submod_3/quiz/q5.png'
     align='middle'
     width='400'/></center>

```python
# Display Q5
IFrame('quiz_files/submod3/quiz3_5.html', width=600, height=400)
```

----------
## Final Activity:
Some of the residues from each structure above have been removed, resulting in incomplete proteins. Complete the final activity by doing the following:
1. Load each model and its structure factor (*sf.cif*) file into COOT to make *maps*. Maps are calculated by going to: <br>
   `File -> Open MTZ, mmCIF, fcf or phs`.
2. Go through each model and build into the missing density, using the FASTA sequence as a guide.
3. Conduct a round of refinement in Phenix.

### Necessary Files:
#### *8AB3* Files:
`8AB3_in.pdb`<br>
`8AB3-sf.cif`<br>
`8AB3.fasta`<br>

#### *3MJU* Files:
`3MJU_in.pdb`<br>
`3MJU-sf.cif`<br>
`3MJU.fasta`<br>

#### *7EBC* Files: 
`7EBC_in.pdb`<br>
`7EBC-sf.cif`<br>
`7EBC.fasta`<br>


#### *7Q36* Files: 
`7Q36_in.pdb`<br>
`7Q36-sf.cif`<br>
`7Q36.fasta`<br>


#### *7O83* Files: 
`7O83_in.pdb`<br>
`7O83-sf.cif`<br>
`7O83.fasta`<br>



```python
# Command to get activity files from S3
```

---------
## Conclusion
Through this submodule, you have gained critical skills in using the Crystallographic Object-Oriented Toolkit (COOT), an essential tool for preparing a final model suitable for submission to the Protein Data Bank (PDB). You have learned to interpret electron density maps, build and refine structures, and appreciate the iterative nature of this approach. These competencies not only enhance your ability to contribute to high-quality structural biology research but also equip you with the foundational tools and knowledge to conduct your own future exploration and discovery.

## Clean Up
<div class="alert alert-block alert-warning"> <b>Attention:</b> Remember to shut down the VM and delete any relevant resources. </div>