### Structural Alignment

#### Structural Alignment Overview
Structural alignment is a computational method used to compare and align the three-dimensional structures of proteins. In this process, proteins are superimposed in such a way that their similar structural regions overlap as closely as possible. The main goal is to identify regions of structural similarity, which can reveal evolutionary relationships, functional similarities, or differences between the proteins.

#### Understanding Root-Mean-Square Deviation (RMSD)
RMSD (Root-Mean-Square Deviation) is a common metric for the structural similarity of two protein structures after alignment. It is the square root of the mean squared distance between corresponding atoms (usually the backbone or alpha carbon atoms) of the aligned proteins, and it is normally reported in ångströms (Å). A lower RMSD indicates a closer superposition and higher structural similarity, while a higher RMSD indicates larger structural differences.

Mathematically, RMSD is calculated using the following formula:

$$
\text{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^2}
$$

where $N$ is the number of atom pairs compared, and $\mathbf{x}_i$ and $\mathbf{y}_i$ are the three-dimensional positions of the $i$-th pair of corresponding atoms in the two protein structures.
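The formula can be checked on a small case. The following is a minimal sketch, assuming the two structures are already superimposed and stored as paired NumPy arrays of shape `(N, 3)`; the function name `rmsd` and the coordinates are illustrative, not part of the notebook.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Return the RMSD between two (N, 3) arrays of paired atom positions."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # Squared Euclidean distance for each atom pair
    sq_dist = np.sum((a - b) ** 2, axis=1)
    # Mean over the N pairs, then square root
    return float(np.sqrt(sq_dist.mean()))

# Hypothetical coordinates (in Å) for three paired atoms
a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
b = a + np.array([0.0, 0.0, 2.0])  # every atom shifted by 2 Å along z

print(rmsd(a, b))  # every pair is 2 Å apart, so the RMSD is 2.0
```

Because each pair is separated by exactly 2 Å, the mean squared distance is 4 and the RMSD is 2 Å.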

#### Explanation of the Code
The provided code is used to install PyMOL in Google Colab and perform structural alignment between two acetylcholinesterase proteins: human acetylcholinesterase (hsAChE) and Torpedo californica acetylcholinesterase (tcAChE). Below is a step-by-step explanation of the code:

1. **Installing PyMOL in Colab:**
   The first part of the code installs `condacolab`, which sets up a conda environment inside Google Colab and provides `mamba`, a faster alternative to `conda`:

   ```python
   !pip install -q condacolab
   import condacolab
   condacolab.install()
   ```

   After installing `condacolab`, the script uses `mamba` to install the PyMOL software:

   ```python
   %shell mamba install pymol-open-source --yes
   ```

2. **Loading Protein Structures:**
   The next part of the code loads the protein structures into PyMOL from files stored in Google Drive:

   ```python
   cmd.load("/content/drive/MyDrive/BioqFarma_24-25/hsAChE_apo.pdb", "hsAChE")
   cmd.load("/content/drive/MyDrive/BioqFarma_24-25/tcAChE_apo.pdb", "tcAChE")
   ```

   Here, `cmd.load()` is a PyMOL command that loads the PDB files of the proteins into PyMOL, assigning them the names "hsAChE" and "tcAChE".

   The PDB files must be visible to Colab, so Google Drive has to be mounted before the load commands are run:

   ```python
   from google.colab import drive
   drive.mount('/content/drive')
   ```

3. **Performing the Alignment:**
   The alignment is performed using PyMOL's `align()` function, which takes the mobile selection first and the target second. Here `tcAChE` is moved onto `hsAChE`, which stays fixed:

   ```python
   alignment_rmsd = cmd.align("tcAChE", "hsAChE")
   ```

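   This function first pairs residues through a sequence alignment and then superimposes the paired atoms, discarding poorly fitting pairs over several refinement cycles. The result, `alignment_rmsd`, is a sequence of seven values: the first is the RMSD after refinement and the second is the number of atoms used in that final fit.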

4. **Displaying the RMSD:**
   Finally, the code prints the RMSD value of the alignment:

   ```python
   print(f"RMSD between the aligned structures: {alignment_rmsd[0]:.3f} Å")
   ```

   This output gives you a quantitative measure of how well the two protein structures align.
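According to the PyMOL documentation, `cmd.align` returns seven values rather than only the RMSD. The sketch below attaches names to them so the rest of a notebook can refer to each one explicitly. The field names and the helper `label_align_result` are our own labels, and the numbers in the example are hypothetical, not results for these proteins.

```python
from collections import namedtuple

# Field names are illustrative; the order follows the PyMOL documentation for `align`.
AlignResult = namedtuple(
    "AlignResult",
    [
        "rmsd_after",        # RMSD after outlier-rejection refinement (Å)
        "atoms_after",       # number of aligned atoms after refinement
        "cycles",            # number of refinement cycles performed
        "rmsd_before",       # RMSD before refinement (Å)
        "atoms_before",      # number of aligned atoms before refinement
        "score",             # raw sequence-alignment score
        "residues_aligned",  # number of residues aligned
    ],
)

def label_align_result(result):
    """Wrap the seven values returned by cmd.align in a named tuple."""
    values = tuple(result)
    if len(values) != 7:
        raise ValueError(f"expected 7 values, got {len(values)}")
    return AlignResult(*values)

# Hypothetical values standing in for cmd.align("tcAChE", "hsAChE")
example = label_align_result((0.85, 3100, 5, 2.40, 3600, 2500.0, 530))
print(f"RMSD after refinement: {example.rmsd_after:.3f} Å "
      f"over {example.atoms_after} atoms")
```

Comparing `rmsd_before` with `rmsd_after`, and `atoms_before` with `atoms_after`, shows how much of the structure was excluded to reach the reported RMSD.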

#### Alternative Structural Bioinformatics Libraries and Their Limitations
While PyMOL is a popular tool for structural alignment due to its flexibility and ease of use, other libraries such as ProDy, MDAnalysis, and BioPython are also commonly used in structural bioinformatics:

1. **ProDy:**
   ProDy is a Python package designed for protein dynamics analysis. Its superposition routines operate on atom selections with a one-to-one correspondence, so proteins of different lengths must first be paired residue by residue (for example with its `matchChains` function) before they can be superposed.

2. **MDAnalysis:**
   MDAnalysis is used primarily for analyzing molecular dynamics (MD) simulations, where every frame contains the same atoms. Its alignment functions generally expect selections with equal numbers of atoms, so comparing two different proteins requires building matching selections beforehand.

3. **BioPython:**
   BioPython provides tools for computational biology, including structure superposition. Its `Superimposer` class needs two atom lists of equal length, so the residue correspondence between proteins of different lengths has to be established separately.

These libraries are powerful for various bioinformatics tasks, but the need for an explicit atom correspondence adds an extra step when proteins differ in length or sequence.

PyMOL's `align` command handles that step itself: it pairs residues through a sequence alignment and then superposes the paired atoms, so the two proteins do not need identical residue counts. This capability, combined with its visualization options, makes PyMOL a convenient choice for structural alignment tasks in biochemistry and structural biology.
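The correspondence issue discussed above comes from the superposition step itself: a least-squares fit is defined only over paired atoms. The following is a minimal NumPy sketch of the Kabsch algorithm, a standard least-squares superposition method, assuming the pairing is already known. It is not the code used by PyMOL or by the libraries listed here, and the function name and test coordinates are illustrative.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Superpose `mobile` onto `target` (paired (N, 3) arrays).

    Returns the transformed mobile coordinates and the resulting RMSD.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    if P.shape != Q.shape:
        raise ValueError("both coordinate sets must contain the same atoms")

    # Remove translation by centring both sets on their centroids
    p_centroid = P.mean(axis=0)
    q_centroid = Q.mean(axis=0)
    P0 = P - p_centroid
    Q0 = Q - q_centroid

    # Optimal rotation from the SVD of the covariance matrix
    H = P0.T @ Q0
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    fitted = P0 @ R.T + q_centroid
    rmsd = float(np.sqrt(np.sum((fitted - Q) ** 2, axis=1).mean()))
    return fitted, rmsd

# Hypothetical target, and a copy rotated 90° about z and then translated
target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
mobile = target @ rot_z.T + np.array([5.0, -2.0, 1.0])

fitted, value = kabsch_superpose(mobile, target)
print(f"RMSD after superposition: {value:.6f} Å")
```

The function raises an error when the two arrays differ in shape, which is the same constraint that makes a prior residue pairing necessary for proteins of different lengths.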

#### Setup: Install Dependencies and PyMOL

In [None]:
!pip install biopython
!pip install py3Dmol

Collecting biopython
  Downloading biopython-1.84-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Downloading biopython-1.84-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Installing collected packages: biopython
Successfully installed biopython-1.84
Collecting py3Dmol
  Downloading py3Dmol-2.4.0-py2.py3-none-any.whl.metadata (1.9 kB)
Downloading py3Dmol-2.4.0-py2.py3-none-any.whl (7.0 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.4.0


This code snippet installs conda through `condacolab` and then uses `mamba` to install Open Source PyMOL.

In [None]:
from IPython.utils import io
import tqdm.notebook
import os
"""The PyMOL installation is done inside two nested context managers. This approach
was inspired by Dr. Christopher Schlick's (of the Phenix group at
Lawrence Berkeley National Laboratory) method for installing cctbx
in a Colab Notebook. He presented his work on September 1, 2021 at the IUCr
Crystallographic Computing School. I adapted Chris's approach here. It replaces my first approach,
which required seven steps. My approach was presented at the SciPy2021 conference
in July 2021 and published in the
[proceedings](http://conference.scipy.org/proceedings/scipy2021/blaine_mooers.html).
The new approach is easier for beginners to use. The old approach is easier to debug
and could be used as a back-up approach.

Thank you to Professor David Oppenheimer of the University of Florida for suggesting the use of mamba and of Open Source PyMOL.
"""
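total = 100
# Outer context: progress bar; inner context: hide the verbose installer output
with tqdm.notebook.tqdm(total=total) as pbar:
    with io.capture_output() as captured:

        # Install conda (with mamba) into the Colab runtime
        !pip install -q condacolab
        import condacolab
        condacolab.install()
        pbar.update(10)

        # Make the conda site-packages importable
        # (the path is hard-coded for Python 3.10; adjust it if the runtime differs)
        import sys
        sys.path.append('/usr/local/lib/python3.10/site-packages/')
        pbar.update(20)

        # Install PyMOL
        %shell mamba install pymol-open-source --yes

        # update() adds to the current count, so 70 brings the bar to 100
        pbar.update(70)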



We mount Google Drive so Colab can access the files.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
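Before loading the structures, it can help to confirm that the mounted Drive actually exposes the input files, since a wrong path is a common cause of load failures. This is a small sketch using only the standard library; the helper name `find_missing` is illustrative, and the paths are the ones used in this notebook.

```python
from pathlib import Path

def find_missing(paths):
    """Return the subset of `paths` that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]

pdb_files = [
    "/content/drive/MyDrive/BioqFarma_24-25/hsAChE_apo.pdb",
    "/content/drive/MyDrive/BioqFarma_24-25/tcAChE_apo.pdb",
]

missing = find_missing(pdb_files)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All input files are available.")
```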

#### Perform the Structural Alignment and Measure the RMSD

In [1]:
from pymol import cmd

# Load the structures into PyMOL
cmd.load("/content/drive/MyDrive/BioqFarma_24-25/hsAChE_apo.pdb", "hsAChE")
cmd.load("/content/drive/MyDrive/BioqFarma_24-25/tcAChE_apo.pdb", "tcAChE")


# Perform the alignment using PyMOL's align command
# This command aligns tcAChE to hsAChE using the best-matching regions automatically
alignment_rmsd = cmd.align("tcAChE", "hsAChE")

# Print the RMSD after refinement (first element of the result)
print(f"RMSD between the aligned structures: {alignment_rmsd[0]:.3f} Å")

# Save the aligned coordinates of the mobile structure (tcAChE) to a new PDB file
cmd.save("/content/drive/MyDrive/BioqFarma_24-25/tcAChE_aligned.pdb", "tcAChE")

# Delete all objects from the PyMOL session (optional, to clean up)
cmd.delete("all")

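ModuleNotFoundError: No module named 'pymol'

This error appears when the cell runs in a runtime where the PyMOL installation cell above has not completed. Run the setup cells first, then run this cell again.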

#### Visualise the Alignment

We can use the py3Dmol library to visualise both aligned structures.

In [None]:
import py3Dmol

# Read the fixed and aligned PDB files
with open("/content/drive/MyDrive/BioqFarma_24-25/hsAChE_apo.pdb", 'r') as file:
    fixed_pdb = file.read()
with open("/content/drive/MyDrive/BioqFarma_24-25/tcAChE_aligned.pdb", 'r') as file:
    mobile_pdb = file.read()

# Initialize the viewer
viewer = py3Dmol.view(width=800, height=600)

# Add the fixed structure (hsAChE) with a distinct color
viewer.addModel(fixed_pdb, 'pdb')
viewer.setStyle({'model': 0}, {'cartoon': {'color': 'blue'}})

# Add the aligned mobile structure (tcAChE) with another color
viewer.addModel(mobile_pdb, 'pdb')
viewer.setStyle({'model': 1}, {'cartoon': {'color': 'red'}})

# Zoom to fit both models
viewer.zoomTo()

# Display the viewer
viewer.show()