# Workshop: Multiple Sequence Alignment and Algorithms

## Introduction to Multiple Sequence Alignment (MSA)

**Multiple Sequence Alignment** is a fundamental technique in bioinformatics used to align three or more biological sequences — typically proteins or nucleic acids — to identify regions of similarity.  

These similarities may indicate **structural, functional, or evolutionary relationships** among the sequences. MSA is widely used for:  
- Detecting conserved residues that are critical for protein function  
- Inferring phylogenetic relationships between species  
- Identifying functional motifs and domains  
- Generating **consensus sequences** to represent the most common amino acids or nucleotides at each position  

**Key concepts in MSA:**  
- **Conserved Regions:** Sequence positions that remain unchanged across multiple species or variants, often indicating functional or structural importance  
- **Consensus Sequence:** A representative sequence constructed from the most frequent residue at each column of the alignment  
- **Alignment Score (e.g., Sum-of-Pairs or SP Score):** A numerical measure of alignment quality, calculated by scoring every pair of residues within each column and summing over all columns  
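
The consensus and sum-of-pairs ideas above can be sketched in a few lines of Python. The toy alignment and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration only; real tools typically use substitution matrices such as BLOSUM together with affine gap penalties.

```python
from collections import Counter
from itertools import combinations

def consensus(alignment):
    """Return the most frequent symbol of each column (ties: first seen wins)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))

def sp_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score with a simple, illustrative scoring scheme."""
    total = 0
    for col in zip(*alignment):
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue              # gap-gap pairs contribute nothing
            elif a == "-" or b == "-":
                total += gap          # residue aligned to a gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total

# Hypothetical toy alignment: three rows of equal length.
toy = ["MVHLT", "MVHFT", "MV-LT"]
print(consensus(toy))
print(sp_score(toy))
```

A higher SP score means more agreeing residue pairs per column, but the absolute value depends entirely on the scoring scheme, so scores are only comparable between alignments scored the same way.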

MSA forms the backbone of many downstream analyses in bioinformatics, from evolutionary studies to protein structure prediction.

# Clustal Omega: MSA Algorithm

**Clustal Omega** is one of the most widely used tools for multiple sequence alignment (MSA) of protein or nucleotide sequences.  
It is the latest version in the Clustal series (after ClustalW and ClustalX) and uses advanced techniques to improve accuracy and scalability.

---

## Algorithm Overview

**Strategy:** Progressive alignment using Hidden Markov Model (HMM) profiles  

1. **Pairwise Distance Calculation:**  
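   - Clustal Omega first estimates the distances between sequences.  
   - This is done with fast k-mer (k-tuple) counting; for large inputs, the mBed method embeds each sequence relative to a small set of reference sequences so that not every pair has to be compared.
2. **Guide Tree Construction:**  
   - A guide tree is built from these distances by clustering (UPGMA, combined with k-means for large inputs; the older ClustalW used Neighbor-Joining).  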
   - The tree represents the approximate evolutionary relationships among sequences.
3. **Progressive Alignment:**  
   - Sequences are aligned progressively according to the guide tree, starting from the closest sequences.  
   - Profile-profile alignment is performed using HMMs to improve accuracy.
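
To make the first step more concrete, here is a minimal sketch of an alignment-free k-mer distance. It uses one minus the Jaccard similarity of the two k-mer sets, which is a simplified stand-in and not Clustal Omega's exact k-tuple measure; the function names and the choice `k=3` are assumptions for illustration.

```python
def kmer_set(seq, k=3):
    """All overlapping substrings of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_distance(a, b, k=3):
    """1 - Jaccard similarity of the k-mer sets (0 = same k-mers, 1 = none shared)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka and not kb:
        return 0.0                    # both sequences shorter than k
    return 1.0 - len(ka & kb) / len(ka | kb)

def distance_matrix(seqs, k=3):
    """Symmetric matrix of k-mer distances for a dict {name: sequence}."""
    names = list(seqs)
    return {(p, q): kmer_distance(seqs[p], seqs[q], k) for p in names for q in names}

# Hypothetical short fragments, for illustration only.
fragments = {"seq1": "MVHLT", "seq2": "MVHFT"}
print(distance_matrix(fragments)[("seq1", "seq2")])
```

Because no alignment is computed, this kind of distance is cheap enough to evaluate for many sequence pairs before the guide tree is built.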

---

## Advantages of Clustal Omega

- **Fast and Scalable:** Can handle hundreds to thousands of sequences efficiently.  
- **Accurate for Related Sequences:** Produces reliable alignments for sequences with moderate to high similarity.  
- **HMM-based Profile Alignment:** Improves accuracy for large and diverse datasets.  
- **Standard Format Outputs:** Provides `.aln` (alignment) and `.dnd` (guide tree) files that are widely compatible.

---

## Limitations

- **SP Score Not Provided:** Clustal Omega does not give a direct numerical measure of alignment quality to the user.  
- **Less Accurate for Very Divergent Sequences:** For highly divergent sequences, alternative algorithms like T-Coffee may perform better.  
- **Refinement Optional:** Iterative refinement is available, but it is not applied by default and has to be requested explicitly.  

---
# Sequence Groups for MSA Workshop

In this workshop, we will use several groups of sequences to demonstrate Multiple Sequence Alignment (MSA) with Clustal Omega.  
The groups are organized by similarity levels.

---

## 1️⃣ Highly Similar Sequences

These sequences are identical in this data set (100% identity).  
They are hemoglobin alpha chains labeled Human, Gorilla, and Orangutan.

>Human_Hemoglobin_Alpha

MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

>Gorilla_Hemoglobin_Alpha

MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

>Orangutan_Hemoglobin_Alpha

MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

## 2️⃣ Moderately Similar Sequences

These sequences show moderate similarity (~70–90%), for example Human, Mouse, and Chicken Hemoglobin.  
They are moderately divergent and ideal for observing how MSA handles sequences with partial similarity.

>Human_Hemoglobin
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

>Mouse_Hemoglobin
MVHLTDAEKAAVNGLWGKVNVEEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH

>Chicken_Hemoglobin
MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH


---

## 3️⃣ Divergent / Dissimilar Sequences 

These sequences are very different from each other, representing highly divergent proteins.  
They are ideal for testing MSA performance on sequences with minimal similarity.

>Human_COX1
MFVLSSWRVAVVAGLLVLTAGVAGAGGCVGLTVAKLAGKEVTGSGDVEVVGPLSVVGFAGVDLVGAAHAEFTGVVVEFGGAYALYDYVAA

>Human_Insulin
MALWMRLLPLLALLALWGPDPAAAQKLSGAQGTVLQKDSGTLEDQTLELEALTKLQS

>Human_Myoglobin
MGKVKVGVNGFGRIGRLVTRAAFNSGKVDIVLDSGDGVTHVV

---

## 4️⃣ Special Divergent Group: T-Rex, Chicken, Human

These sequences come from highly divergent species, including an extinct species (T-Rex), a bird (Chicken), and Human.  
They are used to demonstrate how MSA handles sequences with **extreme evolutionary divergence**.

>Chicken_Hemoglobin
MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH

>T_Rex_Hemoglobin
MVHFTAEEKASAAVTSWAKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH

>Human_Hemoglobin
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH


In [1]:
import time
import requests
from Bio import AlignIO
from io import StringIO

fasta_data = '''>Human_Hemoglobin_Alpha
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>Gorilla_Hemoglobin_Alpha
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>Orangutan_Hemoglobin_Alpha
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
'''
# Submit the alignment job to the EMBL-EBI Clustal Omega REST service.
url = "https://www.ebi.ac.uk/Tools/services/rest/clustalo/run/"
params = {
    "email": "baran.6278m@gmail.com",  # contact address sent with the job
    "sequence": fasta_data,
    "guidetreeout": "true",            # also write the guide tree
    "outfmt": "clustal_num"            # Clustal format with residue numbers
}
response = requests.post(url, data=params)

if response.status_code != 200:
    raise Exception("Error submitting job: " + response.text)

# The response body is the job ID used by the status and result endpoints.
job_id = response.text.strip()
print(f"Job submitted successfully! Job ID: {job_id}")


# Poll every 5 seconds until the job finishes or fails.
status_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/status/{job_id}"
while True:
    status = requests.get(status_url).text.strip()
    print(f"Status: {status}")
    if status == "FINISHED":
        break
    elif status in ("ERROR", "FAILURE", "NOT_FOUND"):
        raise RuntimeError(f"Job failed with status: {status}")
    time.sleep(5)

# Download the alignment in Clustal format.
result_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/aln-clustal_num"
alignment_text = requests.get(result_url).text
print("\n=== Alignment Result ===\n")
print(alignment_text)

# Download the guide tree in Newick format.
tree_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/tree"
tree_text = requests.get(tree_url).text
print("\n=== Guide Tree (Newick Format) ===\n")
print(tree_text)


Job submitted successfully! Job ID: clustalo-R20251104-144044-0784-3812700-p1m
Status: RUNNING
Status: FINISHED

=== Alignment Result ===

CLUSTAL O(1.2.4) multiple sequence alignment


Human_Hemoglobin_Alpha          MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVAD	60
Gorilla_Hemoglobin_Alpha        MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVAD	60
Orangutan_Hemoglobin_Alpha      MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVAD	60
                                ************************************************************

Human_Hemoglobin_Alpha          ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHAS	120
Gorilla_Hemoglobin_Alpha        ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHAS	120
Orangutan_Hemoglobin_Alpha      ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHAS	120
                                ************************************************************

Human_Hemoglobin_Alpha          LDKFLASVSTVLTSK

## Understanding the Clustal Omega Output
When Clustal Omega finishes the multiple sequence alignment, it produces a result in the Clustal format (.aln).
This output is divided into blocks of up to 60 alignment columns each.
Each block shows the same set of sequences, progressing step by step through the alignment.

## Each block includes:

- The sequence name (e.g., Human_Hemoglobin) on the left
- The aligned residues in the middle (including gaps "-")
- A number on the right: the count of residues of that sequence shown so far, gaps excluded (so a sequence with a gap shows a smaller number, e.g. 59 versus 60)

After the first 60 columns, the next block continues (61–120, 121–180, etc.), making it easier to read long sequences.

Below each block, Clustal Omega adds a conservation line (often loosely called the consensus line), which shows how similar each position is across all sequences.

| Symbol | Meaning |
|---|---|
| `*` | All residues in that column are identical (fully conserved) |
| `:` | Strongly similar residues (chemically similar) |
| `.` | Weakly similar residues |
| (space) | No similarity at that position |
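
As a rough illustration of how the conservation line is derived, the sketch below marks only fully conserved columns with `*`. It is a simplified version: reproducing `:` and `.` would require Clustal's residue similarity groups, which are not included here. The function name `identity_line` is hypothetical, and the three rows are the first ten columns of the Human, Mouse, and Chicken alignment shown in this notebook.

```python
def identity_line(rows):
    """Mark columns where every sequence has the same residue (and no gap)."""
    marks = []
    for column in zip(*rows):
        if "-" not in column and len(set(column)) == 1:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)

rows = [
    "MVHLTPEEK-",  # Human_Hemoglobin
    "MVHLTDAEK-",  # Mouse_Hemoglobin
    "MVHFTAEEKA",  # Chicken_Hemoglobin
]
line = identity_line(rows)
print(repr(line))
print("Fully conserved columns:", line.count("*"), "of", len(line))
```

Columns that Clustal Omega marks with `:` or `.` appear as spaces in this simplified line, so it underestimates similarity compared with the real output.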

## Understanding the Guide Tree in Clustal Omega

In Clustal Omega, when a guide tree is generated, each branch has a decimal number next to it.
This number is the branch length, which indicates how similar or different the sequences are.

## What Branch Length Means

Branch length reflects the estimated distance between sequences. Because the guide tree is built from fast, approximate distances, it should be read as a rough clustering rather than a true phylogeny.

- A short branch → sequences are very similar
- A long branch → sequences are more divergent

These numbers are calculated from pairwise sequence distances:

1. Each pair of sequences is compared.
2. The differences or "distances" are computed.
3. These distances are used to build the guide tree that determines the order of progressive alignment.
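
Branch lengths can be read out of the Newick text programmatically. The sketch below uses a regular expression rather than a full Newick parser, so it only picks up leaf names made of letters, digits, and underscores; the function name is hypothetical. The example string is the guide tree printed for the divergent group in this notebook, joined onto one line.

```python
import re

def leaf_branch_lengths(newick):
    """Map each leaf name to its branch length (internal branches are ignored)."""
    pairs = re.findall(r"([A-Za-z0-9_]+):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}

tree_text = (
    "(Human_Insulin:0.414474,"
    "(Human_COX1:0.392857,Human_Myoglobin:0.392857):0.0216165);"
)
lengths = leaf_branch_lengths(tree_text)
for name, length in sorted(lengths.items(), key=lambda item: item[1]):
    print(f"{name:<18}{length:.6f}")
```

For anything beyond a quick look, a dedicated tree parser is the safer choice, because Newick allows quoted names and comments that this pattern does not handle.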

In [None]:
import time
import requests
from Bio import AlignIO
from io import StringIO

fasta_data = '''>Human_Hemoglobin
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>Mouse_Hemoglobin
MVHLTDAEKAAVNGLWGKVNVEEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH
>Chicken_Hemoglobin
MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH
'''

url = "https://www.ebi.ac.uk/Tools/services/rest/clustalo/run/"
params = {
    "email": "baran.6278m@gmail.com",  
    "sequence": fasta_data,
    "guidetreeout": "true",
    "outfmt": "clustal_num"
}
response = requests.post(url, data=params)

if response.status_code != 200:
    raise Exception("Error submitting job: " + response.text)

job_id = response.text.strip()
print(f"Job submitted successfully! Job ID: {job_id}")

status_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/status/{job_id}"
while True:
    status = requests.get(status_url).text.strip()
    print(f"Status: {status}")
    if status == "FINISHED":
        break
    elif status in ("ERROR", "FAILURE", "NOT_FOUND"):
        raise RuntimeError(f"Job failed with status: {status}")
    time.sleep(5)

result_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/aln-clustal_num"
alignment_text = requests.get(result_url).text
print("\n=== Alignment Result ===\n")
print(alignment_text)

tree_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/tree"
tree_text = requests.get(tree_url).text
print("\n=== Guide Tree (Newick Format) ===\n")
print(tree_text)


Job submitted successfully! Job ID: clustalo-R20251029-155135-0477-32146398-p1m
Status: RUNNING
Status: FINISHED

=== Alignment Result ===

CLUSTAL O(1.2.4) multiple sequence alignment


Human_Hemoglobin        MVHLTPEEK-SAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNP	59
Mouse_Hemoglobin        MVHLTDAEK-AAVNGLWGKVNVEEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNP	59
Chicken_Hemoglobin      MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNP	60
                        ***:*  ** :*.   *.****:*******************:*:******: .*:****

Human_Hemoglobin        KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF	119
Mouse_Hemoglobin        KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF	119
Chicken_Hemoglobin      KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF	120
                        ************************************************************

Human_Hemoglobin        GKEFTPPVQAAYQKVVAGVANALAHKYH	147
Mouse_Hemoglobin        GKEFTPQVQAAYQKVVAGVANALAHKYH	

In [2]:
import time
import requests
from Bio import AlignIO
from io import StringIO

fasta_data = '''>Human_COX1
MFVLSSWRVAVVAGLLVLTAGVAGAGGCVGLTVAKLAGKEVTGSGDVEVVGPLSVVGFAGVDLVGAAHAEFTGVVVEFGGAYALYDYVAA

>Human_Insulin
MALWMRLLPLLALLALWGPDPAAAQKLSGAQGTVLQKDSGTLEDQTLELEALTKLQS

>Human_Myoglobin
MGKVKVGVNGFGRIGRLVTRAAFNSGKVDIVLDSGDGVTHVV'''

url = "https://www.ebi.ac.uk/Tools/services/rest/clustalo/run/"
params = {
    "email": "baran.6278m@gmail.com", 
    "sequence": fasta_data,
    "guidetreeout": "true",
    "outfmt": "clustal_num"
}
response = requests.post(url, data=params)

if response.status_code != 200:
    raise Exception("Error submitting job: " + response.text)

job_id = response.text.strip()
print(f"Job submitted successfully! Job ID: {job_id}")

status_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/status/{job_id}"
while True:
    status = requests.get(status_url).text.strip()
    print(f"Status: {status}")
    if status == "FINISHED":
        break
    elif status in ("ERROR", "FAILURE", "NOT_FOUND"):
        raise RuntimeError(f"Job failed with status: {status}")
    time.sleep(5)

result_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/aln-clustal_num"
alignment_text = requests.get(result_url).text
print("\n=== Alignment Result ===\n")
print(alignment_text)

tree_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/tree"
tree_text = requests.get(tree_url).text
print("\n=== Guide Tree (Newick Format) ===\n")
print(tree_text)


Job submitted successfully! Job ID: clustalo-R20251104-171604-0526-6627457-p1m
Status: FINISHED

=== Alignment Result ===

CLUSTAL O(1.2.4) multiple sequence alignment


Human_COX1           ----------MFVLSSWRVAVVAGLLVLTAGVAGAGGCVGLTVAKLAGKEVTGSGDVEVV	50
Human_Insulin        MALWMRLLPLLALLALWGPDPAAA--------QKLSGAQGTVLQKDSGTLEDQTLELEAL	52
Human_Myoglobin      ------------------------MGKVKVGVNGFGRI-G----RLVTRAAFNSGKVDIV	31
                                                        .   *    :        : .:: :

Human_COX1           GPLSVVGFAGVDLVGAAHAEFTGVVVEFGGAYALYDYVAA	90
Human_Insulin        TKLQS-----------------------------------	57
Human_Myoglobin      --------------LDSGDGVTHVV---------------	42
                                                             


=== Guide Tree (Newick Format) ===

(
Human_Insulin:0.414474
,
(
Human_COX1:0.392857
,
Human_Myoglobin:0.392857
):0.0216165
)
;



In [None]:
import time
import requests
from Bio import AlignIO
from io import StringIO

fasta_data = '''>Chicken_Hemoglobin
MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH

>T_Rex_Hemoglobin
MVHFTAEEKASAAVTSWAKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH

>Human_Hemoglobin
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
'''

url = "https://www.ebi.ac.uk/Tools/services/rest/clustalo/run/"
params = {
    "email": "baran.6278m@email.com", 
    "sequence": fasta_data,
    "guidetreeout": "true",
    "outfmt": "clustal_num"
}
response = requests.post(url, data=params)

if response.status_code != 200:
    raise Exception("Error submitting job: " + response.text)

job_id = response.text.strip()
print(f"Job submitted successfully! Job ID: {job_id}")

status_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/status/{job_id}"
while True:
    status = requests.get(status_url).text.strip()
    print(f"Status: {status}")
    if status == "FINISHED":
        break
    elif status in ("ERROR", "FAILURE", "NOT_FOUND"):
        raise RuntimeError(f"Job failed with status: {status}")
    time.sleep(5)

result_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/aln-clustal_num"
alignment_text = requests.get(result_url).text
print("\n=== Alignment Result ===\n")
print(alignment_text)

tree_url = f"https://www.ebi.ac.uk/Tools/services/rest/clustalo/result/{job_id}/tree"
tree_text = requests.get(tree_url).text
print("\n=== Guide Tree (Newick Format) ===\n")
print(tree_text)


Job submitted successfully! Job ID: clustalo-R20251101-111807-0521-82019710-p1m
Status: RUNNING
Status: FINISHED

=== Alignment Result ===

CLUSTAL O(1.2.4) multiple sequence alignment


Chicken_Hemoglobin      MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNP	60
T_Rex_Hemoglobin        MVHFTAEEKASAAVTSWAKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNP	60
Human_Hemoglobin        MVHLTPEEK-SAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNP	59
                        ***:* *** **..: *.****:*******************:*****************

Chicken_Hemoglobin      KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF	120
T_Rex_Hemoglobin        KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF	120
Human_Hemoglobin        KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF	119
                        ************************************************************

Chicken_Hemoglobin      GKEFTPQVQAAYQKVVAGVANALAHKYH	148
T_Rex_Hemoglobin        GKEFTPQVQAAYQKVVAGVANALAHKYH	

## T-Coffee Algorithm (Tree-based Consistency Objective Function For Alignment Evaluation)

**T-Coffee** is a multiple sequence alignment algorithm designed to improve the accuracy of progressive alignment methods (like the Clustal family) by introducing a *consistency-based scoring system*.

A plain progressive aligner fixes each merge as it walks the guide tree and cannot correct early mistakes. T-Coffee adds an intermediate step that draws on information from all pairwise alignments before any sequences are merged, which increases robustness and accuracy.

---

### Core Idea

T-Coffee combines information from multiple pairwise alignments into a single **“library”** of pairwise residue matches.  
This library is then used to build the final multiple alignment in a way that maximizes *consistency* among all possible alignments.

---

### Main Steps of the Algorithm

1. **Pairwise Pre-alignment (Library Construction):**  
   - All sequences are aligned pairwise using both *global* (Needleman–Wunsch) and *local* (Smith–Waterman) alignments.  
   - The resulting alignments are stored in a **primary library**, where each pair of aligned residues (A<sub>i</sub>, B<sub>j</sub>) gets a weight (or “consistency score”).

2. **Library Extension (Consistency Stage):**  
   - The algorithm checks for *transitive consistency*:  
     If A aligns to B, and B aligns to C, then A should align to C.  
   - This process strengthens reliable relationships and reduces random mismatches.

3. **Progressive Alignment:**  
   - Using a guide tree (like in Clustal), T-Coffee aligns sequences progressively — but instead of using substitution matrices directly, it uses the *consistency-weighted library* as the scoring system.

4. **Final Refinement:**  
   - The resulting alignment may be optionally refined by iterative improvement, ensuring that the alignment remains as consistent as possible with the original library.
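
The library extension in step 2 can be sketched on a toy library. The sketch assumes the commonly described rule that a residue pair gains, for every intermediate residue aligned to both, the minimum of the two connecting weights. The residue labels, the weights, and the function name are made up for illustration.

```python
from collections import defaultdict

def extend_library(primary):
    """primary maps frozenset({res_x, res_y}) -> weight; a residue is (seq, pos)."""
    partners = defaultdict(dict)          # residue -> {aligned residue: weight}
    for pair, weight in primary.items():
        x, y = tuple(pair)
        partners[x][y] = weight
        partners[y][x] = weight

    extended = dict(primary)
    residues = list(partners)
    for i, x in enumerate(residues):
        for y in residues[i + 1:]:
            if x[0] == y[0]:
                continue                  # residues of the same sequence never align
            bonus = 0
            for z, w_xz in partners[x].items():
                if z == y:
                    continue
                w_zy = partners[z].get(y)
                if w_zy is not None:      # z is aligned to both x and y
                    bonus += min(w_xz, w_zy)
            if bonus:
                key = frozenset((x, y))
                extended[key] = extended.get(key, 0) + bonus
    return extended

# Hypothetical primary library for one residue in each of three sequences.
A1, B1, C1 = ("A", 1), ("B", 1), ("C", 1)
primary = {
    frozenset((A1, B1)): 88,
    frozenset((B1, C1)): 77,
    frozenset((A1, C1)): 50,
}
extended = extend_library(primary)
for pair, weight in extended.items():
    print(sorted(pair), weight)
```

In this toy case the weak direct A1/C1 match (50) is reinforced through B1, because both A1/B1 and B1/C1 support it.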

---

### Limitations

- **Slower than Clustal Omega:** The extra library construction and consistency computation steps add run time.  
- **Not ideal for very large datasets:** Memory and CPU requirements increase significantly with many sequences.  

---


In [None]:
import time
import requests
from Bio import AlignIO
from io import StringIO

fasta_data = """>Human_Hemoglobin_Alpha
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>Gorilla_Hemoglobin_Alpha
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>Orangutan_Hemoglobin_Alpha
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
"""


url = "https://www.ebi.ac.uk/Tools/services/rest/tcoffee/run"
params = {
    "email": "baran.6278m@gmail.com", 
    "sequence": fasta_data,
    "outfmt": "clustalw" 
}
response = requests.post(url, data=params)
if response.status_code != 200:
    raise Exception("Error submitting T-Coffee job: " + response.text)

job_id = response.text.strip()
print("Job submitted. Job ID:", job_id)

status_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/status/{job_id}"
while True:
    status = requests.get(status_url).text.strip()
    print("Status:", status)
    if status == "FINISHED":
        break
    elif status in ("ERROR", "FAILURE"):
        raise RuntimeError("T-Coffee job failed: " + status)
    time.sleep(5)

result_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/aln-clustalw"
alignment_text = requests.get(result_url).text
print("\n=== Alignment Result (ClustalW) ===\n")
print(alignment_text)

tree_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/tree"
tree_text = requests.get(tree_url).text
print("\n=== Guide Tree (Newick) ===\n")
print(tree_text)

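# The 'aln-color' result type may not be offered for every job; if it is missing,
# the service answers with a short XML error message instead of HTML.
html_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/aln-color"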
html_text = requests.get(html_url).text
print("\n=== HTML Colored Alignment Snippet ===\n")
print(html_text[:500]) 

aln_handle = StringIO(alignment_text)
alignment = AlignIO.read(aln_handle, "clustal")
print("Number of sequences:", len(alignment))
print("Alignment length:", alignment.get_alignment_length())


Job submitted. Job ID: tcoffee-R20251101-121745-0126-14793502-p1m
Status: FINISHED

=== Alignment Result (ClustalW) ===

CLUSTAL W (1.83) multiple sequence alignment

Human_Hemoglobin_Alpha      MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
Gorilla_Hemoglobin_Alpha    MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
Orangutan_Hemoglobin_Alpha  MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
                            **************************************************

Human_Hemoglobin_Alpha      HGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHC
Gorilla_Hemoglobin_Alpha    HGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHC
Orangutan_Hemoglobin_Alpha  HGSAQVKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHC
                            **************************************************

Human_Hemoglobin_Alpha      LLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Gorilla_Hemoglobin_Alpha    LLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Orangutan_Hemoglobin_Alpha  LLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
 

In [None]:
import time
import requests
from Bio import AlignIO
from io import StringIO

fasta_data = '''>Human_Hemoglobin
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

>Mouse_Hemoglobin
MVHLTDAEKAAVNGLWGKVNVEEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH

>Chicken_Hemoglobin
MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH
'''

url = "https://www.ebi.ac.uk/Tools/services/rest/tcoffee/run"
params = {
    "email": "baran.6278m@gmail.com", 
    "sequence": fasta_data,
    "outfmt": "clustalw" 
}
response = requests.post(url, data=params)
if response.status_code != 200:
    raise Exception("Error submitting T-Coffee job: " + response.text)

job_id = response.text.strip()
print("Job submitted. Job ID:", job_id)

status_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/status/{job_id}"
while True:
    status = requests.get(status_url).text.strip()
    print("Status:", status)
    if status == "FINISHED":
        break
    elif status in ("ERROR", "FAILURE"):
        raise RuntimeError("T-Coffee job failed: " + status)
    time.sleep(5)

result_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/aln-clustalw"
alignment_text = requests.get(result_url).text
print("\n=== Alignment Result (ClustalW) ===\n")
print(alignment_text)

tree_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/tree"
tree_text = requests.get(tree_url).text
print("\n=== Guide Tree (Newick) ===\n")
print(tree_text)

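# The 'aln-color' result type may not be offered for every job; if it is missing,
# the service answers with a short XML error message instead of HTML.
html_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/aln-color"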
html_text = requests.get(html_url).text

print("\n=== HTML Colored Alignment Snippet ===\n")
print(html_text[:500]) 

aln_handle = StringIO(alignment_text)
alignment = AlignIO.read(aln_handle, "clustal")
print("Number of sequences:", len(alignment))
print("Alignment length:", alignment.get_alignment_length())


Job submitted. Job ID: tcoffee-R20251101-122839-0982-35322104-p1m
Status: RUNNING
Status: FINISHED

=== Alignment Result (ClustalW) ===

CLUSTAL W (1.83) multiple sequence alignment

Human_Hemoglobin    MVHLTPEEKS-AVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDL
Mouse_Hemoglobin    MVHLTDAEKA-AVNGLWGKVNVEEVGGEALGRLLVVYPWTQRFFESFGDL
Chicken_Hemoglobin  MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDL
                    ***:*  **: *.   *.****:*******************:*:*****

Human_Hemoglobin    SSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV
Mouse_Hemoglobin    STPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV
Chicken_Hemoglobin  SSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV
                    *:..*:********************************************

Human_Hemoglobin    DPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Mouse_Hemoglobin    DPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH
Chicken_Hemoglobin  DPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH
                    ********************

In [None]:
import time
import requests
from Bio import AlignIO
from io import StringIO

fasta_data = '''>Human_COX1
MFVLSSWRVAVVAGLLVLTAGVAGAGGCVGLTVAKLAGKEVTGSGDVEVVGPLSVVGFAGVDLVGAAHAEFTGVVVEFGGAYALYDYVAA

>Human_Insulin
MALWMRLLPLLALLALWGPDPAAAQKLSGAQGTVLQKDSGTLEDQTLELEALTKLQS

>Human_Myoglobin
MGKVKVGVNGFGRIGRLVTRAAFNSGKVDIVLDSGDGVTHVV

'''

url = "https://www.ebi.ac.uk/Tools/services/rest/tcoffee/run"
params = {
    "email": "baran.6278m@gmail.com", 
    "sequence": fasta_data,
    "outfmt": "clustalw" 
}
response = requests.post(url, data=params)
if response.status_code != 200:
    raise Exception("Error submitting T-Coffee job: " + response.text)

job_id = response.text.strip()
print("Job submitted. Job ID:", job_id)

status_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/status/{job_id}"
while True:
    status = requests.get(status_url).text.strip()
    print("Status:", status)
    if status == "FINISHED":
        break
    elif status in ("ERROR", "FAILURE"):
        raise RuntimeError("T-Coffee job failed: " + status)
    time.sleep(5)

result_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/aln-clustalw"
alignment_text = requests.get(result_url).text
print("\n=== Alignment Result (ClustalW) ===\n")
print(alignment_text)

tree_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/tree"
tree_text = requests.get(tree_url).text
print("\n=== Guide Tree (Newick) ===\n")
print(tree_text)

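# The 'aln-color' result type may not be offered for every job; if it is missing,
# the service answers with a short XML error message instead of HTML.
html_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/aln-color"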
html_text = requests.get(html_url).text

print("\n=== HTML Colored Alignment Snippet ===\n")
print(html_text[:500]) 

aln_handle = StringIO(alignment_text)
alignment = AlignIO.read(aln_handle, "clustal")
print("Number of sequences:", len(alignment))
print("Alignment length:", alignment.get_alignment_length())


Job submitted. Job ID: tcoffee-R20251101-124623-0700-79181147-p1m
Status: FINISHED

=== Alignment Result (ClustalW) ===

CLUSTAL W (1.83) multiple sequence alignment

Human_COX1       MFVLSSWRVAVVAGLLVL----TAGVAGAGGCVGLTVAKLAGKEVTGSGD
Human_Insulin    MALWM--RLLPLLALLALWGPDPAAAQKLSGAQGTVLQKDSGTLEDQTLE
Human_Myoglobin  MGKVK------------------VGVNGFG-RIGRLVTRA----AFNSGK
                 *                      ...   .   *  : :        : .

Human_COX1       VEVVGPLSVVGFAGVDLVGAAHAEFTGVVVEFGGAYALYDYVAA
Human_Insulin    LEALT--------------K-------LQ--------------S
Human_Myoglobin  VDIVL--------------DSGDGVTHVV---------------
                 :: :                       :                





=== Guide Tree (Newick) ===

(Human_COX1:0.15500,Human_Insulin:0.14500,Human_Myoglobin:0.14500);

=== HTML Colored Alignment Snippet ===

<?xml version='1.0' encoding='UTF-8'?>
<error>
 <description>The requested model stream renderer 'aln-color' is not available</description>
</error>
Number of sequences:

In [None]:
import time
import requests
from Bio import AlignIO
from io import StringIO

fasta_data = '''>Chicken_Hemoglobin
MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH

>T_Rex_Hemoglobin
MVHFTAEEKASAAVTSWAKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH

>Human_Hemoglobin
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH


'''

url = "https://www.ebi.ac.uk/Tools/services/rest/tcoffee/run"
params = {
    "email": "baran.6278m@gmail.com", 
    "sequence": fasta_data,
    "outfmt": "clustalw"
}
response = requests.post(url, data=params)
if response.status_code != 200:
    raise Exception("Error submitting T-Coffee job: " + response.text)

job_id = response.text.strip()
print("Job submitted. Job ID:", job_id)

status_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/status/{job_id}"
while True:
    status = requests.get(status_url).text.strip()
    print("Status:", status)
    if status == "FINISHED":
        break
    elif status in ("ERROR", "FAILURE"):
        raise RuntimeError("T-Coffee job failed: " + status)
    time.sleep(5)

result_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/aln-clustalw"
alignment_text = requests.get(result_url).text
print("\n=== Alignment Result (ClustalW) ===\n")
print(alignment_text)

tree_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/tree"
tree_text = requests.get(tree_url).text
print("\n=== Guide Tree (Newick) ===\n")
print(tree_text)

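# The 'aln-color' result type may not be offered for every job; if it is missing,
# the service answers with a short XML error message instead of HTML.
html_url = f"https://www.ebi.ac.uk/Tools/services/rest/tcoffee/result/{job_id}/aln-color"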
html_text = requests.get(html_url).text

print("\n=== HTML Colored Alignment Snippet ===\n")
print(html_text[:500]) 

aln_handle = StringIO(alignment_text)
alignment = AlignIO.read(aln_handle, "clustal")
print("Number of sequences:", len(alignment))
print("Alignment length:", alignment.get_alignment_length())


Job submitted. Job ID: tcoffee-R20251101-125119-0761-80681257-p1m
Status: RUNNING
Status: FINISHED

=== Alignment Result (ClustalW) ===

CLUSTAL W (1.83) multiple sequence alignment

Chicken_Hemoglobin  MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDL
T_Rex_Hemoglobin    MVHFTAEEKASAAVTSWAKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDL
Human_Hemoglobin    MVHLTPEEKS-AVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDL
                    ***:*.***: *..: *.****:*******************:*******

Chicken_Hemoglobin  SSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV
T_Rex_Hemoglobin    SSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV
Human_Hemoglobin    SSASAIMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV
                    **************************************************

Chicken_Hemoglobin  DPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH
T_Rex_Hemoglobin    DPENFRLLGNVLVCVLAHHFGKEFTPQVQAAYQKVVAGVANALAHKYH
Human_Hemoglobin    DPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
                    ********************

# PRRN — Progressive + Iterative Refinement Algorithm (detailed)

**PRRN** is an MSA method that combines a standard progressive alignment backbone with repeated, randomized refinement cycles to improve alignment quality.  
It aims to correct mistakes introduced in the progressive stage by re-aligning parts of the MSA and accepting only changes that increase a global score.

---

### 1. Pairwise alignments
- Compute pairwise alignments for every sequence pair (e.g., Needleman–Wunsch for global alignment or Smith–Waterman for local similarity).  
- From pairwise scores derive a **Distance Matrix** `D` where `D[i,j]` measures dissimilarity between sequences `i` and `j`.

**Output:** distance matrix `D`.
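
A minimal sketch of this step, assuming a plain Needleman–Wunsch alignment with a simple match/mismatch score and a linear gap penalty in place of a real substitution matrix. The function names `nw_align` and `distance_matrix`, the scoring values, and the toy sequences are illustrative assumptions, not PRRN's actual implementation. Distance is taken here as one minus the fraction of identical aligned columns.

```python
from itertools import combinations

def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def distance_matrix(seqs):
    """D[i][j] = 1 - (identical columns / alignment length) for each pair."""
    n = len(seqs)
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        x, y = nw_align(seqs[i], seqs[j])
        identical = sum(1 for p, q in zip(x, y) if p == q)
        D[i][j] = D[j][i] = 1.0 - identical / len(x)
    return D

# Toy sequences (not from the workshop data).
D = distance_matrix(["ACGT", "ACGT", "AGGT"])
for row in D:
    print(row)
```

Identity-based distance is only one choice; a score-based or evolutionary-model distance could be substituted without changing the later steps.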

---

### 2. Guide tree construction
- Build a guide tree (e.g., Neighbor-Joining or UPGMA) from `D`.  
- The guide tree determines the order of progressive merging (which sequences/profiles get merged first).

**Output:** guide tree `T`.
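
As an illustration of guide tree construction, here is a small UPGMA (average-linkage) sketch that turns a distance matrix into a nested-tuple tree. The function name `upgma`, the tuple representation, and the example distances are assumptions made for this sketch; Neighbor-Joining would use a different merge criterion.

```python
def upgma(D, names):
    """Build a nested-tuple guide tree by average-linkage clustering (UPGMA)."""
    # clusters maps a cluster id to (subtree, number of leaves).
    clusters = {i: (names[i], 1) for i in range(len(names))}
    # dist keys are always stored as (smaller id, larger id).
    dist = {(i, j): D[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)          # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

# Hypothetical distances: A and B are close, C is the outlier.
D = [[0.0, 0.1, 0.5],
     [0.1, 0.0, 0.6],
     [0.5, 0.6, 0.0]]
print(upgma(D, ["A", "B", "C"]))
```

In the printed tree, `A` and `B` are grouped first, so they would be aligned to each other before `C` is added.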

---

### 3. Progressive alignment (initial MSA)
- Follow `T` from leaves to root, performing profile-to-profile (or sequence-to-profile) alignment to produce an **initial MSA**.  
- Use a substitution matrix (e.g., BLOSUM or PAM) and gap penalties for scoring during this phase.

**Output:** initial MSA `M0`.
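
The leaves-to-root order can be made concrete with a short traversal. This sketch only lists which groups would be merged at each step; it does not perform the profile alignment itself. The nested-tuple tree format and the name `merge_order` are assumptions for illustration.

```python
def merge_order(tree):
    """List the merges implied by a nested-tuple guide tree, leaves first."""
    steps = []

    def visit(node):
        if isinstance(node, str):      # a leaf is a single sequence name
            return [node]
        left, right = node
        left_members = visit(left)
        right_members = visit(right)
        # Both children are fully built before they are merged.
        steps.append((left_members, right_members))
        return left_members + right_members

    visit(tree)
    return steps

tree = (("Human", "Mouse"), "Chicken")
for left, right in merge_order(tree):
    print("align", left, "with", right)
```

Each printed step corresponds to one sequence-to-sequence, sequence-to-profile, or profile-to-profile alignment in the progressive stage.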

---

### 4. Weighting 
- Compute weights to reduce bias from over-represented, very similar sequences.
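- A simple distance-based per-sequence weight (example):

$$
w_i = \frac{\sum_{j \neq i} D[i, j]}{\sum_{k} \sum_{j \neq k} D[k, j]}
$$

where `D[i, j]` is the distance between sequences `i` and `j` from step 1. A sequence that is far from the others receives a larger weight and near-duplicates receive smaller ones, which matches the goal above; this normalization makes the weights sum to 1.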

- You may also compute **per-column reliability** weights (e.g., columns with many gaps get lower column weight).

**Purpose:** make the global scoring function reflect *diverse* signal rather than raw counts of similar sequences.
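
A minimal sketch of distance-based weighting, assuming each weight is proportional to a sequence's summed distance to all other sequences, so that near-duplicates are down-weighted. The name `sequence_weights` and the example matrix are hypothetical, and this is one simple scheme, not necessarily the one PRRN implements.

```python
def sequence_weights(D):
    """Weights proportional to each sequence's total distance to the others."""
    totals = [sum(d for j, d in enumerate(row) if j != i)
              for i, row in enumerate(D)]
    grand_total = sum(totals)
    if grand_total == 0:
        # All sequences identical: fall back to uniform weights.
        return [1.0 / len(D)] * len(D)
    return [t / grand_total for t in totals]

# Hypothetical distances: sequences 0 and 1 are near-duplicates.
D = [[0.0, 0.1, 0.5],
     [0.1, 0.0, 0.6],
     [0.5, 0.6, 0.0]]
weights = sequence_weights(D)
print([round(w, 3) for w in weights])
```

The outlier (index 2) ends up with the largest weight, while the two similar sequences share the remainder.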

---

### 5. Scoring (global MSA score)
- Compute a single scalar score `S(M)` for an MSA `M` that measures overall alignment quality using the weights.  
- A commonly used form (sum-of-pairs weighted) is:

$$
S(M) = \sum_{i<j} w_i \, w_j \cdot \text{score}_{ij}(M)
$$

where:
- `w_i, w_j` are sequence weights from step 4,  
- `score_{ij}(M)` is the pairwise alignment score between sequences `i` and `j` **within** the MSA (computed from substitution matrix and gap penalties).

**Purpose:** `S(M)` provides a single objective to compare alternative MSAs. PRRN accepts changes only if `S` increases.
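
The weighted sum-of-pairs form can be sketched as follows. For brevity this uses a match/mismatch score and a linear gap penalty where a real tool would use a substitution matrix such as BLOSUM and affine gaps; the function names and scoring values are assumptions.

```python
from itertools import combinations

def pair_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Score two rows of an MSA column by column; gap-gap columns are skipped."""
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total

def weighted_sp_score(msa, weights):
    """S(M) = sum over i<j of w_i * w_j * score_ij(M)."""
    return sum(weights[i] * weights[j] * pair_score(msa[i], msa[j])
               for i, j in combinations(range(len(msa)), 2))

# Toy alignment with uniform weights.
msa = ["AC-T", "ACGT", "AGGT"]
print(weighted_sp_score(msa, [1, 1, 1]))
```

With uniform weights of 1 this reduces to the ordinary sum-of-pairs score; non-uniform weights scale each pair's contribution.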

---

### 6. Iterative refinement loop
Repeat the following until convergence or iteration limit:

1. **Select** one sequence or a small subset (randomized selection).  
2. **Remove** the selected sequence(s) from the current MSA, leaving a reduced MSA `M⁻`.  
3. **Realign** the removed sequence(s) back to `M⁻` (sequence → profile alignment or profile → profile). This produces a candidate MSA `M'`.  
4. **Recompute weights** if weights depend on MSA (optional; often weights are recomputed periodically).  
5. **Compute** `S(M')`.  
   - If `S(M') > S(M)`, **accept** the new MSA (`M ← M'`) — change retained.  
   - Else **reject** `M'` and keep previous `M`.  


**Output:** final MSA `M*` and optional derived guide tree.
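
The accept/reject logic of the loop can be sketched on a toy scale. Everything here is an assumption for illustration: the realignment step is a brute-force search over gap placements within the existing alignment width (feasible only for tiny examples), the score is an unweighted sum-of-pairs, and the names `sp_score`, `brute_force_realign`, and `refine` are hypothetical. A real implementation would use dynamic-programming profile alignment and the weighted score.

```python
import random
from itertools import combinations

def sp_score(msa, match=1, mismatch=-1, gap=-2):
    """Unweighted sum-of-pairs score; gap-gap columns are ignored."""
    total = 0
    for a, b in combinations(msa, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total

def brute_force_realign(rest, seq, k, score_fn):
    """Try every gap placement of seq within the current width; keep the best."""
    width = len(rest[0])
    best_msa, best_score = None, None
    for cols in combinations(range(width), len(seq)):
        row = ["-"] * width
        for col, residue in zip(cols, seq):
            row[col] = residue
        candidate = rest[:k] + ["".join(row)] + rest[k:]
        s = score_fn(candidate)
        if best_score is None or s > best_score:
            best_msa, best_score = candidate, s
    return best_msa

def refine(msa, score_fn=sp_score, realign_fn=brute_force_realign,
           max_iter=50, seed=0):
    rng = random.Random(seed)
    current, best = list(msa), score_fn(msa)
    for _ in range(max_iter):
        k = rng.randrange(len(current))            # 1. select a sequence
        removed = current[k].replace("-", "")      # 2. remove it (ungapped)
        rest = current[:k] + current[k + 1:]
        candidate = realign_fn(rest, removed, k, score_fn)   # 3. realign
        cand_score = score_fn(candidate)           # 5. score the candidate
        if cand_score > best:                      # accept only improvements
            current, best = candidate, cand_score
    return current, best

start = ["ACGTA", "ACGTA", "A-CGT"]
final, score = refine(start)
print(sp_score(start), "->", score)
print(final)
```

Because a candidate is accepted only when the score strictly increases, the score never decreases across iterations, which is the property the refinement stage relies on.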



## [PRRN Result](https://www.genome.jp/tools-bin/prrn)
>input [3:91]  ( 1 - 91 )
%  5.0000000e-01  1.0000000e+00  5.0000000e-01

       1 MFVLSSWRVAVVAGLLVLTAGVAGAGGCVGLTVAKLAGKEVTGSGDVEVVGPLSVVGFAG| Human_COX1
       1 M---ALW-----MRLLPL------------LALLALWGPDPAAAQKLSGAQGT-------| Human_Insulin
       1 M------------------------------------GKVKVGVNGFGRIGRL-------| Human_Myoglobin
         M     $       @@ @            @    @ G    .   o             

      61 VDLVGAA-HAEFTGVVVEFGGAYALYDYVAA                             | Human_COX1
      34 --VLQKD-SGTLEDQTLELEALTKL----QS                             | Human_Insulin
      18 --VTRAAFNSGKVDIVLDSGDGVTH----VV                             | Human_Myoglobin

### Understanding the PRRN Output

The PRRN output shown above represents the **multiple sequence alignment** of three highly divergent human proteins — COX1, Insulin, and Myoglobin.  
Because these sequences are evolutionarily and functionally unrelated, the alignment displays very few matching regions.

---

####  Explanation of Key Elements:

- **Sequence Labels (e.g., Human_COX1, Human_Insulin):**  
  These identify each protein sequence being aligned.

- **Numbers on the Left (e.g., 1, 61):**  
  Indicate the position of the first residue shown on that row within each sequence.

- **Dashes (-):**  
  Represent **gaps** inserted during alignment to maximize similarity across columns.

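- **Symbols Below the Alignment (consensus line):**
  - A residue letter (e.g., `M`, `G`) → the same amino acid appears in every sequence at that column  
  - `@`, `$`, `o`, `.` and similar marks → weaker or partial similarity (specific to PRRN notation)  
  - Blank → no similarity marked for that column  
  - Clustal-format output instead uses `*` for identical columns and `:` or `.` for conservative or semi-conservative substitutions

- **Few Residue Letters in the Consensus Line:**  
  Indicates that the sequences are **highly divergent**: they share almost no conserved positions.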
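
As a rough cross-check of the consensus line, one can count how many columns are identical across all rows and how many contain at least one gap. The helper name `column_summary` is hypothetical; the rows are copied from the first block of the alignment above.

```python
def column_summary(rows):
    """Return (identical_columns, gapped_columns, total_columns)."""
    identical = gapped = 0
    for column in zip(*rows):
        if "-" in column:
            gapped += 1                 # at least one sequence has a gap here
        elif len(set(column)) == 1:
            identical += 1              # same residue in every sequence
    return identical, gapped, len(rows[0])

rows = [
    "MFVLSSWRVAVVAGLLVLTAGVAGAGGCVGLTVAKLAGKEVTGSGDVEVVGPLSVVGFAG",  # Human_COX1
    "M---ALW-----MRLLPL------------LALLALWGPDPAAAQKLSGAQGT-------",  # Human_Insulin
    "M------------------------------------GKVKVGVNGFGRIGRL-------",  # Human_Myoglobin
]
identical, gapped, total = column_summary(rows)
print(f"{identical} identical and {gapped} gapped columns out of {total}")
```

A low identical count combined with a high gapped count is a quick numerical signal that the alignment is forcing unrelated sequences together.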

### Output

The PRRN alignment of **Human COX1**, **Human Insulin**, and **Human Myoglobin** shows very little similarity among the sequences.  
There are many **gaps** and very few conserved residues, indicating that these proteins are **structurally and functionally unrelated**.  
Each protein serves a different biological role, which explains the **low alignment consistency** and **scattered matches** in the result.



>input [3:148]  ( 1 - 148 )
%  5.0000000e-01  5.0000000e-01  1.0000000e+00

       1 MVHLTPEEKSAVT-ALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNP| Human_Hemoglobin
       1 MVHLTDAEKAAVN-GLWGKVNVEEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNP| Mouse_Hemoglobin
       1 MVHFTAEEKASAAVTSWAKVNVEEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNP| Chicken_Hemoglob
         MVHoT  EK..   . W.KVNV_EVGGEALGRLLVVYPWTQR$F_SFGDLS.. A@MGNP

      60 KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF| Human_Hemoglobin
      60 KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF| Mouse_Hemoglobin
      61 KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCD------------------------| Chicken_Hemoglob
         KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCD+@+@_.__$+@@._@@@.@@.++$

     120 GKEFTPPVQAAYQKVVAGVANALAHKYH                                | Human_Hemoglobin
     120 GKEFTPQVQAAYQKVVAGVANALAHKYH                                | Mouse_Hemoglobin
      97 ----------------------------                                | Chicken_Hemoglob
         .+_$.. @_..$_+@@..@._.@.++$+                                



### Output

This PRRN alignment compares **Human**, **Mouse**, and **Chicken Hemoglobin** sequences.  
Human and Mouse show **high similarity**, while Chicken is **more divergent** in the first block, including a one-residue insertion. The long run of **gaps** at the end arises because the Chicken input stops after residue 96, so it reflects a shorter sequence and not evolutionary divergence.  
Overall, the alignment highlights conserved regions common to all three, showing how PRRN manages **moderate evolutionary differences** among species.


>input [3:137]  ( 1 - 137 )
%  5.0000000e-01  5.0000000e-01  1.0000000e+00

       1 MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVAD| Human_Hemoglobin
       1 MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVAD| Gorilla_Hemoglob
       1 MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVAD| Orangutan_Hemogl
         MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVAD

      61 ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHAS| Human_Hemoglobin
      61 ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHAS| Gorilla_Hemoglob
      61 ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHAS| Orangutan_Hemogl
         ALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHAS

     121 LDKFLASVSTVLTSKYR                                           | Human_Hemoglobin
     121 LDKFLASVSTVLTSKYR                                           | Gorilla_Hemoglob
     121 LDKFLASVSTVLTSKYR                                           | Orangutan_Hemogl
         LDKFLASVSTVLTSKYR                                           

### Output

This PRRN alignment shows **Human**, **Gorilla**, and **Orangutan Hemoglobin** sequences.  
All three sequences are **identical across all 137 aligned positions**, with **no gaps or mismatches**, reflecting their **very close evolutionary relationship**.  


##  Comparing Clustal Omega, T-Coffee, and PRRN

>
       1 MFVLSSWRVAVVAGLLVLTAGVAGAGGCVGLTVAKLAGKEVTGSGDVEVVGPLSVVGFAG| Human_COX1
       1 M---ALW-----MRLLPL------------LALLALWGPDPAAAQKLSGAQGT-------| Human_Insulin
       1 M------------------------------------GKVKVGVNGFGRIGRL-------| Human_Myoglobin
         M     $       @@ @            @    @ G    .   o             

      61 VDLVGAA-HAEFTGVVVEFGGAYALYDYVAA                             | Human_COX1
      34 --VLQKD-SGTLEDQTLELEALTKL----QS                             | Human_Insulin
      18 --VTRAAFNSGKVDIVLDSGDGVTH----VV                             | Human_Myoglobin

Human_COX1       MFVLSSWRVAVVAGLLVL----TAGVAGAGGCVGLTVAKLAGKEVTGSGD
Human_Insulin    MALWM--RLLPLLALLALWGPDPAAAQKLSGAQGTVLQKDSGTLEDQTLE
Human_Myoglobin  MGKVK------------------VGVNGFG-RIGRLVTRA----AFNSGK
                 *                      ...   .   *  : :        : .

Human_COX1       VEVVGPLSVVGFAGVDLVGAAHAEFTGVVVEFGGAYALYDYVAA
Human_Insulin    LEALT--------------K-------LQ--------------S
Human_Myoglobin  VDIVL--------------DSGDGVTHVV---------------
                 :: :                       :                

Human_COX1           ----------MFVLSSWRVAVVAGLLVLTAGVAGAGGCVGLTVAKLAGKEVTGSGDVEVV	50
Human_Insulin        MALWMRLLPLLALLALWGPDPAAA--------QKLSGAQGTVLQKDSGTLEDQTLELEAL	52
Human_Myoglobin      ------------------------MGKVKVGVNGFGRI-G----RLVTRAAFNSGKVDIV	31
                                                        .   *    :        : .:: :

Human_COX1           GPLSVVGFAGVDLVGAAHAEFTGVVVEFGGAYALYDYVAA	90
Human_Insulin        TKLQS-----------------------------------	57
Human_Myoglobin      --------------LDSGDGVTHVV---------------	42
                                                             


We applied the same sequences to three different MSA algorithms. Here is a detailed comparison highlighting **methods, performance, and reasoning behind the results**.

---

### Method Used by Each Algorithm

- **Clustal Omega**:  
  Uses a **distance-based progressive alignment**. It first calculates pairwise distances between sequences, builds a guide tree, and aligns sequences step by step according to this tree.

- **T-Coffee**:  
  Uses a **consistency-based progressive alignment**. It generates a library of pairwise alignments (both global and local), then aligns sequences to **maximize consistency** across all pairwise comparisons. This reduces errors that can propagate in purely distance-based progressive methods.

- **PRRN**:  
  Uses **progressive alignment followed by randomized iterative refinement**. It builds an initial MSA along a guide tree, then repeatedly removes and realigns sequences, keeping a change only when the weighted global score improves. Its output adds a consensus line that shows the shared residue for identical columns and symbols such as `@`, `.`, and `$` for partial similarity.

---

- **Accuracy in Similar Sequences**:  
  - **All three algorithms** handle highly similar sequences well.  
  - **Clustal Omega** is fast and produces straightforward alignments.  
  - **T-Coffee** also aligns them accurately, with little difference from Clustal Omega in this case.  
  - **PRRN** highlights conserved residues clearly in its consensus line, although the output shown here does not display the guide tree.

- **Accuracy in Divergent Sequences**:  
  - **T-Coffee** tends to perform best because its consistency-based scoring reduces the errors that accumulate in purely progressive alignments.  
  - **Clustal Omega** can misalign divergent sequences because early decisions driven by the pairwise distances and guide tree are not revisited by default.  
  - **PRRN** can correct some early misalignments through iterative refinement, at the cost of extra computation. For unrelated proteins such as COX1, Insulin, and Myoglobin, none of the three alignments is biologically meaningful.

- **Representation of Conservation**:  
  - **PRRN** excels in showing **residue-level conservation**, using explicit symbols.  
  - **T-Coffee** implicitly preserves conservation through its library-based scoring.  
  - **Clustal Omega** implicitly shows conservation via identical residues, but less obvious in divergent regions.

- **Guide Tree Output**:  
  - **Clustal Omega**: Yes, based on distance.  
  - **T-Coffee**: Optional, can be generated from the library.  
  - **PRRN**: Built internally to order the progressive stage; not displayed in the output shown here.

---

### Which Algorithm is “Better”

- **T-Coffee**:  
  - **Best for accuracy across diverse sequences**.  
  - Consistency-based scoring reduces alignment errors in divergent regions.  

- **Clustal Omega**:  
  - **Fast and reliable for similar sequences**.  
  - May misalign divergent sequences due to propagation of errors in progressive alignment.

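- **PRRN**:  
  - **Useful for residue-level conservation analysis and when alignment quality matters more than speed**, since iterative refinement can repair errors left by the progressive stage.  
  - The output shown here includes no guide tree, so it is less convenient for inspecting evolutionary relationships directly.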

---




>Human_P53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDE

>Human_GFP
MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTT

>Human_CytochromeC
MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGFSYTDANKNKGITWKE

>Human_Lactate_Dehydrogenase
MSTSVVIRNLSYTVQLGPAQGSLVAGHEGQKLPGVLSVGAPLAAGLKGKGVVVWGH

>Human_Insulin
MALWMRLLPLLALLALWGPDPAAAQKLSGAQGTVLQKDSGTLEDQTLELEALTKLQS

>Human_Tubulin
MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGTYHGDSDLQLERINVYYNEATGGV

>Human_Myoglobin
MGKVKVGVNGFGRIGRLVTRAAFNSGKVDIVLDSGDGVTHVV

>Human_COX1
MFVLSSWRVAVVAGLLVLTAGVAGAGGCVGLTVAKLAGKEVTGSGDVEVVGPLSVVGFAGVDLVGAAHAEFTGVVVEFGGAYALYDYVAA

>Human_Hemoglobin_Alpha
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKVADAL

>Human_Hemoglobin_Beta
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFDSFGDLSSASAIMGNPKV
