# Sequence Alignment
* Sequence Alignment is a method of arranging sequences of DNA, RNA, Amino Acids or Proteins to identify regions of similarity;
* The similarity being identified, may be a result of functional, structural or evolutionary relationships between the sequences;
* It is useful in identifying similarity and homology;
* Homology: descent from a commo ancestor or source

## Terms

* Matches
* Mismatches
* Gap

In [32]:
def represent_terms():
    print("* ACTCGC   A    C        T       C       G   \n* |.||     |    .        |       |           \n* ATTC-T   A    T        T       C       -   \n*        Match Mismatch Match  Match    Gap  ")

In [33]:
represent_terms()

* ACTCGC   A    C        T       C       G   
* |.||     |    .        |       |           
* ATTC-T   A    T        T       C       -   
*        Match Mismatch Match  Match    Gap  


### Alignment Types
* Global Alignment: finds the best concordance/agreement between all characters in two sequences
    * Mostly from end to end
    * Proposed by Needle
* Local Alignment: finds just the subsequences that allign the best
    * In this method, we consider subsequences within each of the two sequences and try to match them to get the best alignment
    * Proposed by Water


# When to use Local Alignment
* Two sequences have a small matched region
* Two sequences are of different lenghts
* Overlapping sequences
* One sequence is a subsequence of the other
* BLAST
* Emboss

In [3]:
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

In [4]:
from Bio.Seq import Seq

In [5]:
seq1 = Seq("ACTCGT")
seq2 = Seq("ATTCG")

In [6]:
# Global allignment
alignments= pairwise2.align.globalxx(seq1, seq2)
alignments

[Alignment(seqA='ACT-CGT', seqB='A-TTCG-', score=4.0, start=0, end=7),
 Alignment(seqA='AC-TCGT', seqB='A-TTCG-', score=4.0, start=0, end=7),
 Alignment(seqA='ACTCGT', seqB='ATTCG-', score=4.0, start=0, end=6)]

In [7]:
# To display the allignment 
def display_alignments(allignments, position=None):
    try:
        if position == None:
            for pos, alignment in enumerate(alignments):
                print(f"Alignment nº{pos + 1}:")
                print(format_alignment(*alignment))
        else: print(format_alignment(*alignments[position]))
    except:
        if type(alignments[0]) != "Alignment":
            raise TypeError("Check the alignment parameter if contains alignment type objects")
        if position not in range(len(alignments)):
            raise ValueError("Position invalid, it is not in the range of the alignment")
        

In [8]:
display_alignments(alignments)

Alignment nº1:
ACT-CGT
| | || 
A-TTCG-
  Score=4

Alignment nº2:
AC-TCGT
|  ||| 
A-TTCG-
  Score=4

Alignment nº3:
ACTCGT
|.||| 
ATTCG-
  Score=4



In [9]:
display_alignments(alignments, 0)

ACT-CGT
| | || 
A-TTCG-
  Score=4



In [18]:
# Local Alignment
local_alignment = pairwise2.align.localxx(seq1, seq2)
local_alignment_only_by_score = pairwise2.align.localxx(seq1, seq2, one_alignment_only=True, score_only=True)

In [19]:
display_alignments(local_alignment)

Alignment nº1:
ACT-CGT
| | || 
A-TTCG-
  Score=4

Alignment nº2:
AC-TCGT
|  ||| 
A-TTCG-
  Score=4

Alignment nº3:
ACTCGT
|.||| 
ATTCG-
  Score=4



In [20]:
#Get the allignment by only the score
alignment_only_by_score = pairwise2.align.globalxx(seq1, seq2, one_alignment_only=True, score_only=True)
alignment_only_by_score

4.0

# Check for similarity or percentage of similarity using alignment
* Fraction od nucleotides that is the same / total number of nucleotides * 100%

In [21]:
def check_percentage(alignment, sequence):
    return f"{alignment / len(sequence) * 100}%"

In [22]:
check_percentage(alignment_only_by_score, seq1)

'66.66666666666666%'

In [24]:
check_percentage(local_alignment_only_by_score, seq1)

'66.66666666666666%'

# Find out all the possible Global alignments with maximum similarity score
* Matching Characters: 2 pts;
* Each mismatching characters: -1pt;
* Opening a Gap: -0.5pt.
* Extending a Gap: -0.1pt.

In [27]:
# Global alignment with max simularity
glb_alignment = pairwise2.align.globalms(seq1, seq2, 2, -1, -0.5, -0.1)
display_alignments(glb_alignment)

Alignment nº1:
ACT-CGT
| | || 
A-TTCG-
  Score=4

Alignment nº2:
AC-TCGT
|  ||| 
A-TTCG-
  Score=4

Alignment nº3:
ACTCGT
|.||| 
ATTCG-
  Score=4

