# Sequence Alignment

+ Sequence alignment is a method of arranging DNA, RNA, or protein (amino acid) sequences to identify regions of similarity.
+ The similarity it reveals may result from functional, structural, or evolutionary relationships between the sequences.
+ It is therefore useful for identifying both similarity and homology.
+ Homology: descent from a common ancestor or source.

### Terms

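+ Match: the same character appears at an aligned position in both sequences
+ Mismatch: different characters are aligned at the same position (a substitution)
+ Gap: one sequence is aligned against a gap character (`-`), representing an insertion or deletion (indel)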
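To make the three terms concrete, here is a minimal sketch that counts each kind of column in two already-aligned strings. It assumes `-` is the gap character. `classify_columns` is a hypothetical helper, not a Biopython function. The example pairs are alignments that Biopython reports later in this notebook.

```python
def classify_columns(aligned_a: str, aligned_b: str) -> dict:
    """Count match, mismatch and gap columns in a gapped pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    counts = {"matches": 0, "mismatches": 0, "gaps": 0}
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            counts["gaps"] += 1        # one sequence has a gap in this column
        elif x == y:
            counts["matches"] += 1     # identical characters
        else:
            counts["mismatches"] += 1  # substitution
    return counts

print(classify_columns("ACT-CGT", "A-TTCG-"))  # {'matches': 4, 'mismatches': 0, 'gaps': 3}
print(classify_columns("ACTCGT", "ATTCG-"))    # {'matches': 4, 'mismatches': 1, 'gaps': 1}
```

These two alignments both contain four matches. They trade a mismatch for two extra gaps.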

<img src="aligngap.png">

### Alignment Types

+ Global alignment: finds the best agreement between all characters of two sequences
  + Aligns the sequences from end to end
  + Performed by EMBOSS `needle` (Needleman-Wunsch algorithm)
+ Local alignment: finds only the subsequences that align best
  + Considers subsequences within each of the two sequences and matches them to obtain the best-scoring alignment
  + Performed by EMBOSS `water` (Smith-Waterman algorithm)
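The dynamic-programming recurrence behind global alignment (Needleman-Wunsch) can be sketched in a few lines. This sketch makes two simplifying assumptions:

+ It uses a linear gap penalty, so every gap position costs the same. This is simpler than the affine open/extend penalties used later with `globalms`.
+ It returns only the best score, not the alignment itself.

`global_score` is a hypothetical helper, not part of Biopython.

```python
def global_score(a: str, b: str, match: float = 1.0,
                 mismatch: float = 0.0, gap: float = 0.0) -> float:
    """Needleman-Wunsch: best score of an end-to-end alignment of a and b.

    Assumes a linear gap penalty: every gap position costs `gap`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # F[i][j] holds the best score for aligning a[:i] with b[:j]
    F = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = F[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = F[i][j - 1] + gap  # b[j-1] aligned to a gap
            F[i][j] = max(diag, up, left)
    # Global: both sequences must be fully consumed, so read the bottom-right cell
    return F[-1][-1]

print(global_score("ACTCGT", "ATTCG"))                   # 4.0
print(global_score("ACTCGT", "ATTCG", 2.0, -1.0, -0.5))  # 6.5
```

With the defaults (match 1, mismatch 0, no gap cost) the score is simply the number of matched positions. With match 2, mismatch -1 and a linear gap cost of -0.5 it returns 6.5. That value agrees with the affine `globalms` result later in the notebook only because the optimal alignments there contain no gap longer than one position.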

<img src="Global_vs_Local_alignment.png">

### When to use Local Alignment

+ The two sequences share only a small matching region
+ The two sequences differ in length
+ The sequences overlap
+ One sequence is a subsequence of the other
+ Examples of local alignment tools: BLAST and EMBOSS `water`
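The first case above, a small matching region inside otherwise unrelated sequences, is where local alignment helps most. Below is a minimal Smith-Waterman sketch. It assumes a linear gap penalty and illustrative scores (match +2, mismatch -1, gap -2) chosen for this example, not taken from BLAST or EMBOSS. `local_score` is a hypothetical helper.

```python
def local_score(a: str, b: str, match: float = 2.0,
                mismatch: float = -1.0, gap: float = -2.0) -> float:
    """Smith-Waterman: best score over all pairs of aligned substrings of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # First row and column stay 0: a local alignment may start anywhere
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 discards a poorly scoring prefix instead of carrying it
            H[i][j] = max(0.0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])  # a local alignment may also end anywhere
    return best

# Two sequences that share only the short region GATTACA
print(local_score("TTTTGATTACATTTT", "CCCGATTACACCC"))  # 14.0 (7 matches x 2)
```

The unrelated flanks (TTTT and CCC) do not lower the local score. A global alignment would have to pay for them.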

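```python
from Bio.Seq import Seq
# Note: Bio.pairwise2 is deprecated in recent Biopython releases;
# Bio.Align.PairwiseAligner is the recommended replacement
from Bio import pairwise2
from Bio.pairwise2 import format_alignment
```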

```python
seq1 = Seq("ACTCGT")
seq2 = Seq("ATTCG")
```

#### Global Alignment

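`globalxx` uses the simplest scoring scheme:

+ The first `x` means a match scores 1 and a mismatch scores 0.
+ The second `x` means gaps are not penalized.

The score is therefore the number of matching positions.

```python
alignments = pairwise2.align.globalxx(seq1, seq2)
```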

```python
alignments
```

```
[Alignment(seqA='ACT-CGT', seqB='A-TTCG-', score=4.0, start=0, end=7),
 Alignment(seqA='AC-TCGT', seqB='A-TTCG-', score=4.0, start=0, end=7),
 Alignment(seqA='ACTCGT', seqB='ATTCG-', score=4.0, start=0, end=6)]
```

```python
# Pretty-print the first alignment
print(format_alignment(*alignments[0]))
```

```
ACT-CGT
| | || 
A-TTCG-
  Score=4
```



```python
# Pretty-print every co-optimal global alignment
for global_alignment in alignments:
    print(format_alignment(*global_alignment))
```

```
ACT-CGT
| | || 
A-TTCG-
  Score=4

AC-TCGT
|  ||| 
A-TTCG-
  Score=4

ACTCGT
|.||| 
ATTCG-
  Score=4
```



#### Local Alignment

```python
local_alignment = pairwise2.align.localxx(seq1, seq2)
```

```python
for local_align in local_alignment:
    print(format_alignment(*local_align))
```

```
1 ACT-CG
  | | ||
1 A-TTCG
  Score=4

1 AC-TCG
  |  |||
1 A-TTCG
  Score=4

1 ACTCG
  |.|||
1 ATTCG
  Score=4
```



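### Check the percentage similarity of two sequences using an alignment

+ Percentage similarity = (number of identical nucleotides / total number of nucleotides) × 100%
+ With `xx` scoring every match scores 1 and nothing is penalized, so the alignment score equals the number of matches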

```python
seq1
```

```
Seq('ACTCGT')
```

```python
seq2
```

```
Seq('ATTCG')
```

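```python
# Under xx scoring the score is the number of matches; divide by the length of seq1
local_alignment[0].score / len(seq1) * 100
```

```
66.66666666666666
```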
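Dividing by `len(seq1)` is one choice of denominator. Other common choices give different answers for the same alignment. The minimal sketch below compares three of them:

+ `alignment`: the alignment length, with gap columns included
+ `seq1`: the length of the first sequence
+ `shorter`: the length of the shorter sequence

It works on the gapped strings of an alignment. `percent_identity` and its `denominator` argument are hypothetical, defined only for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str,
                     denominator: str = "alignment") -> float:
    """Percentage of identical (non-gap) columns in a gapped pairwise alignment."""
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    ungapped_a = len(aligned_a.replace("-", ""))
    ungapped_b = len(aligned_b.replace("-", ""))
    lengths = {
        "alignment": len(aligned_a),             # all columns, gaps included
        "seq1": ungapped_a,                      # length of the first sequence
        "shorter": min(ungapped_a, ungapped_b),  # length of the shorter sequence
    }
    return matches / lengths[denominator] * 100

for choice in ("alignment", "seq1", "shorter"):
    print(choice, percent_identity("ACT-CGT", "A-TTCG-", choice))
```

For the alignment `ACT-CGT` / `A-TTCG-` these give roughly 57.1%, 66.7% and 80% respectively. It is worth stating which denominator is used whenever a percentage is reported.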

### Find all global alignments with the maximum score under a custom scoring scheme

+ Each matching character: +2 points
+ Each mismatching character: -1 point
+ Opening a gap: -0.5 points
+ Extending a gap: -0.1 points per additional gap position

```python
# globalms: "m" = custom match/mismatch scores,
# "s" = the same gap open/extend penalties for both sequences
glb_alignment = pairwise2.align.globalms(seq1, seq2, 2, -1, -0.5, -0.1)
```

```python
for align in glb_alignment:
    print(format_alignment(*align))
```

```
ACT-CGT
| | || 
A-TTCG-
  Score=6.5

AC-TCGT
|  ||| 
A-TTCG-
  Score=6.5

ACTCGT
|.||| 
ATTCG-
  Score=6.5
```
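To see where `Score=6.5` comes from, the minimal sketch below rescores each reported alignment by hand. `score_alignment` is a hypothetical helper. It assumes the usual affine convention: a gap of length n costs the opening penalty plus (n - 1) extension penalties, tracked separately for gaps in each sequence.

All gaps in these alignments have length 1, so only the opening penalty (-0.5) applies. Both ways of reaching the maximum give 6.5:

+ Four matches and three gap openings: 8 - 1.5 = 6.5
+ Four matches, one mismatch and one gap opening: 8 - 1 - 0.5 = 6.5

```python
def score_alignment(aligned_a: str, aligned_b: str, match: float, mismatch: float,
                    gap_open: float, gap_extend: float) -> float:
    """Score a gapped pairwise alignment with an affine gap penalty.

    Assumption: a gap of length n costs gap_open + (n - 1) * gap_extend.
    """
    score = 0.0
    prev_gap = None  # which sequence had a gap in the previous column: "a", "b" or None
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Continuing a gap in the same sequence is an extension, otherwise an opening
            score += gap_extend if prev_gap == side else gap_open
            prev_gap = side
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score

reported = [("ACT-CGT", "A-TTCG-"), ("AC-TCGT", "A-TTCG-"), ("ACTCGT", "ATTCG-")]
for a, b in reported:
    # Custom scheme (globalms) and xx scheme (globalxx) side by side
    print(a, b, score_alignment(a, b, 2, -1, -0.5, -0.1), score_alignment(a, b, 1, 0, 0, 0))
```

Each line prints 6.5 for the custom scheme and 4.0 for the `xx` scheme, matching the scores Biopython reported.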



# Well Done!