## Pairwise alignments

Pairwise sequence alignment compares two sequences at a time and returns the best possible alignment by maximizing the number of matching or similar characters. Its purpose is to identify regions of similarity that could have functional biological roles. A correct alignment should reflect the evolutionary history of the molecules. Gaps in the alignment can be interpreted as insertions or deletions that occurred during the evolution of the molecules involved.

Pairwise sequence alignment uses a [dynamic programming algorithm](https://en.wikipedia.org/wiki/Dynamic_programming), which simplifies a complicated problem by breaking it down into simpler sub-problems. Each sequence is written as a row of a matrix, with its nucleotide or amino acid residues, in one-letter code, occupying the columns. Residues that match are placed in the same column, and gaps are inserted where regions do not align. The whole process is guided by a scoring function, and the algorithm searches for the arrangement with the maximum score.
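
As a rough illustration of the dynamic programming idea, the sketch below fills a score table in which each cell depends only on three neighbouring sub-problems. It is a minimal, pure-Python version of global alignment scoring with a linear gap cost, not the implementation used by Biopython; the function name `global_score` and its default parameters are assumptions chosen for this example.

```python
def global_score(a, b, match=1, mismatch=0, gap=0):
    """Best global alignment score of a and b (hypothetical helper, linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] aligned with a gap
                score[i][j - 1] + gap,       # b[j-1] aligned with a gap
            )
    return score[-1][-1]


print(global_score("ACGT", "AGT"))                               # 3
print(global_score("ACGT", "AGT", match=1, mismatch=-1, gap=-1))  # 2
```

With the default parameters (match 1, everything else 0) the score is simply the number of matched positions; adding penalties makes gaps and mismatches costly.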

There are two types of pairwise alignment: global and local. The former forces the alignment to span the full length of both sequences, while the latter looks for short regions of similarity between sequences that, in some cases, may otherwise be quite different.
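
To make the idea of a local alignment more concrete, here is a minimal pure-Python sketch of local alignment scoring. The only change from a global score table is that a cell may never drop below zero, so an alignment can start and end anywhere. The function name `local_score` and the penalty values (mismatch -1, gap -1) are assumptions made for illustration; they are not the defaults of any Biopython function.

```python
def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (hypothetical helper, linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                           # start a new local alignment here
                score[i - 1][j - 1] + pair,  # extend by aligning two residues
                score[i - 1][j] + gap,       # extend with a gap in b
                score[i][j - 1] + gap,       # extend with a gap in a
            )
            best = max(best, score[i][j])
    return best


# The shared motif "ACG" is found even though the flanking regions differ
print(local_score("TTTACGTTT", "GGACGGG"))  # 3
```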

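Biopython has a dedicated module, "Bio.pairwise2", for aligning sequences with a pairwise method. It provides functions for computing global and local alignments between two sequences. Recent Biopython releases mark this module as deprecated in favor of "Bio.Align.PairwiseAligner", but the concepts shown here are the same.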

Let's take two simple, hypothetical sequences as an example of how to use the pairwise module.

In [1]:
# Import libraries 
from Bio import pairwise2 
from Bio.Seq import Seq 

# Creating sample sequences 
seq1 = Seq("AGGATCGTTGGCGCCCGACC") 
seq2 = Seq("CTTAGACTCTGTACTGAGTT") 
  
# Finding similarities 
alignments = pairwise2.align.globalxx(seq1, seq2)
  
# Showing results 
for match in alignments:
    print(match)

('---AGGATCGTTGGC-GCC--C-GACC---', 'CTTA-GA-C--T--CTG--TACTGA--GTT', 10.0, 0, 30)
('---AGGATCGTTGGC-GCC--C-GACC---', 'CTTAG-A-C--T--CTG--TACTGA--GTT', 10.0, 0, 30)
('---AGGATCGTTGGC-GCC--C-GACC---', 'CTTA-GA-C-T---CTG--TACTGA--GTT', 10.0, 0, 30)
('---AGGATCGTTGGC-GCC--C-GACC---', 'CTTAG-A-C-T---CTG--TACTGA--GTT', 10.0, 0, 30)
('---AGGATCGT-TGGCGCC--C-GACC---', 'CTTA-GA-C-TCT---G--TACTGA--GTT', 10.0, 0, 30)
('---AGGATCGT-TGGCGCC--C-GACC---', 'CTTAG-A-C-TCT---G--TACTGA--GTT', 10.0, 0, 30)
('---AGGA-TCGTTGGCGCC--C-GACC---', 'CTTA-GACTC--T---G--TACTGA--GTT', 10.0, 0, 30)
('---AGGA-TCGTTGGCGCC--C-GACC---', 'CTTAG-ACTC--T---G--TACTGA--GTT', 10.0, 0, 30)
('---AGGA-TCGTTGGCGCC--C-GACC---', 'CTTA-GACTC-T----G--TACTGA--GTT', 10.0, 0, 30)
('---AGGA-TCGTTGGCGCC--C-GACC---', 'CTTAG-ACTC-T----G--TACTGA--GTT', 10.0, 0, 30)
('---AGGATCGT-TGGCGCC--C-GACC---', 'CTTA-GA-C-TCT-G----TACTGA--GTT', 10.0, 0, 30)
('---AGGATCGT-TGGCGCC--C-GACC---', 'CTTAG-A-C-TCT-G----TACTGA--GTT', 10.0, 0, 30)
('---AGGA-TCGTTG

These alignments are difficult to read in this raw form. However, Biopython has a function (format_alignment) that prints the alignments in a much more readable way:

In [2]:
from Bio.pairwise2 import format_alignment

for i, match in enumerate(alignments):
    print('Alignment '+str(i).zfill(2)+':') # print an index for each alignment
    print(format_alignment(*match))
    print() # print a blank line to make the output easier to read

Alignment 00:
---AGGATCGTTGGC-GCC--C-GACC---
   | || |  |  | |    | ||     
CTTA-GA-C--T--CTG--TACTGA--GTT
  Score=10


Alignment 01:
---AGGATCGTTGGC-GCC--C-GACC---
   || | |  |  | |    | ||     
CTTAG-A-C--T--CTG--TACTGA--GTT
  Score=10


Alignment 02:
---AGGATCGTTGGC-GCC--C-GACC---
   | || | |   | |    | ||     
CTTA-GA-C-T---CTG--TACTGA--GTT
  Score=10


Alignment 03:
---AGGATCGTTGGC-GCC--C-GACC---
   || | | |   | |    | ||     
CTTAG-A-C-T---CTG--TACTGA--GTT
  Score=10


Alignment 04:
---AGGATCGT-TGGCGCC--C-GACC---
   | || | | |   |    | ||     
CTTA-GA-C-TCT---G--TACTGA--GTT
  Score=10


Alignment 05:
---AGGATCGT-TGGCGCC--C-GACC---
   || | | | |   |    | ||     
CTTAG-A-C-TCT---G--TACTGA--GTT
  Score=10


Alignment 06:
---AGGA-TCGTTGGCGCC--C-GACC---
   | || ||  |   |    | ||     
CTTA-GACTC--T---G--TACTGA--GTT
  Score=10


Alignment 07:
---AGGA-TCGTTGGCGCC--C-GACC---
   || | ||  |   |    | ||     
CTTAG-ACTC--T---G--TACTGA--GTT
  Score=10


Alignment 08:
---AGGA-TCGTTGGCGCC--C-GAC

There are more than a hundred alternative alignments, all with the same best score. This result is very common, because most of the alignments differ by merely a few gap positions. Alignments are therefore often curated by hand, with expert assistance, so that they better reflect the evolutionary process. Including more sequences can also help, because conserved positions among a diverse set of evolutionarily related sequences guide the alignment. This method (multiple sequence alignment) will be the next topic of our course.
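
When the full list of equally scoring alignments is not needed, the search can be limited to a single result. The sketch below assumes a Biopython installation in which "Bio.pairwise2" is still available, and uses its `one_alignment_only` keyword; plain strings are passed instead of Seq objects for brevity.

```python
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

seq1 = "AGGATCGTTGGCGCCCGACC"
seq2 = "CTTAGACTCTGTACTGAGTT"

# All alignments that share the best score
all_alignments = pairwise2.align.globalxx(seq1, seq2)
print("Number of alignments returned:", len(all_alignments))

# Ask for a single representative alignment only
best = pairwise2.align.globalxx(seq1, seq2, one_alignment_only=True)
print(format_alignment(*best[0]))
```

Which of the equivalent alignments is returned is an implementation detail, so it should not be read as the biologically preferred one.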

### globalxx

Here, the "globalxx" function does the main work. Its name follows the convention `<alignment type>XX`, where "XX" is a two-character code indicating the parameters the function takes. The first character selects how matches and mismatches are scored, while the second selects how gaps are penalized.

Match parameters:

|Code character|Description|
|:---:|:---:|
|x|No parameters. Identical characters score 1, otherwise 0.|
|m|A match score for identical characters, otherwise a mismatch score.|
|d|A dictionary returns the score of any pair of characters.|
|c|A callback function returns the scores.|

Gap penalty parameters:

|Code character|Description|
|:---:|:---:|
|x|No gap penalties.|
|s|Both sequences have the same open and extend gap penalties.|
|d|The sequences have different open and extend gap penalties.|
|c|A callback function returns the gap penalties.|
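
As a brief illustration of how the two code characters combine, the sketch below calls "globalms" (explicit match and mismatch scores, same gap penalties for both sequences) and "globalxs" (default match scoring, same gap penalties). It assumes a Biopython installation in which "Bio.pairwise2" is still available; the short sequences and the numeric values are arbitrary choices for this example.

```python
from Bio import pairwise2

seq_a = "ACCGT"
seq_b = "ACG"

# "m" + "s": positional arguments are match, mismatch, gap open, gap extend
score_ms = pairwise2.align.globalms(seq_a, seq_b, 2, -1, -0.5, -0.1, score_only=True)

# "x" + "s": matches score 1 and mismatches 0; only the gap penalties are given
score_xs = pairwise2.align.globalxs(seq_a, seq_b, -0.5, -0.1, score_only=True)

print("globalms score:", score_ms)
print("globalxs score:", score_xs)
```

The `score_only=True` keyword skips building the alignments and returns just the best score, which is convenient when comparing scoring schemes.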

### localxx

In [7]:
# Finding similarities 
alignments = pairwise2.align.localxx(seq1, seq2)
  
# Showing results 
for i, match in enumerate(alignments):
    print('Alignment '+str(i).zfill(2)+':') # print an index for each alignment
    print(format_alignment(*match))
    print() # print a blank line to make the output easier to read

Alignment 00:
1 AGGATCGTTGGC-GCC--C-GA
  | || |  |  | |    | ||
4 A-GA-C--T--CTG--TACTGA
  Score=10


Alignment 01:
1 AGGATCGTTGGC-GCC--C-GA
  || | |  |  | |    | ||
4 AG-A-C--T--CTG--TACTGA
  Score=10


Alignment 02:
1 AGGATCGTTGGC-GCC--C-GA
  | || | |   | |    | ||
4 A-GA-C-T---CTG--TACTGA
  Score=10


Alignment 03:
1 AGGATCGTTGGC-GCC--C-GA
  || | | |   | |    | ||
4 AG-A-C-T---CTG--TACTGA
  Score=10


Alignment 04:
1 AGGATCGT-TGGCGCC--C-GA
  | || | | |   |    | ||
4 A-GA-C-TCT---G--TACTGA
  Score=10


Alignment 05:
1 AGGATCGT-TGGCGCC--C-GA
  || | | | |   |    | ||
4 AG-A-C-TCT---G--TACTGA
  Score=10


Alignment 06:
1 AGGA-TCGTTGGCGCC--C-GA
  | || ||  |   |    | ||
4 A-GACTC--T---G--TACTGA
  Score=10


Alignment 07:
1 AGGA-TCGTTGGCGCC--C-GA
  || | ||  |   |    | ||
4 AG-ACTC--T---G--TACTGA
  Score=10


Alignment 08:
1 AGGA-TCGTTGGCGCC--C-GA
  | || || |    |    | ||
4 A-GACTC-T----G--TACTGA
  Score=10


Alignment 09:
1 AGGA-TCGTTGGCGCC--C-GA
  || | || |    |    | ||
4 AG-ACTC-T----G--

### Sources of information:

* [Biopython - Pairwise alignment](https://www.geeksforgeeks.org/biopython-pairwise-alignment/)
* [Random DNA sequence generator](https://faculty.ucr.edu/~mmaduro/random.htm)

Check also:
* The [Pairwise alignment chapter](http://readiab.org/book/0.1.3/2/1) in [An Introduction to Applied Bioinformatics](https://github.com/applied-bioinformatics/An-Introduction-To-Applied-Bioinformatics)