# Pairwise Sequence Alignment

**Pairwise sequence alignment** identifies regions of similarity that may indicate functional, structural, or evolutionary relationships between two biological sequences (protein or nucleic acid).

Identifying these similar regions lets us infer, for example, which traits are conserved between species, how closely related different species are genetically, and how species evolve.

## Types of Pairwise Alignments

1. **Global Alignments**: This method finds the best alignment over the entire length of the two sequences. It answers the question: what is the maximum similarity between sequence X and sequence Y?
2. **Local Alignments**: This method finds the most similar pair of subsequences in the two sequences. It answers the question: what is the maximum similarity between a subsequence of X and a subsequence of Y?
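
The difference between the two types can be illustrated with a minimal pure-Python sketch of the classic dynamic-programming scores (Needleman-Wunsch for global, Smith-Waterman for local). The function names `global_score` and `local_score`, the default scores, and the linear gap penalty (every gap character costs the same) are assumptions made for illustration; this is not how Biopython implements its aligners.

```python
def global_score(x, y, match=1, mismatch=-1, gap=-1):
    """Best score for aligning ALL of x with ALL of y (linear gap penalty)."""
    prev = [j * gap for j in range(len(y) + 1)]   # row 0: y aligned against gaps only
    for i in range(1, len(x) + 1):
        curr = [i * gap]                          # column 0: x aligned against gaps only
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]                               # bottom-right cell covers both full sequences


def local_score(x, y, match=1, mismatch=-1, gap=-1):
    """Best score over any subsequence of x aligned with any subsequence of y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0]
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)  # 0 restarts the alignment
            curr.append(cell)
            best = max(best, cell)                # the best cell can be anywhere in the table
        prev = curr
    return best


print(global_score("AAACGT", "CGTTTT"))  # -3
print(local_score("AAACGT", "CGTTTT"))   # 3
```

The two sequences share only the fragment `CGT`. The local score rewards that fragment alone, while the global score also has to pay for the unmatched ends.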

When doing alignments, you can specify the match score and gap penalties.

1. The **match score** measures how compatible two aligned characters are. Highly compatible characters should be given positive scores, and incompatible ones should be given negative scores or 0.
2. The **gap penalties** are the costs of introducing gaps into an alignment and should be negative.
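
To make the interplay of match scores and gap penalties concrete, here is a small sketch that scores an alignment that has already been written out with `-` for gaps. The helper `score_alignment` and its parameter names are hypothetical. It assumes that a gap of length n costs `gap_open + (n - 1) * gap_extend`, which is consistent with the `pairwise2` scores shown in this document.

```python
def score_alignment(row_a, row_b, match=1, mismatch=0, gap_open=0, gap_extend=0):
    """Score two equal-length aligned rows in which '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    prev_gap = None                      # row that held a gap in the previous column
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        gap = "a" if a == "-" else "b" if b == "-" else None
        if gap is None:
            score += match if a == b else mismatch
        elif gap == prev_gap:
            score += gap_extend          # the gap run continues in the same row
        else:
            score += gap_open            # a new gap run starts
        prev_gap = gap
    return score


# One contiguous gap of length 2: 4 matches - 2 (open) - 1 (extend)
print(score_alignment("ACCGGT", "AC--GT", gap_open=-2, gap_extend=-1))  # 1
# Two separate gaps of length 1: 4 matches - 2 - 2
print(score_alignment("ACCGGT", "A-C-GT", gap_open=-2, gap_extend=-1))  # 0
```

With a gap open penalty larger in magnitude than the extend penalty, one long gap is cheaper than several short ones, so alignments with contiguous gaps are preferred.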

## `Bio.pairwise2`

Biopython includes two built-in pairwise aligners: the `Bio.pairwise2` module and the `PairwiseAligner` class in the `Bio.Align` module. Both can perform global and local alignments. The names of the alignment functions in the `pairwise2` module follow the convention **alignmenttypeXY**, where **alignmenttype** is either "global" or "local" and **XY** is a two-character code indicating the parameters the function takes. The first character, **X**, indicates the parameters for matches (and mismatches), and the second, **Y**, indicates the parameters for gap penalties.

The match parameters are:
1. x - No parameters. Identical characters score 1, all other pairs score 0
2. m - Identical characters receive the match score, all other pairs receive the mismatch score. Keywords: **match**, **mismatch**
3. d - A dictionary supplies the score for each pair of characters. Keyword: **match_dict**
4. c - A callback function returns the score for a pair of characters. Keyword: **match_fn**

The gap penalty parameters are:
1. x - No gap penalties
2. s - Same open and extend gap penalties for both sequences. Keywords: **open**, **extend**
3. d - The sequences have different open and extend gap penalties. Keywords: **openA**, **extendA**, **openB** and **extendB**
4. c - A callback function returns the gap penalties. Keywords: **gap_A_fn**, **gap_B_fn**

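Local alignments use the same parameter codes and keywords. The only difference is that you call the `local` variant of the function instead of the `global` one, for example `localxx` instead of `globalxx`.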
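
The naming convention can be summarized in a short sketch that splits a function name into its parts and lists the keywords it expects. The lookup tables simply restate the two lists above. The helper `describe` is hypothetical and is not part of Biopython.

```python
from itertools import product

# Keywords implied by each code letter, restating the lists in this section.
MATCH_KEYWORDS = {
    "x": [],
    "m": ["match", "mismatch"],
    "d": ["match_dict"],
    "c": ["match_fn"],
}
GAP_KEYWORDS = {
    "x": [],
    "s": ["open", "extend"],
    "d": ["openA", "extendA", "openB", "extendB"],
    "c": ["gap_A_fn", "gap_B_fn"],
}


def describe(name):
    """Split a name such as 'localms' into (alignment type, expected keywords)."""
    kind, match_code, gap_code = name[:-2], name[-2:-1], name[-1:]
    if kind not in ("global", "local"):
        raise ValueError(f"unknown alignment type: {kind!r}")
    if match_code not in MATCH_KEYWORDS or gap_code not in GAP_KEYWORDS:
        raise ValueError(f"unknown parameter code: {match_code + gap_code!r}")
    return kind, MATCH_KEYWORDS[match_code] + GAP_KEYWORDS[gap_code]


# Every name the convention can produce: 2 types x 4 match codes x 4 gap codes.
all_names = [
    kind + m + g
    for kind, m, g in product(("global", "local"), MATCH_KEYWORDS, GAP_KEYWORDS)
]

print(len(all_names))        # 32
print(describe("localms"))   # ('local', ['match', 'mismatch', 'open', 'extend'])
```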

### Examples of Global Alignment

In [1]:
from Bio import pairwise2



In [2]:
# globalxx - matches score 1, mismatches score 0, no gap penalty
alignments = pairwise2.align.globalxx("ACCGGT", "ACGT")
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCGGT
| | ||
A-C-GT
  Score=4

ACCGGT
||  ||
AC--GT
  Score=4

ACCGGT
| || |
A-CG-T
  Score=4

ACCGGT
|| | |
AC-G-T
  Score=4



In [3]:
# globalmx - matches score 2, mismatches score -1, no gap penalty
alignments = pairwise2.align.globalmx("ACCGGT", "ACGT", match=2, mismatch=-1)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCGGT
| | ||
A-C-GT
  Score=8

ACCGGT
||  ||
AC--GT
  Score=8

ACCGGT
| || |
A-CG-T
  Score=8

ACCGGT
|| | |
AC-G-T
  Score=8



In [4]:
# globalxs - matches score 1, mismatches score 0, gap open penalty -2, gap extend penalty -1
alignments = pairwise2.align.globalxs("ACCGGT", "ACGT", open=-2, extend=-1)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCGGT
||  ||
AC--GT
  Score=1



In [6]:
# globaldx - match/mismatch scores read from the BLOSUM62 matrix, no gap penalty
from Bio.Align import substitution_matrices

matrix = substitution_matrices.load("BLOSUM62") # blosum62 scoring matrix for sequence alignment
alignments = pairwise2.align.globaldx("KEVLA", "EVL", match_dict=matrix)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

KEVLA
 ||| 
-EVL-
  Score=13
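
The score of 13 can be checked by hand with a toy dictionary in the spirit of **match_dict**. The three entries below are BLOSUM62 values copied manually for illustration. The dictionary `toy_scores` and the helper `pair_score` are hypothetical and stand in for the full matrix object loaded above.

```python
# A few BLOSUM62 entries, copied by hand for illustration only.
toy_scores = {("E", "E"): 5, ("V", "V"): 4, ("L", "L"): 4}


def pair_score(a, b, scores):
    """Look up a residue pair, accepting either key order."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


# Aligned columns of KEVLA / -EVL- that contain no gap; gaps cost nothing here.
aligned_pairs = list(zip("EVL", "EVL"))
total = sum(pair_score(a, b, toy_scores) for a, b in aligned_pairs)
print(total)  # 13
```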



In [7]:
# globalmc - matches score 5, mismatches -4, gap penalty defined through function gap_function
from math import log

def gap_function(x, y):     # x is the gap position in the sequence, y is the gap length
    if y == 0:              # No gap
        return 0
    elif y == 1:            # Gap of length 1: only the gap open penalty
        return -2
    return -(2 + y / 4.0 + log(y) / 2.0)    # Longer gaps: penalty grows with gap length

alignments = pairwise2.align.globalmc("ACCCCCGT", "ACG", match=5, mismatch=-4, gap_A_fn=gap_function, gap_B_fn=gap_function)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))


ACCCCCGT
|    || 
A----CG-
  Score=9.30685

ACCCCCGT
||    | 
AC----G-
  Score=9.30685
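
As a sanity check, the reported score can be reproduced by hand from the gap function. The sketch below redefines the same function with descriptive parameter names and adds up the three matches, the gap of length 4, and the trailing gap of length 1. The position values passed in are placeholders, because this particular function ignores the position.

```python
from math import log


def gap_function(position, length):
    """Same penalty as in the cell above; the position argument is not used."""
    if length == 0:
        return 0
    if length == 1:
        return -2
    return -(2 + length / 4.0 + log(length) / 2.0)


matches = 3 * 5                       # A, C and G each score 5
long_gap = gap_function(0, 4)         # the run of four gaps: -(2 + 1 + log(4) / 2)
end_gap = gap_function(0, 1)          # the single gap opposite the final T: -2
score = matches + long_gap + end_gap

print(round(score, 5))  # 9.30685
```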

