# Pairwise Sequence Alignments in Biopython

## Context:
- Pairwise Sequence Alignment is used to identify regions of similarity that may indicate functional, structural and/or evolutionary relationships between two biological sequences (protein or nucleic acid)
- Identifying similar regions lets us infer, for example, which traits are conserved between species, how closely related species are genetically, and how species evolve
- Pairwise sequence alignment uses dynamic programming to find the optimal alignment between the two sequences, scoring it by similarity (how alike they are) or distance (how different they are), and then assessing the significance of this score
- Two types - global (aligning the sequences over their full length) and local (aligning the best-matching subsequences)
- Two parameters - match score (indicates compatibility of two characters) and gap penalty (penalty assigned for inserting a gap, i.e. an insertion or deletion)
- Two built-in pairwise aligners: **Bio.pairwise2 module** and **PairwiseAligner class within Bio.Align module**
- The names of the alignment functions in Bio.pairwise2 follow the convention alignmenttypeXY, where alignmenttype is either “global” or “local”
- XY is a 2-character code indicating the parameters the function takes
- The first character X selects the parameters for matches (and mismatches), and the second character Y selects the parameters for gap penalties

### Match parameters:
- x - No parameters. Identical characters have score of 1, otherwise 0
- m - A match score is the score of identical chars, otherwise mismatch score. Keywords: match, mismatch
- d - A dictionary returns the score of any pair of characters. Keyword: match_dict
- c - A callback function returns scores. Keyword: match_fn

### Gap penalty parameters:
- x - No gap penalties
- s - Same open and extend gap penalties for both sequences. Keywords: open, extend
- d - The sequences have different open and extend gap penalties. Keywords: openA, extendA, openB, extendB
- c - A callback function returns the gap penalties. Keywords: gap_A_fn, gap_B_fn
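
The naming convention and the two code tables above can be made concrete with a small lookup. The helper below, `describe_pairwise2_name`, is a hypothetical illustration written for this tutorial (it is not a Biopython function); it only restates the tables in this section.

```python
# Hypothetical helper (not part of Biopython): decode a pairwise2-style
# function name such as "globalms" using the two tables above.
MATCH_CODES = {
    "x": "identical characters score 1, otherwise 0 (no keywords)",
    "m": "match/mismatch scores (keywords: match, mismatch)",
    "d": "scores from a dictionary (keyword: match_dict)",
    "c": "scores from a callback (keyword: match_fn)",
}
GAP_CODES = {
    "x": "no gap penalties",
    "s": "same open/extend penalties for both sequences (keywords: open, extend)",
    "d": "different penalties per sequence (keywords: openA, extendA, openB, extendB)",
    "c": "penalties from callbacks (keywords: gap_A_fn, gap_B_fn)",
}


def describe_pairwise2_name(name):
    """Return (alignment type, match rule, gap rule) for an alignmenttypeXY name."""
    for alignment_type in ("global", "local"):
        codes = name[len(alignment_type):]
        if name.startswith(alignment_type) and len(codes) == 2:
            match_code, gap_code = codes
            if match_code in MATCH_CODES and gap_code in GAP_CODES:
                return alignment_type, MATCH_CODES[match_code], GAP_CODES[gap_code]
    raise ValueError(f"not an alignmenttypeXY name: {name!r}")


kind, match_rule, gap_rule = describe_pairwise2_name("localmc")
print(kind, "|", match_rule, "|", gap_rule)
```

For example, `globalds` decodes to a global alignment that reads scores from a dictionary and uses the same open and extend penalties for both sequences.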

## Example of Global Alignment

In [1]:
from Bio import pairwise2

In [2]:
# globalxx - matches score 1, mismatches 0 and no gap penalty.
alignments = pairwise2.align.globalxx("ACCGGT", "ACGT") 
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCGGT
| | ||
A-C-GT
  Score=4

ACCGGT
||  ||
AC--GT
  Score=4

ACCGGT
| || |
A-CG-T
  Score=4

ACCGGT
|| | |
AC-G-T
  Score=4



Interpretation: The function returns every global alignment that achieves the best score. Here four different alignments tie at a score of 4.
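
The dynamic programming behind a global alignment can be sketched in a few lines. This is a minimal, score-only Needleman-Wunsch sketch in plain Python, not Biopython's implementation; the function name `global_score` and its single linear `gap` parameter are assumptions made for illustration. With match 1, mismatch 0 and no gap penalty it gives the same score as the globalxx call above.

```python
def global_score(seq_a, seq_b, match=1, mismatch=0, gap=0):
    """Best global alignment score with a linear per-position gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # table[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + gap  # seq_a prefix against gaps only
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + gap  # seq_b prefix against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # align the two characters
                table[i - 1][j] + gap,       # gap in seq_b
                table[i][j - 1] + gap,       # gap in seq_a
            )
    return table[-1][-1]


print(global_score("ACCGGT", "ACGT"))  # 4
```

Several alignments can share the best score because more than one of the three options in `max` may tie in a cell; each tie is a branch point when tracing back through the table.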

In [3]:
# globalmx - matches score 2, mismatches -1. No gap penalty.
alignments = pairwise2.align.globalmx("ACCGGT", "ACGT", match=2, mismatch=-1) 
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCGGT
| | ||
A-C-GT
  Score=8

ACCGGT
||  ||
AC--GT
  Score=8

ACCGGT
| || |
A-CG-T
  Score=8

ACCGGT
|| | |
AC-G-T
  Score=8



In [4]:
# globaldx - matching/mismatching scores read from blosum62 matrix, no gap penalty
from Bio.Align import substitution_matrices
matrix = substitution_matrices.load("BLOSUM62") # blosum62 scoring matrix for sequence alignment of proteins
alignments = pairwise2.align.globaldx("KEVLA", "EVL", match_dict=matrix)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

KEVLA
 ||| 
-EVL-
  Score=13



Interpretation: Each aligned pair is scored from the BLOSUM62 substitution matrix. The unaligned K and A at the ends cost nothing because no gap penalty is set.
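
The score of 13 can be checked by hand by adding up the matrix entries column by column. The sketch below copies only the three BLOSUM62 diagonal entries this alignment needs (E/E = 5, V/V = 4, L/L = 4) into a small dictionary; `score_columns` is a hypothetical helper for illustration, not a Biopython function.

```python
# Excerpt of BLOSUM62: only the three entries used by this alignment.
BLOSUM62_EXCERPT = {("E", "E"): 5, ("V", "V"): 4, ("L", "L"): 4}


def score_columns(aligned_a, aligned_b, scores, gap_penalty=0):
    """Sum the pair scores of an already-aligned pair of strings."""
    total = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            total += gap_penalty  # 0 here, matching the "x" gap code
        else:
            total += scores[(a, b)]
    return total


print(score_columns("KEVLA", "-EVL-", BLOSUM62_EXCERPT))  # 13
```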

In [5]:
# globalmc - matches score 5, mismatches -4, gap penalty defined through function gap_function
from math import log
def gap_function(x, y):  # x is gap position in seq, y is gap length
    if y == 0:  # No gap
        return 0
    elif y == 1:  # Gap open penalty
        return -2
    return -(2 + y / 4.0 + log(y) / 2.0)

alignments = pairwise2.align.globalmc("ACCCCCGT", "ACG", match=5, mismatch=-4,
                                      gap_A_fn=gap_function, gap_B_fn=gap_function)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCCCCGT
|    || 
A----CG-
  Score=9.30685

ACCCCCGT
||    | 
AC----G-
  Score=9.30685



Interpretation: The gap penalties come from the user-defined gap_function, which receives each gap's position and length.
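
The reported score of 9.30685 can be reproduced with plain arithmetic. Both alignments have three matches (5 each), one gap of length 4 and one gap of length 1 at the end. The sketch repeats the gap function from the cell above; the position arguments passed to it are arbitrary, since this particular function ignores the position.

```python
from math import log


def gap_function(x, y):  # x is gap position in seq, y is gap length
    if y == 0:  # No gap
        return 0
    elif y == 1:  # Gap of length 1
        return -2
    return -(2 + y / 4.0 + log(y) / 2.0)


match_total = 3 * 5             # three matching columns, 5 each
long_gap = gap_function(1, 4)   # the four-character gap: about -3.69315
end_gap = gap_function(7, 1)    # the single trailing gap: -2
total = match_total + long_gap + end_gap
print(round(total, 5))  # 9.30685
```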

## Example of Local Alignment

In [6]:
# localxx - matches score 1, mismatches 0 and no gap penalty.
alignments = pairwise2.align.localxx("ACCGGT", "ACGT") 
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCGGT
| | ||
A-C-GT
  Score=4

ACCGGT
||  ||
AC--GT
  Score=4

ACCGGT
| || |
A-CG-T
  Score=4

ACCGGT
|| | |
AC-G-T
  Score=4
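
Local alignment uses the same dynamic programming idea with one change: a cell's score is never allowed to drop below zero, and the result is the best cell anywhere in the table rather than the last one. The following is a minimal, score-only Smith-Waterman sketch in plain Python, not Biopython's implementation; `local_score` and its single linear `gap` parameter are assumptions for illustration, and the second pair of sequences is made up.

```python
def local_score(seq_a, seq_b, match=1, mismatch=0, gap=0):
    """Best local alignment score with a linear per-position gap penalty."""
    best = 0
    previous = [0] * (len(seq_b) + 1)  # previous row of the score table
    for a in seq_a:
        current = [0]
        for j, b in enumerate(seq_b, start=1):
            pair = match if a == b else mismatch
            cell = max(
                0,                       # start a fresh local alignment
                previous[j - 1] + pair,  # align the two characters
                previous[j] + gap,       # gap in seq_b
                current[j - 1] + gap,    # gap in seq_a
            )
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


print(local_score("ACCGGT", "ACGT"))  # 4, same as the localxx score above
# Made-up sequences sharing only "ACG": the flanking mismatches are left out.
print(local_score("TTTACGTTT", "GGACGGG", match=2, mismatch=-1, gap=-2))  # 6
```

With the localxx scoring nothing is ever penalized, so the local and global scores coincide for this pair; the difference only shows once mismatches or gaps cost something, as in the second call.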



In [7]:
# localmx - matches score 2, mismatches -1. No gap penalty.
alignments = pairwise2.align.localmx("ACCGGT", "ACGT", match=2, mismatch=-1) 
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

ACCGGT
| | ||
A-C-GT
  Score=8

ACCGGT
||  ||
AC--GT
  Score=8

ACCGGT
| || |
A-CG-T
  Score=8

ACCGGT
|| | |
AC-G-T
  Score=8



In [8]:
# localdx - matching/mismatching scores read from blosum62 matrix, no gap penalty
from Bio.Align import substitution_matrices
matrix = substitution_matrices.load("BLOSUM62") # blosum62 scoring matrix for sequence alignment of proteins
alignments = pairwise2.align.localdx("KEVLA", "EVL", match_dict=matrix)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

2 EVL
  |||
1 EVL
  Score=13



In [9]:
# localmc - matches score 5, mismatches -4, gap penalty defined through function gap_function
from math import log
def gap_function(x, y):  # x is gap position in seq, y is gap length
    if y == 0:  # No gap
        return 0
    elif y == 1:  # Gap open penalty
        return -2
    return -(2 + y / 4.0 + log(y) / 2.0)

alignments = pairwise2.align.localmc("ACCCCCGT", "ACG", match=5, mismatch=-4,
                                     gap_A_fn=gap_function, gap_B_fn=gap_function)
for alignment in alignments:
    print(pairwise2.format_alignment(*alignment))

1 ACCCCCG
  |    ||
1 A----CG
  Score=11.3069

1 ACCCCCG
  ||    |
1 AC----G
  Score=11.3069



## Summary:
In this tutorial, we performed global and local alignments between two sequences. For scoring, we used in turn the default match parameters, user-defined match and mismatch scores, the BLOSUM62 substitution matrix, and a user-defined gap function.

# Finish!