# Global and local alignments with pairwise2

https://towardsdatascience.com/pairwise-sequence-alignment-using-biopython-d1a9d0ba861f

http://biopython.org/DIST/docs/api/Bio.pairwise2-module.html


Let's write some code to perform pairwise sequence alignment with Biopython. We will use the `pairwise2` module, which is part of the `Bio` package.

`pairwise2` provides functions for global and local alignment of two sequences. A global alignment finds the best concordance between all characters in the two sequences. A local alignment finds only the subsequences that align best.

When doing alignments, you can specify the match score and gap penalties. The match score measures how compatible two aligned characters are: highly compatible characters should get positive scores, and incompatible ones a negative score or 0. The gap penalties should be negative.

The names of the alignment functions follow the convention:
`<alignment type>XX`

where `<alignment type>` is either `global` or `local` and `XX` is a two-character code indicating the parameters the function takes. The first character selects the parameters for matches (and mismatches), and the second selects the parameters for gap penalties.

The match parameters are:

x = No parameters. Identical characters have score of 1, else 0.

m = Identical characters get the match score; all other pairs get the mismatch score.

d = A dictionary returns the score of any pair of characters.

c = A callback function returns scores.
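
To make the four match options concrete, here is a minimal pure-Python sketch of what each one amounts to as a scoring rule. It does not call Biopython: the function names, the example score table, and the purine rule in the callback are all hypothetical and chosen only for illustration.

```python
# Illustrative stand-ins for the four match-parameter styles (not Biopython code).

def score_x(a, b):
    """'x': identical characters score 1, anything else 0."""
    return 1 if a == b else 0

def make_score_m(match, mismatch):
    """'m': the caller supplies one match score and one mismatch score."""
    return lambda a, b: match if a == b else mismatch

def make_score_d(table):
    """'d': the caller supplies a dictionary keyed by character pairs."""
    return lambda a, b: table[(a, b)]

def score_c(a, b):
    """'c': any callback; this made-up one is lenient on purine/purine pairs."""
    purines = {"A", "G"}
    if a == b:
        return 2
    return 1 if a in purines and b in purines else -1

score_m = make_score_m(2, -1)
score_d = make_score_d({("A", "A"): 3, ("A", "G"): 1, ("G", "A"): 1, ("G", "G"): 3})

print(score_x("A", "A"), score_x("A", "G"))  # 1 0
print(score_m("A", "A"), score_m("A", "G"))  # 2 -1
print(score_d("A", "G"))                     # 1
print(score_c("A", "G"), score_c("A", "C"))  # 1 -1
```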



The gap penalty parameters are:

x = No gap penalties.

s = Same open and extend gap penalties for both sequences.

d = The sequences have different open and extend gap penalties.

c = A callback function returns the gap penalties.
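
The naming convention can be read mechanically. The sketch below is a hypothetical helper (`describe` is not part of Biopython) that splits a function name into its alignment type and its two parameter codes, using the descriptions listed above.

```python
# Hypothetical helper: explain a pairwise2-style function name such as "globalms".
MATCH_CODES = {
    "x": "identical characters score 1, else 0",
    "m": "match and mismatch scores are passed as arguments",
    "d": "a dictionary supplies the score of each character pair",
    "c": "a callback function returns the score",
}
GAP_CODES = {
    "x": "no gap penalties",
    "s": "same open and extend penalties for both sequences",
    "d": "different open and extend penalties for each sequence",
    "c": "a callback function returns the gap penalties",
}

def describe(name):
    """Split e.g. 'localds' into (type, match description, gap description)."""
    kind, match_code, gap_code = name[:-2], name[-2], name[-1]
    if kind not in ("global", "local"):
        raise ValueError(f"unknown alignment type: {kind!r}")
    return kind, MATCH_CODES[match_code], GAP_CODES[gap_code]

for name in ("globalxx", "localxx", "globalms"):
    print(name, "->", describe(name))
```

For example, `globalms` decodes to a global alignment that takes explicit match and mismatch scores plus one pair of open and extend gap penalties shared by both sequences.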


## Example 1

Consider the two sequences below. We want to find all the global alignments that achieve the maximum similarity score.

x = ACGGGT

y = ACG


#Import the pairwise2 module
from Bio import pairwise2

#Import the format_alignment function
from Bio.pairwise2 import format_alignment

#Define two sequences to be aligned
x = "ACGGGT"
y = "ACG"


#Get a list of the global alignments between the two sequences ACGGGT and ACG
#No parameters. Identical characters have score of 1, else 0.
#No gap penalties.

alignments = pairwise2.align.globalxx(x, y)

#Use the format_alignment function to format each alignment in the list
for a in alignments:
    print(format_alignment(*a))

ACGGGT
||  | 
AC--G-
  Score=3

ACGGGT
|| |  
AC-G--
  Score=3

ACGGGT
|||   
ACG---
  Score=3



In this example, matching characters have been given 1 point. No points have been deducted for mismatches or gaps.
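
To see where the score of 3 comes from, here is a minimal dynamic-programming sketch of global alignment scoring in plain Python. It is not how `pairwise2` is implemented: `global_score` is a hypothetical name, it assumes a simple per-position (linear) gap cost, and it returns only the best score rather than the alignments themselves.

```python
def global_score(s, t, match=1, mismatch=0, gap=0):
    """Best global alignment score, assuming a linear per-position gap cost."""
    rows, cols = len(s) + 1, len(t) + 1
    dp = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair,  # align s[i-1] with t[j-1]
                dp[i - 1][j] + gap,       # gap in t
                dp[i][j - 1] + gap,       # gap in s
            )
    return dp[-1][-1]

print(global_score("ACGGGT", "ACG"))  # 3
```

With the defaults (match 1, mismatch 0, gap 0) the best score is simply the largest number of characters that can be matched in order, which is why every alignment printed above scores 3.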

## Example 2

Consider the two sequences from the previous example. This time we want to find all the local alignments that achieve the maximum similarity score.

alignmentslocal = pairwise2.align.localxx(x, y)

for a in alignmentslocal:
    print(format_alignment(*a))

ACGGGT
||  |
AC--G-
  Score=3

ACGGGT
|| |
AC-G--
  Score=3

ACGGGT
|||
ACG---
  Score=3



Again, the matching characters have been given 1 point. No points have been deducted for mismatches or gaps.
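
Local alignment differs from global alignment in that the score of a partial alignment is never allowed to drop below zero, so a poor stretch can be abandoned and the alignment restarted. The sketch below illustrates this in plain Python. It is an assumption-laden illustration rather than the `pairwise2` implementation: `local_score` is a hypothetical name, and gaps are charged a simple per-position cost.

```python
def local_score(s, t, match=1, mismatch=0, gap=0):
    """Best local alignment score, assuming a linear per-position gap cost."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a fresh local alignment here
                prev[j - 1] + pair,  # extend diagonally
                prev[j] + gap,       # gap in t
                curr[j - 1] + gap,   # gap in s
            )
            best = max(best, curr[j])
        prev = curr
    return best

print(local_score("ACGGGT", "ACG"))                                  # 3
print(local_score("TTTACGTTT", "ACG", match=1, mismatch=-1, gap=-1)) # 3
```

With no penalties the local and global scores coincide for `ACGGGT` and `ACG`. The second call shows where they diverge: once mismatches and gaps cost something, the local score still picks out the embedded `ACG` and ignores the flanking `T`s.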

## Example 3

Now we're going to change the scoring scheme and assign values for matches, mismatches and gaps. We will be considering the same two sequences as before. We want to find all the possible global alignments with the maximum similarity score. 

Matching characters are given 2 points, and 1 point is deducted for each mismatch. Opening a gap costs 0.5 points, and each position that extends it costs a further 0.1 points.

alignmentsglobal = pairwise2.align.globalms(x, y, 2, -1, -0.5, -0.1)
# 2 = score for identical characters
# -1 = score for non-identical characters
# -0.5 = score for gap opening
# -0.1 = score for gap extension

for a in alignmentsglobal:
    print(format_alignment(*a))

ACGGGT
|||   
ACG---
  Score=5.3
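
The score of 5.3 can be checked by hand: three matches give 6 points, and the single three-position gap costs 0.5 to open plus 0.1 for each of the two extensions. The sketch below does that arithmetic for any pair of already-aligned strings. `affine_alignment_score` is a hypothetical helper, and it assumes that the first position of a gap run is charged the open penalty and every further position the extend penalty, which is consistent with the output above.

```python
def affine_alignment_score(top, bottom, match, mismatch, gap_open, gap_extend):
    """Score two equal-length gapped strings; '-' marks a gap position.

    Assumes the first position of a gap run costs gap_open and each
    further position in the same run costs gap_extend.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0.0
    prev_gap_row = None  # which row held a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            total += gap_extend if gap_row == prev_gap_row else gap_open
            prev_gap_row = gap_row
        else:
            total += match if a == b else mismatch
            prev_gap_row = None
    return total

for bottom in ("ACG---", "AC-G--", "AC--G-"):
    score = affine_alignment_score("ACGGGT", bottom, 2, -1, -0.5, -0.1)
    print(bottom, round(score, 1))
```

Under this scheme `ACG---` scores 5.3, while `AC-G--` and `AC--G-` each score 4.9 because they open two separate gaps. That is why only one alignment is reported here, whereas Example 1 reported three.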

