# Sequence Alignment

Sequence alignment is the process of arranging two or more sequences (DNA, RNA, or protein) so that regions of similarity between them can be identified.

Identifying similar regions lets us infer a lot of information, such as which traits are conserved between species, how closely related different species are genetically, and how species evolve. Biopython provides extensive support for sequence alignment.

## Parsing Sequence Alignment

Biopython provides the `Bio.AlignIO` module to read and write sequence alignments.

In [2]:
from Bio.AlignIO import read
with open('PF18225.alignment.seed', 'r') as file:
    alignment = read(file, 'stockholm')
    print(alignment)

Alignment with 5 rows and 65 columns
AINRNTQQLTQDLRAMPNWSLRFVYIVDRNNQDLLKRPLPPGIM...NRK B3PFT7_CELJU/62-126
AVNATEREFTERIRTLPHWARRNVFVLDSQGFEIFDRELPSPVA...NRT K4KEM7_SIMAS/61-125
MQNTPAERLPAIIEKAKSKHDINVWLLDRQGRDLLEQRVPAKVA...EGP B7RZ31_9GAMM/59-123
ARRHGQEYFQQWLERQPKKVKEQVFAVDQFGRELLGRPLPEDMA...KKP A0A143HL37_MICTH/57-121
TRRHGPESFRFWLERQPVEARDRIYAIDRSGAEILDRPIPRGMA...NKP A0A0X3UC67_9GAMM/57-121


We can also iterate over the sequences (`SeqRecord` objects) in the alignment:

In [3]:
for record in alignment:
    print(record.seq)

AINRNTQQLTQDLRAMPNWSLRFVYIVDRNNQDLLKRPLPPGIMVLAPRLTAKHPYDKVQDRNRK
AVNATEREFTERIRTLPHWARRNVFVLDSQGFEIFDRELPSPVADLMRKLDLDRPFKKLERKNRT
MQNTPAERLPAIIEKAKSKHDINVWLLDRQGRDLLEQRVPAKVATVANQLRGRKRRAFARHREGP
ARRHGQEYFQQWLERQPKKVKEQVFAVDQFGRELLGRPLPEDMAPMLIALNYRNRESHAQVDKKP
TRRHGPESFRFWLERQPVEARDRIYAIDRSGAEILDRPIPRGMAPLFKVLSFRNREDQGLVNNKP
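
Once the rows are available as plain strings, simple similarity measures can be computed directly. The following is a minimal sketch in plain Python, not a Biopython API: `percent_identity` is a hypothetical helper, and it assumes that both rows have the same length and that gaps are written as `-`. The two sample rows are the first ten columns of the first two sequences printed above.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either row
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# First ten columns of the first two rows of the alignment shown above
row1 = "AINRNTQQLT"
row2 = "AVNATEREFT"
print(percent_identity(row1, row2))  # 30.0
```

With a Biopython alignment, `str(record.seq)` gives the row string for each record, so the helper could be applied to any pair of rows.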


## Multiple Alignments

In general, most sequence alignment files contain a single alignment, and the **read** function is enough to parse them. If the input file contains more than one alignment, use the **parse** function instead, which returns an iterator over the alignments:

In [4]:
from Bio.AlignIO import parse
with open('PF18225.alignment.seed', 'r') as file:
    # parse returns a generator that yields one alignment at a time
    alignments = parse(file, 'stockholm')
    print(alignments)
    for alignment in alignments:
        print(alignment)

<generator object parse at 0x0000028C20E3AAC0>
Alignment with 5 rows and 65 columns
AINRNTQQLTQDLRAMPNWSLRFVYIVDRNNQDLLKRPLPPGIM...NRK B3PFT7_CELJU/62-126
AVNATEREFTERIRTLPHWARRNVFVLDSQGFEIFDRELPSPVA...NRT K4KEM7_SIMAS/61-125
MQNTPAERLPAIIEKAKSKHDINVWLLDRQGRDLLEQRVPAKVA...EGP B7RZ31_9GAMM/59-123
ARRHGQEYFQQWLERQPKKVKEQVFAVDQFGRELLGRPLPEDMA...KKP A0A143HL37_MICTH/57-121
TRRHGPESFRFWLERQPVEARDRIYAIDRSGAEILDRPIPRGMA...NKP A0A0X3UC67_9GAMM/57-121
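
Whether a Stockholm file holds one alignment or several can be checked before choosing between **read** and **parse**. In the Stockholm format each alignment ends with a line containing only `//`, so counting those lines gives the number of alignments. The sketch below is plain Python with a made-up sample; `count_stockholm_alignments` is a hypothetical helper, not part of `Bio.AlignIO`.

```python
def count_stockholm_alignments(text: str) -> int:
    """Count alignments in Stockholm-formatted text by their '//' terminator lines."""
    return sum(1 for line in text.splitlines() if line.strip() == "//")


# Made-up Stockholm content with two small alignments (illustrative only)
sample = """# STOCKHOLM 1.0
seq1 ACGT
seq2 AC-T
//
# STOCKHOLM 1.0
seq3 GGCA
seq4 GG-A
//
"""

count = count_stockholm_alignments(sample)
print("use parse" if count > 1 else "use read")
```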


## Pairwise Sequence Alignment

**Pairwise sequence alignment** compares only two sequences at a time and finds the best possible alignments between them. Pairwise alignments are easy to interpret, which makes it straightforward to draw inferences from the result. Biopython provides the `Bio.pairwise2` module for this purpose; it uses a dynamic programming algorithm to find the best-scoring alignments, with results comparable to other alignment software. In newer Biopython releases `Bio.pairwise2` is deprecated in favor of `Bio.Align.PairwiseAligner`.

In [5]:
from Bio import pairwise2
from Bio.Seq import Seq
seq1 = Seq("ACCGGT")
seq2 = Seq("ACGT")



In [6]:
alignments = pairwise2.align.globalxx(seq1, seq2)
alignments

[Alignment(seqA='ACCGGT', seqB='A-C-GT', score=4.0, start=0, end=6),
 Alignment(seqA='ACCGGT', seqB='AC--GT', score=4.0, start=0, end=6),
 Alignment(seqA='ACCGGT', seqB='A-CG-T', score=4.0, start=0, end=6),
 Alignment(seqA='ACCGGT', seqB='AC-G-T', score=4.0, start=0, end=6)]

Here the **globalxx** function performs the actual work: it finds all of the best-scoring global alignments of the two sequences.
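
To make the score of 4.0 concrete: in the `pairwise2` naming convention, the `xx` suffix means that a match scores 1, a mismatch scores 0, and gaps carry no penalty. The sketch below computes the optimal global score under those parameters with a textbook dynamic programming recurrence. It is an illustration in plain Python, not Biopython's implementation, and `global_score` is a hypothetical helper name.

```python
def global_score(a: str, b: str, match: float = 1.0,
                 mismatch: float = 0.0, gap: float = 0.0) -> float:
    """Optimal global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]

    # First column and first row: aligning a prefix against gaps only
    for i in range(1, rows):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, cols):
        score[0][j] = score[0][j - 1] + gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap  # a[i-1] aligned to a gap
            gap_in_a = score[i][j - 1] + gap  # b[j-1] aligned to a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)

    return score[-1][-1]


print(global_score("ACCGGT", "ACGT"))  # 4.0
```

The sketch returns only the score. Recovering the alignments themselves would require a traceback through the matrix, which is omitted here.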

The `Bio.pairwise2` module provides a formatting function, `format_alignment`, to better visualize the result:

In [7]:
from Bio.pairwise2 import format_alignment
alignments = pairwise2.align.globalxx(seq1, seq2)
for alignment in alignments:
    print(format_alignment(*alignment))

ACCGGT
| | ||
A-C-GT
  Score=4

ACCGGT
||  ||
AC--GT
  Score=4

ACCGGT
| || |
A-CG-T
  Score=4

ACCGGT
|| | |
AC-G-T
  Score=4
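
The middle line of each block marks the columns where the two aligned sequences agree. A minimal sketch of how such a marker line can be built is shown below. It only mimics the output printed above and is not Biopython's implementation; `match_line` is a hypothetical helper.

```python
def match_line(seq_a: str, seq_b: str, gap: str = "-") -> str:
    """Return '|' for columns with identical residues and ' ' otherwise."""
    return "".join(
        "|" if a == b and a != gap else " "
        for a, b in zip(seq_a, seq_b)
    )


aligned_a = "ACCGGT"
aligned_b = "A-C-GT"
print(aligned_a)
print(match_line(aligned_a, aligned_b))
print(aligned_b)
```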



Biopython also provides wrappers for several external sequence alignment tools through the `Bio.Align.Applications` module, including:
- ClustalW
- MUSCLE
- EMBOSS needle and water