In [1]:
from Bio.Seq import Seq
from Bio import pairwise2

#### Pairwise sequence alignment


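In Biopython, pairwise alignment is provided by the `pairwise2` module. Its documentation describes how the alignment functions are named: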

The names of the alignment functions in this module follow the
    convention
    <alignment type>XX
    where <alignment type> is either "global" or "local" and XX is a 2
    character code indicating the parameters it takes.  The first
    character indicates the parameters for matches (and mismatches), and
    the second indicates the parameters for gap penalties.
    
    The match parameters are::
    
        CODE  DESCRIPTION
        x     No parameters. Identical characters have score of 1, otherwise 0.
        m     A match score is the score of identical chars, otherwise mismatch
              score.
        d     A dictionary returns the score of any pair of characters.
        c     A callback function returns scores.
    
    The gap penalty parameters are::
    
        CODE  DESCRIPTION
        x     No gap penalties.
        s     Same open and extend gap penalties for both sequences.
        d     The sequences have different open and extend gap penalties.
        c     A callback function returns the gap penalties.
    
    All the different alignment functions are contained in an object
    ``align``. For example:
    
    
Although many alignment functions are available, the ones below are the most commonly used:

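- `globalxx(seq1,seq2)`: a match scores 1, a mismatch scores 0, and gaps are not penalized.
- `globalms(seq1,seq2,2,-1,-0.5,-0.1)`: match score 2, mismatch score -1, gap-opening penalty -0.5, gap-extension penalty -0.1.
- `globaldx(seq1,seq2,matrix)`: scores are taken from a substitution matrix; typically used for protein sequences.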

If you are unsure how to use one of these functions, call `help` on it, for example `help(pairwise2.align.globaldc)`, to see its usage.
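The naming convention quoted above can be made concrete by enumerating the names it implies. The sketch below is plain Python and does not import Biopython; the helper `function_names` and the short descriptions in the dictionaries are written for this illustration only.

```python
from itertools import product

# Codes taken from the naming convention quoted above.
ALIGNMENT_TYPES = ("global", "local")
MATCH_CODES = {
    "x": "identical characters score 1, otherwise 0",
    "m": "explicit match and mismatch scores",
    "d": "dictionary of pair scores",
    "c": "callback function",
}
GAP_CODES = {
    "x": "no gap penalties",
    "s": "same open/extend penalties for both sequences",
    "d": "different penalties for each sequence",
    "c": "callback function",
}


def function_names():
    """Return every <alignment type>XX name implied by the convention."""
    return [
        kind + match_code + gap_code
        for kind, match_code, gap_code in product(ALIGNMENT_TYPES, MATCH_CODES, GAP_CODES)
    ]


names = function_names()
print(len(names))  # 2 alignment types * 4 match codes * 4 gap codes
print(names[:4])
```

Reading a name back is then mechanical: `globalms` is a global alignment with explicit match/mismatch scores (`m`) and the same gap penalties for both sequences (`s`).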

In [2]:
seq1 = Seq('ACACACTA')
seq2 = Seq('AGCACACA')

##### globalxx

In [3]:
alignments = pairwise2.align.globalxx(seq1,seq2)

In [25]:
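# format_alignment must be imported before it can be used
from Bio.pairwise2 import format_alignment

for a in pairwise2.align.globalxx(seq1, seq2):
    print(format_alignment(*a))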

A-CACACTA
| ||||| |
AGCACAC-A
  Score=7



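Use `format_alignment(*alignments[0])` to print an alignment in a readable layout.
Note that for local alignments it shows only the aligned region by default, **not the full sequences**.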

In [17]:
from Bio.pairwise2 import format_alignment
print(format_alignment(*alignments[0]))

A-CACACTA
| ||||| |
AGCACAC-A
  Score=7
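To make the `Score=7` above less of a black box, here is a minimal dynamic-programming sketch of the score that the `globalxx` scheme (match 1, mismatch 0, free gaps) optimizes. It is a plain-Python illustration, not Biopython's implementation, and the function name `global_xx_score` is made up for this example.

```python
def global_xx_score(a, b):
    """Best global alignment score with match=1, mismatch=0 and free gaps."""
    # prev[j] is the best score for the part of `a` processed so far against b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]  # aligning against an empty prefix of b costs nothing (gaps are free)
        for j, ch_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (1 if ch_a == ch_b else 0)  # pair the two characters
            curr.append(max(diagonal, prev[j], curr[j - 1]))     # or insert a free gap
        prev = curr
    return prev[-1]


score = global_xx_score("ACACACTA", "AGCACACA")
print(score)
```

With this scoring scheme the result is simply the largest number of identical positions that can be lined up, which is why the two gaps in the printed alignment cost nothing.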



##### globalms

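To set the scores and penalties yourself, use `globalms`.

The call below sets the match score to 2, the mismatch score to -1, the gap-opening penalty to -0.5, and the gap-extension penalty to -0.1:

`pairwise2.align.globalms(seq1,seq2,2,-1,-0.5,-0.1)`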

In [33]:
for a in pairwise2.align.globalms(seq1,seq2,2,-1,-0.5,-0.1):
    print(format_alignment(*a)) 

A-CACACTA
| ||||| |
AGCACAC-A
  Score=13
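The reported score can be checked by hand. The sketch below scores an already gapped pair of rows column by column; it assumes an affine gap model in which a gap run of length n costs `gap_open + (n - 1) * gap_extend`, and the helper `score_alignment` is hypothetical, written only for this check.

```python
def score_alignment(row_a, row_b, match, mismatch, gap_open, gap_extend):
    """Score two gapped rows of equal length under an affine gap model.

    Assumption: a gap run of length n costs gap_open + (n - 1) * gap_extend.
    """
    total = 0.0
    previous_gap_row = None  # which row ("a" or "b") had a gap in the previous column
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gap_row = "a" if x == "-" else "b"
            # Continuing a gap in the same row extends it; otherwise a new gap opens.
            total += gap_extend if gap_row == previous_gap_row else gap_open
            previous_gap_row = gap_row
        else:
            total += match if x == y else mismatch
            previous_gap_row = None
    return total


score = score_alignment("A-CACACTA", "AGCACAC-A", 2, -1, -0.5, -0.1)
print(score)
```

The alignment has 7 matching columns (7 × 2 = 14) and two separate single-position gaps (2 × -0.5 = -1), giving 13.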



##### globaldx

In [45]:
# Import the substitution matrices
from Bio.SubsMat import MatrixInfo as matlist
# Use PAM30 as the scoring matrix
PAM30 = matlist.pam30

for a in pairwise2.align.globaldx('KEVLA','EVL',PAM30):
    print(format_alignment(*a,full_sequences=True))

KEVLA
 ||| 
-EVL-
  Score=22



#### Local alignment
Usage is the same as for global alignment.

Just replace `globalxx` with `localxx`.

In [27]:
alignments = pairwise2.align.localxx('ACCGT','ACG')
for i in alignments:
    print(format_alignment(*i))

1 ACCG
  | ||
1 A-CG
  Score=3

1 ACCG
  || |
1 AC-G
  Score=3
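Although the calls look the same, a local alignment optimizes a different quantity: the best-scoring pair of substrings rather than the full sequences. A minimal Smith-Waterman-style sketch with a linear gap penalty is shown below; the function `local_score` and its parameters are assumptions made for this illustration and are not part of `pairwise2`.

```python
def local_score(a, b, match=1, mismatch=0, gap=0):
    """Best local alignment score with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # The 0 lets an alignment restart anywhere; this is what makes it local.
            cell = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)  # the best local hit may end in any cell
        prev = curr
    return best


score = local_score("ACCGT", "ACG")
print(score)
```

With the `xx` scheme (no negative scores) the local and global scores coincide; the difference becomes visible once mismatches or gaps are penalized.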



Pass the keyword argument `full_sequences=True` to `format_alignment()` to display the complete sequences.

In [28]:
for i in alignments:
    print(format_alignment(*i,full_sequences=True))

ACCGT
| || 
A-CG-
  Score=3

ACCGT
|| | 
AC-G-
  Score=3
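The difference between the two displays comes down to whether the rows are cut to the aligned region. The sketch below uses a hand-written stand-in for one alignment result; the field names and the `start`/`end` values (0 and 4) are assumptions chosen to mirror the first local alignment above, and no Biopython call is made.

```python
from collections import namedtuple

# Hand-written stand-in for one alignment result; field names and the
# start/end values are assumptions made for this illustration.
Alignment = namedtuple("Alignment", "seqA seqB score start end")
aln = Alignment("ACCGT", "A-CG-", 3.0, 0, 4)


def aligned_region(alignment):
    """Return only the columns between start and end (the local hit)."""
    return (
        alignment.seqA[alignment.start:alignment.end],
        alignment.seqB[alignment.start:alignment.end],
    )


def full_rows(alignment):
    """Return the complete gapped rows, as displayed with full_sequences=True."""
    return alignment.seqA, alignment.seqB


print(aligned_region(aln))
print(full_rows(aln))
```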



##### Scoring matrices

`from Bio.SubsMat import MatrixInfo as matlist` imports the available substitution matrices; pick the one you need from it.

In [34]:
from Bio.SubsMat import MatrixInfo as matlist

In [47]:
# Select PAM30
PAM30 = matlist.pam30
PAM30

{('W', 'F'): -4,
 ('L', 'R'): -8,
 ('S', 'P'): -2,
 ('V', 'T'): -3,
 ('Q', 'Q'): 8,
 ('N', 'A'): -4,
 ('Z', 'Y'): -9,
 ('W', 'R'): -2,
 ('Q', 'A'): -4,
 ('S', 'D'): -4,
 ('H', 'H'): 9,
 ('S', 'H'): -6,
 ('H', 'D'): -4,
 ('L', 'N'): -7,
 ('W', 'A'): -13,
 ('Y', 'M'): -11,
 ('G', 'R'): -9,
 ('Y', 'I'): -6,
 ('Y', 'E'): -8,
 ('B', 'Y'): -6,
 ('Y', 'A'): -8,
 ('V', 'D'): -8,
 ('B', 'S'): -1,
 ('Y', 'Y'): 10,
 ('G', 'N'): -3,
 ('E', 'C'): -14,
 ('Y', 'Q'): -12,
 ('Z', 'Z'): 6,
 ('V', 'A'): -2,
 ('C', 'C'): 10,
 ('M', 'R'): -4,
 ('V', 'E'): -6,
 ('T', 'N'): -2,
 ('P', 'P'): 8,
 ('V', 'I'): 2,
 ('V', 'S'): -6,
 ('Z', 'P'): -4,
 ('V', 'M'): -1,
 ('T', 'F'): -9,
 ('V', 'Q'): -7,
 ('K', 'K'): 7,
 ('P', 'D'): -8,
 ('I', 'H'): -9,
 ('I', 'D'): -7,
 ('T', 'R'): -6,
 ('P', 'L'): -7,
 ('K', 'G'): -7,
 ('M', 'N'): -9,
 ('P', 'H'): -4,
 ('F', 'Q'): -13,
 ('Z', 'G'): -5,
 ('X', 'L'): -6,
 ('T', 'M'): -4,
 ('Z', 'C'): -14,
 ('X', 'H'): -5,
 ('D', 'R'): -10,
 ('B', 'W'): -10,
 ('X', 'D'): -5,
 ('Z', 'K'):
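The matrix is an ordinary dictionary keyed by pairs of residues. Assuming each unordered pair is stored only once (so `('W', 'F')` may be present while `('F', 'W')` is not), a direct lookup can fail depending on the order of the two residues. The sketch below uses three entries copied from the output above; the helper `pair_score` is hypothetical.

```python
# A few entries copied from the PAM30 output above.
pam30_subset = {
    ("W", "F"): -4,
    ("Q", "Q"): 8,
    ("V", "I"): 2,
}


def pair_score(matrix, a, b):
    """Look up a substitution score regardless of the order of the pair."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError if neither order is present


print(pair_score(pam30_subset, "F", "W"))
print(pair_score(pam30_subset, "Q", "Q"))
```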

In [48]:
from Bio import Align 
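`Bio.Align` also provides the `PairwiseAligner` class. As a rough sketch, the `globalxx` scoring used earlier could be reproduced with it as follows. Every score is set explicitly so that the result does not depend on library defaults, which are assumed here to possibly differ between Biopython versions.

```python
from Bio import Align

aligner = Align.PairwiseAligner()
aligner.mode = "global"
# Mirror the globalxx scheme: match 1, mismatch 0, no gap penalties.
aligner.match_score = 1
aligner.mismatch_score = 0
aligner.open_gap_score = 0
aligner.extend_gap_score = 0

# score() computes only the optimal score, without building the alignments.
score = aligner.score("ACACACTA", "AGCACACA")
print(score)
```

Switching `aligner.mode` to `"local"` would give the local-alignment counterpart with the same scoring parameters.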

In [49]:
help(Align)

Help on package Bio.Align in Bio:

NAME
    Bio.Align - Code for dealing with sequence alignments.

DESCRIPTION
    One of the most important things in this module is the MultipleSeqAlignment
    class, used in the Bio.AlignIO module.

PACKAGE CONTENTS
    AlignInfo
    Applications (package)
    _aligners
    substitution_matrices (package)

CLASSES
    _algorithms.PairwiseAligner(builtins.object)
        PairwiseAligner
    builtins.object
        MultipleSeqAlignment
        PairwiseAlignment
        PairwiseAlignments
    
    class MultipleSeqAlignment(builtins.object)
     |  MultipleSeqAlignment(records, alphabet=None, annotations=None, column_annotations=None)
     |  
     |  Represents a classical multiple sequence alignment (MSA).
     |  
     |  By this we mean a collection of sequences (usually shown as rows) which
     |  are all the same length (usually with gap characters for insertions or
     |  padding). The data can then be regarded as a matrix of letters, with wel


