In [1]:
#hide
#default_exp alignment
from nbdev.showdoc import show_doc
from IPython.display import HTML
%load_ext autoreload
%autoreload 2

# alignment
>aligning glycan sequences based on a substitution matrix

In [2]:
#export
from glycowork.alignment.glysum import *

`alignment` contains the code for aligning glycan sequences based on a substitution matrix, GLYSUM. It currently includes the following module:
 - `glysum`, which contains the alignment functions

For future iterations of `glycowork`, we plan to include additional alignment algorithms, such as the one described in https://pubmed.ncbi.nlm.nih.gov/15215393/

# glysum
>aligning glycan sequences based on a substitution matrix

In [3]:
show_doc(pairwiseAlign)

<h4 id="pairwiseAlign" class="doc_header"><code>pairwiseAlign</code><a href="https://github.com/BojarLab/glycowork/tree/master/glycowork/alignment/glysum.py#L14" class="source_link" style="float:right">[source]</a></h4>

> <code>pairwiseAlign</code>(**`query`**, **`corpus`**=*`None`*, **`n`**=*`5`*, **`vocab`**=*`None`*, **`submat`**=*`None`*, **`mismatch`**=*`-10`*, **`gap`**=*`-5`*, **`col`**=*`'glycan'`*)

aligns a glycan sequence from the database against the rest of the database and returns the best n alignments

| Arguments:
| :-
| query (string): glycan string in IUPAC-condensed notation
| corpus (dataframe): database to align query against; default is SugarBase
| n (int): how many alignments to show; default shows top 5
| vocab (list): list of glycowords used for mapping to tokens
| submat (dataframe): GLYSUM substitution matrix
| mismatch (int): mismatch penalty; default: -10
| gap (int): gap penalty; default: -5
| col (string): name of the column containing the glycan sequences; default: glycan

| Returns:
| :-
| The n best alignments of query against corpus in text form, each with alignment score, percent identity, percent coverage, sequence index, and species
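
To illustrate how the `mismatch` and `gap` penalties interact with a substitution matrix, here is a minimal sketch of a Smith-Waterman-style local alignment over glycoword tokens. It is not the `glysum` implementation: `tokenize`, `local_align_score`, and the values in `toy_submat` are hypothetical names and numbers chosen for illustration, and the real GLYSUM matrix and scoring may differ.

```python
import re


def tokenize(glycan):
    """Split an IUPAC-condensed glycan into monosaccharide and linkage tokens."""
    # Parentheses delimit linkages; branch brackets are simply dropped here.
    return [tok for tok in re.split(r"[()\[\]]", glycan) if tok]


def local_align_score(query, target, submat, mismatch=-10, gap=-5):
    """Return the best local alignment score between two token lists."""
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]  # dynamic-programming table
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = query[i - 1], target[j - 1]
            # Pairs missing from the matrix fall back to the mismatch penalty.
            sub = submat.get((a, b), submat.get((b, a), mismatch))
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # align a with b
                H[i - 1][j] + gap,      # gap in target
                H[i][j - 1] + gap,      # gap in query
            )
            best = max(best, H[i][j])
    return best


# Toy substitution scores (assumed values, NOT the GLYSUM matrix).
toy_submat = {
    ("Man", "Man"): 5, ("Glc", "Glc"): 5, ("Rha", "Rha"): 5,
    ("Glc", "Rha"): 1,
    ("a1-3", "a1-3"): 3, ("a1-4", "a1-4"): 3,
}

query = tokenize("Man(a1-3)Man(a1-4)Glc")
target = tokenize("Man(a1-3)Man(a1-4)Rha")
self_score = local_align_score(query, query, toy_submat)
variant_score = local_align_score(query, target, toy_submat)
print(query, self_score, variant_score)
```

In this sketch, a substitution that the matrix knows about (`Glc` for `Rha`) lowers the score only slightly, while an unknown pair costs the full `mismatch` penalty. That is the intended effect of using a substitution matrix instead of a flat match/mismatch scheme.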

In [4]:
print("Test Alignment")
pairwiseAlign('Man(a1-3)Man(a1-4)Glc(b1-4)Man(a1-5)Kdo')

Test Alignment
1                                               9
Man a1-3 Man a1-4 Glc b1-4 Man a1-5 Kdo
Man a1-3 Man a1-4 Glc b1-4 Man a1-5 Kdo
Alignment Score: 30
Percent Identity: 100.0
Percent Coverage: 100.0
Sequence Index: Man*a1-3*Man*a1-4*Glc*b1-4*Man*a1-5*Kdo
Species: ['Xanthomonas_oryzae']

1                           5
Man a1-3 Man a1-4 Glc
Man a1-3 Man a1-4 Rha
Alignment Score: 21
Percent Identity: 80.0
Percent Coverage: 55.55555555555556
Sequence Index: Man*a1-3*Man*a1-4*Rha
Species: []

1                                               9
Man a1-3 Man a1-4 Glc b1-4 Man   a1-5 Kdo
Man a1-3 Man a1-4 Glc b1-4 ManOP a1-5 Kdo
Alignment Score: 20
Percent Identity: 88.88888888888889
Percent Coverage: 100.0
Sequence Index: Man*a1-3*Man*a1-4*Glc*b1-4*ManOP*a1-5*Kdo*a2-6*GlcOPN*b1-6*GlcN
Species: []

1                                               9
Man a1-3 Man a1-4 Glc b1-4 Man a1-5 Kdo
Man a1-3 Man a1-3 Glc b1-6 Man a1-5 Kdo
Alignment Score: 20
Percent Identity: 77.77777777777779
P
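
The percent identity and percent coverage values printed above are consistent with a simple reading: identity is the share of aligned positions with identical tokens, and coverage is the aligned length relative to the length of the query in tokens. The sketch below checks that reading against the printed numbers. It is inferred from the output rather than taken from the `glysum` source, `identity_and_coverage` is a hypothetical helper, and gap-free alignments are assumed.

```python
def identity_and_coverage(aligned_query, aligned_target, query_length):
    """Percent identity and percent coverage for a gap-free token alignment."""
    aligned_len = len(aligned_query)
    # Count positions where both sequences carry the same token.
    matches = sum(q == t for q, t in zip(aligned_query, aligned_target))
    identity = matches / aligned_len * 100
    coverage = aligned_len / query_length * 100
    return identity, coverage


# Second alignment from the output: 5 aligned tokens out of a 9-token query.
q = "Man a1-3 Man a1-4 Glc".split()
t = "Man a1-3 Man a1-4 Rha".split()
identity, coverage = identity_and_coverage(q, t, query_length=9)
print(identity, coverage)
```

Four of the five aligned tokens match, which gives roughly 80 percent identity, and five of nine query tokens are covered, which gives roughly 55.6 percent coverage, in line with the second alignment shown above.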

In [5]:
#hide
from nbdev.export import notebook2script; notebook2script()

Converted 00_core.ipynb.
Converted 01_alignment.ipynb.
Converted 02_glycan_data.ipynb.
Converted 03_ml.ipynb.
Converted 04_motif.ipynb.
Converted 05_examples.ipynb.
Converted 06_network.ipynb.
Converted index.ipynb.
