In [1]:
using MIToS.MSA


In [2]:
?MIToS.MSA

## MSA

The MSA module of MIToS has utilities for working with Multiple Sequence Alignments (MSAs) of protein sequences.

**Features**

  * Read and write MSAs in `Stockholm`, `FASTA` or `Raw` format
  * Handle MSA annotations
  * Edit the MSA, e.g. delete columns or sequences, change sequence order, shuffling...
  * Keep track of positions and annotations after modifications on the MSA
  * Describe an MSA, e.g. mean percent identity, sequence coverage, gap percentage...

```julia

using MIToS.MSA
```


In [3]:
?MIToS.MSA.Residue

### Residue

Most of the **MIToS** design is built around the `Residue` bitstype. It represents the 20 natural amino acids plus a GAP value, which stands for insertions and deletions but also for missing data: ambiguous residues and non-natural amino acids. Each residue is encoded as an integer, which allows probability or frequency matrices to be indexed quickly using `Residue`s.

**Residue creation and conversion**

Creation and `convert`ion of `Residue`s should be treated carefully. `Residue` is encoded as an 8-bit type similar to `Int8`, to get faster indexing using `Int(x::Residue)`. Accordingly, converting to `Int`, `Int8` or another signed integer type returns the integer value encoded by the residue. Conversions to and from `Char` and `UInt8` are different: they use the `Char`acter representation, which is what IO operations need.

```julia

julia> alanine = Residue('A')
A

julia> Int(alanine)
1

julia> Char(alanine)
'A'

julia> UInt8(alanine) # 0x41 == 65 == 'A'
0x41

julia> for residue in res"ARNDCQEGHILKMFPSTWYV-"
           println(residue, " ", Int(residue))
       end
A 1
R 2
N 3
D 4
C 5
Q 6
E 7
G 8
H 9
I 10
L 11
K 12
M 13
F 14
P 15
S 16
T 17
W 18
Y 19
V 20
- 21

```
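To make the "fast indexing" point concrete, here is a minimal sketch in plain Julia that does not use MIToS at all. The names `ALPHABET`, `residue_code` and `residue_counts` are hypothetical stand-ins, and the code assumes a recent Julia (1.x) rather than the 0.4 release used in this notebook. It only illustrates the idea that an integer code can be used directly as an index into a count or frequency array.

```julia
# Hypothetical stand-in for the Residue encoding; this is NOT the MIToS implementation.
# The order matches the integer codes printed in the docstring above (A = 1, ..., '-' = 21).
const ALPHABET = "ARNDCQEGHILKMFPSTWYV-"

# Map a character to its integer code; anything not in the alphabet gets the gap code.
function residue_code(c::Char)
    i = findfirst(isequal(c), ALPHABET)
    return i === nothing ? length(ALPHABET) : i
end

# Count residues by using the integer code directly as an array index.
function residue_counts(seq::AbstractString)
    counts = zeros(Int, length(ALPHABET))
    for c in seq
        counts[residue_code(c)] += 1
    end
    return counts
end

counts = residue_counts("MIToS")
println(counts[residue_code('M')])  # count for methionine
println(counts[residue_code('-')])  # count for gaps (the lowercase 'o' lands here)
```

Because the code is the index, no dictionary lookup is needed inside the counting loop; that is the design choice the docstring is pointing at.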


In [4]:
?MIToS.MSA.@res_str

#### res"..."

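The MIToS macro `@res_str` takes a string and returns a `Vector` of `Residue`s (a sequence). In the example, the lowercase `o` is not one of the 20 amino acid letters, so it is read as a gap (`-`).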

```julia

julia> res"MIToS"
5-element Array{MIToS.MSA.Residue,1}:
 M
 I
 T
 -
 S

```
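For readers unfamiliar with Julia's non-standard string literals, the following sketch shows how a macro named `@xyz_str` makes the `xyz"..."` syntax available. The macro `@aa_str` and the constant `AA_ORDER` are hypothetical and unrelated to the actual MIToS source; the sketch returns plain integer codes instead of `Residue`s and assumes Julia 1.x.

```julia
# Hypothetical alphabet, in the same order as the Residue codes shown earlier.
const AA_ORDER = "ARNDCQEGHILKMFPSTWYV-"

# Defining a macro called `aa_str` enables the literal syntax aa"...".
# The string is translated once, when the macro is expanded, not on every evaluation.
macro aa_str(s)
    gap = length(AA_ORDER)
    # Characters outside the alphabet fall back to the gap code.
    codes = [something(findfirst(isequal(c), AA_ORDER), gap) for c in s]
    return :(Int[$(codes...)])
end

codes = aa"MIToS"
println(codes)
```

The translation happens while the code is being compiled, so a literal written inside a hot loop costs nothing extra at run time.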


# Multiple Sequence Alignments

In [5]:
msa_file = MIToS.Pfam.downloadpfam("PF09645")

"PF09645.stockholm.gz"


In [6]:
?MIToS.MSA.AbstractMultipleSequenceAlignment

### AbstractMultipleSequenceAlignment

The most basic implementation of a MIToS MSA is a `Matrix` of `Residue`s.


In [7]:
msa = read(msa_file, Stockholm, Matrix{Residue})

2x110 Array{MIToS.MSA.Residue,2}:
 -  -  -  -  -  -  -  V  A  Q  Q  L  F  …  -  -  -  -  -  -  -  -  -  -  -  -
 Q  T  L  N  S  Y  K  M  A  E  I  M  Y     E  Q  T  D  Q  G  F  I  K  A  K  Q

In [8]:
?MIToS.MSA.MultipleSequenceAlignment

### MultipleSequenceAlignment

This MSA type includes the `Matrix` of `Residue`s and the sequence names. To allow fast indexing of MSAs using **sequence identifiers**, the names are stored in an `IndexedArray`.
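
The idea of pairing a residue matrix with an index over sequence names can be sketched in plain Julia. The type `NamedAlignment`, its fields and the sequence names below are all hypothetical, and a `Dict` stands in for the `IndexedArray` that MIToS uses; the sketch assumes Julia 1.x.

```julia
# Hypothetical container: a character matrix plus a name-to-row lookup table.
struct NamedAlignment
    residues::Matrix{Char}
    ids::Vector{String}
    index::Dict{String,Int}
end

# Convenience constructor that builds the lookup table from the list of names.
function NamedAlignment(residues::Matrix{Char}, ids::Vector{String})
    index = Dict(id => i for (i, id) in enumerate(ids))
    return NamedAlignment(residues, ids, index)
end

# Indexing with a sequence name returns the corresponding row.
Base.getindex(aln::NamedAlignment, id::AbstractString) = aln.residues[aln.index[id], :]

toy = NamedAlignment(['-' 'V' 'A'; 'Q' 'M' 'A'], ["seqA/2-3", "seqB/1-3"])
println(toy["seqB/1-3"])
```

Looking up a row by name is then a hash lookup rather than a scan over all names, which matters for alignments with many sequences.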


In [None]:
msa = read(msa_file, Stockholm, MultipleSequenceAlignment)

In [None]:
msa.id

In [None]:
msa["F112_SSV1/3-112"]

Similarly, MIToS defines an `AnnotatedMultipleSequenceAlignment`, which also includes annotations.

In [None]:
fieldnames(AnnotatedMultipleSequenceAlignment)

In [None]:
msa = read(msa_file, Stockholm, AnnotatedMultipleSequenceAlignment, generatemapping=true, useidcoordinates=true)

In [None]:
msa.annotations

## MSA annotations

In [None]:
?MIToS.MSA.Annotations

In [None]:
fieldnames(Annotations)

MIToS uses MSA annotations to keep track of:  
- **Modifications** of the MSA (`MIToS_...`), such as the deletion of sequences or columns.  
- Position numbers in the original MSA file (**column mapping:** `ColMap`)  
- Positions of the residues in the sequence (**sequence mapping:** `SeqMap`)  
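
The bookkeeping described in this list can be illustrated with a toy example in plain Julia. The data, the helper `sequence_mapping` and the convention of using `0` for gaps are assumptions made for this sketch; they are not taken from MIToS, whose own annotation format may differ. The code assumes Julia 1.x.

```julia
# Toy alignment (hypothetical data): rows are sequences, columns are positions.
aln = ['-' '-' 'V' 'A';
       'Q' '-' 'M' 'A']

# Number the residues of one row, skipping gaps; 0 marks a gap (assumed convention).
function sequence_mapping(row, start)
    mapping = zeros(Int, length(row))
    pos = start
    for (j, r) in enumerate(row)
        if r != '-'
            mapping[j] = pos
            pos += 1
        end
    end
    return mapping
end

colmap = collect(1:size(aln, 2))          # column numbers in the original file
seqmap = sequence_mapping(aln[1, :], 3)   # row 1 is assumed to start at residue 3

# Delete the columns that contain only gaps, and subset both mappings the same way.
keep = [any(aln[:, j] .!= '-') for j in 1:size(aln, 2)]
filtered = aln[:, keep]
colmap = colmap[keep]
seqmap = seqmap[keep]

println(colmap)
println(seqmap)
```

Both mappings are computed before any column is removed and then filtered with the same mask, so each remaining column still points at its original file column and sequence position.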

In [None]:
printmodifications(msa)

In [None]:
getcolumnmapping(msa)

In [None]:
getsequencemapping(msa, "F112_SSV1/3-112")