## [Example: Constant + MuE (Profile HMM)](http://pyro.ai/examples/mue_profile.html#example-constant-mue-profile-hmm)

### A standard profile HMM [1], which corresponds to a constant (delta function) distribution with a MuE observation [2].
#### This is a standard generative model of variable-length biological sequences (e.g. proteins) which does not require preprocessing the data by building a multiple sequence alignment (MSA). It can be compared to a more complex MuE model in this package, the FactorMuE.
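To illustrate the generative process, this sketch samples sequences from a simplified profile HMM. A fixed ancestral sequence (here a one-hot, i.e. delta-function, profile) passes through match, insert, and delete states, so output lengths vary. It is a hypothetical NumPy sketch, not Pyro's `ProfileHMM`. In particular, it assumes position-independent insertion and deletion probabilities (`p_insert`, `p_delete`) and one shared insertion distribution, whereas the full model learns these per position.

```python
import numpy as np


def sample_profile_hmm(match_probs, insert_probs, p_insert, p_delete, rng):
    """Sample one sequence from a simplified profile HMM (hypothetical sketch).

    match_probs:  (L, K) array, emission probabilities of match state M_j.
    insert_probs: (K,) array, emission probabilities shared by all insert states.
    p_insert:     probability of emitting (another) inserted symbol at I_j.
    p_delete:     probability that profile position j is deleted (D_j).
    Returns a list of symbol indices whose length varies between samples.
    """
    L, K = match_probs.shape
    seq = []
    for j in range(L + 1):
        # Insert state I_j: emit a geometric number of inserted symbols.
        while rng.random() < p_insert:
            seq.append(int(rng.choice(K, p=insert_probs)))
        if j == L:
            break
        # Match state M_j emits one symbol; delete state D_j emits nothing.
        if rng.random() >= p_delete:
            seq.append(int(rng.choice(K, p=match_probs[j])))
    return seq


rng = np.random.default_rng(0)
# Delta-function ancestral profile over the alphabet {A=0, B=1}: "BABBA".
profile = np.array([[0, 1], [1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
background = np.array([0.5, 0.5])
samples = [sample_profile_hmm(profile, background, 0.1, 0.1, rng) for _ in range(5)]
```

With `p_insert=0` and `p_delete=0` the sampler returns the ancestral sequence unchanged; raising either probability produces the insertions and deletions that an MSA would otherwise have to align away.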

### References:
[1] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison (1998). "Biological sequence analysis: probabilistic models of proteins and nucleic acids." Cambridge University Press.

[2] E. N. Weinstein and D. S. Marks (2021). "A structured observation distribution for generative biological sequence prediction and forecasting." https://www.biorxiv.org/content/10.1101/2020.07.31.231381v2.full.pdf

In [None]:
%%sh
curl -O https://raw.githubusercontent.com/debbiemarkslab/MuE/master/models/examples/ve6_full.fasta

### [einsum - an underestimated function](https://medium.com/towards-data-science/einsum-an-underestimated-function-99ca96e2942e)
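The linked article covers `np.einsum`. One HMM-relevant use is a single forward-algorithm step, `alpha_next[j] = sum_i alpha[i] * A[i, j] * e[j]`, which fits in one einsum call. The variable names and random values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                        # number of hidden states
alpha = rng.random(K)                        # forward probabilities at step t
A = rng.random((K, K))
A /= A.sum(axis=1, keepdims=True)            # row-stochastic transition matrix
e = rng.random(K)                            # emission probability of the next symbol per state

# Sum over the previous state i, keep the next state j.
alpha_next = np.einsum("i,ij,j->j", alpha, A, e)
# Equivalent matrix form, for comparison.
alpha_next_matmul = (alpha @ A) * e
```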

[Data](https://github.com/debbiemarkslab/MuE/blob/master/models/examples/ve6_full.fasta)

In [None]:
import json

In [None]:
import numpy as np
import torch
from torch.optim import Adam

import pyro
from pyro.contrib.mue.dataloaders import BiosequenceDataset
from pyro.contrib.mue.models import ProfileHMM
from pyro.optim import MultiStepLR

[MuE](https://github.com/pyro-ppl/pyro/tree/dev/pyro/contrib/mue)

In [None]:
file = './ve6_full.fasta'

In [None]:
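# Alternative: build the dataset from the downloaded ve6_full.fasta proteins instead of the toy data below.
# dataset = BiosequenceDataset(
#     file,
#     'fasta',
#     alphabet='amino-acid',
#     include_stop=False,
#     device='cpu'
# )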
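Before building a dataset from the downloaded file, you may want to inspect it (number of sequences, lengths). A minimal FASTA reader in plain Python is enough for that. `read_fasta` is a hypothetical helper, not part of Pyro, and it does not validate the alphabet.

```python
import io


def read_fasta(handle):
    """Parse a FASTA stream into a list of (header, sequence) pairs.

    Lines starting with '>' are headers; the lines that follow, up to the next
    header, are joined into one sequence. Blank lines are skipped.
    """
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = io.StringIO(">seq1\nMKV\nLLA\n>seq2\nMKA\n")
records = read_fasta(example)
```

On the real file, `with open(file) as handle: records = read_fasta(handle)` followed by `max(len(s) for _, s in records)` gives the longest sequence.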

### Generating a Small Dataset

In [None]:
# 30 toy sequences over the two-letter alphabet "AB" (10 copies of each of 3 sequences).
mult_dat = 10
seqs = ["BABBA"] * mult_dat + ["BAAB"] * mult_dat + ["BABBB"] * mult_dat
dataset = BiosequenceDataset(seqs, "list", "AB", include_stop=True, device='cpu')
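Conceptually, the dataset turns each sequence into a zero-padded one-hot matrix. The sketch below makes assumptions about Pyro's behavior rather than reproducing its code. It assumes that with `include_stop=True` a stop symbol `*` is appended to the alphabet and to every sequence before padding to the longest length. `one_hot_encode` is a hypothetical helper.

```python
import numpy as np


def one_hot_encode(seqs, alphabet, include_stop=True):
    """Hypothetical sketch: encode sequences as zero-padded one-hot arrays.

    Returns (data, lengths): data has shape (N, max_length, alphabet_length),
    and positions past a sequence's end are all-zero rows.
    """
    if include_stop:
        alphabet = alphabet + "*"           # assumed stop symbol added to the alphabet
        seqs = [s + "*" for s in seqs]      # and to the end of every sequence
    index = {ch: i for i, ch in enumerate(alphabet)}
    lengths = np.array([len(s) for s in seqs])
    data = np.zeros((len(seqs), lengths.max(), len(alphabet)))
    for n, s in enumerate(seqs):
        for pos, ch in enumerate(s):
            data[n, pos, index[ch]] = 1.0
    return data, lengths


data, lengths = one_hot_encode(["BABBA", "BAAB", "BABBB"], "AB")
```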

In [None]:
latent_seq_length = int(dataset.max_length * 1.1)
latent_seq_length
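The latent length is the longest observed length plus roughly 10%, truncated to an integer. If we assume the stop symbol adds one position to each sequence, the toy data gives the numbers below. Because `int()` truncates, the margin adds a latent position only once the maximum length reaches 10.

```python
seqs = ["BABBA"] * 10 + ["BAAB"] * 10 + ["BABBB"] * 10

# Assumption: include_stop=True appends one stop symbol to every sequence.
max_length = max(len(s) for s in seqs) + 1      # 5 + 1 = 6
latent_seq_length = int(max_length * 1.1)       # int(6.6...) = 6

# Extra latent positions gained from the 10% margin, for a few maximum lengths.
margins = {n: int(n * 1.1) - n for n in (6, 9, 10, 20)}
```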

In [None]:
batch_size = 2
split = 0.2
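`split` is defined here but not used later in this notebook. Pyro's MuE example scripts offer a train/test split option, and one way to get a held-out set is `torch.utils.data.random_split`. The sketch uses a stand-in `TensorDataset` of 30 items, matching the 30 toy sequences. The fixed seed is an arbitrary choice, made only for reproducibility.

```python
import math

import torch
from torch.utils.data import TensorDataset, random_split

toy = TensorDataset(torch.arange(30))          # stand-in for the 30-sequence dataset
split = 0.2
n_test = math.ceil(split * len(toy))           # held-out fraction, rounded up
n_train = len(toy) - n_test
generator = torch.Generator().manual_seed(0)   # fixed seed -> reproducible split
train_set, test_set = random_split(toy, [n_train, n_test], generator=generator)
```

You could then pass `train_set` to `model.fit_svi` in place of `dataset`. This assumes `fit_svi` accepts any `torch.utils.data.Dataset` that yields the same items.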

### Model

In [None]:
model = ProfileHMM(
    latent_seq_length=latent_seq_length,
    alphabet_length=dataset.alphabet_length,
    prior_scale=1.0,
    cuda=False,
    indel_prior_bias=10.0,
    pin_memory=False,
)

In [None]:
scheduler = MultiStepLR(
    {
        'optimizer': Adam,
        'optim_args': {'lr': 0.001},
        'milestones': [],  # no milestones: the learning rate stays constant
        'gamma': 0.5,
    }
)
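`pyro.optim.MultiStepLR` wraps PyTorch's `torch.optim.lr_scheduler.MultiStepLR`. That scheduler multiplies the learning rate by `gamma` each time its step count reaches a milestone. Running the PyTorch scheduler directly on a dummy parameter shows the effect. The milestones `[2, 4]` are hypothetical; with the empty list used above, the rate stays at 0.001.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR


def lr_trace(milestones, gamma=0.5, lr=0.001, steps=6):
    """Return the learning rate in effect before each of `steps` scheduler steps."""
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = Adam([param], lr=lr)
    scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=gamma)
    trace = []
    for _ in range(steps):
        trace.append(optimizer.param_groups[0]["lr"])
        optimizer.step()   # no gradients here, so parameters are untouched
        scheduler.step()
    return trace


halving = lr_trace([2, 4])   # lr is halved after the 2nd and 4th scheduler steps
constant = lr_trace([])      # empty milestones: lr never changes
```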

In [13]:
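# Use double precision; run this before creating the dataset and model so all tensors share one dtype.
torch.set_default_dtype(torch.float64)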
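`torch.set_default_dtype` only affects floating-point tensors created after the call; existing tensors keep their dtype. A quick check, which restores the previous default afterwards:

```python
import torch

previous = torch.get_default_dtype()
before = torch.zeros(3)                 # created under the current default
torch.set_default_dtype(torch.float64)
after = torch.zeros(3)                  # new floating-point tensors are now float64
torch.set_default_dtype(previous)       # restore, so the check has no lasting effect
```

Order matters for this reason. Data tensors created before the switch can stay `float32` while parameters created later are `float64`. Some operations, such as `torch.matmul`, raise an error when given both.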

In [None]:
losses = model.fit_svi(dataset, batch_size=batch_size, scheduler=scheduler)
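Per-step losses are noisy, so smoothing them makes convergence easier to judge. The helper below is a hypothetical NumPy sketch that averages consecutive windows. It assumes `losses` is a flat list of numbers.

```python
import numpy as np


def moving_average(values, window=10):
    """Average each run of `window` consecutive values (hypothetical helper).

    Returns len(values) - window + 1 points, or a copy of the input if it is
    shorter than the window.
    """
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        return values.copy()
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")
```

For example, if Matplotlib is available, `plt.plot(moving_average(losses, window=10))` gives a smoothed training curve.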