## [Example: Constant + MuE (Profile HMM)](http://pyro.ai/examples/mue_profile.html#example-constant-mue-profile-hmm)

This example uses a standard profile HMM model [1], which corresponds to a constant (delta function) distribution with a MuE observation [2].

The profile HMM is a standard generative model of variable-length biological sequences (e.g. proteins) that does not require preprocessing the data by building a multiple sequence alignment (MSA). It can be compared to a more complex MuE model in this package, the FactorMuE.
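
To make the idea of a generative model over variable-length sequences concrete, here is a toy sketch of the match/insert/delete mechanism behind a profile HMM. It is a simplification under stated assumptions: the function name `sample_profile_hmm`, its parameters, and the position-independent insert and delete probabilities are hypothetical and are not the `ProfileHMM` API in `pyro.contrib.mue`.

```python
import random


def sample_profile_hmm(match_emissions, insert_prob, delete_prob, alphabet, rng):
    """Draw one sequence from a toy profile HMM (illustrative assumption only).

    match_emissions: one list of weights over `alphabet` per profile position.
    insert_prob: chance of emitting an extra random symbol (must be < 1).
    delete_prob: chance of skipping a profile position.
    """
    out = []
    for weights in match_emissions:
        # Insert state: emit zero or more extra symbols before this position.
        while rng.random() < insert_prob:
            out.append(rng.choice(alphabet))
        # Delete state: skip this position without emitting anything.
        if rng.random() < delete_prob:
            continue
        # Match state: emit a symbol from this position's distribution.
        out.append(rng.choices(alphabet, weights=weights, k=1)[0])
    return "".join(out)


alphabet = ["A", "C", "G", "T"]
# A three-position profile that always emits A, then G, then T.
profile = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
sample = sample_profile_hmm(profile, 0.2, 0.1, alphabet, random.Random(0))
```

Because insertions and deletions change the number of emitted symbols, samples from the same profile can differ in length, which is why no alignment step is needed before modeling.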

### References:
[1] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison (1998). "Biological sequence analysis: probabilistic models of proteins and nucleic acids". Cambridge University Press.

[2] E. N. Weinstein, D. S. Marks (2021). "A structured observation distribution for generative biological sequence prediction and forecasting". https://www.biorxiv.org/content/10.1101/2020.07.31.231381v2.full.pdf

In [None]:
%%sh
curl -O https://raw.githubusercontent.com/debbiemarkslab/MuE/master/models/examples/ve6_full.fasta

[Data](https://github.com/debbiemarkslab/MuE/blob/master/models/examples/ve6_full.fasta)

In [None]:
import argparse
import datetime
import json
import os

import matplotlib.pyplot as plt
import numpy as np
import torch
from torch.optim import Adam

import pyro
from pyro.contrib.mue.dataloaders import BiosequenceDataset
from pyro.contrib.mue.models import ProfileHMM
from pyro.optim import MultiStepLR

[MuE](https://github.com/pyro-ppl/pyro/tree/dev/pyro/contrib/mue)

In [None]:
file = './ve6_full.fasta'

In [None]:
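# Manually parse the FASTA file; lines starting with '>' are record headers.
seqs = []
seq = ""
with open(file, 'r') as fr:
    for line in fr:
        if line[0] == '>':
            # A new header closes the previous record: add the stop symbol and store it.
            if seq != "":
                seq += "*"
                seqs.append(seq)
                seq = ""
        else:
            # Sequence data may span several lines; concatenate them.
            seq += line.strip('\n')
# Note: the record after the last header is still held in `seq` and is never
# appended, so `seqs` ends up one sequence short of the number of records.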

In [34]:
len(seqs)

1608
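
The loop above appends a sequence only when it reaches the next header line, so the final record is left in `seq` and never stored. A minimal sketch of a variant that flushes the last record after the loop is shown below. The function name `read_fasta_sequences` and its `include_stop` argument are hypothetical, chosen for illustration.

```python
def read_fasta_sequences(lines, include_stop=True):
    """Collect sequences from FASTA-formatted lines, including the last record."""
    seqs = []
    seq = ""

    def flush():
        # Store the current sequence, if any, with an optional stop symbol.
        if seq:
            seqs.append(seq + "*" if include_stop else seq)

    for line in lines:
        if line.startswith(">"):
            # A header closes the previous record.
            flush()
            seq = ""
        else:
            # Sequence data may span several lines.
            seq += line.strip()
    # The last record has no following header, so flush it explicitly.
    flush()
    return seqs


example = [">a\n", "MK\n", "V\n", ">b\n", "GG\n"]
parsed = read_fasta_sequences(example)
```

Applied to the file with `with open(file) as fr: seqs = read_fasta_sequences(fr)`, this would be expected to return one more sequence than the loop above.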

In [29]:
dataset = BiosequenceDataset(
    file,
    'fasta',
    alphabet='amino-acid',
    include_stop=True,
    device='cpu',
)

In [30]:
dataset.data_size

1609

In [31]:
dataset.alphabet_length

21

In [32]:
dataset.max_length

159

In [33]:
len(dataset)

1609
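
The attributes above suggest how variable-length sequences end up in a fixed-size array: an alphabet length of 21 is consistent with 20 amino acids plus the stop symbol `*`, and `max_length` presumably gives the padded length. The sketch below illustrates that kind of one-hot encoding with zero padding in plain Python. The helper `one_hot_pad` and the symbol ordering in `ALPHABET` are assumptions for illustration, not the internals of `BiosequenceDataset`.

```python
# Assumed symbol ordering, for illustration only.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
ALPHABET = AMINO_ACIDS + ["*"]  # 20 amino acids plus a stop symbol


def one_hot_pad(seqs, alphabet):
    """One-hot encode sequences, padding shorter ones with all-zero rows."""
    index = {sym: i for i, sym in enumerate(alphabet)}
    max_length = max(len(s) for s in seqs)
    encoded = []
    for s in seqs:
        # Start from max_length rows of zeros, then mark each observed symbol.
        rows = [[0] * len(alphabet) for _ in range(max_length)]
        for pos, sym in enumerate(s):
            rows[pos][index[sym]] = 1
        encoded.append(rows)
    # Keep the true lengths so the padding can be ignored later.
    lengths = [len(s) for s in seqs]
    return encoded, lengths


encoded, lengths = one_hot_pad(["MK*", "G*"], ALPHABET)
```

Here `encoded` has shape (number of sequences, maximum length, alphabet length), and `lengths` records how many rows of each sequence are real data rather than padding.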