miRvestigator hidden Markov Model (HMM)

cplaisier edited this page Oct 25, 2010 · 1 revision

What is the miRvestigator hidden Markov model (HMM)?
Regulation by miRNAs generates co-expression signatures that can be identified from transcriptome or proteome studies. The 3’ UTRs of the genes in such a co-expression signature will be enriched for sequences complementary to the miRNA seed sequence. Many methods are available to identify enriched sequence motifs in 3’ UTRs; we recommend the Weeder algorithm, which has proven quite effective at this task (http://www.ncbi.nlm.nih.gov/pubmed/18411406, http://www.ncbi.nlm.nih.gov/pubmed/19553345). However, motif discovery has been the stopping point, because no tool was available to compare a sequence motif (a matrix of nucleotide probabilities with one column per motif position) to a database of miRNA seed sequences (strings of nucleotides).

The miRvestigator HMM fills this gap. It turns the sequence motif into a profile HMM and then applies the Viterbi algorithm to align each miRNA seed to the motif and compute a probability for the match. We took this one step further by computing the Viterbi probabilities exhaustively for every possible seed sequence of the given length (all 6mers, 7mers or 8mers), which provides a p-value for each Viterbi match given the sequence motif. We validated the accuracy of this approach using the ROC AUC across the range of possible seed nucleotide frequencies, and found it to be extremely accurate until the seed nucleotide frequency approaches ~0.25, the point where the signal becomes indistinguishable from noise.
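The profile HMM and exhaustive p-value described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the miRvestigator code: the model is a simplified profile HMM with one match state per PSSM column plus background-emitting flanking states (no insert or delete states), the transition probabilities `p_flank` and `p_next` are made-up defaults, and the p-value is taken as the fraction of all 4^k k-mers whose Viterbi score is at least that of the query. The names `viterbi_log_prob` and `exhaustive_p_value` are hypothetical.

```python
import itertools
import math

BASES = "ACGT"
LOG_BG = math.log(0.25)  # uniform background for the flanking states (assumed)


def _log(p):
    """Natural log that returns -inf for probability 0 instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi_log_prob(seq, pssm, p_flank=0.5, p_next=0.9):
    """Log-probability of the best (Viterbi) path emitting `seq` (hypothetical model).

    States: N (pre-flank, background, self-loop p_flank), M_1..M_L (one per PSSM
    column, emits pssm[j][base]) and C (post-flank, background, absorbing).
    Entry into the motif is uniform over columns, so the seed may align at any
    offset and may overhang either end of the motif.
    """
    L = len(pssm)
    stay, enter = _log(p_flank), _log((1 - p_flank) / L)
    nxt, leave = _log(p_next), _log(1 - p_next)
    N, C = 0, L + 1  # state indices; match state M_j has index j (1..L)

    def emit(j, base):
        return _log(pssm[j - 1][base])

    # First symbol: start in N, or enter the motif directly at any column.
    v = [-math.inf] * (L + 2)
    v[N] = stay + LOG_BG
    for j in range(1, L + 1):
        v[j] = enter + emit(j, seq[0])

    for base in seq[1:]:
        w = [-math.inf] * (L + 2)
        w[N] = v[N] + stay + LOG_BG
        for j in range(1, L + 1):
            best = v[N] + enter                   # N -> M_j
            if j > 1:
                best = max(best, v[j - 1] + nxt)  # M_{j-1} -> M_j
            w[j] = best + emit(j, base)
        # C is absorbing; M_L always exits to C, earlier columns with 1 - p_next.
        w[C] = max([v[C], v[L]] + [v[j] + leave for j in range(1, L)]) + LOG_BG
        v = w
    return max(v)  # the path may end in any state


def exhaustive_p_value(seq, pssm):
    """Fraction of all 4**k k-mers whose Viterbi score is >= that of `seq`."""
    observed = viterbi_log_prob(seq, pssm)
    k = len(seq)
    hits = sum(
        viterbi_log_prob("".join(kmer), pssm) >= observed
        for kmer in itertools.product(BASES, repeat=k)
    )
    return hits / 4 ** k


# Toy 6-column PSSM favouring ACGTAC (made-up numbers; each column sums to 1).
pssm = [{b: (0.91 if b == c else 0.03) for b in BASES} for c in "ACGTAC"]
print(exhaustive_p_value("ACGTAC", pssm))  # 0.000244140625 (only ACGTAC scores this high)
print(exhaustive_p_value("TTTTTT", pssm))  # a poor match gets a large p-value
```

Scoring every k-mer means 4,096 Viterbi runs for 6mers and 65,536 for 8mers, which is cheap enough that no analytical null distribution is needed.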

Implementation
The sequence motif is assumed to be a position-specific scoring matrix (PSSM), which is then compared to every miRNA seed sequence from miRBase. The miRvestigator HMM is implemented as a Python object, so it can easily be integrated into other Python code or wrapped into a command-line tool. We prefer to use it as a native Python object, and we have developed a web server implementation using Pyro and mod_python in Apache, which can be found here on GitHub or at http://mirvestigator.systemsbiology.org. Please refer to the README for specifics, and feel free to contact us with any comments or concerns.
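Comparing one PSSM against every miRNA seed, packaged as a Python object, might look roughly like the sketch below. Everything here is hypothetical: `SeedScanner` and `site_for_seed` are illustrative names, the seed is assumed to be nucleotides 2-8 of the mature miRNA with the 3’ UTR site taken as its reverse complement (written as DNA), and a best-offset ungapped log-odds score stands in for the HMM and Viterbi comparison. The two mature sequences are well-known human miRNAs used only as an example; a real run would load all mature sequences from miRBase.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def site_for_seed(mature, start=2, end=8):
    """DNA site complementary to seed positions start..end (1-based, inclusive)."""
    seed = mature.upper()[start - 1:end]
    return "".join(COMPLEMENT[b] for b in reversed(seed))


class SeedScanner:
    """Hypothetical wrapper: rank miRNAs by how well their seed site fits a PSSM.

    Uses the best ungapped log-odds over all offsets as a simple stand-in for
    the miRvestigator HMM + Viterbi comparison.
    """

    def __init__(self, pssm, mirnas, pseudo=0.01):
        self.pssm = pssm      # list of {base: probability}, one dict per column
        self.mirnas = mirnas  # {name: mature sequence in the RNA alphabet}
        self.pseudo = pseudo  # pseudocount so zero probabilities stay finite

    def score(self, site):
        L, k = len(self.pssm), len(site)
        best = -math.inf
        # Slide the site along the motif; only overlapping positions contribute.
        for offset in range(-(k - 1), L):
            s = 0.0
            for i, base in enumerate(site):
                j = offset + i
                if 0 <= j < L:
                    p = (self.pssm[j][base] + self.pseudo) / (1 + 4 * self.pseudo)
                    s += math.log2(p / 0.25)  # log-odds against a uniform background
            best = max(best, s)
        return best

    def rank(self):
        scored = [(self.score(site_for_seed(seq)), name)
                  for name, seq in self.mirnas.items()]
        return [(name, s) for s, name in sorted(scored, reverse=True)]


mirnas = {
    "hsa-let-7a-5p": "UGAGGUAGUAGGUUGUAUAGUU",
    "hsa-miR-21-5p": "UAGCUUAUCAGACUGAUGUUGA",
}
# Toy PSSM built around the let-7 site CTACCTC (made-up probabilities).
pssm = [{b: (0.85 if b == c else 0.05) for b in "ACGT"} for c in "CTACCTC"]
print(site_for_seed(mirnas["hsa-let-7a-5p"]))  # CTACCTC
print(SeedScanner(pssm, mirnas).rank()[0][0])  # hsa-let-7a-5p
```

Keeping the scanner as a plain object means the same instance can be called from other Python code, exposed through a command-line entry point, or served remotely, in the spirit of the usage described above.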

Developed at the Institute for Systems Biology in the Baliga Lab by Christopher Plaisier (cplaisier (at) systemsbiology (dot) org).