

Documentation.

commit ad0e5badc92499ab9a30b42771fa763794ba966a (1 parent: 87a1698)
@arnsholt authored
Showing with 95 additions and 10 deletions.
  1. +9 −10 README
  2. +86 −0 lib/Algorithm/Viterbi.pm
19 README
@@ -1,12 +1,11 @@
-Algorithm::Viterbi - Decoding Hidden Markov Models
-==============================================================================
+This module provides HMM decoding for Perl 6.
-This module contains an implementation of the Viterbi algorithm for decoding
-Hidden Markov Models (HMMs). It contains a class Algorithm::Viterbi, which
-handles the actual decoding of the HMM, and Algorithm::Viterbi::HMM which
-represents the HMM.
+To install it, use UFO (http://github.com/masak/ufo):
+ $ ufo
+ $ make test install
-XXX: An implementation of the Forward algorithm would probably use the same
-datastructure, so it would make sense to share the HMM class between them. How
-to best to this? Inverse the hierarchy, and have Algorithm::HMM contain all
-things HMM, or some other form of cleverness? See how CPAN handles this.
+Documentation is provided as POD and as extensive in-code comments; see those
+for information on how to use the module.
+
+Bugs, feedback, and anything else can be sent to the author at arnsholt@gmail.com,
+or reported on the project's GitHub page: http://github.com/arnsholt/Algorithm-Viterbi
86 lib/Algorithm/Viterbi.pm
@@ -187,3 +187,89 @@ multi method train($hmm: @input) {
}
}
}
+
+=begin pod
+
+=head1 NAME
+
+Algorithm::Viterbi - Decoding HMMs
+
+=head1 DESCRIPTION
+
+This module is a fairly straightforward implementation of Viterbi's algorithm
+for decoding hidden Markov models. The code is based on a Common Lisp
+implementation I wrote as coursework, itself based on pseudo-code from
+Jurafsky & Martin, I<Speech and Language Processing> (2nd ed.).
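+
+The core of the algorithm can be sketched as follows. This is an illustrative
+reimplementation, not the module's actual code: the C<start> key used for the
+initial transition probabilities is an assumption made for the sketch, and the
+two probability hashes correspond to the fields described under FIELDS below.
+
+=begin code
+
+ sub viterbi-sketch(@observations, @tags, %p-transition, %p-emission) {
+     my @v;    # @v[$t]{$tag}: probability of the best path ending in $tag at step $t
+     my @back; # @back[$t]{$tag}: predecessor tag on that best path
+     for @tags -> $tag {
+         # <start> is an assumed start-of-sequence key, for illustration only.
+         @v[0]{$tag} = (%p-transition<start>{$tag} // 0)
+                       * (%p-emission{$tag}{@observations[0]} // 0);
+     }
+     for 1 ..^ @observations.elems -> $t {
+         for @tags -> $tag {
+             # Pick the predecessor maximising path probability times transition.
+             my $best = @tags.max(
+                 { (@v[$t - 1]{$_} // 0) * (%p-transition{$_}{$tag} // 0) });
+             @back[$t]{$tag} = $best;
+             @v[$t]{$tag} = (@v[$t - 1]{$best} // 0)
+                            * (%p-transition{$best}{$tag} // 0)
+                            * (%p-emission{$tag}{@observations[$t]} // 0);
+         }
+     }
+     # Read the best tag sequence back out of the table via the back-pointers.
+     my $tag = @tags.max({ @v[@observations.elems - 1]{$_} // 0 });
+     my @path = $tag;
+     @path.unshift(@back[$_]{@path[0]}) for reverse 1 ..^ @observations.elems;
+     return @path;
+ }
+
+=end code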
+
+=head1 SYNOPSIS
+
+=begin code
+
+ use Algorithm::Viterbi;
+
+ my Algorithm::Viterbi $hmm .= new(:alphabet<H C>);
+ $hmm.train("training-data.tt"); # Train from file
+ $hmm.train([ [a => 'H', b => 'C', a => 'H'],
+              [b => 'C', c => 'H', a => 'C'] ]); # Train from hardcoded data
+ $hmm.decode(<a b c>);
+
+=end code
+
+=head1 FIELDS
+
+
+=item %.p-transition
+
+The transition probabilities. A hash of hashes, indexed by tag name.
+
+=item %.p-emission
+
+The emission probabilities for a given tag. A hash of hashes, indexed first by
+tag, then by observation.
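+
+For example, with the tags and observations from the SYNOPSIS, the trained
+probabilities could be read back like this (illustrative variable names; the
+exact values depend on the training data):
+
+=begin code
+
+ my $p-HC = $hmm.p-transition<H><C>; # probability that tag C follows tag H
+ my $p-Ha = $hmm.p-emission<H><a>;   # probability that H emits observation a
+
+=end code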
+
+
+=head1 METHODS
+
+
+=item method new(:@alphabet)
+
+The constructor takes a single named argument: the list of tag names used.
+
+=item method decode(Str @input)
+
+The C<decode> method decodes the input according to the probabilities
+specified in the C<%.p-transition> and C<%.p-emission> fields.
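+
+A sketch of a typical call, assuming (as an illustration, not a documented
+guarantee) that C<decode> returns the most probable tag sequence for the input:
+
+=begin code
+
+ my @tags = $hmm.decode(<a b c>);
+ # Under that assumption, @tags is now something like (H C H),
+ # one tag per observation.
+
+=end code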
+
+=item method train(Str $file)
+
+Computes unsmoothed bigram probabilities from an input file. The input format
+is described by this grammar:
+
+=begin code
+
+ grammar G {
+ token TOP { <chunk>+ }
+ token chunk { <record>+ \n }
+ token record { \w+ \t \w+ \n }
+ }
+
+=end code
+
+Each record is an observation followed by its associated tag, separated by a tab.
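+
+For example, a training file with two chunks could look like this (the
+whitespace inside each record is a single tab, and each chunk, including the
+last, is terminated by a blank line):
+
+=begin code
+
+ a	H
+ b	C
+
+ c	H
+
+=end code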
+
+=item method train(Array of Pair @data)
+
+Computes unsmoothed bigram probabilities from an Array of Array of Pairs.
+Each pair is a single observation-tag pair, and each element of the top-level
+array is one training sequence.
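+
+As an illustration of what the unsmoothed estimates mean (a sketch of the
+expected behaviour, not verified output): after training on a single two-pair
+sequence, every observed event has been seen exactly once, so each estimate
+comes out as 1.
+
+=begin code
+
+ $hmm.train([ [a => 'H', b => 'C'] ]);
+ # H was seen once, emitting a and followed by C, so the unsmoothed
+ # estimates %.p-transition<H><C> and %.p-emission<H><a> should both be 1.
+
+=end code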
+
+
+=head1 AUTHOR
+
+Arne Skjærholt - L<mailto:arnsholt@gmail.com>.
+
+=end pod