
aezero/mt_decoder

 
 


##Decoder

####CIS526, Machine Translation, HW2

Sean Welleck

This project decodes (translates) a source sentence by searching for the target sentence with the highest probability.

The project contains three decoders:

  1. A monotone decoder.
  2. A greedy decoder.
  3. A stack decoder.

It also contains functions to combine two decodings.

Run `python decode > output.txt` to decode using the default input files and write the translations to `output.txt`.

Run `python combine -x filename1 -y filename2 > combined.txt` to combine two decoded files by choosing the higher-scoring translation of each sentence, and write the result to `combined.txt`.
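To make the combining step concrete, here is a minimal sketch of choosing the higher-scoring translation sentence by sentence. The function name `combine_decodings`, its arguments, and the `score(source, translation)` callable are assumptions for illustration; they are not the actual signatures in `decoder.py` or `evaluator.py`.

```python
def combine_decodings(sources, first, second, score):
    """Pick, for each sentence, whichever candidate translation scores higher.

    sources, first, second: lists with one string per sentence.
    score: hypothetical callable (source, translation) -> float, higher is better.
    """
    if not (len(sources) == len(first) == len(second)):
        raise ValueError("all three inputs must have one entry per sentence")

    combined = []
    for source, a, b in zip(sources, first, second):
        # Ties go to the first decoding.
        combined.append(a if score(source, a) >= score(source, b) else b)
    return combined
```

Any scoring function with the same shape can be passed in, so the selection logic stays independent of how translations are scored.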


#####Algorithm

  1. Decode with the monotone decoder.
  2. Decode with the greedy decoder, using the decodings from (1) as the initial seed decoding.
  3. Save decodings from (2).
  4. Decode with the stack decoder.
  5. Decode with the greedy decoder, using the decodings from (4) as the initial seed decoding.
  6. Combine decodings from (5) and (2).
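The six steps above can be sketched as a small driver function. The decoders are passed in as callables because their real signatures are not shown here; `run_pipeline` and all parameter names are hypothetical.

```python
def run_pipeline(sources, monotone, stack, greedy, combine):
    """Chain the decoders in the order described in the steps above.

    monotone, stack: hypothetical callables, sources -> list of translations.
    greedy: hypothetical callable, (sources, seeds) -> list of translations.
    combine: hypothetical callable, (decodings_a, decodings_b) -> list of translations.
    """
    monotone_out = monotone(sources)                 # step 1
    greedy_from_monotone = greedy(sources, monotone_out)  # steps 2 and 3
    stack_out = stack(sources)                       # step 4
    greedy_from_stack = greedy(sources, stack_out)   # step 5
    return combine(greedy_from_stack, greedy_from_monotone)  # step 6
```

Seeding the greedy decoder from two different starting points and then combining gives it two chances to find a good translation for each sentence.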

#####Other

  1. Uses a combination of histogram pruning and threshold pruning.
  2. Uses <= 40 translations per phrase.
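As an illustration of how the two pruning strategies can work together, the sketch below applies threshold pruning first (drop hypotheses too far below the best one) and then histogram pruning (cap the stack size). The function names, the `(logprob, hypothesis)` tuple layout, and the parameter values are assumptions, not the settings used in this project; only the limit of 40 translations per phrase comes from the list above.

```python
import heapq


def prune_stack(hypotheses, max_size, beam_width):
    """Prune one stack of (logprob, hypothesis) pairs.

    Threshold pruning: drop anything more than beam_width below the best logprob.
    Histogram pruning: keep at most max_size of the remaining hypotheses.
    """
    if not hypotheses:
        return []
    best = max(logprob for logprob, _ in hypotheses)
    survivors = [(lp, h) for lp, h in hypotheses if lp >= best - beam_width]
    return heapq.nlargest(max_size, survivors, key=lambda item: item[0])


def top_translations(options, limit=40):
    """Keep the `limit` most probable (logprob, phrase) translation options."""
    return heapq.nlargest(limit, options, key=lambda item: item[0])
```

For example, with `max_size=2` and `beam_width=3.0`, the stack `[(-1.0, "a"), (-2.5, "b"), (-1.5, "c"), (-9.0, "d")]` is reduced to `[(-1.0, "a"), (-1.5, "c")]`: `"d"` falls outside the threshold and `"b"` is cut by the size limit.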

#####decoder.py

  1. Contains the decoder implementations in a single Decoder class.
  2. Contains the top-level user functions decode() and combine().

#####evaluator.py

  1. Contains an Evaluator class that adapts the grading function.
  2. Used to choose between two sentence translations while combining decodings.
  3. Was also tried as an alternative scoring function for the greedy decoder. For performance reasons, I ended up using my original, simpler scoring function instead.
