GarrettNicolai/M2MP

Welcome to DTLM, a semi-supervised seq2seq system for low-resource transduction.

Note that, due to submission size constraints, we could not include all of the libraries needed to build DTLM and M2M.

M2M requires the tclap library, while DTLM requires SVMlight, STLport, and tclap.

To take full advantage of DTLM, you will need two additional resources, possibly compiled from an unannotated corpus:

A language model file, in the .ARPA format. For this paper, LMs were constructed using the CMU language modeling toolkit, but any toolkit that produces a language model in .ARPA format should work.

A word list consisting of counts and words, separated by a tab.

e.g.:

200214	the
192245	a
46030	is
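The word list can be built from any whitespace-tokenized corpus with standard Unix tools; a minimal sketch (corpus.txt is a hypothetical plain-text input file, one sentence per line):

```shell
# Build a "count<TAB>word" list from a whitespace-tokenized corpus.
# corpus.txt is a hypothetical input file; wordlist.txt is the result.
tr ' ' '\n' < corpus.txt | grep -v '^$' | sort | uniq -c | sort -rn \
  | awk '{print $1 "\t" $2}' > wordlist.txt
```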

To start, your training file should contain source and target, separated by a tab. All individual characters in the file should be separated by spaces (_ and | are reserved characters, so please replace them with something else):

w $ k # z	w a l k e r s
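If your raw pairs are not yet character-separated, a sketch with GNU sed can space them out (raw.txt, holding one source-tab-target pair per line, is a hypothetical input):

```shell
# Insert a space after every character, then restore the single tab
# between source and target and drop the trailing space.
# raw.txt is a hypothetical file of unspaced "source<TAB>target" pairs.
sed 's/./& /g; s/ \t /\t/g; s/ $//' raw.txt > train.txt
```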

Call m2m-aligner with the following command:

./m2m+ --maxX 1 --maxY 1 --delX --delY -i train.txt -o train.pass1

This will produce the primary alignment:

w|_|$|k|_|#|z|	w|a|l|k|e|r|s|

Replace all pipes with spaces:

sed -i 's/|/ /g' train.pass1

w _ $ k _ # z	w a l k e r s

And run the second pass of the aligner:

./m2m+ --maxX 1 --maxY 1 -i train.pass1 -o train.pass2

w|_:$|k|_:#|z|	w|a:l|k|e:r|s|

Remove all null mergings:

sed -i 's/_://g' train.pass2
sed -i 's/:_//g' train.pass2

w|$|k|#|z|	w|a:l|k|e:r|s|

And run DTLM:

./DTLM --cs 3 --ng 7 --jointMgram 3 --inChar : --outChar ' ' -t test.txt -a test.out -f train.pass2 --mo modelName

Note that the test file should contain the source only, with characters separated by ":"

r:U:n:#:z
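To produce that format from plain (already character-level) words, one option with GNU sed (words.txt, one word per line, is a hypothetical input):

```shell
# Join the characters of each word with ":".
# words.txt is a hypothetical one-word-per-line input file.
sed 's/./&:/g; s/:$//' words.txt > test.txt
```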

If you want to test on another file later, use the second-to-last generated model (DTLM runs until convergence, so the last model will usually be slightly worse than the second-to-last).
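The model-file naming scheme is not documented here; assuming DTLM numbers its saved models modelName.1, modelName.2, and so on, a sketch for picking the second-to-last one:

```shell
# Print the second-to-last model file, assuming numbered suffixes
# (modelName.1, modelName.2, ... is an assumption, not documented).
ls modelName.* | sort -t. -k2 -n | tail -n 2 | head -n 1
```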
