Skip to content

CivicDataLab/Samoses

Repository files navigation

Samoses

License: MIT

Wrapper for Moses

Setup

Installation of Moses

Clone the Moses repository here

Follow the instructions here to install Moses

DATASET

Find the nyu corpus for English-Hindi here

Pipeline Creation Language (PCL)

PCL is a Domain Specific Language to construct non-recurrent software pipelines. We are using PCL to build pipeline for the Statistical Machine Translation.

RUN

sudo nice experiment.perl -config config_nyu -exec

The phrase table generated is phrase-table.1.gz and not phrase-table.1. Convert the phrase table to the compact format using:

sudo nice $WORKSPACE/mosesdecoder/bin/processPhraseTableMin -in $WORKSPACE/experiment/model/phrase-table.1.gz -nscores 4 -out $WORKSPACE/experiment/model/phrase-table

Similarly convert the reodering table:

sudo nice $WORKSPACE/mosesdecoder/bin/processLexicalTableMin -in $WORKSPACE/experiment/model/reordering-table.1.wbe-msd-bidirectional-fe.gz -out $WORKSPACE/experiment/model/reordering-table

Modify moses.tuned.ini.1 under tuning directory with:

# PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/ubuntu/mosesdecoder/experiment/model/phrase-table.1 input-factor=0 output-factor=0

PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=/home/ubuntu/mosesdecoder/experiment/model/phrase-table.minphr input-factor=0 output-factor=0

#LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/ubuntu/mosesdecoder/experiment/model/reordering-table.1.wbe-msd-bidirectional-fe.gz

LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/ubuntu/mosesdecoder/experiment/model/reordering-table

Translate

$WORKSPACE/mosesdecoder/bin/moses -f $WORKSPACE/experiment/tuning/moses.tuned.ini.1

About

wrapper for Moses

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors