Rule scoring framework

This repository contains the scripts used for the following AMTA 2012 paper:

Johann Roturier, Linda Mitchell, Robert Grabowski and Melanie Siegel: Using Automatic Machine Translation Metrics to Analyze the Impact of Source Reformulations, AMTA 2012

The main script is score-rules.sh. The script...

runs the AutoApply client on a given input text file, writing the original and corrected segments to two parallel files and information about each flag to a third file
tokenizes and truecases the original and corrected segments using a source language truecaser and tokenizer
finds the reference translations of the original and corrected segments using a reference file that is parallel to the input file
translates the original and corrected segments, either with Moses via the Moses server XML-RPC interface or via the Google Translate API
tokenizes and truecases the translated original and corrected segments using a target language truecaser and tokenizer
tokenizes and truecases the reference translations using a target language truecaser and tokenizer
scores the tokenized original and corrected segments using a language model for the source language
scores the translated+tokenized original and corrected segments using a language model for the target language
scores the translated+tokenized original and corrected segments against the tokenized reference segments using smoothed BLEU, TER, and GTM
collects all scores and compares them, resulting in a statistics CSV file with better/equal/worse rankings grouped by Acrolinx rule
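
The final comparison step can be illustrated with a small sketch. It is not the repository's actual implementation: the tab-separated input layout (rule, score of the original, score of the correction), the function name rank_by_rule, and the CSV column order are all assumptions made for illustration, and the scores are made up. It also assumes a higher-is-better metric such as BLEU; for TER, where lower is better, the comparison would be inverted.

```shell
#!/usr/bin/env bash
# Hypothetical input: one correction per line as
#   rule<TAB>score_original<TAB>score_corrected
# Assumes a higher-is-better metric (e.g. BLEU).
rank_by_rule() {
  echo "rule,better,equal,worse"
  awk -F'\t' '
    {
      rules[$1] = 1
      if ($3 > $2)      better[$1]++   # correction scored higher
      else if ($3 < $2) worse[$1]++    # correction scored lower
      else              equal[$1]++    # no change in score
    }
    END {
      # Unset counters print as 0 with %d.
      for (r in rules)
        printf "%s,%d,%d,%d\n", r, better[r], equal[r], worse[r]
    }
  ' "$@" | sort
}

# Made-up scores, for illustration only.
printf 'ruleA\t0.30\t0.35\nruleA\t0.40\t0.40\nruleB\t0.50\t0.45\n' | rank_by_rule
```

Grouping by rule rather than by segment makes it possible to see which rules tend to help the translation and which tend to hurt it.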

Progress messages are printed and also written to a log file. The script usually skips creating a data file that already exists.
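
The skip-if-present behaviour and the logging can be sketched roughly as follows. The helper names log and run_step, the LOG_FILE variable, and the default log file name are hypothetical; score-rules.sh may organise this differently.

```shell
#!/usr/bin/env bash
# Hypothetical log file name; override by setting LOG_FILE.
LOG_FILE="${LOG_FILE:-experiment.log}"

# Print a timestamped message and append it to the log file.
log() {
  echo "$(date '+%H:%M:%S') $*" | tee -a "$LOG_FILE"
}

# Usage: run_step OUTPUT_FILE COMMAND [ARGS...]
# Runs COMMAND and captures its standard output in OUTPUT_FILE,
# unless OUTPUT_FILE already exists.
run_step() {
  local out="$1"; shift
  if [ -e "$out" ]; then
    log "skipping: $out already exists"
    return 0
  fi
  log "creating: $out"
  # Write to a temporary file first so a failed step leaves no partial output.
  if "$@" > "$out.tmp"; then
    mv "$out.tmp" "$out"
  else
    rm -f "$out.tmp"
    log "failed: $out"
    return 1
  fi
}
```

With a helper like this, deleting a single intermediate file is enough to force that one step to be recomputed on the next run.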

Usage

Prerequisites:

The script needs the following additional software:

autoApplyClient, which needs to be requested from Acrolinx
BLEU scorer
GTM scorer
TER scorer

See the README.md files in the respective directories for information on how to obtain and/or compile the software.

The script also needs the tokenizer and truecaser scripts from the mosesdecoder repository at https://github.com/moses-smt/mosesdecoder. The root path of your local mosesdecoder checkout is hard-coded as the variable MOSES_DIR at the beginning of the script file; change it before the first run.
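
Before a long run it can help to confirm that MOSES_DIR points at a usable checkout. The following is a minimal sketch, not part of score-rules.sh: the function name check_moses_tools and the default path are hypothetical, and the two relative paths are the usual locations of the tokenizer and truecaser in a mosesdecoder checkout.

```shell
#!/usr/bin/env bash
# MOSES_DIR is the variable named above; this default is only a placeholder.
MOSES_DIR="${MOSES_DIR:-$HOME/mosesdecoder}"

# Usage: check_moses_tools DIR
# Reports each missing script on stderr; returns non-zero if any is absent.
check_moses_tools() {
  local dir="$1" missing=0 rel
  for rel in scripts/tokenizer/tokenizer.perl scripts/recaser/truecase.perl; do
    if [ ! -f "$dir/$rel" ]; then
      echo "missing: $dir/$rel (check MOSES_DIR)" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call: check_moses_tools "$MOSES_DIR" || exit 1
```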

General:

score-rules.sh experiment-name text-file src-lmodel src-tcmodel src-toklang mosesServer ref-file-1 ref-file-2 tgt-lmodel tgt-tcmodel tgt-toklang [autoApplyOptions]

mosesServer:

either a host:port pair, e.g. localhost:8081 (to translate via the mosesserver XML-RPC interface),
or a URL, e.g. http://server/translate.php (to translate via the Translate API)
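
One way to tell the two forms apart is by the shape of the argument. This is an illustrative sketch only; the function name translation_mode and the labels it prints are hypothetical, and how score-rules.sh actually makes the distinction is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the mosesServer argument by its shape.
# Prints "api" for http(s) URLs and "xmlrpc" for host:port pairs.
translation_mode() {
  case "$1" in
    http://*|https://*) echo "api" ;;
    *:[0-9]*)           echo "xmlrpc" ;;
    *)                  echo "unknown"; return 1 ;;
  esac
}

translation_mode "localhost:8081"               # prints: xmlrpc
translation_mode "http://server/translate.php"  # prints: api
```

The URL pattern is tested first because a URL with embedded credentials or a port would otherwise also match the host:port pattern.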

autoApplyOptions:

e.g. -h host -p port -u user --pass pwd -l lang -r ruleset (run autoApplyClient.jar for complete list)

src-tcmodel and tgt-tcmodel:

can be "" (an empty string) to skip truecasing
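
The effect of an empty model argument can be pictured as a filter that either truecases its input or passes it through untouched. This is a sketch under assumptions: the function name maybe_truecase and the default MOSES_DIR path are hypothetical, and score-rules.sh may implement the skip differently. The truecaser call uses the --model option of truecase.perl from mosesdecoder.

```shell
#!/usr/bin/env bash
# Placeholder default; point MOSES_DIR at your own mosesdecoder checkout.
MOSES_DIR="${MOSES_DIR:-$HOME/mosesdecoder}"

# Usage: maybe_truecase MODEL < input > output
# With an empty MODEL the text is copied through unchanged.
maybe_truecase() {
  local model="$1"
  if [ -z "$model" ]; then
    cat
  else
    perl "$MOSES_DIR/scripts/recaser/truecase.perl" --model "$model"
  fi
}

# Example: echo "Some Sentence ." | maybe_truecase ""
```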

Example:

bash score-rules.sh amta_experiment symc_bip_15_source.clean 1002-10.binlm truecase-model.1.en en 'http://user:pass@accept.statmt.org/demo/translate.php' symc_bip_17_target1.de.clean symc_bip_19_target2.de.clean 1002-06.binlm truecase-model.1.de de -h accept.acrolinx.com -p 80 -u USER --pass PASS -l en -r Preediting_SMT_SYMC -t Symantec -ts DEPRECATED,ADMITTED,VALID -skipReuseCheck

Output files:

amta_experiment.summary: The complete result, one record per correction instance: autoApply output, Moses translations, and all scores
amta_experiment.statistics.csv: The summary aggregated/grouped by rule.

Many other amta_experiment.* files are written into the current directory, so it's best to run the script from a dedicated folder.

Notes

The script still contains a lot of code duplication that should be factored out to make it more flexible.

