This repository has been archived by the owner. It is now read-only.
updated pipeline doc for 6.0.3 release
mjpost committed Jun 1, 2015
1 parent 81748f5 commit f956df750815d8170e368fa5b8cce359d792a800
Showing 2 changed files with 19 additions and 27 deletions.
@@ -4,6 +4,9 @@ category: links
title: The Joshua Pipeline
---

+ *Please note that Joshua 6.0.3 included some big changes to the directory organization of the
+ pipeline's files.*
+
This page describes the Joshua pipeline script, which manages the complexity of training and
evaluating machine translation systems. The pipeline eases the pain of two related tasks in
statistical machine translation (SMT) research:
@@ -164,13 +167,13 @@ generated by the individual sub-steps).
corpus.en
thrax-input-file
tune/
- tune.tok.lc.ur
- tune.tok.lc.en
+ corpus.ur -> tune.tok.lc.ur
+ corpus.en -> tune.tok.lc.en
grammar.filtered.gz
grammar.glue
test/
- test.tok.lc.ur
- test.tok.lc.en
+ corpus.ur -> test.tok.lc.ur
+ corpus.en -> test.tok.lc.en
grammar.filtered.gz
grammar.glue
alignments/
@@ -182,14 +185,14 @@ generated by the individual sub-steps).
grammar.gz
lm.gz
tune/
- 1/
- decoder_command
- joshua.config
- params.txt
- joshua.log
- mert.log
- joshua.config.ZMERT.final
- final-bleu
+ decoder_command
+ model/
+ [model files]
+ params.txt
+ joshua.log
+ mert.log
+ joshua.config.final
+ final-bleu

These files will be described in more detail in subsequent sections of this tutorial.

@@ -554,17 +557,11 @@ memory specification (passed to its `-Xmx` flag).

Two optimizers are provided with Joshua: MERT and PRO (`--tuner {mert,pro}`). If Moses is
installed, you can also use Cherry & Foster's k-best batch MIRA (`--tuner mira`, recommended).
- Tuning is run till convergence in the `$RUNDIR/tune/N` directory, where N is the tuning instance.
- By default, tuning is run just once, but the pipeline supports running the optimizer an arbitrary
- number of times due to [recent work](http://www.youtube.com/watch?v=BOa3XDkgf0Y) pointing out the
- variance of tuning procedures in machine translation, in particular MERT. This can be activated
- with `--optimizer-runs N`. Each run can be found in a directory `$RUNDIR/tune/N`.
+ Tuning is run till convergence in the `$RUNDIR/tune` directory.

When tuning is finished, each final configuration file can be found at

- $RUNDIR/tune/N/joshua.config.final
-
- where N varies from 1..`--optimizer-runs`.
+ $RUNDIR/tune/joshua.config.final

## <a id="testing" /> 7. Testing

@@ -583,11 +580,6 @@ number of arguments:

This tells the decoder to start at the test step.

- - `--name NAME`
-
- A name is needed to distinguish this test set from the previous ones. Output for this test run
- will be stored at `$RUNDIR/test/NAME`.
-
- `--joshua-config CONFIG`

A tuned parameter file is required. This file will be the output of some prior tuning run.
@@ -1,2 +1,2 @@
- release_version: 6.0.2
- release_date: April 10, 2015
+ release_version: 6.0.3
+ release_date: June 1, 2015
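
Scripts that read the tuned configuration may need to handle both directory layouts touched by this commit. The following is a minimal Python sketch and is not part of Joshua: the helper name `find_tuned_configs` is hypothetical, and the only paths it checks are the ones named in the doc text, `$RUNDIR/tune/joshua.config.final` for 6.0.3 and `$RUNDIR/tune/N/joshua.config.final` for the earlier per-run layout.

```python
from pathlib import Path
from typing import List


def find_tuned_configs(rundir: str) -> List[Path]:
    """Return tuned config files under rundir (hypothetical helper, not a Joshua API)."""
    tune_dir = Path(rundir) / "tune"
    if not tune_dir.is_dir():
        return []

    # Layout described for 6.0.3: one file directly under tune/.
    flat = tune_dir / "joshua.config.final"
    if flat.is_file():
        return [flat]

    # Earlier layout: one numbered subdirectory per optimizer run (tune/1, tune/2, ...).
    runs = sorted(
        (d for d in tune_dir.iterdir() if d.is_dir() and d.name.isdigit()),
        key=lambda d: int(d.name),
    )
    candidates = [d / "joshua.config.final" for d in runs]
    return [c for c in candidates if c.is_file()]
```

The function returns an empty list when neither layout is present, so a caller can decide how to report a missing tuning run.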
