
Bulk parsing

akoller edited this page Jun 9, 2015 · 1 revision

Bulk Parsing

It is sometimes useful to parse a large number of inputs with the same IRTG, e.g. in evaluation. This is what the Bulk Parsing feature is for.

The easiest way to bulk parse is to open an IRTG in the Alto GUI and then select the menu item "Tools -> Bulk Parse ...". You will first be prompted to select an input file. This can be any unannotated corpus that fits your IRTG, i.e. every interpretation used in the corpus file must be defined in your grammar. The reverse is not required: the corpus does not need to contain all interpretations of your IRTG.
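The compatibility condition above amounts to a subset check on interpretation names. The following is a minimal sketch of that check in Python; it is not Alto's own code, and the function names and example interpretation names are hypothetical.

```python
def missing_interpretations(corpus_interps, grammar_interps):
    """Return the interpretation names used in the corpus but not defined in the grammar."""
    return sorted(set(corpus_interps) - set(grammar_interps))


def corpus_fits_grammar(corpus_interps, grammar_interps):
    """A corpus fits if every interpretation it uses is defined in the grammar.

    The grammar may define additional interpretations that the corpus omits.
    """
    return not missing_interpretations(corpus_interps, grammar_interps)


# Hypothetical interpretation names, for illustration only.
GRAMMAR_INTERPS = ["string", "tree", "graph"]

FITS = corpus_fits_grammar(["string"], GRAMMAR_INTERPS)                  # corpus uses a subset
MISSING = missing_interpretations(["string", "semantics"], GRAMMAR_INTERPS)  # "semantics" is undefined
```

A corpus that only contains the "string" interpretation fits this grammar, while one that also uses "semantics" does not.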


After you have selected an input corpus, you will be prompted to optionally select a file with precomputed parse charts.


If you have previously computed the parse charts of this particular unannotated corpus with this particular grammar and saved them to a Zip file, you can select that file now to avoid recomputing the charts, which can save time. Otherwise, simply cancel. (If you cancel, all the charts will be computed and written to a file for later use, so you can load them the next time. Check the messages in the main Alto GUI window for details.)
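The load-or-compute behavior described above follows a common caching pattern. The sketch below illustrates that pattern with Python's standard `zipfile` module. It does not reproduce Alto's actual chart format: the entry names `chart_0.txt`, `chart_1.txt`, ... and the string "charts" are assumptions made for the example.

```python
import os
import tempfile
import zipfile


def load_or_compute_charts(zip_path, instances, compute_chart):
    """Return one serialized chart per instance, in corpus order.

    If zip_path exists, the charts are read from it; otherwise they are
    computed with compute_chart and written to zip_path for later use.
    The entry naming scheme is hypothetical.
    """
    if os.path.exists(zip_path):
        with zipfile.ZipFile(zip_path) as zf:
            return [zf.read(f"chart_{i}.txt").decode("utf-8")
                    for i in range(len(instances))]
    charts = [compute_chart(instance) for instance in instances]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, chart in enumerate(charts):
            zf.writestr(f"chart_{i}.txt", chart)
    return charts


# Toy stand-in for the expensive chart computation; records each call.
CALLS = []

def toy_chart(instance):
    CALLS.append(instance)
    return "chart for " + instance


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "charts.zip")
    FIRST = load_or_compute_charts(path, ["a b", "c d"], toy_chart)   # computes and saves
    SECOND = load_or_compute_charts(path, ["a b", "c d"], toy_chart)  # loads from the Zip file
```

The second call returns the same charts without invoking `toy_chart` again, which is the time saving the precomputed file is meant to provide.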

Finally, you will be prompted for a file into which you would like to save the parsing results.


Once you click on "Save", Alto will parse all instances in your input corpus. For each instance, it will compute the best derivation tree given the inputs in that instance, using the Viterbi algorithm. It will then map this derivation tree to all interpretations of the IRTG, obtaining a tuple of values in the different algebras. Finally, it will write these completed instances (derivation tree plus all values) into the output corpus that you specified in the "Save" step, in the same order as the instances in the input corpus.
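The per-instance steps above (best derivation by Viterbi, then mapping the derivation to every interpretation) can be sketched on a toy example. This is an illustration under simplifying assumptions, not Alto's implementation: the chart is a plain acyclic dictionary of weighted rules, each interpretation is a table of Python functions standing in for a homomorphism plus algebra evaluation, and all rule and interpretation names are made up.

```python
# Toy chart: state -> list of (rule_label, child_states, weight). Assumed acyclic.
CHART = {
    "S": [("r1", ("NP", "VP"), 1.0)],
    "NP": [("r2", (), 0.6), ("r3", (), 0.4)],
    "VP": [("r4", (), 1.0)],
}

# One function table per interpretation: rule label -> operation on child values.
INTERPRETATIONS = {
    "string": {
        "r1": lambda np, vp: np + " " + vp,
        "r2": lambda: "john",
        "r3": lambda: "mary",
        "r4": lambda: "sleeps",
    },
    "tree": {
        "r1": lambda np, vp: "S(" + np + "," + vp + ")",
        "r2": lambda: "NP(john)",
        "r3": lambda: "NP(mary)",
        "r4": lambda: "VP(sleeps)",
    },
}


def viterbi(chart, start):
    """Return (score, tree) of the highest-weight derivation rooted in start.

    A tree is a pair (rule_label, tuple_of_subtrees); a derivation's score
    is the product of the weights of its rules.
    """
    memo = {}

    def best(state):
        if state not in memo:
            candidates = []
            for label, children, weight in chart[state]:
                subs = [best(child) for child in children]
                score = weight
                for sub_score, _ in subs:
                    score *= sub_score
                candidates.append((score, (label, tuple(t for _, t in subs))))
            memo[state] = max(candidates, key=lambda c: c[0])
        return memo[state]

    return best(start)


def interpret(tree, table):
    """Evaluate a derivation tree bottom-up under one interpretation."""
    label, children = tree
    return table[label](*(interpret(child, table) for child in children))


def bulk_parse(charts, start, interpretations):
    """Process the charts in order; the output order matches the input order."""
    results = []
    for chart in charts:
        score, derivation = viterbi(chart, start)
        values = {name: interpret(derivation, table)
                  for name, table in interpretations.items()}
        results.append({"derivation": derivation, "score": score, "values": values})
    return results


RESULTS = bulk_parse([CHART], "S", INTERPRETATIONS)
```

For the toy chart, the best derivation uses rule `r2` for `NP` because its weight (0.6) exceeds that of `r3` (0.4), so the string interpretation evaluates to "john sleeps".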