Discourse Parsers Details

bsharpataz edited this page Nov 27, 2017 · 7 revisions

How to regenerate the discourse models

Note: this is of limited use to regular users, but it may help developers who want to modify (i.e., retrain) the discourse parsers.

To regenerate the constituent-syntax model, run this command:

    sbt 'run-main org.clulab.discourse.rstparser.RSTParserMain -train /data/nlp/corpora/RST_cached_preprocessing/rst_train -model model.const.rst.gz'

This generates the model file, model.const.rst.gz, in the current directory. To evaluate it, move model.const.rst.gz to main/src/main/resources/ and run:

    sbt 'run-main org.clulab.discourse.rstparser.RSTParserMain -test /data/nlp/corpora/RST_cached_preprocessing/rst_test -model model.const.rst.gz'

To regenerate the dependency-syntax model, run this command:

    sbt 'run-main org.clulab.discourse.rstparser.RSTParserMain -train /data/nlp/corpora/RST_cached_preprocessing/rst_train -model model.dep.rst.gz -dep'

Similarly, this generates model.dep.rst.gz, the model that uses only dependency information, in the current directory. To evaluate it, move model.dep.rst.gz to main/src/main/resources/ and run:

    sbt 'run-main org.clulab.discourse.rstparser.RSTParserMain -test /data/nlp/corpora/RST_cached_preprocessing/rst_test -model model.dep.rst.gz -dep'

Note that the dependency-based model is both faster and more accurate than the constituent-based one.
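
The four commands above differ only in the mode flag (-train or -test), the matching corpus directory, the model file name, and the optional -dep switch. As a convenience, that pattern can be captured in a small shell helper. This is only a sketch: the function name `rst_cmd` and the `CORPUS` variable are hypothetical and not part of the project, and the helper only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): prints the sbt command
# for one training or evaluation step. It does not run anything itself.
# Usage: rst_cmd train|test const|dep

# Corpus location as used in the commands on this page.
CORPUS=/data/nlp/corpora/RST_cached_preprocessing

rst_cmd() {
  mode=$1      # "train" or "test"
  syntax=$2    # "const" (constituent syntax) or "dep" (dependency syntax)

  case "$mode" in
    train|test) ;;
    *) echo "mode must be train or test" >&2; return 1 ;;
  esac

  case "$syntax" in
    const) extra="" ;;        # constituent model takes no extra flag
    dep)   extra=" -dep" ;;   # dependency model needs -dep
    *) echo "syntax must be const or dep" >&2; return 1 ;;
  esac

  echo "sbt 'run-main org.clulab.discourse.rstparser.RSTParserMain -$mode $CORPUS/rst_$mode -model model.$syntax.rst.gz$extra'"
}
```

For example, `rst_cmd train dep` prints the dependency-syntax training command, and `eval "$(rst_cmd train dep)"` would run it. The model file still has to be moved to main/src/main/resources/ by hand before evaluating.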