Maximum sentence length is not really the maximum sentence length. #32

dmcc · 2015-04-20T20:39:35Z

It seems that there are (at least) two off-by-one errors with these calculations:

shell% ./parseIt -l399 ../DATA/EN 398.sgml
<doesn't crash, gives dummy parse>
shell% ./parseIt -l399 ../DATA/EN 399.sgml
parseIt: GotIter.C:73: void LeftRightGotIter::makelrgi(Edge*): Assertion `i < 400' failed.
<segfaults>
shell% ./parseIt -l399 ../DATA/EN 400.sgml
<doesn't crash, sentence is "skipped" and dummy parse is printed instead>

The obvious workaround is to parse only sentences whose length is at least two fewer than the maximum sentence length (unlikely to be much of an issue in practice).
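For anyone who wants to apply that workaround on the calling side, here is a minimal sketch of a length filter. It is not part of bllipparser: `is_safe_to_parse` and `split_by_safety` are hypothetical names, the default of 400 is assumed from the `i < 400` assertion in the transcript above, and sentences are assumed to be pre-tokenized lists.

```python
# Hypothetical helpers illustrating the workaround; not bllipparser API.

# Assumed limit, taken from the `i < 400` assertion in the transcript.
MAX_SENTENCE_LENGTH = 400


def is_safe_to_parse(tokens, max_sentence_length=MAX_SENTENCE_LENGTH):
    """Return True if the sentence is at least two tokens under the limit."""
    return len(tokens) <= max_sentence_length - 2


def split_by_safety(sentences, max_sentence_length=MAX_SENTENCE_LENGTH):
    """Partition tokenized sentences into (safe, skipped) lists."""
    safe, skipped = [], []
    for tokens in sentences:
        if is_safe_to_parse(tokens, max_sentence_length):
            safe.append(tokens)
        else:
            skipped.append(tokens)
    return safe, skipped
```

With the default limit, a 398-token sentence passes the filter, while 399- and 400-token sentences are set aside. "Safe" here only means the sentence stays clear of the failing assertion, not that the parser will return a real parse for it.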

Turns out max_sentence_length (MAXSENTLEN) is not really the maximum sentence length but two more than it (yikes). See issue #32.
first-stage/PARSE/parseIt.C: fix check to avoid segfaults for sentence lengths one fewer than user-requested maximum length.
python/bllipparser/RerankingParser.py: fixed similar check as above
python/tests/test_reranking_parser.py: added long sentence tests

dmcc added the parser label Apr 20, 2015
