Maximum sentence length is not really the maximum sentence length. #32

Open
dmcc opened this issue Apr 20, 2015 · 0 comments

dmcc commented Apr 20, 2015

It seems that there are (at least) two off-by-one errors with these calculations:

shell% ./parseIt -l399 ../DATA/EN 398.sgml
<doesn't crash, gives dummy parse>
shell% ./parseIt -l399 ../DATA/EN 399.sgml
parseIt: GotIter.C:73: void LeftRightGotIter::makelrgi(Edge*): Assertion `i < 400' failed.
<segfaults>
shell% ./parseIt -l399 ../DATA/EN 400.sgml
<doesn't crash, sentence is "skipped" and dummy parse is printed instead>

The obvious workaround is to only parse sentences that are at least two tokens shorter than the maximum sentence length (this is unlikely to be much of an issue in practice).
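A minimal sketch of that workaround as a pre-filter, assuming sentences are already tokenized into lists. The names `SAFETY_MARGIN`, `is_safe_to_parse`, and `split_by_safety` are hypothetical and not part of the parser or its Python bindings. The margin of 2 comes only from the behavior observed above.

```python
# Hypothetical guard, not part of the parser: the margin of 2 reflects the
# off-by-one behavior reported in this issue, not a documented constant.
SAFETY_MARGIN = 2


def is_safe_to_parse(tokens, max_sentence_length, margin=SAFETY_MARGIN):
    """Return True if the sentence is at least `margin` tokens shorter than the maximum."""
    return len(tokens) <= max_sentence_length - margin


def split_by_safety(sentences, max_sentence_length):
    """Partition tokenized sentences into (safe, too_long) lists, preserving order."""
    safe, too_long = [], []
    for tokens in sentences:
        if is_safe_to_parse(tokens, max_sentence_length):
            safe.append(tokens)
        else:
            too_long.append(tokens)
    return safe, too_long
```

With a maximum of 399, this keeps sentences of up to 397 tokens and sets aside anything longer, so the caller can skip those sentences or handle them separately instead of passing them to the parser.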

dmcc added the parser label Apr 20, 2015
dmcc added a commit that referenced this issue Apr 20, 2015
Turns out max_sentence_length (MAXSENTLEN) is not really the maximum
sentence length but two more than it (yikes). See issue #32.

first-stage/PARSE/parseIt.C: fix check to avoid segfaults for sentence
    lengths one fewer than user-requested maximum length.
python/bllipparser/RerankingParser.py: fixed similar check as above
python/tests/test_reranking_parser.py: added long sentence tests