Tokenizers must implement #end() and set the final offset there. Currently, nothing enforces this. Because end() already has a useful implementation in TokenStream, simply making it abstract is not attractive.
Proposal: add
    abstract int finalOffset();
to Tokenizer, and then make
    public void end() throws IOException {
      super.end();
      int fo = finalOffset();
      offsetAttr.setOffset(fo, fo);
    }
or something to that effect.
Other alternatives can be considered depending on how this looks.
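To make the shape of the proposal concrete, here is a minimal, dependency-free sketch. It does not use Lucene itself: SketchOffsetAttribute, SketchTokenStream, SketchTokenizer and WholeInputTokenizer are hypothetical stand-ins invented for illustration, and only the setOffset(start, end) setter is modeled on the real OffsetAttribute. Marking end() final is an extra assumption of the sketch, not part of the proposal.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class FinalOffsetSketch {

    // Hypothetical stand-in for Lucene's OffsetAttribute; only the setter needed here.
    static final class SketchOffsetAttribute {
        private int startOffset;
        private int endOffset;

        void setOffset(int startOffset, int endOffset) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        int startOffset() { return startOffset; }
        int endOffset() { return endOffset; }
    }

    // Hypothetical stand-in for TokenStream: end() keeps a concrete default body,
    // which is why the issue does not want to make end() itself abstract.
    abstract static class SketchTokenStream {
        protected final SketchOffsetAttribute offsetAttr = new SketchOffsetAttribute();
        protected boolean ended;

        public void end() throws IOException {
            ended = true; // placeholder for the base class's end-of-stream bookkeeping
        }
    }

    // The proposal: subclasses must supply finalOffset(), and end() applies it.
    abstract static class SketchTokenizer extends SketchTokenStream {
        protected abstract int finalOffset();

        @Override
        public final void end() throws IOException {
            super.end();
            int fo = finalOffset();
            offsetAttr.setOffset(fo, fo);
        }
    }

    // Toy tokenizer whose final offset is simply the length of its input.
    static final class WholeInputTokenizer extends SketchTokenizer {
        private final String input;

        WholeInputTokenizer(String input) {
            this.input = input;
        }

        @Override
        protected int finalOffset() {
            return input.length();
        }
    }

    // Helper: run end() on a fresh tokenizer and report the resulting end offset.
    static int endOffsetAfterEnd(String input) {
        WholeInputTokenizer tokenizer = new WholeInputTokenizer(input);
        try {
            tokenizer.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokenizer.offsetAttr.endOffset();
    }
}
```

With this arrangement a subclass that forgets finalOffset() fails to compile, while the base end() logic still runs for every tokenizer. A real implementation would probably also need to decide whether finalOffset() returns a raw or an already corrected offset.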
Migrated from LUCENE-5386 by Benson Margulies (@bimargulies-google), updated Jan 10 2014