Make Tokenizers deliver their final offsets [LUCENE-5386] #6449

@asfimport

Description

Tokenizers must have an implementation of #end() in which they set up the final offset. Currently, nothing enforces this. end() has a useful implementation in TokenStream, so just making it abstract is not attractive.

Proposal: add

abstract int finalOffset();

to Tokenizer, and then make

public void end() throws IOException {
    super.end();
    // Both start and end of the final offset point at the end of the input.
    int fo = finalOffset();
    offsetAttr.setOffset(fo, fo);
}

or something to that effect.

Other alternatives can be considered depending on how this looks.
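
To make the proposal concrete, here is a minimal, self-contained sketch of the pattern. It does not use the real Lucene classes: OffsetSketch, TokenStreamSketch, TokenizerSketch and WholeInputTokenizerSketch are hypothetical stand-ins, and declaring end() final is an assumption of this sketch, not something the proposal above requires. The point is only that a subclass cannot compile without supplying finalOffset(), while the base end() stays concrete.

```java
// Hypothetical stand-in for an offset attribute; not Lucene's OffsetAttribute.
final class OffsetSketch {
    private int start;
    private int end;

    void setOffset(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid offsets: " + start + ", " + end);
        }
        this.start = start;
        this.end = end;
    }

    int startOffset() { return start; }
    int endOffset() { return end; }
}

// Hypothetical stand-in for TokenStream: end() keeps a concrete default body.
abstract class TokenStreamSketch {
    protected final OffsetSketch offsetAttr = new OffsetSketch();
    private boolean ended;

    void end() {
        ended = true; // placeholder for whatever end-of-stream work the base class does
    }

    boolean isEnded() { return ended; }
}

// Hypothetical stand-in for Tokenizer: subclasses must report their final offset.
abstract class TokenizerSketch extends TokenStreamSketch {
    // The compiler rejects any concrete subclass that omits this method.
    protected abstract int finalOffset();

    // Final here (an assumption of this sketch) so subclasses cannot skip the offset update.
    @Override
    final void end() {
        super.end();
        int fo = finalOffset();
        offsetAttr.setOffset(fo, fo);
    }
}

// Toy tokenizer that treats the whole input as consumed, so the final offset is its length.
final class WholeInputTokenizerSketch extends TokenizerSketch {
    private final String input;

    WholeInputTokenizerSketch(String input) {
        this.input = input;
    }

    @Override
    protected int finalOffset() {
        return input.length();
    }
}
```

With this shape, calling end() on a WholeInputTokenizerSketch sets both the start and end offset to the input length without the subclass touching end() at all. A real Tokenizer would likely also need to account for any offset correction applied by character filters, which this sketch leaves out.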


Migrated from LUCENE-5386 by Benson Margulies (@bimargulies-google), updated Jan 10 2014
