Shingle filter should expose filler_token #4307

Closed
MrHash opened this issue Dec 1, 2013 · 13 comments

@MrHash

MrHash commented Dec 1, 2013

Since the Lucene 4.4 release, the enable_position_increments setting on token filters can no longer be set to false. As a result, underscores appear in shingle filter output wherever an earlier filter removed a token.
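
To make the symptom concrete, here is a minimal sketch of how a gap in the token stream can turn into an underscore inside a shingle. It is a simplified model written for illustration, not Lucene's `ShingleFilter` implementation: the function names, the `(term, position_increment)` representation, and the example sentence are all assumptions, and the ordering of the output does not try to match Lucene's.

```python
from typing import List, Tuple

# A token is modelled as (term, position increment). An increment of 2 means
# one position was skipped, e.g. because a stop filter removed a word.
Token = Tuple[str, int]


def fill_gaps(tokens: List[Token], filler: str = "_") -> List[str]:
    """Expand the stream to one slot per position, padding gaps with `filler`."""
    slots: List[str] = []
    for term, increment in tokens:
        slots.extend([filler] * (increment - 1))
        slots.append(term)
    return slots


def bigram_shingles(tokens: List[Token], filler: str = "_",
                    separator: str = " ") -> List[str]:
    """Build two-word shingles over the gap-filled slots."""
    slots = fill_gaps(tokens, filler)
    return [separator.join(pair) for pair in zip(slots, slots[1:])]


# "jump over the fence" after a stop filter removed "the":
stream = [("jump", 1), ("over", 1), ("fence", 2)]
print(bigram_shingles(stream))
# ['jump over', 'over _', '_ fence']
```

With position increments disabled upstream, the last token would carry an increment of 1 and the model would emit `over fence` instead of the two underscore shingles.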

@s1monw
Contributor

s1monw commented Dec 1, 2013

I agree that the shingle filter needs some options here. As a workaround, you can specify a lucene_version on your stop filter so that enable_position_increments can still be set.
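
A rough sketch of what that workaround might look like as index settings, built as a Python dict so it can be serialized to JSON. Several things here are assumptions rather than details from this thread: the filter name `stop_no_gaps` is hypothetical, the compatibility version is written under the key `version` (the comment above calls it lucene_version, so check the exact key for your release), and `4.3` is simply a value older than 4.4.

```python
import json

# Assumed settings shape for a stop filter pinned to a pre-4.4 Lucene
# compatibility version. Key names other than "type" and
# "enable_position_increments" should be verified against your release.
workaround_settings = {
    "analysis": {
        "filter": {
            "stop_no_gaps": {  # hypothetical filter name
                "type": "stop",
                "version": "4.3",  # assumption: any version below 4.4
                "enable_position_increments": False,
            }
        }
    }
}

print(json.dumps(workaround_settings, indent=2))
```

Pinning an older compatibility version only postpones the problem, since it relies on behavior that Lucene has since removed.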

@MrHash
Author

MrHash commented Dec 1, 2013

OK, thanks for the tip. That works for now. Presumably, if disabling position increments has been removed from Lucene core, the only remaining workaround is to modify the shingle filter. I had a look at the Lucene source, and it looks like there is currently no way to override the filler token, which is hardcoded as an underscore, with an empty string.

@s1monw
Contributor

s1monw commented Dec 1, 2013

@MrHash Yes, that is correct. I think there needs to be one. I hope to open an issue and fix that in Lucene soon. Feel free to beat me to it!

@s1monw
Contributor

s1monw commented Dec 1, 2013

Here is the issue: https://issues.apache.org/jira/browse/LUCENE-5353

@MrHash
Author

MrHash commented Dec 1, 2013

Nice. I also noticed that the stop filter's remove_trailing=false option generates a filler token even when enable_position_increments is set to false. Hopefully this update will take care of that as well.

@s1monw
Contributor

s1monw commented Feb 19, 2014

With Lucene 4.7 we will be able to make the filler token configurable.

s1monw added a commit that referenced this issue Feb 19, 2014
Lucene 4.7 supports a setter for the `filler_token` that is
inserted if there are gaps in the token stream. This change exposes
this setting.

Closes #4307
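
For illustration, a shingle filter using the newly exposed `filler_token` setting might be configured roughly as follows. This is a sketch and is not taken from the commit: the filter and analyzer names are hypothetical, and the shingle sizes and filter chain are arbitrary choices.

```python
import json

# Sketch of index settings that replace the default "_" filler with an
# empty string. Names marked hypothetical are not from the thread.
shingle_settings = {
    "analysis": {
        "filter": {
            "shingle_no_underscore": {  # hypothetical filter name
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 2,
                "filler_token": "",  # the setting this issue asked for
            }
        },
        "analyzer": {
            "shingle_analyzer": {  # hypothetical analyzer name
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "stop", "shingle_no_underscore"],
            }
        },
    }
}

print(json.dumps(shingle_settings, indent=2))
```
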
@MrHash
Author

MrHash commented Feb 24, 2014

Good stuff.

jpountz pushed a commit to jpountz/elasticsearch that referenced this issue Feb 26, 2014
Lucene 4.7 supports a setter for the `filler_token` that is
inserted if there are gaps in the token stream. This change exposes
this setting.

Closes elastic#4307
@s1monw s1monw closed this as completed in 9160516 Feb 26, 2014
s1monw added a commit that referenced this issue Feb 26, 2014
@alup

alup commented Oct 17, 2014

Setting filler_token = "" can leave duplicate tokens in the token stream, which is not the behavior we used to get from disabling enable_position_increments. Is there any way around this problem?
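
The comment does not include a recreation, so the following is only one plausible reading, sketched with a simplified model rather than Lucene's actual `ShingleFilter`. It assumes unigrams are emitted alongside bigrams and that something downstream trims whitespace; the function name and the example stream are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, int]  # (term, position increment)


def shingles_with_unigrams(tokens: List[Token], filler: str,
                           separator: str = " ") -> List[str]:
    """Return unigrams followed by bigram shingles over gap-filled slots."""
    slots: List[str] = []
    for term, increment in tokens:
        slots.extend([filler] * (increment - 1))  # pad skipped positions
        slots.append(term)
    unigrams = [term for term, _ in tokens]
    bigrams = [separator.join(pair) for pair in zip(slots, slots[1:])]
    return unigrams + bigrams


# "jump over the fence" with "the" removed by a stop filter.
stream = [("jump", 1), ("over", 1), ("fence", 2)]

# Empty filler: the gap is still a slot, so shingles keep a stray separator.
with_gap = shingles_with_unigrams(stream, filler="")
print(with_gap)
# ['jump', 'over', 'fence', 'jump over', 'over ', ' fence']

# Gap collapsed, modelling position increments disabled upstream.
collapsed = shingles_with_unigrams([(term, 1) for term, _ in stream], filler="")
print(collapsed)
# ['jump', 'over', 'fence', 'jump over', 'over fence']

# If whitespace is trimmed later, the gap shingles duplicate the unigrams.
trimmed = [token.strip() for token in with_gap]
print(trimmed)
# ['jump', 'over', 'fence', 'jump over', 'over', 'fence']
```

In this model the empty filler never produces the `over fence` shingle that the collapsed stream does, which would explain why the two approaches are not interchangeable.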

@lalitkapoor

+1

@clintongormley

@alup Could you please open a new issue and provide more detail about the problem?

@alup

alup commented Oct 29, 2014

OK, when I find some time, I will come up with an example that demonstrates the problem.

@apanimesh061

@alup Did you find a workaround for this? I posted a question here as well.

@clintongormley

@apanimesh061 could you open a new issue for this, and include a full JSON recreation?
