Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More helpful error when specifying side=BACK on edge n-gram tokenizer/tokenfilter #3489

Closed
dbertram opened this issue Aug 12, 2013 · 8 comments

Comments

@dbertram
Copy link

From what I can tell, as of Lucene 4.4/elasticsearch 0.90.2 the "side=BACK" variation of edgeNGram tokenizer/tokenfilter has not only been deprecated, but now throws an ElasticSearchIllegalArgumentException:

org.elasticsearch.ElasticSearchIllegalArgumentException: side=BACK is not supported anymore. Please fix your analysis chain or use an older compatibility version (<=4.2) but beware that it might cause highlighting bugs.

That's fine, but the exception message as well as the documentation page for edgeNGram on elsaticsearch.org don't provide any information on how you can/should fix your analysis chain.

http://www.elasticsearch.org/guide/reference/index-modules/analysis/edgengram-tokenizer/

There used to be a side parameter up to 0.90.1 but it is now deprecated.

This change also wasn't mentioned/linked in any release notes (at least not that I could find).

After some digging, I found this ticket in Lucene's Jira instance that explains a) what changed and b) how to work around this issue if you were using side=BACK previously:

https://issues.apache.org/jira/browse/LUCENE-3907?focusedCommentId=13653011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13653011

The elasticsearch docs and the exception message should provide some guidance on how to workaround this backwards-incompatible change.

@s1monw
Copy link
Contributor

s1monw commented Aug 13, 2013

I agree we should be more verbose regarding how to gain the previous behaviour. Yet, I have to say this is bw compatible given you have an already existing index that option will still work ok. You can also use the option if you specify an old version in the mapping like '{ "version" : "4.2", ...}' I will go ahead and fix the documentation and the exception for this tokenizer and related token filters. Thanks for reporting.

simon

@ghost ghost assigned s1monw Aug 13, 2013
@dbertram
Copy link
Author

Thanks, Simon! Sorry about the mix-up regarding whether or not this was backwards incompatible. Our setup does automatic mapping updates, so from my perspective all I did was update from 0.90.1 to 0.90.3 and then I started seeing exceptions. I'll be more careful before filing issues in the future. :)

@s1monw
Copy link
Contributor

s1monw commented Aug 13, 2013

@dbertram I think it's good to file issues! First other people can easily google for it and 2. they see how to resolve it. I just wanted to make sure that folks understand that it's not breaking anything. Thanks for opening this. Just to confirm, you move to a setup where you have "filter" : ["reverse", "edgeNGram", "reverse"] right?

@dbertram
Copy link
Author

Correct. I've also tested this using the _analyze API to ensure the old side=BACK tokenizer and the new "analyzer with reverse, edgeNGram, reverse token filters" produce the same tokens. The offsets are different, but from what I've read that was the entire point of modifying the edgeNGram tokenizer/token filter in the first place.

Just one thing I wanted to clarify wrt the "backwards compatible change" aspect:

If you have an index created with 0.90.1 that defines an edgeNGram tokenizer using side=BACK and then upgrade to 0.90.3, you do start getting exceptions without changing the mapping when you try to index documents that use that tokenizer:

org.elasticsearch.ElasticSearchIllegalArgumentException: side=BACK is not supported anymore. Please fix your analysis ch
ain or use an older compatibility version (<=4.2) but beware that it might cause highlighting bugs.
        at org.elasticsearch.index.analysis.EdgeNGramTokenizerFactory.create(EdgeNGramTokenizerFactory.java:66)
        at org.elasticsearch.index.analysis.CustomAnalyzer.createComponents(CustomAnalyzer.java:83)
        at org.apache.lucene.analysis.CustomAnalyzerWrapper.createComponents(CustomAnalyzerWrapper.java:60)
        at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:66)
        at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:177)
        at org.apache.lucene.document.Field.tokenStream(Field.java:552)
        at org.elasticsearch.index.mapper.core.StringFieldMapper$StringField.tokenStream(StringFieldMapper.java:360)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
        at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:323)
        at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:398)
        at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1288)
        at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1248)
        at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:577)
        at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:490)
        at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:341)
        at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:207)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationActi
on.performOnPrimary(TransportShardReplicationOperationAction.java:521)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationActi
on$1.run(TransportShardReplicationOperationAction.java:419)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

So wouldn't that qualify as a backwards-incompatible change? Or am I misunderstanding something?

@s1monw
Copy link
Contributor

s1monw commented Aug 13, 2013

@dbertram this is what I feared! This should not be the case, so this seems to be a bug! I will try to come up with a test that reproduces this.

s1monw added a commit to s1monw/elasticsearch that referenced this issue Aug 14, 2013
The '"side" : "back"' parameter is not supported anymore on
EdgeNGramTokenizer if the mapping is created with 0.90.2 / Lucene 4.3
The 'EdgeNgramTokenFilter' handles this automatically wrapping the
'EdgeNgramTokenFilter' in a 'ReverseTokenFilter' yet with tokenizers this
optimization is not possible. This commit also add a more verbose error message
how to work around this limitation.

Closes elastic#3489
@s1monw s1monw closed this as completed in 4a15106 Aug 14, 2013
s1monw added a commit that referenced this issue Aug 14, 2013
The '"side" : "back"' parameter is not supported anymore on
EdgeNGramTokenizer if the mapping is created with 0.90.2 / Lucene 4.3
The 'EdgeNgramTokenFilter' handles this automatically wrapping the
'EdgeNgramTokenFilter' in a 'ReverseTokenFilter' yet with tokenizers this
optimization is not possible. This commit also add a more verbose error message
how to work around this limitation.

Closes #3489
@s1monw
Copy link
Contributor

s1monw commented Aug 14, 2013

I found out what caused the problems with upgrading. Unfortunately the check was not smart enough to handle the case correctly so I fixe the upgrade path. This should now work just fine to upgrade from 0.90.1 to 0.90.3. The problem here was essentially that we checked on the lucene version we added this with but 0.90.1 was already release with this lucene version and we manually ported the fixed tokenizers. So I needed to check for the actual ES version to make this work correctly.

@dbertram
Copy link
Author

Ah. Glad you got the upgrade path smoothed out. Thanks, Simon!

@s1monw
Copy link
Contributor

s1monw commented Aug 15, 2013

@dbertram thanks for bringing it up!

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
The '"side" : "back"' parameter is not supported anymore on
EdgeNGramTokenizer if the mapping is created with 0.90.2 / Lucene 4.3
The 'EdgeNgramTokenFilter' handles this automatically wrapping the
'EdgeNgramTokenFilter' in a 'ReverseTokenFilter' yet with tokenizers this
optimization is not possible. This commit also add a more verbose error message
how to work around this limitation.

Closes elastic#3489
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants