New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More helpful error when specifying side=BACK on edge n-gram tokenizer/tokenfilter #3489
Comments
I agree we should be more verbose regarding how to gain the previous behaviour. Yet, I have to say this is bw compatible given you have an already existing index that option will still work ok. You can also use the option if you specify an old version in the mapping like '{ "version" : "4.2", ...}' I will go ahead and fix the documentation and the exception for this tokenizer and related token filters. Thanks for reporting. simon |
Thanks, Simon! Sorry about the mix-up regarding whether or not this was backwards incompatible. Our setup does automatic mapping updates, so from my perspective all I did was update from 0.90.1 to 0.90.3 and then I started seeing exceptions. I'll be more careful before filing issues in the future. :) |
@dbertram I think it's good to file issues! First other people can easily google for it and 2. they see how to resolve it. I just wanted to make sure that folks understand that it's not breaking anything. Thanks for opening this. Just to confirm, you move to a setup where you have |
Correct. I've also tested this using the _analyze API to ensure the old side=BACK tokenizer and the new "analyzer with reverse, edgeNGram, reverse token filters" produce the same tokens. The offsets are different, but from what I've read that was the entire point of modifying the edgeNGram tokenizer/token filter in the first place. Just one thing I wanted to clarify wrt the "backwards compatible change" aspect: If you have an index created with 0.90.1 that defines an edgeNGram tokenizer using side=BACK and then upgrade to 0.90.3, you do start getting exceptions without changing the mapping when you try to index documents that use that tokenizer: org.elasticsearch.ElasticSearchIllegalArgumentException: side=BACK is not supported anymore. Please fix your analysis ch
ain or use an older compatibility version (<=4.2) but beware that it might cause highlighting bugs.
at org.elasticsearch.index.analysis.EdgeNGramTokenizerFactory.create(EdgeNGramTokenizerFactory.java:66)
at org.elasticsearch.index.analysis.CustomAnalyzer.createComponents(CustomAnalyzer.java:83)
at org.apache.lucene.analysis.CustomAnalyzerWrapper.createComponents(CustomAnalyzerWrapper.java:60)
at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:66)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:177)
at org.apache.lucene.document.Field.tokenStream(Field.java:552)
at org.elasticsearch.index.mapper.core.StringFieldMapper$StringField.tokenStream(StringFieldMapper.java:360)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:323)
at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:398)
at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1288)
at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1248)
at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:577)
at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:490)
at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:341)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:207)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationActi
on.performOnPrimary(TransportShardReplicationOperationAction.java:521)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationActi
on$1.run(TransportShardReplicationOperationAction.java:419)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619) So wouldn't that qualify as a backwards-incompatible change? Or am I misunderstanding something? |
@dbertram this is what I feared! This should not be the case, so this seems to be a bug! I will try to come up with a test that reproduces this. |
The '"side" : "back"' parameter is not supported anymore on EdgeNGramTokenizer if the mapping is created with 0.90.2 / Lucene 4.3 The 'EdgeNgramTokenFilter' handles this automatically wrapping the 'EdgeNgramTokenFilter' in a 'ReverseTokenFilter' yet with tokenizers this optimization is not possible. This commit also add a more verbose error message how to work around this limitation. Closes elastic#3489
The '"side" : "back"' parameter is not supported anymore on EdgeNGramTokenizer if the mapping is created with 0.90.2 / Lucene 4.3 The 'EdgeNgramTokenFilter' handles this automatically wrapping the 'EdgeNgramTokenFilter' in a 'ReverseTokenFilter' yet with tokenizers this optimization is not possible. This commit also add a more verbose error message how to work around this limitation. Closes #3489
I found out what caused the problems with upgrading. Unfortunately the check was not smart enough to handle the case correctly so I fixe the upgrade path. This should now work just fine to upgrade from |
Ah. Glad you got the upgrade path smoothed out. Thanks, Simon! |
@dbertram thanks for bringing it up! |
The '"side" : "back"' parameter is not supported anymore on EdgeNGramTokenizer if the mapping is created with 0.90.2 / Lucene 4.3 The 'EdgeNgramTokenFilter' handles this automatically wrapping the 'EdgeNgramTokenFilter' in a 'ReverseTokenFilter' yet with tokenizers this optimization is not possible. This commit also add a more verbose error message how to work around this limitation. Closes elastic#3489
From what I can tell, as of Lucene 4.4/elasticsearch 0.90.2 the "side=BACK" variation of edgeNGram tokenizer/tokenfilter has not only been deprecated, but now throws an ElasticSearchIllegalArgumentException:
That's fine, but the exception message as well as the documentation page for edgeNGram on elsaticsearch.org don't provide any information on how you can/should fix your analysis chain.
http://www.elasticsearch.org/guide/reference/index-modules/analysis/edgengram-tokenizer/
This change also wasn't mentioned/linked in any release notes (at least not that I could find).
After some digging, I found this ticket in Lucene's Jira instance that explains a) what changed and b) how to work around this issue if you were using side=BACK previously:
https://issues.apache.org/jira/browse/LUCENE-3907?focusedCommentId=13653011&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13653011
The elasticsearch docs and the exception message should provide some guidance on how to workaround this backwards-incompatible change.
The text was updated successfully, but these errors were encountered: