Error on MoreLikeThis API with Non Stored Numeric Fields #3252
Comments
Any comments on that? I could work around it by storing the fields (even though it's not the best scenario...), but it would also be nice to have this work without the need to reindex... |
@clintongormley cool :) |
@lmenezes You are right about why you got this error but unfortunately setting the value of the field instance even when the field is not stored won't work. The reason is that Lucene's MoreLikeThis can only work on top of character token streams and numeric fields are encoded as binary token streams. This issue is very similar to #3211, where we decided to ignore numeric fields when performing highlighting in order to match Elasticsearch 0.20 behavior. Maybe we should do the same here? @clintongormley what do you think? |
My feeling is that the If you want to treat numbers as "full text" then you can always use a So ++ for ignoring non-strings, I'd say. |
@jpountz @clintongormley I don't really agree, since if the numbers are ids for some kind of relation, they represent similarity as well as or even better than matching tokens. But if it's a Lucene limitation, ignoring is definitely better than failing. Still, it would be nice to have that working on numeric fields (I guess that affects everything that is internally stored as a number, like IPs?). |
@lmenezes This is correct, the limitation is in Lucene and this affects everything which is stored as a number, so byte, short, integer, long, float and double but also ips and dates. There might be options to support numbers in the future but right now I think the best fix to apply is to ignore numeric data from the mlt fields. |
@jpountz cool, waiting for the fix then :) |
…re-like-this or fuzzy-like-this queries. More-like-this and fuzzy-like-this queries expect analyzers which are able to generate character terms (CharTermAttribute), so unfortunately this doesn't work with analyzers which generate binary-only terms (BinaryTermAttribute, the default CharTermAttribute impl being a special BinaryTermAttribute) such as our analyzers for numeric fields (byte, short, integer, long, float, double but also date and ip). To work around this issue, this commit adds a fail_on_unsupported_field parameter to the more-like-this and fuzzy-like-this parsers. When this parameter is false, numeric fields will just be ignored and when it is true, an error will be returned, saying that these queries don't support numeric fields. By default, this setting is true but the mlt API sets it to false in order not to fail on documents which contain numeric fields. Close elastic#3252
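To make the behavior of fail_on_unsupported_field concrete, here is a minimal sketch in Python. It is not Elasticsearch's implementation: the helper names (select_mlt_fields, build_mlt_query), the UnsupportedFieldError class, and the simplified name-to-type mapping are all hypothetical, and the query body assumes the more_like_this query accepts fields, like_text and fail_on_unsupported_field keys as described in the commit message.

```python
# Hypothetical illustration of the fail_on_unsupported_field semantics.
# This is NOT Elasticsearch code; names and structure are assumptions.

# Field types the commit message lists as producing binary-only terms.
NUMERIC_TYPES = {"byte", "short", "integer", "long", "float", "double", "date", "ip"}


class UnsupportedFieldError(ValueError):
    """Raised when a numeric field is requested and failing is enabled."""


def select_mlt_fields(field_types, fields, fail_on_unsupported_field=True):
    """Return the fields usable for more-like-this.

    field_types maps a field name to its mapping type (simplified).
    Numeric fields raise an error when the flag is true and are
    silently dropped when it is false.
    """
    supported = []
    for name in fields:
        if field_types.get(name) in NUMERIC_TYPES:
            if fail_on_unsupported_field:
                raise UnsupportedFieldError(
                    f"more_like_this doesn't support numeric field [{name}]"
                )
            continue  # ignore the numeric field instead of failing
        supported.append(name)
    return supported


def build_mlt_query(like_text, fields, fail_on_unsupported_field=True):
    """Build a query body as a plain dict (assumed request shape)."""
    return {
        "more_like_this": {
            "fields": list(fields),
            "like_text": like_text,
            "fail_on_unsupported_field": fail_on_unsupported_field,
        }
    }
```

With a mapping such as {"id": "integer", "title": "string"}, calling select_mlt_fields with the flag set to false keeps only "title", which mirrors what the mlt API is described as doing; with the default of true the same call raises an error.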
The mlt API uses the mlt query, so I updated the pull request:
|
sounds good 👍 |
According to the documentation:
Note: In order to use the mlt feature a mlt_field needs to be either be stored, store term_vector or source needs to be enabled.
But, running this:
fails (second query) with:
{"error":"MapperParsingException[failed to parse [id]]; nested: ElasticSearchIllegalStateException[Field should have either a string, numeric or binary value]; ","status":400}
This is basically because here (for example):
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/index/mapper/core/IntegerFieldMapper.java#L356-L360
The numeric value is not actually used unless the field is stored.
Then here:
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/mlt/TransportMoreLikeThisAction.java#L293-L303
If you can't read it, it will just throw an exception.