Multiple tokens at the same position not working correctly with match query if AND operator is used #3881
Comments
I agree!
btw @s1monw I do not want to hijack this issue, but what do you think about my comment no. 2 (to me it seems that the search analyzer is not used while it should be, no?). Is it worth opening a new issue, or am I misunderstanding something here?
I updated the PR with a test for your issue no. 2, but I can't reproduce it. It works just fine and uses the right filter, or am I missing something?
If my recreation script returns one hit for the second query for you, then it has probably been fixed already (hard to say...). Just ignore it...
I will try to recreate it via REST; maybe there is some problem there. I don't think I will get to it today, so I will update it later!
SynonymFilters produce token streams with stacked tokens, so conjunction queries need to be parsed in a special way such that the stacked tokens are added as an inner disjunction. Closes #3881
If multiple tokens are output at the same position then `match` queries are not working correctly if the `AND` operator is used.

First I noticed this issue when using the Hunspell token filter (something similar has been reported in LUCENE-5057 but it is not really a Lucene issue). With Hunspell it is possible to get multiple output tokens from a single input token, all at the same position. However, the client query usually contains only one of those tokens, or a token that can output a different set of tokens. When using a `match` query with the `AND` operator the document does not match (while it should).

I also think that this can impact other linguistics packages (like Basis's RBL?).
A similar situation can be simulated using a synonym filter. Imagine that we are using query-time synonyms.
Let's say we index a simple document:
and we define the query-time synonym "quick, fast". Now let's see what we can do with this in the following recreation script (using ES 0.90.5), with the output commented below:
Output of queries:
My comments on results:
(note that comment no. 2 may contain a question regarding another, unrelated issue)
1. `query_string` for query "quick" works as expected.
2. `query_string` for query "fast" does not seem to work. According to the documentation I was expecting that the `search_analyzer` defined in the `string` type mapping would be used. But anyway, this should not be the topic of this issue... 😄

   2.5) `query_string` for query "fast" works (if I explicitly force the `search` analyzer), so we can say the query-time synonym works fine.
3. The same situation as in 2.5) except we are forcing the `AND` operator. It should work and it is working.
4. Now, let's use a `match` query and query for "quick". It works fine.
5. Again, a `match` query but query for "fast". It works, so far so good.
6. The same as in 5) except we are forcing the `AND` operator. It should work (I hope) but it does not.

If I could speculate about why this is happening:
a) MatchQueryParser does something like:
b) and MatchQuery does not take the position of tokens into account. It simply stacks all incoming tokens into a BooleanQuery. It contains patterns similar to the following excerpt:
The position of tokens is not taken into account, which would explain why this is not working as expected in combination with the `AND` operator in the situations described above.

I think if incoming tokens share the same position it should generate a Boolean subquery with the `OR` operator (?).
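To make the suggestion above concrete, here is a minimal, library-free sketch of the difference between stacking every token as a required clause and grouping tokens by position first. It is not the Elasticsearch or Lucene code: `Token`, `flat_and_query`, `position_aware_and_query` and `matches` are hypothetical names, the document terms are made up for illustration, and a position increment of 0 is assumed to mean "same position as the previous token".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for an analyzed token (not a Lucene class)."""
    term: str
    position_increment: int  # assumption: 0 = stacked on the previous token


def flat_and_query(tokens):
    """AND over every token, ignoring positions (the behaviour described above).

    The query is modelled as a list of required clauses; each clause is a set
    of alternative terms, of which at least one must be present.
    """
    return [{tok.term} for tok in tokens]


def position_aware_and_query(tokens):
    """AND over positions, OR among the tokens stacked at one position."""
    groups = {}
    position = -1
    for tok in tokens:
        position += tok.position_increment
        groups.setdefault(position, set()).add(tok.term)
    return [groups[p] for p in sorted(groups)]


def matches(query, doc_terms):
    """A document matches if every required clause shares a term with it."""
    return all(clause & doc_terms for clause in query)


# Query "fast" after a query-time synonym "quick, fast": two stacked tokens.
stacked = [Token("fast", 1), Token("quick", 0)]
doc_terms = {"quick", "brown", "fox"}  # hypothetical indexed terms

print(matches(flat_and_query(stacked), doc_terms))            # False
print(matches(position_aware_and_query(stacked), doc_terms))  # True
```

With the flat form, the document would need to contain both "fast" and "quick", which is why case 6 finds nothing. Grouping by position turns the same token stream into a single required clause `(fast OR quick)`.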