Alternate model for field AND logic within MultiMatch query #2959

Closed
wants to merge 1 commit into from

@tarass tarass commented Apr 30, 2013

I wrote a patch to the MultiMatch query that provides more natural AND processing when considering multiple fields.

Consider a document with fields:
title: Something
description: featured on their 1969 album Abbey Road
author: Beatles
Now if I take the user's input and run a query to match my documents, it would be natural to consider either the dreaded _all field or a multi_match query like:
multi_match:{"query":"Something Beatles", "fields":["title", "description", "author"], "operator":"and"}

Which would get transformed into a boolean query such as:
(+title:something +title:beatles) (+description:something +description:beatles) (+author:something +author:beatles)

There is no match for our document! From the perspective of human input, the most natural way to AND a multi-field search is often to ensure that each term is matched somewhere across all fields, such as:
+(title:something description:something author:something) +(title:beatles description:beatles author:beatles)
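
The difference between the two rewrites above can be checked with a small simulation. This is only a sketch: the `analyze` function (lowercase plus whitespace split) is an assumption standing in for real analysis, and the names `field_centric_and` and `term_centric_and` are hypothetical, not taken from the patch.

```python
# Toy model of the example document from this description.
doc = {
    "title": "Something",
    "description": "featured on their 1969 album Abbey Road",
    "author": "Beatles",
}


def analyze(text):
    """Stand-in analyzer (assumption): lowercase and split on whitespace."""
    return set(text.lower().split())


def field_centric_and(doc, query, fields):
    """First rewrite: a single field must contain every query term."""
    terms = analyze(query)
    return any(terms <= analyze(doc[f]) for f in fields)


def term_centric_and(doc, query, fields):
    """Second rewrite: every query term must appear in at least one field."""
    terms = analyze(query)
    return all(any(t in analyze(doc[f]) for f in fields) for t in terms)


fields = ["title", "description", "author"]
print(field_centric_and(doc, "Something Beatles", fields))  # False
print(term_centric_and(doc, "Something Beatles", fields))   # True
```

No single field holds both terms, so the field-centric form misses the document, while the term-centric form finds "something" in title and "beatles" in author.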

My patch does exactly that, and it also accounts for the use of multiple analyzers, which may remove tokens from some fields (e.g. The Beatles). If a token is skipped by an analyzer, it is turned into a should requirement on the remaining fields instead of a must.
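
One possible reading of that rule, sketched in Python rather than the patch's own code. The stop list, the two analyzer functions, and `build_across_query` are all hypothetical and only illustrate how a dropped token could downgrade a clause from must to should.

```python
STOPWORDS = {"the", "a", "an"}  # assumption: tiny illustrative stop list


def plain(text):
    """Analyzer that keeps every token (assumption)."""
    return text.lower().split()


def with_stopwords(text):
    """Analyzer that drops stop words (assumption)."""
    return [t for t in plain(text) if t not in STOPWORDS]


def build_across_query(query, analyzers):
    """Build one clause per term over the fields whose analyzer keeps it.

    analyzers maps field name -> analyzer function. A clause is a must
    ("+") only if no analyzer dropped the term; otherwise it is a should.
    """
    clauses = []
    for term in plain(query):
        kept = [f for f, an in analyzers.items() if term in an(term)]
        if not kept:
            continue  # every analyzer dropped the term
        group = "(" + " ".join(f"{f}:{term}" for f in kept) + ")"
        required = len(kept) == len(analyzers)
        clauses.append(("+" if required else "") + group)
    return " ".join(clauses)


analyzers = {"title": with_stopwords, "author": plain}
print(build_across_query("The Beatles", analyzers))
# (author:the) +(title:beatles author:beatles)
```

Here "the" survives only in author, so it becomes an optional clause, while "beatles" survives everywhere and stays required.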

I am reusing the match query's facilities for minimum should match as well as fuzzy processing, so a new match type felt natural:
multi_match:{"query":"Something Beatles", "fields":["title", "description", "author"], "type":"across"}

@ghost ghost assigned s1monw May 2, 2013
@s1monw
Contributor

s1monw commented May 2, 2013

this looks interesting... I hope I can get to it soon!

@s1monw
Contributor

s1monw commented May 3, 2013

hey @tarass, I have been playing with similar ideas in many projects where you have enough knowledge / structure about your data to match across fields. I think your assumption here is maybe the most generic one, and it makes perfect sense. Yet I am not sure how you handle different analysis chains. I think we should somehow align this with the actual positions derived from the position increment attribute, and take PositionLengthAttribute into account, which tells us how many positions a single token spans in the case of a multi-term synonym or things like that (i.e. word delimiter creates those as well). I'd want to extract this into a more general data structure where you can align several analysis outputs in a graph or something like that.

Does this make sense?
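
A rough sketch of what such an alignment could look like, written in Python for brevity. It assumes each token arrives as a `(term, position_increment, position_length)` tuple and that positions accumulate from -1 as in Lucene token streams; the `positions` and `align` helpers and the sample analysis output are hypothetical.

```python
from collections import defaultdict


def positions(tokens):
    """Turn (term, position_increment, position_length) tuples into
    (term, start, end) spans by accumulating the increments."""
    pos = -1
    out = []
    for term, inc, length in tokens:
        pos += inc
        out.append((term, pos, pos + length))
    return out


def align(per_field_tokens):
    """Group the terms of every field by the position span they cover."""
    graph = defaultdict(list)
    for field, tokens in per_field_tokens.items():
        for term, start, end in positions(tokens):
            graph[(start, end)].append((field, term))
    return dict(graph)


# Hypothetical analysis output for the query "The Beatles".
per_field = {
    "author": [("the", 1, 1), ("beatles", 1, 1)],
    # A stop filter removed "the", leaving a gap (increment of 2).
    "title": [("beatles", 2, 1)],
}
aligned = align(per_field)
print(aligned)
```

Keying on the `(start, end)` span means a multi-position token, such as a multi-term synonym, lands in its own group instead of being merged with single-position tokens.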

@tarass
Author

tarass commented May 3, 2013

It does make sense in concept, but I'll have to do some testing on how PositionLengthAttribute works. The delimiter case is something I was concerned about but didn't know how to fix. I'll get out an update in a few days.

@gibrown
Contributor

gibrown commented Jul 15, 2013

Hi @tarass any update on this patch? We've run into the same problem, and I'd rather have a solution that is built into ES than hack together an ugly query on the client side.

Let me know if I can help get this over the finish line.

@tarass
Author

tarass commented Aug 28, 2013

Have really not had the time to finish the switch to PositionLengthAttribute. Since someone else needs it, I'll try and find the time.

@gibrown
Contributor

gibrown commented Aug 29, 2013

Thanks @tarass that would be really great. I'd hoped to get to working on it myself this month, but I'm bogged down with other things at the moment.

@andrewmacheret

+1

@skade
Contributor

skade commented Oct 22, 2013

+1

Just ran into this problem at a client and I think the described assumption is very valid, especially as {multi_}match is propagated as an alternative to query_string, which can easily have this behaviour.

@thorsten

+1

@s1monw
Contributor

s1monw commented Jan 30, 2014

for those of you that are interested I linked some WIP that I have ^^ and if anybody is up for some feedback that would be much appreciated

s1monw added a commit to s1monw/elasticsearch that referenced this pull request Feb 4, 2014
`cross_fields` attempts to treat fields with the same analysis
configuration as a single field and uses maximum score promotion or
combination of the scores depending on the `use_dis_max` setting.
By default scores are combined.

Relates to elastic#2959
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Feb 6, 2014
`cross_fields` attempts to treat fields with the same analysis
configuration as a single field and uses maximum score promotion or
combination of the scores depending on the `use_dis_max` setting.
By default scores are combined. `cross_fields` can also search across
fields of heterogeneous types; for instance, if numbers can be part of
the query, it makes sense to also search on numeric fields if an analyzer
is provided in the request.

Relates to elastic#2959
s1monw added a commit that referenced this pull request Feb 6, 2014
`cross_fields` attempts to treat fields with the same analysis
configuration as a single field and uses maximum score promotion or
combination of the scores depending on the `use_dis_max` setting.
By default scores are combined. `cross_fields` can also search across
fields of heterogeneous types; for instance, if numbers can be part of
the query, it makes sense to also search on numeric fields if an analyzer
is provided in the request.

Relates to #2959
@s1monw
Contributor

s1monw commented Mar 12, 2014

I am closing this since cross_fields is in for 1.1
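
For readers landing here, a minimal sketch of the original example expressed with the `cross_fields` type, built as a Python dict and printed as a JSON request body. The field names and query text are taken from the description at the top of this pull request; how the body is sent to a cluster is left out.

```python
import json

# Request body for the original "Something Beatles" example,
# using the cross_fields type of multi_match.
query = {
    "query": {
        "multi_match": {
            "query": "Something Beatles",
            "type": "cross_fields",
            "fields": ["title", "description", "author"],
            "operator": "and",
        }
    }
}

print(json.dumps(query, indent=2))
```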

@s1monw s1monw closed this Mar 12, 2014