Alternate model for field AND logic within MultiMatch query #2959

tarass · 2013-04-30T21:25:33Z

I wrote a patch to the MultiMatch query that provides more natural AND processing when considering multiple fields.

Consider a document with these fields:
title: Something
description: featured on their 1969 album Abbey Road
author: Beatles
Now if I take a user's input and run a query to match my documents, it would be natural to consider either the dreaded _all field or a multi_match query like:
multi_match:{"query":"Something Beatles", "fields":["title", "description", "author"], "operator":"and"}

Which would get transformed into a boolean query such as:
(+title:something +title:beatles) (+description:something +description:beatles) (+author:something +author:beatles)

There is no match for our document! From a human input perspective, the most natural way to AND a multi-field search is often to ensure that each term is matched somewhere across all fields, such as:
+(title:something description:something author:something) +(title:beatles description:beatles author:beatles)
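
To make the difference between the two expansions concrete, here is a minimal sketch that builds both Lucene-style query strings from a list of terms and fields. The helper names `field_centric` and `term_centric` are made up for illustration and are not part of the patch or of Elasticsearch; the terms are assumed to be already analyzed (lowercased).

```python
def field_centric(terms, fields):
    """Current behaviour: one clause per field, every term required in that field."""
    return " ".join(
        "(" + " ".join(f"+{field}:{term}" for term in terms) + ")"
        for field in fields
    )


def term_centric(terms, fields):
    """Proposed behaviour: one required clause per term, matched in any field."""
    return " ".join(
        "+(" + " ".join(f"{field}:{term}" for field in fields) + ")"
        for term in terms
    )


terms = ["something", "beatles"]
fields = ["title", "description", "author"]

print(field_centric(terms, fields))
print(term_centric(terms, fields))
```

The first string requires both terms inside a single field, which the example document cannot satisfy; the second only requires each term to appear in at least one field.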

My patch does exactly that and it also accounts for use of multiple analyzers which may remove tokens from some fields (ex: The Beatles). If a token is skipped by an analyzer it will be turned into a should requirement on remaining fields instead of a must.
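
As a rough illustration of the rule described above (not the patch's actual code), the sketch below models each field's analyzer as nothing more than a stopword set. That is an assumption made for brevity; real analysis chains do far more. A term that survives analysis in every field becomes a must clause, while a term dropped by at least one analyzer becomes a should clause on the fields that kept it. The function name `across_query` is hypothetical.

```python
def across_query(text, stopwords_by_field):
    """Build a term-centric query string, demoting terms that some analyzer dropped.

    stopwords_by_field maps a field name to the set of tokens its (toy)
    analyzer removes. This stands in for real analysis chains.
    """
    clauses = []
    for raw in text.split():
        token = raw.lower()
        # Fields whose analyzer keeps this token.
        kept = [f for f, stop in stopwords_by_field.items() if token not in stop]
        if not kept:
            continue  # removed everywhere, nothing to match
        group = "(" + " ".join(f"{f}:{token}" for f in kept) + ")"
        # Required only if no analyzer skipped the token.
        required = len(kept) == len(stopwords_by_field)
        clauses.append(("+" if required else "") + group)
    return " ".join(clauses)


analyzers = {"title": {"the"}, "description": {"the"}, "author": set()}
print(across_query("The Beatles", analyzers))
```

With these toy analyzers, "the" is kept only by `author`, so it ends up as an optional clause, while "beatles" stays required across all three fields.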

I am using facilities of match query for minimum should match as well as fuzzy processing so a new match type felt natural.
multi_match:{"query":"Something Beatles", "fields":["title", "description", "author"], "type":"across"}

s1monw · 2013-05-02T20:09:52Z

this looks interesting... I hope I can get to it soon!

s1monw · 2013-05-03T08:43:51Z

hey @tarass I have been playing with similar ideas in many projects where you have enough knowledge / structure about your data to match across fields. I think your assumption here is maybe the most generic and it makes perfect sense. Yet, I am not sure about how you handle different analysis chains. I think we should somehow align this with the actual positions that are returned from the position increment attribute and take PositionLengthAttribute into account, which tells us how many positions a single token spans in the case of a multi-term synonym or things like that (i.e. the word delimiter creates that one as well). I'd want to extract this maybe into a more general data structure where you can align several analysis outputs in a graph or something like that.

Does this make sense?

tarass · 2013-05-03T18:18:29Z

It does make sense in concept, but I'll have to do some testing on how PositionLengthAttribute works. The delimiter case is something I was concerned about but didn't know how to fix. I'll get out an update in a few days.

gibrown · 2013-07-15T21:16:09Z

Hi @tarass any update on this patch? We've run into the same problem, and I'd rather have a solution that is built into ES than hack together an ugly query on the client side.

Let me know if I can help get this over the finish line.

tarass · 2013-08-28T03:40:54Z

Have really not had the time to finish the switch to PositionLengthAttribute. Since someone else needs it, I'll try and find the time.

gibrown · 2013-08-29T22:02:26Z

Thanks @tarass that would be really great. I'd hoped to get to working on it myself this month, but I'm bogged down with other things at the moment.

andrewmacheret · 2013-09-04T00:23:18Z

+1

skade · 2013-10-22T10:16:55Z

+1

Just ran into this problem at a client and I think the described assumption is very valid, especially as {multi_}match is propagated as an alternative to query_string, which can easily have this behaviour.

thorsten · 2013-10-22T10:42:12Z

+1

s1monw · 2014-01-30T16:30:05Z

for those of you that are interested I linked some WIP that I have ^^ and if anybody is up for some feedback that would be much appreciated

`cross_fields` attempts to treat fields with the same analysis configuration as a single field and uses either maximum score promotion or a combination of the scores, depending on the `use_dis_max` setting. By default scores are combined. `cross_fields` can also search across fields of heterogeneous types; for instance, if numbers can be part of the query, it makes sense to also search numeric fields when an analyzer is provided in the request. Relates to #2959
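
For readers arriving from the original example, the request from the top of this thread would look roughly like the following with the `cross_fields` type. This is a sketch of the request body only, built as a plain Python dict and printed as JSON. The field names come from the example document above, and no client library call is shown or assumed.

```python
import json

# Request body for a multi_match query using the cross_fields type.
# Field names are taken from the example document in this thread.
body = {
    "query": {
        "multi_match": {
            "query": "Something Beatles",
            "fields": ["title", "description", "author"],
            "type": "cross_fields",
            "operator": "and",
        }
    }
}

print(json.dumps(body, indent=2))
```

With `"operator": "and"`, each term has to match in at least one of the listed fields, which is the behaviour the original patch was after.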

s1monw · 2014-03-12T20:38:35Z

I am closing this since cross_fields is in for 1.1

Initial commit for the across type multi-match query

8f9fe6c

ghost assigned s1monw May 2, 2013

s1monw mentioned this pull request Feb 4, 2014

Added cross_fields type to multi_match query #5005

Closed

s1monw closed this Mar 12, 2014
