Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added cross_fields type to multi_match query #5005

Closed
wants to merge 1 commit into from

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Feb 4, 2014

cross_fields attempts to treat fields with the same analysis
configuration as a single field and uses maximum score promotion or
combination of the scores based depending on the use_dis_max setting.
By default scores are combined.

Relates to #2959

@@ -36,15 +36,35 @@ individual clauses inside the top level query.

* `fields` - Fields to be used in the query.
* `use_dis_max` - Boolean indicating to either create a `dis_max` query
or a `bool` query. Defaults to `true`.
or a `bool` query. Defaults to `true`. (Deprecated since coming[1.1.0] use type `best_fields` or `most_fields` instead)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaults to true. deprecated[1.1.0,Use type best_fields or most_fields instead]

@s1monw
Copy link
Contributor Author

s1monw commented Feb 4, 2014

I merged in the changes from clinton and updated some minor stuff in the docs. I also added some more test randomization. I think it's ready

@mattweber
Copy link
Contributor

+1, so many people on the mailing list have been hit by this "misunderstanding" of the multi-match query. The doc updates in this PR alone will help considerably! I love how you group the fields by analyzer without requiring the user to do it by hand (not that it is hard).

Is is going to be in 1.0? I only see a 1.1 tag.

@markharwood
Copy link
Contributor

What is the expected behaviour when (stupidly) querying across a combo of number and text fields? Currently get a NumberFormatException from parser on text search terms. Might user expect e.g. "37 the high street" to work on house number (numeric) and street name (text) fields without error?

@clintongormley
Copy link

@markharwood if the user specifies an analyzer for the query, then i would have expected this to work, but apparently it doesn't.

PUT /t/t/1
{
  "first_name": "will",
  "last_name": "smith",
  "count": 12
}

GET /_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "will smith 12",
      "fields": [
        "*_name",
        "count"
      ],
      "type": "cross_fields",
      "analyzer": "standard",
      "operator": "and"
    }
  }
}

returns an explanation of:

+blended("will", fields: [first_name, count, last_name]) 
+blended("smith", fields: [first_name, count, last_name]) 
+blended("12", fields: [first_name, count, last_name])

But it doesn't:

GET /_search
{
  "query": {
    "multi_match": {
      "query": "will smith 12",
      "fields": [
        "*_name",
        "count"
      ],
      "type": "cross_fields",
      "analyzer": "standard",
      "operator": "and"
    }
  }
}
# no results

@markharwood
Copy link
Contributor

@clintongormley I've tried specifying 'standard' analyzer and it looks like it doesn't get used because the mapping for integer fields doesn't want to use any analyzer on the input (due to IntegerFieldMapper.useTermQueryWithQueryString returns true). So the query string does not get tokenized for this field and IntegerFieldMapper just barfs on the full string "37 the high st" as an invalid number.
A long winded way of saying "this won't work"...
So we either needed to document the shortcomings or try fix.

@s1monw
Copy link
Contributor Author

s1monw commented Feb 5, 2014

@clintongormley @markharwood I just pushed a fix for the numeric problem your test should work now. I will pull this in if nobody objects.


BlendedTermQuery that = (BlendedTermQuery) o;

if (!Arrays.equals(terms, that.terms)) return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should ignore the order of the terms here?

@s1monw
Copy link
Contributor Author

s1monw commented Feb 5, 2014

@jpountz I fixed the last two comments and squashed after rebase to master. Can you take a last look?

@s1monw
Copy link
Contributor Author

s1monw commented Feb 6, 2014

@mattweber 1.0 is frozen we don't push features into it anymore. As you can guess GA is close so this will only be 1.1 but since we don't plan to wait too long with the next minor release I don't think this is too much of a bummer :)

@jpountz
Copy link
Contributor

jpountz commented Feb 6, 2014

+1!

@markharwood
Copy link
Contributor

@s1monw Just tried this again and was still getting java.lang.NumberFormatException: For input string: "song 2"
running this:

"query" :
{
      "multi_match" : {
             "query" : "song 2",
              "fields" : [ "Album", "Ignore2" ],
                "analyzer":"standard",
               "type":"cross_fields"
        },
}

This is on the musicbrainz dataset I shared where field "Album" is type string and "Ignore2" is type integer.

@s1monw
Copy link
Contributor Author

s1monw commented Feb 6, 2014

@markharwood can you confirm that last commit fixed your issues?

@markharwood
Copy link
Contributor

Following latest changes I can confirm that #5005 (comment) no longer throws exceptions and the logic for checking numeric fields works in my test

@s1monw s1monw removed the v0.90.12 label Feb 6, 2014
`cross_fields` attemps to treat fields with the same analysis
configuration as a single field and uses maximum score promotion or
combination of the scores based depending on the `use_dis_max` setting.
By default scores are combined. `cross_fields` can also search across
fields of hetrogenous types for instance if numbers can be part of
the query it makes sense to search also on numeric fields if an analyzer
is provided in the reqeust.

Relates to elastic#2959
@s1monw
Copy link
Contributor Author

s1monw commented Feb 6, 2014

pushed to 1.x and master

@s1monw s1monw closed this Feb 6, 2014
@s1monw s1monw deleted the issues/44 branch February 6, 2014 16:16
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Mar 14, 2014
This fixes a regression introduced by elastic#5005 where the query slop
was simply ignored when a `match_phrase_prefix` type was set.

Closes elastic#5437
s1monw added a commit that referenced this pull request Mar 14, 2014
This fixes a regression introduced by #5005 where the query slop
was simply ignored when a `match_phrase_prefix` type was set.

Closes #5437
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search/Search Search-related issues that do not fall into other categories v1.1.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants