Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the field_value_factor function to the function_score query #5519

Closed
wants to merge 14 commits into from

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Mar 24, 2014

The field_value_factor function uses the value of a field in the document to influence the score. This is a common case that script scoring was previously used for.

For example, a query that looks like:

{
  "query": {
    "function_score": {
      "query": {"match": { "body": "foo" }},
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.1,
            "modifier": "square"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}

Would have the score modified for each document by:

square(1.1 * doc['popularity'].value)

This is faster and less error-prone than using scripting to influence the score. Speed-wise, I used the IMDB movie database and tested with two different queries:

{
  "query": {
    "function_score": {
      "query": {"match_all": {}},
      "functions": [
        {
          "field_value_factor": {
            "field": "runtime",
            "modifier": "none"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}

vs:

{
  "query": {
    "function_score": {
      "query": {"match_all": {}},
      "functions": [
        {
          "script_score": {
            "script": "_score * doc['runtime'].value"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}

The field_value_factor version took about 75ms on average, the script_score version took about 145ms on average (after field data was loaded for both versions).

}
return subQueryScore * Modifier.apply(modifier, val * boostFactor, lenient);
} else {
return subQueryScore;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implicit assumption here is that the default value for a missing field value causes Modifier.apply(...) to evaluate to one. People can circumvent this by using the missing filter together with another function score function that implements a different default behavior but maybe that should be documented?
Alternatively, we could allow to pass a default field value.

@dakrone
Copy link
Member Author

dakrone commented Mar 25, 2014

@brwe I removed the lenient flag entirely, but I did add in an ignore_missing flag that returns the original score unmodified if the field is missing a value (defaulting to false, which throws an exception if the field is missing). What do you think of this solution?

@brwe
Copy link
Contributor

brwe commented Mar 26, 2014

I have no strong opinion about the ignore_missing if it is switched off per default, it might make sense in some cases.

@brwe
Copy link
Contributor

brwe commented Mar 26, 2014

+1

@@ -246,6 +246,44 @@ In contrast to the normal and exponential decay, this function actually
sets the score to 0 if the field value exceeds twice the user given
scale value.

===== Field Value factor
The `field_value_factor` function allows you to use a field from a document to
influnce the score. It's similar to using the `script_score` function, however,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

influence

@imotov
Copy link
Contributor

imotov commented Mar 26, 2014

LGTM

RECIPROCAL;

public static double apply(Modifier t, double n) {
if (t == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should catch this here! this must not be null at all just let it run into NPE

@s1monw
Copy link
Contributor

s1monw commented Mar 27, 2014

I like the feature I left some comments on the implementation - speed is important here :)

public enum Modifier {
NONE,
LOG,
LOG1P,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love a LOG2P, actually. log(n + 2) is nice because it is always > 0 if n is > 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added both log2p and ln2p :)

The `field_value_factor` function uses the value of a field in the
document to influence the score.

A query that looks like:
{
  "query": {
    "function_score": {
      "query": {"match": { "body": "foo" }},
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.1,
            "modifier": "square"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}

Would have the score modified by:

_score * square(1.1 * doc['popularity'])
Also makes the `field_value_function` not ignore missing fields by default
@dakrone
Copy link
Member Author

dakrone commented Mar 27, 2014

@s1monw updated to use IndexNumericFieldData and load the values more efficiently as you suggested. Also updated the docs like @clintongormley asked.

if (numValues > 0) {
double val = this.values.nextValue();
return Modifier.apply(modifier, val * boostFactor);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really have to throw an exception for this? it seems not very flexible can't you specify a default value in the query that is used instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with Britta and we decided that since it was trivial to remove missing values with a filter, it was simpler not to add complexity by adding a default field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so I try to think about situations where this doesn't work.... the missing / hasvalue filter can be expensive though but I guess we should just make it fast for that case and expose the filter on the field data... I like the idea... ok fair enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be a missing/hasvalue filter, a range filter works just as well for filtering out values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples I have been using here: http://p.writequit.org/org/field-value-function.html

SQRT,
RECIPROCAL;

public static double apply(Modifier t, double n) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes more sense to have public double apply(double n) on each of the different enum types? That way I guess they can be much better inlined and you don't have to do the swtich altogether?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add that to each type.

@s1monw
Copy link
Contributor

s1monw commented Mar 27, 2014

LGTM

dakrone added a commit that referenced this pull request Mar 27, 2014
The `field_value_factor` function uses the value of a field in the
document to influence the score.

A query that looks like:
{
  "query": {
    "function_score": {
      "query": {"match": { "body": "foo" }},
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.1,
            "modifier": "square"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}

Would have the score modified by:

square(1.1 * doc['popularity'].value)

Closes #5519
@dakrone dakrone closed this in 8fbd1bd Mar 27, 2014
@ncolomer
Copy link

Nice work! 👍
Any idea when this will be available?

@dakrone dakrone deleted the function-score-factor branch April 21, 2014 23:00
dakrone pushed a commit that referenced this pull request Jul 23, 2014
While the blogpost http://www.elasticsearch.org/blog/2014-04-02-this-week-in-elasticsearch/ states, that feature #5519 was
added to 1.x, the release notes for, e.g. v1.1.2, however tell otherwise.
Only the release notes for 1.2.0 list #5519 as a new feature.

Since the 1.x docs deprecate/discourage from using `_boost`, and seemingly give a migration example at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html#function-score-instead-of-boost
users of 1.1.x should be warned.
dakrone pushed a commit that referenced this pull request Jul 23, 2014
While the blogpost http://www.elasticsearch.org/blog/2014-04-02-this-week-in-elasticsearch/ states, that feature #5519 was
added to 1.x, the release notes for, e.g. v1.1.2, however tell otherwise.
Only the release notes for 1.2.0 list #5519 as a new feature.

Since the 1.x docs deprecate/discourage from using `_boost`, and seemingly give a migration example at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html#function-score-instead-of-boost
users of 1.1.x should be warned.
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search/Search Search-related issues that do not fall into other categories v1.2.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants