Add the field_value_factor
function to the function_score query
#5519
Conversation
    }
    return subQueryScore * Modifier.apply(modifier, val * boostFactor, lenient);
} else {
    return subQueryScore;
The implicit assumption here is that the default value for a missing field value causes Modifier.apply(...) to evaluate to one. People can circumvent this by using the missing filter together with another function score function that implements a different default behavior but maybe that should be documented?
Alternatively, we could allow to pass a default field value.
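A sketch of the workaround described above (the `exists`/`missing` filters and the `boost_factor` function are from the 1.x DSL; the `popularity` field is illustrative):

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "body": "foo" } },
      "functions": [
        {
          "filter": { "exists": { "field": "popularity" } },
          "field_value_factor": { "field": "popularity" }
        },
        {
          "filter": { "missing": { "field": "popularity" } },
          "boost_factor": 1
        }
      ]
    }
  }
}
```

Documents missing `popularity` fall through to the plain `boost_factor`, giving a different default behavior without a default option on `field_value_factor` itself.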
@brwe I removed the
I have no strong opinion about the
+1
@@ -246,6 +246,44 @@ In contrast to the normal and exponential decay, this function actually
sets the score to 0 if the field value exceeds twice the user given
scale value.

===== Field Value factor
The `field_value_factor` function allows you to use a field from a document to
influnce the score. It's similar to using the `script_score` function, however,
influence
LGTM
    RECIPROCAL;

    public static double apply(Modifier t, double n) {
        if (t == null) {
I don't think you should catch this here! This must not be null at all, just let it run into an NPE.
I like the feature. I left some comments on the implementation - speed is important here :)
Please update these docs as well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html#function-score-instead-of-boost
public enum Modifier {
    NONE,
    LOG,
    LOG1P,
I'd love a LOG2P, actually. log(n + 2) is nice because it is always > 0 if n is > 0.
Added both `log2p` and `ln2p` :)
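A quick check of why log(n + 2) is attractive (illustrative sketch, not the PR's code; base-10 is an assumption, matching the existing `log` modifier):

```java
// log2p stays positive for any n >= 0, while a plain log is negative
// below 1 and undefined at 0 -- which is why a log2p/ln2p modifier is handy.
class Log2pDemo {
    static double log2p(double n) { return Math.log10(n + 2); }
    static double ln2p(double n)  { return Math.log(n + 2); } // natural-log variant

    public static void main(String[] args) {
        System.out.println(log2p(0.0));      // log10(2) ~ 0.301, still positive
        System.out.println(Math.log10(0.5)); // ~ -0.301: a plain log can turn the boost negative
    }
}
```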
The `field_value_factor` function uses the value of a field in the document to influence the score. A query that looks like:

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "body": "foo" } },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.1,
            "modifier": "square"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}
```

Would have the score modified by: `_score * square(1.1 * doc['popularity'])`
Also makes the `field_value_factor` function not ignore missing fields by default
@s1monw updated to use IndexNumericFieldData and load the values more efficiently as you suggested. Also updated the docs like @clintongormley asked.
if (numValues > 0) {
    double val = this.values.nextValue();
    return Modifier.apply(modifier, val * boostFactor);
} else {
Do we really have to throw an exception for this? It seems not very flexible. Can't you specify a default value in the query that is used instead?
I discussed this with Britta and we decided that since it was trivial to remove missing values with a filter, it was simpler not to add complexity by adding a default field.
ok so I try to think about situations where this doesn't work.... the missing / hasvalue filter can be expensive though but I guess we should just make it fast for that case and expose the filter on the field data... I like the idea... ok fair enough.
It doesn't have to be a missing/hasvalue filter, a range filter works just as well for filtering out values.
Examples I have been using here: http://p.writequit.org/org/field-value-function.html
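The range-filter variant mentioned above might look like this (sketch; the `popularity` field is illustrative):

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "filter": { "range": { "popularity": { "gt": 0 } } },
          "field_value_factor": { "field": "popularity" }
        }
      ]
    }
  }
}
```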
    SQRT,
    RECIPROCAL;

    public static double apply(Modifier t, double n) {
I wonder if it makes more sense to have `public double apply(double n)` on each of the different enum types? That way I guess they can be much better inlined and you don't have to do the switch altogether?
Sure, I can add that to each type.
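A sketch of the per-constant dispatch being suggested: each enum value overrides `apply(double)`, so the JIT can devirtualize and inline the call and the switch statement goes away. The formulas are assumptions apart from the constant names visible in the diff:

```java
// Each constant carries its own apply(...); no central switch needed.
enum Modifier {
    NONE       { public double apply(double n) { return n; } },
    LOG        { public double apply(double n) { return Math.log10(n); } },
    LOG1P      { public double apply(double n) { return Math.log10(n + 1); } },
    LOG2P      { public double apply(double n) { return Math.log10(n + 2); } },
    SQRT       { public double apply(double n) { return Math.sqrt(n); } },
    SQUARE     { public double apply(double n) { return n * n; } },
    RECIPROCAL { public double apply(double n) { return 1.0 / n; } };

    public abstract double apply(double n);
}
```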
LGTM
The `field_value_factor` function uses the value of a field in the document to influence the score. A query that looks like:

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "body": "foo" } },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.1,
            "modifier": "square"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}
```

Would have the score modified by: `square(1.1 * doc['popularity'].value)`

Closes #5519
Nice work! 👍
While the blogpost http://www.elasticsearch.org/blog/2014-04-02-this-week-in-elasticsearch/ states that feature #5519 was added to 1.x, the release notes for e.g. v1.1.2 tell otherwise: only the release notes for 1.2.0 list #5519 as a new feature. Since the 1.x docs deprecate/discourage using `_boost`, and give a migration example at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html#function-score-instead-of-boost, users of 1.1.x should be warned.
The `field_value_factor` function uses the value of a field in the document to influence the score. This is a common case that script scoring was previously used for. For example, a query that looks like:
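For reference, this is the same example used in the conversation above:

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "body": "foo" } },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.1,
            "modifier": "square"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}
```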
Would have the score modified for each document by:
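Per the description above, the modifier applied is `square(1.1 * doc['popularity'].value)`; for example, a document with `popularity: 10` would contribute `square(1.1 * 10) = 121`.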
This is faster and less error-prone than using scripting to influence the score. Speed-wise, I used the IMDB movie database and tested with two different queries:
vs:
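An illustrative pair in the spirit of that comparison (the `popularity` field and the script body are assumptions based on the example above, not the exact benchmark queries):

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        { "field_value_factor": { "field": "popularity", "factor": 1.1 } }
      ]
    }
  }
}
```

against the `script_score` form:

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": { "script": "1.1 * doc['popularity'].value" }
    }
  }
}
```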
The `field_value_factor` version took about 75ms on average, the `script_score` version took about 145ms on average (after field data was loaded for both versions).