Add the field_value_factor
function to the function_score query
#5519
Conversation
    }
    return subQueryScore * Modifier.apply(modifier, val * boostFactor, lenient);
} else {
    return subQueryScore;
The implicit assumption here is that the default value for a missing field value causes Modifier.apply(...) to evaluate to one. People can circumvent this by using the missing filter together with another function score function that implements a different default behavior but maybe that should be documented?
Alternatively, we could allow to pass a default field value.
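A sketch of the workaround described above (the `exists`/`missing` filters and the `boost_factor` function are from the 1.x DSL; the `popularity` field is illustrative):

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "body": "foo" } },
      "functions": [
        {
          "filter": { "exists": { "field": "popularity" } },
          "field_value_factor": { "field": "popularity" }
        },
        {
          "filter": { "missing": { "field": "popularity" } },
          "boost_factor": 1
        }
      ]
    }
  }
}
```

Documents missing `popularity` fall through to the plain `boost_factor`, giving a different default behavior without a default option on `field_value_factor` itself.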
@brwe I removed the
I have no strong opinion about the
+1
@@ -246,6 +246,44 @@ In contrast to the normal and exponential decay, this function actually
sets the score to 0 if the field value exceeds twice the user given
scale value.

===== Field Value factor
The `field_value_factor` function allows you to use a field from a document to
influnce the score. It's similar to using the `script_score` function, however,
influence
LGTM
    RECIPROCAL;

    public static double apply(Modifier t, double n) {
        if (t == null) {
I don't think you should catch this here! This must not be null at all, just let it run into an NPE.
I like the feature. I left some comments on the implementation - speed is important here :)
Please update these docs as well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html#function-score-instead-of-boost
public enum Modifier {
    NONE,
    LOG,
    LOG1P,
I'd love a LOG2P, actually. log(n + 2) is nice because it is always > 0 if n is > 0.
Added both `log2p` and `ln2p` :)
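A quick check of why log(n + 2) is attractive (illustrative sketch, not the PR's code; base-10 is an assumption, matching the existing `log` modifier):

```java
// log2p stays positive for any n >= 0, while a plain log is negative
// below 1 and undefined at 0 -- which is why a log2p/ln2p modifier is handy.
class Log2pDemo {
    static double log2p(double n) { return Math.log10(n + 2); }
    static double ln2p(double n)  { return Math.log(n + 2); } // natural-log variant

    public static void main(String[] args) {
        System.out.println(log2p(0.0));      // log10(2) ~ 0.301, still positive
        System.out.println(Math.log10(0.5)); // ~ -0.301: a plain log can turn the boost negative
    }
}
```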
The `field_value_factor` function uses the value of a field in the document to influence the score. A query that looks like:

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "body": "foo" } },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.1,
            "modifier": "square"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}
```

Would have the score modified by: `_score * square(1.1 * doc['popularity'])`
Also makes the `field_value_factor` function not ignore missing fields by default
@s1monw updated to use IndexNumericFieldData and load the values more efficiently as you suggested. Also updated the docs like @clintongormley asked.
if (numValues > 0) {
    double val = this.values.nextValue();
    return Modifier.apply(modifier, val * boostFactor);
} else {
Do we really have to throw an exception for this? It seems not very flexible. Can't you specify a default value in the query that is used instead?
I discussed this with Britta and we decided that since it was trivial to remove missing values with a filter, it was simpler not to add complexity by adding a default field.
ok so I try to think about situations where this doesn't work.... the missing / hasvalue filter can be expensive though but I guess we should just make it fast for that case and expose the filter on the field data... I like the idea... ok fair enough.
It doesn't have to be a missing/hasvalue filter, a range filter works just as well for filtering out values.
Examples I have been using here: http://p.writequit.org/org/field-value-function.html
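The range-filter variant mentioned above might look like this (sketch; the `popularity` field is illustrative):

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "filter": { "range": { "popularity": { "gt": 0 } } },
          "field_value_factor": { "field": "popularity" }
        }
      ]
    }
  }
}
```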
    SQRT,
    RECIPROCAL;

    public static double apply(Modifier t, double n) {
I wonder if it makes more sense to have `public double apply(double n)` on each of the different enum types? That way I guess they can be much better inlined and you don't have to do the switch altogether?
Sure, I can add that to each type.
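A sketch of the per-constant dispatch being suggested: each enum value overrides `apply(double)`, so the JIT can devirtualize and inline the call and the switch statement goes away. The formulas are assumptions apart from the constant names visible in the diff:

```java
// Each constant carries its own apply(...); no central switch needed.
enum Modifier {
    NONE       { public double apply(double n) { return n; } },
    LOG        { public double apply(double n) { return Math.log10(n); } },
    LOG1P      { public double apply(double n) { return Math.log10(n + 1); } },
    LOG2P      { public double apply(double n) { return Math.log10(n + 2); } },
    SQRT       { public double apply(double n) { return Math.sqrt(n); } },
    SQUARE     { public double apply(double n) { return n * n; } },
    RECIPROCAL { public double apply(double n) { return 1.0 / n; } };

    public abstract double apply(double n);
}
```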
LGTM
The `field_value_factor` function uses the value of a field in the document to influence the score. A query that looks like:

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "body": "foo" } },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.1,
            "modifier": "square"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}
```

Would have the score modified by: `square(1.1 * doc['popularity'].value)`

Closes #5519
Nice work! 👍
While the blogpost http://www.elasticsearch.org/blog/2014-04-02-this-week-in-elasticsearch/ states that feature #5519 was added to 1.x, the release notes for e.g. v1.1.2 tell otherwise: only the release notes for 1.2.0 list #5519 as a new feature. Since the 1.x docs deprecate/discourage using `_boost`, and give a migration example at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html#function-score-instead-of-boost, users of 1.1.x should be warned.
The `field_value_factor` function uses the value of a field in the document to influence the score. This is a common case that script scoring was previously used for. For example, a query that looks like:
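For reference, this is the same example used in the conversation above:

```json
{
  "query": {
    "function_score": {
      "query": { "match": { "body": "foo" } },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.1,
            "modifier": "square"
          }
        }
      ],
      "score_mode": "max",
      "boost_mode": "sum"
    }
  }
}
```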
Would have the score modified for each document by:
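Per the description above, the modifier applied is `square(1.1 * doc['popularity'].value)`; for example, a document with `popularity: 10` would contribute `square(1.1 * 10) = 121`.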
This is faster and less error-prone than using scripting to influence the score. Speed-wise, I used the IMDB movie database and tested with two different queries:
vs:
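An illustrative pair in the spirit of that comparison (the `popularity` field and the script body are assumptions based on the example above, not the exact benchmark queries):

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        { "field_value_factor": { "field": "popularity", "factor": 1.1 } }
      ]
    }
  }
}
```

against the `script_score` form:

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": { "script": "1.1 * doc['popularity'].value" }
    }
  }
}
```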
The `field_value_factor` version took about 75ms on average, the `script_score` version took about 145ms on average (after field data was loaded for both versions).