Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

function_score should throw exception if decay function is used with multi valued field #3960

Closed
brwe opened this issue Oct 24, 2013 · 10 comments
Assignees

Comments

@brwe
Copy link
Contributor

brwe commented Oct 24, 2013

The decay functions for function_score do not handle fields with multiple values. Decay functions silently compute the distance from the first value. This can lead to confusion, as seen in issue #3926. Instead of using the first value in the field, an exception should be thrown if a field has multiple values.

@bobrik
Copy link
Contributor

bobrik commented Oct 24, 2013

@brwe I'm wondering if "flattering" before applying function_score could work too. Like in #3926 each function could be applied to each value and score_mode could be used to calculate result from NxM values where N is functions count and M is values count.

@s1monw
Copy link
Contributor

s1monw commented Oct 24, 2013

we have a bunch of places where we do similar things like in Sorting we use org.elasticsearch.index.fielddata.fieldcomparator.SortMode where users can simply pick if they want the min/max/sum/average of a multivalued field.

@brwe
Copy link
Contributor Author

brwe commented Oct 24, 2013

I see. Might make sense.
Do you think we should also take care of the following use case?

I have a multi valued field with N entries, but it actually represents an N-dim vector. Instead of computing avg, min or max distance, I actually want to give the origin as vector as well and then compute euclidean distance of origin and field vector.

In this case the number of values in the field should be the same as the number of values given in the origin parameter.

@brwe
Copy link
Contributor Author

brwe commented Apr 24, 2014

Closed this issue. Feel free to open a new issue if you think multiple values should be handled as described above.

@jpountz
Copy link
Contributor

jpountz commented Apr 24, 2014

@brwe Unfortunately, the isMultiValued method might return true even if the field is single-valued. The reason is that our doc-values-based impls can't know whether they are single-valued. (This reminds me that we should randomly use doc values instead of uninverted field data in our tests /cc @javanna @s1monw)

@jpountz jpountz reopened this Apr 24, 2014
brwe added a commit that referenced this issue Apr 24, 2014
brwe added a commit that referenced this issue Apr 24, 2014
@brwe
Copy link
Contributor Author

brwe commented Apr 24, 2014

Thanks! I reverted the commits for now.

s1monw added a commit to s1monw/elasticsearch that referenced this issue Apr 24, 2014
@s1monw
Copy link
Contributor

s1monw commented Apr 24, 2014

@brwe I pushed a commit on how we could solve this - just as an example to start the discussion

@brwe
Copy link
Contributor Author

brwe commented Apr 24, 2014

@s1monw That would work I think.

mikemccand pushed a commit to mikemccand/elasticsearch that referenced this issue Apr 24, 2014
mikemccand pushed a commit to mikemccand/elasticsearch that referenced this issue Apr 24, 2014
@brwe
Copy link
Contributor Author

brwe commented Apr 25, 2014

Opened pull request #5940

brwe pushed a commit to brwe/elasticsearch that referenced this issue Apr 25, 2014
@brwe brwe self-assigned this Apr 25, 2014
s1monw added a commit that referenced this issue Apr 28, 2014
Decay functions currently only use the first value in a field that contains
multiple values to compute the distance to the origin. Instead, it should
consider all distances if more values are in the field and then use
one of min/max/sum/avg which is defined by the user.

Relates to #3960
closes #5940
s1monw added a commit that referenced this issue Apr 28, 2014
Decay functions currently only use the first value in a field that contains
multiple values to compute the distance to the origin. Instead, it should
consider all distances if more values are in the field and then use
one of min/max/sum/avg which is defined by the user.

Relates to #3960
closes #5940
@brwe
Copy link
Contributor Author

brwe commented Apr 29, 2014

Closing, the issue is fixed by #5940

@brwe brwe closed this as completed Apr 29, 2014
@brwe brwe removed v1.2.0 labels Apr 29, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants