Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

function score #3423

Closed
brwe opened this issue Aug 1, 2013 · 1 comment
Closed

function score #3423

brwe opened this issue Aug 1, 2013 · 1 comment

Comments

@brwe
Copy link
Contributor

brwe commented Aug 1, 2013

Function score

function_score allows to modify the score of documents that are retrieved by a query. This can be useful if, for example, a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents.

function_score provides the same functionality that custom_boost_factor, custom_score and custom_filters_score provided but furthermore adds the option to score a document depending on the distance of a numeric field value from a user given reference (see description below).

Using function score

function_score can be used with only one function like this:

"function_score": {
    "(query|filter)": {},
    "boost": "boost for the whole query",
    "FUNCTION": {}
} 

Furthermore, several functions can be combined. In this case one can optionally choose to apply the function only if a document matches a given filter:

"function_score": {
    "(query|filter)": {},
    "boost": "boost for the whole query",
    "functions": [
        {
            "filter": {},
            "FUNCTION": {}
        },
        {
            "FUNCTION": {}
        }
    ],
    "score_mode": "(mult|max|...)"
}

If no filter is given with a function this is equivalent to specifying "match_all": {}

score_mode defines how functions are combined before multiplying to the score of the query:

  • "multiply": all functions are multiplied
  • "total": functions are summed
  • "avg": average of functions is computed
  • "first": the first function that has a matching filter with it is applied
  • "max": the function yielding the maximum score is applied
  • "min": the function yielding the minimum score is applied

The default is "multiply".

Score functions

function_score provides three types of score functions.

Script score

The script_score function allows to wrap another query and customize the scoring of it optionally with a computation derived from other field values in the doc (numeric ones) using script expression. Here is a simple sample:

"script_score" : {
    "script" : "_score * doc['my_numeric_field'].value"
}

On top of the different scripting field values and expression, the _score script parameter can be used to retrieve the score based on the wrapped query.

Scripts are cached for faster execution. If the script has parameters that it needs to take into account, it is preferable to use the same script, and provide parameters to it:

"script_score": {
    "lang": "lang",
    "params": {
        "param1": value1,
        "param2": value2
     },
    "script": "_score * doc['my_numeric_field'].value / pow(param1, param2)"
}

Boost factor

The boost_factor score allows to multiply the score by the provided boost_factor. This can sometimes be desired since boost value set on specific queries gets normalized, while for this score function it does not.

"boost_factor" : number

Decay functions

Decay functions score a document with a function that decays depending on the distance of a numeric field value of the document from a user given reference. This is similar to a range query, but with smooth edges instead of boxes.

To use distance scoring on a query that has numerical fields, the user has to define

  1. a reference and
  2. a scale

for each field. A reference is needed to define a distance for the document and a scale to define the rate of decay. The decay function is specified by

"DECAY_FUNCTION": {
    "FIELD_NAME": {
          "reference": "11, 12",
          "scale": "2km"
    }
}

where DECAY_FUNCTION can be "linear", "exp" and "gauss".

Normal decay, keyword "gauss"

The score is computed as

Exponential decay, keyword "exp"

The score is computed as

'Linear' decay, keyword "linear"

The score is computed as

In contrast to the normal and exponential decay, this function actually sets the score to 0 if the field value exceeds the user given scale value.

Choosing an appropriate scale

For all three functions, it might not always be easy to define an appropriate scale. Rather than defining the scale parameter directly, one can optionally define a distance at which the function should compute a particular factor.

For example, your documents might represents hotels and contain a geo location field. You want to compute a decay function depending on how far the hotel is from a given location. You might not immediately see what scale to choose for the gauss function, but you can say something like: "At a distance of 2km from the desired location, the score should be reduced by half."
You can provide this parameter like this:

  "DECAY_FUNCTION": {
    "location": {
          "reference": "11, 12",
          "scale": "2km",
          "scale_weight" : 0.5
    }
}

The parameter "scale" will then be adjusted automatically to assure that the score function computes a score of 0.5 for hotels that are 2km away from the desired location.

Detailed example

Suppose you are searching for a hotel in a certain town. Your budget is limited. Also, you would like the hotel to be close to the town center, so the farther the hotel is from the desired location the less likely you are to check in.
You would like the query results that match your criterion (for example, "hotel, Nancy, non-smoker") to be scored with respect to distance to the town center and also the price.

Intuitively, you would like to define the town center as the origin and maybe you are willing to walk 2km to the town center from the hotel.
In this case your reference for the location field is the town center and the scale is ~2km.

If your budget is low, you would probably prefer something cheap above something expensive.
For the price field, the reference would be 0 Euros and the scale depends on how much you are willing to pay, for example 20 Euros.

In this example, the fields might be called "price" for the price of the hotel and "location" for the coordinates of this hotel.

The function for "price" in this case would be

"DECAY_FUNCTION": {
    "price": {
          "reference": "0",
          "scale": "20"
    }
}

and for "location"

"DECAY_FUNCTION": {
    "location": {
          "reference": "11, 12",
          "scale": "2km"
    }
}

where DECAY_FUNCTION can be "linear", "exp" and "gauss".

Suppose you want to multiply these two functions on the original score, the request would look like this:

curl 'localhost:9200/hotels/_search/' -d '{
"query": {
    "function_score": {
        "functions": [
            {
                "DECAY_FUNCTION": {
                    "price": {
                        "reference": "0",
                        "scale": "20"
                    }
                }
            },
            {
                "DECAY_FUNCTION": {
                    "location": {
                        "reference": "11, 12",
                        "scale": "2km"
                    }
                }
            }
        ],
        "query": {
            "match": {
                "properties": "balcony"
            }
        },
        "score_mode": "multiply"
    }
}
}'

Next, we show how the computed score looks like for each of the three possible decay functions.

Normal decay, keyword "gauss"

When choosing "gauss" as decay function in the above example, the multiplier to the original score is computed as

A contour and surface plot of the multiplier looks like this:

gausscontour
gausssurf

Suppose your original search results matches three hotels : "Backback Nap", "Drink n Drive" and "BnB Bellevue".
"Drink n Drive" is pretty far from your defined location (nearly 2 km) and is not too cheap (about 13 Euros) so it gets a low factor a factor of 0.56. "BnB Bellevue" and "Backback Nap" are both pretty close to the defined location but "BnB Bellevue" is cheaper, so it gets a multiplier of 0.86 whereas "Backpack Nap" gets a value of 0.66."

Exponential decay, keyword "exp"

When choosing "exp" as decay function in the above example, the multiplier to the original score is computed as

A contour and surface plot of the multiplier looks like this:

expcontour
expsurf

'Linear' decay, keyword "linear"

When choosing "exp" as decay function in the above example, the multiplier to the original score is computed as

A contour and surface plot of the multiplier looks like this:

lincontour
linsurf

Supported fields for decay functions

Only single valued numeric fields, including time and geo locations, should be supported.

What is a field is missing?

Is the numeric field is missing in the document, the function will return 1.

Relation to custom_boost_factor, custom_score and custom_filters_score

The custom boost factor query

"custom_boost_factor" : {
    "query" : {
        ....
    },
    "boost_factor" : 5.2
}

becomes

"function_score" : {
    "query" : {
        ....
    },
    "boost_factor" : 5.2
}

The custom script score

"custom_score" : {
    "query" : {
        ....
    },
    "params" : {
        "param1" : 2,
        "param2" : 3.1
    },
    "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
}

becomes

"function_score" : {
    "query" : {
        ....
    },
    "script_score" : {

        "params" : {
            "param1" : 2,
            "param2" : 3.1
        },
        "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
    }
}

and the custom filters score query

"custom_filters_score" : {
    "query" : {
        …
     },
    "filters" : [
        {
            "filter" : { …},
            "boost" : "3"
        },
        {
            "filter" : {…},
            "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
        }
    ],
    "params" : {
        "param1" : 2,
        "param2" : 3.1
    }
    "score_mode" : "first"
}       

becomes:

"function_score" : {
    "query" : {
        …
    },
    "functions" : [
        {
            "filter" : {…},
            "boost" : "3"
        },
        {
            "filter" : { … },
            "script_score" : { 
                "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)",
                "params" : {
                    "param1" : 2,
                    "param2" : 3.1
                }

            }
        }
    ],
    "score_mode" : "first",     
}       

This issue replaces Issues #3307 and #3407

@ghost ghost assigned brwe Aug 1, 2013
brwe added a commit to brwe/elasticsearch that referenced this issue Aug 2, 2013
===================

The custom boost factor, custom script boost and the filters function query all do the same thing: They take a query and for each found document compute a new score based on the query score and some script, come custom boost factor or a combination of these two. However, the json format for these three functionalities is very different. This makes it hard to add new functions.

This commit introduces one keyword <code>function_score</code> for all three functions.

The new format can be used to either compute a new score with one function:

	"function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "function": {}
    }

or allow to combine the newly computed scores

    "function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "functions": [
            {
                "filter": {},
                "function": {}
            },
            {
                "function": {}
            }
        ],
        "score_mode": "(mult|max|...)"
    }

<code>function</code> here can be either

	"script_score": {
    	"lang": "lang",
    	"params": {
        	"param1": "value1",
        	"param2": "value2"
   		 },
    	"script": "some script"
	}

or

	"boost_factor" : number

New custom functions can be added via the function score module.

Changes
---------

The custom boost factor query

	"custom_boost_factor" : {
    	"query" : {
        	....
    	},
    	"boost_factor" : 5.2
	}

becomes

	"function_score" : {
    	"query" : {
        	....
    	},
    	"boost_factor" : 5.2
	}

The custom script score

	"custom_score" : {
    	"query" : {
        	....
	    },
    	"params" : {
        	"param1" : 2,
 	       	"param2" : 3.1
    	},
	    "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
	}

becomes

	"custom_score" : {
    	"query" : {
        	....
	    },
	    "script_score" : {

    		"params" : {
        		"param1" : 2,
 	       		"param2" : 3.1
    		},
	    	"script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
	    }
	}

and the custom filters score query

    "custom_filters_score" : {
        "query" : {
            "match_all" : {}
       	 },
        "filters" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
            }
        ],
        "score_mode" : "first",
        "params" : {
        	"param1" : 2,
 	       	"param2" : 3.1
    	}
    	"score_mode" : "first"
    }

becomes:

    "function_score" : {
        "query" : {
            "match_all" : {}
       	},
        "functions" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "script_score" : {
                	"script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)",
                	"params" : {
        				"param1" : 2,
 	       				"param2" : 3.1
    				}

            	}
            }
        ],
        "score_mode" : "first",
    }

Partially closes issue elastic#3423
brwe added a commit to brwe/elasticsearch that referenced this issue Aug 2, 2013
================

It might sometimes be desirable to have a tool available that allows to multiply the original score for a document with a function that decays depending on the distance of a numeric field value of the document from a user given reference.

These functions could be computed for several numeric fields and eventually be combined as a sum or a product and multiplied on the score of the original query.

This commit adds new score functions similar to boost factor and custom script scoring, that can be used togeter with the <code>function_score</code> keyword in a query.

To use distance scoring, the user has to define

 1. a reference and
 2. a scale

for each field the function should be applied on. A reference is needed to define a distance for the document and a scale to define the rate of decay.

Example use case
----------------

Suppose you are searching for a hotel in a certain town. Your budget is limited. Also, you would like the hotel to be close to the town center, so the farther the hotel is from the desired location the less likely you are to check in.
You would like the query results that match your criterion (for example, "hotel, Berlin, non-smoker") to be scored with respect to distance to the town center and also the price.

Intuitively, you would like to define the town center as the origin and maybe you are willing to walk 2km to the town center from the hotel.
In this case your *reference* for the location field is the town center and the *scale* is ~2km.

If your budget is low, you would probably prefer something cheap above something expensive.
For the price field, the *reference* would be 0 Euros and the *scale* depends on how much you are willing to pay, for example 20 Euros.

Usage
----------------

The distance score functions can be applied in two ways:

In the most simple case, only one numeric field is to be evaluated. To do so, call <code>function_score</code>, with the appropriate function. In the above example, this might be:

    curl 'localhost:9200/hotels/_search/' -d '{
    "query": {
        "function_score": {
            "gauss": {
                "location": {
                    "reference": [
                        52.516272,
                        13.377722
                    ],
                    "scale": "2km"
                }
            },
            "query": {
                "bool": {
                    "must": {
                        "city": "Berlin"
                    }
                }
            }
        }
    }
    }'

which would then search for hotels in berlin with a balcony and weight them depending on how far they are from the Brandenburg Gate.

If you have more that one numeric field, you can combine them by defining a series of functions and filters, like, for example, this:

    curl 'localhost:9200/hotels/_search/' -d '{
    "query": {
        "function_score": {
            "functions": [
                {
                    "filter": {
                        "match_all": {}
                    },
                    "gauss": {
                        "location": {
                            "reference": "11,12",
                            "scale": "2km"
                        }
                    }
                },
                {
                    "filter": {
                        "match_all": {}
                    },
                    "lin": {
                        "price": {
                            "reference": "0",
                            "scale": "20"
                        }
                    }
                }
            ],
            "query": {
                "bool": {
                    "must": {
                        "city": "Berlin"
                    }
                }
            },
            "score_mode": "multiply"
        }
    }
    }'

This would effectively compute the decay function for "location" and "price" and multiply them onto the score. See <code> function_score</code> for the different options for combining functions.

Supported fields
----------------
Only single valued numeric fields, including time and geo locations, are be supported.

What is a field is missing?
----------------

Is the numeric field is missing in the document, that field will not be taken into account at all for this document. The function value for this field is set to 1 for this document. Suppose you have two hotels both of which are in Berlin and cost the same. If one of the documents does not have a "location", this document would get a higher score than the document having the "location" field set.

To avoid this, you could, for example, use the exists or the missing filter and add a custom boost factor to the functions.

      …
     "functions": [
        {
            "filter": {
                "match_all": {}
            },
            "gauss": {
                "location": {
                    "reference": "11, 12",
                    "scale": "2km"
                }
            }
        },
        {
            "filter": {
                "match_all": {}
            },
            "lin": {
                "price": {
                    "reference": "0",
                    "scale": "20"
                }
            }
        },
        {
            "boost_factor": 0.001,
            "filter": {
                "bool": {
                    "must_not": {
                        "missing": {
                            "existence": true,
                            "field": "coordinates",
                            "null_value": true
                        }
                    }
                }
            }
        }
    ],
    ...

Closes elastic#3423
brwe added a commit that referenced this issue Aug 6, 2013
===================

The custom boost factor, custom script boost and the filters function query all do the same thing: They take a query and for each found document compute a new score based on the query score and some script, come custom boost factor or a combination of these two. However, the json format for these three functionalities is very different. This makes it hard to add new functions.

This commit introduces one keyword <code>function_score</code> for all three functions.

The new format can be used to either compute a new score with one function:

	"function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "function": {}
    }

or allow to combine the newly computed scores

    "function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "functions": [
            {
                "filter": {},
                "function": {}
            },
            {
                "function": {}
            }
        ],
        "score_mode": "(mult|max|...)"
    }

<code>function</code> here can be either

	"script_score": {
    	"lang": "lang",
    	"params": {
        	"param1": "value1",
        	"param2": "value2"
   		 },
    	"script": "some script"
	}

or

	"boost_factor" : number

New custom functions can be added via the function score module.

Changes
---------

The custom boost factor query

	"custom_boost_factor" : {
    	"query" : {
        	....
    	},
    	"boost_factor" : 5.2
	}

becomes

	"function_score" : {
    	"query" : {
        	....
    	},
    	"boost_factor" : 5.2
	}

The custom script score

	"custom_score" : {
    	"query" : {
        	....
	    },
    	"params" : {
        	"param1" : 2,
 	       	"param2" : 3.1
    	},
	    "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
	}

becomes

	"custom_score" : {
    	"query" : {
        	....
	    },
	    "script_score" : {

    		"params" : {
        		"param1" : 2,
 	       		"param2" : 3.1
    		},
	    	"script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
	    }
	}

and the custom filters score query

    "custom_filters_score" : {
        "query" : {
            "match_all" : {}
       	 },
        "filters" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
            }
        ],
        "score_mode" : "first",
        "params" : {
        	"param1" : 2,
 	       	"param2" : 3.1
    	}
    	"score_mode" : "first"
    }

becomes:

    "function_score" : {
        "query" : {
            "match_all" : {}
       	},
        "functions" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "script_score" : {
                	"script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)",
                	"params" : {
        				"param1" : 2,
 	       				"param2" : 3.1
    				}

            	}
            }
        ],
        "score_mode" : "first",
    }

Partially closes issue #3423
@brwe brwe closed this as completed in e707308 Aug 6, 2013
@s1monw
Copy link
Contributor

s1monw commented Aug 6, 2013

NICE! :)

brwe added a commit that referenced this issue Aug 14, 2013
===================

The custom boost factor, custom script boost and the filters function query all do the same thing: They take a query and for each found document compute a new score based on the query score and some script, come custom boost factor or a combination of these two. However, the json format for these three functionalities is very different. This makes it hard to add new functions.

This commit introduces one keyword <code>function_score</code> for all three functions.

The new format can be used to either compute a new score with one function:

	"function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "function": {}
    }

or allow to combine the newly computed scores

    "function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "functions": [
            {
                "filter": {},
                "function": {}
            },
            {
                "function": {}
            }
        ],
        "score_mode": "(mult|max|...)"
    }

<code>function</code> here can be either

	"script_score": {
    	"lang": "lang",
    	"params": {
        	"param1": "value1",
        	"param2": "value2"
   		 },
    	"script": "some script"
	}

or

	"boost_factor" : number

New custom functions can be added via the function score module.

Changes
---------

The custom boost factor query

	"custom_boost_factor" : {
    	"query" : {
        	....
    	},
    	"boost_factor" : 5.2
	}

becomes

	"function_score" : {
    	"query" : {
        	....
    	},
    	"boost_factor" : 5.2
	}

The custom script score

	"custom_score" : {
    	"query" : {
        	....
	    },
    	"params" : {
        	"param1" : 2,
 	       	"param2" : 3.1
    	},
	    "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
	}

becomes

	"custom_score" : {
    	"query" : {
        	....
	    },
	    "script_score" : {

    		"params" : {
        		"param1" : 2,
 	       		"param2" : 3.1
    		},
	    	"script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
	    }
	}

and the custom filters score query

    "custom_filters_score" : {
        "query" : {
            "match_all" : {}
       	 },
        "filters" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
            }
        ],
        "score_mode" : "first",
        "params" : {
        	"param1" : 2,
 	       	"param2" : 3.1
    	}
    	"score_mode" : "first"
    }

becomes:

    "function_score" : {
        "query" : {
            "match_all" : {}
       	},
        "functions" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "script_score" : {
                	"script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)",
                	"params" : {
        				"param1" : 2,
 	       				"param2" : 3.1
    				}

            	}
            }
        ],
        "score_mode" : "first",
    }

Partially closes issue #3423

Conflicts:
	src/main/java/org/elasticsearch/index/query/QueryBuilders.java
brwe added a commit that referenced this issue Aug 14, 2013
================

It might sometimes be desirable to have a tool available that allows to multiply the original score for a document with a function that decays depending on the distance of a numeric field value of the document from a user given reference.

These functions could be computed for several numeric fields and eventually be combined as a sum or a product and multiplied on the score of the original query.

This commit adds new score functions similar to boost factor and custom script scoring, that can be used togeter with the <code>function_score</code> keyword in a query.

To use distance scoring, the user has to define

 1. a reference and
 2. a scale

for each field the function should be applied on. A reference is needed to define a distance for the document and a scale to define the rate of decay.

Example use case
----------------

Suppose you are searching for a hotel in a certain town. Your budget is limited. Also, you would like the hotel to be close to the town center, so the farther the hotel is from the desired location the less likely you are to check in.
You would like the query results that match your criterion (for example, "hotel, Berlin, non-smoker") to be scored with respect to distance to the town center and also the price.

Intuitively, you would like to define the town center as the origin and maybe you are willing to walk 2km to the town center from the hotel.
In this case your *reference* for the location field is the town center and the *scale* is ~2km.

If your budget is low, you would probably prefer something cheap above something expensive.
For the price field, the *reference* would be 0 Euros and the *scale* depends on how much you are willing to pay, for example 20 Euros.

Usage
----------------

The distance score functions can be applied in two ways:

In the most simple case, only one numeric field is to be evaluated. To do so, call <code>function_score</code>, with the appropriate function. In the above example, this might be:

    curl 'localhost:9200/hotels/_search/' -d '{
    "query": {
        "function_score": {
            "gauss": {
                "location": {
                    "reference": [
                        52.516272,
                        13.377722
                    ],
                    "scale": "2km"
                }
            },
            "query": {
                "bool": {
                    "must": {
                        "city": "Berlin"
                    }
                }
            }
        }
    }
    }'

which would then search for hotels in berlin with a balcony and weight them depending on how far they are from the Brandenburg Gate.

If you have more that one numeric field, you can combine them by defining a series of functions and filters, like, for example, this:

    curl 'localhost:9200/hotels/_search/' -d '{
    "query": {
        "function_score": {
            "functions": [
                {
                    "filter": {
                        "match_all": {}
                    },
                    "gauss": {
                        "location": {
                            "reference": "11,12",
                            "scale": "2km"
                        }
                    }
                },
                {
                    "filter": {
                        "match_all": {}
                    },
                    "linear": {
                        "price": {
                            "reference": "0",
                            "scale": "20"
                        }
                    }
                }
            ],
            "query": {
                "bool": {
                    "must": {
                        "city": "Berlin"
                    }
                }
            },
            "score_mode": "multiply"
        }
    }
    }'

This would effectively compute the decay function for "location" and "price" and multiply them onto the score. See <code> function_score</code> for the different options for combining functions.

Supported fields
----------------
Only single valued numeric fields, including time and geo locations, are be supported.

What is a field is missing?
----------------

Is the numeric field is missing in the document, that field will not be taken into account at all for this document. The function value for this field is set to 1 for this document. Suppose you have two hotels both of which are in Berlin and cost the same. If one of the documents does not have a "location", this document would get a higher score than the document having the "location" field set.

To avoid this, you could, for example, use the exists or the missing filter and add a custom boost factor to the functions.

      …
     "functions": [
        {
            "filter": {
                "match_all": {}
            },
            "gauss": {
                "location": {
                    "reference": "11, 12",
                    "scale": "2km"
                }
            }
        },
        {
            "filter": {
                "match_all": {}
            },
            "linear": {
                "price": {
                    "reference": "0",
                    "scale": "20"
                }
            }
        },
        {
            "boost_factor": 0.001,
            "filter": {
                "bool": {
                    "must_not": {
                        "missing": {
                            "existence": true,
                            "field": "coordinates",
                            "null_value": true
                        }
                    }
                }
            }
        }
    ],
    ...

Closes #3423
martijnvg added a commit that referenced this issue Aug 14, 2013
Scoring support will allow the percolate matches to be sorted, or just assign a scores to percolate matches. Sorting by score can be very useful when millions of matching percolate queries are being returned.

The scoring support hooks in into the percolate query option and adds two new boolean options:
* `sort` - Whether to sort the matches based on the score. This will also include the score for each match. The `size` option is a required option when sorting percolate matches is enabled.
* `score` - Whether to compute the score and include it with each match. This will not sort the matches.

For both new options the `query` option needs to be specified, which is used to produce the scores. The `query` option is normally used to control which percolate queries are evaluated. In order to give meaning to these scores, the recently added `function_score` query in #3423 can be used to wrap the percolate query, this way the scores have meaning.

Closes #3506
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
===================

The custom boost factor, custom script boost and the filters function query all do the same thing: They take a query and for each found document compute a new score based on the query score and some script, come custom boost factor or a combination of these two. However, the json format for these three functionalities is very different. This makes it hard to add new functions.

This commit introduces one keyword <code>function_score</code> for all three functions.

The new format can be used to either compute a new score with one function:

	"function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "function": {}
    }

or allow to combine the newly computed scores

    "function_score": {
        "(query|filter)": {},
        "boost": "boost for the whole query",
        "functions": [
            {
                "filter": {},
                "function": {}
            },
            {
                "function": {}
            }
        ],
        "score_mode": "(mult|max|...)"
    }

<code>function</code> here can be either

	"script_score": {
    	"lang": "lang",
    	"params": {
        	"param1": "value1",
        	"param2": "value2"
   		 },
    	"script": "some script"
	}

or

	"boost_factor" : number

New custom functions can be added via the function score module.

Changes
---------

The custom boost factor query

	"custom_boost_factor" : {
    	"query" : {
        	....
    	},
    	"boost_factor" : 5.2
	}

becomes

	"function_score" : {
    	"query" : {
        	....
    	},
    	"boost_factor" : 5.2
	}

The custom script score

	"custom_score" : {
    	"query" : {
        	....
	    },
    	"params" : {
        	"param1" : 2,
 	       	"param2" : 3.1
    	},
	    "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
	}

becomes

	"custom_score" : {
    	"query" : {
        	....
	    },
	    "script_score" : {

    		"params" : {
        		"param1" : 2,
 	       		"param2" : 3.1
    		},
	    	"script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
	    }
	}

and the custom filters score query

    "custom_filters_score" : {
        "query" : {
            "match_all" : {}
       	 },
        "filters" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
            }
        ],
        "score_mode" : "first",
        "params" : {
        	"param1" : 2,
 	       	"param2" : 3.1
    	}
    	"score_mode" : "first"
    }

becomes:

    "function_score" : {
        "query" : {
            "match_all" : {}
       	},
        "functions" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "script_score" : {
                	"script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)",
                	"params" : {
        				"param1" : 2,
 	       				"param2" : 3.1
    				}

            	}
            }
        ],
        "score_mode" : "first",
    }

Partially closes issue elastic#3423

Conflicts:
	src/main/java/org/elasticsearch/index/query/QueryBuilders.java
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
================

It might sometimes be desirable to have a tool available that allows to multiply the original score for a document with a function that decays depending on the distance of a numeric field value of the document from a user given reference.

These functions could be computed for several numeric fields and eventually be combined as a sum or a product and multiplied on the score of the original query.

This commit adds new score functions similar to boost factor and custom script scoring, that can be used togeter with the <code>function_score</code> keyword in a query.

To use distance scoring, the user has to define

 1. a reference and
 2. a scale

for each field the function should be applied on. A reference is needed to define a distance for the document and a scale to define the rate of decay.

Example use case
----------------

Suppose you are searching for a hotel in a certain town. Your budget is limited. Also, you would like the hotel to be close to the town center, so the farther the hotel is from the desired location the less likely you are to check in.
You would like the query results that match your criterion (for example, "hotel, Berlin, non-smoker") to be scored with respect to distance to the town center and also the price.

Intuitively, you would like to define the town center as the origin and maybe you are willing to walk 2km to the town center from the hotel.
In this case your *reference* for the location field is the town center and the *scale* is ~2km.

If your budget is low, you would probably prefer something cheap above something expensive.
For the price field, the *reference* would be 0 Euros and the *scale* depends on how much you are willing to pay, for example 20 Euros.

Usage
----------------

The distance score functions can be applied in two ways:

In the most simple case, only one numeric field is to be evaluated. To do so, call <code>function_score</code>, with the appropriate function. In the above example, this might be:

    curl 'localhost:9200/hotels/_search/' -d '{
    "query": {
        "function_score": {
            "gauss": {
                "location": {
                    "reference": [
                        52.516272,
                        13.377722
                    ],
                    "scale": "2km"
                }
            },
            "query": {
                "bool": {
                    "must": {
                        "city": "Berlin"
                    }
                }
            }
        }
    }
    }'

which would then search for hotels in berlin with a balcony and weight them depending on how far they are from the Brandenburg Gate.

If you have more that one numeric field, you can combine them by defining a series of functions and filters, like, for example, this:

    curl 'localhost:9200/hotels/_search/' -d '{
    "query": {
        "function_score": {
            "functions": [
                {
                    "filter": {
                        "match_all": {}
                    },
                    "gauss": {
                        "location": {
                            "reference": "11,12",
                            "scale": "2km"
                        }
                    }
                },
                {
                    "filter": {
                        "match_all": {}
                    },
                    "linear": {
                        "price": {
                            "reference": "0",
                            "scale": "20"
                        }
                    }
                }
            ],
            "query": {
                "bool": {
                    "must": {
                        "city": "Berlin"
                    }
                }
            },
            "score_mode": "multiply"
        }
    }
    }'

This would effectively compute the decay function for "location" and "price" and multiply them onto the score. See <code> function_score</code> for the different options for combining functions.

Supported fields
----------------
Only single valued numeric fields, including time and geo locations, are be supported.

What is a field is missing?
----------------

Is the numeric field is missing in the document, that field will not be taken into account at all for this document. The function value for this field is set to 1 for this document. Suppose you have two hotels both of which are in Berlin and cost the same. If one of the documents does not have a "location", this document would get a higher score than the document having the "location" field set.

To avoid this, you could, for example, use the exists or the missing filter and add a custom boost factor to the functions.

      …
     "functions": [
        {
            "filter": {
                "match_all": {}
            },
            "gauss": {
                "location": {
                    "reference": "11, 12",
                    "scale": "2km"
                }
            }
        },
        {
            "filter": {
                "match_all": {}
            },
            "linear": {
                "price": {
                    "reference": "0",
                    "scale": "20"
                }
            }
        },
        {
            "boost_factor": 0.001,
            "filter": {
                "bool": {
                    "must_not": {
                        "missing": {
                            "existence": true,
                            "field": "coordinates",
                            "null_value": true
                        }
                    }
                }
            }
        }
    ],
    ...

Closes elastic#3423
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants