Generic models for ranking (#174)
This makes the models more generic, handling all forms of two types of models. The first type is the linear model: previously we supported only RankSVM, but this version works with any linear model, including Pranking, a form of ordinal regression. The second type is a generic multiple additive trees model: previously we supported only LambdaMART, but this version works with any form of additive trees, which also includes gradient boosted regression trees (GBRT) models.
jdorando authored and mnilsson23 committed Oct 18, 2016
1 parent f2a8e8a commit bfa05b8
Showing 56 changed files with 288 additions and 298 deletions.
58 changes: 28 additions & 30 deletions solr/contrib/ltr/README.md
@@ -33,7 +33,7 @@ the techproducts example please follow these steps.
`mkdir example/techproducts/solr/techproducts/lib`
3. Install the plugin in the lib folder

`cp build/contrib/ltr/lucene-ltr-7.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/`
`cp build/contrib/ltr/solr-ltr-7.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/`
4. Replace the original solrconfig with one importing all the ltr components

`cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/`
@@ -61,7 +61,7 @@ the techproducts example please follow these steps.
http://localhost:8983/solr/techproducts/schema/model-store
* Perform a reranking query using the model, and retrieve the features

http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.user_query=%27test%27}&fl=[features],price,score,name
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=linear%20reRankDocs=25%20efi.user_query=%27test%27}&fl=[features],price,score,name


BONUS: Train an actual machine learning model
@@ -78,7 +78,7 @@ BONUS: Train an actual machine learning model

This script deploys your features from `config.json` "featuresFile" to Solr. Then it takes the relevance-judged query
document pairs of "userQueriesFile" and merges them with the features extracted from Solr into a training
file. That file is used to train a rankSVM model, which is then deployed to Solr for you to rerank results.
file. That file is used to train a linear model, which is then deployed to Solr for you to rerank results.

4. Search and rerank the results using the trained model

@@ -153,7 +153,7 @@ using standard Solr queries. As an example:
]
```

Defines four features. Anything that is a valid Solr query can be used to define
Defines five features. Anything that is a valid Solr query can be used to define
a feature.

### Filter Query Features
@@ -198,19 +198,18 @@ The majority of features should be possible to create using the methods describe
above.

# Defining Models
Currently the Learning to Rank plugin supports 2 main types of
ranking models: [Ranking SVM](http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf)
and [LambdaMART](http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf)
Currently the Learning to Rank plugin supports 2 generalized forms of
models: 1. Linear models, e.g. [RankSVM](http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf) and [Pranking](https://papers.nips.cc/paper/2023-pranking-with-ranking.pdf),
and 2. Multiple Additive Trees, e.g. [LambdaMART](http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf) and [Gradient Boosted Regression Trees (GBRT)](https://papers.nips.cc/paper/3305-a-general-boosting-method-and-its-application-to-learning-ranking-functions-for-web-search.pdf).
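
As a rough sketch of the two scoring forms (the javadocs in this commit describe them as a dot product and a summation of weighted trees, respectively): a linear model scores a document as a weighted sum of its extracted feature values, and a multiple additive trees model sums the weighted outputs of its regression trees over those same feature values.

```latex
s_{\mathrm{linear}}(\mathbf{x}) = \sum_{i} w_i \, x_i
\qquad
s_{\mathrm{trees}}(\mathbf{x}) = \sum_{t} w_t \, T_t(\mathbf{x})
```

Here x is the vector of extracted feature values, the w are the learned weights, and T_t is the t-th regression tree.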

### Ranking SVM
Currently only a linear ranking svm is supported. Use LambdaMART for
a non-linear model. If you'd like to introduce a bias set a constant feature
### Linear
If you'd like to introduce a bias, set a constant feature
to the bias value you'd like and give that feature a weight of 1.0.

###### model.json
```json
{
"class":"org.apache.solr.ltr.model.RankSVMModel",
"class":"org.apache.solr.ltr.model.LinearModel",
"name":"myModelName",
"features":[
{ "name": "userTextTitleMatch"},
Expand All @@ -228,27 +227,26 @@ to the bias value you'd like and make a weight of 1.0 for that feature.
}
```

This is an example of a toy Ranking SVM model. Type specifies the class to be
using to interpret the model (RankSVMModel in the case of Ranking SVM).
Name is the model identifier you will use when making request to the ltr
framework. Features specifies the feature space that you want extracted
when using this model. All features that appear in the model params will
be used for scoring and must appear in the features list. You can add
extra features to the features list that will be computed but not used in the
model for scoring, which can be useful for logging.
Params are the Ranking SVM parameters.
This is an example of a toy Linear model. Class specifies the class to be
used to interpret the model. Name is the model identifier you will use
when making requests to the ltr framework. Features specifies the feature
space that you want extracted when using this model. All features that
appear in the model params will be used for scoring and must appear in
the features list. You can add extra features to the features list that
will be computed but not used in the model for scoring, which can be useful
for logging. Params are the Linear model parameters.
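
To make the scoring concrete, here is a minimal, self-contained sketch (not the plugin's actual LinearModel source) of how a linear model of this shape scores a document: a dot product of the configured weights with the extracted, normalized feature values. The weights and feature values below are hypothetical.

```java
// A minimal sketch, not the plugin's LinearModel source: scoring as a dot
// product of model weights with extracted (normalized) feature values.
public class LinearScoringSketch {

  /** Both arrays are assumed to be ordered like the model's "features" list. */
  static float score(float[] weights, float[] featureValues) {
    float total = 0f;
    for (int i = 0; i < weights.length; i++) {
      total += weights[i] * featureValues[i];
    }
    return total;
  }

  public static void main(String[] args) {
    // Hypothetical weights for userTextTitleMatch and originalScore.
    float[] weights = {1.0f, 0.5f};
    // Hypothetical extracted feature values for one document.
    float[] featureValues = {1.0f, 3.0f};
    System.out.println(score(weights, featureValues)); // prints 2.5
  }
}
```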

Good library for training SVM's (https://www.csie.ntu.edu.tw/~cjlin/liblinear/ ,
https://www.csie.ntu.edu.tw/~cjlin/libsvm/) . You will need to convert the
libSVM model format to the format specified above.
Good libraries for training an SVM, one example of a Linear model, are
liblinear (https://www.csie.ntu.edu.tw/~cjlin/liblinear/) and libSVM (https://www.csie.ntu.edu.tw/~cjlin/libsvm/).
You will need to convert the libSVM model format to the format specified above.

### LambdaMART
### Multiple Additive Trees

###### model2.json
```json
{
"class":"org.apache.solr.ltr.model.LambdaMARTModel",
"name":"lambdamartmodel",
"class":"org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
"name":"multipleadditivetreesmodel",
"features":[
{ "name": "userTextTitleMatch"},
{ "name": "originalScore"}
Expand Down Expand Up @@ -285,17 +283,17 @@ libSVM model format to the format specified above.
}
}
```
This is an example of a toy LambdaMART. Type specifies the class to be using to
interpret the model (LambdaMARTModel in the case of LambdaMART). Name is the
This is an example of a toy Multiple Additive Trees model. Class specifies the class to be used to
interpret the model. Name is the
model identifier you will use when making requests to the ltr framework.
Features specifies the feature space that you want extracted when using this
model. All features that appear in the model params will be used for scoring and
must appear in the features list. You can add extra features to the features
list that will be computed but not used in the model for scoring, which can
be useful for logging. Params are the LambdaMART specific parameters. In this
be useful for logging. Params are the Multiple Additive Trees specific parameters. In this
case we have 2 trees, one with 3 leaf nodes and one with 1 leaf node.
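
For intuition, here is a minimal sketch (not the plugin's MultipleAdditiveTreesModel source) of how such a model scores a document: each tree is walked from its root, comparing one feature value per node against a threshold and going right when the value exceeds it, and the model score is the weighted sum of the leaf values reached. The trees, thresholds, and weights below are hypothetical.

```java
// A minimal sketch, not the plugin's MultipleAdditiveTreesModel source:
// each tree is walked from the root, splitting on one feature per node
// (value <= threshold goes left, otherwise right); the model score is the
// weighted sum of the leaf values reached in every tree.
import java.util.Map;

public class TreesScoringSketch {

  static final class Node {
    String feature;   // null marks a leaf
    float threshold;
    Node left, right;
    float value;      // leaf value

    static Node leaf(float value) {
      Node n = new Node();
      n.value = value;
      return n;
    }

    static Node split(String feature, float threshold, Node left, Node right) {
      Node n = new Node();
      n.feature = feature;
      n.threshold = threshold;
      n.left = left;
      n.right = right;
      return n;
    }

    float score(Map<String, Float> features) {
      if (feature == null) {
        return value;                       // reached a leaf
      }
      return features.getOrDefault(feature, 0f) <= threshold
          ? left.score(features)
          : right.score(features);
    }
  }

  public static void main(String[] args) {
    // Hypothetical two-tree model, loosely shaped like the toy JSON above.
    Node[] trees = {
        Node.split("matchedTitle", 0.5f, Node.leaf(-100f), Node.leaf(50f)),
        Node.leaf(-10f)
    };
    float[] treeWeights = {1f, 2f};

    Map<String, Float> featureValues = Map.of("matchedTitle", 1.0f);
    float score = 0f;
    for (int t = 0; t < trees.length; t++) {
      score += treeWeights[t] * trees[t].score(featureValues);
    }
    System.out.println(score); // 1 * 50 + 2 * (-10) = 30.0
  }
}
```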

A good library for training LambdaMART ( http://sourceforge.net/p/lemur/wiki/RankLib/ ).
A good library for training LambdaMART, an example of Multiple Additive Trees, is RankLib (http://sourceforge.net/p/lemur/wiki/RankLib/).
You will need to convert the RankLib model format to the format specified above.

# Deploy Models and Features
2 changes: 1 addition & 1 deletion solr/contrib/ltr/example/libsvm_formatter.py
@@ -45,7 +45,7 @@ def _getFeatureId(self,key):
def convertLibSvmModelToLtrModel(self,libSvmModelLocation, outputFile, modelName):
with open(libSvmModelLocation, 'r') as inFile:
with open(outputFile,'w') as convertedOutFile:
convertedOutFile.write('{\n\t"class":"org.apache.solr.ltr.model.RankSVMModel",\n')
convertedOutFile.write('{\n\t"class":"org.apache.solr.ltr.model.LinearModel",\n')
convertedOutFile.write('\t"name": "' + str(modelName) + '",\n')
convertedOutFile.write('\t"features": [\n')
isFirst = True;
4 changes: 2 additions & 2 deletions solr/contrib/ltr/example/techproducts-model.json
@@ -1,6 +1,6 @@
{
"class":"org.apache.solr.ltr.model.RankSVMModel",
"name":"svm",
"class":"org.apache.solr.ltr.model.LinearModel",
"name":"linear",
"features":[
{"name":"isInStock"},
{"name":"price"},
@@ -26,11 +26,12 @@
import org.apache.solr.ltr.norm.Normalizer;

/**
* A scoring model that computes scores using a linear Support Vector Machine (SVM) algorithm.
* A scoring model that computes scores using a dot product.
* Example models are RankSVM and Pranking.
* <p>
* Example configuration:
* <pre>{
"class" : "org.apache.solr.ltr.model.RankSVMModel",
"class" : "org.apache.solr.ltr.model.LinearModel",
"name" : "myModelName",
"features" : [
{ "name" : "userTextTitleMatch" },
@@ -52,8 +53,13 @@
* Thorsten Joachims. Optimizing Search Engines Using Clickthrough Data.
* Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002.</a>
* </ul>
* <ul>
* <li> <a href="https://papers.nips.cc/paper/2023-pranking-with-ranking.pdf">
* Koby Crammer and Yoram Singer. Pranking with Ranking.
* Advances in Neural Information Processing Systems (NIPS), 2001.</a>
* </ul>
*/
public class RankSVMModel extends LTRScoringModel {
public class LinearModel extends LTRScoringModel {

protected Float[] featureToWeight;

@@ -66,7 +72,7 @@ public void setWeights(Object weights) {
}
}

public RankSVMModel(String name, List<Feature> features,
public LinearModel(String name, List<Feature> features,
List<Normalizer> norms,
String featureStoreName, List<Feature> allFeatures,
Map<String,Object> params) {
@@ -28,12 +28,13 @@
import org.apache.solr.util.SolrPluginUtils;

/**
* A scoring model that computes scores based on the LambdaMART algorithm.
* A scoring model that computes scores based on the summation of multiple weighted trees.
* Example models are LambdaMART and Gradient Boosted Regression Trees (GBRT) .
* <p>
* Example configuration:
<pre>{
"class" : "org.apache.solr.ltr.model.LambdaMARTModel",
"name" : "lambdamartmodel",
"class" : "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
"name" : "multipleadditivetreesmodel",
"features":[
{ "name" : "userTextTitleMatch"},
{ "name" : "originalScore"}
@@ -76,8 +77,13 @@
* Christopher J.C. Burges. From RankNet to LambdaRank to LambdaMART: An Overview.
* Microsoft Research Technical Report MSR-TR-2010-82.</a>
* </ul>
* <ul>
* <li> <a href="https://papers.nips.cc/paper/3305-a-general-boosting-method-and-its-application-to-learning-ranking-functions-for-web-search.pdf">
* Z. Zheng, H. Zha, T. Zhang, O. Chapelle, K. Chen, and G. Sun. A General Boosting Method and its Application to Learning Ranking Functions for Web Search.
* Advances in Neural Information Processing Systems (NIPS), 2007.</a>
* </ul>
*/
public class LambdaMARTModel extends LTRScoringModel {
public class MultipleAdditiveTreesModel extends LTRScoringModel {

private final HashMap<String,Integer> fname2index;
private List<RegressionTree> trees;
@@ -208,20 +214,20 @@ public RegressionTreeNode() {
public void validate() throws ModelException {
if (isLeaf()) {
if (left != null || right != null) {
throw new ModelException("LambdaMARTModel tree node is leaf with left="+left+" and right="+right);
throw new ModelException("MultipleAdditiveTreesModel tree node is leaf with left="+left+" and right="+right);
}
return;
}
if (null == threshold) {
throw new ModelException("LambdaMARTModel tree node is missing threshold");
throw new ModelException("MultipleAdditiveTreesModel tree node is missing threshold");
}
if (null == left) {
throw new ModelException("LambdaMARTModel tree node is missing left");
throw new ModelException("MultipleAdditiveTreesModel tree node is missing left");
} else {
left.validate();
}
if (null == right) {
throw new ModelException("LambdaMARTModel tree node is missing right");
throw new ModelException("MultipleAdditiveTreesModel tree node is missing right");
} else {
right.validate();
}
@@ -268,10 +274,10 @@ public RegressionTree() {

public void validate() throws ModelException {
if (weight == null) {
throw new ModelException("LambdaMARTModel tree doesn't contain a weight");
throw new ModelException("MultipleAdditiveTreesModel tree doesn't contain a weight");
}
if (root == null) {
throw new ModelException("LambdaMARTModel tree doesn't contain a tree");
throw new ModelException("MultipleAdditiveTreesModel tree doesn't contain a tree");
} else {
root.validate();
}
@@ -286,7 +292,7 @@ public void setTrees(Object trees) {
}
}

public LambdaMARTModel(String name, List<Feature> features,
public MultipleAdditiveTreesModel(String name, List<Feature> features,
List<Normalizer> norms,
String featureStoreName, List<Feature> allFeatures,
Map<String,Object> params) {
@@ -321,7 +327,7 @@ public float score(float[] modelFeatureValuesNormalized) {

// /////////////////////////////////////////
// produces a string that looks like:
// 40.0 = lambdamartmodel [ org.apache.solr.ltr.model.LambdaMARTModel ]
// 40.0 = multipleadditivetreesmodel [ org.apache.solr.ltr.model.MultipleAdditiveTreesModel ]
// model applied to
// features, sum of:
// 50.0 = tree 0 | 'matchedTitle':1.0 > 0.500001, Go Right |
@@ -33,7 +33,7 @@
* defines how to combine the features in order to create a new
* score for a document. A new Learning to Rank model is plugged
* into the framework by extending {@link org.apache.solr.ltr.model.LTRScoringModel},
* (see for example {@link org.apache.solr.ltr.model.LambdaMARTModel} and {@link org.apache.solr.ltr.model.RankSVMModel}).
* (see for example {@link org.apache.solr.ltr.model.MultipleAdditiveTreesModel} and {@link org.apache.solr.ltr.model.LinearModel}).
* </p>
* <p>
* The {@link org.apache.solr.ltr.LTRScoringQuery} will take care of computing the values of
2 changes: 1 addition & 1 deletion solr/contrib/ltr/src/java/overview.html
@@ -57,7 +57,7 @@ <h2> Code structure </h2>
defines how to combine the features in order to create a new
score for a document. A new learning to rank model is plugged
into the framework by extending {@link org.apache.solr.ltr.model.LTRScoringModel},
(see for example {@link org.apache.solr.ltr.model.LambdaMARTModel} and {@link org.apache.solr.ltr.model.RankSVMModel}).
(see for example {@link org.apache.solr.ltr.model.MultipleAdditiveTreesModel} and {@link org.apache.solr.ltr.model.LinearModel}).
</p>
<p>
The {@link org.apache.solr.ltr.LTRScoringQuery} will take care of computing the values of
@@ -7,7 +7,7 @@
}
},
{
"name": "constantScoreToForceLambdaMARTScoreAllDocs",
"name": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs",
"class": "org.apache.solr.ltr.feature.ValueFeature",
"params": {
"value": 1
@@ -1,5 +1,5 @@
{
"class":"org.apache.solr.ltr.model.RankSVMModel",
"class":"org.apache.solr.ltr.model.LinearModel",
"name":"externalmodel",
"features":[
{ "name": "matchedTitle"}
@@ -1,5 +1,5 @@
{
"class":"org.apache.solr.ltr.model.RankSVMModel",
"class":"org.apache.solr.ltr.model.LinearModel",
"name":"externalmodelstore",
"store": "fstore2",
"features":[
@@ -1,5 +1,5 @@
{
"class":"org.apache.solr.ltr.model.RankSVMModel",
"class":"org.apache.solr.ltr.model.LinearModel",
"name":"fqmodel",
"features":[
{

This file was deleted.

This file was deleted.

This file was deleted.

@@ -1,6 +1,6 @@
{
"class":"org.apache.solr.ltr.model.RankSVMModel",
"name":"svm-efi",
"class":"org.apache.solr.ltr.model.LinearModel",
"name":"linear-efi",
"features":[
{"name":"sampleConstant"},
{"name":"search_number_of_nights"}
@@ -1,5 +1,5 @@
{
"class":"org.apache.solr.ltr.model.RankSVMModel",
"class":"org.apache.solr.ltr.model.LinearModel",
"name":"6029760550880411648",
"features":[
{"name":"title"},
@@ -1,9 +1,9 @@
{
"class":"org.apache.solr.ltr.model.LambdaMARTModel",
"name":"lambdamartmodel",
"class":"org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
"name":"multipleadditivetreesmodel",
"features":[
{ "name": "matchedTitle"},
{ "name": "constantScoreToForceLambdaMARTScoreAllDocs"}
{ "name": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs"}
],
"params":{
"trees": [
@@ -1,5 +1,5 @@
{
"class":"org.apache.solr.ltr.model.LambdaMARTModel",
"class":"org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
"name":"external_model_binary_feature",
"features":[
{ "name": "user_device_smartphone"},
