# Solr Client

In [46]:
from ltr.client import SolrClient
client = SolrClient()

# Download & Build Index (run once)

Run this cell once if you haven't yet downloaded the dependencies or indexed TheMovieDB (TMDB) data.

In [None]:
from ltr import download
download();

from ltr.index import rebuild_tmdb
rebuild_tmdb(client)

## Features for movie titles

We'll be searching movie titles (think of searching for a specific movie on Netflix), and we have a set of judgments about the appropriate movies to return. For example, a search for "Star Wars" should return good Star Wars matches, in quality order.

The features below cover various aspects of the problem (title phrase match, title BM25 score, release year, etc.). We'll use them to explore and analyze a simple model.

In [47]:
config = [
    #1
    {
      "name" : "title_has_phrase",
      "store": "title",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "title:\"${keywords}\"^=1"
      }
    },
    #2
    {
      "name" : "title_has_terms",
      "store": "title",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "title:(${keywords})^=1"
      }
    },
    #3
    {
      "name" : "title_bm25",
      "store": "title",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "title:(${keywords})"
      }
    },
    #4
    {
      "name" : "overview_bm25",
      "store": "title",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "overview:(${keywords})"
      }
    },
    #5
    {
      "name" : "overview_phrase_bm25",
      "store": "title",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "overview:\"${keywords}\""
      }
    },
    #6
    {
      "name" : "title_fuzzy",
      "store": "title",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "{!lucene df=title}${keywords}~"
      }
    },
    #7
    {
      "name" : "release_year",
      "store": "title",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "{!func}def(release_year,2000)"
      }
    }

]



from ltr import setup
setup(client, config=config, featureset='title')

Deleted title2 Featurestore [Status: 200]
Created title feature store under tmdb: [Status: 200]
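
To get a feel for what the feature templates above turn into at query time, here is a minimal sketch that fills in the `${keywords}` placeholder with plain string replacement. It only mimics the substitution for illustration (Solr does the real work server-side), and the `expand` helper and `FEATURE_TEMPLATES` dict are names made up for this example.

```python
# Illustrative only: mimics the ${keywords} placeholder with plain string
# replacement. Solr performs the real substitution on the server.
FEATURE_TEMPLATES = {
    "title_has_terms": "title:(${keywords})^=1",
    "title_bm25": "title:(${keywords})",
    "overview_phrase_bm25": 'overview:"${keywords}"',
    "title_fuzzy": "{!lucene df=title}${keywords}~",
}


def expand(template, keywords):
    """Return the query string with the ${keywords} placeholder filled in."""
    return template.replace("${keywords}", keywords)


expanded = {name: expand(q, "star wars") for name, q in FEATURE_TEMPLATES.items()}
for name, query in expanded.items():
    print("{:22} {}".format(name, query))
```

The `^=1` suffix asks Solr for a constant score, so `title_has_terms` behaves like a yes/no flag, while the same query without it (`title_bm25`) yields the BM25 score.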


## Training Set Generation

Log the value of each of the above features, for every judged query-document pair, to a training set file.

In [48]:
from ltr.log import judgments_to_training_set
trainingSet = judgments_to_training_set(client, 
                                        judgmentInFile='data/title_judgments.txt', 
                                        trainingOutFile='data/title_judgments_train.txt', 
                                        featureSet='title')

Recognizing 40 queries...
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for rambo (0/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for rocky (1/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for war games (2/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for crocodile dundee (3/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for matrix (4/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for contact (5/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for space jam (6/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for battlestar galactica (7/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for her (8/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for jobs (9/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for social network (10/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for rocky horror (11/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for shawshank re
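
The training file itself isn't shown here. Assuming it follows the conventional RankLib/SVMrank line layout (`<grade> qid:<id> <feature>:<value> ... # <comment>`), a line could be read back with a sketch like the one below. The sample line and its feature values are made up for illustration, and `parse_training_line` is a hypothetical helper, not part of the `ltr` package.

```python
# Hypothetical line in the conventional RankLib/SVMrank layout:
#   <grade> qid:<query id> <feature id>:<value> ... # <comment>
# The feature values are invented for illustration.
SAMPLE_LINE = "4 qid:1 1:1.0 2:1.0 3:12.3 4:0.0 5:0.0 6:12.3 7:1982.0 # 1368 rambo"


def parse_training_line(line):
    """Split one training line into grade, qid, feature dict and comment."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    grade = int(tokens[0])
    qid = int(tokens[1].split(":", 1)[1])
    features = {}
    for token in tokens[2:]:
        ftr_id, value = token.split(":", 1)
        features[int(ftr_id)] = float(value)
    return grade, qid, features, comment.strip()


grade, qid, features, comment = parse_training_line(SAMPLE_LINE)
print(grade, qid, features, comment)
```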

## Feature Search: which features work best?

Which combination of these features works best? Train a model with every combination, and use k-fold cross-validation (see `kcv=15` below) to score each one. The combination with the best NDCG is output.

In [49]:
from ltr.train import feature_search
rankLibResult, ndcgPerFeature = feature_search(client,
                                               trainingInFile='data/title_judgments_train.txt',
                                               metric2t='NDCG@10',
                                               leafs=20,
                                               trees=20,
                                               kcv=15,
                                               features=[1,2,3,4,5,6,7],
                                               featureSet='title')

print()
print("Impact of each feature on the model")
trainLogs = rankLibResult.trainingLogs
for ftrId, impact in trainLogs[-1].impacts.items():
    print("{} - {}".format(ftrId, impact))
    
for roundDcg in trainLogs[-1].rounds:
    print(roundDcg)
    
print("Avg NDCG@10 when feature included:")
for ftrId, ndcg in ndcgPerFeature.items():
    print("%s => %s" % (ftrId, ndcg))
    
print("Avg K-Fold NDCG@10 %s" % rankLibResult.kcvTestAvg)

Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1] TEST NDCG@10=0.9071
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2] TEST NDCG@10=0.9064
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [3] TEST NDCG@10=0.8482
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [4] TEST NDCG@10=0.4683
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree

DONE
Trying features [1, 3, 6] TEST NDCG@10=0.9013
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 7] TEST NDCG@10=0.8894
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 4, 5] TEST NDCG@10=0.8597
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 4, 6] TEST NDCG@10=0.9104
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 4, 7] TEST NDCG@10=0.8183
R

DONE
Trying features [1, 2, 5, 6] TEST NDCG@10=0.9037
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 5, 7] TEST NDCG@10=0.81
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 6, 7] TEST NDCG@10=0.8793
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 4, 5] TEST NDCG@10=0.9003
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 4, 6] TEST NDC

DONE
Trying features [1, 2, 4, 5, 7] TEST NDCG@10=0.8248
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 4, 6, 7] TEST NDCG@10=0.9055
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 5, 6, 7] TEST NDCG@10=0.892
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 4, 5, 6] TEST NDCG@10=0.9051
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 4
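
The order of the "Trying features" lines above is consistent with walking through every non-empty subset of the feature ids, smallest subsets first. A minimal sketch of that enumeration follows; it is not the `feature_search` implementation, and `feature_combinations` is a name invented for this example.

```python
from itertools import combinations


def feature_combinations(feature_ids):
    """Yield every non-empty subset of feature_ids, smallest subsets first."""
    for size in range(1, len(feature_ids) + 1):
        for combo in combinations(feature_ids, size):
            yield list(combo)


combos = list(feature_combinations([1, 2, 3, 4, 5, 6, 7]))
print(len(combos))  # 2**7 - 1 candidate models
print(combos[:3], combos[-1])
```

With 7 features that is 127 models, each trained with 15-fold cross-validation, which is why the search takes a while.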

## Compare to model w/ all features

Compare the features output above (something like...)

```
Impact of each feature on the model
7 - 17618.35445148437
4 - 16165.586045512271
3 - 10958.610341321868
5 - 9256.821192289186
1 - 1436.0640878600943
```

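to those of a model trained on all seven features. Notice how the features have different impacts. This is due to feature dependency: the value of one feature to the model depends on which other features are present.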

In [50]:
from ltr import train
trainLog  = train(client,
                  trainingInFile='data/title_judgments_train.txt',
                  metric2t='NDCG@10',
                  leafs=20,
                  trees=20,
                  features=[1,2,3,4,5,6,7],
                  featureSet='title',
                  modelName='title')

print()
print("Impact of each feature on the model")
for ftrId, impact in trainLog.impacts.items():
    print("{} - {}".format(ftrId, impact))
    
for roundDcg in trainLog.rounds:
    print(roundDcg)
    
print("Train NDCG@10 %s" % trainLog.rounds[-1])

Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/title_model.txt  -feature features.txt 
DONE
Submit Model title Ftr Set title [Status: 200]
Feature Set title... [Status: 200]
Deleted Model title [Status: 200]
Created Model title [Status: 200]

Impact of each feature on the model
3 - 33716.81189167293
7 - 28769.019966278764
4 - 14862.855713290153
6 - 5028.751391215197
1 - 1742.7824900498674
5 - 216.52853462896087
2 - 0.0
0.9178
0.9207
0.9207
0.9207
0.9198
0.9216
0.9216
0.922
0.922
0.9217
0.9352
0.9362
0.9364
0.9364
0.9368
0.9368
0.9378
0.9377
0.9369
0.9378
Train NDCG@10 0.9378
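
For readers who haven't met NDCG@10 before, here is a small sketch of one common formulation (exponential gain, log2 discount). RankLib's exact variant may differ in details, so treat this as an illustration of the idea rather than a reproduction of the numbers above.

```python
import math


def dcg_at_k(grades, k):
    """Discounted cumulative gain of grades listed in ranked order."""
    return sum((2 ** g - 1) / math.log2(rank + 2)
               for rank, g in enumerate(grades[:k]))


def ndcg_at_k(grades, k=10):
    """DCG normalized by the DCG of the ideal (descending grade) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


# Grades of the returned docs, in the order the ranker returned them
perfect = ndcg_at_k([4, 3, 1, 0, 0])   # best docs first
swapped = ndcg_at_k([0, 3, 1, 0, 4])   # the '4' doc is buried at the bottom
print(perfect, swapped)
```

A ranking that puts the highest grades first scores 1.0; burying a '4' document below '0' documents drags the score down.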


## Bias towards fewer features

By adding a 'cost' to the feature search, we apply a multiplier that slightly punishes models with more features. All things being equal, this results in a bias towards simpler models, as we'd prefer a model that doesn't need to execute more features.

In [51]:
from ltr.train import feature_search
rankLibResult, ndcgPerFeature = feature_search(client,
                                               trainingInFile='data/title_judgments_train.txt',
                                               metric2t='NDCG@10',
                                               leafs=20,
                                               trees=20,
                                               kcv=15,
                                               featureCost=0.1,  # NDCG scaled by (1.0 - cost) ^ (num_features - 1), per the output below
                                               features=[1,2,3,4,5,6,7],
                                               featureSet='title')

print()
print("Impact of each feature on the model")
trainLogs = rankLibResult.trainingLogs
for ftrId, impact in trainLogs[-1].impacts.items():
    print("{} - {}".format(ftrId, impact))
    
for roundDcg in trainLogs[-1].rounds:
    print(roundDcg)
    
print("Avg NDCG@10 when feature included:")
for ftrId, ndcg in ndcgPerFeature.items():
    print("%s => %s" % (ftrId, ndcg))
    
print("Avg K-Fold NDCG@10 %s" % rankLibResult.kcvTestAvg)

Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1] TEST NDCG@10=0.9071 after cost 0.9071
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2] TEST NDCG@10=0.9064 after cost 0.9064
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [3] TEST NDCG@10=0.8482 after cost 0.8482
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [4] TEST NDCG@10=0.4683 after cost 0.4683
Runn

DONE
Trying features [1, 2, 7] TEST NDCG@10=0.8016 after cost 0.649296
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 4] TEST NDCG@10=0.9068 after cost 0.734508
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 5] TEST NDCG@10=0.9036 after cost 0.731916
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 6] TEST NDCG@10=0.9013 after cost 0.7300530000000001
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model

DONE
Trying features [1, 2, 3, 5] TEST NDCG@10=0.9036 after cost 0.6587244000000001
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 3, 6] TEST NDCG@10=0.9013 after cost 0.6570477000000001
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 3, 7] TEST NDCG@10=0.8894 after cost 0.6483726000000001
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 4, 5] TEST NDCG@10=0.8609 after cost 0.6275961000000001
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title

DONE
Trying features [3, 4, 6, 7] TEST NDCG@10=0.9101 after cost 0.6634629000000001
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [3, 5, 6, 7] TEST NDCG@10=0.8878 after cost 0.6472062000000001
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [4, 5, 6, 7] TEST NDCG@10=0.8766 after cost 0.6390414000000001
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 3, 4, 5] TEST NDCG@10=0.9003 after cost 0.59068683
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title_judg

DONE
Trying features [1, 2, 3, 4, 5, 6, 7] TEST NDCG@10=0.911 after cost 0.48414275100000004

Impact of each feature on the model
7 - 17618.35445148437
4 - 16165.586045512271
3 - 10958.610341321868
5 - 9256.821192289186
1 - 1436.0640878600943
0.9135
0.9097
0.9111
0.9111
0.9111
0.9111
0.9111
0.9292
0.9284
0.9326
0.9326
0.9327
0.9404
0.9373
0.9394
0.9401
0.9398
0.9433
0.9401
0.9429
Avg NDCG@10 when feature included:
1 => 0.8861281250000003
2 => 0.8564515625000003
3 => 0.89500625
4 => 0.8424765625000001
5 => 0.8450390625000002
6 => 0.890534375
7 => 0.8295687500000001
Avg K-Fold NDCG@10 0.92
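
The "after cost" values logged above are consistent with scaling NDCG by `(1 - featureCost)` once for every feature beyond the first. That formula is inferred from the logged numbers rather than taken from the library source; the sketch below checks it against a few lines copied from the output (`ndcg_after_cost` is a hypothetical helper).

```python
def ndcg_after_cost(ndcg, num_features, feature_cost=0.1):
    """Scale NDCG by (1 - cost) for each feature beyond the first."""
    return ndcg * (1.0 - feature_cost) ** (num_features - 1)


# (features tried, TEST NDCG@10, logged "after cost") copied from the output above
logged = [
    ([1], 0.9071, 0.9071),
    ([1, 2, 7], 0.8016, 0.649296),
    ([1, 3, 4], 0.9068, 0.734508),
    ([1, 2, 3, 5], 0.9036, 0.6587244000000001),
    ([1, 2, 3, 4, 5, 6, 7], 0.911, 0.48414275100000004),
]
for features, ndcg, expected in logged:
    adjusted = ndcg_after_cost(ndcg, len(features))
    print(features, round(adjusted, 6), round(expected, 6))
```

At a cost of 0.1 the penalty compounds quickly: the seven-feature model's 0.911 drops to roughly 0.484, so a single-feature model wins unless extra features bring a large gain.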


# Evaluating the Model

It's interesting to see what features our model makes use of, but we need guidance on adding additional features to the model. We know our model is an ensemble of decision trees. Wouldn't it be cool if we could trace where documents end up on that decision tree?

Specifically, we care about problems, or what we will affectionately call *whoopsies*.

As a 'whoopsie' example, consider the query "Rambo". If a '0' document like 'First Daughter' ranks the same as or higher than a '4' document ("Rambo"), that's a problem. It's also an opportunity for improvement. We'd want to isolate it and see whether it's indicative of a broader trend, and thus worth adding a feature for.

Let's see a concrete example

In [52]:
from ltr.MART_model import eval_model
from ltr.judgments import judgments_from_file, judgments_by_qid

features, _ = client.feature_set(index='tmdb', name='title')

judgmentDict = judgments_by_qid(judgments_from_file(filename='data/title_judgments_train.txt'))


# qid 1 is the 'rambo' query (see the training log above)
rambo = judgmentDict[1]
model = eval_model(modelName='title',
                   features=features,
                   judgments=rambo)

print()
print("## Evaluating graded docs for search keywords '%s'" % rambo[0].keywords)
print()
print(model)

Feature Set title... [Status: 200]
Recognizing 40 queries...

## Evaluating graded docs for search keywords 'rambo'

if title_bm25 > 11.069751:
  if title_has_phrase > 0.0:
    if title_fuzzy > 11.173206:
      if overview_bm25 > 0.0:
        if overview_bm25 > 8.7690325:
          if title_fuzzy > 19.553139:
            <= 0.2000(0/0/)
          else:
            if release_year > 2014.0:
              <= 0.2000(0/0/)
            else:
              <= 0.1970(0/0/)
        else:
          if title_bm25 > 17.690922:
            <= 0.2000(0/0/)
          else:
            if release_year > 1999.0:
              <= 0.2000(0/0/)
            else:
              <= 0.2000(0/0/)
      else:
        if title_bm25 > 15.621795:
          if release_year > 2000.0:
            <= 0.1915(0/0/)
          else:
            if title_bm25 > 15.72525:
              if title_bm25 > 16.656357:
                <= 0.2000(0/0/)
              else:
                <= 0.2000(0/0/)
            else:
          

## Examining our evaluation for whoopsies

Let's look at one tree in our ensemble to see how it was evaluated.

```
if title_bm25 > 10.664251:
  if title_phrase > 0.0:
    if title_bm25 > 13.815164:
      if release_year > 2000.0:
        <= 0.1215(0/0/)
      else:
        <= 0.1240(0/0/)
    else:
      if title_bm25 > 10.667803:
        if overview_bm25 > 0.0:
          <= 0.1194(0/0/)
        else:
          <= 0.1161(1/0/)
      else:
        <= 0.1264(0/0/)
  else:
    <= 0.0800(0/0/)
else:
  if title_phrase > 0.0:
    if title_bm25 > 8.115499:
      if title_bm25 > 8.217656:
        <= 0.1097(2/1/qid:40:2(12180)-3(140607))
      else:
        <= 0.1559(0/0/)
    else:
      <= -0.0021(2/1/qid:40:2(1895)-3(330459))
  else:
    <= -0.1093(25/1/qid:40:0(85783)-3(1892))
```

You'll notice here this tree is represented by a series of if statements, where the feature's name is used. This is handy as it lets us take apart the structure of the tree.
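
Because the printout is just nested conditions, it can be transcribed almost line for line into a function. The sketch below does that for the example tree above; the two feature dicts are hypothetical documents invented to exercise two different leaves.

```python
def score_tree(f):
    """Leaf value of the example tree for a dict of feature values."""
    if f["title_bm25"] > 10.664251:
        if f["title_phrase"] > 0.0:
            if f["title_bm25"] > 13.815164:
                return 0.1215 if f["release_year"] > 2000.0 else 0.1240
            if f["title_bm25"] > 10.667803:
                return 0.1194 if f["overview_bm25"] > 0.0 else 0.1161
            return 0.1264
        return 0.0800
    if f["title_phrase"] > 0.0:
        if f["title_bm25"] > 8.115499:
            return 0.1097 if f["title_bm25"] > 8.217656 else 0.1559
        return -0.0021
    return -0.1093


# Hypothetical feature values for two documents
no_title_match = {"title_bm25": 0.0, "title_phrase": 0.0,
                  "overview_bm25": 3.2, "release_year": 2015.0}
strong_match = {"title_bm25": 14.2, "title_phrase": 1.0,
                "overview_bm25": 0.0, "release_year": 1977.0}

print(score_tree(no_title_match))  # falls through to the bottom-most leaf
print(score_tree(strong_match))
```

In a boosted ensemble like this one, a document's final score is typically the sum of the leaf values it reaches across all trees, so one tree's mistake can be corrected by later trees.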

You'll also notice the leaf nodes starting with 

```
<=
```

These leaf nodes have a floating point value, corresponding to the relevance score that documents ending up here will receive from this tree. Each leaf also has three items in parentheses, such as `(2/1/qid:40:2(1895)-3(330459))`. This is a report summarizing the result of evaluating the tree on the provided judgment list, indicating:


```



   +--- 2 Documents evaluated to this leaf node                   +-- max grade doc eval'd to this leaf
   |                                                              |
   | +----- 1 'whoopsie' occurred                                 |  +-- corresp. doc id of max doc
   | |                                                            |  |
   | |   +--- details on each whoopsie ----------- qid:40:2(1895)-3(330459)
   | |   |                                              | |  |
  (2/1/qid:40:2(1895)-3(330459))                        | |  |
                                                        | |  + doc id of min graded doc
                                                        | |
                                                        | + min grade of docs eval'd to this leaf
                                                        |
                                                        + query id of whoopsie from judgments
```
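
If you want to work with these reports programmatically, the fields in the diagram can be pulled apart with a regular expression. This is a sketch based only on the report strings shown in this notebook: it handles an empty report such as `(0/0/)` and a report with a single whoopsie detail, and `parse_leaf_report` is a hypothetical helper rather than part of the `ltr` package.

```python
import re

# Matches "(docs/whoopsies/)" with an optional single whoopsie detail of the
# form "qid:<qid>:<min grade>(<doc id>)-<max grade>(<doc id>)".
LEAF_REPORT = re.compile(
    r"\((?P<docs>\d+)/(?P<whoopsies>\d+)/"
    r"(?:qid:(?P<qid>\d+):(?P<min_grade>\d+)\((?P<min_doc>\d+)\)"
    r"-(?P<max_grade>\d+)\((?P<max_doc>\d+)\))?\)"
)


def parse_leaf_report(text):
    """Return the report fields as ints; whoopsie fields are None when absent."""
    match = LEAF_REPORT.fullmatch(text)
    if match is None:
        raise ValueError("unrecognized leaf report: %r" % text)
    return {key: (int(value) if value is not None else None)
            for key, value in match.groupdict().items()}


with_whoopsie = parse_leaf_report("(2/1/qid:40:2(1895)-3(330459))")
empty_leaf = parse_leaf_report("(0/0/)")
print(with_whoopsie)
print(empty_leaf)
```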


Looking at Star Wars, our biggest issue in this tree is with the bottom-most leaf:

```
if title_bm25 > 10.664251:
  ...
else:
  if title_phrase > 0.0:
    ...
  else:
    <= -0.1093(25/1/qid:40:0(85783)-3(1892))
```


Document 85783 (a '0') and document 1892 (a '3') end up in the same leaf, so this tree gives them the same score.
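
The check behind a whoopsie can be sketched in a few lines: group the documents that reached a leaf by query, and flag any query whose documents on that leaf carry different grades. This is an assumption about the logic, not the `ltr.MART_model` code. The grades and ids for docs 85783 and 1892 come from the leaf above, while the third row and the `find_leaf_whoopsies` name are made up.

```python
from collections import defaultdict


def find_leaf_whoopsies(leaf_docs):
    """leaf_docs: iterable of (qid, grade, doc_id) that reached one leaf.

    Returns {qid: ((min_grade, doc_id), (max_grade, doc_id))} for every query
    whose docs on this leaf do not all share the same grade.
    """
    by_qid = defaultdict(list)
    for qid, grade, doc_id in leaf_docs:
        by_qid[qid].append((grade, doc_id))
    whoopsies = {}
    for qid, docs in by_qid.items():
        lowest, highest = min(docs), max(docs)
        if lowest[0] < highest[0]:
            whoopsies[qid] = (lowest, highest)
    return whoopsies


# First two rows are from the leaf above; the last row is a made-up '0' doc.
leaf = [(40, 0, 85783), (40, 3, 1892), (40, 0, 99999)]
found = find_leaf_whoopsies(leaf)
print(found)
```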

### Whoopsie, from the query perspective

Whoopsies can also be examined at the query level, to see how many whoopsies occurred for a given query id and how each tree evaluated that query. This can help show whether an error was fixed later in the ensemble of trees.

In [53]:
whoopsies = model.whoopsies()
for qid, whoopsie in whoopsies.items():
    print("== QID %s ==" % qid)
    print("%s - %s" % (whoopsie.count, whoopsie.totalMagnitude))
    print(whoopsie.perTreeReport())

== QID 1 ==
20 - 42
tree:0=>0(319074)-4(1368);tree:1=>0(319074)-4(1368);tree:2=>0(319074)-4(1368);tree:3=>0(319074)-4(1368);tree:4=>0(319074)-4(1368);tree:5=>0(319074)-2(13258);tree:6=>0(319074)-2(13258);tree:7=>0(319074)-2(13258);tree:8=>0(319074)-2(13258);tree:9=>0(319074)-2(13258);tree:10=>0(319074)-1(61410);tree:11=>1(31362)-4(1368);tree:12=>0(319074)-1(61410);tree:13=>0(319074)-1(61410);tree:14=>0(319074)-1(61410);tree:15=>0(319074)-1(61410);tree:16=>0(319074)-1(61410);tree:17=>0(319074)-1(208982);tree:18=>0(319074)-1(208982);tree:19=>0(319074)-1(208982)
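
The per-tree report is easier to read once it's split up. The sketch below parses the string printed above and sums the grade gap (max grade minus min grade) at each tree. Treating that gap as the "magnitude" is an inference, but it does reproduce the `20 - 42` line in the output. `parse_per_tree_report` is a hypothetical helper.

```python
import re

# perTreeReport() output for QID 1, copied from the cell output above
REPORT = (
    "tree:0=>0(319074)-4(1368);tree:1=>0(319074)-4(1368);tree:2=>0(319074)-4(1368);"
    "tree:3=>0(319074)-4(1368);tree:4=>0(319074)-4(1368);tree:5=>0(319074)-2(13258);"
    "tree:6=>0(319074)-2(13258);tree:7=>0(319074)-2(13258);tree:8=>0(319074)-2(13258);"
    "tree:9=>0(319074)-2(13258);tree:10=>0(319074)-1(61410);tree:11=>1(31362)-4(1368);"
    "tree:12=>0(319074)-1(61410);tree:13=>0(319074)-1(61410);tree:14=>0(319074)-1(61410);"
    "tree:15=>0(319074)-1(61410);tree:16=>0(319074)-1(61410);tree:17=>0(319074)-1(208982);"
    "tree:18=>0(319074)-1(208982);tree:19=>0(319074)-1(208982)"
)

ENTRY = re.compile(r"tree:(\d+)=>(\d+)\((\d+)\)-(\d+)\((\d+)\)")


def parse_per_tree_report(report):
    """Return (tree, min_grade, min_doc, max_grade, max_doc) tuples of ints."""
    return [tuple(int(x) for x in m.groups()) for m in ENTRY.finditer(report)]


entries = parse_per_tree_report(REPORT)
magnitudes = [max_grade - min_grade for _, min_grade, _, max_grade, _ in entries]
print(len(entries), sum(magnitudes))
print(magnitudes)
```

The gap mostly shrinks from 4 in the first trees to 1 in the last ones, which suggests later trees pull the grade-4 doc (1368) away from doc 319074, even though a grade-1 doc still shares a leaf with it.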


## Examine problem doc 319074

(notice nothing mentions 'star wars')

In [55]:


client.get_doc(doc_id=319074)

{'id': '319074',
 'title': ['In Football We Trust'],
 'title_bidirect_syn': ['In Football We Trust'],
 'title_directed_syn': ['In Football We Trust'],
 'title_multiterm_syn': ['In Football We Trust'],
 'title_idioms': ['In Football We Trust'],
 'text_all_idioms': ['In Football We Trust',
  '‘In Football We Trust’ captures a snapshot in time amid the rise of the Pacific Islander presence in the NFL. Presenting a new take on the American immigrant story, this feature length documentary transports viewers deep inside the tightly-knit Polynesian community in Salt Lake City, Utah. With unprecedented access and shot over a four-year time period, the film intimately portrays four young Polynesian men striving to overcome gang violence and near poverty through American football. Viewed as the "salvation" for their families, these young players reveal the culture clash they experience as they transform out of their adolescence and into the high stakes world of collegiate recruiting and rigors o

## Add a feature: collection name

We have an intuition about our data: there is a field for each movie's "collection name". See it below:

In [56]:
from ltr.helpers.movies import get_movie
get_movie(1892)

{'id': 1892,
 'title': 'Return of the Jedi',
 'video': False,
 'mlensId': '1210',
 'vote_average': 7.8,
 'backdrop_path': '/koE7aMeR2ATivI18mCbscLsI0Nm.jpg',
 'tagline': 'The Empire Falls...',
 'directors': [{'id': 19800,
   'department': 'Directing',
   'credit_id': '52fe431ec3a36847f803bbfd',
   'name': 'Richard Marquand',
   'profile_path': '/wuO69rNp2mMG9unvRpZhbccoAh9.jpg',
   'job': 'Director'}],
 'release_date': '1983-05-23',
 'belongs_to_collection': {'poster_path': '/ghd5zOQnDaDW1mxO7R5fXXpZMu.jpg',
  'id': 10,
  'backdrop_path': '/d8duYyyC9J5T825Hg7grmaabfxQ.jpg',
  'name': 'Star Wars Collection'},
 'runtime': 135,
 'popularity': 3.914347,
 'status': 'Released',
 'original_language': 'en',
 'cast': [{'order': 0,
   'id': 2,
   'cast_id': 8,
   'credit_id': '52fe431ec3a36847f803bc13',
   'name': 'Mark Hamill',
   'profile_path': '/ws544EgE5POxGJqq9LUfhnDrHtV.jpg',
   'character': 'Luke Skywalker'},
  {'order': 1,
   'id': 3,
   'cast_id': 9,
   'credit_id': '52fe431ec3a36847f8

## Now reindex with collection name...

We'll add collection name, and reindex.

In [57]:
def add_collection_name(src_movie, base_doc):
    if 'belongs_to_collection' in src_movie and src_movie['belongs_to_collection'] is not None:
        if 'name' in src_movie['belongs_to_collection']:
            base_doc['collection_name_en'] = src_movie['belongs_to_collection']['name']
    return base_doc

from ltr.index import rebuild_tmdb
rebuild_tmdb(client, enrich=add_collection_name)

Deleted index tmdb [Status: 200]
Created index tmdb [Status: 200]
Reindexing...
Indexed 0 movies (last Black Mirror: White Christmas)
Indexed 100 movies (last Apocalypse Now)
Indexed 200 movies (last Crooks in Clover)
Indexed 300 movies (last For a Few Dollars More)
Indexed 400 movies (last Downfall)
Flushing 500 movies
Done [Status: 200]
Indexed 500 movies (last Finding Nemo)
Indexed 600 movies (last Platoon)
Indexed 700 movies (last Night of the Living Dead)
Indexed 800 movies (last Evangelion: 1.0: You Are (Not) Alone)
Indexed 900 movies (last Batman: Assault on Arkham)
Flushing 500 movies
Done [Status: 200]
Indexed 1000 movies (last Riley's First Date?)
Indexed 1100 movies (last The Raid)
Indexed 1200 movies (last Falling Down)
Indexed 1300 movies (last Kal Ho Naa Ho)
Indexed 1400 movies (last Elizabeth)
Flushing 500 movies
Done [Status: 200]
Indexed 1500 movies (last Irreversible)
Indexed 1600 movies (last Friday Night Lights)
Indexed 1700 movies (last Ben X)
Indexed 1800 movies (

Done [Status: 200]
Indexed 16000 movies (last The Great Northfield Minnesota Raid)
Indexed 16100 movies (last Lotta Leaves Home)
Indexed 16200 movies (last Just One of the Girls)
Indexed 16300 movies (last Which Way Is The Front Line From Here? The Life and Time of Tim Hetherington)
Indexed 16400 movies (last The Ladies Man)
Flushing 500 movies
Done [Status: 200]
Indexed 16500 movies (last Assassin of the Tsar)
Indexed 16600 movies (last The Adventures of Tarzan)
Indexed 16700 movies (last Vendetta)
Indexed 16800 movies (last Trucker)
Indexed 16900 movies (last Branded)
Flushing 500 movies
Done [Status: 200]
Indexed 17000 movies (last Mariage à Mendoza)
Indexed 17100 movies (last Love Bites)
Indexed 17200 movies (last The Ballad of Ramblin' Jack)
Indexed 17300 movies (last Blade of the Ripper)
Indexed 17400 movies (last Kiler)
Flushing 500 movies
Done [Status: 200]
Indexed 17500 movies (last Kaïrat)
Indexed 17600 movies (last Body Bags)
Indexed 17700 movies (last Dave Attell: Captain M
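
Before waiting on a full reindex, the enrich function can be tried on a couple of hand-built dicts. The sketch below uses an equivalent `.get`-based variant of `add_collection_name`; the input dicts are trimmed-down or invented examples, not real TMDB records.

```python
def add_collection_name(src_movie, base_doc):
    """Copy the collection name, when present, onto the document to index."""
    # 'belongs_to_collection' may be missing or None for standalone movies
    collection = src_movie.get('belongs_to_collection') or {}
    if 'name' in collection:
        base_doc['collection_name_en'] = collection['name']
    return base_doc


# Trimmed-down version of the Return of the Jedi record shown earlier
in_collection = add_collection_name(
    {'belongs_to_collection': {'id': 10, 'name': 'Star Wars Collection'}},
    {'id': '1892', 'title': 'Return of the Jedi'})

# Hypothetical movie that belongs to no collection
standalone = add_collection_name(
    {'belongs_to_collection': None},
    {'id': '0', 'title': 'Some Standalone Movie'})

print(in_collection)
print(standalone)
```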

Confirm it's in our doc now...

In [58]:
client.get_doc(doc_id=1892)

{'id': '1892',
 'title': ['Return of the Jedi'],
 'title_bidirect_syn': ['Return of the Jedi'],
 'title_directed_syn': ['Return of the Jedi'],
 'title_multiterm_syn': ['Return of the Jedi'],
 'title_idioms': ['Return of the Jedi'],
 'text_all_idioms': ['Return of the Jedi',
  "As Rebel leaders map their strategy for an all-out attack on the Emperor's newer, bigger Death Star. Han Solo remains frozen in the cavernous desert fortress of Jabba the Hutt, the most loathsome outlaw in the universe, who is also keeping Princess Leia as a slave girl. Now a master of the Force, Luke Skywalker rescues his friends, but he cannot become a true Jedi Knight until he wages his own crucial battle against Darth Vader, who has sworn to win Luke over to the dark side of the Force.",
  'The Empire Falls...',
  'Richard Marquand',
  "Mark Hamill Harrison Ford Carrie Fisher Billy Dee Williams Anthony Daniels David Prowse Kenny Baker Peter Mayhew Frank Oz Ian McDiarmid James Earl Jones Sebastian Shaw Hayden 

## Add it to the features, and regenerate training data....

In [60]:
config = [
    #1
    {
      "name" : "title_has_phrase",
      "store": "title2",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "title:\"${keywords}\"^=1"
      }
    },
    #2
    {
      "name" : "title_has_terms",
      "store": "title2",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "title:(${keywords})^=1"
      }
    },
    #3
    {
      "name" : "title_bm25",
      "store": "title2",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "title:(${keywords})"
      }
    },
    #4
    {
      "name" : "overview_bm25",
      "store": "title2",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "overview:(${keywords})"
      }
    },
    #5
    {
      "name" : "overview_phrase_bm25",
      "store": "title2",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "overview:\"${keywords}\""
      }
    },
    #6
    {
      "name" : "title_fuzzy",
      "store": "title2",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "{!lucene df=title}${keywords}~"
      }
    },
    #7
    {
      "name" : "release_year",
      "store": "title2",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "{!func}def(release_year,2000)"
      }
    },
    #8 Collection Name BM25 Score
    {
      "name" : "coll_name_bm25",
      "store": "title2",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "collection_name_en:(${keywords})"
      }
    },
    #9 Collection Name Phrase BM25 Score
    {
      "name" : "coll_name_phrase_bm25",
      "store": "title2",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "collection_name_en:\"${keywords}\""
      }
    }

]




from ltr import setup
setup(client, config=config, featureset='title2')

from ltr.log import judgments_to_training_set
trainingSet = judgments_to_training_set(client, 
                                        judgmentInFile='data/title_judgments.txt', 
                                        trainingOutFile='data/title2_judgments_train.txt', 
                                        featureSet='title2')

Deleted title2 Featurestore [Status: 200]
Created title2 feature store under tmdb: [Status: 200]
Recognizing 40 queries...
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for rambo (0/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for rocky (1/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for war games (2/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for crocodile dundee (3/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for matrix (4/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for contact (5/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for space jam (6/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for battlestar galactica (7/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for her (8/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for jobs (9/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DATA for social network (10/40)
Searching tmdb [Status: 200]
REBUILDING TRAINING DAT

## Now a feature search

Next, run a feature search over these new features. It takes a while, so go get some coffee.

We also increase the number of trees and leaves to see whether that has an impact.
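
The order of the "Trying features" lines in the log (singles, then pairs, then triples, and so on) suggests the search walks through every combination of the listed features. The following is a minimal sketch of that enumeration, under the assumption that plain subsets are tried in lexicographic order; this has not been checked against the library source, and `feature_subsets` is a hypothetical helper, not part of `ltr`.

```python
from itertools import combinations


def feature_subsets(feature_ids):
    """Yield every non-empty subset of feature_ids, smallest subsets first."""
    for size in range(1, len(feature_ids) + 1):
        for combo in combinations(feature_ids, size):
            yield list(combo)


# Same feature ids as passed to feature_search in this notebook
subsets = list(feature_subsets([1, 2, 3, 4, 5, 6, 7, 8, 9]))

print("Number of feature mixes to try:", len(subsets))
print("First:", subsets[0])
print("Last:", subsets[-1])
```

Nine features give 2^9 - 1 = 511 candidate mixes, and with `kcv=15` each mix is trained once per fold, hence the coffee.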

In [61]:
from ltr.train import feature_search
rankLibResult, ndcgPerFeature = feature_search(client,
                                               trainingInFile='data/title2_judgments_train.txt',
                                               metric2t='NDCG@10',
                                               leafs=20,
                                               trees=20,
                                               kcv=15,
                                               features=[1,2,3,4,5,6,7,8,9],
                                               featureSet='title2')

print()
print("Impact of each feature on the model")
trainLogs = rankLibResult.trainingLogs
for ftrId, impact in trainLogs[-1].impacts.items():
    print("{} - {}".format(ftrId, impact))
    
for roundDcg in trainLogs[-1].rounds:
    print(roundDcg)
    
print("Avg NDCG@10 when feature included:")
for ftrId, ndcg in ndcgPerFeature.items():
    print("%s => %s" % (ftrId, ndcg))
    
print("Avg K-Fold NDCG@10 %s" % rankLibResult.kcvTestAvg)

Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1] TEST NDCG@10=0.9071
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2] TEST NDCG@10=0.9064
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [3] TEST NDCG@10=0.8482
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [4] TEST NDCG@10=0.4683
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -

DONE
Trying features [5, 6] TEST NDCG@10=0.8801
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [5, 7] TEST NDCG@10=0.3738
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [5, 8] TEST NDCG@10=0.751
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [5, 9] TEST NDCG@10=0.7595
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [6, 7] TEST NDCG@10=0.8574
Running java 

DONE
Trying features [1, 7, 8] TEST NDCG@10=0.8238
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 7, 9] TEST NDCG@10=0.812
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 8, 9] TEST NDCG@10=0.8943
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 3, 4] TEST NDCG@10=0.8931
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 3, 5] TEST NDCG@10=0.875

DONE
Trying features [3, 6, 9] TEST NDCG@10=0.896
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [3, 7, 8] TEST NDCG@10=0.8735
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [3, 7, 9] TEST NDCG@10=0.8796
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [3, 8, 9] TEST NDCG@10=0.8591
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [4, 5, 6] TEST NDCG@10=0.869

DONE
Trying features [1, 2, 5, 6] TEST NDCG@10=0.9037
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 5, 7] TEST NDCG@10=0.81
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 5, 8] TEST NDCG@10=0.8341
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 5, 9] TEST NDCG@10=0.8419
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 6, 7] TEST

DONE
Trying features [1, 5, 6, 7] TEST NDCG@10=0.894
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 5, 6, 8] TEST NDCG@10=0.8989
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 5, 6, 9] TEST NDCG@10=0.8987
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 5, 7, 8] TEST NDCG@10=0.8345
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 5, 7, 9] TES

DONE
Trying features [2, 5, 6, 7] TEST NDCG@10=0.8744
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 5, 6, 8] TEST NDCG@10=0.8606
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 5, 6, 9] TEST NDCG@10=0.8788
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 5, 7, 8] TEST NDCG@10=0.7537
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 5, 7, 9] TE

DONE
Trying features [4, 5, 8, 9] TEST NDCG@10=0.6237
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [4, 6, 7, 8] TEST NDCG@10=0.8639
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [4, 6, 7, 9] TEST NDCG@10=0.8646
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [4, 6, 8, 9] TEST NDCG@10=0.8619
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [4, 7, 8, 9] TE

DONE
Trying features [1, 2, 4, 8, 9] TEST NDCG@10=0.8643
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 5, 6, 7] TEST NDCG@10=0.892
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 5, 6, 8] TEST NDCG@10=0.8972
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 5, 6, 9] TEST NDCG@10=0.9053
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 

DONE
Trying features [1, 4, 5, 7, 8] TEST NDCG@10=0.8489
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 4, 5, 7, 9] TEST NDCG@10=0.8299
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 4, 5, 8, 9] TEST NDCG@10=0.8559
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 4, 6, 7, 8] TEST NDCG@10=0.9134
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1,

DONE
Trying features [2, 4, 5, 6, 9] TEST NDCG@10=0.8881
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 4, 5, 7, 8] TEST NDCG@10=0.7226
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 4, 5, 7, 9] TEST NDCG@10=0.6953
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 4, 5, 8, 9] TEST NDCG@10=0.7382
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2,

DONE
Trying features [1, 2, 3, 4, 5, 6] TEST NDCG@10=0.9051
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 3, 4, 5, 7] TEST NDCG@10=0.92
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 3, 4, 5, 8] TEST NDCG@10=0.924
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 3, 4, 5, 9] TEST NDCG@10=0.919
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying feat

DONE
Trying features [1, 2, 6, 7, 8, 9] TEST NDCG@10=0.8872
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 4, 5, 6, 7] TEST NDCG@10=0.911
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 4, 5, 6, 8] TEST NDCG@10=0.9229
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 3, 4, 5, 6, 9] TEST NDCG@10=0.9137
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying f

DONE
Trying features [2, 3, 5, 6, 8, 9] TEST NDCG@10=0.876
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 3, 5, 7, 8, 9] TEST NDCG@10=0.895
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 3, 6, 7, 8, 9] TEST NDCG@10=0.9003
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [2, 4, 5, 6, 7, 8] TEST NDCG@10=0.8811
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying fe

DONE
Trying features [1, 2, 4, 5, 6, 8, 9] TEST NDCG@10=0.9086
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 4, 5, 7, 8, 9] TEST NDCG@10=0.8512
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 4, 6, 7, 8, 9] TEST NDCG@10=0.9054
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 
DONE
Trying features [1, 2, 5, 6, 7, 8, 9] TEST NDCG@10=0.8941
Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/temp_model.txt  -feature features.txt  -kcv 15 


## Review new feature impacts

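These are the per-feature impacts for the best mix the search found. Feature 8 (collection name BM25) made it into the mix and helps; feature 9 (collection name phrase BM25) did not make the cut, so it seems to add less. Interesting.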

```
4 - 18032.527656827504
3 - 9801.409052757816
5 - 8051.741259194476
7 - 5711.964176322393
8 - 3798.6132329430748
1 - 1439.2180228991883
```
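
The raw search log is long, so it can help to summarize it. The sketch below parses a handful of "Trying features" lines copied from the log above, picks the best mix among them, and averages NDCG@10 over the mixes that include each feature. It is an assumed approximation of what `ndcgPerFeature` reports, not the library's implementation; `parse_results` and `avg_ndcg_per_feature` are hypothetical helpers. Because only six lines are used, the numbers it prints are illustrative and will not match the full run.

```python
import re
from collections import defaultdict

# A few lines copied verbatim from the search log (not the full run)
log = """\
Trying features [1] TEST NDCG@10=0.9071
Trying features [4] TEST NDCG@10=0.4683
Trying features [1, 8, 9] TEST NDCG@10=0.8943
Trying features [1, 4, 6, 7, 8] TEST NDCG@10=0.9134
Trying features [1, 2, 3, 4, 5, 8] TEST NDCG@10=0.924
Trying features [1, 3, 4, 5, 6, 8] TEST NDCG@10=0.9229
"""

LINE = re.compile(r"Trying features \[([\d, ]+)\] TEST NDCG@10=([\d.]+)")


def parse_results(text):
    """Return a list of (feature tuple, test NDCG@10) pairs found in text."""
    results = []
    for match in LINE.finditer(text):
        features = tuple(int(f) for f in match.group(1).split(","))
        results.append((features, float(match.group(2))))
    return results


def avg_ndcg_per_feature(results):
    """Average NDCG@10 over every mix that includes a given feature."""
    scores = defaultdict(list)
    for features, ndcg in results:
        for ftr in features:
            scores[ftr].append(ndcg)
    return {ftr: sum(vals) / len(vals) for ftr, vals in sorted(scores.items())}


results = parse_results(log)
best_features, best_ndcg = max(results, key=lambda r: r[1])
averages = avg_ndcg_per_feature(results)

print("Best mix in this sample:", best_features, best_ndcg)
for ftr, avg in averages.items():
    print("%s => %.4f" % (ftr, avg))
```

Running the same functions over the complete captured output would give the full picture; the truncated lines in the log would simply be skipped by the regular expression.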

## Now train and save this model

In [62]:
from ltr import train
# Train on the best feature mix found by the search and upload it as model 'title2'
trainLog = train(client,
                 trainingInFile='data/title2_judgments_train.txt',
                 metric2t='NDCG@10',
                 leafs=20,
                 trees=20,
                 features=[1,3,4,5,7,8],
                 featureSet='title2',
                 modelName='title2')

print()
print("Impact of each feature on the model")
for ftrId, impact in trainLog.impacts.items():
    print("{} - {}".format(ftrId, impact))
    
for roundDcg in trainLog.rounds:
    print(roundDcg)
    
print("Train NDCG@10 %s" % trainLog.rounds[-1])

Running java -jar data/RankyMcRankFace.jar -ranker 6 -metric2t NDCG@10 -tree 20 -leaf 20 -train data/title2_judgments_train.txt -save data/title2_model.txt  -feature features.txt 
DONE
Submit Model title2 Ftr Set title2 [Status: 200]
Feature Set title2... [Status: 200]
Deleted Model title2 [Status: 200]
Created Model title2 [Status: 200]

Impact of each feature on the model
3 - 45970.19084259676
4 - 14863.915643776061
5 - 6010.83353619383
8 - 5937.817286358266
1 - 1747.9503211101292
7 - 903.9746857424649
0.9169
0.9261
0.9226
0.9259
0.9265
0.9265
0.9265
0.9282
0.9282
0.9282
0.9422
0.9463
0.9465
0.9469
0.9465
0.9474
0.9482
0.9508
0.9508
0.951
Train NDCG@10 0.951


In [63]:
from ltr import search
search(client, "star wars", modelName='title2')

Query {'fl': '*,... [Status: 200]
['Star Wars'] 
2.4088867 
1977 
['Adventure', 'Action', 'Science Fiction'] 
['Princess Leia is captured and held hostage by the evil Imperial forces in their effort to take over the galactic Empire. Venturesome Luke Skywalker and dashing captain Han Solo team together with the loveable robot duo R2-D2 and C-3PO to rescue the beautiful princess and restore peace and justice in the Empire.'] 
---------------------------------------
['Star Wars: The Clone Wars'] 
1.8170158 
2008 
['Thriller', 'Animation', 'Action', 'Science Fiction', 'Adventure', 'Fantasy'] 
["Set between Episode II and III the Clone Wars is the first computer animated Star Wars film. Anakin and Obi Wan must find out who kidnapped Jabba the Hutts son and return him safely. The Seperatists will try anything to stop them and ruin any chance of a diplomatic agreement between the Hutt's and the Republic."] 
---------------------------------------
['Star Wars: Episode I - The Phantom Menace'] 

## Examine Model 2

In [None]:
from ltr.MART_model import eval_model
from ltr.judgments import judgments_from_file, judgments_by_qid

features, _ = client.feature_set(index='tmdb', name='title2')

judgmentDict = judgments_by_qid(judgments_from_file(filename='data/title2_judgments_train.txt'))


# Judgments for qid 1 (the "rambo" query)
rambo = judgmentDict[1]
model = eval_model(modelName='title2',
                   features=features,
                   judgments=rambo)

print()
print("## Evaluating graded docs for search keywords '%s'" % rambo[0].keywords)
print()
print(model)

In [None]:
whoopsies = model.whoopsies()
for qid, whoopsie in whoopsies.items():
    print("== QID %s ==" % qid)
    print("%s - %s" % (whoopsie.count, whoopsie.totalMagnitude))
    print(whoopsie.perTreeReport())

```
== QID 1 ==
10 - 40
tree:0=>0(319074)-4(1368);tree:1=>0(319074)-4(1368);tree:2=>0(319074)-4(1368);tree:3=>0(319074)-4(1368);tree:4=>0(319074)-4(1368);tree:5=>0(319074)-4(1368);tree:6=>0(319074)-4(1368);tree:7=>0(319074)-4(1368);tree:8=>0(319074)-4(1368);tree:9=>0(319074)-4(1368)
```
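
The `perTreeReport()` string is dense, so here is a rough parsing sketch. It assumes each entry has the form `tree:<n>=><grade>(<doc id>)-<grade>(<doc id>)`, meaning a tree scored a lower-graded document at least as high as a higher-graded one. That reading is a guess from the output above and is not taken from the library; `parse_per_tree_report` is a hypothetical helper.

```python
import re

# Rebuild the report string shown above: the same pair repeated for trees 0..9
report = ";".join("tree:%d=>0(319074)-4(1368)" % i for i in range(10))

ENTRY = re.compile(r"tree:(\d+)=>(\d+)\((\d+)\)-(\d+)\((\d+)\)")


def parse_per_tree_report(text):
    """Split a per-tree report into one dict per tree entry."""
    rows = []
    for entry in text.split(";"):
        match = ENTRY.fullmatch(entry.strip())
        if match is None:
            continue  # skip anything that does not fit the assumed format
        tree, low_grade, low_doc, high_grade, high_doc = map(int, match.groups())
        rows.append({
            "tree": tree,
            "low_grade": low_grade,    # assumed: grade of the over-ranked doc
            "low_doc": low_doc,        # assumed: its document id
            "high_grade": high_grade,  # assumed: grade of the under-ranked doc
            "high_doc": high_doc,      # assumed: its document id
        })
    return rows


rows = parse_per_tree_report(report)
count = len(rows)
magnitude = sum(r["high_grade"] - r["low_grade"] for r in rows)

print("%s - %s" % (count, magnitude))
```

Under that reading, ten trees each confuse a grade-0 document with a grade-4 one, which lines up with the `10 - 40` summary line: 10 whoopsies with a total grade gap of 40.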