# We're Gonna Need a Bigger Bot

Let's map all the bits and pieces onto a meatier example to see how hello-ltr's abstractions make it easier to play with LTR in Jupyter notebooks (ipynbs).

Genome-tags is a crowdsourced movie-tagging project. Each movie is scored from 0 to 1 on how closely it matches a tag. Luckily, the tags look remarkably like search queries (e.g. `star trek` or `berlin`). We derived judgments from the genome-tags data and use them to experiment with search.

BUT

While some tags are straightforward (`Star Trek`), others are much tougher (`boxing` or `french art movie`) because it's unlikely any text field contains a match.

We're not going to solve those problems here, but we give this to you as a sandbox to apply your skills after the class to see how close you can get to approximating the genome tags data. 
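
To make the idea of deriving judgments from 0-1 tag scores concrete, here is a minimal sketch of one way such a score could be bucketed into an integer grade. The thresholds, the 0-4 grade range, and the function name are assumptions for illustration; the rule actually used to build `genome_judgments.txt` is not described in this notebook.

```python
# Hypothetical cut-offs: the ones used to build genome_judgments.txt
# are not given in this notebook.
GRADE_THRESHOLDS = [0.2, 0.4, 0.6, 0.8]


def relevance_to_grade(relevance):
    """Map a 0-1 tag relevance score to an integer grade from 0 to 4."""
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("relevance must be between 0 and 1")
    # The grade is the number of thresholds the score meets or exceeds
    return sum(relevance >= threshold for threshold in GRADE_THRESHOLDS)


for score in (0.05, 0.5, 0.95):
    print(score, relevance_to_grade(score))
```

Finer or coarser buckets are equally plausible; what matters for training is that higher grades consistently mean a closer match between the movie and the tag.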


### Clients 

While the syntax differs, the LTR process is nearly identical between Solr and Elasticsearch, so many of the labs can be repeated against either engine (some have already been translated).

This notebook uses Solr.

In [1]:
from ltr.client import SolrClient
client = SolrClient()

### Download data if you need to

In [2]:
from ltr import download

judgments='http://es-learn-to-rank.labs.o19s.com/genome_judgments.txt'
corpus='http://es-learn-to-rank.labs.o19s.com/tmdb.json'
download([corpus, judgments]);

data/tmdb.json already exists
data/genome_judgments.txt already exists


### Reindex if you need to

In [None]:
from ltr.index import rebuild
from ltr.helpers.movies import indexable_movies

movies=indexable_movies(movies='data/tmdb.json')
rebuild(client, index='tmdb', doc_src=movies)

###  Feature Sets in ipynb

You created a feature set in the last lab; the same process is repeated here.

Learning to rank requires creating a feature set. Each feature has a name, like `title_bm25`, and an ordinal given by its position in the list: `title_bm25` is the 0th item. Confusingly, Ranklib uses 1-based feature numbering, so feature 0 in this list corresponds to feature 1 in the Ranklib training file that we'll see soon.
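
The off-by-one between list positions and Ranklib feature ids is easy to see when a training line is written out. The sketch below formats one judgment in Ranklib's `grade qid:N id:value ... # comment` layout; the helper name and the grade, feature values, and doc id are made up for illustration and are not taken from hello-ltr.

```python
def ranklib_line(grade, qid, feature_values, doc_id):
    """Format one judgment as a line of a Ranklib training file."""
    # enumerate(..., start=1) turns 0-based list positions into
    # Ranklib's 1-based feature ids
    features = " ".join(
        "{}:{}".format(ftr_id, value)
        for ftr_id, value in enumerate(feature_values, start=1)
    )
    return "{} qid:{} {} # {}".format(grade, qid, features, doc_id)


# Positions 0 and 1 in the feature list become features 1 and 2
feature_names = ["title_bm25", "overview_bm25"]

# Made-up grade, feature values, and doc id, purely for illustration
print(ranklib_line(4, 1, [12.3, 9.8], "603"))
```

In the printed line, `1:12.3` is the `title_bm25` value and `2:9.8` is the `overview_bm25` value.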

Notice also:

- Each feature is a templated query with a `${keywords}` parameter that is filled in at query time
- Each feature is a `SolrFeature` assigned to the `genome` feature store through its `store` field
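
As a rough illustration of what templating means for the `q` parameters in the config, Python's `string.Template` happens to use the same `${name}` placeholder syntax. This only mimics the substitution locally; in the real setup the search engine fills in the parameter on the server side.

```python
from string import Template

# Same shape as the "q" parameter of the title_bm25 feature
feature_query = Template("title:(${keywords})")

# At query time the placeholder is replaced with the user's keywords
print(feature_query.substitute(keywords="star trek"))
```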

In [3]:
client.reset_ltr(index='tmdb')

config = [
    {
        "store": "genome",
        "name" : "title_bm25",
        "class" : "org.apache.solr.ltr.feature.SolrFeature",
        "params" : {
          "q" : "title:(${keywords})"
        }
    },
    {
        "store": "genome",
        "name" : "overview_bm25",
        "class" : "org.apache.solr.ltr.feature.SolrFeature",
        "params" : {
          "q" : "overview:(${keywords})"
        }
    }
]

client.create_featureset(index='tmdb', name='genome', ftr_config=config)

Deleted classic model [Status: 200]
Deleted latest model [Status: 200]
Deleted release Featurestore [Status: 200]
Created genome feature store under tmdb: [Status: 200]


### Logging Queries

Logging is one of the more complex operations from an engineering perspective. 

The same query you ran manually when reviewing the slides is rerun here for every query in the source judgment list (`data/genome_judgments.txt`), with some batching when needed.
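
The per-query batching relies on `itertools.groupby`, which groups only consecutive items that share a key, so the judgment list needs to be ordered by `qid`. A small sketch with a stand-in `Judgment` tuple (hypothetical, not hello-ltr's own class) shows the grouping on its own:

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for a parsed judgment; values are made up
Judgment = namedtuple("Judgment", ["qid", "grade", "doc_id"])

judgments = [
    Judgment(1, 4, "603"),
    Judgment(1, 0, "11"),
    Judgment(2, 3, "13475"),
]


def batches_by_qid(judgment_list):
    """Yield (qid, [judgments]) for each run of consecutive judgments sharing a qid."""
    for qid, group in groupby(judgment_list, key=lambda j: j.qid):
        # Materialize the group: groupby's iterators are invalidated
        # as soon as the outer loop advances
        yield qid, list(group)


for qid, batch in batches_by_qid(judgments):
    print(qid, [j.doc_id for j in batch])
```

If the same `qid` appeared again later in an unsorted list, `groupby` would emit it as a second, separate batch.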

In [9]:
from ltr.judgments import judgments_open
from ltr.log import FeatureLogger
from itertools import groupby
from tqdm import tqdm

ftr_logger=FeatureLogger(client, index='tmdb', feature_set='genome')
with judgments_open('data/genome_judgments.txt') as judg_list:
    # For each query's judgments log features, and add to our training set...
    for qid, query_judgments in groupby(judg_list, key=lambda j: j.qid):
        ftr_logger.log_for_qid(judgments=query_judgments,
                               qid=qid, 
                               keywords=judg_list.keywords(qid))

Recognizing 1128 queries...
Searching tmdb [Status: 200]
Searching tmdb [Status: 200]
Searching tmdb [Status: 200]
Missing doc 24549
Missing doc 12773
Missing doc 225130
Missing doc 61917
Discarded 4 Keep 1061
Searching tmdb [Status: 200]
Searching tmdb [Status: 200]
Searching tmdb [Status: 200]
Missing doc 12773
Missing doc 67479
Missing doc 37106
Discarded 3 Keep 1065
[... remaining output omitted: the same Searching / Missing doc / Discarded pattern repeats, with occasional lines such as `Duplicate Doc in qid:375 105045` and `Parsing QID 700` ...]
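
The `Missing doc` and `Discarded N Keep M` lines in the output suggest that judgments whose documents cannot be found in the index are dropped before training. A minimal sketch of that kind of filtering follows; the function, the data shapes, and the feature values are assumptions for illustration, not hello-ltr's actual implementation.

```python
def split_logged(judged_doc_ids, logged_features):
    """Separate judged docs into those with logged features and those without."""
    kept, missing = [], []
    for doc_id in judged_doc_ids:
        if doc_id in logged_features:
            kept.append(doc_id)
        else:
            missing.append(doc_id)
    return kept, missing


# Made-up feature values keyed by doc id; doc 24549 is absent from the "index"
logged_features = {"603": [12.3, 9.8], "11": [0.0, 4.1]}
judged_doc_ids = ["603", "24549", "11"]

kept, missing = split_logged(judged_doc_ids, logged_features)
for doc_id in missing:
    print("Missing doc", doc_id)
print("Discarded", len(missing), "Keep", len(kept))
```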

### Training

Here's where we train the model. Under the hood, this executes Ranklib, just as you did during the training exercises.

Notice that here we're optimizing for NDCG@10.
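
For reference, NDCG@10 compares the discounted gain of the top 10 results against the best possible ordering of the same grades. The sketch below uses the common exponential-gain formulation (`2^grade - 1` with a `log2` position discount); Ranklib's exact variant may differ in details, and the grades are made up.

```python
import math


def dcg_at_k(grades, k):
    """Discounted cumulative gain over the first k grades, in ranked order."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 2)  # rank is 0-based
        for rank, grade in enumerate(grades[:k])
    )


def ndcg_at_k(grades, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


# Made-up grades in the order a model returned the documents
ranked_grades = [3, 4, 0, 1]
print(round(ndcg_at_k(ranked_grades), 4))
```

A perfectly ordered list scores 1.0, so the training metric can be read as how close the model's orderings come to the ideal ones for the judged documents.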


In [10]:
from ltr.ranklib import train
trainResponse = train(client,
                 training_set=ftr_logger.logged,
                 metric2t='NDCG@10',
                 featureSet='genome',
                 index='tmdb',
                 modelName='genome')

/var/folders/vc/thmh159x5xddb6_cgtx778sc0000gn/T/RankyMcRankFace.jar already exists
Running java -jar /var/folders/vc/thmh159x5xddb6_cgtx778sc0000gn/T/RankyMcRankFace.jar -ranker 6 -shrinkage 0.1 -metric2t NDCG@10 -tree 50 -bag 1 -leaf 10 -frate 1.0 -srate 1.0 -train /var/folders/vc/thmh159x5xddb6_cgtx778sc0000gn/T/training.txt -save data/genome_model.txt 
DONE
Submit Model genome Ftr Set genome [Status: 200]
Feature Set genome... [Status: 200]
Deleted Model genome [Status: 200]
Created Model genome [Status: 200]


Now that training is done, we can output some statistics about the model, including the training metrics. In future units we'll get more into what this looks like.

Notice the training NDCG isn't that great. When originally run, it was only 0.5885; the run shown here reports 0.7085. Either way, that's pretty far off the genome data. One challenge of Learning to Rank (and relevance in general) is figuring out which features can close the gap.

In [11]:
print("Impact of each feature on the model")
trainLog = trainResponse.trainingLogs[0]
for ftrId, impact in trainLog.impacts.items():
    print("{} - {}".format(ftrId, impact))
    
print("trainLog Metric %s" % trainLog.metric())

Impact of each feature on the model
1 - 2356791.866082119
2 - 777415.5791925859
trainLog Metric 0.7085
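
The raw impact numbers are easier to compare as shares of the total. This sketch copies the two values from the output above and maps Ranklib's 1-based feature ids back to the 0-based feature list; treating the impacts as directly comparable is an assumption about what the training log reports.

```python
# Values copied from the training output; keys are Ranklib's 1-based feature ids
impacts = {"1": 2356791.866082119, "2": 777415.5791925859}

# Order matches the feature set config, so Ranklib id N is list position N - 1
feature_names = ["title_bm25", "overview_bm25"]

total = sum(impacts.values())
shares = {
    feature_names[int(ftr_id) - 1]: impact / total
    for ftr_id, impact in impacts.items()
}

for name, share in shares.items():
    print("{}: {:.1%}".format(name, share))
```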


### Search with our model

Here we search using the `genome` model. You can see the LTR query being output (sent to Solr). You're encouraged to run it directly against Solr if you like.

Please note that this isn't rescoring. That's fine for our purpose of directly evaluating the model, but in real life you really should run a rescore query.
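
For comparison, a rescore in Solr is typically expressed with the `rq` parameter and the `ltr` query parser, which reranks only the top documents of a cheaper base query. The sketch below just builds the request parameters without sending anything; the base query, the `fl` field list, and the `reRankDocs` value are assumptions for illustration.

```python
from urllib.parse import urlencode


def rerank_params(keywords, model="genome", rerank_docs=100):
    """Build Solr query parameters that rerank the top hits with an LTR model."""
    return {
        # Cheap base query that selects the candidate documents (assumed)
        "q": "title:({0}) overview:({0})".format(keywords),
        # Rerank only the top `rerank_docs` candidates with the model;
        # efi.keywords supplies the ${keywords} feature parameter
        "rq": '{{!ltr model={} reRankDocs={} efi.keywords="{}"}}'.format(
            model, rerank_docs, keywords
        ),
        "fl": "id,title,score",
    }


params = rerank_params("batman")
print(params["rq"])
print(urlencode(params))
```

Because only the top candidates are rescored, the model's cost stays bounded no matter how many documents match the base query.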

In [12]:
from ltr import search
search(client, "batman", modelName='genome')

['The Dark Knight'] 
3.9317484 
2008 
['Drama', 'Action', 'Crime', 'Thriller'] 
['Batman raises the stakes in his war on crime. With the help of Lt. Jim Gordon and District Attorney Harvey Dent, Batman sets out to dismantle the remaining criminal organizations that plague the streets. The partnership proves to be effective, but they soon find themselves prey to a reign of chaos unleashed by a rising criminal mastermind known to the terrified citizens of Gotham as the Joker.'] 
---------------------------------------
['The Dark Knight Rises'] 
3.9317484 
2012 
['Action', 'Crime', 'Drama', 'Thriller'] 
["Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect