# Learning to Rank (LTR) Model Training

This notebook demonstrates how to train a Learning to Rank model using judgment lists and feature extraction with Elasticsearch.

In [1]:
%pip install -U elasticsearch eland "eland[scikit-learn]" xgboost tqdm python-dotenv -q


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49m/opt/homebrew/opt/python@3.11/bin/python3.11 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## Setup and Requirements

### Installing Dependencies
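The install cell above pulls in the libraries needed for Elasticsearch integration, machine learning, and feature extraction: eland for LTR model integration, XGBoost for ranking, and scikit-learn for data splitting utilities. The next cell imports them and reads the connection settings from environment variables, which `load_dotenv()` loads from a `.env` file.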


In [None]:
from elasticsearch import Elasticsearch, helpers
from dotenv import load_dotenv
import pandas as pd
import os
import json

load_dotenv()

# load dotenv variables
ELASTICSEARCH_URL = os.getenv("ELASTICSEARCH_URL")
ELASTICSEARCH_API_KEY = os.getenv("ELASTICSEARCH_API_KEY")
INDEX_NAME = "movies"
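`os.getenv` returns `None` for an unset variable, so a missing `.env` entry would only surface later as a connection failure. The following is a minimal sketch of failing early instead. `require_env` is a hypothetical helper (not part of any library used in this notebook), and the demo values are placeholders so the sketch runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Placeholder values (assumptions) so the sketch is self-contained;
# in the notebook the real values come from the .env file.
os.environ.setdefault("ELASTICSEARCH_URL", "https://localhost:9200")
os.environ.setdefault("ELASTICSEARCH_API_KEY", "example-api-key")

settings = {
    name: require_env(name)
    for name in ("ELASTICSEARCH_URL", "ELASTICSEARCH_API_KEY")
}
print(sorted(settings))
```

Checking both variables up front keeps a configuration mistake from being confused with a network or authentication problem.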

### Initialize Elasticsearch Connection
Configure the connection to Elasticsearch using credentials from environment variables. 


In [3]:
es_client = Elasticsearch(
    ELASTICSEARCH_URL,
    api_key=ELASTICSEARCH_API_KEY,
)

## Indexing the movies data

In [None]:
try:
    mappings = {
        "properties": {
            "text": {"type": "text"},
            "genre": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        }
    }

    index_exists = es_client.indices.exists(index=INDEX_NAME)

    if not index_exists:
        print(f"Index {INDEX_NAME} does not exist, creating it...")
        es_client.indices.create(index=INDEX_NAME, mappings=mappings)
    else:
        print(f"Index {INDEX_NAME} already exists, skipping creation...")

except Exception as e:
    print(e)

Index movies does not exist, creating it...


In [10]:
# Load dataset from JSON file
with open("dataset.json", "r") as f:
    dataset = json.load(f)


def build_data(dataset, index_name):
    for doc in dataset:
        action = {
            "_index": index_name,
            "_id": doc["_id"],
            "_source": {"text": doc["text"], "genre": doc["genre"]},
        }

        yield action


try:
    success, failed = helpers.bulk(
        es_client,
        build_data(dataset, INDEX_NAME),
    )
    print(f"Successfully indexed {success} documents")
    if failed:
        print(f"Failed to index {len(failed)} documents")
except Exception as e:
    print(e)

Successfully indexed 24 documents


## Feature Extraction and Training Dataset

### Loading Judgment Lists
Judgment lists contain relevance grades (0-4 scale) for query-document pairs. These labels represent human assessment of relevance and serve as the ground truth for training the LTR model.

In [None]:
# Load judgment list from JSON file
# Grade scale: 0=not relevant, 1=somewhat relevant, 2=moderately relevant, 3=highly relevant, 4=perfectly relevant
with open("judgments.json", "r") as f:
    judgments_data = json.load(f)

judgments_df = pd.DataFrame(judgments_data)
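Because the judgments are the ground truth for training, it can be worth checking them before extracting features. The sketch below assumes `judgments.json` is a list of records with the `query_id`, `query`, `doc_id`, and `grade` fields that appear as DataFrame columns later in this notebook. `validate_judgments` is a hypothetical helper, and the third sample record is deliberately invalid.

```python
REQUIRED_KEYS = {"query_id", "query", "doc_id", "grade"}
VALID_GRADES = range(0, 5)  # grades 0..4 inclusive


def validate_judgments(records):
    """Return a list of human-readable problems found in judgment records."""
    problems = []
    seen_pairs = set()
    for i, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append(f"record {i}: missing keys {sorted(missing)}")
            continue
        if record["grade"] not in VALID_GRADES:
            problems.append(f"record {i}: grade {record['grade']!r} outside 0-4")
        pair = (record["query_id"], record["doc_id"])
        if pair in seen_pairs:
            problems.append(f"record {i}: duplicate pair {pair}")
        seen_pairs.add(pair)
    return problems


sample = [
    {"query_id": "query1", "query": "DiCaprio performance", "doc_id": "doc1", "grade": 4},
    {"query_id": "query1", "query": "DiCaprio performance", "doc_id": "doc5", "grade": 0},
    # Invalid on purpose: grade out of range and a repeated query-document pair
    {"query_id": "query1", "query": "DiCaprio performance", "doc_id": "doc5", "grade": 7},
]
problems = validate_judgments(sample)
for problem in problems:
    print(problem)
```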

### Creating a Feature Set
In this section, we define feature extractors for our LTR model. The `QueryFeatureExtractor` executes queries against Elasticsearch and extracts relevance scores as features. We define two extractors: `text_bm25_score` for textual relevance and `genre_boost_score` for genre-based relevance, enabling the model to learn from multiple signal types.

In [40]:
from eland.ml.ltr import LTRModelConfig, QueryFeatureExtractor

# Creating a Feature Set
feature_extractors = [
    QueryFeatureExtractor(
        feature_name="text_bm25_score", query={"match": {"text": "{{query}}"}}
    ),
    QueryFeatureExtractor(
        feature_name="genre_boost_score", query={"match": {"genre": "{{query}}"}}
    ),
]

ltr_config = LTRModelConfig(feature_extractors)
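The `{{query}}` placeholder in each extractor is filled from the `query_params` supplied at feature-logging time and from the rescorer `params` at search time. Elasticsearch resolves these placeholders itself; the toy function below only mimics simple string substitution to show what the rendered query looks like. `render_query_template` is a hypothetical name, not an eland or Elasticsearch API.

```python
def render_query_template(template: dict, params: dict) -> dict:
    """Return a copy of a nested query dict with {{name}} placeholders filled in."""

    def render(node):
        if isinstance(node, dict):
            return {key: render(value) for key, value in node.items()}
        if isinstance(node, list):
            return [render(item) for item in node]
        if isinstance(node, str):
            for name, value in params.items():
                node = node.replace("{{" + name + "}}", str(value))
            return node
        return node

    return render(template)


text_template = {"match": {"text": "{{query}}"}}
rendered = render_query_template(text_template, {"query": "DiCaprio performance"})
print(rendered)  # {'match': {'text': 'DiCaprio performance'}}
```

Each rendered query is scored against a document, and that score becomes the feature value for the query-document pair.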

### Logging Feature Values
Once the feature set is defined, we verify that features are computed correctly for each query-document pair. Feature logging allows us to inspect the raw feature values that will be used to train the LTR model, showing exactly how each document is represented numerically for a given query.

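The cell below is a demonstration for a single query; features for all queries are extracted in the training loop further down. In the judgment list, the five sample documents are graded as follows for the query "DiCaprio performance":
- doc1, doc2: perfectly relevant (grade 4)
- doc3, doc4: highly relevant (grade 3)
- doc5: not relevant (grade 0)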

In [None]:
from eland.ml.ltr import FeatureLogger

# Logging Feature Values
feature_logger = FeatureLogger(es_client, INDEX_NAME, ltr_config)

# Example: Extract features for a single query to demonstrate feature extraction
features = feature_logger.extract_features(
    query_params={"query": "DiCaprio performance"},
    doc_ids=["doc1", "doc2", "doc3", "doc4", "doc5"],
)


features_df = pd.DataFrame(features).T
features_df.columns = ltr_config.feature_names
features_df

|  | text_bm25_score | genre_boost_score |
| --- | --- | --- |
| doc1 | 1.48137 |  |
| doc2 | 2.508342 |  |
| doc3 | 1.48137 |  |
| doc4 |  |  |
| doc5 | 1.48137 |  |


### Building the Training Dataset
Now we combine judgments with extracted features to create the training dataset. Each row represents a query-document pair enriched with its relevance label and computed feature values. 

In [42]:
# Building the Training Dataset
# Extract features for all query-document pairs
for query_id in judgments_df["query_id"].unique():
    # Get the query text
    query_text = judgments_df[judgments_df["query_id"] == query_id]["query"].iloc[0]

    # Get document IDs for this query
    doc_ids = judgments_df[judgments_df["query_id"] == query_id]["doc_id"].tolist()

    # Extract features from Elasticsearch
    features = feature_logger.extract_features(
        query_params={"query": query_text}, doc_ids=doc_ids
    )

    # Update the judgments with the actual scores
    for doc_id, feature_values in features.items():
        mask = (judgments_df["query_id"] == query_id) & (
            judgments_df["doc_id"] == doc_id
        )
        # Assign features in the same order as defined in feature_extractors
        judgments_df.loc[mask, "text_bm25_score"] = feature_values[0]
        judgments_df.loc[mask, "genre_boost_score"] = feature_values[1]


judgments_df

|  | query_id | query | doc_id | text | genre | grade | text_bm25_score | genre_boost_score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | query1 | DiCaprio performance | doc1 | DiCaprio's performance in The Revenant was bre... | Drama | 4 | 1.48137 |  |
| 1 | query1 | DiCaprio performance | doc2 | Inception shows Leonardo DiCaprio in one of hi... | Sci-Fi | 4 | 2.508342 |  |
| 2 | query1 | DiCaprio performance | doc3 | The Wolf of Wall Street features DiCaprio's ch... | Drama | 3 | 1.48137 |  |
| 3 | query1 | DiCaprio performance | doc4 | Titanic showcases DiCaprio's early career brea... | Romance | 3 |  |  |
| 4 | query1 | DiCaprio performance | doc5 | Brad Pitt delivers a solid performance in this... | Crime | 0 | 1.48137 |  |
| 5 | query1 | DiCaprio performance | doc6 | Tom Hanks gives an outstanding performance in ... | War | 0 | 1.58419 |  |
| 6 | query1 | DiCaprio performance | doc15 | An action-packed adventure with stunning visua... | Action | 0 |  |  |
| 7 | query1 | DiCaprio performance | doc12 | A lighthearted comedy that will make you laugh... | Comedy | 0 |  |  |
| 8 | query2 | sad movies that make you cry | doc8 | A heartbreaking story of love and loss that ma... | Drama | 4 | 3.930469 |  |
| 9 | query2 | sad movies that make you cry | doc9 | One of the saddest movies ever made — bring ti... | Drama | 4 | 4.160228 |  |
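The loop above assigns feature values by position (`feature_values[0]`, `feature_values[1]`), which has to be edited whenever an extractor is added. The following sketch shows the same join in plain Python, pairing feature names with values via `zip`. It assumes, as the loop does, that each document maps to a list of values in extractor order. `attach_features` is a hypothetical helper, and treating a missing feature as NaN is an assumption.

```python
import math

# Same order as the feature extractors defined earlier in the notebook
FEATURE_NAMES = ["text_bm25_score", "genre_boost_score"]


def attach_features(judgments, features_by_doc, feature_names=FEATURE_NAMES):
    """Return new judgment rows with one column per feature name.

    Documents without logged features get NaN for every feature.
    """
    rows = []
    for judgment in judgments:
        default = [math.nan] * len(feature_names)
        values = features_by_doc.get(judgment["doc_id"], default)
        row = dict(judgment)  # copy, so the input records stay untouched
        row.update(zip(feature_names, values))
        rows.append(row)
    return rows


judgments = [
    {"query_id": "query1", "doc_id": "doc1", "grade": 4},
    {"query_id": "query1", "doc_id": "doc2", "grade": 4},
    {"query_id": "query1", "doc_id": "doc15", "grade": 0},
]
features_by_doc = {
    "doc1": [1.48137, math.nan],
    "doc2": [2.508342, math.nan],
}
rows = attach_features(judgments, features_by_doc)
for row in rows:
    print(row)
```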


## Training the LTR Model

### Training and Model Evaluation
With the training dataset prepared, we train an XGBoost ranking model using NDCG (Normalized Discounted Cumulative Gain) as the objective metric. The model learns to rank documents within the context of each query. We use an 80/20 train-evaluation split while preserving query boundaries to ensure proper generalization.
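"Preserving query boundaries" means that every row of a given query ends up on the same side of the split, so the model is evaluated on queries it has not seen. The sketch below illustrates the idea in plain Python; it is not scikit-learn's `GroupShuffleSplit` implementation, and `group_split`, `group_sizes`, and the sample query IDs are made up for illustration.

```python
import random
from itertools import groupby


def group_split(query_ids, train_size=0.8, seed=0):
    """Split row indices so that all rows of a query land on the same side."""
    unique_queries = sorted(set(query_ids))
    random.Random(seed).shuffle(unique_queries)
    n_train = max(1, int(round(train_size * len(unique_queries))))
    train_queries = set(unique_queries[:n_train])
    train_idx = [i for i, q in enumerate(query_ids) if q in train_queries]
    eval_idx = [i for i, q in enumerate(query_ids) if q not in train_queries]
    return train_idx, eval_idx


def group_sizes(query_ids):
    """Sizes of consecutive runs of equal query IDs, in row order."""
    return [len(list(run)) for _, run in groupby(query_ids)]


# Illustrative query IDs: rows of the same query are stored contiguously
query_ids = (
    ["query1"] * 3 + ["query2"] * 2 + ["query10"] * 4 + ["query3"] + ["query4"] * 2
)
train_idx, eval_idx = group_split(query_ids)
train_queries = [query_ids[i] for i in train_idx]
eval_queries = [query_ids[i] for i in eval_idx]

print("train group sizes:", group_sizes(train_queries))
print("eval group sizes:", group_sizes(eval_queries))
```

A ranker that takes group sizes expects them in the same order as the rows. Sizes sorted by query label match the row order only if the rows follow that same label order; with string IDs, for example, `query10` sorts before `query2`.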

In [43]:
from xgboost import XGBRanker
from sklearn.model_selection import GroupShuffleSplit

# Training Process & Model Evaluation
# Create the ranker model:
ranker = XGBRanker(
    objective="rank:ndcg",
    eval_metric=["ndcg@10"],
    early_stopping_rounds=20,
)

# Shaping training and eval data in the expected format.
X = judgments_df[ltr_config.feature_names]
y = judgments_df["grade"]
groups = judgments_df["query_id"]

# Split the dataset in two parts respectively used for training and evaluation of the model.
group_preserving_splitter = GroupShuffleSplit(n_splits=1, train_size=0.8).split(
    X, y, groups
)
train_idx, eval_idx = next(group_preserving_splitter)

train_features, eval_features = X.loc[train_idx], X.loc[eval_idx]
train_target, eval_target = y.loc[train_idx], y.loc[eval_idx]
train_query_groups, eval_query_groups = groups.loc[train_idx], groups.loc[eval_idx]

# Training the model
ranker.fit(
    X=train_features,
    y=train_target,
    group=train_query_groups.value_counts().sort_index().values,
    eval_set=[(eval_features, eval_target)],
    eval_group=[eval_query_groups.value_counts().sort_index().values],
    verbose=True,  # Use verbose=True to see the training progress
)

[0]	validation_0-ndcg@10:0.99739
[1]	validation_0-ndcg@10:0.99739
[2]	validation_0-ndcg@10:0.99739
[3]	validation_0-ndcg@10:0.99739
[4]	validation_0-ndcg@10:0.99739
[5]	validation_0-ndcg@10:0.99739
[6]	validation_0-ndcg@10:0.99739
[7]	validation_0-ndcg@10:0.99739
[8]	validation_0-ndcg@10:0.99739
[9]	validation_0-ndcg@10:0.99739
[10]	validation_0-ndcg@10:0.99739
[11]	validation_0-ndcg@10:0.99739
[12]	validation_0-ndcg@10:0.99739
[13]	validation_0-ndcg@10:0.99739
[14]	validation_0-ndcg@10:0.99739
[15]	validation_0-ndcg@10:0.99739
[16]	validation_0-ndcg@10:0.99739
[17]	validation_0-ndcg@10:0.99739
[18]	validation_0-ndcg@10:0.99739
[19]	validation_0-ndcg@10:0.99739
[20]	validation_0-ndcg@10:0.99739
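The `ndcg@10` values in the log compare the model's ordering of each evaluation query with the ideal ordering by grade; 1.0 means the two agree. The sketch below computes NDCG with the common exponential-gain formulation, `(2^grade - 1) / log2(position + 1)`. This is an illustration of the metric, not XGBoost's code, and XGBoost's exact computation may differ depending on its settings. The grade lists are made up.

```python
import math


def dcg_at_k(grades, k):
    """Discounted cumulative gain of grades listed in ranked order."""
    return sum(
        (2 ** grade - 1) / math.log2(position + 2)  # position is 0-based
        for position, grade in enumerate(grades[:k])
    )


def ndcg_at_k(grades, k=10):
    """DCG divided by the DCG of the ideal (descending-grade) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


# Grades of the returned documents, in the order the ranker returned them
ideal_order = [4, 4, 3, 3, 0, 0]
relevant_doc_buried = [0, 4, 4, 3, 3, 0]

print(ndcg_at_k(ideal_order))          # 1.0: already in ideal order
print(ndcg_at_k(relevant_doc_buried))  # below 1.0: an irrelevant doc ranks first
```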


## Deploying the Model to Elasticsearch

### Uploading the Model
The final step is to deploy the trained model into Elasticsearch so it can be used as a second-stage reranker at query time. The model is converted into a format Elasticsearch understands and registered with a unique model identifier, making it available for production ranking pipelines.


In [None]:
from eland.ml import MLModel

LEARNING_TO_RANK_MODEL_ID = "ltr-model-xgboost"

# Uploading the Model
MLModel.import_ltr_model(
    es_client=es_client,
    model=ranker,
    model_id=LEARNING_TO_RANK_MODEL_ID,
    ltr_model_config=ltr_config,
    es_if_exists="replace",  # ensures that the model can be updated without a manual cleanup
)

<eland.ml.ml_model.MLModel at 0x11b6ad350>

## Testing the Results

### Verifying Model Deployment
After uploading the model, we verify it was successfully registered in Elasticsearch. The output shows the model settings, feature extractors, and other configuration details used during training.


In [45]:
response = es_client.ml.get_trained_models(model_id=LEARNING_TO_RANK_MODEL_ID)

model_list = response["trained_model_configs"]
print(json.dumps(model_list, indent=4))

[
    {
        "model_id": "ltr-model-xgboost",
        "model_type": "tree_ensemble",
        "created_by": "api_user",
        "version": "12.0.0",
        "create_time": 1767038669607,
        "model_size_bytes": 14952,
        "estimated_operations": 46,
        "license_level": "platinum",
        "tags": [],
        "input": {
            "field_names": []
        },
        "inference_config": {
            "learning_to_rank": {
                "num_top_feature_importance_values": 0,
                "feature_extractors": [
                    {
                        "query_extractor": {
                            "feature_name": "text_bm25_score",
                            "query": {
                                "match": {
                                    "text": "{{query}}"
                                }
                            }
                        }
                    },
                    {
                        "query_extractor": {
               

## Using the Rescorer

### Comparing BM25 vs LTR Performance
We now compare the ranking results using traditional BM25 scoring against the LTR-enhanced rescoring. 
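Conceptually, a second-stage rescorer takes the top `window_size` hits of the first-stage query, scores them again with the model, and reorders them. The sketch below is a simplified model of that idea, not Elasticsearch's implementation: hits outside the window simply stay behind it, and ties keep their first-stage order, which may differ from what Elasticsearch returns. `rescore_top_window` is a hypothetical helper, and the scores are copied from the DiCaprio outputs later in this notebook as a stand-in for the model.

```python
def rescore_top_window(hits, score_fn, window_size):
    """Re-rank the first `window_size` hits with `score_fn`; keep the rest in place.

    `hits` is a list of (doc_id, first_stage_score), already sorted by score.
    """
    window, tail = hits[:window_size], hits[window_size:]
    rescored = sorted(
        ((doc_id, score_fn(doc_id)) for doc_id, _ in window),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return rescored + tail


# First-stage (BM25) hits for "DiCaprio performance"
first_stage = [
    ("doc2", 2.508342), ("doc7", 1.641145), ("doc6", 1.58419),
    ("doc1", 1.48137), ("doc3", 1.48137), ("doc5", 1.48137),
]
# Stand-in for the LTR model: a lookup of the scores it produced
ltr_scores = {
    "doc2": 1.515389, "doc1": 0.914665, "doc3": 0.914665,
    "doc5": 0.914665, "doc6": 0.914665, "doc7": 0.483204,
}

full = rescore_top_window(first_stage, ltr_scores.get, window_size=100)
partial = rescore_top_window(first_stage, ltr_scores.get, window_size=3)
print([doc_id for doc_id, _ in full])
print([doc_id for doc_id, _ in partial])
```

With `window_size=3` only the first three hits are reordered, which is why the window should cover every result that is going to be displayed.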

### DiCaprio Performance Test - BM25 Baseline


In [None]:
query = "DiCaprio performance"

bm25_query = {"multi_match": {"query": query, "fields": ["text"]}}

bm25_search_response = es_client.search(index=INDEX_NAME, query=bm25_query)

# Convert to DataFrame
bm25_results = [
    {
        "doc_id": movie["_id"],
        "text": movie["_source"]["text"],
        "bm25_score": movie["_score"],
    }
    for movie in bm25_search_response["hits"]["hits"]
]
bm25_df_dicaprio = pd.DataFrame(bm25_results)
bm25_df_dicaprio["rank_bm25"] = range(1, len(bm25_df_dicaprio) + 1)
bm25_df_dicaprio["query"] = query
bm25_df_dicaprio[["rank_bm25", "doc_id", "text", "bm25_score"]]

|  | rank_bm25 | doc_id | text | bm25_score |
| --- | --- | --- | --- | --- |
| 0 | 1 | doc2 | Inception shows Leonardo DiCaprio in one of hi... | 2.508342 |
| 1 | 2 | doc7 | Meryl Streep's performance in this biographica... | 1.641145 |
| 2 | 3 | doc6 | Tom Hanks gives an outstanding performance in ... | 1.58419 |
| 3 | 4 | doc1 | DiCaprio's performance in The Revenant was bre... | 1.48137 |
| 4 | 5 | doc3 | The Wolf of Wall Street features DiCaprio's ch... | 1.48137 |
| 5 | 6 | doc5 | Brad Pitt delivers a solid performance in this... | 1.48137 |


### DiCaprio Performance Test - LTR Rescorer

In [None]:
ltr_rescorer = {
    "learning_to_rank": {
        "model_id": LEARNING_TO_RANK_MODEL_ID,
        "params": {"query": query},
    },
    "window_size": 100,
}

rescored_search_response = es_client.search(
    index=INDEX_NAME, query=bm25_query, rescore=ltr_rescorer
)

# Convert to DataFrame
ltr_results = [
    {
        "doc_id": movie["_id"],
        "text": movie["_source"]["text"],
        "ltr_score": movie["_score"],
    }
    for movie in rescored_search_response["hits"]["hits"]
]
ltr_df_dicaprio = pd.DataFrame(ltr_results)
ltr_df_dicaprio["rank_ltr"] = range(1, len(ltr_df_dicaprio) + 1)
ltr_df_dicaprio["query"] = query
ltr_df_dicaprio[["rank_ltr", "doc_id", "text", "ltr_score"]]

|  | rank_ltr | doc_id | text | ltr_score |
| --- | --- | --- | --- | --- |
| 0 | 1 | doc2 | Inception shows Leonardo DiCaprio in one of hi... | 1.515389 |
| 1 | 2 | doc1 | DiCaprio's performance in The Revenant was bre... | 0.914665 |
| 2 | 3 | doc3 | The Wolf of Wall Street features DiCaprio's ch... | 0.914665 |
| 3 | 4 | doc5 | Brad Pitt delivers a solid performance in this... | 0.914665 |
| 4 | 5 | doc6 | Tom Hanks gives an outstanding performance in ... | 0.914665 |
| 5 | 6 | doc7 | Meryl Streep's performance in this biographica... | 0.483204 |


### Movies That Make You Cry Test - BM25 Baseline

In [None]:
query = "sad movies that make you cry"

bm25_query = {"multi_match": {"query": query, "fields": ["text"]}}

bm25_search_response = es_client.search(index=INDEX_NAME, query=bm25_query)

# Convert to DataFrame
bm25_results = [
    {
        "doc_id": movie["_id"],
        "text": movie["_source"]["text"],
        "bm25_score": movie["_score"],
    }
    for movie in bm25_search_response["hits"]["hits"]
]
bm25_df_sad = pd.DataFrame(bm25_results)
bm25_df_sad["rank_bm25"] = range(1, len(bm25_df_sad) + 1)
bm25_df_sad["query"] = query
bm25_df_sad[["rank_bm25", "doc_id", "text", "bm25_score"]]

|  | rank_bm25 | doc_id | text | bm25_score |
| --- | --- | --- | --- | --- |
| 0 | 1 | doc12 | A lighthearted comedy that will make you laugh... | 5.796269 |
| 1 | 2 | doc9 | One of the saddest movies ever made — bring ti... | 4.160228 |
| 2 | 3 | doc8 | A heartbreaking story of love and loss that ma... | 3.930469 |
| 3 | 4 | doc14 | An absurdist comedy that will have you rolling... | 2.855625 |
| 4 | 5 | doc18 | A psychological thriller that will keep you on... | 2.527532 |
| 5 | 6 | doc23 | An eye-opening documentary that reveals shocki... | 1.259684 |
| 6 | 7 | doc21 | A beautiful love story that spans decades and ... | 1.177925 |
| 7 | 8 | doc10 | A tragic tale of separation and longing that l... | 1.140901 |


### Movies That Make You Cry Test - LTR Rescorer

In [None]:
ltr_rescorer = {
    "learning_to_rank": {
        "model_id": LEARNING_TO_RANK_MODEL_ID,
        "params": {"query": query},
    },
    "window_size": 100,
}

rescored_search_response = es_client.search(
    index=INDEX_NAME, query=bm25_query, rescore=ltr_rescorer
)

# Convert to DataFrame
ltr_results = [
    {
        "doc_id": movie["_id"],
        "text": movie["_source"]["text"],
        "ltr_score": movie["_score"],
    }
    for movie in rescored_search_response["hits"]["hits"]
]
ltr_df_sad = pd.DataFrame(ltr_results)
ltr_df_sad["rank_ltr"] = range(1, len(ltr_df_sad) + 1)
ltr_df_sad["query"] = query
ltr_df_sad[["rank_ltr", "doc_id", "text", "ltr_score"]]

|  | rank_ltr | doc_id | text | ltr_score |
| --- | --- | --- | --- | --- |
| 0 | 1 | doc8 | A heartbreaking story of love and loss that ma... | 2.16965 |
| 1 | 2 | doc9 | One of the saddest movies ever made — bring ti... | 2.16965 |
| 2 | 3 | doc14 | An absurdist comedy that will have you rolling... | 1.515389 |
| 3 | 4 | doc18 | A psychological thriller that will keep you on... | 1.515389 |
| 4 | 5 | doc12 | A lighthearted comedy that will make you laugh... | 1.275434 |
| 5 | 6 | doc10 | A tragic tale of separation and longing that l... | 0.815503 |
| 6 | 7 | doc21 | A beautiful love story that spans decades and ... | 0.815503 |
| 7 | 8 | doc23 | An eye-opening documentary that reveals shocki... | 0.815503 |


## Comprehensive Comparison

### BM25 vs LTR Rescoring Results

In [None]:
# Prepare BM25 DataFrames
bm25_df_dicaprio_clean = bm25_df_dicaprio[
    ["query", "doc_id", "text", "rank_bm25", "bm25_score"]
].copy()
bm25_df_dicaprio_clean.rename(
    columns={"rank_bm25": "rank", "bm25_score": "score"}, inplace=True
)
bm25_df_dicaprio_clean["method"] = "BM25"

bm25_df_sad_clean = bm25_df_sad[
    ["query", "doc_id", "text", "rank_bm25", "bm25_score"]
].copy()
bm25_df_sad_clean.rename(
    columns={"rank_bm25": "rank", "bm25_score": "score"}, inplace=True
)
bm25_df_sad_clean["method"] = "BM25"

# Prepare LTR DataFrames
ltr_df_dicaprio_clean = ltr_df_dicaprio[
    ["query", "doc_id", "text", "rank_ltr", "ltr_score"]
].copy()
ltr_df_dicaprio_clean.rename(
    columns={"rank_ltr": "rank", "ltr_score": "score"}, inplace=True
)
ltr_df_dicaprio_clean["method"] = "LTR"

ltr_df_sad_clean = ltr_df_sad[
    ["query", "doc_id", "text", "rank_ltr", "ltr_score"]
].copy()
ltr_df_sad_clean.rename(
    columns={"rank_ltr": "rank", "ltr_score": "score"}, inplace=True
)
ltr_df_sad_clean["method"] = "LTR"

# Combine all results
comparison_df = pd.concat(
    [
        bm25_df_dicaprio_clean,
        ltr_df_dicaprio_clean,
        bm25_df_sad_clean,
        ltr_df_sad_clean,
    ],
    ignore_index=True,
)

# Pivot to create comparison table
comparison_pivot = comparison_df.pivot_table(
    index=["query", "doc_id", "text"],
    columns="method",
    values=["rank", "score"],
    aggfunc="first",
).reset_index()

# Flatten column names
comparison_pivot.columns = [
    "query",
    "doc_id",
    "text",
    "rank_bm25",
    "rank_ltr",
    "score_bm25",
    "score_ltr",
]
comparison_pivot = comparison_pivot[
    ["query", "doc_id", "text", "rank_bm25", "rank_ltr", "score_bm25", "score_ltr"]
]

# Sort by query and BM25 rank
comparison_pivot = comparison_pivot.sort_values(["query", "rank_bm25"])

# Display comparison table
print("Comparison: BM25 vs LTR Rescoring")
comparison_pivot

Comparison: BM25 vs LTR Rescoring


|  | query | doc_id | text | rank_bm25 | rank_ltr | score_bm25 | score_ltr |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | DiCaprio performance | doc2 | Inception shows Leonardo DiCaprio in one of hi... | 1 | 1 | 2.508342 | 1.515389 |
| 5 | DiCaprio performance | doc7 | Meryl Streep's performance in this biographica... | 2 | 6 | 1.641145 | 0.483204 |
| 4 | DiCaprio performance | doc6 | Tom Hanks gives an outstanding performance in ... | 3 | 5 | 1.58419 | 0.914665 |
| 0 | DiCaprio performance | doc1 | DiCaprio's performance in The Revenant was bre... | 4 | 2 | 1.48137 | 0.914665 |
| 2 | DiCaprio performance | doc3 | The Wolf of Wall Street features DiCaprio's ch... | 5 | 3 | 1.48137 | 0.914665 |
| 3 | DiCaprio performance | doc5 | Brad Pitt delivers a solid performance in this... | 6 | 4 | 1.48137 | 0.914665 |
| 7 | sad movies that make you cry | doc12 | A lighthearted comedy that will make you laugh... | 1 | 5 | 5.796269 | 1.275434 |
| 13 | sad movies that make you cry | doc9 | One of the saddest movies ever made — bring ti... | 2 | 2 | 4.160228 | 2.16965 |
| 12 | sad movies that make you cry | doc8 | A heartbreaking story of love and loss that ma... | 3 | 1 | 3.930469 | 2.16965 |
| 8 | sad movies that make you cry | doc14 | An absurdist comedy that will have you rolling... | 4 | 3 | 2.855625 | 1.515389 |
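A compact way to read the comparison is to look at how many positions each document moved. The sketch below does this for the "DiCaprio performance" query, using the document orders from the BM25 and LTR outputs above. `rank_changes` is a hypothetical helper written for this illustration.

```python
def rank_changes(before, after):
    """Map each doc ID to (rank_before, rank_after, positions_gained)."""
    rank_after = {doc_id: rank for rank, doc_id in enumerate(after, start=1)}
    changes = {}
    for rank_before, doc_id in enumerate(before, start=1):
        if doc_id in rank_after:
            new_rank = rank_after[doc_id]
            changes[doc_id] = (rank_before, new_rank, rank_before - new_rank)
    return changes


bm25_order = ["doc2", "doc7", "doc6", "doc1", "doc3", "doc5"]
ltr_order = ["doc2", "doc1", "doc3", "doc5", "doc6", "doc7"]

changes = rank_changes(bm25_order, ltr_order)
for doc_id, (old, new, gained) in changes.items():
    print(f"{doc_id}: {old} -> {new} ({gained:+d})")
```

A positive value means the document moved up after rescoring. Here doc1 and doc3, both judged relevant, each gain two positions, while doc7 drops four.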



## Clean up

In [None]:
es_client.indices.delete(index=INDEX_NAME)

ObjectApiResponse({'acknowledged': True})