# MovieLens Recommender – Learning-to-Rank Ensemble with Gradient Boosting

In previous notebooks we:

- Built several **base recommenders** (bias, item-kNN, MF, content).
- Stacked them with a **linear / logistic meta-model**.
- Evaluated both RMSE and **top-K ranking metrics**.

In this notebook we go **one level further** and build a more powerful
and production-like **learning-to-rank (LTR) meta-model** on top of
multiple signals.

We will:

1. Use MovieLens `ml-latest-small` (`ratings.csv`, `movies.csv`).
2. Train 4 base models:
   - BiasModel (global + user + item effects).
   - ItemKNNModel (item-based CF).
   - MatrixFactorizationModel (latent factors).
   - GenreContentModel (simple content model using genres).
3. Engineer **extra meta-features**:
   - User-level stats (activity, mean rating).
   - Item-level stats (popularity, mean rating).
4. Define relevance as `rating ≥ 4.0`.
5. Train two meta-models for comparison:
   - Logistic regression (baseline from previous notebook).
   - Gradient boosting classifier (a tree-based, pointwise LTR-style model).
6. Evaluate base models, logistic stack and gradient boosting stack on:
   - Hit-rate@K.
   - Precision@K.
   - Recall@K.
7. Inspect feature importances to understand **which signals matter**.

The idea is close in spirit to production systems where you:

- Use collaborative + content models to generate features.
- Add user/item statistics and contextual signals.
- Feed everything into a tree-based LTR model (GBDT, XGBoost, etc.).


## 1. Imports and configuration

In [None]:
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Tuple, Set

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

from sklearn.preprocessing import MultiLabelBinarizer, normalize

sns.set(style="whitegrid")
plt.rcParams["figure.figsize"] = (8, 5)

RANDOM_STATE: int = 42
np.random.seed(RANDOM_STATE)

DATA_DIR: Path = Path("data") / "ml-latest-small"
RATINGS_PATH: Path = DATA_DIR / "ratings.csv"
MOVIES_PATH: Path = DATA_DIR / "movies.csv"

for p in [RATINGS_PATH, MOVIES_PATH]:
    if not p.exists():
        raise FileNotFoundError(
            f"Required file not found: {p.resolve()}\n"
            "Please ensure MovieLens 'ml-latest-small' is under data/ml-latest-small/."
        )

ratings_df = pd.read_csv(RATINGS_PATH)
movies_df = pd.read_csv(MOVIES_PATH)

print("Ratings shape:", ratings_df.shape)
print("Movies shape: ", movies_df.shape)
ratings_df.head()


In [None]:
def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Compute root mean squared error (RMSE).

    Args:
        y_true: True ratings.
        y_pred: Predicted ratings.

    Returns:
        RMSE value.
    """
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


sns.histplot(ratings_df["rating"], bins=10)
plt.title("Rating distribution")
plt.xlabel("Rating")
plt.show()


## 2. Train / meta / test splits

In [None]:
train_full_df, test_df = train_test_split(
    ratings_df,
    test_size=0.2,
    random_state=RANDOM_STATE,
)

base_train_df, meta_train_df = train_test_split(
    train_full_df,
    test_size=0.2,
    random_state=RANDOM_STATE,
)

print("Base train size:", len(base_train_df))
print("Meta train size:", len(meta_train_df))
print("Test size:      ", len(test_df))
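The three-way split keeps the stack honest. The base models only see `base_train_df`, so their scores on `meta_train_df` and `test_df` are out-of-sample, as they would be at serving time. Below is a minimal sanity check on toy data. It assumes each (userId, movieId) pair is rated at most once in `ratings.csv`, and `check_splits` is a hypothetical helper.

```python
import pandas as pd


def check_splits(base: pd.DataFrame, meta: pd.DataFrame, test: pd.DataFrame) -> dict:
    """Hypothetical helper: count shared (userId, movieId) pairs and cold test users.

    Assumes each (userId, movieId) pair appears at most once across all ratings.
    """

    def pairs(df: pd.DataFrame) -> set:
        return set(zip(df["userId"], df["movieId"]))

    base_p, meta_p, test_p = pairs(base), pairs(meta), pairs(test)
    return {
        "base_meta_overlap": len(base_p & meta_p),
        "base_test_overlap": len(base_p & test_p),
        "meta_test_overlap": len(meta_p & test_p),
        # Test users that the base models never saw during training.
        "cold_test_users": len(set(test["userId"]) - set(base["userId"])),
    }


toy = pd.DataFrame(
    {"userId": [1, 1, 2, 2, 3], "movieId": [10, 11, 10, 12, 11], "rating": [4, 3, 5, 2, 4]}
)
print(check_splits(toy.iloc[:3], toy.iloc[3:4], toy.iloc[4:]))
```

On the notebook data this would be `check_splits(base_train_df, meta_train_df, test_df)`. Non-zero overlaps would indicate leakage. The cold-user count shows how many test users the base models have never seen.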


## 3. Collaborative base models

We reuse the same 3 collaborative models as before.

### 3.1 BiasModel

In [None]:
class BiasModel:
    """Global + user + item bias recommender.

    Rating estimate: mu + b_u + b_i.
    """

    def __init__(self) -> None:
        self.mu: float | None = None
        self.user_bias: Dict[int, float] = {}
        self.item_bias: Dict[int, float] = {}

    def fit(self, df: pd.DataFrame) -> None:
        if df.empty:
            raise ValueError("Training DataFrame is empty.")

        self.mu = float(df["rating"].mean())
        user_mean = df.groupby("userId")["rating"].mean()
        item_mean = df.groupby("movieId")["rating"].mean()
        self.user_bias = (user_mean - self.mu).to_dict()
        self.item_bias = (item_mean - self.mu).to_dict()

    def predict_row(self, user_id: int, movie_id: int) -> float:
        if self.mu is None:
            raise RuntimeError("Model not fitted.")
        bu = self.user_bias.get(user_id, 0.0)
        bi = self.item_bias.get(movie_id, 0.0)
        return float(self.mu + bu + bi)

    def predict_df(self, df: pd.DataFrame) -> np.ndarray:
        preds: List[float] = []
        for row in df.itertuples(index=False):
            preds.append(self.predict_row(int(row.userId), int(row.movieId)))
        return np.array(preds, dtype=float)


bias_model = BiasModel()
bias_model.fit(base_train_df)

bias_meta_scores = bias_model.predict_df(meta_train_df)
print("Bias model RMSE on meta_train:", rmse(meta_train_df["rating"].to_numpy(), bias_meta_scores))
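Here is a worked example of `mu + b_u + b_i`. Take a global mean of 3.5, a user whose ratings average 4.0 (`b_u = 0.5`) and a movie averaging 3.0 (`b_i = -0.5`); the prediction is 3.5. The row-by-row `predict_df` loop can also be vectorized with `Series.map`. The sketch below keeps the same 0.0 fallback for unseen ids. `fit_biases` and `predict_biases` are hypothetical names, not part of the notebook.

```python
import numpy as np
import pandas as pd


def fit_biases(df: pd.DataFrame) -> tuple[float, pd.Series, pd.Series]:
    """Hypothetical helper: (mu, user_bias, item_bias) computed as in BiasModel.fit."""
    mu = float(df["rating"].mean())
    user_bias = df.groupby("userId")["rating"].mean() - mu
    item_bias = df.groupby("movieId")["rating"].mean() - mu
    return mu, user_bias, item_bias


def predict_biases(
    df: pd.DataFrame, mu: float, user_bias: pd.Series, item_bias: pd.Series
) -> np.ndarray:
    """Hypothetical helper: vectorized mu + b_u + b_i; unseen ids contribute 0.0."""
    bu = df["userId"].map(user_bias).fillna(0.0)
    bi = df["movieId"].map(item_bias).fillna(0.0)
    return (mu + bu + bi).to_numpy(dtype=float)


# Toy data: global mean 3.5, user 1 averages 4.0, movie 10 averages 3.0.
train = pd.DataFrame(
    {"userId": [1, 1, 2, 2], "movieId": [10, 11, 10, 12], "rating": [4.0, 4.0, 2.0, 4.0]}
)
mu, user_bias, item_bias = fit_biases(train)

query = pd.DataFrame({"userId": [1, 99, 1], "movieId": [10, 10, 999]})
print(predict_biases(query, mu, user_bias, item_bias).tolist())  # [3.5, 3.0, 4.0]
```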


### 3.2 ItemKNNModel

In [None]:
class ItemKNNModel:
    """Item-based k-NN collaborative filtering model."""

    def __init__(self, k: int = 40, default_rating: float = 3.5) -> None:
        self.k = k
        self.default_rating = float(default_rating)
        self.user_id_to_index: Dict[int, int] = {}
        self.item_id_to_index: Dict[int, int] = {}
        self.R: np.ndarray | None = None
        self.item_sim: np.ndarray | None = None

    def fit(self, df: pd.DataFrame) -> None:
        if df.empty:
            raise ValueError("Training DataFrame is empty.")

        unique_users = df["userId"].unique()
        unique_items = df["movieId"].unique()

        self.user_id_to_index = {uid: idx for idx, uid in enumerate(unique_users)}
        self.item_id_to_index = {iid: idx for idx, iid in enumerate(unique_items)}

        n_users = len(unique_users)
        n_items = len(unique_items)
        R = np.zeros((n_users, n_items), dtype=np.float32)

        for row in df.itertuples(index=False):
            u_idx = self.user_id_to_index[row.userId]
            i_idx = self.item_id_to_index[row.movieId]
            R[u_idx, i_idx] = row.rating

        self.R = R
        self.item_sim = cosine_similarity(R.T)

    def _predict_single(self, user_id: int, movie_id: int) -> float:
        if self.R is None or self.item_sim is None:
            raise RuntimeError("Model not fitted.")

        u_idx = self.user_id_to_index.get(user_id)
        i_idx = self.item_id_to_index.get(movie_id)
        if u_idx is None or i_idx is None:
            return self.default_rating

        user_ratings = self.R[u_idx, :]
        sims = self.item_sim[i_idx, :]

        rated_mask = user_ratings > 0
        rated_indices = np.where(rated_mask)[0]
        if rated_indices.size == 0:
            return self.default_rating

        sims_rated = sims[rated_indices]
        ratings_rated = user_ratings[rated_indices]

        k_use = min(self.k, rated_indices.size)
        top_idx = np.argsort(sims_rated)[-k_use:]

        neigh_sims = sims_rated[top_idx]
        neigh_ratings = ratings_rated[top_idx]

        if np.all(neigh_sims == 0):
            return float(neigh_ratings.mean())

        pred = float(np.dot(neigh_sims, neigh_ratings) / np.sum(np.abs(neigh_sims)))
        return pred

    def predict_df(self, df: pd.DataFrame) -> np.ndarray:
        preds: List[float] = []
        for row in df.itertuples(index=False):
            preds.append(self._predict_single(int(row.userId), int(row.movieId)))
        return np.array(preds, dtype=float)


bias_global_mean = float(base_train_df["rating"].mean())
item_knn_model = ItemKNNModel(k=40, default_rating=bias_global_mean)
item_knn_model.fit(base_train_df)

item_meta_scores = item_knn_model.predict_df(meta_train_df)
print("Item-kNN RMSE on meta_train:", rmse(meta_train_df["rating"].to_numpy(), item_meta_scores))
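`_predict_single` scores movie `i` for user `u` as a similarity-weighted average over the (at most) `k` rated movies most similar to `i`: `pred = sum_j sim(i, j) * r_uj / sum_j |sim(i, j)|`. The standalone sketch below covers just that step on toy arrays. `knn_weighted_rating` is a hypothetical name.

```python
import numpy as np


def knn_weighted_rating(sims: np.ndarray, ratings: np.ndarray, k: int) -> float:
    """Hypothetical helper mirroring ItemKNNModel._predict_single.

    sims[j]: similarity between the target movie and the j-th movie the user rated.
    ratings[j]: the user's rating of that movie.
    """
    k_use = min(k, sims.size)
    top = np.argsort(sims)[-k_use:]  # indices of the k largest similarities
    s, r = sims[top], ratings[top]
    if np.all(s == 0):
        return float(r.mean())  # no similarity signal: plain mean of neighbours
    return float(np.dot(s, r) / np.sum(np.abs(s)))


sims = np.array([0.9, 0.1, 0.5, 0.0])
ratings = np.array([5.0, 1.0, 3.0, 2.0])
# With k=2 the neighbours are the 0.9 and 0.5 movies: (0.9*5 + 0.5*3) / (0.9 + 0.5).
print(knn_weighted_rating(sims, ratings, k=2))
```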


### 3.3 MatrixFactorizationModel

In [None]:
@dataclass
class MFConfig:
    n_factors: int = 30
    n_epochs: int = 8
    lr: float = 0.01
    reg: float = 0.05


class MatrixFactorizationModel:
    """Matrix factorisation with biases trained via SGD."""

    def __init__(self, config: MFConfig, random_state: int = 42) -> None:
        self.config = config
        self.random_state = random_state
        self.mu: float | None = None
        self.user_bias: Dict[int, float] = {}
        self.item_bias: Dict[int, float] = {}
        self.P: Dict[int, np.ndarray] = {}
        self.Q: Dict[int, np.ndarray] = {}

    def fit(self, df: pd.DataFrame) -> None:
        if df.empty:
            raise ValueError("Training DataFrame is empty.")

        rng = np.random.default_rng(self.random_state)

        user_ids = df["userId"].unique()
        item_ids = df["movieId"].unique()

        self.mu = float(df["rating"].mean())
        self.user_bias = {u: 0.0 for u in user_ids}
        self.item_bias = {i: 0.0 for i in item_ids}

        k = self.config.n_factors
        self.P = {u: 0.1 * rng.standard_normal(k) for u in user_ids}
        self.Q = {i: 0.1 * rng.standard_normal(k) for i in item_ids}

        lr = self.config.lr
        reg = self.config.reg

        user_arr = df["userId"].to_numpy()
        item_arr = df["movieId"].to_numpy()
        rating_arr = df["rating"].to_numpy()

        n_obs = len(df)

        for epoch in range(self.config.n_epochs):
            idx = rng.permutation(n_obs)
            se = 0.0
            for t in idx:
                u = int(user_arr[t])
                i = int(item_arr[t])
                r_ui = float(rating_arr[t])

                bu = self.user_bias[u]
                bi = self.item_bias[i]
                pu = self.P[u]
                qi = self.Q[i]

                pred = self.mu + bu + bi + float(np.dot(pu, qi))
                err = r_ui - pred
                se += err * err

                self.user_bias[u] = bu + lr * (err - reg * bu)
                self.item_bias[i] = bi + lr * (err - reg * bi)

                pu_new = pu + lr * (err * qi - reg * pu)
                qi_new = qi + lr * (err * pu - reg * qi)

                self.P[u] = pu_new
                self.Q[i] = qi_new

            train_rmse = float(np.sqrt(se / n_obs))
            print(f"Epoch {epoch+1}/{self.config.n_epochs} - train RMSE: {train_rmse:.4f}")

    def predict_single(self, user_id: int, movie_id: int) -> float:
        if self.mu is None:
            raise RuntimeError("Model not fitted.")
        bu = self.user_bias.get(user_id)
        bi = self.item_bias.get(movie_id)
        pu = self.P.get(user_id)
        qi = self.Q.get(movie_id)
        if bu is None or bi is None or pu is None or qi is None:
            return float(self.mu)
        return float(self.mu + bu + bi + np.dot(pu, qi))

    def predict_df(self, df: pd.DataFrame) -> np.ndarray:
        preds: List[float] = []
        for row in df.itertuples(index=False):
            preds.append(self.predict_single(int(row.userId), int(row.movieId)))
        return np.array(preds, dtype=float)


mf_config = MFConfig()
mf_model = MatrixFactorizationModel(config=mf_config, random_state=RANDOM_STATE)

mf_model.fit(base_train_df)

mf_meta_scores = mf_model.predict_df(meta_train_df)
print("MF model RMSE on meta_train:", rmse(meta_train_df["rating"].to_numpy(), mf_meta_scores))
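Each inner-loop iteration of `fit` is one SGD step on the regularized squared error for a single rating, `(r_ui - mu - b_u - b_i - p_u·q_i)^2 + reg * (b_u^2 + b_i^2 + |p_u|^2 + |q_i|^2)`, with constant factors absorbed into `lr`. The sketch below isolates that update. `mf_sgd_step` is a hypothetical name, and the toy loop repeatedly fits one rating to show the error shrinking.

```python
import numpy as np


def mf_sgd_step(mu, bu, bi, pu, qi, r_ui, lr=0.01, reg=0.05):
    """Hypothetical helper: one SGD update for a single rating, as in fit().

    Both factor updates use the old pu and qi, matching fit().
    Returns the updated parameters and the pre-update error.
    """
    err = r_ui - (mu + bu + bi + float(np.dot(pu, qi)))
    bu_new = bu + lr * (err - reg * bu)
    bi_new = bi + lr * (err - reg * bi)
    pu_new = pu + lr * (err * qi - reg * pu)
    qi_new = qi + lr * (err * pu - reg * qi)
    return bu_new, bi_new, pu_new, qi_new, err


rng = np.random.default_rng(0)
mu, bu, bi = 3.5, 0.0, 0.0
pu, qi = 0.1 * rng.standard_normal(5), 0.1 * rng.standard_normal(5)
r_ui = 5.0

errors = []
for _ in range(200):
    bu, bi, pu, qi, err = mf_sgd_step(mu, bu, bi, pu, qi, r_ui, lr=0.05)
    errors.append(abs(err))
print(f"|error| first step: {errors[0]:.3f}, after 200 steps: {errors[-1]:.3f}")
```

The regularization keeps the error from reaching exactly zero, which is the intended trade-off against overfitting.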


## 4. Genre-based content model

We add a simple content-based model using genres.

In [None]:
def build_genre_matrix(movies: pd.DataFrame) -> pd.DataFrame:
    """Build a multi-hot genre matrix with movieId as index."""
    movies = movies.copy()
    movies["genres_list"] = movies["genres"].fillna("(no genres listed)")

    def split_genres(s: str) -> List[str]:
        if s == "(no genres listed)" or s == "No Genres Listed":
            return []
        return [g.strip() for g in s.split("|") if g.strip()]

    movies["genres_list"] = movies["genres_list"].apply(split_genres)

    mlb = MultiLabelBinarizer()
    mat = mlb.fit_transform(movies["genres_list"])

    genre_df = pd.DataFrame(
        mat,
        columns=[f"genre:{g}" for g in mlb.classes_],
        index=movies["movieId"].to_numpy(),
    )
    return genre_df


genre_df = build_genre_matrix(movies_df)
print("Genre matrix shape:", genre_df.shape)
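`build_genre_matrix` turns the pipe-separated `genres` strings into one 0/1 column per genre, indexed by `movieId`. The condensed version below runs the same steps on a toy `movies` frame with made-up ids.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

toy_movies = pd.DataFrame(
    {"movieId": [1, 2, 3], "genres": ["Action|Comedy", "Comedy", "(no genres listed)"]}
)
# Same rule as split_genres: the placeholder becomes an empty genre list.
genre_lists = [
    [] if g == "(no genres listed)" else g.split("|") for g in toy_movies["genres"]
]

mlb = MultiLabelBinarizer()
toy_genre_df = pd.DataFrame(
    mlb.fit_transform(genre_lists),
    columns=[f"genre:{g}" for g in mlb.classes_],  # classes_ are sorted
    index=toy_movies["movieId"].to_numpy(),
)
print(toy_genre_df)
```

Movie 3 ends up as an all-zero row, which is why `GenreContentModel` returns 0.0 for such items.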


In [None]:
class GenreContentModel:
    """Content-based model using movie genres and user genre profiles."""

    def __init__(self, genre_df: pd.DataFrame, rel_threshold: float = 4.0) -> None:
        self.genre_df = genre_df
        self.rel_threshold = rel_threshold
        self.user_profiles: Dict[int, np.ndarray] = {}

    def fit(self, df: pd.DataFrame) -> None:
        liked = df[df["rating"] >= self.rel_threshold]
        joined = liked.merge(self.genre_df, left_on="movieId", right_index=True, how="left")
        genre_cols = self.genre_df.columns

        profiles = joined.groupby("userId")[genre_cols].mean().fillna(0.0)
        prof_mat = normalize(profiles.to_numpy(), norm="l2")

        for uid, vec in zip(profiles.index.to_numpy(), prof_mat):
            self.user_profiles[int(uid)] = vec

    def _score_single(self, user_id: int, movie_id: int) -> float:
        user_vec = self.user_profiles.get(user_id)
        if user_vec is None:
            return 0.0
        if movie_id not in self.genre_df.index:
            return 0.0
        item_vec = self.genre_df.loc[movie_id].to_numpy(dtype=float)
        if not np.any(item_vec):
            return 0.0
        item_vec_norm = item_vec / (np.linalg.norm(item_vec) + 1e-12)
        return float(np.dot(user_vec, item_vec_norm))

    def predict_df(self, df: pd.DataFrame) -> np.ndarray:
        scores: List[float] = []
        for row in df.itertuples(index=False):
            scores.append(self._score_single(int(row.userId), int(row.movieId)))
        return np.array(scores, dtype=float)


genre_model = GenreContentModel(genre_df=genre_df, rel_threshold=4.0)
genre_model.fit(base_train_df)

genre_meta_scores = genre_model.predict_df(meta_train_df)
print("Genre-content scores on meta_train: min=", genre_meta_scores.min(), ", max=", genre_meta_scores.max())
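In `GenreContentModel`, a user's profile is the L2-normalized mean genre vector of the movies they rated at or above `rel_threshold`. A movie's score is the cosine between that profile and the movie's normalized genre vector. Below is a toy calculation with three illustrative genre columns. `genre_score` is a hypothetical name.

```python
import numpy as np

# Multi-hot genre vectors (columns: Action, Comedy, Drama) for two liked movies.
liked = np.array([[1, 1, 0],
                  [1, 0, 0]], dtype=float)
profile = liked.mean(axis=0)        # [1.0, 0.5, 0.0]
profile /= np.linalg.norm(profile)  # L2-normalize, as normalize(norm="l2") does


def genre_score(profile: np.ndarray, item_vec: np.ndarray) -> float:
    """Hypothetical helper: cosine between a unit-norm profile and an item vector."""
    if not np.any(item_vec):
        return 0.0  # movies without genres score 0, as in _score_single
    return float(profile @ (item_vec / np.linalg.norm(item_vec)))


print(genre_score(profile, np.array([1.0, 0.0, 0.0])))  # pure Action: 2/sqrt(5)
print(genre_score(profile, np.array([0.0, 0.0, 1.0])))  # pure Drama: 0.0
```

Scores lie in [0, 1] because all vectors are non-negative.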


## 5. Relevance and ranking metrics

In [None]:
REL_THRESHOLD: float = 4.0
K_EVAL: int = 10


def get_user_relevant_items(df: pd.DataFrame, user_id: int, threshold: float = REL_THRESHOLD) -> Set[int]:
    mask = (df["userId"] == user_id) & (df["rating"] >= threshold)
    return set(df.loc[mask, "movieId"].unique())


def ranking_metrics_for_model_on_test(
    test_df: pd.DataFrame,
    scores: np.ndarray,
    k: int = K_EVAL,
    threshold: float = REL_THRESHOLD,
) -> Dict[str, float]:
    df_scores = test_df.copy()
    df_scores["score"] = scores

    users = df_scores["userId"].unique()

    hits = 0
    sum_precision = 0.0
    sum_recall = 0.0
    n_eval_users = 0

    for u in users:
        user_rows = df_scores[df_scores["userId"] == u]
        relevant_items = get_user_relevant_items(test_df, u, threshold=threshold)
        if not relevant_items:
            continue

        n_eval_users += 1

        user_sorted = user_rows.sort_values("score", ascending=False)
        top_k = user_sorted.head(k)

        recommended_items = set(top_k["movieId"].tolist())
        n_rel_top = len(recommended_items & relevant_items)

        if n_rel_top > 0:
            hits += 1

        precision_u = n_rel_top / min(k, len(user_sorted))
        recall_u = n_rel_top / len(relevant_items)

        sum_precision += precision_u
        sum_recall += recall_u

    if n_eval_users == 0:
        raise ValueError("No users with relevant items in test.")

    hit_rate = hits / n_eval_users
    precision_at_k = sum_precision / n_eval_users
    recall_at_k = sum_recall / n_eval_users

    return {
        "hit_rate": hit_rate,
        "precision_at_k": precision_at_k,
        "recall_at_k": recall_at_k,
        "n_eval_users": float(n_eval_users),
    }
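These metrics re-rank only each user's held-out test ratings, not the full catalogue. They therefore measure how well a model orders movies the user actually rated. For a single user, the computation reduces to the following, shown with toy ids, scores and ratings.

```python
import numpy as np

# One user's held-out movies, model scores and true ratings (toy values).
movie_ids = np.array([10, 11, 12, 13, 14])
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.8])
ratings = np.array([5.0, 4.5, 2.0, 4.0, 3.0])
k, threshold = 3, 4.0

relevant = set(movie_ids[ratings >= threshold].tolist())  # movies 10, 11, 13
top_k = set(movie_ids[np.argsort(-scores)[:k]].tolist())   # movies 10, 14, 12
n_rel_top = len(top_k & relevant)                          # only movie 10

hit = int(n_rel_top > 0)
precision = n_rel_top / min(k, len(movie_ids))
recall = n_rel_top / len(relevant)
print(hit, precision, recall)
```

`ranking_metrics_for_model_on_test` averages these per-user values over all users with at least one relevant test movie.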


## 6. Base models (4) as rankers on test

In [None]:
y_test = test_df["rating"].to_numpy()

bias_test_scores = bias_model.predict_df(test_df)
item_test_scores = item_knn_model.predict_df(test_df)
mf_test_scores = mf_model.predict_df(test_df)
genre_test_scores = genre_model.predict_df(test_df)

metrics_bias = ranking_metrics_for_model_on_test(test_df, bias_test_scores, k=K_EVAL)
metrics_item = ranking_metrics_for_model_on_test(test_df, item_test_scores, k=K_EVAL)
metrics_mf = ranking_metrics_for_model_on_test(test_df, mf_test_scores, k=K_EVAL)
metrics_genre = ranking_metrics_for_model_on_test(test_df, genre_test_scores, k=K_EVAL)

# Simple average of 4 models
avg4_scores = (bias_test_scores + item_test_scores + mf_test_scores + genre_test_scores) / 4.0
metrics_avg4 = ranking_metrics_for_model_on_test(test_df, avg4_scores, k=K_EVAL)

metrics_bias, metrics_item, metrics_mf, metrics_genre, metrics_avg4
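`SimpleAverage4` adds scores on different scales. The bias, item-kNN and MF models predict on the 0.5 to 5 rating scale, while the genre score is a cosine in [0, 1], so each model's weight in the within-user ordering may differ from what an equal average suggests. A common scale-free alternative is to average per-user percentile ranks. The sketch below shows this on toy data. `rank_average` is a hypothetical helper and is not used elsewhere in the notebook.

```python
import pandas as pd


def rank_average(df: pd.DataFrame, score_cols: list[str], user_col: str = "userId") -> pd.Series:
    """Hypothetical helper: mean of per-user percentile ranks of each score column.

    Ranking within each user removes scale differences between models;
    ties receive their average rank.
    """
    ranks = df.groupby(user_col)[score_cols].rank(pct=True)
    return ranks.mean(axis=1)


toy = pd.DataFrame(
    {
        "userId": [1, 1, 1, 2, 2],
        "score_mf": [4.5, 3.0, 3.8, 2.0, 4.0],     # rating scale
        "score_genre": [0.6, 0.9, 0.1, 0.3, 0.8],  # cosine scale
    }
)
toy["rank_avg"] = rank_average(toy, ["score_mf", "score_genre"])
print(toy)
```

For the notebook you could attach the four test-score arrays as columns of `test_df` and pass the result to `ranking_metrics_for_model_on_test`.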


In [None]:
rows = []
for name, m in [
    ("BiasModel", metrics_bias),
    ("ItemKNNModel", metrics_item),
    ("MFModel", metrics_mf),
    ("GenreContent", metrics_genre),
    ("SimpleAverage4", metrics_avg4),
]:
    rows.append(
        {
            "model": name,
            "hit_rate": m["hit_rate"],
            "precision_at_k": m["precision_at_k"],
            "recall_at_k": m["recall_at_k"],
        }
    )

base_rank_df = pd.DataFrame(rows)
base_rank_df


In [None]:
base_rank_melt = base_rank_df.melt(id_vars="model", var_name="metric", value_name="value")

sns.barplot(data=base_rank_melt, x="metric", y="value", hue="model")
plt.ylim(0, 1)
plt.ylabel("Score")
plt.title(f"Base models vs simple average (K={K_EVAL})")
plt.show()


## 7. Meta-features for learning-to-rank

We now build a richer feature set for the meta-model: the four base-model scores plus simple user and item statistics.

### 7.1 User and item statistics

We compute simple, interpretable statistics that are often useful in
production recommenders:

- User-level:
  - `user_n_ratings` (activity).
  - `user_mean_rating`.
- Item-level:
  - `item_n_ratings` (popularity).
  - `item_mean_rating`.

We will join these to both `meta_train_df` and `test_df`.


In [None]:
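# User stats on base_train_df only. base_train_df is disjoint from meta_train_df
# and test_df, so a row's own rating never leaks into its features (stats on
# train_full_df would include each meta-training row's own label).
user_stats = base_train_df.groupby("userId")["rating"].agg(
    user_n_ratings="count",
    user_mean_rating="mean",
).reset_index()

# Item stats on base_train_df, for the same reason
item_stats = base_train_df.groupby("movieId")["rating"].agg(
    item_n_ratings="count",
    item_mean_rating="mean",
).reset_index()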

user_stats.head(), item_stats.head()
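Statistics used as meta-features should not include the rating they describe. If `item_mean_rating` for a meta-training row contains that row's own rating, the meta-model sees a leaked version of its label, especially for rarely rated movies, and that signal is absent at test time. Computing the stats on data disjoint from the rows they describe avoids this. A leave-one-out mean is another common option; the sketch below shows it on toy data. `loo_item_mean` is a hypothetical helper.

```python
import pandas as pd


def loo_item_mean(df: pd.DataFrame, fill: float) -> pd.Series:
    """Hypothetical helper: per-row item mean rating excluding the row's own rating.

    Items rated only once have no other ratings, so they get `fill`.
    """
    grp = df.groupby("movieId")["rating"]
    total = grp.transform("sum")
    count = grp.transform("count")
    loo = (total - df["rating"]) / (count - 1)  # 0/0 -> NaN for single-rating items
    return loo.where(count > 1, fill)


toy = pd.DataFrame({"movieId": [1, 1, 1, 2], "rating": [5.0, 3.0, 4.0, 2.0]})
toy["item_mean_loo"] = loo_item_mean(toy, fill=3.5)
print(toy)
```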


### 7.2 Build meta-training feature matrix X_meta_ltr

In [None]:
# Join stats onto meta_train_df
meta_join = meta_train_df.merge(user_stats, on="userId", how="left").merge(
    item_stats, on="movieId", how="left"
)

# Base scores we already computed on meta_train_df
meta_join = meta_join.copy()
meta_join["score_bias"] = bias_meta_scores
meta_join["score_itemknn"] = item_meta_scores
meta_join["score_mf"] = mf_meta_scores
meta_join["score_genre"] = genre_meta_scores

# Fill missing stats for cold users/items: 0 ratings and the global mean rating.
# Reassign instead of calling column.fillna(..., inplace=True): that is chained
# assignment and does not update meta_join when pandas Copy-on-Write is enabled.
fill_mean = float(train_full_df["rating"].mean())
meta_join = meta_join.fillna(
    {
        "user_n_ratings": 0,
        "user_mean_rating": fill_mean,
        "item_n_ratings": 0,
        "item_mean_rating": fill_mean,
    }
)

feature_cols = [
    "score_bias",
    "score_itemknn",
    "score_mf",
    "score_genre",
    "user_n_ratings",
    "user_mean_rating",
    "item_n_ratings",
    "item_mean_rating",
]

X_meta_ltr = meta_join[feature_cols].to_numpy(dtype=float)
y_meta_ltr = (meta_join["rating"].to_numpy() >= REL_THRESHOLD).astype(int)

print("X_meta_ltr shape:", X_meta_ltr.shape)
print("Positive rate (meta_ltr):", y_meta_ltr.mean())


## 8. Meta-models: LogisticRegression vs GradientBoostingClassifier

In [None]:
# Baseline meta-model: logistic regression

meta_logit = LogisticRegression(
    penalty="l2",
    C=1.0,
    solver="lbfgs",
    max_iter=1000,
    random_state=RANDOM_STATE,
)
meta_logit.fit(X_meta_ltr, y_meta_ltr)

print("Logistic coefficients:", meta_logit.coef_)
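The raw coefficients are hard to compare because the features sit on very different scales. `user_n_ratings` can be in the hundreds or more, while the scores lie roughly in [0, 5]. The sketch below standardizes features first with scikit-learn's `StandardScaler` in a pipeline, so each coefficient is per one standard deviation. It runs on synthetic data so the block is standalone. On the notebook data you would fit it on `X_meta_ltr`, `y_meta_ltr` and label the coefficients with `feature_cols`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins: a rating-scale score and a count feature on a much larger scale.
score = rng.uniform(0.5, 5.0, n)
n_ratings = rng.integers(1, 2000, n).astype(float)
X = np.column_stack([score, n_ratings])
y = (score + rng.normal(0, 0.5, n) >= 3.5).astype(int)  # only the score matters

scaled_logit = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_logit.fit(X, y)

coefs = pd.Series(
    scaled_logit[-1].coef_[0], index=["score_synthetic", "n_ratings_synthetic"]
)
print(coefs)  # log-odds change per one standard deviation of each feature
```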


In [None]:
# Gradient boosting meta-model (tree-based LTR-style)

gb_meta = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=3,
    subsample=0.8,
    random_state=RANDOM_STATE,
)

gb_meta.fit(X_meta_ltr, y_meta_ltr)

print("GradientBoosting trained. Number of features:", len(feature_cols))
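Here `n_estimators=200` is fixed. `GradientBoostingClassifier` can instead stop adding trees once an internal validation loss stops improving, via `validation_fraction` and `n_iter_no_change`; the number of trees actually fitted is then stored in `n_estimators_`. The sketch below runs on synthetic data from `make_classification`, and its parameter values are illustrative rather than tuned for MovieLens.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=8, n_informative=4, random_state=42)

gb_es = GradientBoostingClassifier(
    n_estimators=1000,        # upper bound on the number of trees
    learning_rate=0.05,
    max_depth=3,
    subsample=0.8,
    validation_fraction=0.1,  # share of training data held out to monitor the loss
    n_iter_no_change=10,      # stop after 10 iterations without improvement
    random_state=42,
)
gb_es.fit(X, y)
print("Trees fitted:", gb_es.n_estimators_)
```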


## 9. Build test feature matrix and evaluate ranking

In [None]:
# Join stats onto test_df

test_join = test_df.merge(user_stats, on="userId", how="left").merge(
    item_stats, on="movieId", how="left"
)

# Add base scores

test_join = test_join.copy()
test_join["score_bias"] = bias_test_scores
test_join["score_itemknn"] = item_test_scores
test_join["score_mf"] = mf_test_scores
test_join["score_genre"] = genre_test_scores

# Fill missing stats for cold users/items: 0 ratings and the global mean rating.
# Reassign instead of column.fillna(..., inplace=True) (chained assignment,
# which does not update test_join when pandas Copy-on-Write is enabled).
fill_mean = float(train_full_df["rating"].mean())
test_join = test_join.fillna(
    {
        "user_n_ratings": 0,
        "user_mean_rating": fill_mean,
        "item_n_ratings": 0,
        "item_mean_rating": fill_mean,
    }
)

X_test_ltr = test_join[feature_cols].to_numpy(dtype=float)

# Logistic meta scores (probability of relevance)
logit_test_proba = meta_logit.predict_proba(X_test_ltr)[:, 1]

# Gradient boosting meta scores
gb_test_proba = gb_meta.predict_proba(X_test_ltr)[:, 1]

metrics_logit = ranking_metrics_for_model_on_test(test_df, logit_test_proba, k=K_EVAL)
metrics_gb = ranking_metrics_for_model_on_test(test_df, gb_test_proba, k=K_EVAL)

metrics_logit, metrics_gb


In [None]:
# Collect all models (base + simple avg + meta-models) for comparison

rows_full = rows.copy()  # from base_rank_df construction
rows_full.append(
    {
        "model": "LogisticStack",
        "hit_rate": metrics_logit["hit_rate"],
        "precision_at_k": metrics_logit["precision_at_k"],
        "recall_at_k": metrics_logit["recall_at_k"],
    }
)
rows_full.append(
    {
        "model": "GBStack",
        "hit_rate": metrics_gb["hit_rate"],
        "precision_at_k": metrics_gb["precision_at_k"],
        "recall_at_k": metrics_gb["recall_at_k"],
    }
)

ltr_compare_df = pd.DataFrame(rows_full)
ltr_compare_df


In [None]:
ltr_compare_melt = ltr_compare_df.melt(id_vars="model", var_name="metric", value_name="value")

sns.barplot(data=ltr_compare_melt, x="metric", y="value", hue="model")
plt.ylim(0, 1)
plt.ylabel("Score")
plt.title(f"Base models vs LTR stacks (K={K_EVAL})")
plt.show()


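Compare `GBStack` with the best single base model, `SimpleAverage4` and
`LogisticStack`. The tree-based stack is expected to be at least
competitive with the best base model. On a dataset this small, however,
the margins can be small and can change with the random split, so check
them over several seeds before drawing conclusions. In production
systems the gain from such stacks typically grows as more features and
base models are added.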


## 10. Feature importances in the Gradient Boosting stack

In [None]:
importances = gb_meta.feature_importances_

feat_imp_df = pd.DataFrame({
    "feature": feature_cols,
    "importance": importances,
}).sort_values("importance", ascending=False)

feat_imp_df


In [None]:
sns.barplot(data=feat_imp_df, x="importance", y="feature")
plt.title("Gradient boosting meta-model feature importances")
plt.xlabel("Importance")
plt.ylabel("")
plt.show()
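`feature_importances_` is impurity-based. It is computed on the training data and tends to favour continuous, high-cardinality features such as the count and mean statistics. Permutation importance on held-out data is a common cross-check, implemented by `sklearn.inspection.permutation_importance`. The sketch below runs on synthetic data. With the notebook objects you would pass `gb_meta`, `X_test_ltr` and `(test_df["rating"] >= REL_THRESHOLD).astype(int)`, then label rows with `feature_cols`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# shuffle=False keeps the 2 informative features in columns 0 and 1; the rest are noise.
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=2, n_redundant=0, shuffle=False, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Mean drop in held-out ROC AUC when each column is shuffled, over 10 repeats.
result = permutation_importance(gb, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=0)
perm_df = pd.DataFrame(
    {
        "feature": [f"f{i}" for i in range(X.shape[1])],
        "importance_mean": result.importances_mean,
        "importance_std": result.importances_std,
    }
).sort_values("importance_mean", ascending=False)
print(perm_df)
```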


Typical patterns you might see:

- `score_mf` and `score_itemknn` carry strong signal.
- `score_genre` helps in cold or niche areas.
- `item_n_ratings` (popularity) and `user_n_ratings` (activity) are
  often surprisingly informative.

Because trees can model interactions, the model can learn rules like:

- "If item is very unpopular but genre score is high, still recommend."
- "If user is low-activity, rely more on content and biases."

## 11. Design summary and further extensions

We now have a **learning-to-rank stack** on top of collaborative and
content recommenders, with extra user/item features.

### 11.2 Natural next steps

If you want to push even further:

1. **More base signals**
   - Add embeddings from text or plots.
   - Add LightFM or other hybrid models as additional score features.

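2. **Pairwise / listwise LTR**
   - Replace the pointwise classifier with a ranking objective such as
     LambdaRank / LambdaMART (e.g. LightGBM's `LGBMRanker` or XGBoost's
     `XGBRanker`), which optimize pairwise or listwise losses over each
     user's candidate list.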

3. **Contextual features**
   - Time of day, device, country, campaign, etc.

4. **Per-segment meta-models**
   - Train different meta-models for user segments (e.g. power users vs
     casual users).
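Ranking learners for the pairwise / listwise step expect rows grouped by query (here, a user) plus a `group` array giving the size of each consecutive block. LightGBM's `LGBMRanker.fit(X, y, group=group)` is one example. The library-free sketch below prepares those inputs from a ratings frame. The graded label `floor(rating)` is an illustrative assumption, `make_ltr_groups` is a hypothetical helper, and the ranker call itself is omitted so the block runs without LightGBM installed.

```python
import numpy as np
import pandas as pd


def make_ltr_groups(
    df: pd.DataFrame, feature_cols: list[str]
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Hypothetical helper: sort rows by user and return (X, y, group).

    group[g] is the number of consecutive rows belonging to the g-th user.
    y is a graded relevance label, floor(rating), chosen for illustration.
    """
    df_sorted = df.sort_values("userId", kind="stable")
    X = df_sorted[feature_cols].to_numpy(dtype=float)
    y = np.floor(df_sorted["rating"].to_numpy()).astype(int)
    group = df_sorted.groupby("userId", sort=True).size().to_numpy()
    return X, y, group


toy = pd.DataFrame(
    {
        "userId": [2, 1, 2, 1, 1],
        "rating": [4.5, 3.0, 2.0, 5.0, 0.5],
        "score_mf": [4.1, 3.2, 2.5, 4.6, 1.0],
    }
)
X, y, group = make_ltr_groups(toy, ["score_mf"])
print(group.tolist(), y.tolist())  # [3, 2] [3, 5, 0, 4, 2]
```

Because `group` only encodes block sizes, the rows must stay sorted by user in the same order used to compute it.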

This notebook shows how to go from **stacked ensembles** to a more
realistic **LTR meta-ranker**, while keeping the logic transparent and
well-commented so you can adapt it to your own production setting.
