# MovieLens Recommender – Modern Libraries & Advanced Evaluation

This notebook pushes the MovieLens project further using **specialised
recommender libraries** and richer evaluation.

We will:

1. Use **Surprise** to train strong classic models (SVD, KNNBaseline).
2. Build an **implicit feedback** dataset and train a **LightFM** model.
3. Evaluate both explicit and implicit models with:
   - RMSE (ratings).
   - Precision@K / Recall@K / Hit-rate (ranking).
4. Visually inspect the learned item **embeddings**.

> This notebook assumes you have already downloaded MovieLens
> `ml-latest-small` and placed it under:
>
> ```text
> data/ml-latest-small/ratings.csv
> data/ml-latest-small/movies.csv   # optional but recommended
> ```

> It also assumes you can install extra packages in your environment
> (Surprise, LightFM).


## 0. Environment setup (to run outside this notebook)

Run these commands in your environment **before** executing the notebook
cells that import Surprise or LightFM:

```bash
pip install scikit-surprise lightfm
```

If you use conda:

```bash
conda install -c conda-forge scikit-surprise lightfm
```
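You can confirm that the installs worked before running the modelling cells with a small standard-library check. This is a sketch: `check_packages` is a hypothetical helper, and `sklearn` is included because section 4 imports scikit-learn. The `scikit-surprise` package installs the module `surprise`.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(modules: Iterable[str]) -> Dict[str, bool]:
    """Report whether each top-level module name can be found on the current path."""
    # find_spec returns None when a top-level module cannot be located
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_packages(["surprise", "lightfm", "sklearn"])
missing = [name for name, ok in status.items() if not ok]
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All recommender dependencies found.")
```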


## 1. Imports and configuration

We use:

- `pandas`, `numpy` – general data handling.
- `matplotlib`, `seaborn` – visualisations.
- `surprise` – SVD and KNNBaseline models for explicit ratings.
- `lightfm` – matrix factorisation model for implicit feedback.
- `scikit-learn` – PCA and t-SNE for visualising embeddings.


In [None]:
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List, Tuple, Set

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")
plt.rcParams["figure.figsize"] = (8, 5)

RANDOM_STATE: int = 42
np.random.seed(RANDOM_STATE)

DATA_DIR: Path = Path("data") / "ml-latest-small"
RATINGS_PATH: Path = DATA_DIR / "ratings.csv"
MOVIES_PATH: Path = DATA_DIR / "movies.csv"

if not RATINGS_PATH.exists():
    raise FileNotFoundError(
        f"Ratings file not found at {RATINGS_PATH.resolve()}. "
        "Please download MovieLens (ml-latest-small) and place ratings.csv under data/ml-latest-small/."
    )

ratings_df = pd.read_csv(RATINGS_PATH)
print("Ratings shape:", ratings_df.shape)
ratings_df.head()


Quick sanity checks and visuals, similar to those in the previous notebook
but kept brief here so we can focus on modelling.


In [None]:
n_users = ratings_df["userId"].nunique()
n_items = ratings_df["movieId"].nunique()

print(f"Users: {n_users}, Movies: {n_items}, Ratings: {len(ratings_df)}")
print(f"Density: {len(ratings_df) / (n_users * n_items):.6f}")

sns.histplot(ratings_df["rating"], bins=10)
plt.title("Rating distribution")
plt.xlabel("Rating")
plt.show()


## 2. Surprise – Explicit Feedback Models (SVD, KNNBaseline)

The **Surprise** library provides efficient implementations of many
classic collaborative filtering algorithms with convenient evaluation.

We will:

1. Wrap MovieLens ratings into a Surprise `Dataset`.
2. Train:
   - **SVD** (matrix factorisation with biases).
   - **KNNBaseline** (neighbour-based with baseline estimates).
3. Evaluate **RMSE** and **MAE** on a hold-out set.
4. Derive **ranking metrics** (Precision@K, Recall@K) from predictions.


In [None]:
from surprise import Dataset, Reader, SVD, KNNBaseline
from surprise.model_selection import train_test_split as surprise_train_test_split
from surprise import accuracy

# Wrap DataFrame into Surprise Dataset
reader = Reader(rating_scale=(ratings_df["rating"].min(), ratings_df["rating"].max()))
data = Dataset.load_from_df(ratings_df[["userId", "movieId", "rating"]], reader)

trainset, testset = surprise_train_test_split(data, test_size=0.2, random_state=RANDOM_STATE)
# all_ratings() is a generator (no len()), so use the n_ratings attribute
print("Surprise train size:", trainset.n_ratings)
print("Surprise test size: ", len(testset))


### 2.1 SVD model

We start with SVD, which is a strong baseline: latent factors + biases
trained with SGD.
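Surprise's SVD predicts `r_hat = mu + b_u + b_i + q_i · p_u` and fits it by SGD over the observed ratings. The NumPy sketch below shows one such update on a single rating, following the update rules given in the Surprise documentation; it is an illustration, not Surprise's implementation (which also handles initialisation, shuffling and epochs). `svd_sgd_step` is a hypothetical name, and its defaults copy the `lr_all` / `reg_all` values used in the next cell.

```python
import numpy as np


def svd_sgd_step(mu, bu, bi, P, Q, u, i, r_ui, lr=0.005, reg=0.02):
    """One SGD update of a biased matrix-factorisation model on rating r_ui.

    Prediction: r_hat = mu + bu[u] + bi[i] + Q[i] @ P[u]. Arrays are updated in place.
    """
    err = r_ui - (mu + bu[u] + bi[i] + Q[i] @ P[u])
    bu[u] += lr * (err - reg * bu[u])
    bi[i] += lr * (err - reg * bi[i])
    pu_old = P[u].copy()  # both factor updates use the pre-update user vector
    P[u] += lr * (err * Q[i] - reg * P[u])
    Q[i] += lr * (err * pu_old - reg * Q[i])
    return float(err)


# Toy usage: repeatedly fit a single rating of 5.0 with a global mean of 3.5
rng = np.random.default_rng(42)
mu = 3.5
bu, bi = np.zeros(3), np.zeros(4)
P = rng.normal(0, 0.1, size=(3, 5))  # user factors
Q = rng.normal(0, 0.1, size=(4, 5))  # item factors
errors = [svd_sgd_step(mu, bu, bi, P, Q, u=0, i=1, r_ui=5.0, lr=0.05) for _ in range(100)]
print(f"error before: {errors[0]:.3f}, after 100 steps: {errors[-1]:.3f}")
```

The larger learning rate in the toy loop only makes the shrinking error visible within a few steps.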


In [None]:
svd = SVD(
    n_factors=80,
    n_epochs=25,
    lr_all=0.005,
    reg_all=0.02,
    random_state=RANDOM_STATE,
)

svd.fit(trainset)

svd_predictions = svd.test(testset)

rmse_svd = accuracy.rmse(svd_predictions, verbose=True)
mae_svd = accuracy.mae(svd_predictions, verbose=True)

print(f"SVD RMSE: {rmse_svd:.4f}, MAE: {mae_svd:.4f}")


### 2.2 KNNBaseline model (item-based)

Next we use `KNNBaseline`, which combines:

- Baseline estimates: \( \mu + b_u + b_i \).
- k-NN similarity on items.
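For reference, the Surprise documentation writes the item-based `KNNBaseline` estimate as below, with \( b_{ui} = \mu + b_u + b_i \) and \( N^k_u(i) \) the (at most \(k\)) items most similar to \(i\) that user \(u\) has rated. Per the docs, when fewer than `min_k` such neighbours exist, the neighbourhood term is dropped and the estimate falls back to the baseline.

$$
\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in N^k_u(i)} \operatorname{sim}(i, j)\,\bigl(r_{uj} - b_{uj}\bigr)}{\sum_{j \in N^k_u(i)} \operatorname{sim}(i, j)}
$$

The fraction is a similarity-weighted average of how far the user's ratings of neighbouring items sit above or below their own baselines.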


In [None]:
sim_options = {
    "name": "pearson_baseline",
    "user_based": False,  # item-based
}

knn_baseline = KNNBaseline(
    k=40,
    min_k=2,
    sim_options=sim_options,
)

knn_baseline.fit(trainset)
knn_predictions = knn_baseline.test(testset)

rmse_knn = accuracy.rmse(knn_predictions, verbose=True)
mae_knn = accuracy.mae(knn_predictions, verbose=True)

print(f"KNNBaseline RMSE: {rmse_knn:.4f}, MAE: {mae_knn:.4f}")


In [None]:
# Visual comparison (RMSE)

rmse_df = pd.DataFrame(
    {
        "model": ["SVD", "KNNBaseline"],
        "rmse": [rmse_svd, rmse_knn],
    }
)

sns.barplot(data=rmse_df, x="model", y="rmse")
plt.title("Surprise models – RMSE")
plt.ylabel("RMSE")
plt.show()

rmse_df.sort_values("rmse")


### 2.3 Ranking metrics from Surprise predictions

Surprise focuses on rating prediction, but we can **post-process** its
predictions to compute ranking metrics.

Strategy:

- For each user in the test set:
  - Consider items in the test set as candidate items.
  - Treat ratings ≥ 4.0 as relevant.
  - Use SVD predictions to rank items.
  - Compute Precision@K / Recall@K / Hit-rate.

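Because the candidates are only the items each user rated in the test split,
this measures how well the model orders items the user is known to have
rated. It is an optimistic estimate: the candidate set is small and already
enriched with relevant items. A more realistic setting would rank all items
the user has not seen in training.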
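The more realistic protocol needs scores for every item, not just the test items. Given a dense score matrix (for SVD you could fill one with `svd.predict(uid, iid).est`, which is slow but simple), ranking only unseen items can be sketched in NumPy as follows. `top_k_unseen` is a hypothetical helper and the toy scores are made up.

```python
import numpy as np


def top_k_unseen(scores: np.ndarray, seen: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring unseen items for each user.

    scores: (n_users, n_items) predicted scores.
    seen:   boolean mask of the same shape; True marks items seen in training.
    If a user has fewer than k unseen items, seen items pad the end of the list.
    """
    masked = np.where(seen, -np.inf, scores)  # seen items sink to the bottom
    k = min(k, scores.shape[1])
    # argpartition finds the top-k set in linear time; then sort those k by score
    top = np.argpartition(-masked, k - 1, axis=1)[:, :k]
    order = np.argsort(-np.take_along_axis(masked, top, axis=1), axis=1)
    return np.take_along_axis(top, order, axis=1)


scores = np.array([[0.9, 0.1, 0.8, 0.3],
                   [0.2, 0.7, 0.4, 0.5]])
seen = np.array([[True, False, False, False],
                 [False, True, False, False]])
print(top_k_unseen(scores, seen, k=2))  # user 0 -> items 2, 3; user 1 -> items 3, 2
```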


In [None]:
from collections import defaultdict


def build_user_item_true_pred(
    predictions,
) -> Dict[int, Dict[str, Dict[int, float]]]:
    """Organise Surprise predictions by user.

    Args:
        predictions: List of Surprise Prediction objects.

    Returns:
        Nested dict: user -> {"true": {item: rating}, "pred": {item: est}}.
    """
    data: Dict[int, Dict[str, Dict[int, float]]] = defaultdict(lambda: {"true": {}, "pred": {}})
    for pred in predictions:
        uid = int(pred.uid)
        iid = int(pred.iid)
        data[uid]["true"][iid] = float(pred.r_ui)  # observed test rating
        data[uid]["pred"][iid] = float(pred.est)  # model estimate
    return data


def ranking_metrics_from_predictions(
    predictions,
    k: int = 10,
    rel_threshold: float = 4.0,
) -> Dict[str, float]:
    """Compute hit-rate, precision@K, recall@K from Surprise predictions.

    Only users with at least one relevant test item are evaluated. Candidates
    are the user's own test items; precision divides by K even when a user
    has fewer than K test items.

    Args:
        predictions: List of Surprise Prediction objects.
        k: Cutoff for top-K.
        rel_threshold: Relevance threshold on true ratings.

    Returns:
        Dict with metrics.
    """
    data = build_user_item_true_pred(predictions)

    hits = 0
    sum_precision = 0.0
    sum_recall = 0.0
    n_eval_users = 0

    for d in data.values():
        true_dict = d["true"]
        pred_dict = d["pred"]

        relevant_items: Set[int] = {i for i, r in true_dict.items() if r >= rel_threshold}
        if not relevant_items:
            continue  # skip users with no relevant test items
        n_eval_users += 1

        # Rank the user's test items by predicted rating (descending), keep top-K
        sorted_items = sorted(pred_dict.items(), key=lambda x: x[1], reverse=True)
        top_set = {iid for iid, _ in sorted_items[:k]}

        n_rel_in_top = len(top_set & relevant_items)
        if n_rel_in_top > 0:
            hits += 1

        sum_precision += n_rel_in_top / k
        sum_recall += n_rel_in_top / len(relevant_items)

    if n_eval_users == 0:
        raise ValueError("No users with relevant items for ranking metrics.")

    return {
        "hit_rate": hits / n_eval_users,
        "precision_at_k": sum_precision / n_eval_users,
        "recall_at_k": sum_recall / n_eval_users,
        "n_eval_users": float(n_eval_users),
    }


svd_rank_metrics_k10 = ranking_metrics_from_predictions(svd_predictions, k=10, rel_threshold=4.0)
knn_rank_metrics_k10 = ranking_metrics_from_predictions(knn_predictions, k=10, rel_threshold=4.0)

svd_rank_metrics_k10, knn_rank_metrics_k10


In [None]:
# Visual comparison of ranking metrics (K=10)

metrics_svd = pd.DataFrame(
    {
        "metric": ["hit_rate", "precision_at_k", "recall_at_k"],
        "value": [
            svd_rank_metrics_k10["hit_rate"],
            svd_rank_metrics_k10["precision_at_k"],
            svd_rank_metrics_k10["recall_at_k"],
        ],
    }
)
metrics_svd["model"] = "SVD"

metrics_knn = pd.DataFrame(
    {
        "metric": ["hit_rate", "precision_at_k", "recall_at_k"],
        "value": [
            knn_rank_metrics_k10["hit_rate"],
            knn_rank_metrics_k10["precision_at_k"],
            knn_rank_metrics_k10["recall_at_k"],
        ],
    }
)
metrics_knn["model"] = "KNNBaseline"

metrics_all = pd.concat([metrics_svd, metrics_knn], axis=0)

sns.barplot(data=metrics_all, x="metric", y="value", hue="model")
plt.ylim(0, 1)
plt.title("Surprise models – ranking metrics (K=10)")
plt.ylabel("Score")
plt.show()


Surprise gives us strong baselines with relatively little code. Next we
shift to **implicit feedback**, which is closer to many production setups.


## 3. LightFM – Implicit Feedback Model

Many modern recommenders are based on **implicit feedback**:

- Views, clicks, watches, purchases.
- No explicit ratings, but we interpret interactions as *positive signals*.

To approximate this, we convert MovieLens ratings into implicit events:

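- Consider ratings ≥ 4.0 as **positive interactions**.
- Build a sparse user–item matrix that stores a 1 for each positive
  interaction; every other cell, including ratings below 4.0, is an
  implicit 0 (indistinguishable from "not rated").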

We then train a **LightFM** model with a ranking loss (BPR).
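To make the conversion concrete, here is a toy pandas/NumPy version that builds a dense 0/1 matrix. It is for illustration only: LightFM's `Dataset.build_interactions`, used in the following cells, produces a sparse COO matrix, which is what you want at MovieLens scale. `to_implicit_matrix` is a hypothetical helper and the toy ratings are made up.

```python
import numpy as np
import pandas as pd


def to_implicit_matrix(ratings: pd.DataFrame, threshold: float = 4.0):
    """Binarise explicit ratings into a dense 0/1 user-item matrix (toy sizes only)."""
    # Map raw ids to contiguous row/column indices, in order of first appearance
    user_idx, user_ids = pd.factorize(ratings["userId"])
    item_idx, item_ids = pd.factorize(ratings["movieId"])
    matrix = np.zeros((len(user_ids), len(item_ids)), dtype=np.int8)
    positive = (ratings["rating"] >= threshold).to_numpy()
    # Ratings below the threshold stay 0, just like unrated cells
    matrix[user_idx[positive], item_idx[positive]] = 1
    return matrix, user_ids, item_ids


toy = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "movieId": [10, 20, 10, 30],
    "rating": [5.0, 3.0, 4.0, 4.5],
})
m, users, items = to_implicit_matrix(toy)
print(m)
```

Movie 20 keeps its column even though its only rating (3.0) is below the threshold; the same happens in the LightFM dataset below, since it is fitted on all `movieId`s.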


In [None]:
from lightfm import LightFM
from lightfm.data import Dataset as LFMDataset
from lightfm.evaluation import precision_at_k as lfm_precision_at_k, recall_at_k as lfm_recall_at_k

# Threshold for positive interaction
POS_THRESH: float = 4.0

implicit_df = ratings_df.copy()
implicit_df["interaction"] = (implicit_df["rating"] >= POS_THRESH).astype(int)

print(implicit_df["interaction"].value_counts(normalize=True))


In [None]:
# Build LightFM Dataset object

lfm_dataset = LFMDataset()
lfm_dataset.fit(
    users=implicit_df["userId"].unique(),
    items=implicit_df["movieId"].unique(),
)

# Build interactions matrix
(interactions, weights) = lfm_dataset.build_interactions(
    (row.userId, row.movieId) for row in implicit_df.itertuples(index=False) if row.interaction == 1
)

print("Interactions shape:", interactions.shape)
print("Num non-zero interactions:", interactions.getnnz())


We now train a LightFM model with **BPR** (Bayesian Personalised Ranking)
loss, which is commonly used for implicit recommenders.
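BPR learns from triples (u, i, j), where i is an item user u interacted with and j is one they did not, and maximises `ln sigmoid(x_ui - x_uj)` minus a regularisation term. The NumPy sketch below performs one gradient-ascent step for plain dot-product scores `x_ui = p_u · q_i`. LightFM's own BPR also learns biases, samples negatives for you and uses adaptive learning rates (adagrad by default), so treat this as intuition only; `bpr_sgd_step` is a hypothetical name.

```python
import numpy as np


def bpr_sgd_step(P, Q, u, i, j, lr=0.05, reg=0.0):
    """One BPR ascent step on the triple (user u, positive item i, negative item j)."""
    x_uij = P[u] @ (Q[i] - Q[j])     # score difference x_ui - x_uj
    g = 1.0 / (1.0 + np.exp(x_uij))  # d/dx ln sigmoid(x) = sigmoid(-x)
    pu_old = P[u].copy()             # item updates use the pre-update user vector
    P[u] += lr * (g * (Q[i] - Q[j]) - reg * P[u])
    Q[i] += lr * (g * pu_old - reg * Q[i])
    Q[j] += lr * (-g * pu_old - reg * Q[j])
    return float(x_uij)


# Toy usage: repeatedly push item 1 above item 2 for user 0
rng = np.random.default_rng(0)
P = rng.normal(0, 0.1, size=(2, 8))  # user factors
Q = rng.normal(0, 0.1, size=(3, 8))  # item factors
diffs = [bpr_sgd_step(P, Q, u=0, i=1, j=2) for _ in range(200)]
print(f"score gap before: {diffs[0]:.3f}, after: {diffs[-1]:.3f}")
```

Because the loss only looks at score differences, BPR optimises the ordering of items rather than their absolute values, which is why it suits ranking.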


In [None]:
# Train-test split for implicit data

from lightfm.cross_validation import random_train_test_split

lfm_train, lfm_test = random_train_test_split(
    interactions,
    test_percentage=0.2,
    random_state=np.random.RandomState(RANDOM_STATE),
)

print("Train interactions:", lfm_train.getnnz())
print("Test interactions: ", lfm_test.getnnz())

lfm_model = LightFM(
    no_components=40,
    loss="bpr",
    learning_rate=0.05,
    random_state=RANDOM_STATE,
)

lfm_model.fit(
    lfm_train,
    epochs=25,
    num_threads=4,
    verbose=True,
)


### 3.1 Evaluate LightFM with ranking metrics

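LightFM ships `precision_at_k` and `recall_at_k` evaluators in
`lightfm.evaluation` that work directly on the interaction matrices.
Passing `train_interactions` excludes each user's training positives from
the ranking, so the model is scored against the whole catalogue of items
the user has not yet interacted with.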
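For reference, with \( T_u \) the user's held-out positives and \( \text{Top}_k(u) \) the \(k\) highest-scoring items after the user's training positives are removed, the LightFM documentation describes the two metrics as

$$
\text{P@}k(u) = \frac{\lvert \text{Top}_k(u) \cap T_u \rvert}{k},
\qquad
\text{R@}k(u) = \frac{\lvert \text{Top}_k(u) \cap T_u \rvert}{\lvert T_u \rvert}
$$

Both return one value per user with at least one test interaction, and the `.mean()` calls in the next cell average over those users. The per-user formulas have the same form as the Surprise-based function in section 2.3; what differs is the candidate set.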


In [None]:
k_eval = 10

precision_lfm = lfm_precision_at_k(
    lfm_model,
    lfm_test,
    train_interactions=lfm_train,
    k=k_eval,
    num_threads=4,
).mean()

recall_lfm = lfm_recall_at_k(
    lfm_model,
    lfm_test,
    train_interactions=lfm_train,
    k=k_eval,
    num_threads=4,
).mean()

print(f"LightFM (BPR) precision@{k_eval}: {precision_lfm:.4f}")
print(f"LightFM (BPR) recall@{k_eval}:    {recall_lfm:.4f}")


In [None]:
# Compare LightFM ranking metrics with Surprise SVD ranking metrics (approx)

ranking_compare_df = pd.DataFrame(
    {
        "model": ["SVD (explicit)", "LightFM (implicit)"],
        "precision_at_10": [svd_rank_metrics_k10["precision_at_k"], precision_lfm],
        "recall_at_10": [svd_rank_metrics_k10["recall_at_k"], recall_lfm],
    }
)

ranking_compare_df


In [None]:
ranking_melt = ranking_compare_df.melt(id_vars="model", var_name="metric", value_name="value")

sns.barplot(data=ranking_melt, x="metric", y="value", hue="model")
plt.ylim(0, 1)
plt.title("Ranking metrics – explicit vs implicit models")
plt.ylabel("Score")
plt.show()


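Comparing explicit SVD and implicit LightFM is not apples-to-apples. The
SVD numbers rank only the items each user rated in the test split (a small
candidate set that already contains many relevant items), while LightFM
ranks every item the user did not interact with in training; the two models
also use different train/test splits. Expect the SVD figures to look
optimistic, and treat the chart as a rough sense of how both behave as rankers.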


## 4. Visualising item embeddings (LightFM)

Modern recommender models often learn **embedding vectors** for users and
items. We can visualise item embeddings to inspect the learned structure.

We will:

1. Extract the item embedding matrix from LightFM.
2. Reduce it to 2D: PCA to 20 dimensions first, then t-SNE on a random
   sample of items.
3. Plot the sample and, if `movies.csv` exists, look up titles for a few points.


In [None]:
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA

# Get item embeddings (latent factors)
item_embeddings = lfm_model.get_item_representations()[1]  # (n_items, no_components)

print("Item embeddings shape:", item_embeddings.shape)

# First reduce with PCA for stability, then t-SNE for 2D
pca = PCA(n_components=min(20, item_embeddings.shape[1]))
item_pca = pca.fit_transform(item_embeddings)

# t-SNE in 2D (be careful with runtime for large n_items; we sample)
max_items_viz = 500
n_items_total = item_pca.shape[0]
indices = np.random.choice(n_items_total, size=min(max_items_viz, n_items_total), replace=False)

item_pca_sample = item_pca[indices]

tsne = TSNE(n_components=2, init="pca", random_state=RANDOM_STATE, perplexity=30.0)
item_tsne = tsne.fit_transform(item_pca_sample)

emb_df = pd.DataFrame(item_tsne, columns=["x", "y"])
emb_df["item_internal_id"] = indices

sns.scatterplot(data=emb_df, x="x", y="y", s=20, alpha=0.7)
plt.title("LightFM item embeddings (t-SNE, sample)")
plt.xlabel("dim 1")
plt.ylabel("dim 2")
plt.show()


If you want to annotate points with movie titles, map LightFM's internal
item indices back to `movieId`s using the dataset's id mapping. The next
cell does this and lists a few sampled items with their titles.


In [None]:
# Map internal LightFM item ids back to movieIds
from IPython.display import display

# mapping() returns (user_id_map, user_feature_map, item_id_map, item_feature_map)
_, _, item_id_map, _ = lfm_dataset.mapping()  # dict: movieId -> internal_id

# Reverse mapping
internal_to_movie = {internal: movie for movie, internal in item_id_map.items()}

emb_df["movieId"] = emb_df["item_internal_id"].map(internal_to_movie)

if MOVIES_PATH.exists():
    movies_df = pd.read_csv(MOVIES_PATH)
    emb_with_titles = emb_df.merge(movies_df[["movieId", "title"]], on="movieId", how="left")

    # Show a few random points with titles to inspect
    # (display() is needed because this is not the last expression in the cell)
    display(emb_with_titles.sample(10, random_state=RANDOM_STATE)[["movieId", "title", "x", "y"]])
else:
    print("movies.csv not found; cannot add titles to embeddings.")
    display(emb_df.head())


You can also manually pick a **cluster** in the embedding plot and inspect
titles within it to see if they are semantically related.
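Instead of eyeballing the 2D plot, you can also query the full embedding space directly, which avoids t-SNE's distortion of global distances: cosine similarity between item vectors gives an item's nearest neighbours, whose titles you can then look up. In the notebook you would pass `item_embeddings` and an internal item index (translated to a `movieId` with the reverse mapping above). `most_similar_items` is a hypothetical helper and the toy matrix is made up.

```python
from typing import List, Tuple

import numpy as np


def most_similar_items(embeddings: np.ndarray, query: int, n: int = 5) -> List[Tuple[int, float]]:
    """Return (index, cosine similarity) for the n items closest to `query`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # avoid dividing by zero
    sims = unit @ unit[query]
    sims[query] = -np.inf  # never return the query item itself
    top = np.argsort(-sims)[:n]
    return [(int(j), float(sims[j])) for j in top]


# Toy 2D embeddings: item 1 points almost the same way as item 0, item 3 the opposite way
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(most_similar_items(emb, query=0, n=2))
```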


## 5. Putting it together – model comparison summary

We now summarise the main models from this notebook:

- **Surprise SVD** – explicit MF with user/item biases, trained for rating
  prediction (RMSE/MAE).
- **Surprise KNNBaseline** – baseline estimates plus item–item neighbour
  corrections.
- **LightFM (BPR)** – implicit MF trained with a pairwise ranking loss.

You can combine ideas:

- Use explicit SVD for rating prediction tasks.
- Use LightFM or other implicit methods for large-scale ranking.
- Mix collaborative and content features (LightFM accepts user and item
  feature matrices).
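As a starting point for the hybrid idea: `movies.csv` stores genres as a pipe-separated string, and LightFM's `Dataset` accepts item features as an iterable of `(item_id, [feature_name, ...])` pairs (register the feature names via `fit(..., item_features=...)`, then build the matrix with `build_item_features(...)`). The pandas sketch below only prepares those pairs. `genre_features` and the `genre:` prefix are hypothetical choices, and treating `"(no genres listed)"` as "no features" is an assumption about that placeholder value.

```python
from typing import List, Tuple

import pandas as pd


def genre_features(movies: pd.DataFrame) -> List[Tuple[int, List[str]]]:
    """Turn MovieLens pipe-separated genres into (movieId, [feature, ...]) pairs."""
    pairs = []
    for movie_id, genres in zip(movies["movieId"], movies["genres"]):
        # Assumption: "(no genres listed)" marks movies without any genre
        feats = [] if genres == "(no genres listed)" else genres.split("|")
        pairs.append((int(movie_id), [f"genre:{g}" for g in feats]))
    return pairs


toy_movies = pd.DataFrame({
    "movieId": [1, 2],
    "genres": ["Adventure|Comedy", "(no genres listed)"],
})
print(genre_features(toy_movies))
```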


### 5.1 Simple metric table

Below is a small template; adapt as you run the notebook and capture
actual numbers from your execution.

```text
Model                  | Type     | RMSE (explicit) | Precision@10 | Recall@10
-----------------------|----------|-----------------|--------------|----------
SVD (Surprise)         | explicit | ~...            | ~...         | ~...
KNNBaseline (Surprise) | explicit | ~...            | ~...         | ~...
LightFM (BPR)          | implicit | n/a             | ~...         | ~...
```
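If you prefer to fill the template programmatically, a small pandas helper can assemble it from per-model metric dicts. This is a sketch; `summary_table` and the column names are hypothetical choices.

```python
from typing import Dict

import pandas as pd

COLUMNS = ["type", "rmse", "precision_at_10", "recall_at_10"]


def summary_table(rows: Dict[str, Dict[str, object]]) -> pd.DataFrame:
    """Build the comparison table from {model name: {metric: value}} dicts.

    Metrics missing for a model (e.g. RMSE for an implicit model) become NaN.
    """
    df = pd.DataFrame.from_dict(rows, orient="index")
    df.index.name = "model"
    return df.reindex(columns=COLUMNS)  # fixed column order; adds absent columns as NaN
```

In the notebook you would pass, for example, `{"SVD (Surprise)": {"type": "explicit", "rmse": rmse_svd, "precision_at_10": svd_rank_metrics_k10["precision_at_k"], "recall_at_10": svd_rank_metrics_k10["recall_at_k"]}}` plus similar entries for KNNBaseline and LightFM; leaving out `rmse` for LightFM shows it as NaN.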

The exact values will depend on random seeds and hyperparameters.


## 6. Where to go next

You now have:

- A **manual, from-scratch** recommender notebook (baseline + MF) and
  ranking evaluation.
- A **modern-libraries** notebook using Surprise and LightFM, with
  embedding visualisations.

Natural next steps:

1. **Hyperparameter search**
   - Use Surprise's cross-validation utilities to tune SVD / KNN.
   - Tune LightFM `no_components`, `learning_rate`, epoch count, and loss
     (e.g. WARP vs BPR).

2. **Hybrid models**
   - Add movie genres or tags as item features in LightFM.
   - Add user features (age, location, segments) where available;
     `ml-latest-small` itself ships no user demographics.

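3. **Sequence-aware models**
   - Use interaction timestamps (MovieLens `ratings.csv` has a `timestamp`
     column; session logs from other datasets are richer) to build
     sequence-aware or session-based recommenders using RNNs or Transformers
     (e.g. `RecBole`, `transformers4rec`).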

4. **Productionisation**
   - Export embeddings and scores.
   - Build a simple API for top-N recommendations.
   - Integrate with a UI or downstream pipeline.

This notebook completes a more **modern stack** for recommender systems
while keeping the code transparent and editable.
