# MovieLens Recommender System – Extended Project

This project focuses on **personalized recommendations** using the
**MovieLens (ml-latest-small)** dataset.

We go beyond a basic recommender and include:

- Solid **EDA with visuals** on users, items and ratings.
- Multiple **baseline models** and visual comparison.
- **Item-based collaborative filtering** with cosine similarity.
- **Matrix factorization** with latent factors (implemented from scratch).
- **Evaluation in both rating space and ranking space**:
  - RMSE.
  - Precision@K, Recall@K, Hit-rate.
- A short section on **modern tools / libraries** in recommender systems.

We assume you have downloaded MovieLens `ml-latest-small` and placed the ratings file at:

```text
data/ml-latest-small/ratings.csv
```

Optionally, if you also have:

```text
data/ml-latest-small/movies.csv
```

we will use it to display movie titles in recommendations.


## 1. Imports and configuration

We use:

- `pandas`, `numpy` – core data handling.
- `matplotlib`, `seaborn` – visual analysis.
- `scikit-learn` – train/test split, cosine similarity, mean squared error (for RMSE).

Apart from these helpers, all recommender logic is implemented in **plain Python/NumPy** for clarity.


In [None]:
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List, Tuple, Set

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import mean_squared_error
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split

sns.set(style="whitegrid")
plt.rcParams["figure.figsize"] = (8, 5)

RANDOM_STATE: int = 42
np.random.seed(RANDOM_STATE)

RATINGS_PATH: Path = Path("data") / "ml-latest-small" / "ratings.csv"
MOVIES_PATH: Path = Path("data") / "ml-latest-small" / "movies.csv"

if not RATINGS_PATH.exists():
    raise FileNotFoundError(
        f"Ratings file not found at {RATINGS_PATH.resolve()}. "
        "Please download MovieLens (ml-latest-small) and place ratings.csv under data/ml-latest-small/."
    )


## 2. Load and inspect MovieLens ratings

MovieLens ratings have the structure:

- `userId` – integer user identifier.
- `movieId` – integer item identifier.
- `rating` – explicit rating (0.5–5.0).
- `timestamp` – unix time.

We focus on `userId`, `movieId`, and `rating`.


In [None]:
def load_movielens_ratings(path: Path) -> pd.DataFrame:
    """Load MovieLens ratings from CSV.

    Args:
        path: Path to `ratings.csv`.

    Returns:
        DataFrame with ratings.
    """
    if not path.exists():
        raise FileNotFoundError(f"File not found: {path!s}")

    df: pd.DataFrame = pd.read_csv(path)
    if df.empty:
        raise ValueError(f"Loaded ratings DataFrame is empty: {path!s}")
    return df


ratings_raw: pd.DataFrame = load_movielens_ratings(RATINGS_PATH)

print("Shape:", ratings_raw.shape)
ratings_raw.head()


### 2.1 Basic statistics and visuals

We look at:

- Number of users / movies / ratings.
- Sparsity of the user–item matrix.
- Rating distribution.
- Ratings per user and per movie (long–tail behaviour).


In [None]:
n_users: int = ratings_raw["userId"].nunique()
n_movies: int = ratings_raw["movieId"].nunique()

print(f"Users:  {n_users}")
print(f"Movies: {n_movies}")
print(f"Ratings: {len(ratings_raw)}")
print(f"Density: {len(ratings_raw) / (n_users * n_movies):.6f}")

fig, ax = plt.subplots(1, 1)
sns.histplot(ratings_raw["rating"], bins=10, kde=False, ax=ax)
ax.set_title("Rating distribution")
ax.set_xlabel("Rating")
plt.show()


In [None]:
user_counts = ratings_raw.groupby("userId")["rating"].count()
movie_counts = ratings_raw.groupby("movieId")["rating"].count()

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

sns.histplot(user_counts, bins=30, ax=axes[0])
axes[0].set_title("Ratings per user")
axes[0].set_xlabel("# ratings")

sns.histplot(movie_counts, bins=30, ax=axes[1])
axes[1].set_title("Ratings per movie")
axes[1].set_xlabel("# ratings")

plt.tight_layout()
plt.show()

print("Users – quantiles of ratings per user:")
print(user_counts.quantile([0.25, 0.5, 0.9, 0.99]))
print("\nMovies – quantiles of ratings per movie:")
print(movie_counts.quantile([0.25, 0.5, 0.9, 0.99]))


We see a typical **long–tail** pattern: a few very active users and very
popular movies, and many with few interactions.


## 3. Train/test split

We randomly split into train and test:

- Train: used to fit baselines and models.
- Test: used **only for evaluation**.

For simplicity we do not use a temporal split here. A random split lets the
model train on ratings made after some of the test ratings, so in production a
**time-based split** is often more realistic.
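
A time-based split could be sketched as follows. This is only an illustration and not part of this notebook's pipeline: it assumes the `timestamp` column is present, and the helper name `temporal_split`, the toy data and the 80/20 cutoff are all hypothetical choices.

```python
from typing import Tuple

import pandas as pd


def temporal_split(
    df: pd.DataFrame, test_fraction: float = 0.2
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hold out the most recent `test_fraction` of ratings (hypothetical helper)."""
    df_sorted = df.sort_values("timestamp", kind="stable")
    n_test = int(round(len(df_sorted) * test_fraction))
    cutoff = len(df_sorted) - n_test
    return df_sorted.iloc[:cutoff], df_sorted.iloc[cutoff:]


# Toy data standing in for ratings.csv (values are made up).
toy = pd.DataFrame(
    {
        "userId": [1, 1, 2, 2, 3],
        "movieId": [10, 20, 10, 30, 20],
        "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
        "timestamp": [100, 500, 200, 400, 300],
    }
)

toy_train, toy_test = temporal_split(toy, test_fraction=0.2)
print(toy_test)  # the single most recent rating (timestamp 500)
```

With this kind of split every training rating is at least as old as every test rating, which mimics predicting future behaviour from past behaviour.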


In [None]:
train_df, test_df = train_test_split(
    ratings_raw,
    test_size=0.2,
    random_state=RANDOM_STATE,
)

print("Train size:", train_df.shape[0])
print("Test size: ", test_df.shape[0])


## 4. Baseline models (with comparison plot)

We define several baselines:

1. **Global mean** – always predict the same rating.
2. **Movie mean** – per–movie average rating.
3. **User + movie bias** model:

\begin{align}
\hat r_{ui} = \mu + b_u + b_i
\end{align}

where:

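- `μ` is the global mean rating in the training set.
- `b_u` is the user bias, estimated here as the user's mean rating minus `μ`.
- `b_i` is the movie bias, estimated here as the movie's mean rating minus `μ`.

Users or movies that do not appear in the training set get a bias of 0.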

We will compute **RMSE** on the test set and compare visually.
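
To make the formula concrete, here is a small sketch on made-up ratings. It assumes, as the cell below does, that each bias is simply a mean rating minus `μ`; this is one simple estimator among several, and the toy DataFrame is purely illustrative.

```python
import pandas as pd

# Two users, two movies, four made-up ratings.
toy = pd.DataFrame(
    {
        "userId": [1, 1, 2, 2],
        "movieId": [10, 20, 10, 20],
        "rating": [5.0, 3.0, 3.0, 1.0],
    }
)

mu = toy["rating"].mean()                                # 3.0
b_user = toy.groupby("userId")["rating"].mean() - mu     # user 1: +1, user 2: -1
b_movie = toy.groupby("movieId")["rating"].mean() - mu   # movie 10: +1, movie 20: -1

# Prediction for user 1 on movie 10: mu + b_u + b_i
pred = mu + b_user.loc[1] + b_movie.loc[10]
print(pred)  # 5.0
```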


In [None]:
def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Compute root mean squared error.

    Args:
        y_true: True ratings.
        y_pred: Predicted ratings.

    Returns:
        RMSE value.
    """
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


y_true = test_df["rating"].to_numpy()

# 4.1 Global mean

global_mean: float = float(train_df["rating"].mean())
print("Global mean rating:", global_mean)

y_pred_global = np.full_like(y_true, fill_value=global_mean, dtype=float)
rmse_global = rmse(y_true, y_pred_global)
print(f"Global mean RMSE: {rmse_global:.4f}")

# 4.2 Movie mean

movie_mean = train_df.groupby("movieId")["rating"].mean()
movie_mean_map = movie_mean.to_dict()

y_pred_movie = test_df["movieId"].map(movie_mean_map).fillna(global_mean).to_numpy()
rmse_movie = rmse(y_true, y_pred_movie)
print(f"Movie mean RMSE:  {rmse_movie:.4f}")

# 4.3 User + movie bias

mu: float = global_mean
user_bias = train_df.groupby("userId")["rating"].mean() - mu
movie_bias = train_df.groupby("movieId")["rating"].mean() - mu

user_bias_map: Dict[int, float] = user_bias.to_dict()
movie_bias_map: Dict[int, float] = movie_bias.to_dict()


def predict_bias_row(row: pd.Series) -> float:
    """Predict rating using user+movie bias model for a single row.

    Args:
        row: Row with userId and movieId.

    Returns:
        Predicted rating.
    """
    bu: float = float(user_bias_map.get(row["userId"], 0.0))
    bi: float = float(movie_bias_map.get(row["movieId"], 0.0))
    return mu + bu + bi


y_pred_bias = test_df.apply(predict_bias_row, axis=1).to_numpy()
rmse_bias = rmse(y_true, y_pred_bias)
print(f"User+movie bias RMSE: {rmse_bias:.4f}")


In [None]:
# Visual comparison of baselines

baseline_results = pd.DataFrame(
    {
        "model": ["Global mean", "Movie mean", "User+movie bias"],
        "rmse": [rmse_global, rmse_movie, rmse_bias],
    }
)

sns.barplot(data=baseline_results, x="model", y="rmse")
plt.title("Baseline models – RMSE on test set")
plt.ylabel("RMSE")
plt.xticks(rotation=15)
plt.show()

# Visualise distribution of user and movie biases
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

sns.histplot(user_bias, bins=30, ax=axes[0])
axes[0].set_title("User bias distribution")
axes[0].set_xlabel("b_u")

sns.histplot(movie_bias, bins=30, ax=axes[1])
axes[1].set_title("Movie bias distribution")
axes[1].set_xlabel("b_i")

plt.tight_layout()
plt.show()


The user+movie bias model is often a strong **baseline**. More complex
models should improve on it meaningfully to justify their extra cost.


## 5. User–item matrix

We build a **dense** user–item matrix for the training set.

This is useful for:

- Collaborative filtering with similarities.
- Visual checks of sparsity.


In [None]:
# Map ids to consecutive indices

unique_users: np.ndarray = train_df["userId"].unique()
unique_movies: np.ndarray = train_df["movieId"].unique()

user_id_to_index: Dict[int, int] = {uid: idx for idx, uid in enumerate(unique_users)}
movie_id_to_index: Dict[int, int] = {mid: idx for idx, mid in enumerate(unique_movies)}

n_users_train: int = len(unique_users)
n_movies_train: int = len(unique_movies)

print(f"Train users: {n_users_train}, Train movies: {n_movies_train}")

R = np.zeros((n_users_train, n_movies_train), dtype=np.float32)

for row in train_df.itertuples(index=False):
    u_idx = user_id_to_index[row.userId]
    m_idx = movie_id_to_index[row.movieId]
    R[u_idx, m_idx] = row.rating

R[:5, :5]


In [None]:
# Quick visual: top-left block of the rating matrix (first users/movies, not a random sample)

sample_users = min(40, n_users_train)
sample_movies = min(40, n_movies_train)

sns.heatmap(
    R[:sample_users, :sample_movies],
    cmap="viridis",
    cbar=True,
)
plt.title("User–item rating matrix (small block)")
plt.xlabel("Movies (subset)")
plt.ylabel("Users (subset)")
plt.show()


Most entries are zero. Here a zero encodes a **missing** rating, not a rating
of 0 (the lowest actual rating is 0.5). Collaborative filtering leverages
**overlaps** between rows/columns to make predictions.
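
For larger datasets a dense array may not fit in memory. A sparse alternative could look like the sketch below. It assumes SciPy is available (it is not imported elsewhere in this notebook) and uses toy index arrays in place of the real `user_id_to_index` / `movie_id_to_index` mappings.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy (user index, movie index, rating) triples; values are made up.
user_idx = np.array([0, 0, 1, 2])
movie_idx = np.array([0, 2, 1, 2])
vals = np.array([4.0, 3.0, 5.0, 2.5], dtype=np.float32)

# Only the observed ratings are stored; everything else is implicitly zero.
R_sparse = csr_matrix((vals, (user_idx, movie_idx)), shape=(3, 3))

density = R_sparse.nnz / (R_sparse.shape[0] * R_sparse.shape[1])
print(R_sparse.toarray())
print(f"Stored entries: {R_sparse.nnz}, density: {density:.3f}")
```

scikit-learn's `cosine_similarity` accepts sparse input, so a matrix built this way can be passed to it without densifying first.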


## 6. Item-based collaborative filtering (k-NN)

We use **cosine similarity** between movie rating vectors.

For a given `(user, movie)` pair:

1. Consider all movies the user has rated.
2. Look at similarities between the target movie and these movies.
3. Take a similarity–weighted average of their ratings.

We restrict evaluation to a sample of the test set for speed.
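
The three steps above can be traced by hand on a toy example. The numbers below are made up; the weighting is the same similarity-weighted average used in the prediction function that follows.

```python
import numpy as np

# Ratings one user gave to three movies, and each movie's cosine
# similarity to the target movie (all numbers are made up).
user_ratings = np.array([5.0, 3.0, 1.0])
sims_to_target = np.array([0.9, 0.1, 0.0])

# Keep the k most similar rated movies, then take the weighted average.
k = 2
top = np.argsort(sims_to_target)[-k:]
pred = np.dot(sims_to_target[top], user_ratings[top]) / np.sum(
    np.abs(sims_to_target[top])
)
print(round(float(pred), 2))  # 4.8 = (0.9 * 5 + 0.1 * 3) / (0.9 + 0.1)
```

The prediction is pulled towards the rating of the most similar movie, and the movie with zero similarity is dropped by the top-k cut.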


In [None]:
def compute_item_similarity_matrix(R: np.ndarray) -> np.ndarray:
    """Compute cosine similarity between items.

    Args:
        R: User–item matrix of shape (n_users, n_items).

    Returns:
        Item–item similarity matrix of shape (n_items, n_items).
    """
    item_matrix = R.T  # items x users
    sim = cosine_similarity(item_matrix)
    return sim


item_sim_matrix = compute_item_similarity_matrix(R)
item_sim_matrix.shape


In [None]:
def predict_item_based(
    user_id: int,
    movie_id: int,
    R: np.ndarray,
    item_sim: np.ndarray,
    user_id_to_index: Dict[int, int],
    movie_id_to_index: Dict[int, int],
    k: int = 20,
    default: float | None = None,
) -> float:
    """Predict rating via item-based k-NN collaborative filtering.

    Args:
        user_id: User identifier.
        movie_id: Movie identifier.
        R: User–item rating matrix.
        item_sim: Item–item similarity matrix.
        user_id_to_index: Map from userId to matrix row.
        movie_id_to_index: Map from movieId to matrix column.
        k: Number of neighbours.
        default: Fallback prediction if user/movie unknown.

    Returns:
        Predicted rating.
    """
    if default is None:
        default = float(global_mean)

    u_idx = user_id_to_index.get(user_id)
    m_idx = movie_id_to_index.get(movie_id)

    if u_idx is None or m_idx is None:
        return float(default)

    user_ratings = R[u_idx, :]
    sims = item_sim[m_idx, :]

    rated_mask = user_ratings > 0
    rated_indices = np.where(rated_mask)[0]

    if rated_indices.size == 0:
        return float(default)

    sims_rated = sims[rated_indices]

    k_use = min(k, rated_indices.size)
    top_idx = np.argsort(sims_rated)[-k_use:]

    neighbour_indices = rated_indices[top_idx]
    neighbour_sims = sims_rated[top_idx]
    neighbour_ratings = user_ratings[neighbour_indices]

    if np.all(neighbour_sims == 0):
        return float(neighbour_ratings.mean())

    pred = np.dot(neighbour_sims, neighbour_ratings) / np.sum(np.abs(neighbour_sims))
    return float(pred)


# Evaluate on a subset of test set

test_sample = test_df.sample(n=min(5000, len(test_df)), random_state=RANDOM_STATE)

y_true_sample = test_sample["rating"].to_numpy()

item_cf_preds: List[float] = []
for row in test_sample.itertuples(index=False):
    item_cf_preds.append(
        predict_item_based(
            user_id=row.userId,
            movie_id=row.movieId,
            R=R,
            item_sim=item_sim_matrix,
            user_id_to_index=user_id_to_index,
            movie_id_to_index=movie_id_to_index,
            k=30,
            default=global_mean,
        )
    )

rmse_item_cf = rmse(y_true_sample, np.array(item_cf_preds))
print(f"Item-based CF RMSE (sample): {rmse_item_cf:.4f}")


Item-based CF often improves over the simplest baselines, although the gain
depends on `k` and on how similarity is computed. It is also easy to
explain: *"movies similar to what you rated highly"*.
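
That kind of explanation can be read straight off the similarity matrix. A minimal sketch on a made-up 3×3 rating matrix (the names `R_toy` and `most_similar` are illustrative, not part of this notebook):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows are users, columns are movies; 0 means "not rated".
R_toy = np.array(
    [
        [5.0, 4.0, 0.0],
        [4.0, 5.0, 0.0],
        [0.0, 0.0, 3.0],
    ]
)

sim = cosine_similarity(R_toy.T)   # movie x movie similarities
np.fill_diagonal(sim, -1.0)        # ignore each movie's similarity to itself

most_similar = sim.argmax(axis=1)  # index of the nearest neighbour per movie
print(most_similar[:2])            # movies 0 and 1 point at each other
```

Movies 0 and 1 were rated by the same users with similar scores, so each is the other's nearest neighbour. Movie 2 shares no raters with the others, so its similarities are all zero and its "neighbour" is not meaningful.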


## 7. Matrix factorization (latent factors, SGD)

We now build a **latent factor** model with biases:

\begin{align}
\hat r_{ui} = \mu + b_u + b_i + p_u^T q_i
\end{align}

Where:

- `μ` – global mean.
- `b_u`, `b_i` – user and item biases.
- `p_u`, `q_i` – k–dimensional latent vectors.

We train using **stochastic gradient descent** on the training ratings.
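
For reference, the per-rating updates used in the `fit` method below can be written out explicitly. They follow from minimising the squared error with L2 regularisation, with constant factors absorbed into the learning rate. Here `η` is the learning rate (`lr`), `λ` the regularisation strength (`reg`), and `e_ui` the prediction error:

\begin{align}
e_{ui} &= r_{ui} - \hat r_{ui} \\
b_u &\leftarrow b_u + \eta\,(e_{ui} - \lambda\, b_u) \\
b_i &\leftarrow b_i + \eta\,(e_{ui} - \lambda\, b_i) \\
p_u &\leftarrow p_u + \eta\,(e_{ui}\, q_i - \lambda\, p_u) \\
q_i &\leftarrow q_i + \eta\,(e_{ui}\, p_u - \lambda\, q_i)
\end{align}

Both latent-vector updates use the values of `p_u` and `q_i` from before the step, which is why the code computes `pu_new` and `qi_new` before assigning either.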


In [None]:
@dataclass
class MFParams:
    n_factors: int = 40
    n_epochs: int = 20
    lr: float = 0.01
    reg: float = 0.05


class MatrixFactorization:
    """Matrix factorization with biases trained via SGD.

    This class is intentionally simple and readable, not optimised.
    """

    def __init__(self, params: MFParams, random_state: int = 42) -> None:
        self.params = params
        self.random_state = random_state

        self.mu: float | None = None
        self.user_bias: Dict[int, float] = {}
        self.item_bias: Dict[int, float] = {}
        self.P: Dict[int, np.ndarray] = {}
        self.Q: Dict[int, np.ndarray] = {}

    def fit(self, df_ratings: pd.DataFrame) -> None:
        """Fit the model on a ratings DataFrame.

        Args:
            df_ratings: DataFrame with (`userId`, `movieId`, `rating`).
        """
        rng = np.random.default_rng(self.random_state)

        user_ids = df_ratings["userId"].unique()
        item_ids = df_ratings["movieId"].unique()

        self.mu = float(df_ratings["rating"].mean())

        self.user_bias = {u: 0.0 for u in user_ids}
        self.item_bias = {i: 0.0 for i in item_ids}

        k = self.params.n_factors

        self.P = {u: 0.1 * rng.standard_normal(k) for u in user_ids}
        self.Q = {i: 0.1 * rng.standard_normal(k) for i in item_ids}

        lr = self.params.lr
        reg = self.params.reg

        user_arr = df_ratings["userId"].to_numpy()
        item_arr = df_ratings["movieId"].to_numpy()
        rating_arr = df_ratings["rating"].to_numpy()
        n = len(df_ratings)

        for epoch in range(self.params.n_epochs):
            idx = rng.permutation(n)
            se = 0.0

            for t in idx:
                u = int(user_arr[t])
                i = int(item_arr[t])
                r_ui = float(rating_arr[t])

                bu = self.user_bias[u]
                bi = self.item_bias[i]
                pu = self.P[u]
                qi = self.Q[i]

                pred = self.mu + bu + bi + float(np.dot(pu, qi))
                err = r_ui - pred

                se += err * err

                # Bias updates
                self.user_bias[u] += lr * (err - reg * bu)
                self.item_bias[i] += lr * (err - reg * bi)

                # Latent factors updates
                pu_new = pu + lr * (err * qi - reg * pu)
                qi_new = qi + lr * (err * pu - reg * qi)

                self.P[u] = pu_new
                self.Q[i] = qi_new

            rmse_epoch = float(np.sqrt(se / n))
            print(f"Epoch {epoch+1}/{self.params.n_epochs} - train RMSE: {rmse_epoch:.4f}")

    def predict_single(self, user_id: int, movie_id: int, default: float | None = None) -> float:
        """Predict rating for a single user–movie pair.

        Args:
            user_id: User identifier.
            movie_id: Movie identifier.
            default: Fallback rating when user/movie unseen.

        Returns:
            Predicted rating.
        """
        if self.mu is None:
            raise RuntimeError("Model must be fitted before prediction.")

        if default is None:
            default = self.mu

        bu = self.user_bias.get(user_id)
        bi = self.item_bias.get(movie_id)
        pu = self.P.get(user_id)
        qi = self.Q.get(movie_id)

        if bu is None or bi is None or pu is None or qi is None:
            return float(default)

        return float(self.mu + bu + bi + float(np.dot(pu, qi)))

    def predict_df(self, df_pairs: pd.DataFrame) -> np.ndarray:
        """Predict ratings for a set of (userId, movieId) pairs.

        Args:
            df_pairs: DataFrame with `userId` and `movieId`.

        Returns:
            Numpy array with predicted ratings.
        """
        preds: List[float] = []
        for row in df_pairs.itertuples(index=False):
            preds.append(self.predict_single(int(row.userId), int(row.movieId)))
        return np.array(preds, dtype=float)


mf_params = MFParams(
    n_factors=40,
    n_epochs=20,
    lr=0.01,
    reg=0.05,
)

mf_model = MatrixFactorization(params=mf_params, random_state=RANDOM_STATE)

mf_model.fit(train_df)

# Evaluate on full test set
y_pred_mf = mf_model.predict_df(test_df)
rmse_mf = rmse(test_df["rating"].to_numpy(), y_pred_mf)
print(f"Matrix factorization RMSE (test): {rmse_mf:.4f}")


In [None]:
# Visual: predicted vs true ratings

plt.scatter(test_df["rating"], y_pred_mf, alpha=0.2)
plt.xlabel("True rating")
plt.ylabel("Predicted rating (MF)")
plt.title("Matrix factorization – true vs predicted")
plt.plot([0.5, 5], [0.5, 5], linestyle="--", color="black")
plt.xlim(0.5, 5.0)
plt.ylim(0.5, 5.0)
plt.show()

# Error distribution
errors = test_df["rating"].to_numpy() - y_pred_mf
sns.histplot(errors, bins=30)
plt.title("Prediction error distribution (MF)")
plt.xlabel("True - predicted")
plt.show()


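We now compare **all main models** on RMSE: baselines, item-based CF
(sampled), and matrix factorization. The item-based CF score is computed on a
5,000-row sample of the test set, so it is not strictly comparable with the
other scores, which use the full test set.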


In [None]:
rmse_summary = pd.DataFrame(
    {
        "model": [
            "Global mean",
            "Movie mean",
            "User+movie bias",
            "Item-based CF (sample)",
            "Matrix factorization",
        ],
        "rmse": [
            rmse_global,
            rmse_movie,
            rmse_bias,
            rmse_item_cf,
            rmse_mf,
        ],
    }
)

sns.barplot(data=rmse_summary, x="model", y="rmse")
plt.xticks(rotation=20)
plt.title("Model comparison – RMSE")
plt.ylabel("RMSE (lower is better)")
plt.show()

rmse_summary.sort_values("rmse")


RMSE gives a sense of **absolute rating prediction quality**, but in
recommenders we often care more about **ranking**: which items are in
the top–N list.


## 8. Ranking metrics: Precision@K, Recall@K, Hit-rate

We evaluate the MF model as a **ranker**. For each user in the test set:

1. Treat items with rating ≥ 4.0 as **relevant**.
2. Generate a top–K list of recommendations among items the user has not rated in the training set.
3. Compute:
   - Hit-rate: fraction of users with at least one relevant item in top–K.
   - Precision@K: average proportion of recommended items that are relevant.
   - Recall@K: average proportion of relevant items that appear in top–K.

We do this on a subset of users for speed.
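
For a single user the three quantities reduce to simple set arithmetic. A toy illustration with made-up movie IDs:

```python
recommended = [10, 20, 30, 40, 50]   # top-K list for one user, K = 5
relevant = {20, 50, 70}              # test items this user rated >= 4.0

k = len(recommended)
n_hits = len(set(recommended) & relevant)

precision_at_k = n_hits / k              # 2 / 5
recall_at_k = n_hits / len(relevant)     # 2 / 3
hit = int(n_hits > 0)                    # 1 if any relevant item was recommended

print(precision_at_k, round(recall_at_k, 3), hit)  # 0.4 0.667 1
```

The reported metrics are these per-user values averaged over all evaluated users.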


In [None]:
def get_user_seen_movies(df_ratings: pd.DataFrame, user_id: int) -> Set[int]:
    """Return set of movieIds rated by a user.

    Args:
        df_ratings: Ratings DataFrame.
        user_id: User identifier.

    Returns:
        Set of movieIds.
    """
    return set(df_ratings.loc[df_ratings["userId"] == user_id, "movieId"].unique())


def get_user_relevant_movies(df_ratings: pd.DataFrame, user_id: int, threshold: float = 4.0) -> Set[int]:
    """Return set of relevant (liked) movieIds for user in a DataFrame.

    Args:
        df_ratings: Ratings DataFrame.
        user_id: User identifier.
        threshold: Rating threshold for relevance.

    Returns:
        Set of relevant movieIds.
    """
    mask = (df_ratings["userId"] == user_id) & (df_ratings["rating"] >= threshold)
    return set(df_ratings.loc[mask, "movieId"].unique())


def recommend_top_n(
    user_id: int,
    mf_model: MatrixFactorization,
    all_movie_ids: Iterable[int],
    train_ratings: pd.DataFrame,
    n: int = 10,
) -> pd.DataFrame:
    """Generate top–N recommendations for a user via MF.

    Args:
        user_id: User identifier.
        mf_model: Fitted MatrixFactorization model.
        all_movie_ids: Iterable of candidate movieIds.
        train_ratings: Training ratings DataFrame.
        n: Number of recommendations.

    Returns:
        DataFrame with movieId, predicted_rating.
    """
    seen = get_user_seen_movies(train_ratings, user_id)

    candidates: List[int] = [mid for mid in all_movie_ids if mid not in seen]
    if not candidates:
        raise ValueError(f"User {user_id} has rated all movies; no candidates.")

    preds: List[Tuple[int, float]] = []
    for mid in candidates:
        score = mf_model.predict_single(user_id, mid)
        preds.append((mid, score))

    preds_sorted = sorted(preds, key=lambda x: x[1], reverse=True)[:n]
    return pd.DataFrame(preds_sorted, columns=["movieId", "predicted_rating"])


all_movie_ids: List[int] = sorted(ratings_raw["movieId"].unique())


In [None]:
def evaluate_ranking_mf(
    mf_model: MatrixFactorization,
    train_df: pd.DataFrame,
    test_df: pd.DataFrame,
    all_movie_ids: List[int],
    k: int = 10,
    rel_threshold: float = 4.0,
    max_users: int = 200,
) -> Dict[str, float]:
    """Evaluate MF ranking via hit-rate, precision@K, recall@K.

    Args:
        mf_model: Fitted MatrixFactorization.
        train_df: Train ratings.
        test_df: Test ratings.
        all_movie_ids: List of movieIds.
        k: Cutoff for top-K.
        rel_threshold: Relevance rating threshold.
        max_users: Max number of users to sample for evaluation.

    Returns:
        Dictionary with metrics.
    """
    users_in_test = test_df["userId"].unique()
    rng = np.random.default_rng(RANDOM_STATE)
    if len(users_in_test) > max_users:
        eval_users = rng.choice(users_in_test, size=max_users, replace=False)
    else:
        eval_users = users_in_test

    hits = 0
    total_users_with_relevant = 0
    sum_precision = 0.0
    sum_recall = 0.0
    n_eval_users = 0

    for u in eval_users:
        relevant = get_user_relevant_movies(test_df, u, threshold=rel_threshold)
        if not relevant:
            # Skip users with no relevant items in test
            continue

        n_eval_users += 1
        total_users_with_relevant += 1

        recs_df = recommend_top_n(
            user_id=int(u),
            mf_model=mf_model,
            all_movie_ids=all_movie_ids,
            train_ratings=train_df,
            n=k,
        )

        recs = list(recs_df["movieId"])
        recs_set = set(recs)

        n_relevant_in_recs = len(recs_set & relevant)

        if n_relevant_in_recs > 0:
            hits += 1

        precision_u = n_relevant_in_recs / k
        recall_u = n_relevant_in_recs / len(relevant)

        sum_precision += precision_u
        sum_recall += recall_u

    if n_eval_users == 0:
        raise ValueError("No users with relevant items in test for evaluation.")

    hit_rate = hits / total_users_with_relevant
    precision_at_k = sum_precision / n_eval_users
    recall_at_k = sum_recall / n_eval_users

    return {
        "hit_rate": hit_rate,
        "precision_at_k": precision_at_k,
        "recall_at_k": recall_at_k,
        "n_eval_users": float(n_eval_users),
    }


ranking_metrics_k10 = evaluate_ranking_mf(
    mf_model=mf_model,
    train_df=train_df,
    test_df=test_df,
    all_movie_ids=all_movie_ids,
    k=10,
    rel_threshold=4.0,
    max_users=200,
)

ranking_metrics_k10


In [None]:
# Visualise ranking metrics

metrics_df = pd.DataFrame(
    {
        "metric": ["hit_rate", "precision_at_k", "recall_at_k"],
        "value": [
            ranking_metrics_k10["hit_rate"],
            ranking_metrics_k10["precision_at_k"],
            ranking_metrics_k10["recall_at_k"],
        ],
    }
)

sns.barplot(data=metrics_df, x="metric", y="value")
plt.ylim(0, 1)
plt.title("MF ranking performance (K=10)")
plt.ylabel("Score")
plt.show()


These ranking metrics better reflect how a user experiences the system:
**are good movies appearing near the top of their list?**


## 9. Example recommendations with titles

We can inspect recommendations for a random user to see if they look
reasonable.


In [None]:
sample_user_id: int = int(train_df["userId"].sample(1, random_state=RANDOM_STATE).iloc[0])
print("Sample user:", sample_user_id)

user_seen = get_user_seen_movies(train_df, sample_user_id)
print("Movies rated in train:", len(user_seen))

recs_df = recommend_top_n(
    user_id=sample_user_id,
    mf_model=mf_model,
    all_movie_ids=all_movie_ids,
    train_ratings=train_df,
    n=10,
)

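if MOVIES_PATH.exists():
    movies_df = pd.read_csv(MOVIES_PATH)
    recs_out = recs_df.merge(movies_df, on="movieId", how="left")[
        ["movieId", "title", "predicted_rating"]
    ]
else:
    print("movies.csv not found; showing IDs only.")
    recs_out = recs_df

# A bare expression inside if/else is not rendered by the notebook,
# so the result is displayed as the last expression of the cell.
recs_out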


You can manually inspect whether these recommended films match the
user's taste, based on the movies they rated highly in the training set.


## 10. Modern tools and extensions for recommender systems

In this notebook, we implemented core models **manually** to keep things
transparent. In practice, many teams rely on specialised libraries and
newer techniques.

### 10.1 Python libraries commonly used

You can explore these outside this notebook (code sketch below):

- **`implicit`** – matrix factorization and nearest-neighbour models for
  implicit feedback (clicks, views, purchases).
- **`lightfm`** – hybrid (content + collaborative) models with various
  losses (BPR, WARP) for ranking.
- **`scikit-surprise`** – classic algorithms (SVD, KNNBaseline, etc.) with
  built-in evaluation tools.
- **`RecBole`, `Spotlight`** – more research-oriented frameworks for
  deep learning recommenders.

Example (for your environment, not run here):

```python
# !pip install scikit-surprise
from surprise import Dataset, Reader, SVD, accuracy
# Note: this shadows scikit-learn's train_test_split if run in the same session.
from surprise.model_selection import train_test_split

reader = Reader(rating_scale=(0.5, 5.0))
data = Dataset.load_from_df(ratings_raw[["userId", "movieId", "rating"]], reader)
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

algo = SVD(n_factors=50, reg_all=0.02, lr_all=0.005)
algo.fit(trainset)

predictions = algo.test(testset)
rmse_surprise = accuracy.rmse(predictions)
```

### 10.2 Modelling directions

Modern recommender systems often add:

- **Implicit feedback**:
  - Treat interactions as positive events (view, click, watch) instead of
    relying only on explicit ratings.
  - Use models like BPR, WARP, or implicit MF.
- **Sequence-aware / session-based models**:
  - Use RNNs, Transformers or attention to model user sessions as
    sequences of events.
- **Context-aware models**:
  - Include time, device, location, or other context as features.
- **Bandits / online learning**:
  - Contextual bandits for balancing exploration and exploitation.
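
As a small illustration of the implicit-feedback idea, explicit ratings can be turned into binary interactions in two common ways. The sketch below uses made-up data, and the 4.0 threshold is an assumption chosen to match the relevance cutoff used earlier in this notebook.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "userId": [1, 1, 2],
        "movieId": [10, 20, 10],
        "rating": [4.5, 2.0, 3.0],
    }
)

# Variant A: every rating counts as an interaction, whatever its value.
implicit_all = toy[["userId", "movieId"]].assign(interaction=1)

# Variant B: only ratings at or above a threshold count as positive.
implicit_liked = toy.loc[toy["rating"] >= 4.0, ["userId", "movieId"]].assign(
    interaction=1
)

print(len(implicit_all), len(implicit_liked))  # 3 1
```

Either table could then be fed to an implicit-feedback model; which variant is appropriate depends on whether a low rating should still count as engagement.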

### 10.3 Production considerations

For deployment, you typically add:

- **Feature store** for user/item features.
- **Candidate generation** + **ranking** stages.
- **Real-time serving API** (e.g. via FastAPI).
- **Monitoring** and evaluation pipelines (A/B testing, offline metrics).


## 11. Summary

In this extended recommender project we:

1. Performed EDA on MovieLens with several **visuals**.
2. Built and compared **baseline models**:
   - Global mean, movie mean, user+movie bias.
3. Constructed a **user–item rating matrix** and visualised sparsity.
4. Implemented **item-based collaborative filtering**.
5. Implemented **matrix factorization** with SGD and biases.
6. Evaluated with **RMSE** and several **ranking metrics**.
7. Generated **top–N recommendations** and, when titles are available,
   printed them for inspection.
8. Discussed **modern tools and research directions** in recommender systems.

This notebook should give you a complete, end-to-end template for
recommendation projects, while still being compact enough to adapt to your
own datasets or to plug into a larger system.
