In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import concurrent.futures
import cvxpy as cp
import import_ipynb

from content_table_recommendater import load_data, als, baseline_model, nuclear_norm_model_df, spectral_regularization_model, recommend_anime, RMSE

## 0. Helper Functions and Setup

- This includes data loading and index helpers that map user and anime IDs to their row and column indices.

In [8]:
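# Load the ratings table (df) and its user x anime pivot table.
df, pivot = load_data("data/100x100.csv")
data_df = pivot
# Observed (user, anime, score) triples: the index set Delta with its ratings.
delta = df[['u_id', 'a_id', 'score']]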

user_id_index_map = {u_id: idx for idx, u_id in enumerate(data_df.index)}
anime_id_index_map = {a_id: idx for idx, a_id in enumerate(data_df.columns)}

def user_id_to_index(uid):
    return user_id_index_map.get(uid, -1)  # returns -1 if uid not found

def anime_id_to_index(aid):
    return anime_id_index_map.get(aid, -1) # returns -1 if aid not found

## 1. Baseline Model

The Baseline Model aims to predict missing entries in the matrix by accounting for deviations in user and item behaviors. Specifically, it models the predicted rating as a combination of:

- the global average rating,
- the individual user bias, and
- the individual item bias.

We express the **objective function** as:

$$
\min_{\{b_u\},\, \{b_i\}} \sum_{(u,i) \in \Delta} (R_{ui} - A_{ui})^2 + \lambda_1 \left( \sum_{u \in U} (b_u)^2 + \sum_{i \in I} (b_i)^2 \right)
$$

Where:

- $\Delta$ is the index set of observed entries  
- $U$ is the set of all users $\{u_1, u_2, \ldots, u_m\}$  
- $I$ is the set of all animes $\{i_1, i_2, \ldots, i_n\}$  
- $A \in \mathbb{R}^{m \times n}$ is the incomplete rating matrix, where $A_{ui}$ is user $u$'s actual rating of anime $i$  
- $R \in \mathbb{R}^{m \times n}$ is the completed rating matrix, where $R_{ui}$ is user $u$'s predicted rating of anime $i$  
- $b_u$ is the deviation of user $u$'s rating from the average  
- $b_i$ is the deviation of anime $i$'s rating from the average  
- $\lambda_1$ is the regularization parameter

The predicted rating is modeled as:

$$
R_{ui} = \mu + b_u + b_i
$$

Where $\mu$ is the mean of all known ratings.

---

In summary, the first term of the objective function minimizes the squared error between the observed ratings $A_{ui}$ and the predicted ratings $R_{ui}$. The second term penalizes the sum of squared biases of users and items to avoid overfitting. The regularization parameter $\lambda_1 > 0$ discourages excessively large bias terms.
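
As a rough illustration of how such biases could be fitted, here is a minimal NumPy sketch that alternates closed-form updates for $b_u$ and $b_i$. It is an assumption-laden sketch, not the notebook's `baseline_model`: the function names, the toy ratings, and the choice to fix $\mu$ at the mean of the observed ratings are all hypothetical.

```python
import numpy as np

def fit_baseline(rows, cols, vals, n_users, n_items, lam=1.0, n_iters=50):
    """Fit mu, b_u, b_i by alternating closed-form ridge updates (illustrative)."""
    mu = vals.mean()                      # global mean of the observed ratings
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    cnt_u = np.bincount(rows, minlength=n_users)   # |Delta_u| for each user
    cnt_i = np.bincount(cols, minlength=n_items)   # |Delta_i| for each item
    for _ in range(n_iters):
        # Setting the derivative w.r.t. b_u to zero gives
        # b_u = sum_i (A_ui - mu - b_i) / (lam + |Delta_u|); items are symmetric.
        b_u = np.bincount(rows, weights=vals - mu - b_i[cols],
                          minlength=n_users) / (lam + cnt_u)
        b_i = np.bincount(cols, weights=vals - mu - b_u[rows],
                          minlength=n_items) / (lam + cnt_i)
    return mu, b_u, b_i

def baseline_objective(mu, b_u, b_i, rows, cols, vals, lam):
    """Squared error on observed entries plus the L2 penalty on the biases."""
    pred = mu + b_u[rows] + b_i[cols]
    return np.sum((pred - vals) ** 2) + lam * (np.sum(b_u ** 2) + np.sum(b_i ** 2))

# Hypothetical toy data: 5 observed ratings in a 3 x 3 matrix.
rows = np.array([0, 0, 1, 2, 2])          # user positions
cols = np.array([0, 1, 1, 0, 2])          # item positions
vals = np.array([9.0, 7.0, 6.0, 8.0, 10.0])

mu, b_u, b_i = fit_baseline(rows, cols, vals, n_users=3, n_items=3, lam=1.0)
R = mu + b_u[:, None] + b_i[None, :]      # completed matrix R_ui = mu + b_u + b_i
print(np.round(R, 3))
```

Because $\lambda_1 > 0$ appears in each denominator, users or items with very few ratings have their biases shrunk toward zero, which is the regularizing effect described above.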


In [3]:
df, pivot = load_data("data/100x100.csv")

# R_baseline = baseline_model(10000, df)
R_baseline = baseline_model(df, regularization=1.0)

recommendations = recommend_anime(R_baseline, u_id=31, original_df=df, x=5)
print(recommendations)



    a_id  predicted_rating                          title
0     33          9.091289           Kenpuu Denki Berserk
1     19          8.949335                        Monster
2  11741          8.633195           Fate/Zero 2nd Season
3    199          8.617139  Sen to Chihiro no Kamikakushi
4    323          8.495727                Mousou Dairinin


## 2. $\ell_2$-Regularized Matrix Factorization Model (ALS)


Matrix factorization is another method that attempts to find a low-rank approximation of the incomplete rating matrix $A$. Specifically, the goal is to factor $A$ as the product of two low-rank matrices $X$ and $Y$:

$$
R = X Y^T
$$

Where:

- Each row $x_u$ in $X$ is the **latent feature vector** for user $u$
- Each row $y_i$ in $Y$ is the **latent feature vector** for anime item $i$

The predicted rating for user $u$ on anime $i$ is given by the dot product:

$$
R_{ui} = x_u^T y_i
$$

---

We define the **objective function** to minimize the reconstruction error on known ratings while regularizing the latent vectors:

$$
\min_{X, Y} \sum_{(u,i) \in \Delta} (A_{ui} - x_u^T y_i)^2 + \lambda_2 \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)
$$

This optimization problem is **non-convex**, but we use **Alternating Least Squares (ALS)** to solve it efficiently by alternating between solving for $X$ while fixing $Y$, and vice versa.

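Each subproblem reduces to a regularized (ridge) least squares problem. For a generic design matrix $M$ and target vector $\mathbf{b}$, the solution $\mathbf{w}$ satisfies the regularized normal equations:

$$
(M^T M + \lambda_2 I) \, \mathbf{w} = M^T \mathbf{b}
$$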

---

### ALS Update Rules

For each user $u$:

- Let $\Delta_u$ be the set of anime rated by user $u$
- Let $Y_u$ be the submatrix of $Y$ with rows $y_i$ for $i \in \Delta_u$
- Let $A_{u,\Delta_u}$ be the known ratings for those items

Then:

$$
x_u = (Y_u^T Y_u + \lambda_2 I)^{-1} Y_u^T A_{u,\Delta_u}
$$

For each anime $i$:

- Let $\Delta_i$ be the set of users who rated anime $i$
- Let $X_i$ be the submatrix of $X$ with rows $x_u$ for $u \in \Delta_i$
- Let $A_{\Delta_i,i}$ be the known ratings for those users

Then:

$$
y_i = (X_i^T X_i + \lambda_2 I)^{-1} X_i^T A_{\Delta_i,i}
$$

---

By alternating these updates, the model gradually learns low-dimensional embeddings for users and anime, producing a completed matrix $R$ that approximates the original rating matrix $A$.
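
The update rules above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the notebook's `als` function: the names `als_sketch` and `als_objective`, the random initialization, and the toy ratings are hypothetical, and ratings are passed as position arrays rather than a DataFrame.

```python
import numpy as np

def als_sketch(rows, cols, vals, n_users, n_items,
               rank=2, lam=0.1, n_iters=20, seed=0):
    """Alternating least squares on observed entries only (illustrative)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_users, rank))   # user factors
    Y = rng.normal(scale=0.1, size=(n_items, rank))   # item factors
    eye = np.eye(rank)
    for _ in range(n_iters):
        # Fix Y, solve a ridge problem for each user row x_u.
        for u in range(n_users):
            obs = rows == u
            if not obs.any():
                continue                               # user without ratings
            Y_u = Y[cols[obs]]
            X[u] = np.linalg.solve(Y_u.T @ Y_u + lam * eye, Y_u.T @ vals[obs])
        # Fix X, solve a ridge problem for each item row y_i.
        for i in range(n_items):
            obs = cols == i
            if not obs.any():
                continue                               # item without ratings
            X_i = X[rows[obs]]
            Y[i] = np.linalg.solve(X_i.T @ X_i + lam * eye, X_i.T @ vals[obs])
    return X, Y

def als_objective(X, Y, rows, cols, vals, lam):
    """Squared error on observed entries plus the L2 penalty on the factors."""
    pred = np.sum(X[rows] * Y[cols], axis=1)           # x_u^T y_i per observation
    return np.sum((vals - pred) ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))

# Hypothetical toy data: 5 observed ratings in a 3 x 3 matrix.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 1, 1, 0, 2])
vals = np.array([9.0, 7.0, 6.0, 8.0, 10.0])

X, Y = als_sketch(rows, cols, vals, n_users=3, n_items=3)
R_hat = X @ Y.T                                        # completed matrix
print(np.round(R_hat, 3))
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a common numerical choice; each half-step minimizes the objective exactly over one block of variables, so the objective never increases from one sweep to the next.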


In [4]:
X, Y = als(df, rank=20, iterations=100)
R_als = X @ Y.T
recommend_anime(R_als, u_id=31, original_df=df, x=5)

    a_id  predicted_rating                  title
0   9253          9.951663            Steins;Gate
1   7724          9.652122                  Shiki
2   2966          9.290651  Ookami to Koushinryou
3  34933          9.119918              Kakegurui
4   2251          9.106991               Baccano!


## 3. Nuclear Norm Minimization Model

We also explore the classic, convex-relaxed **nuclear norm optimization** problem. This model directly completes the matrix by minimizing its nuclear norm, subject to equality constraints on known entries:

$$
\min_{R \in \mathbb{R}^{m \times n}} \left\| R \right\|_\ast \quad \text{subject to } R_{ui} = A_{ui}, \quad \forall (u,i) \in \Delta
$$

Where:

- $\left\| R \right\|_\ast$ is the **nuclear norm** of $R$, equal to the sum of its singular values.
- $\Delta$ is the set of indices $(u, i)$ for which the rating $A_{ui}$ is known.

---

Unlike the spectral regularization and ALS models, this approach **does not include a reconstruction error loss term**: it assumes that the known entries are exact and finds the **lowest-rank matrix** (in the convex-relaxation sense) consistent with those observations.

This method is particularly useful when data is sparse and **low-rank structure** is expected, such as in recommendation systems.
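
To make the definition of the nuclear norm concrete, the short NumPy sketch below builds a hypothetical rank-2 matrix and checks that the sum of its singular values matches NumPy's built-in nuclear norm. It only illustrates the quantity being minimized; it does not solve the constrained problem and is not the notebook's `nuclear_norm_model_df`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 6 x 5 matrix of rank 2, built as a product of thin factors.
low_rank = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))

# Nuclear norm: the sum of the singular values.
sigma = np.linalg.svd(low_rank, compute_uv=False)
nuclear_norm = sigma.sum()

# NumPy exposes the same quantity through ord='nuc'.
nuclear_norm_builtin = np.linalg.norm(low_rank, ord='nuc')

# Rank counts the nonzero singular values; the nuclear norm sums them,
# which is why it serves as a convex surrogate for rank.
rank = np.linalg.matrix_rank(low_rank)

print(rank, nuclear_norm, nuclear_norm_builtin)
```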


In [5]:
R_nuclear = nuclear_norm_model_df(df)

recommendations = recommend_anime(R_nuclear, u_id=31, original_df=df, x=5)
print(recommendations)

Nuclear norm model succeeded.
   a_id  predicted_rating                  title
0    19          9.607469                Monster
1  7724          9.343664                  Shiki
2  2966          9.290221  Ookami to Koushinryou
3    66          9.190788         Azumanga Daioh
4  4181          9.013552   Clannad: After Story


## 4. Spectral Regularization Model


This spectral regularization model uses the **nuclear norm** of the recovered matrix $R$. If $\sigma_i$ are the singular values of $R$, then the nuclear norm is the sum of those singular values. The objective balances the approximation error on the known entries against the nuclear norm of $R$:

$$
\min_{R \in \mathbb{R}^{m \times n}} \frac{1}{2} \sum_{(u,i) \in \Delta} (R_{ui} - A_{ui})^2 + \lambda_3 \left\| R \right\|_\ast
$$

Where:

- $ R \in \mathbb{R}^{m \times n} $ is the full constructed matrix  
- $ \left\| R \right\|_\ast = \sum_i \sigma_i(R) $ is the **nuclear norm**, or the sum of the singular values of $ R $  
- $ \lambda_3 > 0 $ is the regularization parameter controlling the weight of the nuclear norm

---

Unlike the $ \ell_2 $-regularized matrix factorization model, this formulation does not factor the matrix into $ X $ and $ Y $; instead, it directly optimizes over the matrix $ R $, penalizing its nuclear norm to encourage **low-rank solutions**.

The nuclear norm serves as a **convex relaxation of the matrix rank function**, making this optimization problem convex and solvable using standard convex programming techniques. Overall, this model seeks to:

- Minimize squared reconstruction error on known ratings
- Promote a low-rank structure in the completed matrix via nuclear norm regularization
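
One well-known way to minimize an objective of this form without a general-purpose convex solver is a Soft-Impute style iteration: fill the unknown entries with the current estimate, take an SVD, and shrink every singular value by $\lambda_3$. The sketch below is a swapped-in technique for illustration only and is not claimed to be what `spectral_regularization_model` does; the names `soft_impute` and `spectral_objective` and the toy matrix are hypothetical.

```python
import numpy as np

def soft_impute(A, mask, lam=1.0, n_iters=200):
    """Soft-Impute style iteration with singular value soft-thresholding."""
    R = np.zeros_like(A, dtype=float)
    for _ in range(n_iters):
        # Keep observed ratings, fill unobserved entries with the current estimate.
        filled = np.where(mask, A, R)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)       # shrink singular values by lam
        R = (U * s) @ Vt                   # rebuild the (lower-rank) estimate
    return R

def spectral_objective(R, A, mask, lam):
    """0.5 * squared error on observed entries + lam * nuclear norm of R."""
    return 0.5 * np.sum((R - A)[mask] ** 2) + lam * np.linalg.norm(R, ord='nuc')

# Hypothetical toy data: a rank-1 matrix with some entries hidden.
A_full = np.outer([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 3.0])
mask = np.array([[1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 1],
                 [1, 1, 1]], dtype=bool)   # True where a rating is observed
A = np.where(mask, A_full, 0.0)            # unobserved entries stored as 0

R_soft = soft_impute(A, mask, lam=0.5)
print(np.round(R_soft, 2))
print(spectral_objective(R_soft, A, mask, 0.5))
```

Larger values of `lam` zero out more singular values and so give a lower-rank, smoother completion, at the cost of a larger error on the observed entries.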


In [6]:
R_spectral = spectral_regularization_model(_lambda=1.0, df=df)

recommendations = recommend_anime(R_spectral, u_id=31, original_df=df, x=5)
print(recommendations)



Spectral regularization model succeeded.
   a_id  predicted_rating                          title
0    19          9.509024                        Monster
1  2966          9.230938          Ookami to Koushinryou
2  7724          9.140160                          Shiki
3    66          9.051557                 Azumanga Daioh
4   199          8.852536  Sen to Chihiro no Kamikakushi


## 5. Comparing the Models Using RMSE

In [7]:
print("Model Evaluation (RMSE on training set):")
print(f"Baseline RMSE:           {RMSE(R_baseline, df):.4f}")
print(f"ALS RMSE:                {RMSE(R_als, df):.4f}")
print(f"Nuclear Norm RMSE:       {RMSE(R_nuclear, df):.4f}")
print(f"Spectral Reg. RMSE:      {RMSE(R_spectral, df):.4f}")

Model Evaluation (RMSE on training set):
Baseline RMSE:           1.4486
ALS RMSE:                0.0202
Nuclear Norm RMSE:       0.0000
Spectral Reg. RMSE:      0.1627
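
These RMSE values are computed on the same ratings used for fitting, so they measure fit rather than predictive accuracy. In particular, the nuclear norm model's equality constraints force its training RMSE to zero. A minimal sketch of scoring on held-out ratings follows. It assumes the completed matrix is a NumPy array indexed by row and column positions; `rmse_observed` and `holdout_split` are hypothetical helpers, not the notebook's `RMSE` function.

```python
import numpy as np

def rmse_observed(R, rows, cols, vals):
    """Root mean squared error between R and the ratings at (rows, cols)."""
    pred = R[rows, cols]
    return float(np.sqrt(np.mean((pred - vals) ** 2)))

def holdout_split(n_obs, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) as a random partition of range(n_obs)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_obs)
    n_test = int(round(test_frac * n_obs))
    return perm[n_test:], perm[:n_test]

# Hypothetical toy check: predictions [1, 4] against ratings [1, 2].
R_toy = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
toy_rmse = rmse_observed(R_toy, np.array([0, 1]), np.array([0, 1]),
                         np.array([1.0, 2.0]))
print(toy_rmse)

# Fit each model on the train indices only, then score on the test indices.
train_idx, test_idx = holdout_split(n_obs=10, test_frac=0.2)
print(train_idx, test_idx)
```

Comparing held-out RMSE across the four models would show whether the very low training errors of the ALS and nuclear norm models reflect genuine predictive skill or overfitting.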
