# Anime Recommender
Item-based Collaborative Filtering | Matrix Factorization | Neural Matrix Factorization (NeuMF) | Two-Tower

✨ Highlights

- Models: Matrix Factorization (MF), NeuMF, Two-Tower, and ItemCF baseline.

- Objectives: MSE (explicit ratings), BPR (implicit ranking), and MSE→BPR fine-tune.

- Reproducible runs: auto-timestamped run dirs, TensorBoard logs, best/last checkpoints.

- Batch evaluation: generate a single summary table (RMSE/MAE + HR@K/NDCG@K) across all variants.

- Feature support: plug item features (item_feats.npy) directly into NeuMF’s MLP tower.
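
To make the two objectives listed above concrete, here is a minimal sketch of both losses for a single training example. It is illustrative only: the function names `mse_loss` and `bpr_loss` are hypothetical, and the repository's training code may differ (for instance in how negative items are sampled or how losses are batched).

```python
import math

def mse_loss(pred: float, rating: float) -> float:
    """Squared error between a predicted and an observed explicit rating."""
    return (pred - rating) ** 2

def bpr_loss(pos_score: float, neg_score: float) -> float:
    """BPR loss for one (user, positive item, negative item) triple.

    Equals -log(sigmoid(pos_score - neg_score)), written as a softplus
    so it stays numerically stable for large score gaps.
    """
    x = pos_score - neg_score
    # softplus(-x) = log(1 + exp(-x)) = max(-x, 0) + log1p(exp(-|x|))
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

MSE penalizes the distance to the observed rating, while BPR only cares that the positive item is scored above the negative one, which is why it suits ranking metrics such as HR@K and NDCG@K.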

In [1]:
REPO = "https://github.com/HenryNVP/anime_recomender.git"
!git clone $REPO

import os
%cd anime_recomender/

!pip -q install -r requirements.txt

import torch
print("CUDA available:", torch.cuda.is_available())

Cloning into 'anime_recomender'...
remote: Enumerating objects: 267, done.
remote: Counting objects: 100% (267/267), done.
remote: Compressing objects: 100% (143/143), done.
remote: Total 267 (delta 155), reused 208 (delta 106), pack-reused 0 (from 0)
Receiving objects: 100% (267/267), 91.62 KiB | 15.27 MiB/s, done.
Resolving deltas: 100% (155/155), done.
/content/anime_recomender
CUDA available: True


In [2]:
import os, shutil

PROJECT_ROOT = os.getcwd()
DATA_RAW   = os.path.join(PROJECT_ROOT, "data/raw")
os.makedirs(DATA_RAW, exist_ok=True)

from google.colab import files
uploaded = files.upload()   # select anime.csv and rating.csv

shutil.move("anime.csv", "data/raw/anime.csv")
shutil.move("rating.csv", "data/raw/rating.csv")

Saving anime.csv to anime.csv
Saving rating.csv to rating.csv


'data/raw/rating.csv'

## Data Preprocessing

In [3]:
!python scripts/preprocess_data.py --data_dir data/raw --out_dir data/processed --build_item_features --check


Users: 60970 | Items: 8027 | Ratings: 6314631
Splits -> train: 4392123, val: 920565, test: 1001943
Saved cleaned data to: data/processed
=== Anime metadata (cleaned) ===
Total items: 8027
Feature matrix shape: (8027, 57)

=== Missing values in anime.csv ===
Missing genres: 0 (0.00%)
Missing types: 0 (0.00%)
Missing ratings: 0 (0.00%)

=== Numeric fields summary (before scaling) ===

episodes:
count    8027.000000
mean       11.939330
std        38.550412
min         1.000000
25%         1.000000
50%         2.000000
75%        12.000000
max      1787.000000
Name: episodes, dtype: float64

rating:
count    8027.000000
mean        6.804482
std         0.837150
min         2.000000
25%         6.310000
50%         6.830000
75%         7.370000
max         9.370000
Name: rating, dtype: float64

members:
count    8.027000e+03
mean     2.691042e+04
std      6.545656e+04
min      1.290000e+02
25%      1.399000e+03
50%      4.372000e+03
75%      2.049050e+04
max      1.013917e+06
Name: members, dtype: float64
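
As a quick sanity check on the numbers printed above, the following sketch recomputes the split fractions and the density of the user-item matrix from the reported counts. The variable names are illustrative; only the counts come from the preprocessing output.

```python
# Counts copied from the preprocessing output above.
n_users, n_items, n_ratings = 60970, 8027, 6314631
splits = {"train": 4392123, "val": 920565, "test": 1001943}

# The three splits should partition the full set of ratings.
assert sum(splits.values()) == n_ratings

# Fraction of ratings that falls into each split.
fractions = {name: count / n_ratings for name, count in splits.items()}

# Density: observed ratings divided by all possible (user, item) pairs.
density = n_ratings / (n_users * n_items)

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
print(f"density: {density:.2%}")
```

Roughly 1.3% of all possible user-item pairs carry a rating, so the matrix is sparse, as is typical for this kind of data.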

## Baseline Item-based Collaborative Filtering

In [4]:
!python -m src.baselines.itemcf.train --data_dir data/processed --out_prefix runs/itemcf --k 50 --shrink 25


[val] {'rmse': 1.2460050582885742, 'mae': 0.9429856538772583}
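
The `--k` and `--shrink` flags above suggest a top-k neighborhood model with shrunk item similarities. The sketch below shows one common form of that idea, assuming cosine similarity with the shrinkage term added to the denominator. The function names and the exact formula are assumptions; the repository's `src.baselines.itemcf` module may compute similarities differently.

```python
import math

def shrunk_cosine(ratings_i: dict, ratings_j: dict, shrink: float) -> float:
    """Cosine similarity between two item rating vectors, damped by `shrink`.

    Each argument maps user id -> rating. A larger `shrink` pulls the
    similarity of weakly supported item pairs toward zero.
    """
    common = ratings_i.keys() & ratings_j.keys()
    if not common:
        return 0.0
    dot = sum(ratings_i[u] * ratings_j[u] for u in common)
    norm_i = math.sqrt(sum(r * r for r in ratings_i.values()))
    norm_j = math.sqrt(sum(r * r for r in ratings_j.values()))
    return dot / (norm_i * norm_j + shrink)

def predict(user_ratings: dict, target: str, item_users: dict,
            k: int, shrink: float) -> float:
    """Similarity-weighted average of the user's ratings on the k nearest items.

    Assumes `user_ratings` (item id -> rating) is non-empty.
    """
    sims = [
        (shrunk_cosine(item_users[target], item_users[j], shrink), r)
        for j, r in user_ratings.items()
        if j != target
    ]
    top = sorted(sims, reverse=True)[:k]  # keep the k most similar rated items
    denom = sum(abs(s) for s, _ in top)
    if denom == 0:
        # No usable neighbors: fall back to the user's mean rating.
        return sum(user_ratings.values()) / len(user_ratings)
    return sum(s * r for s, r in top) / denom
```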


In [None]:
!python -m src.baselines.itemcf.eval --model_prefix runs/itemcf --splits_dir data/processed/splits --split test --k 10,20

[rating] test | RMSE=1.2673 MAE=0.9606
[ranking] K= 10 | HR=0.6047 NDCG=0.3827 P=0.1140 R=0.1435 MAP=0.0919
[ranking] K= 20 | HR=0.6968 NDCG=0.3976 P=0.0863 R=0.2013 MAP=0.0887
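
For readers unfamiliar with the ranking metrics reported above, here is a minimal per-user sketch of HR@K and NDCG@K with binary relevance. The evaluation protocol used by the repository (how candidates are chosen and which test items count as relevant) is not shown in this notebook, so treat the definitions below as the textbook versions rather than the exact implementation.

```python
import math

def hit_rate_at_k(ranked: list, relevant: set, k: int) -> float:
    """1.0 if any relevant item appears in the top-k, else 0.0."""
    return float(any(item in relevant for item in ranked[:k]))

def ndcg_at_k(ranked: list, relevant: set, k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank = pos + 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Dataset-level figures are then the mean of these per-user values.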


## Matrix Factorization

In [None]:
!python -m src.train --config configs/config_mf.yaml
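
As a reminder of what the MF model computes, the sketch below scores one user-item pair as a dot product of latent factors plus optional bias terms. Whether the repository's MF variant includes biases or a global mean is an assumption here, and `mf_predict` is a hypothetical name.

```python
def mf_predict(user_vec, item_vec, user_bias=0.0, item_bias=0.0, global_mean=0.0):
    """Predicted rating: global mean + biases + dot product of latent factors."""
    if len(user_vec) != len(item_vec):
        raise ValueError("user and item vectors must have the same dimension")
    dot = sum(p * q for p, q in zip(user_vec, item_vec))
    return global_mean + user_bias + item_bias + dot
```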


In [None]:
!python -m src.eval --ckpt runs/mf/[latest_run]/best.ckpt --config configs/config_mf.yaml
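
The `[latest_run]` placeholder has to be replaced by hand with a run directory name. A small helper such as the one below could resolve it automatically. It is a sketch under the assumption that run directories use sortable timestamp names (like the `20251015_081335` directories that appear in the batch-training logs); `latest_run` is a hypothetical function, not part of the repository.

```python
from pathlib import Path

def latest_run(runs_root: str) -> Path:
    """Return the most recent run directory under `runs_root`.

    Assumes run directories are named with sortable timestamps such as
    20251015_081335, so the lexicographically largest name is the newest.
    """
    candidates = sorted(p for p in Path(runs_root).iterdir() if p.is_dir())
    if not candidates:
        raise FileNotFoundError(f"no run directories under {runs_root}")
    return candidates[-1]
```

Calling `latest_run(root) / "best.ckpt"` with the run root used in the command above then gives the path to pass to `--ckpt`.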


## Neural Matrix Factorization (NeuMF)

In [None]:
!python -m src.train --config configs/config_neumf.yaml


In [None]:
!python -m src.eval --ckpt runs/neumf/[latest_run]/best.ckpt --config configs/config_neumf.yaml
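
To illustrate how item features can enter NeuMF's MLP tower, here is a dependency-free sketch of a single forward pass. The architecture details (one hidden layer, ReLU, a linear output over the concatenated branches) and all function and parameter names are assumptions for illustration; the actual layer sizes and fusion scheme are defined by the repository's model code and `configs/config_neumf.yaml`.

```python
def relu(v):
    """Element-wise ReLU on a list of numbers."""
    return [max(0.0, x) for x in v]

def linear(weights, bias, v):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def neumf_score(gmf_user, gmf_item, mlp_user, mlp_item, item_feats,
                hidden_w, hidden_b, out_w, out_b):
    """Fuse a GMF branch and an MLP branch into a single score."""
    # GMF branch: element-wise product of user and item embeddings.
    gmf = [u * i for u, i in zip(gmf_user, gmf_item)]
    # MLP branch: concatenate embeddings with item side features,
    # then apply one hidden layer with ReLU.
    mlp_in = list(mlp_user) + list(mlp_item) + list(item_feats)
    mlp = relu(linear(hidden_w, hidden_b, mlp_in))
    # Output layer: linear combination of both branch outputs.
    fused = gmf + mlp
    return sum(w * x for w, x in zip(out_w, fused)) + out_b
```

In this layout the item feature vector (one 57-dimensional row of the feature matrix built during preprocessing) only affects the MLP branch, while the GMF branch stays a pure embedding interaction.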


## Batch Training & Evaluation
Run the shell scripts below to train and evaluate all variants: MF, NeuMF, Two-Tower, and Two-Tower fine-tuned with an approximate NDCG loss.

In [5]:
!bash ./scripts/train_variants.sh


[train] (config_twotower) MSE run -> runs/variants/config_twotower/20251015_081335/mse


In [6]:
!bash ./scripts/evaluate_all.sh

[eval] config_twotower (mse) -> runs/evaluation/20251015_110134/config_twotower__mse.json
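
The evaluation script writes one JSON file per variant into a timestamped directory under `runs/evaluation/`. If you want to inspect those files yourself, the sketch below collects them into one plain-text table. It assumes each file contains a flat JSON object mapping metric names to numbers, which this notebook does not confirm; `summarize` and `format_table` are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path

def summarize(eval_dir: str) -> list:
    """Load every *.json file in `eval_dir` into one row per variant.

    Assumes each file holds a flat JSON object of metric name -> number;
    the real files may be structured differently.
    """
    rows = []
    for path in sorted(Path(eval_dir).glob("*.json")):
        with path.open() as fh:
            metrics = json.load(fh)
        rows.append({"variant": path.stem, **metrics})
    return rows

def format_table(rows: list) -> str:
    """Render rows as a plain-text table with one column per key seen."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    lines = ["  ".join(columns)]
    for row in rows:
        cells = []
        for col in columns:
            val = row.get(col, "")
            cells.append(f"{val:.4f}" if isinstance(val, float) else str(val))
        lines.append("  ".join(cells))
    return "\n".join(lines)
```

Metrics missing from a given file are left blank in that row, so variants trained with different objectives can still share one table.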