# Appendix: Algorithm Comparison and Exploratory Experiments

This notebook contains exploratory experiments related to collaborative filtering.
The purpose of this appendix is to compare alternative algorithms and deepen the understanding
of recommendation models. These experiments are not part of the main production pipeline.


## Purpose of this appendix

The main project uses SVD (matrix factorization) as a production-ready baseline.
In this appendix, additional algorithms are evaluated to better understand their behavior
and relative performance on the MovieLens dataset.
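The three algorithms compared here differ mainly in their prediction rule. The equations below follow the Surprise library documentation, not this project. In them, μ is the global mean rating, b_u and b_i are the user and item biases, p_u and q_i are latent factor vectors, I_u is the set of items rated by user u, and y_j are the implicit-feedback factors of SVD++.

```latex
\begin{aligned}
\text{SVD:}\quad   & \hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u \\
\text{SVD++:}\quad & \hat{r}_{ui} = \mu + b_u + b_i + q_i^\top\Big(p_u + |I_u|^{-1/2}\sum_{j \in I_u} y_j\Big) \\
\text{NMF:}\quad   & \hat{r}_{ui} = q_i^\top p_u, \qquad p_u \ge 0,\; q_i \ge 0 \ \text{(element-wise)}
\end{aligned}
```

SVD++ evaluates the extra sum over I_u for every rating during training. This is the main reason it is more expensive than plain SVD.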


In [1]:
import pandas as pd
from surprise import Dataset, Reader
from surprise import SVD, SVDpp, NMF
from surprise.model_selection import cross_validate, GridSearchCV

## Dataset

The same MovieLens ratings dataset is used for all experiments in this appendix
to ensure a fair comparison between algorithms.


In [2]:
ratings = pd.read_csv(
    "ml-latest-small/ratings.csv",
    usecols=["userId", "movieId", "rating"]
)

reader = Reader(rating_scale=(0.5, 5.0))
data = Dataset.load_from_df(
    ratings[["userId", "movieId", "rating"]],
    reader
)

## Comparison methodology

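- All algorithms are evaluated with 3-fold cross-validation (cv=3).
- RMSE (root mean squared error) is the evaluation metric; lower is better.
- SVD hyperparameters are tuned with GridSearchCV over 16 parameter combinations.
- SVD++ and NMF are evaluated with default parameters to keep the comparison lightweight.
- No random seed is fixed, so each cross-validation call draws its own random folds. The algorithms are therefore not evaluated on identical splits.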
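The metric used throughout the methodology above is RMSE: the square root of the mean squared difference between the true ratings r_ui and the predicted ratings r̂_ui over the test set. It is expressed in rating units. The sketch below computes it in plain Python on made-up toy numbers (not MovieLens data). It should agree with the value Surprise reports through surprise.accuracy.rmse for the same pairs.

```python
import math


def rmse(true_ratings, predicted_ratings):
    """Root mean squared error over paired true/predicted ratings."""
    if not true_ratings or len(true_ratings) != len(predicted_ratings):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(r - r_hat) ** 2 for r, r_hat in zip(true_ratings, predicted_ratings)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy, made-up ratings on the 0.5-5.0 scale (illustration only).
true_r = [4.0, 3.5, 5.0, 2.0]
pred_r = [3.5, 3.5, 4.0, 3.0]
print(rmse(true_r, pred_r))  # 0.75
```

Squaring the errors penalizes large misses more heavily than small ones. A single prediction that is off by a full star therefore affects RMSE more than several half-star misses.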


In [3]:
param_grid = {
    'n_epochs': [10, 20],
    'lr_all': [0.002, 0.005],
    'n_factors': [50, 100],
    'reg_all': [0.02, 0.05]
}

gs_svd = GridSearchCV(
    SVD,
    param_grid,
    measures=["rmse"],
    cv=3
)

gs_svd.fit(data)

best_params_SVD = gs_svd.best_params["rmse"]
best_rmse_SVD = gs_svd.best_score["rmse"]

print("Best SVD RMSE:", best_rmse_SVD)
print("Best SVD parameters:", best_params_SVD)

Best SVD RMSE: 0.8751126329994019
Best SVD parameters: {'n_epochs': 20, 'lr_all': 0.005, 'n_factors': 100, 'reg_all': 0.05}
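GridSearchCV trains one model per parameter combination and per fold. The sketch below enumerates the same grid with itertools.product to show the size of the search. According to the Surprise documentation, n_epochs is the number of SGD passes over the data, lr_all and reg_all are the learning rate and regularization term applied to all parameters, and n_factors is the latent dimension.

```python
from itertools import product

# Same grid as in the GridSearchCV cell above.
param_grid = {
    "n_epochs": [10, 20],
    "lr_all": [0.002, 0.005],
    "n_factors": [50, 100],
    "reg_all": [0.02, 0.05],
}
cv_folds = 3

keys = list(param_grid)
# Every combination of one value per hyperparameter.
combinations = [dict(zip(keys, values)) for values in product(*param_grid.values())]

print(len(combinations))             # 16 combinations
print(len(combinations) * cv_folds)  # 48 model fits in total
print(combinations[-1])              # the combination with the largest value of every parameter
```

Every best value reported above is the largest value in its list, so the optimum sits on the edge of the grid. A wider grid might find a better configuration, although this was not tested here.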


The best SVD hyperparameters found by the grid search are reused in the comparison below.
SVD++ and NMF keep their default configurations.

In [4]:
algorithms = {
    "SVD": SVD(**best_params_SVD),  # tuned via cross-validation
    "SVD++": SVDpp(),               # default parameters
    "NMF": NMF()                    # default parameters
}

In [5]:
results = {}

for name, algo in algorithms.items():
    cv_results = cross_validate(
        algo,
        data,
        measures=["RMSE"],
        cv=3,
        verbose=False
    )
    results[name] = cv_results["test_rmse"].mean()

results

{'SVD': 0.8755747948666762,
 'SVD++': 0.8696960110342767,
 'NMF': 0.936315654336482}

In [6]:
pd.DataFrame.from_dict(
    results,
    orient="index",
    columns=["RMSE"]
).sort_values("RMSE")

| Algorithm | Mean RMSE (3-fold CV) |
|-----------|-----------------------|
| SVD++     | 0.869696              |
| SVD       | 0.875575              |
| NMF       | 0.936316              |
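The snippet below puts the gaps between the models in perspective. It recomputes the absolute and relative change in RMSE against the tuned SVD, using the mean RMSE values printed above.

```python
# Mean cross-validated RMSE values copied from the results cell above.
mean_rmse = {
    "SVD": 0.8755747948666762,
    "SVD++": 0.8696960110342767,
    "NMF": 0.936315654336482,
}

baseline = mean_rmse["SVD"]
# Relative change versus the tuned SVD; positive means lower (better) RMSE.
rel_change = {name: (baseline - score) / baseline for name, score in mean_rmse.items()}

for name in sorted(mean_rmse, key=mean_rmse.get):
    print(
        f"{name:6s} RMSE={mean_rmse[name]:.4f}  "
        f"absolute gap={baseline - mean_rmse[name]:+.4f}  relative={rel_change[name]:+.2%}"
    )
```

For these numbers, SVD++ lowers RMSE by about 0.006 (roughly 0.7%) relative to the tuned SVD. NMF's RMSE is about 7% higher.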


## Results interpretation

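The comparison shows that the tuned SVD model provides a strong baseline for collaborative filtering.
SVD++ with default parameters achieves the lowest RMSE (0.8697 versus 0.8756 for SVD), but the gain is small, about 0.006.
SVD++ is also known to be considerably slower to train, because it additionally models implicit feedback from every item a user has rated.
NMF performs clearly worse (RMSE 0.9363).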
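The comparison loop above keeps only the mean test RMSE for each algorithm. Surprise's cross_validate also returns the per-fold RMSE values and per-fold fit_time and test_time, according to its documentation. Recording those values would show whether the small SVD++ gain exceeds the fold-to-fold variation, and would put a number on the training cost. The helper below is a hypothetical sketch that assumes this documented dict layout. No timings or standard deviations were recorded in this notebook.

```python
from statistics import mean, pstdev


def summarize_cv(cv_results):
    """Condense one cross_validate() result into RMSE mean/std and mean timings.

    Assumes per-fold values under the keys "test_rmse", "fit_time" and
    "test_time", as documented for surprise.model_selection.cross_validate.
    """
    fold_rmse = list(cv_results["test_rmse"])
    return {
        "rmse_mean": mean(fold_rmse),
        # Population standard deviation (ddof=0), like numpy's default np.std.
        "rmse_std": pstdev(fold_rmse),
        "fit_time_mean_s": mean(cv_results["fit_time"]),
        "test_time_mean_s": mean(cv_results["test_time"]),
    }


# Hypothetical drop-in for the loop in cell [5]:
#     results[name] = summarize_cv(cv_results)
```

If the helper is used, pd.DataFrame.from_dict(results, orient="index") would then produce one column per summary statistic instead of a single RMSE column.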


## Conclusion

This appendix presents an exploratory evaluation of alternative collaborative filtering algorithms.
Although SVD++ achieved a slightly lower RMSE, SVD was selected as the main approach for the production-oriented recommendation pipeline.
Its accuracy is close to that of SVD++, and the model is simpler and cheaper to train.

All results shown in this notebook were obtained by executing the notebook end-to-end, and they are included for transparency.
Because no random seed is fixed, re-running the notebook may produce slightly different RMSE values.
