# Notebook 4: Matrix Factorization Baseline

This notebook implements a matrix factorization baseline using the `SVD` algorithm from `Surprise`.
Performance is evaluated on a random 80/20 train/test split of the ratings processed in notebook 0.
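For reference, Surprise's `SVD` (with its default `biased=True`) is not a literal singular value decomposition. It is the biased matrix factorization popularized by Simon Funk. Following the Surprise documentation, each rating is estimated from the global mean $\mu$, a user bias $b_u$, an item bias $b_i$, and latent factor vectors $p_u$ and $q_i$. The parameters are learned by SGD on the regularized squared error over the training ratings $R_{\text{train}}$. Per the library documentation, the defaults used by `SVD(random_state=42)` are 100 factors, 20 epochs, learning rate 0.005 and regularization $\lambda = 0.02$:

$$
\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u
$$

$$
\min_{b_*,\, p_*,\, q_*} \sum_{r_{ui} \in R_{\text{train}}} \left(r_{ui} - \hat{r}_{ui}\right)^2 + \lambda \left(b_i^2 + b_u^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2\right)
$$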

In [1]:
import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy
from pathlib import Path

# Loading the processed data

We use the CSV files generated in notebook 0:
- `ratings.csv`
- `users.csv`
- `movies.csv`

In [2]:
data_path = Path("../data/processed")

ratings = pd.read_csv(data_path / "ratings.csv")
users = pd.read_csv(data_path / "users.csv")
movies = pd.read_csv(data_path / "movies.csv")

ratings.head()

| Unnamed: 0 | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |
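The `Unnamed: 0` column suggests, as an assumption about notebook 0, that `ratings.csv` was written with pandas' default `to_csv(index=True)`, so the old index came back as an unnamed column. It is harmless here because `In [3]` selects only `userId`, `movieId` and `rating`. The sketch below reproduces the effect on the first two rows shown above and shows that `index_col=0` in `pd.read_csv` absorbs that column (writing with `index=False` in notebook 0 would avoid it altogether).

```python
import io

import pandas as pd

# First two rows of ratings.head(), without the spurious column
df = pd.DataFrame({"userId": [1, 1], "movieId": [1193, 661],
                   "rating": [5, 3], "timestamp": [978300760, 978302109]})

buf = io.StringIO()
df.to_csv(buf)  # index=True by default: the index is written as a first column with an empty header
buf.seek(0)
with_index = pd.read_csv(buf)
print(list(with_index.columns))  # ['Unnamed: 0', 'userId', 'movieId', 'rating', 'timestamp']

buf.seek(0)
clean = pd.read_csv(buf, index_col=0)  # use that first column as the index instead
print(list(clean.columns))  # ['userId', 'movieId', 'rating', 'timestamp']
```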


# Preparing the data for Surprise

In [3]:
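reader = Reader(rating_scale=(1, 5))  # ratings range from 1 to 5
# load_from_df expects the columns in (user, item, rating) order
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# Random rating-level holdout: 20% of the ratings go to the test set
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)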
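Because the split is done per rating rather than per user, some test ratings may belong to users or movies that never appear in the training set. The following is a conceptual pandas/NumPy analogue of a random 80/20 holdout, not Surprise's own implementation. It also measures that cold-start share. The helper names `holdout_split` and `cold_start_share`, the `ceil` rounding and the toy data are all assumptions of this sketch.

```python
import numpy as np
import pandas as pd


def holdout_split(ratings, test_size=0.2, seed=42):
    """Shuffle rows and send the first test_size fraction to the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(ratings))
    n_test = int(np.ceil(test_size * len(ratings)))
    return ratings.iloc[idx[n_test:]], ratings.iloc[idx[:n_test]]


def cold_start_share(train, test):
    """Fraction of test ratings whose user or item never appears in train."""
    unseen_user = ~test["userId"].isin(train["userId"])
    unseen_item = ~test["movieId"].isin(train["movieId"])
    return float((unseen_user | unseen_item).mean())


# Toy ratings with the same columns as ratings.csv (values are made up)
toy = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    "movieId": [10, 20, 30, 10, 40, 20, 30, 10, 50, 60],
    "rating":  [5, 3, 4, 2, 4, 5, 1, 3, 4, 2],
})
train, test = holdout_split(toy)
print(len(train), len(test), cold_start_share(train, test))
```

On the real data, the same check could be run on `ratings` to see how many test ratings would fall back to bias-only estimates.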

# Training the SVD model

In [4]:
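# SVD with Surprise's default hyperparameters; random_state makes the factor initialization reproducible
svd = SVD(random_state=42)
svd.fit(trainset)
# Estimate a rating for every (user, item, true rating) triple in the test set
predictions = svd.test(testset)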
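To make concrete what `svd.fit(trainset)` does, here is a minimal NumPy sketch of biased matrix factorization trained by SGD, following the prediction rule and update scheme described in Surprise's documentation. It is not Surprise's implementation. It assumes users and items are already mapped to 0-based integer ids (Surprise does this internally with inner ids) and shuffles the ratings each epoch. The function names `fit_biased_mf` and `predict` and the toy data are hypothetical.

```python
import numpy as np


def fit_biased_mf(ratings, n_users, n_items, n_factors=10, n_epochs=20,
                  lr=0.005, reg=0.02, init_std=0.1, seed=42):
    """Biased MF trained by SGD. ratings: list of (u, i, r) with 0-based int ids."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])  # global mean rating
    bu = np.zeros(n_users)                    # user biases start at 0
    bi = np.zeros(n_items)                    # item biases start at 0
    P = rng.normal(0.0, init_std, (n_users, n_factors))  # user factors
    Q = rng.normal(0.0, init_std, (n_items, n_factors))  # item factors
    order = np.arange(len(ratings))
    for _ in range(n_epochs):
        rng.shuffle(order)  # visit ratings in a new random order each epoch
        for idx in order:
            u, i, r = ratings[idx]
            err = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu = P[u].copy()  # keep the old user factors for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return mu, bu, bi, P, Q


def predict(model, u, i, scale=(1, 5)):
    """Estimate r_ui and clip it to the rating scale."""
    mu, bu, bi, P, Q = model
    est = mu + bu[u] + bi[i] + Q[i] @ P[u]
    return float(np.clip(est, *scale))


toy = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 2),
       (2, 1, 1), (2, 2, 5), (3, 0, 3), (3, 2, 4)]
model = fit_biased_mf(toy, n_users=4, n_items=3, n_factors=5, n_epochs=200, lr=0.01)
print(predict(model, 0, 2))  # user 0 has not rated item 2
```

Clipping the estimate to the rating scale mirrors the default behaviour of Surprise's `predict`.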

# Evaluation

In [5]:
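# accuracy.rmse and accuracy.mae also print their result (verbose=True by default),
# which is why each metric appears twice in the output
rmse = accuracy.rmse(predictions)
mae = accuracy.mae(predictions)
print(f"RMSE: {rmse:.4f}, MAE: {mae:.4f}")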

RMSE: 0.8729
MAE:  0.6845
RMSE: 0.8729, MAE: 0.6845
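Both metrics compare each true rating $r_{ui}$ with its estimate $\hat{r}_{ui}$: RMSE is the square root of the mean squared error, so it penalizes large errors more heavily than MAE, the mean absolute error. A small NumPy sketch on made-up pairs:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Made-up (true, estimated) ratings: the errors are 1, -0.5, 0 and 0.5
true = [4, 3, 5, 2]
est = [3, 3.5, 5, 1.5]
print(rmse(true, est), mae(true, est))  # RMSE = sqrt(0.375) ≈ 0.612, MAE = 0.5
```

Applied to Surprise's `Prediction` tuples, e.g. `rmse([p.r_ui for p in predictions], [p.est for p in predictions])`, this should agree with `accuracy.rmse(predictions)`.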


This model will serve as the baseline against which later factorization techniques and hybrid recommenders are compared.