# Members of the group "Los hiperparámetros" 🙍‍♂️🙍‍♀️🙍‍♂️

- MIGUEL GONZÁLEZ GARCÍA
- ROSA LÓPEZ ESCALONA
- JAVIER QUESADA PAJARES

---

# HYBRID MODEL: BNMF + SVD (Public Score: 1.232)
---

# Imports 🔧

In [None]:
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split
from surprise import SVD, Dataset, Reader

# Dataset preprocessing 🧹

In [5]:
train_data = pd.read_csv("data/train.csv", sep=",")
test_data = pd.read_csv("data/test.csv", sep=",")

global_mean = train_data['rating'].mean()
print(f"Global mean rating: {global_mean:.4f}")

Global mean rating: 7.6047


In [6]:
train_data.head(5)

Unnamed: 0,user,item,rating
0,1,25715,7.0
1,1,25716,10.0
2,5,25851,9.0
3,6,25923,5.0
4,7,25924,6.0


In [7]:
test_data.head(5)

Unnamed: 0,ID,user,item
0,0,8117,268
1,1,10512,24393
2,2,534,1334
3,3,10984,6550
4,4,9093,22128


In [5]:
#train_data, val_split = train_test_split(train_data, test_size=0.2, random_state=42)

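The commented-out split above suggests holding out part of the training data for validation. As a minimal sketch on synthetic ratings (not the competition data), the block below scores a global-mean baseline on a hold-out set. It assumes RMSE as the metric, which the notebook does not state; `rmse` and all variable names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic ratings on a 1-10 scale (assumption: stands in for train_data['rating']).
rng = np.random.default_rng(42)
ratings = rng.integers(1, 11, size=1000).astype(float)

# 80/20 hold-out split by shuffling the row positions.
perm = rng.permutation(len(ratings))
val_idx, train_idx = perm[:200], perm[200:]

# Baseline: predict the training mean for every held-out rating.
baseline = ratings[train_idx].mean()
val_rmse = rmse(ratings[val_idx], np.full(len(val_idx), baseline))
print(f"Hold-out RMSE of the global-mean baseline: {val_rmse:.4f}")
```

Any model that cannot beat this baseline on the hold-out set is unlikely to help the blend.
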
## Building the sparse matrix 🧩

In [None]:
unique_users = train_data['user'].unique()
unique_items = train_data['item'].unique()

user_to_index = {user: idx for idx, user in enumerate(unique_users)}
item_to_index = {item: idx for idx, item in enumerate(unique_items)}

num_users = len(unique_users)
num_items = len(unique_items)

print(f"Number of users: {num_users}")
print(f"Number of items: {num_items}")

Number of users: 73456
Number of items: 171171

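The dictionaries above give each raw ID a contiguous index in order of first appearance. As a minimal sketch on toy data, `pandas.factorize` produces the same encoding in one call. The frame `toy` and the `*_toy` names are hypothetical.

```python
import pandas as pd

# Toy ratings frame standing in for train_data (hypothetical values).
toy = pd.DataFrame({
    "user": [1, 1, 5, 6, 5],
    "item": [25715, 25716, 25851, 25715, 25716],
})

# factorize returns (codes, uniques); codes are 0-based and follow first appearance.
user_codes, user_uniques = pd.factorize(toy["user"])
item_codes, item_uniques = pd.factorize(toy["item"])

# Dictionary built the same way as in the notebook cell, for comparison.
user_to_index_toy = {user: idx for idx, user in enumerate(toy["user"].unique())}
dict_codes = [user_to_index_toy[user] for user in toy["user"]]

print(user_codes.tolist(), dict_codes)
print(len(user_uniques), len(item_uniques))
```

Either approach works. The dictionary form is convenient later for checking whether a test ID was seen during training.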

In [None]:
# Build the sparse matrix (duplicate (user, item) pairs, if any, are summed by csr_matrix)
rows = train_data['user'].map(user_to_index).values
cols = train_data['item'].map(item_to_index).values
ratings = train_data['rating'].values

R_sparse = csr_matrix((ratings, (rows, cols)), shape=(num_users, num_items))

print(f"Sparse matrix created with shape: {R_sparse.shape}")
print("Number of non-zero ratings in the matrix:", R_sparse.nnz)

Sparse matrix created with shape: (73456, 171171)
Number of non-zero ratings in the matrix: 390351

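The shape and non-zero count printed above show how sparse the problem is: only about 3 in every 100,000 user-item cells hold a rating. The arithmetic can be reproduced with the short sketch below, which hard-codes the figures from the cell output.

```python
# Figures copied from the cell output above.
num_users, num_items, nnz = 73456, 171171, 390351

density = nnz / (num_users * num_items)  # fraction of cells that hold a rating
avg_per_user = nnz / num_users           # mean number of ratings per user
avg_per_item = nnz / num_items           # mean number of ratings per item

print(f"Density: {density:.2e}")
print(f"Ratings per user: {avg_per_user:.2f}, per item: {avg_per_item:.2f}")
```

With so few ratings per user and per item, the bias terms and the regularization are likely to carry much of the predictive weight.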

# BNMF training (with bias) 🏋️

In [None]:
# Hyperparameters
k = 20  # Number of latent factors
lambda_ = 0.02  # Regularization strength
learning_rate = 0.002  # Learning rate
num_epochs = 50  # Number of passes over the ratings

mu = global_mean

# Random initialization of the latent matrices U and V
U = np.random.normal(scale=1./k, size=(num_users, k))
V = np.random.normal(scale=1./k, size=(num_items, k))

# Bias initialization
bu = np.zeros(num_users)
bi = np.zeros(num_items)

print(f"Latent factors initialized: U ({U.shape}), V ({V.shape})")
print(f"Biases initialized: bu ({bu.shape}), bi ({bi.shape})")


Latent factors initialized: U ((73456, 20)), V ((171171, 20))
Biases initialized: bu ((73456,)), bi ((171171,))


In [None]:
def train_bnmf_with_bias(R_sparse, U, V, bu, bi, mu, lambda_, lr, num_epochs):
    """Train a biased NMF in place with projected SGD over the observed ratings."""
    # COO view: one (row, col, value) triple per stored rating. This avoids the
    # slow per-element R_sparse[i, j] lookup inside the inner loop.
    coo = R_sparse.tocoo()
    rows, cols, values = coo.row, coo.col, coo.data
    num_ratings = len(rows)

    for epoch in range(num_epochs):
        total_cost = 0
        for idx in range(num_ratings):
            i = rows[idx]  # User index
            j = cols[idx]  # Item index
            r_ij = values[idx]

            # Prediction including the bias terms
            pred_ij = mu + bu[i] + bi[j] + np.dot(U[i, :], V[j, :])
            error = r_ij - pred_ij

            # Bias updates
            bu[i] += lr * (error - lambda_ * bu[i])
            bi[j] += lr * (error - lambda_ * bi[j])

            # Latent factor updates (note: the V update uses the U[i] just updated)
            U[i, :] += lr * (error * V[j, :] - lambda_ * U[i, :])
            V[j, :] += lr * (error * U[i, :] - lambda_ * V[j, :])

            # Enforce non-negativity by projecting onto [0, inf)
            U[i, :] = np.maximum(U[i, :], 0)
            V[j, :] = np.maximum(V[j, :], 0)

            # Regularized cost (pre-update error, post-update parameters)
            total_cost += error**2 + (lambda_ / 2) * (
                np.linalg.norm(U[i, :])**2 +
                np.linalg.norm(V[j, :])**2 +
                bu[i]**2 + bi[j]**2
            )

        print(f"Epoch {epoch + 1}/{num_epochs} - Total cost: {total_cost:.4f}")

    return U, V, bu, bi

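Read off the training function above, each observed rating $r_{ij}$ triggers the per-sample update below. Here $\eta$ stands for `lr` and $\lambda$ for `lambda_`; the tilde symbols are introduced only for this summary and mark the intermediate values before projection.

```latex
\begin{aligned}
\hat{r}_{ij} &= \mu + bu_i + bi_j + U_i \cdot V_j, \qquad e_{ij} = r_{ij} - \hat{r}_{ij} \\
bu_i &\leftarrow bu_i + \eta\,(e_{ij} - \lambda\, bu_i) \\
bi_j &\leftarrow bi_j + \eta\,(e_{ij} - \lambda\, bi_j) \\
\tilde{U}_i &= U_i + \eta\,(e_{ij}\, V_j - \lambda\, U_i) \\
\tilde{V}_j &= V_j + \eta\,(e_{ij}\, \tilde{U}_i - \lambda\, V_j) \\
U_i &\leftarrow \max(0, \tilde{U}_i), \qquad V_j \leftarrow \max(0, \tilde{V}_j)
\end{aligned}
```

The final line is the projection step that keeps the factors non-negative; without it the scheme reduces to ordinary biased matrix factorization trained by SGD.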

In [None]:
print("Starting BNMF training...")
U, V, bu, bi = train_bnmf_with_bias(R_sparse, U, V, bu, bi, mu, lambda_, learning_rate, num_epochs)
print("Training completed.")


Starting BNMF training...
Epoch 1/50 - Total cost: 1260646.8514
Epoch 2/50 - Total cost: 1194846.7935
Epoch 3/50 - Total cost: 1157408.4361
Epoch 4/50 - Total cost: 1130087.5177
Epoch 5/50 - Total cost: 1108180.0322
Epoch 6/50 - Total cost: 1089699.7672
Epoch 7/50 - Total cost: 1073602.1324
Epoch 8/50 - Total cost: 1059263.8631
Epoch 9/50 - Total cost: 1046281.7793
Epoch 10/50 - Total cost: 1034380.1185
Epoch 11/50 - Total cost: 1023361.3034
Epoch 12/50 - Total cost: 1013078.6270
Epoch 13/50 - Total cost: 1003419.7170
Epoch 14/50 - Total cost: 994296.1939
Epoch 15/50 - Total cost: 985636.8998
Epoch 16/50 - Total cost: 977383.1568
Epoch 17/50 - Total cost: 969485.7243
Epoch 18/50 - Total cost: 961902.1431
Epoch 19/50 - Total cost: 954595.1160
Epoch 20/50 - Total cost: 947532.2440
Epoch 21/50 - Total cost: 940684.4110
Epoch 22/50 - Total cost: 934025.8708
Epoch 23/50 - Total cost: 927533.3188
Epoch 24/50 - Total cost: 921185.8150
Epo

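To make the projected update easier to experiment with, here is a minimal self-contained sketch on random toy data. It is a variant, not the notebook's exact loop: it keeps a copy of the user factors (`u_old`, a hypothetical name) so that both factor updates use pre-update values, and it starts from non-negative factors. All sizes and hyperparameters are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 6, 8, 3
lr, lambda_ = 0.01, 0.02

# Toy observed ratings as (row, col, value) triples on a 1-10 scale.
rows = rng.integers(0, n_users, size=30)
cols = rng.integers(0, n_items, size=30)
vals = rng.integers(1, 11, size=30).astype(float)
mu = vals.mean()

# Non-negative initial factors and zero biases.
U = np.abs(rng.normal(scale=1.0 / k, size=(n_users, k)))
V = np.abs(rng.normal(scale=1.0 / k, size=(n_items, k)))
bu, bi = np.zeros(n_users), np.zeros(n_items)

def sse():
    """Sum of squared errors over the toy ratings with the current parameters."""
    pred = mu + bu[rows] + bi[cols] + np.einsum("ij,ij->i", U[rows], V[cols])
    return float(np.sum((vals - pred) ** 2))

sse_before = sse()
for epoch in range(20):
    for i, j, r in zip(rows, cols, vals):
        err = r - (mu + bu[i] + bi[j] + U[i] @ V[j])
        bu[i] += lr * (err - lambda_ * bu[i])
        bi[j] += lr * (err - lambda_ * bi[j])
        u_old = U[i].copy()  # keep pre-update user factors for the item update
        U[i] = np.maximum(U[i] + lr * (err * V[j] - lambda_ * U[i]), 0)
        V[j] = np.maximum(V[j] + lr * (err * u_old - lambda_ * V[j]), 0)
sse_after = sse()

print(f"SSE before: {sse_before:.2f}, after: {sse_after:.2f}")
print("Factors non-negative:", bool((U >= 0).all() and (V >= 0).all()))
```
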
# SVD training 🏋️

In [None]:
reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(train_data[['user', 'item', 'rating']], reader)

trainset = data.build_full_trainset()

# Train/validation split (this call would need surprise.model_selection.train_test_split,
# not the sklearn function imported above)
#trainset, valset = train_test_split(data, test_size=0.2, random_state=42)

# SVD model configuration
svd = SVD(n_factors=20, n_epochs=50, biased=True, lr_all=0.002, reg_all=0.02, random_state=42)

print("Training the SVD model...")
svd.fit(trainset)
print("SVD model trained successfully.")

Training the SVD model...
SVD model trained successfully.


# Predictions and CSV generation 🧠

In [None]:
# Simple average of the BNMF and SVD predictions
def predict_rating(uid, iid, U, V, bu, bi, mu):
    try:
        # SVD prediction (uses the global `svd` model fitted above)
        pred_svd = svd.predict(uid, iid).est
    except Exception:
        pred_svd = None

    # BNMF prediction, available only when both user and item were seen in training
    if uid in user_to_index and iid in item_to_index:
        user_idx = user_to_index[uid]
        item_idx = item_to_index[iid]
        pred_bnmf = mu + bu[user_idx] + bi[item_idx] + np.dot(U[user_idx, :], V[item_idx, :])
    else:
        pred_bnmf = None

    # Average the two predictions when both are available
    if pred_svd is not None and pred_bnmf is not None:
        pred = (pred_svd + pred_bnmf) / 2
    elif pred_svd is not None:
        pred = pred_svd
    elif pred_bnmf is not None:
        pred = pred_bnmf
    else:
        pred = mu  # If both fail, fall back to the global mean

    # Clip to the 1-10 scale and round to the nearest integer rating
    return f"{round(max(1, min(10, pred)))}.0"

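The fallback logic above is a 50/50 blend. As a minimal sketch, the same logic can be written with an adjustable weight so that other mixes can be tried on a validation split. The function `blend` and its `weight_a` parameter are hypothetical and not part of the notebook; rounding to an integer rating is left out here.

```python
import numpy as np

def blend(pred_a, pred_b, fallback, weight_a=0.5):
    """Weighted average of two predictions; None marks a missing prediction."""
    if pred_a is not None and pred_b is not None:
        pred = weight_a * pred_a + (1.0 - weight_a) * pred_b
    elif pred_a is not None:
        pred = pred_a
    elif pred_b is not None:
        pred = pred_b
    else:
        pred = fallback  # neither model can predict: use e.g. the global mean
    # Clip to the 1-10 rating scale used in this notebook.
    return float(np.clip(pred, 1, 10))

print(blend(8.0, 6.0, fallback=7.6047))                 # both available
print(blend(None, 12.0, fallback=7.6047))               # only one available, clipped
print(blend(None, None, fallback=7.6047))               # fallback only
print(blend(8.0, 6.0, fallback=7.6047, weight_a=0.75))  # uneven weighting
```

With `weight_a=0.5` this matches the simple average used in `predict_rating`.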
In [None]:
print("Generating final predictions...")
predictions = []

for _, row in test_data.iterrows():
    uid, iid, row_id = row['user'], row['item'], row['ID']
    pred_rating = predict_rating(uid, iid, U, V, bu, bi, mu)
    predictions.append((row_id, pred_rating))

predictions_df = pd.DataFrame(predictions, columns=["ID", "rating"])

output_filename = "predictions_bnmf_svd.csv"
predictions_df.to_csv(output_filename, index=False)
print(f"File '{output_filename}' generated successfully.")

Generating final predictions...
File 'predictions_bnmf_svd.csv' generated successfully.

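The row-by-row loop above is simple but slow on large test sets. For pairs whose user and item both appear in training, the BNMF part of the prediction can be computed for all rows at once. The sketch below uses small random parameters as stand-ins for the trained `U`, `V`, `bu` and `bi`; the index arrays are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, k = 4, 5, 3

# Stand-ins for trained parameters (assumption: random values, for illustration only).
U = rng.random((num_users, k))
V = rng.random((num_items, k))
bu = rng.normal(scale=0.1, size=num_users)
bi = rng.normal(scale=0.1, size=num_items)
mu = 7.6

# Already-mapped indices of the (user, item) pairs to score.
user_idx = np.array([0, 2, 3])
item_idx = np.array([1, 1, 4])

# Row-wise dot products U[u] . V[i] for every pair, then biases and clipping.
dots = np.einsum("ij,ij->i", U[user_idx], V[item_idx])
preds = np.clip(mu + bu[user_idx] + bi[item_idx] + dots, 1, 10)

# Reference: the same computation done one pair at a time.
loop_preds = np.array([
    min(10, max(1, mu + bu[u] + bi[i] + np.dot(U[u], V[i])))
    for u, i in zip(user_idx, item_idx)
])
print(np.allclose(preds, loop_preds))
```

Pairs with an unseen user or item would still need the fallback path from `predict_rating`.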