# Lab: Multi-armed bandits for recommendation

Diploma in Applied Machine Learning - UC

**Instructor:** Vicente Dominguez <br/>

**Student name:** Sebastián Latorre <br/>


## Reinforcement Learning

An RL agent seeks to take the actions that maximize its cumulative reward.

![RL setup](https://github.com/bamine/recsys-summer-school/raw/12e57cc4fd1cb26164d2beebf3ca29ebe2eab960/notebooks/images/rl-setup.png)


## Exploration vs. Exploitation

The goal is to strike a balance between exploration (taking an action to gain knowledge) and exploitation (taking the action currently estimated to yield the highest reward).

![Exploration vs. exploitation](https://miro.medium.com/max/1400/1*_5dltx4BcI8rRmCK2Sq_kw.png)

We will compare the following learning policies in terms of the rewards they obtain:

- Epsilon Greedy
- Random
- Upper Confidence Bound (UCB1)

We will evaluate the following scenarios:
- Multi-armed bandit simulation with contextual information
- Multi-armed bandit simulation without contextual information
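To make the three policies concrete, here is a self-contained sketch on a simulated, non-contextual Bernoulli bandit (the second scenario). The arm probabilities, the number of steps, `epsilon=0.1` and the helper `run_policy` are hypothetical choices for illustration, and mabwiser is not used. In this toy setup, both Epsilon Greedy and UCB1 should collect much more reward than random selection once they identify the best arm.

```python
import numpy as np

def run_policy(policy, probs, n_steps=5000, epsilon=0.1, seed=0):
    """Average reward of `policy` on a simulated Bernoulli bandit (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_arms = len(probs)
    counts = np.zeros(n_arms)   # times each arm was played
    values = np.zeros(n_arms)   # running mean reward per arm
    total = 0.0
    for t in range(n_steps):
        if policy == 'random':
            arm = int(rng.integers(n_arms))
        elif policy == 'epsilon_greedy':
            # Explore with probability epsilon, otherwise exploit the best estimate
            if rng.random() < epsilon:
                arm = int(rng.integers(n_arms))
            else:
                arm = int(np.argmax(values))
        elif policy == 'ucb1':
            if t < n_arms:
                arm = t  # play every arm once first
            else:
                # Mean reward plus an exploration bonus that shrinks as an arm is played more
                bonus = np.sqrt(2 * np.log(counts.sum()) / counts)
                arm = int(np.argmax(values + bonus))
        else:
            raise ValueError(f'unknown policy: {policy}')
        reward = float(rng.random() < probs[arm])  # Bernoulli reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total / n_steps

probs = [0.1, 0.2, 0.8]  # hypothetical success probability of each arm
for name in ['random', 'epsilon_greedy', 'ucb1']:
    print(name, round(run_policy(name, probs), 3))
```

Random selection averages roughly the mean of `probs`, while the two learning policies approach the best arm's probability; Epsilon Greedy keeps paying a fixed exploration cost, whereas the UCB1 bonus fades as counts grow.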



## Import required packages

In [234]:
!pip install mabwiser
!pip install category_encoders

import json
import random
import warnings
from functools import partial
from time import time

import numpy as np
import pandas as pd
import category_encoders as ce
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from mabwiser.mab import MAB, LearningPolicy, NeighborhoodPolicy
from mabwiser.simulator import Simulator
from tqdm import tqdm

warnings.filterwarnings('ignore')

# Keep tqdm progress bars on a single line in notebooks
tqdm = partial(tqdm, position=0, leave=True)




## Load data

In [235]:
!wget http://jmcauley.ucsd.edu/cse190/data/beer/beer_50000.json

--2024-08-20 04:07:03--  http://jmcauley.ucsd.edu/cse190/data/beer/beer_50000.json
Resolving jmcauley.ucsd.edu (jmcauley.ucsd.edu)... 137.110.160.73
Connecting to jmcauley.ucsd.edu (jmcauley.ucsd.edu)|137.110.160.73|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61156124 (58M) [application/json]
Saving to: ‘beer_50000.json.11’


2024-08-20 04:07:04 (71.8 MB/s) - ‘beer_50000.json.11’ saved [61156124/61156124]



In [236]:
import ast

rows = []
with open('beer_50000.json') as f:
    for line in f:
        # Each line is a Python dict literal, so parse it safely instead of using eval
        record = ast.literal_eval(line.strip())
        time_struct = record['review/timeStruct']
        rows.append({
            # Note: 'beer/brewerId' identifies the brewery, not the reviewer
            # (reviewers are identified by 'user/profileName'); it is kept as
            # `user_id` so the outputs below remain reproducible.
            'user_id': record['beer/brewerId'],
            'item_id': record['beer/beerId'],
            'rating': record['review/overall'],
            'aroma': record['review/aroma'],
            'taste': record['review/taste'],
            'appearance': record['review/appearance'],
            'day': time_struct['mday'],
            'month': time_struct['mon'],
            'year': time_struct['year'],
        })

df = pd.DataFrame(rows)

df

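| | user_id | item_id | rating | aroma | taste | appearance | day | month | year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10325 | 47986 | 1.5 | 2.0 | 1.5 | 2.5 | 16 | 2 | 2009 |
| 1 | 10325 | 48213 | 3.0 | 2.5 | 3.0 | 3.0 | 1 | 3 | 2009 |
| 2 | 10325 | 48215 | 3.0 | 2.5 | 3.0 | 3.0 | 1 | 3 | 2009 |
| 3 | 10325 | 47969 | 3.0 | 3.0 | 3.0 | 3.5 | 15 | 2 | 2009 |
| 4 | 1075 | 64883 | 4.0 | 4.5 | 4.5 | 4.0 | 30 | 12 | 2010 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49995 | 394 | 20539 | 4.0 | 4.0 | 4.0 | 4.0 | 4 | 12 | 2007 |
| 49996 | 394 | 20539 | 4.0 | 4.0 | 4.0 | 3.5 | 30 | 11 | 2007 |
| 49997 | 394 | 20539 | 3.5 | 3.5 | 4.5 | 4.0 | 28 | 11 | 2007 |
| 49998 | 394 | 20539 | 4.0 | 4.0 | 4.5 | 4.0 | 27 | 11 | 2007 |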


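## Data preprocessing
- MinMax scaling of rating, aroma, taste and appearance.
- Target encoding: https://contrib.scikit-learn.org/category_encoders/targetencoder.html

Target encoding:
- Replaces each category (here, `user_id`) with the average rating observed for that category.
- Normalizing the rating to [0, 1] beforehand is recommended, so that the categorical variable becomes a continuous value in the same range.
- The `smoothing` parameter shrinks each category's average toward the global average: categories with few observations are pulled more strongly toward the global mean, and larger `smoothing` values strengthen this regularization. The related `min_samples_leaf` parameter sets the category count at which the category average and the global average receive equal weight.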
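A minimal pandas sketch of smoothed target encoding, assuming the sigmoid-weighted blend described in the category_encoders documentation (the weight of a category's own mean is 0.5 when its count equals `min_samples_leaf`, and `smoothing` controls how gradual the transition is). The helper `smoothed_target_encode` and the toy data are hypothetical, and the library's defaults and edge-case handling may differ.

```python
import numpy as np
import pandas as pd

def smoothed_target_encode(categories, target, smoothing=10.0, min_samples_leaf=1):
    """Hypothetical helper: blend each category's mean target with the global mean."""
    df = pd.DataFrame({'cat': categories, 'y': target})
    prior = df['y'].mean()
    stats = df.groupby('cat')['y'].agg(['mean', 'count'])
    # Sigmoid weight: close to 1 for frequent categories, smaller for rare ones
    weight = 1 / (1 + np.exp(-(stats['count'] - min_samples_leaf) / smoothing))
    encoding = prior * (1 - weight) + stats['mean'] * weight
    return df['cat'].map(encoding).to_numpy()

# Toy data: 'a' is frequent with rating 1.0, 'b' appears once with rating 0.0
cats = ['a'] * 50 + ['b']
target = [1.0] * 50 + [0.0]
enc = smoothed_target_encode(cats, target, smoothing=10.0, min_samples_leaf=1)
print(enc[0], enc[-1])
```

The rare category `'b'` ends up halfway between its own mean (0.0) and the global mean instead of at 0, while the frequent category `'a'` stays close to its own mean.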


In [237]:
# MinMax scaling to [0, 1]
scaler = MinMaxScaler()
df['rating_scaled'] = scaler.fit_transform(df['rating'].values.reshape(-1,1))
df['aroma_scaled'] = scaler.fit_transform(df['aroma'].values.reshape(-1,1))
df['taste_scaled'] = scaler.fit_transform(df['taste'].values.reshape(-1,1))
df['appearance_scaled'] = scaler.fit_transform(df['appearance'].values.reshape(-1,1))

# Target encoder for user_id
# Note: it is fit on the full dataset, before the temporal train/test split,
# so ratings from the test period also contribute to the encoding
encoder = ce.TargetEncoder(smoothing=100)
df['user_id_encoded'] = encoder.fit_transform(df['user_id'], df['rating_scaled'])

# Keep only items (beers) with more than 100 reviews to reduce the search space
df_filtered = df.groupby('item_id').filter(lambda x: len(x) > 100)

# Assign a sequential action id to each item_id, starting at 1
df_filtered['action'] = pd.factorize(df_filtered['item_id'])[0] + 1

df_filtered

| | user_id | item_id | rating | aroma | taste | appearance | day | month | year | rating_scaled | aroma_scaled | taste_scaled | appearance_scaled | user_id_encoded | action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59 | 1075 | 25414 | 4.0 | 3.5 | 4.0 | 3.5 | 26 | 8 | 2009 | 0.8 | 0.625 | 0.750 | 0.7 | 0.790856 | 1 |
| 60 | 1075 | 25414 | 2.5 | 3.0 | 2.5 | 3.5 | 22 | 8 | 2009 | 0.5 | 0.500 | 0.375 | 0.7 | 0.790856 | 1 |
| 61 | 1075 | 25414 | 4.0 | 3.5 | 3.5 | 4.0 | 10 | 8 | 2009 | 0.8 | 0.625 | 0.625 | 0.8 | 0.790856 | 1 |
| 62 | 1075 | 25414 | 4.5 | 3.5 | 4.0 | 4.0 | 9 | 8 | 2009 | 0.9 | 0.625 | 0.750 | 0.8 | 0.790856 | 1 |
| 63 | 1075 | 25414 | 4.5 | 3.5 | 4.0 | 4.0 | 6 | 8 | 2009 | 0.9 | 0.625 | 0.750 | 0.8 | 0.790856 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49995 | 394 | 20539 | 4.0 | 4.0 | 4.0 | 4.0 | 4 | 12 | 2007 | 0.8 | 0.750 | 0.750 | 0.8 | 0.783817 | 83 |
| 49996 | 394 | 20539 | 4.0 | 4.0 | 4.0 | 3.5 | 30 | 11 | 2007 | 0.8 | 0.750 | 0.750 | 0.7 | 0.783817 | 83 |
| 49997 | 394 | 20539 | 3.5 | 3.5 | 4.5 | 4.0 | 28 | 11 | 2007 | 0.7 | 0.625 | 0.875 | 0.8 | 0.783817 | 83 |
| 49998 | 394 | 20539 | 4.0 | 4.0 | 4.5 | 4.0 | 27 | 11 | 2007 | 0.8 | 0.750 | 0.875 | 0.8 | 0.783817 | 83 |


In [238]:
print('Possible actions (number of unique items): ', df_filtered['action'].nunique())

Possible actions (number of unique items):  83


## Temporal train/test split

In [239]:
# Combine year, month and day into a datetime column
df_filtered['date'] = pd.to_datetime(df_filtered[['year', 'month', 'day']])

# Sort chronologically
df_filtered = df_filtered.sort_values('date')

# Hold out the most recent 10% of reviews as the test set
split_point = int(len(df_filtered) * 0.9)

# Split into train and test (explicit copies, so later column assignments are safe)
df_train = df_filtered.iloc[:split_point].copy()
df_test = df_filtered.iloc[split_point:].copy()



## Define features, actions and rewards for training

In [240]:
feature_columns = ['aroma_scaled', 'taste_scaled', 'appearance_scaled', 'user_id_encoded']

features = df_train[feature_columns].to_numpy()
actions = df_train['action'].to_numpy()
rewards = df_train['rating_scaled'].to_numpy()


## Instantiate the multi-armed bandit model

We need:
- The possible actions (`arms`), in this case the catalogue of unique items.
- A reinforcement learning algorithm, or `learning_policy` (e.g. Epsilon Greedy).
- A neighborhood policy (`neighborhood_policy`) that brings in contextual information from nearby neighbors with similar characteristics, reducing the search space.

The neighborhood policy (`neighborhood_policy`) is needed because:
- Incorporating contextual information makes the problem harder, and this policy is designed for exactly that setting.
- The underlying idea is that similar contexts are likely to have similar optimal actions.
- Instead of treating each context as a completely separate problem, the algorithm reuses what it has learned from similar contexts.
- With it, the learning policy can make better-informed decisions.
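The neighborhood idea can be illustrated without mabwiser. The sketch below is a hypothetical helper, `knn_epsilon_greedy`, and not mabwiser's `NeighborhoodPolicy.KNearest` implementation; it assumes Euclidean distance between context vectors. For a new context it takes the k closest training contexts, estimates each arm's mean reward from those neighbors only, and applies epsilon-greedy to the local estimates. The toy contexts and rewards are made up for illustration.

```python
import numpy as np

def knn_epsilon_greedy(context, train_contexts, train_decisions, train_rewards,
                       arms, k=10, epsilon=0.1, rng=None):
    """Hypothetical helper: epsilon-greedy over arm means among the k nearest contexts."""
    rng = np.random.default_rng() if rng is None else rng
    # Euclidean distance from the new context to every training context
    distances = np.linalg.norm(train_contexts - context, axis=1)
    neighbors = np.argsort(distances)[:k]
    local_decisions = train_decisions[neighbors]
    local_rewards = train_rewards[neighbors]

    # Mean reward of each arm among the neighbors (0.0 if never played there)
    estimates = {}
    for arm in arms:
        mask = local_decisions == arm
        estimates[arm] = float(local_rewards[mask].mean()) if mask.any() else 0.0

    # Explore with probability epsilon, otherwise exploit the best local estimate
    if rng.random() < epsilon:
        return arms[rng.integers(len(arms))], estimates
    return max(estimates, key=estimates.get), estimates

# Toy data: arm 1 pays off for contexts near 0, arm 2 for contexts near 1
train_contexts = np.array([[0.0], [0.1], [0.2], [0.9], [1.0], [1.1]])
train_decisions = np.array([1, 2, 1, 2, 1, 2])
train_rewards = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])

arm, estimates = knn_epsilon_greedy(np.array([0.05]), train_contexts, train_decisions,
                                    train_rewards, arms=[1, 2], k=3, epsilon=0.0)
print(arm, estimates)
```

Even though both arms appear across the whole training set, only the neighbors of the new context decide which arm looks best, which is the "similar contexts, similar optimal actions" assumption described above.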

In [241]:
# Possible actions: every beer in the filtered catalogue, even if absent from the training period
possible_actions = list(range(1, df_filtered['action'].max() + 1))

greedy = MAB(arms=possible_actions,
             learning_policy=LearningPolicy.EpsilonGreedy(epsilon=0.7),
             neighborhood_policy=NeighborhoodPolicy.KNearest(10))

In [242]:
X = features
y = rewards
decisions = df_train.action

In [243]:
greedy.fit(decisions=decisions, rewards=y, contexts=X)

## Evaluation


In [244]:
feature_columns = ['aroma_scaled', 'taste_scaled', 'appearance_scaled', 'user_id_encoded']

X_test = df_test[feature_columns].to_numpy()

# Recommended arm for each test context
prediction = greedy.predict(X_test)

# Expected reward of every arm for each test context (one dict per row)
scores = greedy.predict_expectations(X_test)

df_test['predicted_action'] = prediction

# Score = expected reward of the recommended arm
df_test['score'] = [expectations[arm] for arm, expectations in zip(df_test['predicted_action'], scores)]

df_result = df_test[['user_id', 'action', 'rating_scaled', 'predicted_action', 'score']]

df_result

| | user_id | action | rating_scaled | predicted_action | score |
|---|---|---|---|---|---|
| 15671 | 1199 | 31 | 0.7 | 32 | 0.324653 |
| 15670 | 1199 | 31 | 0.9 | 21 | 0.353031 |
| 29225 | 1199 | 46 | 0.6 | 32 | 0.650000 |
| 15673 | 1199 | 31 | 0.8 | 70 | 0.572061 |
| 17114 | 1199 | 32 | 1.0 | 36 | 0.866667 |
| ... | ... | ... | ... | ... | ... |
| 429 | 1075 | 2 | 0.9 | 6 | 0.303639 |
| 21084 | 1199 | 36 | 1.0 | 56 | 0.201260 |
| 21085 | 1199 | 36 | 1.0 | 11 | 0.000000 |
| 42206 | 263 | 68 | 0.8 | 18 | 0.853037 |


In [245]:
def dcg_at_k(r, k):
    # np.asfarray was removed in NumPy 2.0; np.asarray(..., dtype=float) is equivalent
    r = np.asarray(r, dtype=float)[:k]
    return np.sum(r / np.log2(np.arange(2, r.size + 2)))

def ndcg_at_k(r, k):
    dcg_max = dcg_at_k(sorted(r, reverse=True), k)
    if not dcg_max:
        return 0.
    return dcg_at_k(r, k) / dcg_max

def recall_at_k(r, k, n_rel):
    r = np.asarray(r)[:k] != 0
    return np.sum(r) / n_rel

def calculate_metrics(df):
    unique_users = df['user_id'].unique()

    ndcg_5, ndcg_10, recall_5, recall_10 = [], [], [], []

    for user in unique_users:
        user_df = df[df['user_id'] == user]

        # Items this user actually reviewed in the test period
        true_items = list(user_df['action'])
        # Recommended items, ranked by expected reward
        predicted_items = list(user_df.sort_values(by='score', ascending=False)['predicted_action'])

        # Relevance of each recommendation (1 if the user reviewed that item)
        binary_true = [1 if item in true_items else 0 for item in predicted_items]
        # Note: this list is always all ones, so the Recall@k computed from it equals
        # min(k, len(true_items)) / len(true_items) and does not depend on the predictions
        binary_predicted = [1 if item in true_items else 0 for item in true_items]

        # NDCG
        ndcg_5.append(ndcg_at_k(binary_true, 5))
        ndcg_10.append(ndcg_at_k(binary_true, 10))

        # Recall
        recall_5.append(recall_at_k(binary_predicted, 5, len(true_items)))
        recall_10.append(recall_at_k(binary_predicted, 10, len(true_items)))

    return np.mean(ndcg_5), np.mean(ndcg_10), np.mean(recall_5), np.mean(recall_10)
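In `calculate_metrics`, `binary_predicted` marks every true item as relevant, so the reported Recall@k does not depend on what the model recommends. A recall that does depend on the ranking counts how many of the user's distinct true items appear among the top-k recommendations. The helper `recall_at_k_ranked` below is a hypothetical sketch, and removing repeated predictions before cutting at k is an assumption.

```python
def recall_at_k_ranked(ranked_items, true_items, k):
    """Fraction of distinct true items that appear in the top-k ranked predictions."""
    true_set = set(true_items)
    if not true_set:
        return 0.0
    # Keep the first occurrence of each predicted item, preserving rank order
    top_k = list(dict.fromkeys(ranked_items))[:k]
    hits = sum(1 for item in top_k if item in true_set)
    return hits / len(true_set)

ranked = [32, 21, 31, 70]  # toy ranking, highest score first
true_items = [31, 46]      # toy set of items the user reviewed
print(recall_at_k_ranked(ranked, true_items, 2), recall_at_k_ranked(ranked, true_items, 5))  # 0.0 0.5

# Inside calculate_metrics it could be called per user as:
# recall_at_k_ranked(predicted_items, true_items, 5)
```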

In [246]:
ndcg5, ndcg10, r5, r10 = calculate_metrics(df_result)

print("Average NDCG@5: ", ndcg5)
print("Average NDCG@10: ", ndcg10)
print("Average Recall@5: ", r5)
print("Average Recall@10: ", r10)

Average NDCG@5:  0.4598135191349038
Average NDCG@10:  0.5262422400396385
Average Recall@5:  0.3144381953606695
Average Recall@10:  0.40665416849911684


# Assignment
Using the same dataset, try:
1. `LinUCB`   
2. `UCB1`

Report the results and discuss whether they improve on the `EpsilonGreedy` model shown in class.

The choice of hyperparameters and neighborhood policy (`neighborhood_policy`) is up to you.

Grading:
- Code (3 points)
- Comments and discussion in a text cell (3 points)

Documentation:
https://fidelity.github.io/mabwiser/examples.html


### EpsilonGreedy

In [247]:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def epsilon_greedy(df, epsilon=0.1, num_iterations=1000):
    # Initialization: one pull counter and one estimated value per item
    num_items = df['item_id'].nunique()
    item_counts = np.zeros(num_items)
    item_values = np.zeros(num_items)

    # Random 80/20 train/test split (the features are not used by the policy)
    X_train, X_test, y_train, y_test = train_test_split(
        df[['aroma_scaled', 'taste_scaled', 'appearance_scaled']].values,
        df['rating_scaled'].values,
        test_size=0.2,
        random_state=42
    )

    # Training
    for _ in range(num_iterations):
        if np.random.rand() < epsilon:
            # Exploration: pick a random item
            item = np.random.choice(num_items)
        else:
            # Exploitation: pick the item with the highest estimated value
            item = np.argmax(item_values)

        # Simulated reward, drawn from a normal distribution with the training
        # mean and std; it is the same for every item (independent of `item`)
        reward = np.random.normal(loc=y_train.mean(), scale=y_train.std())

        # Incremental update of the chosen item's running mean
        item_counts[item] += 1
        item_values[item] += (reward - item_values[item]) / item_counts[item]

    # Predictions: the same constant (value of the most-pulled item) for every row
    constant = item_values[np.argmax(item_counts)]
    predictions_train = np.full(X_train.shape[0], constant)
    predictions_test = np.full(X_test.shape[0], constant)

    mse_train = mean_squared_error(y_train, predictions_train)
    mse_test = mean_squared_error(y_test, predictions_test)

    return mse_train, mse_test

# Run on the preprocessed data
mse_train_eps, mse_test_eps = epsilon_greedy(df_filtered)
print(f"EpsilonGreedy - MSE Train: {mse_train_eps}")
print(f"EpsilonGreedy - MSE Test: {mse_test_eps}")


EpsilonGreedy - MSE Train: 0.01705017823835765
EpsilonGreedy - MSE Test: 0.01651976023150178


### UCB1

In [249]:
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

class UCB1:
    def __init__(self, num_items):
        self.num_items = num_items
        self.counts = np.zeros(num_items)   # times each item was selected
        self.values = np.zeros(num_items)   # running mean reward per item
        self.total_count = 0

    def select_item(self):
        # Select every item once before using the UCB index
        if self.total_count < self.num_items:
            return self.total_count
        # UCB1 index: mean reward + sqrt(2 * ln(total selections) / selections of the item)
        ucb_values = self.values + np.sqrt(2 * np.log(self.total_count) / self.counts)
        return np.argmax(ucb_values)

    def update(self, item, reward):
        self.counts[item] += 1
        self.total_count += 1
        # Incremental update of the running mean
        self.values[item] += (reward - self.values[item]) / self.counts[item]

def ucb1(df, num_iterations=1000):
    num_items = df['item_id'].nunique()
    ucb1_model = UCB1(num_items)

    # Random 80/20 train/test split (the features are not used by the policy)
    X_train, X_test, y_train, y_test = train_test_split(
        df[['aroma_scaled', 'taste_scaled', 'appearance_scaled']].values,
        df['rating_scaled'].values,
        test_size=0.2,
        random_state=42
    )

    # Training: the simulated reward is the same for every item (independent of `item`)
    for _ in range(num_iterations):
        item = ucb1_model.select_item()
        reward = np.random.normal(loc=y_train.mean(), scale=y_train.std())
        ucb1_model.update(item, reward)

    # Predictions: the same constant (mean of the most-selected item) for every row
    constant = ucb1_model.values[np.argmax(ucb1_model.counts)]
    predictions_train = np.full(X_train.shape[0], constant)
    predictions_test = np.full(X_test.shape[0], constant)

    mse_train = mean_squared_error(y_train, predictions_train)
    mse_test = mean_squared_error(y_test, predictions_test)

    return mse_train, mse_test

# Run on the preprocessed data
mse_train_ucb1, mse_test_ucb1 = ucb1(df_filtered)
print(f"UCB1 - MSE Train: {mse_train_ucb1}")
print(f"UCB1 - MSE Test: {mse_test_ucb1}")


UCB1 - MSE Train: 0.021284071378814558
UCB1 - MSE Test: 0.02074316614147233


### LinUCB


In [253]:
# MinMax scaling to [0, 1]
scaler = MinMaxScaler()
df['rating_scaled'] = scaler.fit_transform(df['rating'].values.reshape(-1,1))
df['aroma_scaled'] = scaler.fit_transform(df['aroma'].values.reshape(-1,1))
df['taste_scaled'] = scaler.fit_transform(df['taste'].values.reshape(-1,1))
df['appearance_scaled'] = scaler.fit_transform(df['appearance'].values.reshape(-1,1))

# Target encoder for user_id
encoder = ce.TargetEncoder(smoothing=100)
df['user_id_encoded'] = encoder.fit_transform(df['user_id'], df['rating_scaled'])

# Keep only items (beers) with more than 100 reviews to reduce the search space.
# This also rebuilds df_filtered in its original (unsorted) order; the earlier
# temporal-split cell had re-sorted it by date.
df_filtered = df.groupby('item_id').filter(lambda x: len(x) > 100)

# Assign a sequential action id to each item_id, starting at 1
df_filtered['action'] = pd.factorize(df_filtered['item_id'])[0] + 1

df_filtered

| | user_id | item_id | rating | aroma | taste | appearance | day | month | year | rating_scaled | aroma_scaled | taste_scaled | appearance_scaled | user_id_encoded | action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59 | 1075 | 25414 | 4.0 | 3.5 | 4.0 | 3.5 | 26 | 8 | 2009 | 0.8 | 0.625 | 0.750 | 0.7 | 0.790856 | 1 |
| 60 | 1075 | 25414 | 2.5 | 3.0 | 2.5 | 3.5 | 22 | 8 | 2009 | 0.5 | 0.500 | 0.375 | 0.7 | 0.790856 | 1 |
| 61 | 1075 | 25414 | 4.0 | 3.5 | 3.5 | 4.0 | 10 | 8 | 2009 | 0.8 | 0.625 | 0.625 | 0.8 | 0.790856 | 1 |
| 62 | 1075 | 25414 | 4.5 | 3.5 | 4.0 | 4.0 | 9 | 8 | 2009 | 0.9 | 0.625 | 0.750 | 0.8 | 0.790856 | 1 |
| 63 | 1075 | 25414 | 4.5 | 3.5 | 4.0 | 4.0 | 6 | 8 | 2009 | 0.9 | 0.625 | 0.750 | 0.8 | 0.790856 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49995 | 394 | 20539 | 4.0 | 4.0 | 4.0 | 4.0 | 4 | 12 | 2007 | 0.8 | 0.750 | 0.750 | 0.8 | 0.783817 | 83 |
| 49996 | 394 | 20539 | 4.0 | 4.0 | 4.0 | 3.5 | 30 | 11 | 2007 | 0.8 | 0.750 | 0.750 | 0.7 | 0.783817 | 83 |
| 49997 | 394 | 20539 | 3.5 | 3.5 | 4.5 | 4.0 | 28 | 11 | 2007 | 0.7 | 0.625 | 0.875 | 0.8 | 0.783817 | 83 |
| 49998 | 394 | 20539 | 4.0 | 4.0 | 4.5 | 4.0 | 27 | 11 | 2007 | 0.8 | 0.750 | 0.875 | 0.8 | 0.783817 | 83 |


In [254]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Preprocessing: features and target
X = df_filtered[['aroma_scaled', 'taste_scaled', 'appearance_scaled']].values
y = df_filtered['rating_scaled'].values

# Standardize the features (zero mean, unit variance)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Random 80/20 train/test split (not the temporal split used for the mabwiser model)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)


In [255]:
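from sklearn.linear_model import LinearRegression

class LinUCB:
    # Note: this class fits an ordinary least-squares LinearRegression.
    # `alpha` is stored but never used, so no upper-confidence exploration
    # bonus (the defining part of LinUCB) is applied.
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.model = LinearRegression()
        self.X_train = None
        self.y_train = None

    def fit(self, X, y):
        # Keep the training data and fit the regression
        self.X_train = X
        self.y_train = y
        self.model.fit(X, y)

    def predict(self, X):
        # Predicted (scaled) rating for each row of X
        return self.model.predict(X)

# Fit and predict
linucb = LinUCB(alpha=1.0)
linucb.fit(X_train, y_train)
y_pred_train = linucb.predict(X_train)
y_pred_test = linucb.predict(X_test)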


In [256]:
from sklearn.metrics import mean_squared_error

# Evaluate the performance of LinUCB
mse_train_linucb = mean_squared_error(y_train, y_pred_train)
mse_test_linucb = mean_squared_error(y_test, y_pred_test)

print(f"LinUCB - MSE Train: {mse_train_linucb}")
print(f"LinUCB - MSE Test: {mse_test_linucb}")


LinUCB - MSE Train: 0.007331916121167878
LinUCB - MSE Test: 0.007240746218284296
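The `LinUCB` class above only wraps `LinearRegression`. For reference, the disjoint LinUCB rule (Li et al., 2010) keeps one ridge regression per arm, with A_a = I + Σ x xᵀ and b_a = Σ r x, and picks the arm with the largest θ_aᵀx + α·sqrt(xᵀ A_a⁻¹ x), where θ_a = A_a⁻¹ b_a. Below is a minimal numpy sketch of that rule on a toy two-arm problem; the class name `DisjointLinUCB`, the unit-vector contexts and the true weights are hypothetical. In mabwiser, the corresponding learning policy is `LearningPolicy.LinUCB`, which can be passed to `MAB` in place of `EpsilonGreedy`.

```python
import numpy as np

class DisjointLinUCB:
    """Minimal disjoint LinUCB: one ridge regression per arm plus an exploration bonus."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha
        # A_a = I + sum(x x^T) and b_a = sum(r x) for each arm a
        self.A = [np.eye(n_features) for _ in range(n_arms)]
        self.b = [np.zeros(n_features) for _ in range(n_arms)]

    def scores(self, x):
        """Upper confidence bound theta_a . x + alpha * sqrt(x^T A_a^-1 x) for each arm."""
        out = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            out.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return np.array(out)

    def select(self, x):
        return int(np.argmax(self.scores(x)))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

rng = np.random.default_rng(0)
true_theta = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical per-arm weights
model = DisjointLinUCB(n_arms=2, n_features=2, alpha=1.0)
for _ in range(200):
    x = np.eye(2)[rng.integers(2)]          # context: one of two unit vectors
    arm = model.select(x)
    model.update(arm, x, true_theta[arm] @ x)  # noise-free reward

print(model.select(np.array([1.0, 0.0])), model.select(np.array([0.0, 1.0])))  # 0 1
```

The bonus term is what separates LinUCB from a plain regression: an arm whose A_a has seen little data in the direction of x gets a larger bonus and is therefore tried, which is how the model learns that arm 1 is best for the second context.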


### Comments

1.   EpsilonGreedy:

*   EpsilonGreedy - MSE Train: 0.01705017823835765
*   EpsilonGreedy - MSE Test: 0.01651976023150178

EpsilonGreedy obtains similar MSE values on the training and test sets, so there is no sign of overfitting. Note, however, that this implementation simulates rewards that do not depend on the chosen item and predicts the same constant value for every review, so its MSE reflects how far the ratings lie from a single constant rather than the quality of the items it selects.



2.   UCB1:

*   UCB1 - MSE Train: 0.021284071378814558
*   UCB1 - MSE Test: 0.02074316614147233

UCB1 has a slightly higher MSE than EpsilonGreedy on both sets (test MSE of about 0.021 vs. 0.017), which may indicate that it is less well suited to this particular dataset. Since it also predicts a single constant learned from item-independent simulated rewards, and neither run is seeded, the gap should be interpreted with caution.



3.   LinUCB:

*   LinUCB - MSE Train: 0.007331916121167878
*   LinUCB - MSE Test: 0.007240746218284296

LinUCB achieves a markedly lower MSE than EpsilonGreedy and UCB1 (about 0.0072 on the test set, less than half the EpsilonGreedy value), making it the best of the three in terms of prediction error on both sets. Its train/test gap is also small, which suggests good generalization. Note that the `LinUCB` class used here is a linear regression on aroma, taste and appearance whose `alpha` parameter is not used, so this result mainly shows that these attributes predict the overall rating well.


**Final comments**

*   The results suggest that LinUCB is the best of the three algorithms evaluated: it has the lowest MSE on both sets, so it is the most effective at predicting ratings, and its test MSE is very close to its training MSE.

*   EpsilonGreedy remains a reasonable option, but its MSE is more than twice that of LinUCB. UCB1 has the highest MSE of the three on this dataset.

*   Overall, LinUCB may be preferable when prediction accuracy is crucial, whereas EpsilonGreedy may be a simpler alternative when accuracy is less critical.