# Lab 3: Contextual Bandit-Based News Article Recommendation

**`Course`:** Reinforcement Learning Fundamentals  
**`Student Name`:** Vardhaman Kalloli  
**`Roll Number`:** U20230048  
**`GitHub Branch`:** vardhaman_U20230048  

# Imports and Setup

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

from rlcmab_sampler import sampler


# Load Datasets

In [2]:
# Load datasets
news_df = pd.read_csv("data/news_articles.csv")
train_users = pd.read_csv("data/train_users.csv")
test_users = pd.read_csv("data/test_users.csv")

print(news_df.head())
print(train_users.head())


                                                link  \
0  https://www.huffpost.com/entry/covid-boosters-...   
1  https://www.huffpost.com/entry/american-airlin...   
2  https://www.huffpost.com/entry/funniest-tweets...   
3  https://www.huffpost.com/entry/funniest-parent...   
4  https://www.huffpost.com/entry/amy-cooper-lose...   

                                            headline   category  \
0  Over 4 Million Americans Roll Up Sleeves For O...  U.S. NEWS   
1  American Airlines Flyer Charged, Banned For Li...  U.S. NEWS   
2  23 Of The Funniest Tweets About Cats And Dogs ...     COMEDY   
3  The Funniest Tweets From Parents This Week (Se...  PARENTING   
4  Woman Who Called Cops On Black Bird-Watcher Lo...  U.S. NEWS   

                                   short_description               authors  \
0  Health experts said it is too early to predict...  Carla K. Johnson, AP   
1  He was subdued by passengers and crew when he ...        Mary Papenfuss   
2  "Until you have a dog y

## Data Preprocessing

In this section:
- Handle missing values
- Encode categorical features
- Prepare data for user classification

In [None]:
# Handle missing values
print("Missing values in train_users:")
print(train_users.isnull().sum())
print("\nMissing values in test_users:")
print(test_users.isnull().sum())

# Fill missing values with median for numerical columns
numerical_cols = ['age', 'income', 'clicks', 'purchase_amount', 'session_duration', 
                  'content_variety', 'engagement_score', 'num_transactions', 
                  'avg_monthly_spend', 'avg_cart_value', 'browsing_depth', 
                  'revisit_rate', 'scroll_activity', 'time_on_site', 
                  'interaction_count', 'preferred_price_range', 'discount_usage_rate', 
                  'wishlist_size', 'product_views', 'repeat_purchase_gap (days)', 
                  'churn_risk_score', 'loyalty_index', 'screen_brightness', 
                  'battery_percentage', 'cart_abandonment_count', 
                  'background_app_count', 'session_inactivity_duration', 'network_jitter']

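for col in numerical_cols:
    if col in train_users.columns:
        # Impute with the training-set median. Assign the result back to the
        # column: calling fillna(..., inplace=True) on a column selection acts
        # on a copy and leaves the DataFrame unchanged.
        median_val = train_users[col].median()
        train_users[col] = train_users[col].fillna(median_val)
        if col in test_users.columns:
            test_users[col] = test_users[col].fillna(median_val)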

# encode categorical features
categorical_cols = ['region_code', 'subscriber', 'browser_version']
# initialize label encoders
label_encoders = {}

for col in categorical_cols:
    if col in train_users.columns:
        le = LabelEncoder()
        combined_values = pd.concat([train_users[col].astype(str), 
                                    test_users[col].astype(str)])
        le.fit(combined_values)
        
        train_users[col + '_encoded'] = le.transform(train_users[col].astype(str))
        test_users[col + '_encoded'] = le.transform(test_users[col].astype(str))
        label_encoders[col] = le

# encode the target variable: user label
label_encoder_target = LabelEncoder()
train_users['label_encoded'] = label_encoder_target.fit_transform(train_users['label'])

print("\nLabel encoding mapping:")
for i, label in enumerate(label_encoder_target.classes_):
    print(f"{label} -> {i}")

cols_to_drop = ['user_id', 'region_code', 'subscriber', 'browser_version', 'label']
feature_cols = [col for col in train_users.columns 
                if col not in cols_to_drop and col != 'label_encoded']

X_train = train_users[feature_cols]
y_train = train_users['label_encoded']

X_test = test_users[feature_cols]

print(f"\nTraining set shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test set shape: {X_test.shape}")
print(f"\nFeature columns: {len(feature_cols)}")


Missing values in train_users:
user_id                          0
age                            698
income                           0
clicks                           0
purchase_amount                  0
session_duration                 0
content_variety                  0
engagement_score                 0
num_transactions                 0
avg_monthly_spend                0
avg_cart_value                   0
browsing_depth                   0
revisit_rate                     0
scroll_activity                  0
time_on_site                     0
interaction_count                0
preferred_price_range            0
discount_usage_rate              0
wishlist_size                    0
product_views                    0
repeat_purchase_gap (days)       0
churn_risk_score                 0
loyalty_index                    0
screen_brightness                0
battery_percentage               0
cart_abandonment_count           0
browser_version                  0
background_app_count    


## User Classification

Train a classifier to predict the user category (`user_1`, `user_2`, `user_3` in the data; written User1, User2, User3 in the arm mapping).
The predicted category serves as the **context** for the contextual bandit.


In [5]:
from sklearn.ensemble import RandomForestClassifier

# random forest classifier
print("Training Random Forest Classifier...")
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=10)
rf_classifier.fit(X_train, y_train)

# evaluate on the training data (an optimistic estimate, not a held-out score)
y_train_pred = rf_classifier.predict(X_train)
train_accuracy = accuracy_score(y_train, y_train_pred)
print(f"Training Accuracy: {train_accuracy:.4f}")

test_user_contexts = rf_classifier.predict(X_test)
test_users['predicted_context'] = test_user_contexts
test_users['predicted_label'] = label_encoder_target.inverse_transform(test_user_contexts)

print(f"\nPredicted context distribution on test set:")
print(test_users['predicted_label'].value_counts())

# display sample predictions
print(f"\nSample predictions:")
print(test_users[['user_id', 'predicted_label', 'predicted_context']].head(10))


Training Random Forest Classifier...
Training Accuracy: 0.9580

Predicted context distribution on test set:
predicted_label
user_2    695
user_1    675
user_3    630
Name: count, dtype: int64

Sample predictions:
  user_id predicted_label  predicted_context
0   U4058          user_2                  1
1   U1118          user_1                  0
2   U6555          user_1                  0
3   U9170          user_1                  0
4   U3348          user_1                  0
5   U2244          user_3                  2
6   U3022          user_3                  2
7   U5291          user_1                  0
8   U1945          user_3                  2
9   U6084          user_3                  2


# Contextual Bandit

## Reward Sampler Initialization

The sampler is initialized using the student's roll number `i`.
Rewards are obtained by calling `sampler.sample(j)`, where `j` is the arm index defined in the arm mapping.


## Arm Mapping

| Arm Index (j) | News Category | User Context |
|--------------|---------------|--------------|
| 0–3          | Entertainment, Education, Tech, Crime | User1 |
| 4–7          | Entertainment, Education, Tech, Crime | User2 |
| 8–11         | Entertainment, Education, Tech, Crime | User3 |
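
The table above amounts to a fixed offset of four arms per user context. A minimal sketch of that mapping follows; the helper names `arm_index` and `decode_arm` are hypothetical, and the category order is assumed to match the order listed in the table.

```python
# Category order is assumed to follow the table: Entertainment, Education, Tech, Crime.
CATEGORIES = ["Entertainment", "Education", "Tech", "Crime"]
N_CONTEXTS = 3  # User1, User2, User3 -> contexts 0, 1, 2


def arm_index(context: int, category: int) -> int:
    """Map a (context, category) pair to the global arm index j in 0..11."""
    if not 0 <= context < N_CONTEXTS:
        raise ValueError(f"context must be in 0..{N_CONTEXTS - 1}")
    if not 0 <= category < len(CATEGORIES):
        raise ValueError(f"category must be in 0..{len(CATEGORIES) - 1}")
    return context * len(CATEGORIES) + category


def decode_arm(j: int) -> tuple:
    """Inverse mapping: global arm index j -> (context, category name)."""
    if not 0 <= j < N_CONTEXTS * len(CATEGORIES):
        raise ValueError("arm index out of range")
    context, category = divmod(j, len(CATEGORIES))
    return context, CATEGORIES[category]
```

For example, `arm_index(1, 2)` gives the arm for Tech articles shown to User2, and `decode_arm` recovers the pair from the index.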

## Epsilon-Greedy Strategy

This section implements the epsilon-greedy contextual bandit algorithm.
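
As an illustration of the idea, the sketch below keeps one table of value estimates per context and, with probability `epsilon`, picks a random category instead of the current best one. It is a minimal sketch, not the lab's reference implementation: the class name `ContextualEpsilonGreedy`, the default `epsilon=0.1`, and the random tie-breaking are all assumptions.

```python
import random


class ContextualEpsilonGreedy:
    """Epsilon-greedy with separate value estimates for each context."""

    def __init__(self, n_contexts, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon  # exploration probability (assumed default)
        self.rng = random.Random(seed)
        self.counts = [[0] * n_arms for _ in range(n_contexts)]
        self.values = [[0.0] * n_arms for _ in range(n_contexts)]

    def select(self, context):
        """Return the arm (category index within this context) to play."""
        q = self.values[context]
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(q))  # explore uniformly
        best = max(q)
        # Exploit, breaking ties between equally good arms at random.
        return self.rng.choice([a for a, v in enumerate(q) if v == best])

    def update(self, context, arm, reward):
        """Incremental sample-mean update of the chosen arm's estimate."""
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.values[context][arm] += (reward - self.values[context][arm]) / n
```

The arm returned by `select` is local to the context; the global index passed to the reward sampler would be `context * 4 + arm` under the arm mapping.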


## Upper Confidence Bound (UCB)

This section implements the UCB strategy for contextual bandits.
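
One common form of UCB scores each arm as $Q(a) + c\sqrt{\ln t / N(a)}$, where $t$ is the number of plays in the current context and $N(a)$ the number of plays of arm $a$. The sketch below applies that rule per context. The class name `ContextualUCB` and the default `c=2.0` are assumptions, not values taken from the lab.

```python
import math


class ContextualUCB:
    """UCB with separate counts and value estimates for each context."""

    def __init__(self, n_contexts, n_arms, c=2.0):
        self.c = c  # exploration weight (assumed default)
        self.counts = [[0] * n_arms for _ in range(n_contexts)]
        self.values = [[0.0] * n_arms for _ in range(n_contexts)]

    def select(self, context):
        """Return the arm with the highest upper confidence bound."""
        counts, values = self.counts[context], self.values[context]
        # Play every arm once first so that each bound is well defined.
        for arm, n in enumerate(counts):
            if n == 0:
                return arm
        total = sum(counts)
        scores = [
            q + self.c * math.sqrt(math.log(total) / n)
            for q, n in zip(values, counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, context, arm, reward):
        """Incremental sample-mean update of the chosen arm's estimate."""
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.values[context][arm] += (reward - self.values[context][arm]) / n
```

Larger values of `c` widen the confidence bonus and therefore favor exploration; `c = sqrt(2)` corresponds to the classic UCB1 bonus.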

## SoftMax Strategy

This section implements the SoftMax strategy with temperature $\tau = 1$.
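
Under SoftMax (Boltzmann) exploration, arm $a$ is chosen with probability $\exp(Q(a)/\tau) / \sum_b \exp(Q(b)/\tau)$. A minimal per-context sketch follows; the names `softmax_probs` and `ContextualSoftmax` are hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the probabilities.

```python
import math
import random


def softmax_probs(values, tau=1.0):
    """Boltzmann probabilities for a list of value estimates."""
    m = max(values)  # shift by the max to avoid overflow in exp
    exps = [math.exp((v - m) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class ContextualSoftmax:
    """SoftMax action selection with separate estimates for each context."""

    def __init__(self, n_contexts, n_arms, tau=1.0, seed=0):
        self.tau = tau
        self.rng = random.Random(seed)
        self.counts = [[0] * n_arms for _ in range(n_contexts)]
        self.values = [[0.0] * n_arms for _ in range(n_contexts)]

    def select(self, context):
        """Sample an arm in proportion to its SoftMax probability."""
        probs = softmax_probs(self.values[context], self.tau)
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, context, arm, reward):
        """Incremental sample-mean update of the chosen arm's estimate."""
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.values[context][arm] += (reward - self.values[context][arm]) / n
```

With all estimates equal, every arm is equally likely; lowering `tau` concentrates probability on the arm with the highest estimate.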


## Reinforcement Learning Simulation

We simulate the bandit algorithms for $T = 10{,}000$ steps and record the reward at every step.

Note: adjust the value of $T$ if required.
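
The loop below sketches one way such a simulation could be structured. Everything about the environment is a stand-in: `stub_reward` and `HYPOTHETICAL_MEANS` replace `sampler.sample(j)` with made-up Gaussian rewards, contexts are drawn uniformly at random instead of coming from the classifier, and epsilon-greedy is used only as an example policy.

```python
import random

N_CONTEXTS, N_CATEGORIES = 3, 4

# Made-up mean reward per global arm index j; NOT the lab sampler's values.
HYPOTHETICAL_MEANS = [0.0, 0.7, 0.2, 0.9, 0.4, 1.1, 0.6, 0.1, 0.8, 0.3, 1.0, 0.5]


def stub_reward(j, rng):
    """Hypothetical stand-in for sampler.sample(j): noisy reward for arm j."""
    return rng.gauss(HYPOTHETICAL_MEANS[j], 0.1)


def simulate(T=10_000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy contextual bandit for T steps; return the rewards."""
    rng = random.Random(seed)
    counts = [[0] * N_CATEGORIES for _ in range(N_CONTEXTS)]
    values = [[0.0] * N_CATEGORIES for _ in range(N_CONTEXTS)]
    rewards = []
    for _ in range(T):
        # Stand-in for the classifier's predicted context of the next user.
        context = rng.randrange(N_CONTEXTS)
        if rng.random() < epsilon:
            category = rng.randrange(N_CATEGORIES)  # explore
        else:
            category = max(range(N_CATEGORIES), key=values[context].__getitem__)
        j = context * N_CATEGORIES + category  # global arm index
        reward = stub_reward(j, rng)
        counts[context][category] += 1
        n = counts[context][category]
        values[context][category] += (reward - values[context][category]) / n
        rewards.append(reward)
    return rewards, values


if __name__ == "__main__":
    rewards, values = simulate()
    print(f"average reward over {len(rewards)} steps: {sum(rewards) / len(rewards):.3f}")
```

In the actual lab, `stub_reward(j, rng)` would be replaced by the provided sampler, and the context would come from the classifier's prediction for each user.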


## Results and Analysis

This section presents:
- Average Reward vs Time
- Hyperparameter comparisons
- Observations and discussion
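
For the "Average Reward vs Time" curve, one option is the running mean of the recorded rewards. The helper below is a small sketch using only the standard library; the name `running_average` is hypothetical.

```python
from itertools import accumulate


def running_average(rewards):
    """Return the mean of rewards[0..t] for every step t."""
    return [total / (t + 1) for t, total in enumerate(accumulate(rewards))]


if __name__ == "__main__":
    print(running_average([1, 0, 1, 0]))
```

Passing the returned list to `plt.plot` for each strategy (and each hyperparameter setting) puts the curves on one set of axes for comparison.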


## Final Observations

- Comparison of Epsilon-Greedy, UCB, and SoftMax
- Effect of hyperparameters
- Strengths and limitations of each approach
