# Contextual Multi-Armed Bandit Recommendation System
Div Dasani

The recommendation problem can be framed as an agent choosing an action (an item to recommend) given an observed state (the user and item context). Contextual multi-armed bandits take this view, trading off exploration against exploitation to minimize cumulative regret. Bandit algorithms are a good fit for recommendation when the catalog contains many cold items (items with little or no interaction data) or when user preferences change over time.
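
To make the exploration/exploitation trade-off and cumulative regret concrete, here is a minimal, self-contained sketch of an epsilon-greedy contextual bandit on synthetic data. It is an illustration only, not the algorithm trained later in this notebook (which uses a Vowpal Wabbit container). The function name `run_epsilon_greedy`, the linear reward model, and all parameter values are assumptions made for this example.

```python
import numpy as np


def run_epsilon_greedy(n_rounds=2000, n_arms=5, n_features=4, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy contextual bandit and return cumulative regret.

    Assumption: each arm's expected reward is linear in the context,
    with hidden weights that the agent has to learn from feedback.
    """
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=(n_arms, n_features))  # hidden per-arm weights

    # Per-arm ridge regression statistics: A = I + sum(x x^T), b = sum(reward * x)
    A = np.stack([np.eye(n_features) for _ in range(n_arms)])
    b = np.zeros((n_arms, n_features))

    regrets = np.zeros(n_rounds)
    for t in range(n_rounds):
        x = rng.normal(size=n_features)   # observed context for this round
        expected = true_theta @ x         # true expected reward of every arm

        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random
            arm = int(rng.integers(n_arms))
        else:
            # Exploit: pick the arm with the highest estimated reward
            theta_hat = np.linalg.solve(A, b[..., None])[..., 0]
            arm = int(np.argmax(theta_hat @ x))

        # Only the chosen arm's (noisy) reward is observed
        reward = expected[arm] + rng.normal(scale=0.1)
        A[arm] += np.outer(x, x)
        b[arm] += reward * x

        # Regret: gap between the best arm and the chosen arm
        regrets[t] = expected.max() - expected[arm]

    return np.cumsum(regrets)


learned = run_epsilon_greedy(epsilon=0.1)
random_policy = run_epsilon_greedy(epsilon=1.0)  # never exploits
print("final cumulative regret (epsilon=0.1):", learned[-1])
print("final cumulative regret (pure random):", random_policy[-1])
```

A policy that learns from feedback should accumulate regret more slowly than one that only explores, which is the behavior the cumulative regret plot later in the notebook is meant to show.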

This notebook uses the MovieLens 100K dataset. Training and deployment run on AWS (Amazon SageMaker and S3), which makes the system easier to scale and deploy.

In [None]:
import sagemaker
import boto3
import sys
import os
import json
import glob
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import HTML
import time
from time import gmtime, strftime
from misc import get_execution_role, wait_for_s3_object
from sagemaker.rl import RLEstimator
%matplotlib inline

### Setup S3 bucket

In [None]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()  
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

# The AWS region must be us-west-2 (the container image used below is hosted there)
aws_region = sage_session.boto_region_name

### Get the IAM execution role

In [None]:
instance_type = 'local'
role = get_execution_role()
print("Using IAM role arn: {}".format(role))

### Download MovieLens 100K and upload to S3

In [None]:
%%bash
curl -o ml-100k.zip http://files.grouplens.org/datasets/movielens/ml-100k.zip
unzip ml-100k.zip

In [None]:
movielens_data_s3_path = sage_session.upload_data(path="ml-100k", bucket=s3_bucket, key_prefix="movielens/data")

### Model Training

In [None]:
hyperparameters = {
                   # Algorithm params
                   "arm_features": True,
                   "exploration_policy": "regcbopt",
                   "mellowness": 0.01,
                   
                   # Env params
                   "item_pool_size": 100,
                   "top_k": 5,
                   "total_interactions": 2000,
                   "max_users": 100,
                   }

job_name_prefix = "testbed-bandits-1"

In [None]:
vw_image_uri = "462105765813.dkr.ecr.us-west-2.amazonaws.com/sagemaker-rl-vw-container:adf"

In [None]:
estimator = RLEstimator(entry_point="train.py",
                        source_dir='src',
                        image_name=vw_image_uri,
                        role=role,
                        train_instance_type=instance_type,
                        train_instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        hyperparameters=hyperparameters
                    )

estimator.fit(inputs={"movielens": movielens_data_s3_path}, wait=True)

### Plot for Performance Evaluation

In [None]:
job_name = estimator.latest_training_job.job_name
output_path_prefix = f"{job_name}/output.tar.gz"
model_path = f"{job_name}/model.tar.gz"
sage_session.download_data(path="./output", bucket=s3_bucket, key_prefix=output_path_prefix)

In [None]:
%%bash
tar -C ./output -xvzf ./output/output.tar.gz

In [None]:
output_path_local = "output/data/output.json"
with open(output_path_local) as f:
    all_regrets = json.load(f)
    
# Convert each series of per-round regret values into cumulative regret
all_regrets = {key: np.cumsum(val) for key, val in all_regrets.items()}
df = pd.DataFrame(all_regrets)
df.plot(title="Cumulative Regret")
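
The cell above treats `output.json` as a mapping from a series name to a list of per-round regret values; that shape is inferred from the code rather than documented here. A small sketch with made-up numbers shows what the cumulative-sum step does (the keys `agent` and `random` are hypothetical):

```python
import numpy as np

# Hypothetical per-round regret for two policies (illustrative values only)
per_round = {
    "agent": [1.0, 0.5, 0.0, 0.0],
    "random": [1.0, 1.0, 0.5, 1.0],
}

# Same transformation as in the notebook: a running total per series
cumulative = {name: np.cumsum(vals) for name, vals in per_round.items()}

for name, curve in cumulative.items():
    print(name, curve.tolist())
```

A curve that flattens out means the policy has stopped incurring regret, while a curve that keeps climbing means it is still choosing suboptimal arms.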

### Create a SageMaker model for inference

In [None]:
sage_session = sagemaker.local.LocalSession()

In [None]:
bandit_model = sagemaker.model.Model(image=vw_image_uri,
                                     role=role,
                                     name="vw-model-1",
                                     model_data=f"s3://{s3_bucket}/{model_path}",
                                     sagemaker_session=sage_session)

In [None]:
bandit_model.deploy(initial_instance_count=1, instance_type=instance_type, endpoint_name="bandit")

In [None]:
predictor = sagemaker.predictor.RealTimePredictor(endpoint="bandit",
                                                  sagemaker_session=bandit_model.sagemaker_session,
                                                  serializer=sagemaker.predictor.json_serializer,
                                                  deserializer=sagemaker.predictor.json_deserializer,
                                                 )

In [None]:
predictor.predict({"shared_context": None, "actions_context": [[0, 0, 1], [1, 0, 0], [1, 1, 1]], "top_k": 2})

### Clean up the endpoint

In [None]:
if "predictor" in locals():
    predictor.delete_endpoint()