# Contextual Bandits with Parametric Actions -- Experimentation Mode

We demonstrate how to use a varying number of actions with contextual bandit algorithms in SageMaker. This notebook builds on the [Contextual Bandits example notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/reinforcement_learning/bandits_statlog_vw_customEnv/bandits_statlog_vw_customEnv.ipynb), which used a fixed number of actions. Please refer to that notebook for the basics of contextual bandits.

In the contextual bandit setting, an agent recommends an action given a state. This notebook introduces three features to bandit algorithms that make them applicable to a broader set of real-world problems. We use movie recommendation as the running example.
1. The number of actions available to the agent can change over time. For example, the movie catalog changes over time.
2. Each action may have features associated with it. For movie recommendation, each movie can have features such as genre, cast, etc.
3. The agent can pick multiple actions. When recommending movies, it is natural to recommend several movies at once.
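
The three features can be illustrated together with a small linear bandit. Because each candidate movie is described by a feature vector rather than a fixed arm index, the candidate pool can change size from round to round, and the agent can return its top-k scores as a slate. The sketch below is a LinUCB-style learner on synthetic data, a swapped-in technique chosen for brevity and not the `regcbopt` learner this notebook configures; the class name, `alpha`, and the synthetic reward model are assumptions made for illustration.

```python
import numpy as np


class LinearTopKBandit:
    """LinUCB-style top-k bandit over per-action feature vectors.

    Illustrative stand-in only: not the learner that train.py configures
    via "exploration_policy": "regcbopt".
    """

    def __init__(self, n_features: int, alpha: float = 1.0, reg: float = 1.0):
        self.alpha = alpha                  # weight of the exploration bonus
        self.A = reg * np.eye(n_features)   # regularized Gram matrix X^T X + reg*I
        self.b = np.zeros(n_features)       # running sum of reward * features

    def select(self, arm_features: np.ndarray, top_k: int) -> np.ndarray:
        """Score every candidate and return the indices of the top_k."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                       # ridge-regression estimate
        mean = arm_features @ theta                  # exploitation: predicted reward
        # Exploration: sqrt(x^T A^-1 x), large for poorly explored directions
        var = np.einsum("ij,jk,ik->i", arm_features, A_inv, arm_features)
        ucb = mean + self.alpha * np.sqrt(np.maximum(var, 0.0))
        k = min(top_k, len(arm_features))            # pool may be smaller than top_k
        return np.argsort(-ucb)[:k]                  # slate of k distinct arms

    def update(self, x: np.ndarray, reward: float) -> None:
        """Fold in the observed reward for one shown action."""
        self.A += np.outer(x, x)
        self.b += reward * x


# Usage on synthetic data: a single simulated user with hidden preferences.
rng = np.random.default_rng(0)
n_features = 4
true_theta = rng.normal(size=n_features)
bandit = LinearTopKBandit(n_features, alpha=0.5)

for _ in range(200):
    pool_size = int(rng.integers(10, 30))             # catalog size varies per round
    arms = rng.normal(size=(pool_size, n_features))   # one feature row per movie
    for i in bandit.select(arms, top_k=5):            # recommend 5 movies at once
        reward = float(arms[i] @ true_theta + rng.normal(scale=0.1))
        bandit.update(arms[i], reward)
```

For brevity the user context is left out (there is one simulated user). In practice, arm features are usually combined with user context features, for example by concatenation, before scoring.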

The contextual bandit agent trades off exploitation and exploration to quickly learn user preferences and minimize poor recommendations. Bandit algorithms are well suited to recommendation problems where the catalog contains many cold items (items with little or no interaction data), or where user preferences change over time.

## What is Experimentation Mode?

Contextual bandits are often trained by interacting with the real world. In movie recommendation, the bandit learns user preferences from their feedback on past recommendations. To check whether bandit algorithms suit your use case, you may want to try different algorithms and understand the impact of different features and hyperparameters. Experimenting with real users can lead to a poor user experience due to unanticipated issues or poor performance. Experimenting in production also brings the complexity of working with infrastructure components designed for scale (e.g. web services, data engines, databases).

With Experimentation Mode, you can start with a small dataset or a simulator and identify the algorithm, features, and hyperparameters that best fit your use case. Experimentation is much faster, does not affect real users, and is easy to work with. Once you are satisfied with the algorithm's performance, you can switch to Deployment Mode, where we provide infrastructure support that scales to production requirements.

In [None]:
import sagemaker
import boto3
import sys
import os
import json
import glob
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import subprocess
from IPython.display import HTML
import time
from time import gmtime, strftime
sys.path.append("common")
from misc import get_execution_role, wait_for_s3_object
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

In [None]:
%matplotlib inline

In [None]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()  
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

In [None]:
# run in local mode?
local_mode = True

if local_mode:
    instance_type = 'local'
else:
    instance_type = "ml.c5.xlarge"

In [None]:
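try:
    role = sagemaker.get_execution_role()
except Exception:
    # Not running on a SageMaker notebook instance: fall back to the helper in common/misc.py
    role = get_execution_role()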

print("Using IAM role arn: {}".format(role))

#### Download MovieLens 100K and upload to S3

In [None]:
%%bash
curl -o ml-100k.zip https://files.grouplens.org/datasets/movielens/ml-100k.zip
# -o overwrites existing files without prompting, so the cell can be re-run
unzip -o ml-100k.zip

In [None]:
movielens_data_s3_path = sage_session.upload_data(path="ml-100k", bucket=s3_bucket, key_prefix="movielens/data")
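
Arm features for movie recommendation can come from the MovieLens item metadata. Below is a minimal pandas sketch for inspecting the downloaded files, assuming the standard ml-100k layout described in its README: `u.data` is tab-separated (user id, item id, rating, timestamp), and `u.item` is pipe-separated, Latin-1 encoded, and ends in 19 binary genre flags. The helper names are hypothetical, and this notebook does not state which features `train.py` actually uses.

```python
from pathlib import Path

import pandas as pd

# Genre flag columns of u.item, in the order given by the ml-100k README (u.genre).
GENRES = ["unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
          "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
          "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]


def load_ratings(data_dir="ml-100k"):
    """Hypothetical helper: read u.data into a DataFrame of interactions."""
    return pd.read_csv(Path(data_dir) / "u.data", sep="\t",
                       names=["user_id", "item_id", "rating", "timestamp"])


def load_item_genres(data_dir="ml-100k"):
    """Hypothetical helper: one row of 0/1 genre flags per item_id."""
    meta = ["item_id", "title", "release_date", "video_release_date", "imdb_url"]
    items = pd.read_csv(Path(data_dir) / "u.item", sep="|", names=meta + GENRES,
                        encoding="latin-1")
    return items.set_index("item_id")[GENRES]
```

For example, `load_item_genres().loc[1]` returns the genre vector of item 1 from the local `ml-100k` directory, which is the kind of per-action feature vector the `arm_features` setting refers to.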

#### Define the hyperparameters and start the training job

In [None]:
hyperparameters = {
                   # Algorithm params
                   "arm_features": True,
                   "exploration_policy": "regcbopt",
                   "mellowness": 0.01,
                   
                   # Env params
                   "item_pool_size": 100,
                   "top_k": 5,
                   "total_interactions": 1000,
                   "max_users": 100,
                   }

job_name = "testbed-bandits-1"
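
The environment hyperparameters are not documented in this notebook. One plausible reading, stated purely as an assumption about what `train.py` may do: at each of `total_interactions` steps a user is drawn from `max_users` users, `item_pool_size` candidate movies are sampled, the agent recommends `top_k` of them, and it earns reward for the movies the user liked. The sketch below implements that reading on a dense rating matrix; the function name, the `policy`/`feedback` callbacks, and the "a rating of 4 or more counts as a like" rule are all hypothetical.

```python
import numpy as np


def simulate(ratings, policy, feedback=None, item_pool_size=100, top_k=5,
             total_interactions=1000, max_users=100, seed=0):
    """Hypothetical replay simulator over a dense user x item rating matrix.

    ratings[u, i] holds a 1-5 rating, or 0 if user u never rated item i.
    Returns per-interaction (obtained, best) slate rewards.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    users = rng.choice(n_users, size=min(max_users, n_users), replace=False)
    obtained, best = [], []
    for _ in range(total_interactions):
        user = rng.choice(users)                                  # arriving user
        pool = rng.choice(n_items, size=min(item_pool_size, n_items), replace=False)
        liked = (ratings[user, pool] >= 4).astype(float)          # assumed reward rule
        slate = np.asarray(policy(user, pool))[:top_k]            # positions in pool
        if feedback is not None:
            feedback(user, pool[slate], liked[slate])             # let a learner update
        obtained.append(liked[slate].sum())
        best.append(np.sort(liked)[::-1][:top_k].sum())           # oracle slate reward
    return np.array(obtained), np.array(best)


# Usage with synthetic ratings and a uniformly random baseline policy.
rng = np.random.default_rng(1)
toy_ratings = rng.integers(0, 6, size=(50, 200))   # 0 = unrated, 1-5 = rating


def random_policy(user, pool):
    return rng.permutation(len(pool))


obtained, best = simulate(toy_ratings, random_policy, total_interactions=100)
```

Recording both the obtained reward and the best achievable slate reward at every step is what makes a regret curve possible later, without ever showing a recommendation to a real user.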

In [None]:
estimator = RLEstimator(entry_point="train.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl"],
                        image_name="462105765813.dkr.ecr.us-west-2.amazonaws.com/sagemaker-rl-vw-container:adf",
                        role=role,
                        train_instance_type=instance_type,
                        train_instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=job_name,
                        hyperparameters=hyperparameters)

estimator.fit(inputs={"movielens": movielens_data_s3_path}, wait=True)

#### Download the outputs to plot performance

In [None]:
if local_mode:
    output_path_prefix = f"{estimator.latest_training_job.job_name}/output.tar.gz"
else:
    output_path_prefix = f"{estimator.latest_training_job.job_name}/output/output.tar.gz"
    
sage_session.download_data(path="./output", bucket=s3_bucket, key_prefix=output_path_prefix)

In [None]:
%%bash
tar -C ./output -xvzf ./output/output.tar.gz

In [None]:
if local_mode:
    output_path_local = "output/data/output.json"
else:
    output_path_local = "output/output.json"

with open(output_path_local) as f:
    all_regrets = json.load(f)

In [None]:
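# Accumulate each per-interaction regret series so the curves show cumulative regret
all_regrets = {key: np.cumsum(val) for key, val in all_regrets.items()}
df = pd.DataFrame(all_regrets)
ax = df.plot(title="Cumulative regret")
ax.set_xlabel("Interaction")
ax.set_ylabel("Cumulative regret")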
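
This notebook does not show how `train.py` defines the regret stored in `output.json`. A common definition, assumed here, is the reward of the best possible slate minus the reward the agent actually obtained at each interaction. Cumulative regret is the running sum of that quantity, so a curve that flattens over time means the agent is learning. The helper name and the toy numbers below are made up for illustration and are not results from this notebook.

```python
import numpy as np
import pandas as pd


def cumulative_regret(obtained, best):
    """Running sum of per-interaction regret = best reward - obtained reward."""
    regret = np.asarray(best, dtype=float) - np.asarray(obtained, dtype=float)
    return np.cumsum(regret)


# Made-up rewards for five interactions (not outputs of this notebook).
best = [3, 2, 3, 3, 2]
learner = [1, 1, 3, 3, 2]     # matches the oracle from the third step on
baseline = [1, 0, 2, 1, 1]    # falls short of the oracle at every step

curves = pd.DataFrame({
    "learner": cumulative_regret(learner, best),
    "baseline": cumulative_regret(baseline, best),
})
print(curves)
```

The learner's curve flattens at 3 while the baseline's keeps rising to 8, which is the shape to look for when comparing algorithms or hyperparameter settings in the plot above.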