# Contextual Bandits Demo for VacSIM policy 
This policy is implemented with [SpaceBandits](https://pypi.org/project/space-bandits/), a Python package for contextual bandits.

In [None]:
!pip install space-bandits

## Implementing Linear Model
All the necessary scientific libraries are imported below. We use the LinearBandits model, which is the simplest model in the package. As described in the package documentation, this model maps contexts to expected rewards with linear coefficients.
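
To make "maps contexts to expected rewards with linear coefficients" concrete, here is a minimal NumPy sketch of the general idea. It is not SpaceBandits' internal code: the names `weights`, `intercepts` and `expected_rewards` are hypothetical, and the random values stand in for coefficients that a trained model would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

num_actions = 101   # actions 0..100, as used later in this notebook
num_features = 5    # length of the context vector

# Hypothetical coefficients: one weight vector and one intercept per action.
# Random placeholders here; a real model would estimate them from data.
weights = rng.normal(size=(num_actions, num_features))
intercepts = rng.normal(size=num_actions)

def expected_rewards(context):
    """Return one expected reward per action for a single context vector."""
    context = np.asarray(context, dtype=float)
    return weights @ context + intercepts

context = rng.normal(size=num_features)
values = expected_rewards(context)
print(values.shape)
```

Each entry of `values` is the linear prediction for one action given the same context.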

In [None]:
import numpy as np
import pandas as pd
# from google.colab import files
import math
import matplotlib.pyplot as plt
from space_bandits import LinearBandits 

In [None]:
from google.colab import files  # Colab only: needed to upload 'VacSIM Input.csv'
files.upload()

In [None]:
df = pd.read_csv('VacSIM Input.csv')
df.info()

In [None]:
df.columns

## Training the model

In [None]:
df.head()

After normalising the actions coming from ACKTR (sub-model 1), the actions range from 0 to 100, i.e. 101 possible actions in total. The context considered for our model consists of:
* Susceptible population - 'Susceptible'.
* Total number of infected cases predicted using SEIR - 'Total predicted cases'.
* Death rate (calculated from the total deaths predicted by SEIR) - 'Death Rate (Predicted)'.
* Percentage share of a state's population - 'Population Share'.
* Recovery rate (calculated from the total recoveries predicted by SEIR) - 'Recovery Rate (Predicted)'.

Using the total number of possible actions and the number of context features, we instantiate the LinearBandits class.

In [None]:
num_actions = 101 
num_features = 5 
model = LinearBandits(num_actions, num_features)

The training data covers 1 September to 15 September. Each date contributes 50 episodes for each of the 5 states, i.e. 50*5 = 250 rows per date and 15*250 = 3750 rows in total. This approach was adopted to make the results robust.

The test data, on the other hand, consists of the 250 rows for a single date (16 September).

In [None]:
df_train = df[0:3750]  
df_test = df[3750:4000] #this data corresponds to one date only (16 September) for demonstration purpose    
# For future dates the data will be from row 4000 onwards, 250 rows for each date   
df_train.shape, df_test.shape
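
The row arithmetic behind this split can be written out explicitly. The sketch below assumes the layout described above: rows ordered by date, 250 rows per date (5 states x 50 episodes), with 1 September starting at row 0. `rows_for_day` is a hypothetical helper, not part of the original notebook.

```python
EPISODES_PER_STATE = 50
NUM_STATES = 5
ROWS_PER_DATE = EPISODES_PER_STATE * NUM_STATES  # 250 rows per date

def rows_for_day(day_of_september):
    """Return the (start, stop) row range for a given day of September.

    Assumes rows are ordered by date and that day 1 starts at row 0.
    """
    if day_of_september < 1:
        raise ValueError("day_of_september must be >= 1")
    start = (day_of_september - 1) * ROWS_PER_DATE
    return start, start + ROWS_PER_DATE

train_stop = rows_for_day(15)[1]          # last training row + 1
test_start, test_stop = rows_for_day(16)  # rows for 16 September
print(train_stop, test_start, test_stop)  # 3750 3750 4000
```

Under the same assumption, `rows_for_day(17)` gives `(4000, 4250)`, which matches the comment about future dates starting at row 4000.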

In each training iteration we take a context array, the chosen action and the corresponding reward. These values are passed to the .update() method to update the LinearBandits model's internal state.

In [None]:
feature_cols = ['Susceptible', 'Total predicted cases', 'Death Rate (Predicted)', 'Population Share', 'Recovery Rate (Predicted)']
for i in range(len(df_train)):
    context = df_train[feature_cols].iloc[i].to_numpy()
    # Select the column as a Series so .iloc[i] is a scalar, not a 1-element array
    action = int(df_train['Normalized Action'].iloc[i])
    reward = int(df_train['Reward'].iloc[i])
    model.update(context, action, reward)
print('Done')

## Predicting the actions

In [None]:
df_test.head()

After training the model as above, we can use the .action() method to map a given context to the action with the highest expected reward. The chosen actions are stored in an array so that they can be recorded and exported.


In [None]:
column_names = ["Action", "Date", "Name of State"]
df_10 = pd.DataFrame(columns = column_names)
arr_action = []
for i in range(len(df_test)): 
  new_context =  df_test[['Susceptible', 'Total predicted cases', 'Death Rate (Predicted)', 'Population Share','Recovery Rate (Predicted)']].iloc[i].to_numpy()
  arr_action.append(model.action(new_context))
  df_10.loc[i] = [arr_action[i],df_test['Date'].iloc[i],df_test['Name of State'].iloc[i]]
  # print(f"Action {i} = {arr_action[i]}\t  Date : {df_test['Date'].iloc[i]}\t Location : {df_test['Name of State'].iloc[i]} \t ")
pd.set_option("display.max_rows", None, "display.max_columns", None)
df_10=df_10.sort_values(by=['Name of State'])


Given below is the expected reward matrix. Here, rows represent all possible actions and columns represent the iterations.

In [None]:
df_test_context  = df_test[['Susceptible', 'Total predicted cases', 'Death Rate (Predicted)', 'Population Share','Recovery Rate (Predicted)']].to_numpy()
exp = model.expected_values(df_test_context)
df_exp_reward = pd.DataFrame(exp)
df_exp_reward.T.head(n=50)
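
As a sanity check, an expected-reward matrix can be reduced to one greedy action per context by taking the arg-max over actions. The sketch below uses a small made-up matrix and assumes the layout stated above (rows are actions, columns are contexts); if the actual array is transposed, the `axis` argument would need to change. This greedy choice need not match the output of .action() if the library samples when choosing actions.

```python
import numpy as np

# Made-up expected rewards: 4 actions (rows) x 3 contexts (columns).
expected = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.1, 0.3],
    [0.2, 0.9, 0.1],
    [0.4, 0.3, 0.8],
])

# For each context (column), pick the action (row) with the highest expected reward.
greedy_actions = expected.argmax(axis=0)
best_values = expected.max(axis=0)

print(greedy_actions)  # [1 2 3]
print(best_values)     # [0.5 0.9 0.8]
```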

## Calculating mean action and normalised action
As defined in Dataset README.md
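
Reading the formula used in the cells below: the mean action is the average predicted action over a state's episodes, and the normalised action is that mean as a percentage of the sum of all states' means. A compact sketch of the same calculation with a pandas group-by follows. It assumes a frame shaped like df_10 with 'Action' and 'Name of State' columns; the toy values and the names `toy` and `summary` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df_10: a few predicted actions per state (made-up values).
toy = pd.DataFrame({
    "Action": [10, 20, 30, 50, 40, 50],
    "Name of State": ["Assam", "Assam", "Delhi", "Delhi", "Nagaland", "Nagaland"],
})

# Mean action per state; astype(float) guards against object-dtype columns.
mean_action = toy["Action"].astype(float).groupby(toy["Name of State"]).mean()

# Normalised action: each state's mean as a percentage of the sum of all means.
normalised = 100 * mean_action / mean_action.sum()

summary = pd.DataFrame({"Mean Action": mean_action, "Normalised Action": normalised})
print(summary)
```

Grouping by state name avoids relying on each state occupying a fixed block of 50 rows after sorting.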

In [None]:
print("Assam", 
"Mean Action:",df_10[0:50]['Action'].mean(), 
"Normalised Action:",100*df_10[0:50]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))


In [None]:
print("Delhi", 
"Mean Action:",df_10[50:100]['Action'].mean(),
"Normalised Action:",100*df_10[50:100]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))


In [None]:

print("Jharkhand", 
"Mean Action:",df_10[100:150]['Action'].mean(),
"Normalised Action:",100*df_10[100:150]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))

In [None]:

print("Maharashtra", 
"Mean Action:",df_10[150:200]['Action'].mean(),
"Normalised Action:",100*df_10[150:200]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))

In [None]:

print("Nagaland", 
"Mean Action:",df_10[200:250]['Action'].mean(),
"Normalised Action:",100*df_10[200:250]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))