# Contextual Bandits Demo for VacSIM policy 
This policy is implemented with [SpaceBandits](https://pypi.org/project/space-bandits/), a Python package for contextual bandits.

In [1]:
!pip install space-bandits



## Implementing Linear Model
All the necessary scientific libraries are imported first. We use the LinearBandits model, which is the simplest model in the package. As its documentation describes, this model maps contexts to expected rewards with linear coefficients.

In [2]:
import numpy as np
import pandas as pd
# from google.colab import files
import math
import matplotlib.pyplot as plt
from space_bandits import LinearBandits
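
As a rough illustration of what mapping contexts to expected rewards with linear coefficients can look like, the sketch below fits one coefficient vector for a single action using ridge-regularised least squares in plain NumPy. It is not SpaceBandits code: the helper `fit_linear_arm`, the `ridge` parameter, and the toy data are assumptions made for this example, and the package's own estimator may differ.

```python
import numpy as np

def fit_linear_arm(contexts, rewards, ridge=1.0):
    """Hypothetical helper: ridge-regularised least squares for one action."""
    X = np.asarray(contexts, dtype=float)
    y = np.asarray(rewards, dtype=float)
    # Solve (X^T X + ridge * I) beta = X^T y for the coefficient vector beta.
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Toy data (made up): two features, rewards generated by coefficients [2, -1].
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -1.0])

beta = fit_linear_arm(X, y, ridge=1e-8)

# The expected reward for a new context is the dot product with beta.
new_context = np.array([3.0, 1.0])
expected_reward = new_context @ beta
print(beta, expected_reward)
```

With one such coefficient vector per action, comparing the expected rewards across actions for a given context is what lets a linear bandit rank the actions.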

In [6]:
df = pd.read_csv('Datasets/VacSIM Input.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7500 entries, 0 to 7499
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Date                       7500 non-null   object 
 1   Name of State              7500 non-null   object 
 2   Susceptible                7500 non-null   int64  
 3   Total predicted cases      7500 non-null   int64  
 4   Death Rate (Predicted)     7500 non-null   float64
 5   Recovery Rate (Predicted)  7500 non-null   float64
 6   Population Share           7500 non-null   float64
 7   Action                     3750 non-null   float64
 8   Normalized Action          3750 non-null   float64
 9   Reward                     3750 non-null   float64
dtypes: float64(6), int64(2), object(2)
memory usage: 586.1+ KB


In [6]:
df.columns

Index(['Date', 'Name of State', 'Susceptible', 'Total predicted cases',
       'Death Rate (Predicted)', 'Recovery Rate (Predicted)',
       'Population Share', 'Action', 'Normalized Action', 'Reward'],
      dtype='object')

## Training the model

In [7]:
df.head()

| | Date | Name of State | Susceptible | Total predicted cases | Death Rate (Predicted) | Recovery Rate (Predicted) | Population Share | Action | Normalized Action | Reward |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01-09-2020 | Assam | 30885978 | 251754 | 1.029974 | 45.496794 | 15.975457 | 16.0 | 30.76923 | 3.475579 |
| 1 | 01-09-2020 | Delhi | 16165867 | 494104 | 1.029945 | 46.228932 | 8.594458 | 4.0 | 7.692308 | 296080.856 |
| 2 | 01-09-2020 | Jharkhand | 32819208 | 133576 | 1.029377 | 45.964095 | 16.888024 | 8.0 | 15.38462 | 11009.34939 |
| 3 | 01-09-2020 | Maharashtra | 106380541 | 4631183 | 1.030104 | 43.097455 | 57.529182 | 7.0 | 13.46154 | 97005.85851 |
| 4 | 01-09-2020 | Nagaland | 1943497 | 27048 | 1.027802 | 43.345164 | 1.012879 | 17.0 | 32.69231 | 0.080389 |


After normalising the actions coming from ACKTR (sub-model 1), the possible actions range from 0 to 100, i.e. 101 actions in total. The context that we considered for our model consists of:
* Susceptible population - 'Susceptible'.
* Total number of infected cases predicted using SEIR - 'Total predicted cases'.
* Death rate (calculated from the total deaths predicted using SEIR) - 'Death Rate (Predicted)'.
* Percentage share of the population of a state - 'Population Share'.
* Recovery rate (calculated from the total recoveries predicted using SEIR) - 'Recovery Rate (Predicted)'.

Using the total number of possible actions and the number of context features, we initialized the LinearBandits class:

In [8]:
num_actions = 101 
num_features = 5 
model = LinearBandits(num_actions, num_features)

The input data is split as follows. The training data covers 1 September to 15 September; each date includes 50 episodes for each of the 5 states, i.e. 50*5 = 250 rows per date. This approach was adopted to make the results robust.

The test data, on the other hand, corresponds to a single date (16 September) for demonstration purposes.

In [9]:
df_train = df[0:3750]  
df_test = df[3750:4000] #this data corresponds to one date only (16 September) for demonstration purpose    
# For future dates the data will be from row 4000 onwards, 250 rows for each date   
df_train.shape, df_test.shape

((3750, 10), (250, 10))
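
The row-index split above relies on the rows being ordered by date (15 training dates * 250 rows = 3750). As an alternative sketch, the same split could be expressed directly on the `Date` column. The miniature frame `toy` and the variable names below are made up for illustration, and the `%d-%m-%Y` format is assumed from the dates shown in the notebook.

```python
import pandas as pd

# Hypothetical miniature frame with the same 'Date' format as the input CSV.
toy = pd.DataFrame({
    "Date": ["01-09-2020", "01-09-2020", "02-09-2020", "16-09-2020", "16-09-2020"],
    "Name of State": ["Assam", "Delhi", "Assam", "Assam", "Delhi"],
})

# Parse day-month-year strings so that comparisons are chronological.
dates = pd.to_datetime(toy["Date"], format="%d-%m-%Y")

# Training rows: up to and including 15 September; test rows: 16 September only.
toy_train = toy[dates <= pd.Timestamp("2020-09-15")]
toy_test = toy[dates == pd.Timestamp("2020-09-16")]

print(toy_train.shape, toy_test.shape)
```

Splitting on the parsed date avoids hard-coded row offsets if the number of episodes per date ever changes.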

In each training iteration we take a context array, the chosen action, and the corresponding reward. These values are passed to the .update() method to update the LinearBandits model's internal state.

In [10]:
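# Context columns, in the order expected by the model
feature_cols = ['Susceptible', 'Total predicted cases', 'Death Rate (Predicted)', 'Population Share', 'Recovery Rate (Predicted)']
for i in range(len(df_train)):
    context = df_train[feature_cols].iloc[i].to_numpy()
    # Select scalars directly; calling int() on a one-element array is deprecated in recent NumPy
    action = int(df_train['Normalized Action'].iloc[i])  # truncated to an integer action index
    reward = int(df_train['Reward'].iloc[i])  # truncated to an integer, as in the original loop
    model.update(context, action, reward)
print('Done')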

Done
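
The training loop converts both the normalised action and the reward with `int()`, which truncates toward zero. The short sketch below applies the same conversion to the five sample rows shown in the `df.head()` output above, to make the effect visible. The variable names are made up for this example, and the notebook does not state whether truncation of the reward is intended.

```python
# Values copied from the first five training rows displayed earlier.
sample_actions = [30.76923, 7.692308, 15.38462, 13.46154, 32.69231]
sample_rewards = [3.475579, 296080.856, 11009.34939, 97005.85851, 0.080389]

# int() drops the fractional part, so small rewards collapse to 0.
truncated_actions = [int(a) for a in sample_actions]
truncated_rewards = [int(r) for r in sample_rewards]

print(truncated_actions)  # [30, 7, 15, 13, 32]
print(truncated_rewards)  # [3, 296080, 11009, 97005, 0]
```

The Nagaland reward of 0.080389 becomes 0 after truncation, so rewards below 1 are indistinguishable from no reward under this conversion.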


## Predicting the actions

In [11]:
df_test.head()

| | Date | Name of State | Susceptible | Total predicted cases | Death Rate (Predicted) | Recovery Rate (Predicted) | Population Share | Action | Normalized Action | Reward |
|---|---|---|---|---|---|---|---|---|---|---|
| 3750 | 16-09-2020 | Assam | 30441898 | 603248 | 1.03009 | 45.700939 | 15.975457 | NaN | NaN | NaN |
| 3751 | 16-09-2020 | Delhi | 15375295 | 1132659 | 1.030054 | 46.946875 | 8.594458 | NaN | NaN | NaN |
| 3752 | 16-09-2020 | Jharkhand | 32589056 | 316000 | 1.030063 | 46.067722 | 16.888024 | NaN | NaN | NaN |
| 3753 | 16-09-2020 | Maharashtra | 97478946 | 11704661 | 1.030102 | 44.23859 | 57.529182 | NaN | NaN | NaN |
| 3754 | 16-09-2020 | Nagaland | 1888867 | 69661 | 1.02927 | 43.756191 | 1.012879 | NaN | NaN | NaN |


After training the model as above, we can use the .action() method to map a given context to the action with the highest expected reward. The predicted actions are stored in an array so that they can be recorded and exported.


In [12]:
column_names = ["Action", "Date", "Name of State"]
df_10 = pd.DataFrame(columns = column_names)
arr_action = []
for i in range(len(df_test)): 
  new_context =  df_test[['Susceptible', 'Total predicted cases', 'Death Rate (Predicted)', 'Population Share','Recovery Rate (Predicted)']].iloc[i].to_numpy()
  arr_action.append(model.action(new_context))
  df_10.loc[i] = [arr_action[i],df_test['Date'].iloc[i],df_test['Name of State'].iloc[i]]
  # print(f"Action {i} = {arr_action[i]}\t  Date : {df_test['Date'].iloc[i]}\t Location : {df_test['Name of State'].iloc[i]} \t ")
pd.set_option("display.max_rows", None, "display.max_columns", None)
df_10=df_10.sort_values(by=['Name of State'])




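Given below is the expected reward matrix. In the matrix returned by the model, rows represent all possible actions and columns represent the iterations (test contexts); the output below is transposed for display, so each row is one test context and each column is one of the 101 actions.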

In [23]:
df_test_context  = df_test[['Susceptible', 'Total predicted cases', 'Death Rate (Predicted)', 'Population Share','Recovery Rate (Predicted)']].to_numpy()
exp = model.expected_values(df_test_context)
df_exp_reward = pd.DataFrame(exp)
df_exp_reward.T.head(n=5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
0,30420160.0,24957270.0,11447240.0,7435456.0,20667330.0,3155892.0,2174625.0,689306.0,689132.9,1052022.0,28402530.0,17696890.0,414847.689633,14071760.0,487829300.0,51851.175989,4428075.0,26261.362751,6192.918311,69263.962229,-3261006.0,3176.433411,152155.4,4561.035381,15646680.0,3343361.0,-353.244392,13836050.0,18847990.0,1127.302579,18205450.0,70.68968,6336073.0,19920370.0,101.795535,20321230.0,5.304361,21.481955,79.410586,1.638635,22391890.0,17029610.0,7.816525,-0.119779,-432280.8,1.202765,0.294104,29530800.0,-359045.4,-18759920.0,22.681621,83.138679,532.028386,30194280.0,-0.816769,30213970.0,0.0,-24966810.0,0.0,-13537120.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.176417,0.0,0.0,0.0,30098180.0,190.912784,0.0,30105320.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,16494090.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30309370.0
1,15307000.0,2802183.0,6245173.0,4807902.0,14972000.0,2752979.0,143722.3,1857918.0,411847.3,20640790.0,-288556.8,4080246.0,204652.043608,5128454.0,133014000.0,24484.552922,11398580.0,9008.574523,5152.996803,-95198.102437,17227710.0,3107.935047,695497.8,2225.436399,4903166.0,10079340.0,1163.998734,5830509.0,9899138.0,1657.928096,7894260.0,3.986577,11785090.0,1473724.0,44.221391,5982326.0,0.000642,-1.080425,164.133194,-5.084425,-733202.5,-11905530.0,-4.879343,-5.831023,-314991.5,-1.95721,0.14238,15245690.0,1619013.0,19138050.0,-25.231895,176.523721,271.087854,15278590.0,-3.714607,15409910.0,0.0,15575850.0,0.0,-30489690.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.569522,0.0,0.0,0.0,15320740.0,93.806488,0.0,15074350.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10580780.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15312680.0
2,32571740.0,30576670.0,12129820.0,7327458.0,22736390.0,3074650.0,2612729.0,345388.7,715921.8,-6216672.0,33993010.0,21691210.0,473571.901374,17235050.0,601955000.0,59581.365653,3526321.0,27246.430135,5335.497797,98936.055094,-8451708.0,3378.889396,-31353.69,5376.073193,19049300.0,2429646.0,-809.894749,16426500.0,22297500.0,1079.70182,21399040.0,83.541434,5767780.0,24575330.0,118.802701,24377900.0,6.45167,27.654829,66.155433,4.136523,29719610.0,25120970.0,25.886257,2.17027,-109486.0,2.308627,0.317303,31482020.0,-871652.4,-31480200.0,35.064691,18.855445,568.607465,32312680.0,0.440897,32285410.0,0.0,-37809300.0,0.0,-5070747.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.269212,0.0,0.0,0.0,32173680.0,205.421173,0.0,32280890.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,16761210.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,32445460.0
3,97304560.0,-32409310.0,36529870.0,45471060.0,-2958834.0,12582630.0,708559.0,9854870.0,2750157.0,212274700.0,20620020.0,-26946040.0,50012.363411,-27812820.0,-1014039000.0,7695.126313,19218420.0,146583.621253,66888.630695,-45224.008406,98592940.0,1855.639572,1109231.0,-5675.188655,-25041410.0,15269020.0,10005.320045,-15770280.0,-26366250.0,837.884575,-25616960.0,81.640859,5269000.0,-20462380.0,-77.553229,-33872620.0,-1.44368,-64.911932,28.002787,-65.690752,-127097000.0,-112988300.0,-882.390567,-68.495684,-20754080.0,-26.366361,0.869017,98463340.0,5764295.0,277662000.0,-147.754417,2704.234457,1731.671727,97020740.0,-41.591371,98516440.0,0.0,244350800.0,0.0,-322528700.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.476069,0.0,0.0,0.0,97783500.0,580.427872,0.0,94855550.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,79375230.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,97105550.0
4,1868602.0,731519.5,1547428.0,-124230.2,12370780.0,1640177.0,-368803.0,949066.3,67748.33,2703908.0,-6417776.0,1367350.0,112737.860517,2614116.0,70285170.0,11186.150609,7291245.0,-6264.965531,-1378.047572,-100692.385372,8511578.0,2165.187766,1131937.0,1482.095733,2321171.0,6869875.0,381.604635,3388921.0,6300373.0,1513.043629,6889384.0,-15.811888,10417110.0,-1697097.0,30.107573,5624383.0,-0.733955,-0.054825,177.933388,-0.054802,7002274.0,-7588442.0,83.293966,-0.232061,1875401.0,-0.033468,0.018009,1845203.0,1658969.0,-50202.09,-21.25574,-66.578745,33.103936,1874607.0,-0.179219,1880553.0,0.0,422641.5,0.0,-1760672.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.072035,0.0,0.0,0.0,1872174.0,11.743902,0.0,1862885.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1111019.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1880812.0
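
To read such a matrix, one option is to take the action with the largest expected value for each context. The sketch below does this on a small made-up matrix, `toy_exp`, shaped (actions, contexts); that orientation is inferred from the transpose used in the cell above. The result is not guaranteed to match what `.action()` returns, since the model may select actions differently internally.

```python
import numpy as np

# Toy expected-value matrix (made up): 3 actions (rows) x 2 contexts (columns).
toy_exp = np.array([
    [1.0, 5.0],
    [4.0, 2.0],
    [3.0, 6.0],
])

# For every context (column), find the row index of the largest expected value.
best_actions = toy_exp.argmax(axis=0)
best_values = toy_exp.max(axis=0)

print(best_actions)  # [1 2]
print(best_values)   # [4. 6.]
```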


## Calculating mean action and normalised action
As defined in Dataset README.md

In [14]:
print("Assam", 
"Mean Action:",df_10[0:50]['Action'].mean(), 
"Normalised Action:",100*df_10[0:50]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))


Assam Mean Action: 60.64 Normalised Action: 19.66405084635839


In [15]:
print("Delhi", 
"Mean Action:",df_10[50:100]['Action'].mean(),
"Normalised Action:",100*df_10[50:100]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))


Delhi Mean Action: 63.18 Normalised Action: 20.487709968221026


In [19]:

print("Jharkhand", 
"Mean Action:",df_10[100:150]['Action'].mean(),
"Normalised Action:",100*df_10[100:150]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))

Jharkhand Mean Action: 58.18 Normalised Action: 18.866333744081977


In [20]:

print("Maharashtra", 
"Mean Action:",df_10[150:200]['Action'].mean(),
"Normalised Action:",100*df_10[150:200]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))

Maharashtra Mean Action: 66.22 Normalised Action: 21.473506712497567


In [22]:

print("Nagaland", 
"Mean Action:",df_10[200:250]['Action'].mean(),
"Normalised Action:",100*df_10[200:250]['Action'].mean()/(df_10[0:50]['Action'].mean()+df_10[50:100]['Action'].mean()+df_10[100:150]['Action'].mean()+df_10[150:200]['Action'].mean()+df_10[200:250]['Action'].mean()))

Nagaland Mean Action: 60.16 Normalised Action: 19.50839872884104
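
The five cells above repeat the same formula with different row slices. As a hedged alternative, the mean and normalised action per state could be computed in one pass with a group-by. The helper `summarise_actions` and the frame `toy` are made up for this sketch; the column names follow `df_10`, and the cast to float is an assumption in case the `Action` column has object dtype.

```python
import pandas as pd

def summarise_actions(df_actions):
    """Hypothetical helper: mean and normalised action for each state."""
    actions = df_actions["Action"].astype(float)
    mean_action = actions.groupby(df_actions["Name of State"]).mean()
    # Normalised action: each state's mean as a percentage of the sum of means.
    normalised = 100 * mean_action / mean_action.sum()
    return pd.DataFrame({"Mean Action": mean_action,
                         "Normalised Action": normalised})

# Toy data (made up): two episodes for each of two states.
toy = pd.DataFrame({
    "Name of State": ["Assam", "Assam", "Delhi", "Delhi"],
    "Action": [60, 62, 30, 48],
})

summary = summarise_actions(toy)
print(summary)
```

Grouping by state name also removes the dependence on every state occupying exactly 50 consecutive rows after sorting.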
