# Space Bandits Contextual Bandits Demo
This notebook demonstrates the basic usage of Space Bandits. The package is currently in development. Install with:

```
git clone https://github.com/AlliedToasters/dev_bandits.git

cd dev_bandits

pip install -e .
```

## Build a Linear Model
The simplest model in the packages maps contexts to expected rewards with linear coefficients. Use the model constructor function; you must specify the feature length (number of features per row) and the number of actions available.

In [1]:
from cbandits import init_linear_model

num_actions = 5 #five actions
context_dim = 10 #ten features

model = init_linear_model(num_actions, context_dim)

  from ._conv import register_converters as _register_converters


## Train the Model with .update() Method
Use past examples of context, action, rewards to train the model. A context must have the dimension specified above; each training example must include one action (indexed from zero) and one associated reward.

In [2]:
import numpy as np
context = np.random.random((10))
print('example context vector: \n', context)
action = 2
print('example action chosen: \n', action)
reward = 5
print('example reward associated with: \n', reward)

#here we update the model:
model.update(context, action, reward)

example context vector: 
 [0.33209376 0.94569573 0.49106177 0.26366882 0.27001911 0.61302338
 0.57737893 0.49921066 0.38959186 0.8496655 ]
example action chosen: 
 2
example reward associated with: 
 5


## Make Decisions with .action() Method

After training the model, we can use the .action() method to map a given context to the action with the highest expected reward.

In [3]:
new_context = np.random.random((10))
print('new example context vector: \n', context)

print('model suggested action: ')
print(model.action(new_context))

new example context vector: 
 [0.33209376 0.94569573 0.49106177 0.26366882 0.27001911 0.61302338
 0.57737893 0.49921066 0.38959186 0.8496655 ]
model suggested action: 
1


## Expected Values


## Advanced Parameters
### Memory Management
The model keeps a record of all previous examples; this is useful for updating, but it's impractical in ongoing production scenarios. To limit the model's memory, specify the number of previous examples to "remember" using the memory_size argument.

```python
model = init_linear_model(num_actions, context_dim, memory_size=1000000)
```

The above specifies that the model only keep a running record of the last 1000000 updates.

### Initial Exploration
Thompson sampling gives us continuous, intelligent exploration throughout the model's lifetime. However, initial exploration can be very helpful for encouraging model convergence, especially with a cold start. Use the initial_pulls argument to force the model to explore before using Thompson sampling. The model will sequentially try each action initial_pulls number of times; this results in initial_pulls * n_actions exploratory actions.

```python
model = init_linear_model(num_actions, context_dim, initial_pulls=2)
```

The above will result in the model suggesting each action 2 times before using Thompson sampling to suggest actions.

### Saving Your Model
Each cbandits model has a .save() method. Use it to save models for later use.

In [4]:
model.save('my_saved_model') #save to file my_saved_model

from cbandits import load_linear_model

model = load_linear_model('my_saved_model') #load from same location

## Building a Neural Model

Linear models are powerful but inherently limited. The Neural-Linear Bayesian Contextual Bandits model, which was named and explored in the 2018 research paper [Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling](https://arxiv.org/pdf/1802.09127.pdf), uses a neural network to give the model a powerful way to map a feature vector to a latent representational feature space. These learned features are used in a standard linear model identical to the one used above.<br><br>
Cbandits lets us deploy the same model with the API as above. In practice, optimizing the model is somewhat complicated; the neural network adds a huge number of hyperparameters to the model. Cbandits uses the default parameters used in the research paper to give users a nice starting point.

In [5]:
model.expected_values(context)

[0.0, 0.0, 4.720375992904131, 0.0, 0.0]