# Bandit Persistence Recipes

In production, a bandit's learned state usually has to survive process restarts, so it must be persisted to disk. This notebook demonstrates how `Bandit` subclasses can be saved to disk, reloaded, and even redefined between sessions.

First, let's create a simple subclass of `Bandit` that will be trained a little, then persisted to disk.

In [1]:
from bayesianbandits import Bandit, Arm, epsilon_greedy, GammaRegressor

def reward_func(x):
    return x

est = GammaRegressor(1, 1)
policy = epsilon_greedy()

class Agent(Bandit, learner=est, policy=policy):
    arm1 = Arm("Action 1", reward_func)
    arm2 = Arm("Action 2", reward_func)

agent = Agent(rng=1)

Next, we'll pull an arm once, update it with the observed reward, and inspect the learned parameters before persisting the bandit to disk.

In [2]:
agent.pull()
agent.update(1)

print(f"Learned alpha and beta for arm 1: {agent.arm1.learner.coef_[1]}")

Learned alpha and beta for arm 1: [2. 2.]


`joblib` is a convenient library for persisting Python objects to disk. It is a dependency of `scikit-learn`, so it is already present once `bayesianbandits` is installed.

Below, we dump the bandit to disk and load it back. The reloaded bandit is in the same state as before: the learned parameters survive the round trip.

In [3]:
import joblib

joblib.dump(agent, "agent.pkl")

loaded = joblib.load("agent.pkl")

print(f"Learned alpha and beta for arm 1: {loaded.arm1.learner.coef_[1]}")

Learned alpha and beta for arm 1: [2. 2.]


After being reloaded, the bandit can be used as normal.

In [4]:
loaded.pull()
loaded.update(0)

print(f"Learned alpha and beta for arm 1: {loaded.arm1.learner.coef_[1]}")
print(f"Learned alpha and beta for arm 2: {loaded.arm2.learner.coef_[1]}")

joblib.dump(loaded, "agent.pkl")

Learned alpha and beta for arm 1: [2. 2.]
Learned alpha and beta for arm 2: [1. 2.]


['agent.pkl']
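Since the same `agent.pkl` is overwritten after every session, a crash in the middle of a write could leave a truncated file behind. The following is a minimal sketch of one common safeguard, writing to a temporary file and then renaming it over the old one. The helper name `save_atomically` is hypothetical, and the standard library's `pickle` and a plain dict stand in for `joblib` and the bandit so that the example runs on its own.

```python
import os
import pickle
import tempfile


def save_atomically(obj, path):
    """Hypothetical helper: replace `path` only after `obj` is fully written."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)  # stand-in for the joblib dump used in this notebook
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Clean up the partial temp file and leave the old target untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# A plain dict stands in for the bandit's learned state.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "agent.pkl")
    save_atomically({"arm1": [2.0, 2.0]}, target)
    save_atomically({"arm1": [2.0, 2.0], "arm2": [1.0, 2.0]}, target)
    with open(target, "rb") as f:
        state = pickle.load(f)
    leftover = os.listdir(workdir)
```

With this pattern, a reader of `agent.pkl` sees either the previous complete file or the new complete file, never a half-written one.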

After your learning session has gone on for some time, you may get an idea for a new arm. You want to try it out, but you don't want to lose the state the bandit has already learned. Fortunately, you can simply redefine the `Bandit` subclass and reload the bandit from disk. Any arms that appear only in the new definition are initialized when the bandit is reloaded.

Note that the learned state of arms 1 and 2 is preserved.

In [5]:
class Agent(Bandit, learner=est, policy=policy):
    arm1 = Arm("Action 1", reward_func)
    arm2 = Arm("Action 2", reward_func)
    arm3 = Arm("Action 3", reward_func)

loaded_with_new_def = joblib.load("agent.pkl")

print(f"Learned alpha and beta for arm 1: {loaded_with_new_def.arm1.learner.coef_[1]}")
print(f"Learned alpha and beta for arm 2: {loaded_with_new_def.arm2.learner.coef_[1]}")

print(f"Arms: {loaded_with_new_def.arms.keys()}")

Learned alpha and beta for arm 1: [2. 2.]
Learned alpha and beta for arm 2: [1. 2.]
Arms: dict_keys(['arm1', 'arm2', 'arm3'])
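Redefinition can work because pickle-based serializers store a reference to the class by name, together with the instance's state, rather than the class body itself. At load time, whichever class currently carries that name is used. The sketch below illustrates the idea with the standard library's `pickle` and a toy class. `ToyBandit`, `ToyAgent`, `arm_names`, and `PRIOR` are all hypothetical names, and the reconciliation in `__setstate__` is an assumption about how such behaviour could be implemented, not the code `bayesianbandits` actually uses.

```python
import pickle

PRIOR = [1.0, 1.0]  # hypothetical fresh [alpha, beta] for an arm with no history


class ToyBandit:
    """Toy base class (an assumption for illustration, not the library's Bandit)."""

    arm_names = ()

    def __init__(self):
        self.arms = {name: list(PRIOR) for name in self.arm_names}

    def __setstate__(self, state):
        # pickle calls this on load instead of __init__.
        self.__dict__.update(state)
        saved = self.arms
        # Keep saved state for arms that are still declared on the current class,
        # and give arms that are new in the definition a fresh prior.
        self.arms = {
            name: saved.get(name, list(PRIOR)) for name in type(self).arm_names
        }


class ToyAgent(ToyBandit):
    arm_names = ("arm1", "arm2")


agent = ToyAgent()
agent.arms["arm1"] = [2.0, 2.0]  # pretend arm1 has learned something
payload = pickle.dumps(agent)  # records the class by name plus the instance state


# Redefine the class under the same name, now with a third arm.
class ToyAgent(ToyBandit):
    arm_names = ("arm1", "arm2", "arm3")


restored = pickle.loads(payload)  # resolves "ToyAgent" to the new definition
print(restored.arms)
# {'arm1': [2.0, 2.0], 'arm2': [1.0, 1.0], 'arm3': [1.0, 1.0]}
```

The same reconciliation step would also drop any saved arm that the current definition no longer declares.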


Again, the bandit can be used as normal.

In [6]:
loaded_with_new_def.pull()
loaded_with_new_def.update(0)

print(f"Learned alpha and beta for arm 1: {loaded_with_new_def.arm1.learner.coef_[1]}")
print(f"Learned alpha and beta for arm 2: {loaded_with_new_def.arm2.learner.coef_[1]}")
print(f"Learned alpha and beta for arm 3: {loaded_with_new_def.arm3.learner.coef_[1]}")

joblib.dump(loaded_with_new_def, "agent.pkl")

Learned alpha and beta for arm 1: [2. 3.]
Learned alpha and beta for arm 2: [1. 2.]
Learned alpha and beta for arm 3: [1. 1.]


['agent.pkl']

Now, you may decide that `arm2` is not a good arm and want to remove it from the bandit. You can do this by redefining the `Bandit` subclass and reloading the bandit from disk. Any arms in the saved `Bandit` instance that are not in the new definition are removed when the bandit is reloaded.

Note that this becomes destructive once the reloaded bandit is re-serialized: saving it over `agent.pkl` discards the learned state of arm 2 for good!

In [7]:
class Agent(Bandit, learner=est, policy=policy):
    arm1 = Arm("Action 1", reward_func)
    arm3 = Arm("Action 3", reward_func)

loaded_with_removed_arm = joblib.load("agent.pkl")

print(f"Arms: {loaded_with_removed_arm.arms.keys()}")

print(f"Learned alpha and beta for arm 1: {loaded_with_removed_arm.arm1.learner.coef_[1]}")
print(f"Learned alpha and beta for arm 3: {loaded_with_removed_arm.arm3.learner.coef_[1]}")

Arms: dict_keys(['arm1', 'arm3'])
Learned alpha and beta for arm 1: [2. 3.]
Learned alpha and beta for arm 3: [1. 1.]
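Because re-saving after dropping an arm discards that arm's state, it may be worth keeping a copy of the previous file before overwriting it. The following is a minimal sketch using only the standard library. The helper name `backup_file` and the `.bak` suffix are hypothetical choices, and raw bytes stand in for the serialized bandit.

```python
import os
import shutil
import tempfile


def backup_file(path, suffix=".bak"):
    """Hypothetical helper: copy `path` to `path + suffix` and return the copy's path."""
    backup_path = path + suffix
    shutil.copy2(path, backup_path)  # copies the contents and file metadata
    return backup_path


# Raw bytes stand in for the serialized bandit.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "agent.pkl")
    with open(target, "wb") as f:
        f.write(b"state with arm2")

    backup_path = backup_file(target)  # keep the old state first

    with open(target, "wb") as f:  # then overwrite, as the destructive save would
        f.write(b"state without arm2")

    with open(backup_path, "rb") as f:
        recovered = f.read()
    backup_name = os.path.basename(backup_path)
```

If the removed arm turns out to be needed after all, the backup can be loaded with a class definition that still declares it.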
