An implementation of the multi-armed bandit optimization pattern as a Flask extension

Flask-MAB is an implementation of the multi-armed bandit test pattern as Flask middleware.

It can be used to test the effectiveness of virtually any part of your app using user signals.

If you can pass it, we can test it!

Note for users of the pre-release version: the API has changed significantly with 1.0 to better fit the application factory pattern.

Complete Documentation.

Multi-armed what?!

A multi-armed bandit is essentially an online alternative to classical A/B testing. Whereas A/B testing is generally split into extended phases of execution and analysis, bandit algorithms continually adjust to user feedback and optimize between experimental states. Bandits typically require very little curation and can in fact be left running indefinitely if need be.

The curious-sounding name is drawn from the "one-armed bandit", a colloquialism for casino slot machines. Bandit algorithms can be thought of along similar lines as an eager slot player: if one were to play many slot machines continuously over many thousands of attempts, one would eventually be able to determine which machines were hotter than others. A multi-armed bandit is merely an algorithm that performs exactly this determination, using your users' interactions as its "arm pulls". Extracting winning patterns becomes a fluid part of interacting with the application.

While bandit algorithms can provide excellent automated optimization, it's important to note that they are not considered a replacement for classic A/B tests. Bandits could be considered a sort of "black box," in the sense that their intuitions become opaque as they optimize. Experiments that call for rigorous tests of statistical significance may be better suited to more traditional frameworks.

John Myles White has an awesome treatise on bandit implementations in his book Bandit Algorithms for Website Optimization. Most of the code in this library consists of his excellent guidelines reimplemented to suit the nature of the Flask request lifecycle.

Getting Started

To get started defining experiments, there are several steps:

  1. Determine what parts of your app you'd like to optimize
  2. Set up a storage engine (currently only JSON, though MongoDB and ZODB are on the roadmap)
  3. Instantiate bandits for all your experiments (you can have as many as you like; several experiments can run at once in a single app)
  4. Assign arms to your bandits that represent your experimental states
  5. Attach the BanditMiddleware to your Flask app.

This guide will take you through each step. The example case we'll be working with is included in the source under the 'example' folder if you'd like to try running the finished product.

Determining what to test

The first task at hand requires a little planning. What are some of the things in your app you've always been curious about changing, but never had empirical data to back up potential modifications? Bandits are best suited to cases where changes can be "slipped in" without the user noticing, but since the state assigned to a user will be persisted to their client, you can also change things like UI.

For our example case, we'll be changing the label text and color of a button in our app to see if either change increases user interaction with the feature. We'll be representing these states as two separate experiments (so a user will get separate assignments for color and text), but you could conceivably make them one experiment by using a tuple or sequence as the arm value. More on that later!
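As a rough sketch of the single-experiment alternative, every color/text pairing could become one arm whose value is a tuple. The color values and arm names below are illustrative assumptions, not taken from the example app.

```python
from itertools import product

# Hypothetical values for illustration only.
colors = {"green": "#00FF00", "red": "#FF0000"}
texts = {"casual": "Hey dude, wanna buy me?", "neutral": "Add to cart"}

# One arm per (color, text) combination; the arm value is a tuple.
combined_arms = {
    f"{color_name}-{text_name}": (color_value, text_value)
    for (color_name, color_value), (text_name, text_value)
    in product(colors.items(), texts.items())
}

for arm_name, arm_value in combined_arms.items():
    # With a real bandit, each pair would be registered as a single arm,
    # e.g. bandit.add_arm(arm_name, arm_value).
    print(arm_name, arm_value)
```

The trade-off is that a combined experiment has more arms (here 2 x 2 = 4), so each arm sees less traffic and the bandit generally needs more feedback before a winner emerges.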

Setting up your storage backend

HTTP itself is stateless, but bandits need to persist their increments between requests. In order to accomplish this, there is a bandit storage interface that can be implemented to save all the experiments for an application down to memory, database, etc.

At present, the only core implementation of this interface saves the bandits down to a JSON file at the path you specify, but this should work for most purposes. For the 1.0 release, implementations using MongoDB and ZODB are planned.

Storage engines are attached using Flask configuration directives.
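To make the idea of a storage interface concrete, here is a minimal sketch of what such an interface and a flat-file implementation could look like. The class names `BanditStorage` and `JSONFileStorage` and the `save`/`load` methods are hypothetical and are not the library's actual interface.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class BanditStorage(ABC):
    """Hypothetical interface: persist and restore experiment state."""

    @abstractmethod
    def save(self, bandits: dict) -> None: ...

    @abstractmethod
    def load(self) -> dict: ...


class JSONFileStorage(BanditStorage):
    """Hypothetical flat-file implementation backed by one JSON document."""

    def __init__(self, path: str):
        self.path = path

    def save(self, bandits: dict) -> None:
        # Write to a temp file first, then swap it in, so a crash mid-write
        # does not leave a truncated file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
            json.dump(bandits, tmp)
        os.replace(tmp.name, self.path)

    def load(self) -> dict:
        # A missing file simply means no experiment has been saved yet.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as fh:
            return json.load(fh)


storage_path = os.path.join(tempfile.mkdtemp(), "bandit_storage.json")
storage = JSONFileStorage(storage_path)
state = {"color_btn": {"pulls": {"green": 3}, "reward": {"green": 1.0}}}
storage.save(state)
restored = storage.load()
```

Keeping the interface down to two operations is what would make it straightforward to swap the flat file for a database later.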

Let's start setting up our bandit file storage:

app.config['MAB_STORAGE_ENGINE'] = 'JSONBanditStorage'
app.config['MAB_STORAGE_OPTS'] = ('./example/bandit_storage.json',)

This storage instance will be passed into our bandit middleware and all values that need to be persisted will be handled under the hood.

The storage opts are just arguments to be passed to the storage instance constructor (in this case, just the path to a flat file to store the information).
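The following sketch shows one way an engine name and an options tuple could be turned into a storage instance. The registry, the stub class, and `build_storage` are hypothetical stand-ins; only the two config keys come from the example above.

```python
class JSONBanditStorageStub:
    """Stand-in for a storage class; it only records its constructor argument."""

    def __init__(self, filepath):
        self.filepath = filepath


# Hypothetical lookup table from engine name to class.
STORAGE_ENGINES = {"JSONBanditStorage": JSONBanditStorageStub}

config = {
    "MAB_STORAGE_ENGINE": "JSONBanditStorage",
    "MAB_STORAGE_OPTS": ("./example/bandit_storage.json",),
}


def build_storage(config):
    """Look up the engine class by name and unpack the opts as positional args."""
    engine_cls = STORAGE_ENGINES[config["MAB_STORAGE_ENGINE"]]
    return engine_cls(*config["MAB_STORAGE_OPTS"])


storage = build_storage(config)
print(storage.filepath)
```

Note the trailing comma in the opts value: `('./example/bandit_storage.json',)` is a one-element tuple, whereas without the comma it would be a plain string and unpacking it would pass one argument per character.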

Creating bandits and assigning arms

The next step is to create a bandit for each experiment we want to test.

There are several different bandit implementations included, but for the purposes of this example we'll be using a bandits.EpsilonGreedyBandit, an algorithm which aggressively assigns the present winner according to a fixed constant value, epsilon.

Expanding upon our previous example, here are our bandits alongside our storage engine:

from import JSONBanditStorage
from flask.ext.mab.bandits import EpsilonGreedyBandit

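color_bandit = EpsilonGreedyBandit(0.2)
# Color arms are registered the same way as the text arms below:
# one add_arm(name, value) call per button color.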

txt_bandit = EpsilonGreedyBandit(0.5)
txt_bandit.add_arm("casual","Hey dude, wanna buy me?")
txt_bandit.add_arm("neutral","Add to cart")
txt_bandit.add_arm("formal","Good day sir... care to purchase?")

Here we have two bandits, one of which will randomize 20% of the time on the color of the button, the other 50% of the time on the text. The colors and text blurbs are considered our "arms" in bandit parlance. An epsilon-greedy bandit splits its choices between random selection and deterministically selecting the current "winner", so as users click more, thereby sending reward signals, one combination of these two states will start to win out.

This code could easily be refactored using a function or generator, but for now we'll include the full boilerplate. If you have a lot of experiments, consider defining a helper function to cut down on the repetition.
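The explore/exploit split described above can be sketched in a few lines. This is an illustrative stand-alone class, not the library's `EpsilonGreedyBandit`; the class name, its methods, and the simulated click rates are all assumptions made for the demo.

```python
import random


class EpsilonGreedySketch:
    """Illustrative epsilon-greedy bandit; not the library's implementation."""

    def __init__(self, epsilon, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.pulls = {}    # arm name -> number of times the arm was shown
        self.rewards = {}  # arm name -> total reward collected

    def add_arm(self, name):
        self.pulls[name] = 0
        self.rewards[name] = 0.0

    def value(self, name):
        """Average reward per pull; unrewarded pulls count as zero."""
        pulls = self.pulls[name]
        return self.rewards[name] / pulls if pulls else 0.0

    def choose_arm(self):
        if self.rng.random() < self.epsilon:
            # Explore: pick any arm uniformly at random.
            name = self.rng.choice(list(self.pulls))
        else:
            # Exploit: pick the arm with the best average reward so far.
            name = max(self.pulls, key=self.value)
        self.pulls[name] += 1
        return name

    def reward(self, name, amount=1.0):
        self.rewards[name] += amount


rng = random.Random(42)
bandit = EpsilonGreedySketch(epsilon=0.2, rng=rng)
for name in ("green", "red", "blue"):
    bandit.add_arm(name)

# Simulated click-through rates, made up for the demo.
true_click_rates = {"green": 0.05, "red": 0.30, "blue": 0.10}

for _ in range(5000):
    arm = bandit.choose_arm()
    if rng.random() < true_click_rates[arm]:
        bandit.reward(arm)

best = max(bandit.pulls, key=bandit.pulls.get)
print(best, bandit.pulls)
```

A higher epsilon keeps exploring more and so reacts faster to changing user behavior, at the cost of showing losing arms more often.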
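One possible shape for such a helper is sketched below. `build_bandit` and `RecordingBandit` are hypothetical names; the stand-in class exists only so the sketch runs on its own, and it mirrors the `add_arm(name, value)` call used in the example.

```python
class RecordingBandit:
    """Stand-in for a real bandit class so the sketch runs without the library."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.arms = {}

    def add_arm(self, name, value):
        self.arms[name] = value


def build_bandit(bandit_factory, arms):
    """Create a bandit and register each (name, value) pair as an arm."""
    bandit = bandit_factory()
    for name, value in arms.items():
        bandit.add_arm(name, value)
    return bandit


txt_bandit = build_bandit(
    lambda: RecordingBandit(0.5),
    {
        "casual": "Hey dude, wanna buy me?",
        "neutral": "Add to cart",
        "formal": "Good day sir... care to purchase?",
    },
)
print(txt_bandit.arms)
```

With the real library, the factory would presumably be something like `lambda: EpsilonGreedyBandit(0.5)` in place of the stand-in.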

Attaching the middleware

The main BanditMiddleware is where all the magic happens. Attaching it to our app, assigning it some bandits, and sending it pull and reward signals is all that's necessary to get the test going.

Expanding on our example, we'll define a simple Flask app with some basic routes for rendering the interface. These routes will also understand how to reward the right arms and update the bandits so the state of the experiment starts adjusting in real time.

Again, boilerplate here could be easily cut down, but here is a rough example:

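from flask import Flask, render_template
from flask.ext.mab import BanditMiddleware  # flask.ext was removed in Flask 1.0; newer setups import the package directly

app = Flask('test_app')
mab = BanditMiddleware()
mab.init_app(app)  # assumed: the usual Flask extension hook for the application factory pattern
app.add_bandit('color_btn', color_bandit)  # our bandits from the previous code block
app.add_bandit('txt_btn', txt_bandit)

@app.route("/")
def home():
    """Render the btn"""
    return render_template("ui.html")

@app.route("/btnclick")
def reward():
    """Button was clicked!"""
    return render_template("btnclick.html")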

Now our app understands that it should be tracking two experiments and persisting their values to a file. The "arms" selected for each user will be persisted to cookies. However, we still need to tell the system which endpoints use which experiments. In our example case, the "/" route renders the button, so both states need to be assigned there. The "/btnclick" endpoint, by contrast, is where our reward is determined: the theoretical "payoff" that state won us. In this case it's a boolean, assigning a 1 if the button gets clicked. So how are these two signals sent to the middleware? There are decorators, much like the route decorator, that register these actions.

Using the decorators

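The MAB feedback cycle is set up per endpoint:

# Decorator lines reconstructed from the description below; argument forms are assumed.
@app.route("/")
@mab.choose_arm("color_btn")
@mab.choose_arm("txt_btn")
def home(color_btn, txt_btn):
    """Render the btn using values from the bandit"""
    return render_template("ui.html", btn_color=color_btn, btn_text=txt_btn)

@app.route("/btnclick")
@mab.reward_endpt("color_btn", 1.0)
@mab.reward_endpt("txt_btn", 1.0)
def reward():
    """Button was clicked!"""
    return render_template("btnclick.html")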

Using these decorators, our middleware knows that it should suggest some values for both our experiments at the root endpoint. When decorating with choose_arm, we identify the bandit/experiment we need a value assignment for. Just like parameters from your route, these values are passed into the view function in the order you decorated for them, always after your route params.

It should be stressed that things like colors are probably best stored in CSS, but for this example we'll pass the values right into Jinja. You could consider setting up a dedicated endpoint for experiments with static styles like this, one that could parse and render your CSS. The rough idea here is to leave what the bandit actually affects up to you.
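For readers unfamiliar with stacked decorators, the sketch below shows the general mechanism by which a decorator can inject an experiment value into a view function. It is a simplified illustration and not the library's `choose_arm`: the name `choose_arm_sketch` and the `pick` callback are hypothetical, and values are injected by keyword here rather than positionally.

```python
import functools


def choose_arm_sketch(experiment, pick):
    """Return a decorator that injects pick(experiment) into the view's kwargs."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            kwargs[experiment] = pick(experiment)
            return view(*args, **kwargs)
        return wrapper
    return decorator


# Fixed assignments stand in for a bandit's suggestion.
assignments = {"color_btn": "#00FF00", "txt_btn": "Add to cart"}


@choose_arm_sketch("color_btn", assignments.get)
@choose_arm_sketch("txt_btn", assignments.get)
def home(color_btn, txt_btn):
    return f"{txt_btn} in {color_btn}"


result = home()
print(result)
```

Each decorator layer adds one value before handing control to the next, which is why one decorator per experiment is enough to supply every argument the view expects.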

On the other side of the process, our "/btnclick" endpoint now knows that whatever "arms" were assigned to this user worked out well, because the user clicked the button. The BanditMiddleware.reward_endpt decorator knows to look in our user's cookie for the values that were assigned to her and give them some props. We're using booleans here, but you could pass any amount of reward in the event that some states in your experiment are better than others (you could, for example, weight your experiments differently).
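The reward step can be pictured roughly as follows: read the user's arm assignments back from the cookie, then credit each assigned arm with that experiment's reward amount. The cookie format, the `credit_rewards` function, and the reward weights are hypothetical; the middleware's real bookkeeping may differ.

```python
import json

# Hypothetical cookie payload: which arm each experiment assigned to this user.
cookie_value = json.dumps({"color_btn": "green", "txt_btn": "casual"})

# Accumulated reward per experiment and arm (this would live in the storage engine).
reward_totals = {
    "color_btn": {"green": 0.0, "red": 0.0},
    "txt_btn": {"casual": 0.0, "formal": 0.0},
}


def credit_rewards(cookie_value, reward_amounts, reward_totals):
    """Credit each experiment's assigned arm with that experiment's reward."""
    assignments = json.loads(cookie_value)
    for experiment, amount in reward_amounts.items():
        arm = assignments.get(experiment)
        if arm is None:
            continue  # the user was never assigned an arm for this experiment
        reward_totals[experiment][arm] += amount
    return reward_totals


# Weight the text experiment half as much as the color experiment.
credit_rewards(cookie_value, {"color_btn": 1.0, "txt_btn": 0.5}, reward_totals)
print(reward_totals)
```

Only the arms this particular user actually saw receive credit, which is what lets several experiments share one reward endpoint.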

That's it! This user's feedback will be persisted by the middleware and used to adjust the content for future users. Over time, this pattern will start converging to a winner. Your app will get optimized on these two experimental features for free!