## The A/B testing problem

In this notebook we will write a Python program for A/B testing.  Writing the program is all about its structure - the specific **data structures** (such as `list` or `dict`) and how we use them.

The two options we want to A/B test are an old website and a new website.  We can **sample data** for both options by directing traffic to the old or the new website.

We want to optimize for **bounce rate** - the percentage of visitors to a particular website who navigate away from the site after viewing only one page.  The bounce rate can be thought of as our reward signal or fitness function.

Let's make a **list** called `websites` to hold our two websites as **strings** - `'old'` and `'new'`:

In [None]:
websites = ['old', 'new']

We can access them using an **integer index**.  Python uses **zero-based indexing**.  Other languages, such as R, use one-based indexing.

In Python the first element is at `0`:

In [None]:
websites[0]

The second element is at `1`:

In [None]:
websites[1]

In the real world data would be generated for us by choosing which website to show to each user and recording the bounce rate.

Here we will build a **simple model** of the world to simulate this data generating process.  What would be a good choice of a model?

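The **Central Limit Theorem** tells us that sums and averages of many independent random effects tend towards a normal distribution, which is why many real world processes look normally distributed.  This suggests that a normal (aka Gaussian) distribution is a reasonable choice.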
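A minimal sketch of that idea, assuming we average uniform random numbers.  The names `rng`, `sample_mean` and `means`, and the sample sizes, are illustrative choices rather than part of the notebook:

```python
from random import Random
from statistics import mean, stdev

rng = Random(42)  # private generator, so the global random state is untouched

def sample_mean(n):
    """Average n draws from a uniform distribution on [0, 1)."""
    return mean(rng.random() for _ in range(n))

# each entry is the average of 30 uniform draws
means = [sample_mean(30) for _ in range(2000)]

# the averages cluster around 0.5, with a spread close to
# sqrt(1/12) / sqrt(30), even though each draw is uniform, not Gaussian
print(round(mean(means), 2), round(stdev(means), 3))
```

Plotting a histogram of `means` would show the familiar bell shape.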

## Importing objects

Python is a **batteries included** language - meaning we can do a lot with the standard library.  We won't need to implement Gaussian sampling ourselves.

We can `import` **objects** from **modules** of the Python standard library.  

Let's `import` `gauss` and `seed` from the `random` module:

In [None]:
from random import gauss, seed

What are these two things we have imported?  We can use the Python **builtin function** `type`:

In [None]:
type(gauss)

When using IPython (the Python kernel behind Jupyter notebooks), we can get even more information about objects using the `?` after the object:

In [None]:
gauss?

And even more info (including the source code, where available) with `??` after the object:

In [None]:
gauss??

The `gauss` function takes two arguments.  These two arguments are the **statistics** needed to parameterize a Gaussian (aka normal) distribution - the mean (`mu`) and the standard deviation (`sigma`).

We can use the `gauss` function to sample from a **standard normal** distribution, by setting `mu=0` & `sigma=1`.  

We can do this by **calling** the `gauss` function with **positional** or **keyword** arguments (the call below uses one of each):

In [None]:
gauss(0, sigma=1)

Sampling from this Gaussian is a pseudorandom process.  We can use `seed` to control the generation of random numbers.

We would like to have some confidence that the random seed works as expected.  Here we will write a simple **test** to confirm that `seed` is working.

Some of the tools that might be useful are `print`:

In [None]:
print(gauss(0, sigma=1))

One of Python's **comparison operators** (such as `==`):

In [None]:
gauss(0, sigma=1) == 0

An `assert` statement:

In [None]:
assert 1.0 == 1

## Exercise

Write some code to **test** `seed`.  You want to check that
- when you reset the seed the random numbers generated are the same
- when you don't reset the seed, they are different
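
One possible answer, as a sketch rather than the definitive solution.  The helper name `draw` and the seed value `42` are arbitrary choices:

```python
from random import gauss, seed

def draw(n=3):
    """Return n samples from a standard normal distribution."""
    return [gauss(0, 1) for _ in range(n)]

# resetting the seed to the same value replays the same numbers
seed(42)
first = draw()
seed(42)
second = draw()
assert first == second

# carrying on without resetting the seed gives fresh numbers
third = draw()
assert third != second
```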

When do you think we should be using random seeds in data science?

Returning to our purpose for using `gauss` - we want to model the data generating process for our two websites.  
That means we need to assume statistics for the two websites.  Let's look again at the data structure for our options:

In [None]:
websites

A better data structure here is a **dictionary**.  Let's make a `dict` where the `key` is the website option name, and the `value` is a list of the statistics (mean & standard deviation) for that website:

In [None]:
websites = {
    'old': [50, 1], 
    'new': [50, 3]
}

We can access our options using a now familiar syntax, but with a `key` as the index:

In [None]:
websites['old']

But we can do better here.  One problem we have is that our statistics are **mutable** - a user can change the statistics:

In [None]:
websites['old'][0] = 25

Python offers an **immutable** data structure - the `tuple`.

Let's recreate our website dictionary:

In [None]:
websites = {
    'old': (50, 1),
    'new': (50, 3)
}
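
A small sketch of what immutability buys us.  The variable `stats` is a stand-in for one of the values in `websites`:

```python
stats = (50, 1)  # same shape as the values in `websites`

try:
    stats[0] = 25
except TypeError as error:
    # tuples do not support item assignment
    print('TypeError:', error)

# the tuple is unchanged
assert stats == (50, 1)
```

Note that the dictionary itself is still mutable - a user can rebind `websites['old']` to a different tuple.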

One more problem - the user who is using our website dictionary doesn't know what the two numbers are!  To fix this we will use a `namedtuple` from (my favourite) module - the `collections` module.

One of the strengths of Python is the quality of the higher level data structures!

Below is an example of using a `namedtuple`:

In [None]:
from collections import namedtuple

Stat = namedtuple('Statistics', ['mu', 'sigma'])

Stat(0, 1)

## Practical

Rewrite our `websites` dictionary to use `namedtuple` for the statistics:
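
One possible answer, sketched with the same statistics as the tuple version above.  Passing the fields as keywords is a style choice, not a requirement:

```python
from collections import namedtuple

Stat = namedtuple('Statistics', ['mu', 'sigma'])

websites = {
    'old': Stat(mu=50, sigma=1),
    'new': Stat(mu=50, sigma=3),
}

# fields are readable by name, and still work by position
assert websites['old'].mu == 50
assert websites['new'][1] == 3
```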

## Turning the Gaussian into a bounce rate

The bounce rate is a proportion, with bounds of $[0, 1]$ (0% to 100%).  We will need to slightly modify the output of `gauss` to give us a sensible bounce rate.

Let's write a **function** called `bounce` that wraps `gauss`.

You will find the `min` and `max` Python builtins useful (bonus points for a **[doc string](https://realpython.com/python-pep8/#documentation-strings)**).

In [None]:
def bounce(mu, sigma):
    """Sample a bounce rate from a Gaussian, clipped to [0, 1]."""
    value = gauss(mu, sigma)
    return min(max(value, 0), 1)

In [None]:
bounce(0.5, 0.5)
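
A quick check of the clipping, as a sketch.  The seed value and the sample count are arbitrary:

```python
from random import gauss, seed

def bounce(mu, sigma):
    """Sample a bounce rate from a Gaussian, clipped to [0, 1]."""
    value = gauss(mu, sigma)
    return min(max(value, 0), 1)

seed(0)
samples = [bounce(0.5, 0.5) for _ in range(1000)]

# every sample respects the bounds ...
assert all(0 <= s <= 1 for s in samples)
# ... and with a standard deviation this wide, some land exactly on them
assert 0 in samples and 1 in samples
```

Because of the clipping, the arguments need to be on the same 0 to 1 scale: a mean of 50 with a standard deviation of 1, as in the `websites` statistics above, would come back as `1` on essentially every call.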

## Optimize the A/B test

Now that we have a data structure for the data generating process, we can sample data from it.  We need a data structure to store data - the `list` is a good choice here.

One very useful Python builtin is `dir` - primarily a convenience for use at an interactive prompt.

When we run `dir()` with no arguments, it will return a list of names in the current scope:

In [None]:
dir()

We can use `dir()` on an object as well - it will return a list of **attributes** and **methods** of the object:

In [None]:
dir(list)

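One of the attributes above is the `append` method, which we can use to store the results of sampling our websites.  Drawing a single sample (using the `namedtuple` version of `websites` from the practical) looks like this: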

In [None]:
gauss(websites['old'].mu, websites['old'].sigma)

A shortcut for the above is to **unpack** (or explode) the arguments into `gauss` using `*` (this works with any iterable of the right length):

In [None]:
gauss(*websites['old'])
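
A sketch showing that the explicit and unpacked calls are equivalent.  It reseeds before each call so the samples can be compared, and also shows keyword unpacking with `**` via the namedtuple's `_asdict` method:

```python
from collections import namedtuple
from random import gauss, seed

Stat = namedtuple('Statistics', ['mu', 'sigma'])
stats = Stat(mu=50, sigma=1)

seed(1)
explicit = gauss(stats.mu, stats.sigma)

seed(1)
unpacked = gauss(*stats)             # positional unpacking

seed(1)
by_name = gauss(**stats._asdict())   # keyword unpacking

assert explicit == unpacked == by_name
```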

## Back to the lab

Let's run an experiment.  We will sample from both options the same number of times in series.  To do this we will need to iterate for a given number of steps.

We can do the iteration with a `for` loop and the Python built-in `range`.  Write a `for` loop that iterates for a given number of steps, and stores the data in two lists.

In [None]:
old, new = [], []
for _ in range(10):
    old.append(gauss(*websites['old']))
    new.append(gauss(*websites['new']))
    
old

But we can do better.  Let's use the `defaultdict` ([blog post](https://adgefficiency.com/defaultdict/)), which is also in the `collections` module.

Use a `list` as the default value.  What does using a `defaultdict` give us?

In [None]:
from collections import defaultdict

steps = 10
data = defaultdict(list)

for step in range(steps):
    data['old'].append(bounce(*websites['old']))
    data['new'].append(bounce(*websites['new']))
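
To make the difference concrete, here is a sketch comparing a plain `dict` with a `defaultdict`.  The names `plain` and `auto` are illustrative:

```python
from collections import defaultdict

plain = {}
try:
    plain['old'].append(0.5)
except KeyError:
    # a plain dict needs the key to be created first
    plain['old'] = [0.5]

auto = defaultdict(list)
auto['old'].append(0.5)  # the empty list is created on first access

assert plain == auto == {'old': [0.5]}
```

With a `defaultdict` we do not need to know the option names before we start collecting data.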

## Evaluation of option performance

One way to analyze these results would be to look at the average reward.  Let's compute the mean reward for each of our two options.

We will use two of the Python builtins - `sum` and `len`:

In [None]:
def mean(samples):
    return sum(samples) / len(samples)

We use one of Python's three APIs for string formatting - the `str.format` method.  In a format specification such as `:10.4f`:
- `f` denotes fixed-point notation
- `10` is the total width of the field being printed, left-padded by spaces
- `4` is the number of digits after the decimal point

The other two are
- the older `%` style
- f-strings (available from Python 3.6)
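
A short sketch showing the three styles producing the same string.  The values of `name` and `value` are made up for illustration:

```python
name, value = 'old', 0.51234

with_format = '{} {:10.4f}'.format(name, value)
with_percent = '%s %10.4f' % (name, value)
with_fstring = f'{name} {value:10.4f}'

assert with_format == with_percent == with_fstring
print(repr(with_format))
```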

Having multiple ways to do one thing in a programming language is almost never good (see TensorFlow 1.0, matplotlib).

We can access both the name & the rewards experienced by our options by calling the `items` method of the dictionary and iterating over what comes back:

In [None]:
for name, rewards in data.items():
    print('{} {:6.2f}'.format(name, mean(rewards)))

## Optimizing website selection

In the real world we will want to pick the best website more often as we are learning - let's try to do that.  

We will introduce a function that picks a website design based on the highest observed average reward (this is known as a greedy policy in reinforcement learning).

First we need a way to do an **argmax** - to get the index of the highest item in the list.  Let's write this function like a software engineer would - using **test driven development** (we 

First we write data to test with:

In [None]:
# named test_data so the experiment results stored in `data` are not overwritten
test_data = [0, 3, 2]
expected = 1

Now let's write our function.  We can use a combination of the Python builtin `max` and the `index` method of the list (in practice you would most likely use `numpy.argmax`).

In [None]:
def argmax(values):
    return values.index(max(values))

assert argmax(test_data) == expected

For a given dataset, the function above will always select the same website.
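
A sketch of how the argmax could be turned into a greedy choice between websites.  The function name `greedy` and the `observed` rewards are hypothetical, and the sketch follows the notebook's framing of the sampled value as a reward to maximize (if it were read literally as a bounce rate, lower would be better and `min` would replace `max`):

```python
def mean(samples):
    return sum(samples) / len(samples)

def argmax(values):
    return values.index(max(values))

def greedy(rewards):
    """Return the name with the highest average reward in `rewards`."""
    names = list(rewards)
    averages = [mean(rewards[name]) for name in names]
    return names[argmax(averages)]

# hypothetical observations, not real results
observed = {'old': [0.4, 0.6], 'new': [0.7, 0.9]}
assert greedy(observed) == 'new'
```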

The final piece of the puzzle is a way to **explore** - we can do this by randomly selecting a website.  

Write a function to randomly select a website - you will need the `random.random` function and most likely an `if` statement.
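
One possible answer, as a sketch.  The function name `explore` and its default argument are assumptions made for illustration:

```python
from random import random

def explore(names=('old', 'new')):
    """Pick one of two names with equal probability."""
    # random() returns a float in the range [0.0, 1.0)
    if random() < 0.5:
        return names[0]
    return names[1]

picks = [explore() for _ in range(1000)]
```

The standard library's `random.choice` does the same job for a sequence of any length.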

## Final exercise

You now have all the tools to run an A/B test.

Let's run an experiment where we:
1. collect data by randomly selecting a website for 6 steps
2. use that data to greedily select a website for another 20 steps
3. calculate the bounce rate across all the websites
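
The three steps above can be sketched as follows.  This is one possible arrangement, not the definitive solution.  It makes several assumptions: the statistics are illustrative values on a 0 to 1 scale rather than the ones used earlier, the helper names `explore` and `greedy` are hypothetical, and the greedy phase keeps updating its averages as new samples arrive:

```python
from collections import defaultdict, namedtuple
from random import gauss, random, seed

Stat = namedtuple('Statistics', ['mu', 'sigma'])

# ASSUMPTION: illustrative statistics on a 0-1 scale, not from the notebook
websites = {'old': Stat(0.5, 0.1), 'new': Stat(0.6, 0.1)}

def bounce(mu, sigma):
    """Sample from a Gaussian and clip the result to [0, 1]."""
    return min(max(gauss(mu, sigma), 0), 1)

def mean(samples):
    return sum(samples) / len(samples)

def argmax(values):
    return values.index(max(values))

def explore(names):
    """Pick one of two names with equal probability."""
    return names[0] if random() < 0.5 else names[1]

def greedy(data, names):
    """Pick the name with the highest observed mean (unseen names count as 0)."""
    averages = [mean(data[n]) if data.get(n) else 0 for n in names]
    return names[argmax(averages)]

seed(42)
names = list(websites)
results = defaultdict(list)

# 1. collect data by randomly selecting a website for 6 steps
for _ in range(6):
    name = explore(names)
    results[name].append(bounce(*websites[name]))

# 2. greedily select a website for another 20 steps
for _ in range(20):
    name = greedy(results, names)
    results[name].append(bounce(*websites[name]))

# 3. report the mean of everything collected for each website
for name in names:
    samples = results.get(name)
    if samples:
        print('{} {:6.2f} ({} samples)'.format(name, mean(samples), len(samples)))
```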