# Applied Statistical Tests

In the previous chapter we introduced descriptive statistics and hypothesis testing.  In this chapter we will build on those concepts to introduce you to:

* A/B testing
* Multivariate testing

## A/B Testing

The foundational idea behind A/B testing is simple: if we changed just one thing, would it matter?  More specifically, if we presented two variants of one thing to a group of people, would there be a difference in how they respond?

Let's start with an example.

### Inquiry

Suppose you work for a company that makes towels.  The company wants to send a marketing email, as it has many times in the past, and to see how much this email affects towel sales.  Let's say that, in the past, 10 percent of the people who were sent the email bought a towel.  But we aren't sure whether this email maximizes profit.  In other words, would a different email make us more profit?

### Set Up

In order to test this question, we can set up an experiment.  Here we will set up a randomized test group and a randomized control group.  

The test group will be sent an email with slightly different copy, or possibly with a picture.  In any event, some specific change will be made.

The control group will get the same email as last time.  This way, we can compare the old email and the new one as directly as possible.  There are many things you typically need to control for, or account for, in experimental design.  Some things to account for in this scenario are:

1) Age

2) Gender

3) Location

4) Time of Day

5) Time of Year

6) Approximate Disposable Income

By setting up the experiment with a test and control group that receive their emails at the same time, we account for time of day and time of year directly.  We may not be able to account for some of the other variables, such as location, age, gender, and approximate disposable income.  Selecting randomly from our population may balance some of these across the two groups, but there is no guarantee of that.  It is best to account for as many variables of interest as you can, while remaining ethical.  We will cover the ethics of variable accounting in the next section in detail.
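As a minimal sketch of the random assignment described above, we can shuffle a list of recipients and split it in half.  The `recipient_ids` array and the group size of 1,000 are assumptions made for illustration; they are not part of the original experiment description.

```python
import numpy as np

# Hypothetical mailing list of 2,000 recipient ids (an assumption for illustration).
recipient_ids = np.arange(2000)

# A fixed seed makes the random split reproducible.
rng = np.random.default_rng(seed=42)
shuffled = rng.permutation(recipient_ids)

# The first half receives the new email (test); the rest receive the old one (control).
test_ids = shuffled[:1000]
control_ids = shuffled[1000:]
```

Because the split is random, neither group is chosen based on age, location, or any other attribute.  That makes large imbalances unlikely, but it does not rule them out.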

### Running Our Example

In order to evaluate our set up, we will assume that each recipient's outcome (converted or not) follows a Bernoulli random variable.

### Bernoulli Random Variables

Informally, we can think of a Bernoulli random variable as a model for a single experiment that has exactly two possible outcomes.

Specifically, a Bernoulli random variable takes on value '1' with probability `p`, and value '0' with probability `1-p`.  

For our case, a value of '1' represents a person who chose to buy a towel by going to the website after reading our email.  The value of '0' represents no towel being bought after reading our email.
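The definition above can be checked with `scipy.stats.bernoulli`.  This is a small sketch; the value `p = 0.1` is taken from the historical conversion rate in the towel example, and the variable names are our own.

```python
from scipy import stats

p = 0.1  # historical conversion rate from the towel example
purchase = stats.bernoulli(p)

prob_buy = purchase.pmf(1)        # probability of a '1' (a purchase), which is p
prob_no_buy = purchase.pmf(0)     # probability of a '0' (no purchase), which is 1 - p
expected_value = purchase.mean()  # the mean of a Bernoulli variable is p
variance = purchase.var()         # the variance is p * (1 - p)

# Simulate ten recipients; each entry is either 0 or 1.
draws = purchase.rvs(size=10, random_state=0)
```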

### An Aside Regarding Assumptions

By setting up our experiment in this way, we have actually assumed quite a bit.  Let's formally state our assumptions here.  It is generally good practice to list your assumptions and confirm them with a domain expert (someone who knows the problem, but maybe not the statistics) before carrying out any experiments.  If you work in a place where tasks are created for you, it's a good idea to confirm with the person who created the task before you get started, so that you are both aligned on assumptions.

### Making Our Assumptions Explicit

1) We assume here that the only reason a person would buy a towel is because of this email.

This is a pretty strong assumption.  Specifically, we are only looking at the marginal effect of changing the language in an email.  It could be the case that our entire control group or our entire test group happened to need a new towel, and this email reminded them.  It could be that all of the people in one group just moved, and didn't have _any_ towels.  Or it could be that everyone in a group always buys things when there is a sale.  

We could control for some of these factors, which will be touched upon in a later section.  But for now, we aren't controlling for any of them.  

2) We assume non-uniformity of preferences.

A preference is a propensity or likelihood to consume a product, given the choice to consume it and the ability.  So if you are more likely to consume chocolate ice cream over vanilla ice cream given the choice between the two, and the money to buy either, then we say you have a preference for chocolate ice cream.

By assuming non-uniformity of preferences, we are implicitly saying that each individual has their own utility function for towels.  At price `towel_price`, a given person will not buy, because the towel would cost more than the utility they would derive from it.  But at price `towel_sale_price`, that person will buy, because the towel would cost less than the utility they would derive.
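A toy sketch of this buying rule follows.  All of the numbers, and the `would_buy` helper, are hypothetical and chosen only to illustrate the idea that different people value a towel differently.

```python
# Hypothetical prices, chosen only for illustration.
towel_price = 20.0
towel_sale_price = 12.0

# Each entry is one person's utility for a towel, expressed in dollars (made-up values).
towel_utilities = [8.0, 11.0, 15.0, 18.0, 25.0]


def would_buy(utility: float, price: float) -> bool:
    """A person buys only when the towel is worth more to them than it costs."""
    return utility > price


buyers_at_full_price = sum(would_buy(u, towel_price) for u in towel_utilities)
buyers_at_sale_price = sum(would_buy(u, towel_sale_price) for u in towel_utilities)
```

With these made-up utilities, one person buys at the full price and three buy at the sale price.  If everyone had the same utility, either everyone or no one would buy at a given price.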

3) We assume evenly distributed life circumstances.

We assume everyone on this email list is capable of buying a towel from us.  This may seem like a ridiculous assumption, but it's really not.  You have no idea what circumstances someone is in.

4) We assume evenly distributed access to the email.

What if everyone who didn't buy a towel had deactivated their email account, had died, or doesn't speak English?  What if their email client filtered out the email?  If we don't know for sure, we don't know the effectiveness of the email campaign.

The reason it's important to state your assumptions explicitly is not necessarily the code itself, although your assumptions can inform the tests you run or the models you create.  The most important reason to write down your assumptions is for the people who will work with your code in the future (more often than not, yourself), as well as for your managers.  

## Simulating Some Data

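```python
from scipy import stats

# Number of recipients in each group
test_size = 1000
control_size = 1000

# "True" conversion probabilities used to simulate the data
test_probability = 0.15
control_probability = 0.1

# Frozen Bernoulli distributions we can sample from
test_dist = stats.bernoulli(test_probability)
control_dist = stats.bernoulli(control_probability)
```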

Note: in the real world we wouldn't know the test probability for the population, so "pretend" you can't see it.  We also can't be "sure" our process follows a Bernoulli random variable.  It could be some other generative process, like this:

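```python
import random

def generative_process():
    # random.randint is inclusive on both ends, so this draws from 0..100
    random_number = random.randint(0, 100)
    if random_number < 15:
        # first way to produce a 1: a small draw
        return 1
    elif random_number % 2 == 0:
        return 0
    else:
        # second way to produce a 1: an odd draw of 15 or more
        return 1

[generative_process() for _ in range(10)]
```

```
[0, 0, 1, 0, 1, 1, 1, 1, 1, 0]
```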

The important difference here is that there are multiple cases in which our generative process outputs a '1', which may be important!  The more you know about your underlying data generation process, the better.  But usually it is impossible to discern this.  The important thing to note is that this is an assumption.
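To make the "multiple cases" point concrete, the sketch below enumerates every draw that `random.randint(0, 100)` can return and counts how many lead to a '1'.  The `outcome` helper is our own restatement of the branching logic with the random draw passed in as an argument, so that the count is exact rather than simulated.

```python
def outcome(random_number: int) -> int:
    """Same branching as generative_process, for a given draw."""
    if random_number < 15:
        return 1
    elif random_number % 2 == 0:
        return 0
    else:
        return 1


# randint(0, 100) includes both endpoints, so there are 101 equally likely draws.
possible_draws = range(0, 101)

ones_from_first_branch = sum(1 for n in possible_draws if n < 15)
ones_total = sum(outcome(n) for n in possible_draws)
probability_of_one = ones_total / len(possible_draws)

print(ones_from_first_branch, ones_total, round(probability_of_one, 3))
# prints: 15 58 0.574
```

So 15 of the 58 ways to get a '1' come from the first branch and the remaining 43 come from the odd-number branch.  Looking only at the 0/1 outputs, we could estimate the overall rate, but we could not tell these two paths apart.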

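```python
import pandas as pd
import numpy as np

# Simulated outcomes for the test group
test_df = pd.DataFrame()
test_df["converted"] = test_dist.rvs(size=test_size)
test_df["group"] = "test"

# Simulated outcomes for the control group
control_df = pd.DataFrame()
control_df["converted"] = control_dist.rvs(size=control_size)
control_df["group"] = "control"

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two groups
df = pd.concat([test_df, control_df], ignore_index=True)
```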

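```python
# number of conversions per group
summary = df.pivot_table(values='converted', index='group', aggfunc='sum')
# add additional columns to the pivot table
# number of recipients per group
summary['total'] = df.pivot_table(values='converted', index='group', aggfunc='count')['converted']
# conversion rate per group (pivot_table aggregates with the mean by default)
summary['rate'] = df.pivot_table(values='converted', index='group')['converted']
summary
```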

| group | converted | total | rate |
|---|---|---|---|
| control | 92 | 1000 | 0.092 |
| test | 132 | 1000 | 0.132 |


At this point we may believe our test was successful and that we are, in fact, done.  However, there is still much to do!  
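One possible next step, sketched here as an assumption rather than as the method this chapter goes on to use, is to ask whether a gap like the one in the summary table could plausibly arise by chance.  The example below applies a chi-square test of independence (`scipy.stats.chi2_contingency`) to the converted and not-converted counts from the table.

```python
from scipy import stats

# Counts from the summary table: [converted, not converted] for each group.
observed = [
    [132, 1000 - 132],  # test
    [92, 1000 - 92],    # control
]

# Returns the test statistic, the p-value, the degrees of freedom,
# and the counts expected if conversion were independent of group.
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to sampling noise alone.  It says nothing about whether the assumptions listed earlier actually hold.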

