# Lab 7: Great British Bake Off (A/B Test)

Welcome to Lab 7! This week's lab focuses on A/B testing using data from the ever-popular British television show, [*The Great British Bake Off*](https://en.wikipedia.org/wiki/The_Great_British_Bake_Off).

First, set up the notebook by running the cell below.

In [1]:
# Run this cell to set up the notebook, but please don't change it.

# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *

# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', (FutureWarning, np.VisibleDeprecationWarning))

import d8error

## 1. A/B Testing

A/B testing is a form of hypothesis testing that allows you to make comparisons between two distributions. We may also refer to an A/B test as a permutation test.

You'll almost never be explicitly asked to perform an A/B test. Make sure you can identify situations where the test is appropriate and know how to correctly implement each step. Oftentimes, we use an A/B test to determine whether or not two samples came from the same underlying distribution.

**Question 1.1.** The following statements are the steps of an A/B hypothesis test presented in a *random order*:

1. Choose a test statistic (typically the difference in means between two categories)

2. Shuffle the labels of the original sample, find your simulated test statistic, and repeat many times

3. Find the value of the observed test statistic

4. Calculate the p-value based on your observed and simulated test statistics

5. Define a null and an alternative model

6. Use the p-value and p-value cutoff to draw a conclusion about the null hypothesis

Assign `ab_test_order` to an array of integers that contains the correct order of an A/B test, where the first item of the array is the first step of an A/B test and the last item of the array is the last step of an A/B test.


In [None]:
ab_test_order = make_array(...)

**Question 1.2.** If the null hypothesis of an A/B test is correct, should the order of the labels affect the difference in means between the two groups? Why do we shuffle labels in an A/B test?


_Type your answer here, replacing this text._

## 2. The Great British Bake Off

>"The Great British Bake Off (often abbreviated to Bake Off or GBBO) is a British television baking competition, produced by Love Productions, in which a group of amateur bakers compete against each other in a series of rounds, attempting to impress a group of judges with their baking skills" ([Wikipedia](https://en.wikipedia.org/wiki/The_Great_British_Bake_Off)).

For every week of the competition, the judges assign one contestant the title "Star Baker". Ultimately, one overall winner is crowned every season. Using this information, we would like to investigate whether there is an association between the number of Star Baker awards a contestant accumulates and whether or not they are the overall winner for the season.

The `bakers` table below describes the number of Star Baker awards each contestant won and whether or not they won their season (`1` if they won, `0` if they did not win). The data was manually aggregated from Wikipedia for seasons 2-11 of the show. We randomized the order of the rows so as not to spoil the outcome of the show for anyone. Each row is the data for one contestant.

In [None]:
bakers = Table.read_table("star_bakers.csv")
bakers.show(3)

**Question 2.1.** Is it possible to use the `bakers` data to determine if winning more Star Baker awards *causes* an increase in the likelihood that a contestant will be the overall winner for that season? Explain.


_Type your answer here, replacing this text._

### Running an Experiment

We are going to run the following hypothesis test to examine a possible association between winning the season and number of Star Baker awards earned. The population we are examining is every contestant from seasons 2 through 11 of GBBO. We are going to use the following null and alternative hypotheses:

**Null hypothesis:** There is no association between earning Star Baker awards and winning the season.

**Alternative hypothesis:** Contestants who win more Star Baker awards are more likely to win the season than contestants with fewer Star Baker awards.

Our alternative hypothesis is related to our suspicion that contestants who win more Star Baker awards are more skilled, so they are more likely to win the season.

**Question 2.2.** Should we use an A/B test to test these hypotheses? If yes, what is our "A" group and what is our "B" group?


_Type your answer here, replacing this text._

**Question 2.3.** Create a new table called `means` that contains the mean number of Star Baker awards for bakers who did not win (`'won'` is 0) and bakers who did win (`'won'` is 1). The table should have the column names `won` and `star baker awards mean`.
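
As a rough illustration of what a "mean per group" is, here is a minimal NumPy sketch on made-up data. It does not use the `bakers` table or the `datascience` library, and the names `toy_labels`, `toy_values`, and `group_means` are hypothetical, introduced only for this example.

```python
import numpy as np

# Made-up data, NOT the `bakers` table: a 0/1 label and a number for each row.
toy_labels = np.array([0, 0, 0, 1, 1])
toy_values = np.array([1.0, 0.0, 2.0, 3.0, 2.0])

def group_means(labels, values):
    """Return a dict mapping each distinct label to the mean of its values."""
    return {int(label): float(values[labels == label].mean())
            for label in np.unique(labels)}

toy_means = group_means(toy_labels, toy_values)
print(toy_means)  # {0: 1.0, 1: 2.5}
```

With a `datascience` table you would normally reach the same kind of summary through the table's own methods rather than boolean masks.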

In [None]:
means = ...
means

**Question 2.4.** Draw overlaid histograms to visualize the distribution of Star Baker awards for winners and non-winners in the `bakers` data. Use the bins provided (`useful_bins`).

*Hint:* Use the `group` keyword argument of `tbl.hist`. To produce overlaid histograms based on the unique values in a given column, we can use syntax such as `tbl.hist(..., group=<col_name>, bins=...)`.
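
To see why the provided bins start at -0.5, the sketch below counts some made-up award totals with `np.histogram` using the same bin edges as `useful_bins`. The array `toy_awards` is invented for illustration and is not taken from `star_bakers.csv`.

```python
import numpy as np

# Same bin edges as the lab's `useful_bins`: -0.5, 0.5, ..., 6.5
useful_bins = np.arange(-0.5, 6.6)

# Made-up award counts, NOT the real data.
toy_awards = np.array([0, 0, 1, 1, 1, 3, 6])

counts, edges = np.histogram(toy_awards, bins=useful_bins)
print(edges)   # [-0.5  0.5  1.5  2.5  3.5  4.5  5.5  6.5]
print(counts)  # [2 3 0 1 0 0 1]
```

Because the edges sit halfway between whole numbers, each integer from 0 through 6 gets its own bar, centered on that integer.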


In [None]:
useful_bins = np.arange(-0.5, 6.6)
...

**Question 2.5.** We want to figure out if there is a significant difference between the distributions of Star Baker awards for winners and non-winners. Are higher numbers of Star Baker awards associated with a higher chance of winning the season?

  - (a) What should the test statistic be? 
  - (b) Which values of this test statistic support the null? 
  - (c) Which values support the alternative?

*Hint:* Think about which measures we use to describe a distribution. What did we do in Lecture 19 to see if maternal smoking is associated with lower birth weight?


_Type your answer here, replacing this text._

**Question 2.6.** Set `observed_difference` to the observed test statistic, calculated from the `means` table.


In [None]:
observed_difference = ...
observed_difference

**Question 2.7.** Given a table like `bakers`, a label column `labels_col`, and a numerical values column `values_col`, write a function that calculates the appropriate test statistic.

*Hint:* Make sure that you are taking the directionality of our alternative hypothesis into account in choosing an appropriate test statistic.


In [None]:
def find_test_stat(tbl, labels_col, values_col):
    ...
    
# CHECK: This function call should show the same result as the previous cell, but 
# find_test_stat should only use its parameter names, not `bakers` or either of the 
# `bakers` column names
find_test_stat(bakers, "won", "star baker awards")

When we run a simulation for A/B testing, we resample by **shuffling the labels** of the original sample. If the null hypothesis is true and there's no association between earning Star Baker awards and winning the whole season, we expect the observed difference in means (the value we just computed for `observed_difference`) to be consistent with the empirical distribution of the simulated differences in means across many shuffles.
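
The idea of shuffling labels can be sketched on made-up data with plain NumPy. This is only an illustration under stated assumptions: the arrays and the helper `difference_in_means` are hypothetical, the "group 1 minus group 0" direction is chosen arbitrarily for the toy example, and the lab itself expects you to work with the `bakers` table.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Made-up data, NOT the `bakers` table.
toy_labels = np.array([1, 1, 0, 0, 0, 0])
toy_values = np.array([3, 2, 1, 0, 1, 0])

def difference_in_means(labels, values):
    """Mean of the values labeled 1 minus the mean of the values labeled 0."""
    return values[labels == 1].mean() - values[labels == 0].mean()

# Shuffling keeps the same labels (two 1s, four 0s) but reassigns them at random.
shuffled_labels = rng.permutation(toy_labels)

print(difference_in_means(toy_labels, toy_values))       # 2.0
print(difference_in_means(shuffled_labels, toy_values))  # depends on the shuffle
```

The values never move; only the pairing between labels and values changes, which is what the null hypothesis says should not matter.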

**Question 2.8.** Write a function `simulated_test_statistic` to compute one trial of our A/B test. Your function should run a simulation and return a test statistic. Think carefully: there are several steps here.

In [None]:
def simulated_test_statistic(tbl, labels_col, values_col):
    ...

# When you run this cell multiple times, you should see a different result each time:
# sometimes positive, sometimes negative, never very large
simulated_test_statistic(bakers, "won", "star baker awards")

**Question 2.9.** Simulate 5000 trials of our A/B test and store the test statistics in an array called `differences`.
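
The general pattern of repeating a simulation and collecting the results in an array might look like the sketch below. It uses a hypothetical stand-in statistic (the proportion of heads in 10 coin flips) rather than the A/B test statistic, and `one_toy_statistic`, `toy_repetitions`, and `toy_results` are names invented for this example.

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only so the sketch is repeatable

def one_toy_statistic():
    """Hypothetical stand-in for one simulated statistic: proportion of heads in 10 flips."""
    return rng.integers(0, 2, size=10).mean()

toy_repetitions = 1000
toy_results = np.array([])  # start empty, then append one value per trial

for _ in range(toy_repetitions):
    toy_results = np.append(toy_results, one_toy_statistic())

print(len(toy_results))  # 1000
```

Note that `np.append` returns a new array, so the result has to be assigned back to the name on each pass through the loop.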


In [None]:
repetitions = 5000
differences = make_array()

...
                                                 
differences

Run the cell below to view a histogram of your simulated test statistics plotted with your observed test statistic.

In [None]:
Table().with_column('Difference Between Group Means', differences).hist(bins=20)
plots.scatter(observed_difference, 0, color='red', s=30, zorder=2)
plots.ylim(-0.1, 1.35);

**Question 2.10.** Use Python and the `differences` array to compute the p-value for your test and assign it to `empirical_p`.
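
As a reminder of what an empirical p-value measures, here is a small sketch on made-up numbers. It assumes a test in which larger values of the statistic favor the alternative; for a different alternative, the comparison would point the other way. The names `toy_simulated`, `toy_observed`, and `toy_p` are hypothetical.

```python
import numpy as np

# Made-up simulated statistics and a made-up observed value, NOT lab results.
toy_simulated = np.array([-0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.7, -0.2, 0.1, 0.9])
toy_observed = 0.5

# Proportion of simulated statistics at least as large as the observed one.
toy_p = np.count_nonzero(toy_simulated >= toy_observed) / len(toy_simulated)
print(toy_p)  # 0.3
```

Here three of the ten simulated values (0.5, 0.7, and 0.9) are at least as large as the observed value, so the proportion is 0.3.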


In [None]:
empirical_p = ...
empirical_p

**Question 2.11.** Using a 5% p-value cutoff for determining statistical significance, draw a conclusion about the null and alternative hypotheses. Describe your findings using simple, non-technical language. What does your analysis tell you about the association between Star Baker awards and winning? What can you claim about causation based on your statistical analysis?


_Type your answer here, replacing this text._

## 3. Submission

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSw1FZDpy06LGwP8VC9xM1eK3aQj_bXYp0r5w&usqp=CAU">

Congratulations, you've finished Lab 7!

Time to submit your work:

  1. From the `File` menu select **Save and Export Notebook As HTML** to download an HTML version of your work.
  2. From the `File` menu select **Download** to download the code for your notebook.
  3. Submit your work to the lab assignment on Moodle:
      - Open the lab assignment activity.
      - Upload the code file (file extension .ipynb).
      - Upload the HTML file (file extension .html).
      - Submit!
