In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("gla08.ipynb")

<img src="./ccsf.png" alt="CCSF Logo" width=200px style="margin:0px -5px">

# Guided Learning Activity 08: A/B Testing

This Guided Learning Activity is designed for you to complete alongside a Data Ambassador from the course. You might find that it feels like a combination of the lectures and lab assignment. Whether you are participating live or watching the recording of the live meeting, let the Data Ambassador guide you through the following tasks. There will be moments for you to reflect and explore your own ideas as a way to solidify concepts and skills introduced by your instructor. Keep in mind that this is not a graded assignment for MATH 108 by default. If you have any concerns about participation, reach out to your instructor.

---

## Learning Objectives

1. Outline the steps of a permutation test.
2. Visually and numerically observe differences in numerical distributions for two groups.
3. Perform a permutation test in a provided business scenario.
4. Reflect on the results of the test to make an informed decision.

---

## Configure the Notebook

Run the following code cell to set up the notebook.

In [None]:
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

---

## Permutation Test

* In MATH 108, you use a permutation test to perform an A/B test, which helps you decide whether the distributions of numerical values for two groups are the same.
* The term permutation refers to arrangement, so a permutation test considers many random arrangements of the labeled data.

```mermaid
graph TD;
    A["Numerical Data for Two Groups Labeled A and B"]
    A --> B["Null Hypothesis: Assume the Distribution of\nNumerical Values for Both Groups is the Same"]
    B --> C["Calculate Observed Test Statistic (e.g., Difference of Means)"]
    B --> D["Generate a Random Permutation of the Labels"]
    D --> E["Recalculate Test Statistic with Permuted Data"]
    E -->|"Repeat Many Times"| D
    E --> F["Generate Distribution of Test Statistics from Permutations"]
    G{"Is the Observed Test Statistic Consistent with\n the Distribution Based on the Permuted Data?"}
    C --> G
    F --> G
    G -->|"Yes"| H["Fail to Reject the Null Hypothesis"]
    G -->|"No"| I["Reject the Null Hypothesis"]
```
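
The flow in the diagram can be sketched in plain Python on made-up numbers. This is a minimal illustration rather than the notebook's solution: the values, the `'A'`/`'B'` labels, and the helper names `difference_of_means` and `simulate_statistics` are all hypothetical, and the sketch uses only the standard library instead of the `datascience` tables you will work with below.

```python
import random
from statistics import mean

# Hypothetical toy data: one numerical value per row, each labeled A or B.
values = [12, 15, 11, 14, 13, 19, 21, 18, 22, 20]
labels = ['A'] * 5 + ['B'] * 5


def difference_of_means(values, labels):
    """Return mean(group A) - mean(group B)."""
    group_a = [v for v, g in zip(values, labels) if g == 'A']
    group_b = [v for v, g in zip(values, labels) if g == 'B']
    return mean(group_a) - mean(group_b)


def simulate_statistics(values, labels, repetitions, seed=0):
    """Shuffle the labels many times and collect the test statistics."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    stats = []
    for _ in range(repetitions):
        # A random permutation of the labels; the values stay where they are.
        shuffled = rng.sample(labels, k=len(labels))
        stats.append(difference_of_means(values, shuffled))
    return stats


# Observed statistic from the real labels, then the simulated distribution.
observed = difference_of_means(values, labels)
simulated = simulate_statistics(values, labels, repetitions=1000)
```

Comparing `observed` with the spread of `simulated` is the final decision step in the diagram. Each shuffle keeps five rows in each group, so only the assignment of values to groups changes.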

---

## Deciding Between Two Business Strategies

A company has a strategy for engaging with its customers through its website. The strategy includes branding design, social media connection, etc. The growth team at the company has an idea for a new strategy that may lead to increased revenue, increased engagement, etc. The company has employed researchers to help decide whether to adopt the growth team's proposal or continue with the current strategy. 

---

### Experimental Data

The researchers ran two campaigns: a control campaign where they didn't interfere with the current strategy at the company, and a test campaign where they modified the website based on the growth team's proposed strategy. They collected various data points every day for 30 days in August 2019 and stored the data in [`campaigns.csv`](https://www.kaggle.com/datasets/amirmotefaker/ab-testing-dataset/data). The data variables are described below:

| Variable Name     | Description                                              |
|------------------|----------------------------------------------------------|
| campaign_name    | The name of the campaign (Control, Test)                 |
| date            | Date of the record                                        |
| spend     | Amount spent on the campaign in dollars                    |
| impressions     | Number of impressions the ad received                      |
| reach           | The number of unique impressions received by the ad         |
| website_clicks | Number of website clicks received through the ads          |
| searches       | Number of users who performed searches on the website       |
| view_content | Number of users who viewed content and products on the website |
| add_to_cart     | Number of users who added products to the cart              |
| purchase      | Number of purchases                                         |

Companies tend not to share this type of data publicly because it could give competitors information they would rather keep private. That is why there is no information about this company, no details about the website, and the data is relatively old.

---

### Task 01 📍

Read the data from `campaigns.csv` and assign the results to `campaigns`.

In [None]:
...

In [None]:
grader.check("task_01")

---

### Missing Data

You may have noticed from the preview of `campaigns` that some values are missing. There are multiple ways to handle missing data. One approach is to remove entire rows that contain missing numerical values. However, since we will be comparing mean behavior over the month, another reasonable approach is to replace missing values with the mean for that variable within the same campaign over the entire time period. This latter process is a bit tedious to do with the MATH 108 libraries, so we'll do it for you using the pandas `fillna` method. Run the following code cell to do this.

In [None]:
test = campaigns.where('campaign_name', 'Test Campaign')
control = campaigns.where('campaign_name', 'Control Campaign')
df_control = control.to_df()
cols_to_fill = ["impressions", "reach", "website_clicks", 
                "searches", "add_to_cart", "purchase", 
                "view_content"]

df_control[cols_to_fill] = df_control[cols_to_fill].fillna(df_control[cols_to_fill].mean())

control = Table.from_df(df_control)
campaigns = control.append(test)
campaigns
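
To see what mean-filling does, here is a small sketch on made-up numbers. It mimics the idea in plain Python rather than reproducing how pandas implements it; the `purchases` list and its values are hypothetical, and `None` stands in for a missing entry.

```python
from statistics import mean

# Hypothetical daily purchase counts for one campaign; None marks a missing day.
purchases = [520, None, 480, 500]

# Average only the days that were actually recorded.
recorded = [p for p in purchases if p is not None]
fill_value = mean(recorded)  # (520 + 480 + 500) / 3

# Replace each missing entry with that campaign-level mean.
filled = [fill_value if p is None else p for p in purchases]
```

Filling with the mean of the recorded values leaves the mean of the column unchanged, which is why this approach is reasonable when the analysis compares means.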

---

### Task 02 📍🔎

<!-- BEGIN QUESTION -->

Use a bar chart to compare the means of each numerical variable between the control and test campaigns. We recommend setting the `overlay=False` parameter in `barh` to make the comparisons easier to see. What do you notice?

_Type your answer here, replacing this text._

In [None]:
...

<!-- END QUESTION -->

---

### Observing the Differences

* There are visually notable differences in means for almost every variable, but those differences could be due to chance alone.
* A statistical inference test, such as an A/B test, accounts for that random behavior.

---

### Task 03 📍

Define a function `diff_in_means` that takes a table like `campaigns` and a numerical column label (`str`) as inputs. The function should return the difference between the mean of the `'Control Campaign'` values and the mean of the `'Test Campaign'` values for the specified numerical variable.

**Note:** You can assume the table provided as input has the same columns as `campaigns`, and the values of the `'campaign_name'` column are `'Control Campaign'` and `'Test Campaign'`.

In [None]:
def diff_in_means(tbl, numerical_label):
    reduced = ...
    control_values = ...
    test_values = ...
    return ...

# Test the function
diff_in_means(campaigns, 'purchase')

In [None]:
grader.check("task_03")

---

### Task 04 📍

We have created an empty table called `observed_differences` with 8 columns, one for each of the 8 numerical variables in `campaigns`. Using `diff_in_means` and `with_row`, calculate the observed test statistic for every numerical variable and update `observed_differences` so that these values form the first row of the table.

In [None]:
observed_differences = Table(campaigns.drop('campaign_name', 'date').labels)

observed_diffs_arr = make_array()
for numerical_label in observed_differences.labels:
    observed_diff_of_means = ...
    observed_diffs_arr = ...

observed_differences = ...
observed_differences

In [None]:
grader.check("task_04")

---

### Task 05 📍🔎

<!-- BEGIN QUESTION -->

Reflect on the differences in means. Which types of values provide evidence against the null hypothesis? Which values provide support for the growth team's proposed strategy?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---

### Simulate Generating Data

* Now that you have a way to calculate all of the differences in means for the `campaigns` table, you want to try shuffling the campaign labels and re-calculating the differences in means.
* If it is true that there is no difference in the numerical distributions, then shuffling the labels should have no significant impact on the differences.
* This is what it means to simulate generating data under the null hypothesis.

---

### Task 06 📍

Write a function `permute_and_calculate` that takes a table like `campaigns` as input, randomly shuffles the `'campaign_name'` labels, and returns a table with 8 columns (one for each numerical variable in `campaigns`) and 1 row containing the difference in means for each of the numerical variables.

**Note:** You can assume the table provided as input has the same columns as `campaigns`.

In [None]:
def permute_and_calculate(tbl):
    simulated_differences = Table(tbl.drop('campaign_name', 'date').labels)
    simulated_diffs_arr = make_array()
    
    shuffled_labels = ...
    shuffled_tbl = ...
    
    for numerical_label in simulated_differences.labels:
        simulated_diff_of_means = ...
        simulated_diffs_arr = ...
    
    simulated_differences = ...
    return simulated_differences

# Test the function
permute_and_calculate(campaigns)

In [None]:
grader.check("task_06")

---

### Task 07 📍

Run the `permute_and_calculate` function 10,000 times. For each iteration, use `with_row` to add the results from `permute_and_calculate` to the empty table we've provided called `simulated_differences`.

In [None]:
simulated_differences = Table(campaigns.drop('campaign_name', 'date').labels)

for ...:
    simulated_differences = ...

simulated_differences

In [None]:
grader.check("task_07")

---

### Weighing the Evidence

* With the experimental data and a distribution of simulated test statistics for each numerical variable, you can now weigh the significance of the differences that you observed earlier.
* A difference is considered significant when fewer than 5% of the simulated statistics are as extreme as, or more extreme than, the observed value. (This assumes a p-value cutoff of 5%.)
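
As a rough illustration of that comparison, the sketch below computes a p-value from made-up numbers. The `simulated` and `observed` values are hypothetical, measuring "extreme" by distance from 0 (a two-sided test) is an assumption, and so is the strict `<` at the cutoff. Follow your instructor's convention if it differs.

```python
# Hypothetical simulated differences of means and a hypothetical observed one.
simulated = [-3.0, -1.5, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 4.0]
observed = 2.5

# Two-sided p-value: the share of simulated statistics that are at least
# as far from 0 as the observed statistic.
extreme_count = sum(1 for s in simulated if abs(s) >= abs(observed))
p_value = extreme_count / len(simulated)

# Compare against the 5% cutoff (assumed convention: strictly less than).
significant = p_value < 0.05
```

Here three of the ten simulated values (-3.0, 2.5, and 4.0) are at least as extreme as the observed one, so the p-value is well above the cutoff and the difference would not be called significant.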

---

### Task 08 📍🔎

<!-- BEGIN QUESTION -->

Complete the following code to go through each numerical variable and:
* Check whether the experimental data favors the `'Test'` campaign, the `'Control'` campaign, or `'Neither'`
* Calculate the p-value
* Determine if the test results are significant (`True`) or not (`False`)
* Generate a histogram of the distribution of simulated test statistics with the observed statistic included

In [None]:
summary_tbl = Table(['Variable', 'Favored Campaign', 'p-Value', 'Significant Results'])

for numerical_variable in simulated_differences.labels:
    observed_difference = observed_differences.column(numerical_variable).item(0)

    # Results Favor ...
    if ...:
        favored_campaign = 'Test'
    elif ...:
        favored_campaign = 'Control'
    else:
        favored_campaign = 'Neither'

    # p-Value and Statistical Significance
    p_value = ...
    
    if ...:
        significant = True
    else:
        significant = False

    # Add information to the table
    summary_tbl = summary_tbl.with_row([numerical_variable, favored_campaign, p_value, significant])
    
    # Generate histograms
    simulated_differences.select(numerical_variable).hist()
    # Add a point for the observed value of the test statistic (observed_difference)
    max_y = (simulated_differences.select(numerical_variable)
             .bin(density=True).sort(1, True).column(1).item(0))
    plt.ylim(-0.1*max_y, max_y)
    plt.scatter(observed_difference, 0, color='red', s=30, label='Observed Test Statistic');
    plt.title('Distribution of Test Statistics')
    plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
    plt.show()
    
summary_tbl

<!-- END QUESTION -->

---

### Making a Decision

It is important to remember that statistical tests do not provide absolute truth. Instead, they help us evaluate the evidence against a given belief to inform decision-making. Analysis teams track the results of these tests, and they (or others within the organization) use that information to guide their decisions.

---

### Task 09 📍🔎

<!-- BEGIN QUESTION -->

For these 8 variables, it is likely that everyone who completes this notebook will obtain different p-values and slightly different overall results, because the permutations are random. Use your results to reflect on how you might decide whether or not to adopt the growth team's strategy.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

---

## Reflection

In this activity, you reflected on performing an A/B test through a permutation test. Additionally, you applied the permutation test to a business scenario and experimental data in order to make a business decision.

---

## License

This content is licensed under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)</a>.

<img src="./by-nc-sa.png" width=100px>