
# Greenweez Home Page

The home page is very important for a website: it generates a lot of traffic and is the showcase of the site. The traffic optimisation team wants to optimise it, but they are undecided between two versions.



### Here are the two versions:

[Variant A](https://drive.google.com/file/d/1LqPXgeOJ8QQ1ZfcO4_Mz26lehmyOkles/view) - Slider with a white design

[Variant B](https://drive.google.com/file/d/1rBydNNlrg5d1AmGXo8-9DsfrbE-tuAox/view) - Static page with a green design

### We need to split the users

Before we can actually run the A/B test, we need to split our users into two groups. Let's start by importing the user data from the customers tab in [this spreadsheet](https://docs.google.com/spreadsheets/d/1lpyAhs6Yh2WZ-zqKrpfxKN08fZ3PTISvS2ajl3L6Avk/edit#gid=386045473).

In [3]:
# Import the data (also import the necessary packages)
import pandas as pd
customers = pd.read_csv("/content/Greenweez Home Page Results - customers.csv")

In [None]:
# Let's take a look at our dataframe
customers

Unnamed: 0,customers_id,avg_basket
0,9731,202.59
1,61582,22.92
2,305054,32.05
3,305036,30.46
4,10969,87.93
...,...,...
39995,273264,35.46
39996,273371,87.03
39997,70803,50.49
39998,6743,86.19


Let's adopt a naive strategy first - splitting by median customers_id

In [None]:
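# Sort by customers_id so that the row position reflects the id rank
customers = customers.sort_values(by="customers_id").reset_index(drop = True)
# Cut at the middle row: lower half of the ids vs upper half
customers1 = customers.iloc[:20000]
customers2 = customers.iloc[20000:]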

Did we do a good job? Let's look at the mean avg_basket for both groups

In [None]:
print(customers1["avg_basket"].mean(), customers2["avg_basket"].mean())

76.670484 52.311415999999994


That's quite a difference! Should we try another strategy?
Let's divide the two groups randomly. Check out [this](https://stackoverflow.com/questions/29576430/shuffle-dataframe-rows) StackOverflow thread on how to do that.

In [None]:
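# Shuffle every row (frac=1 samples 100% of the rows in random order)
customers = customers.sample(frac=1).reset_index(drop=True)
# The first half of the shuffled rows goes to group 1, the rest to group 2
customers1 = customers.iloc[:20000]
customers2 = customers.iloc[20000:]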

Let's check the avg_basket again. We should have done a better job!

In [None]:
print(customers1["avg_basket"].mean(), customers2["avg_basket"].mean())

64.656342 64.325558
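
The shuffle above gives a different split every time the cell runs. If you want a split you can reproduce, here is a minimal sketch. It uses a small synthetic table (`demo`, made-up values, not the real customers data), and the `random_state=42` seed and the names `group_a` / `group_b` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the customers table (synthetic values, not the real data)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "customers_id": np.arange(1, 1001),
    "avg_basket": rng.gamma(shape=2.0, scale=30.0, size=1000),
})

# Shuffle all rows; a fixed random_state makes the shuffle repeatable
shuffled = demo.sample(frac=1, random_state=42).reset_index(drop=True)

# Split at the midpoint rather than at a hard-coded row number
half = len(shuffled) // 2
group_a = shuffled.iloc[:half]
group_b = shuffled.iloc[half:]

# The two means should be close if the split is balanced
print(group_a["avg_basket"].mean(), group_b["avg_basket"].mean())
```

Splitting at `len(shuffled) // 2` keeps the code valid if the number of customers changes.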


### The results are in

After 4 weeks, the web developers have gotten back to you with the results of the [test](https://docs.google.com/spreadsheets/d/1lpyAhs6Yh2WZ-zqKrpfxKN08fZ3PTISvS2ajl3L6Avk/edit?usp=sharing). Let's analyse them to see which of the two variants performs better. Take some time to make sense of the different columns in the *4 weeks* table. Then, download the file as CSV and load it in the next cell.

In [None]:
# Load in the CSV of the 4-week results.
results = pd.read_csv("Greenweez Home Page Results - 4 weeks.csv")

In [None]:
# Have a look at your newly created dataframe
results

AB test group,Nb sessions,Nb bounces,% bounces,Nb pages,Page / Sessions,Nb transactions,% conversions
Slider blank,243210,90310,0.371325,406734,1.672357,16904,0.069504
Static green,243920,92031,0.3773,405872,1.663955,16699,0.068461
Total,487130,182341,0.374317,812606,1.66815,33603,0.068982


In [None]:
# Let's set the index to the "AB test group" column
results.set_index("AB test group", inplace = True)

In [None]:
# Make sure you know how to access the individual values - try displaying the number of sessions for the blank slider
# Try using the column/index names and not numbers to make the code more readable
results.loc["Slider blank", "Nb sessions"]

243210

### The bounce variable

The first metric we want to analyse is bounce! What kind of test would best suit this metric?

*Answer: a Chi-Square test, because bounce is a discrete binary variable: a session either bounces or it doesn't!*

Now that we've chosen the appropriate test, you might notice that we're lacking something: the theoretical or expected value. Since neither of these variants has been implemented before and we don't have a baseline, we'll have to create our own. Our null hypothesis is that the bounce rate is the same for both variants, equal to the overall bounce rate of 37.43% (the *Total* row of the table).

Compute the theoretical number of bounces for both variants using the average bounce rate!

In [None]:
# Compute the theoretical number of bounces for both variants using the average bounce rate!
blank_theoretical_bounce = results.loc['Total', '% bounces'] * results.loc['Slider blank', 'Nb sessions']
green_theoretical_bounce = results.loc['Total', '% bounces'] * results.loc['Static green', 'Nb sessions']

Now that we have all the elements we need, compute the Chi-Square test below, first by hand with the formula (and the table) and then using the [scipy function](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html)
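
For reference, the "formula" here is Pearson's Chi-Square statistic. The symbols are notation introduced for this note and are not taken from the spreadsheet: $O_i$ is the observed number of bounces for variant $i$, $n_i$ its number of sessions, and $\bar{r}$ the overall bounce rate used to build the expected count $E_i$.

```latex
\chi^2 = \sum_{i \in \{\text{blank},\ \text{green}\}} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = \bar{r} \cdot n_i
```

With two categories, the statistic is compared against a Chi-Square distribution with one degree of freedom.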

In [None]:
## With the formula

chi_square_bounce = (((results.loc['Slider blank', 'Nb bounces'] - blank_theoretical_bounce) ** 2) /  (blank_theoretical_bounce) + \
                          ((results.loc['Static green', 'Nb bounces'] - green_theoretical_bounce) ** 2) /  green_theoretical_bounce)
print(f"Using the formula: {chi_square_bounce}")

## With Scipy

# Import the right modules (also import numpy)
from scipy.stats import chisquare
import numpy as np

# Create arrays for the observed and expected bounce values
f_obs_bounce = np.array([results.loc['Slider blank', 'Nb bounces'], results.loc['Static green', 'Nb bounces']])
f_exp_bounce = np.array([blank_theoretical_bounce, green_theoretical_bounce])

# Calculate chisquare and print the result (statistic and p-value)
chi_square_bounce = chisquare(f_obs=f_obs_bounce, f_exp=f_exp_bounce)
print(f"Using scipy: {chi_square_bounce}")

What do you make of the results? Can we safely reject the null hypothesis?

*Yes, we can - the p-value is low enough (lower than our 5% threshold)*
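
The hand formula only gives the statistic, not the p-value. Here is a minimal sketch of how the two connect, with the counts copied from the 4-week table above. One assumption differs from the cells above: the pooled rate is recomputed from the raw counts instead of being read from the rounded `% bounces` column, so that observed and expected totals match exactly.

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Counts copied from the 4-week results table (order: Slider blank, Static green)
sessions = np.array([243210, 243920])
bounces = np.array([90310, 92031])

# Pooled bounce rate from the raw counts, then expected bounces per variant
pooled_rate = bounces.sum() / sessions.sum()
expected = pooled_rate * sessions

# Pearson statistic by hand; two categories leave 1 degree of freedom
statistic = ((bounces - expected) ** 2 / expected).sum()
# Survival function = probability of a statistic at least this large under H0
p_value = chi2.sf(statistic, df=1)

# scipy's chisquare should agree with the manual computation
scipy_result = chisquare(f_obs=bounces, f_exp=expected)

print(statistic, p_value)
print(p_value < 0.05)  # True means we reject the null hypothesis at the 5% level
```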

### What about the other metrics?

Let's repeat what we just did for the other valid metric: number of transactions made. Again, we need to compute the theoretical values first.

Could we also run a Chi-Square test on the number of pages visited? Why or why not?

#### Number of transactions made

In [None]:
# Compute the theoretical transactions for both variants using the overall conversion rate!
blank_theoretical_transactions = results.loc['Slider blank',  'Nb sessions'] * results.loc['Total', '% conversions']
green_theoretical_transactions = results.loc['Static green',  'Nb sessions'] * results.loc['Total', '% conversions']

In [None]:
# Chi-Square with the formula

chi_square_transactions = (((results.loc['Slider blank', 'Nb transactions'] - blank_theoretical_transactions) ** 2) /  (blank_theoretical_transactions) + \
                          ((results.loc['Static green', 'Nb transactions'] - green_theoretical_transactions) ** 2) /  green_theoretical_transactions)
print(f"Using the formula: {chi_square_transactions}")


# Chi-Square with the Scipy function
f_obs_transactions = np.array([results.loc['Slider blank', 'Nb transactions'], results.loc['Static green', 'Nb transactions']])
f_exp_transactions = np.array([blank_theoretical_transactions, green_theoretical_transactions])

chi_square_transactions = chisquare(f_obs=f_obs_transactions, f_exp=f_exp_transactions)
chi_square_transactions

Is the resulting p-value satisfactory?
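
As a cross-check on this question, here is a sketch of a different formulation from the one used above: a 2x2 contingency test with `scipy.stats.chi2_contingency`, which also counts the sessions that did *not* convert. The counts are copied from the 4-week table; `correction=False` (no Yates continuity correction) is an assumption made to keep the result comparable with the plain Pearson formula.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts copied from the 4-week results table (order: Slider blank, Static green)
sessions = np.array([243210, 243920])
transactions = np.array([16904, 16699])

# 2x2 table: one row per variant, columns = [converted, did not convert]
table = np.column_stack([transactions, sessions - transactions])

# correction=False turns off the Yates continuity correction
statistic, p_value, dof, expected = chi2_contingency(table, correction=False)

print(statistic, p_value, dof)
print(p_value < 0.05)  # False means we cannot reject the null hypothesis at 5%
```

With these counts the p-value comes out above the 5% threshold, so the difference in conversions between the two variants is not statistically significant in this formulation.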