# A/B testing

## Introduction

We have a website that provides machine learning content in a blog-like format. Recently we saw an article claiming that similar websites could improve their engagement simply by using a specific color palette for the background. Since this change is easy to implement, we decide to run an A/B test to see whether it does in fact lead users to spend more time on our website.

### Metric to evaluate

Here we decide to evaluate the average session duration, which measures how much time, on average, users spend on the website. This metric currently has a value of 30.87 minutes.

We decide to run the test for 20 days, randomly splitting the users into two segments: **Control** and **Variation**.

-> **Control**: Users that will keep seeing the original website.

-> **Variation**: Users that will see the website with the new background colors.
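One common way to produce such a split is to assign each user to a group at random with equal probability. The sketch below is illustrative only: the `assign_groups` helper, the seed, and the sample user ids are assumptions, not the tooling actually used for this experiment.

```python
import numpy as np

def assign_groups(user_ids, seed=42):
    """Hypothetical helper: randomly assign each user to 'control' or 'variation'."""
    rng = np.random.default_rng(seed)  # seeded so the split is reproducible
    labels = rng.choice(["control", "variation"], size=len(user_ids))
    return dict(zip(user_ids, labels))

# Made-up user ids, for illustration only
groups = assign_groups(["U001", "U002", "U003", "U004", "U005", "U006"])
print(groups)
```

Because each user is assigned independently, the two groups end up only approximately equal in size.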


### Loading data

In [4]:
import pandas as pd
data = pd.read_csv("background_color_experiment.csv")
data.head(5)

| Unnamed: 0 | user_id | user_type | session_duration |
|---|---|---|---|
| 0 | BM3C0BJ7CS | variation | 15.528769 |
| 1 | MJWN6XNH6L | variation | 32.28759 |
| 2 | 46ZPHHABLS | variation | 43.718217 |
| 3 | OHA298DHUG | variation | 49.519702 |
| 4 | AKJ77X6F4A | control | 61.709028 |


#### Dataset size

In [8]:
print(f'Dataset size -> {len(data)}')

Dataset size -> 4186


The dataset has three columns of interest: `user_id`, `user_type` and `session_duration`. Next, we split it into two parts, one with the control data and the other with the variation data.

In [9]:
control_sd_data = data[data["user_type"] == "control"]["session_duration"]
variation_sd_data = data[data["user_type"] == "variation"]["session_duration"]

print(f'Control group size -> {len(control_sd_data)}')
print(f'Variation group size -> {len(variation_sd_data)}')

Control group size -> 2069
Variation group size -> 2117


## Theory

The goal of this experiment is to measure whether changing the website's background color increases the time visitors spend on it. Rewritten as a hypothesis test, the null hypothesis $H_0$ is that the change did not affect the time a visitor spends. Define the following quantities:

- $\mu_c$ is the average time a user **in the control group** spends on the website. Recall that the **control group** is the group accessing the website without the change in the background color.
- $\mu_v$ is the average time a user **in the variation group** spends on the website. Recall that the **variation group** is the group accessing the website **with the updated background color**.
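
In terms of these two means, the hypotheses can be written as follows. The one-sided alternative is inferred from the right-tailed test used later in this notebook:

$$H_0: \mu_v = \mu_c \qquad \text{vs.} \qquad H_1: \mu_v > \mu_c$$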

With these quantities in hand, the next steps are to compute:

- $d$, the degrees of freedom of Student's $t$-distribution, $t_d$.
- The $t$-value, which will be called $t$.
- The $p$-value of $t$ under the distribution $t_d$, i.e., the value $p = P(t_d > t | H_0)$.

Finally, for a given significance level $\alpha$, we can decide whether or not to reject $H_0$, depending on whether $p < \alpha$.

Let's get to work!

In [10]:
# X_c stores the session durations for the control group, and X_v for the variation group.
X_c = control_sd_data.to_numpy()
X_v = variation_sd_data.to_numpy()

In [11]:
print(f"The first 20 entries for X_c are:\n{X_c[:20]}\n")
print(f"The first 20 entries for X_v are:\n{X_v[:20]}\n")

The first 20 entries for X_c are:
[ 61.70902753  25.21946052  26.2404824   58.7480264  137.03680289
  19.92148102  18.8252202   75.25179496  38.27213776  29.17104128
  15.69643672  37.83860271  30.06843075  21.00318148  86.19711927
  46.96997965  46.47776713  14.83464105  17.70441365  26.44693676]

The first 20 entries for X_v are:
[15.52876878 32.28759003 43.7182168  49.51970242 71.77928343 23.29183517
 20.78024375 36.44129464 48.75034676 16.5952978  44.49566616 26.67006134
 34.43667579 20.72109411 19.60185277 41.74218978 19.74485294 32.62018094
 44.99513901 70.8916231 ]



### Compute the size, mean and sample standard deviation

In [12]:
def get_stats(X):
    n_size = len(X)
    mean = X.mean()
    std = X.std(ddof=1) # Use N-1 instead of N in the denominator
    return (n_size, mean, std)

In [13]:
n_c, x_c, s_c = get_stats(X_c)
n_v, x_v, s_v = get_stats(X_v)

In [14]:
print(f"For X_c:\n\tn_c = {n_c}, x_c = {x_c:.2f}, s_c = {s_c:.2f} ")
print(f"For X_v:\n\tn_v = {n_v}, x_v = {x_v:.2f}, s_v = {s_v:.2f} ")

For X_c:
	n_c = 2069, x_c = 32.92, s_c = 17.54 
For X_v:
	n_v = 2117, x_v = 33.83, s_v = 18.24 
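
The `ddof=1` argument matters because NumPy's `std` divides by $N$ by default, while pandas' `Series.std` divides by $N-1$ by default. A minimal sketch of the difference, using a small made-up array rather than the experiment data:

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = x.std()         # NumPy default: ddof=0, divides by N
sample_std = x.std(ddof=1)       # sample standard deviation, divides by N-1
pandas_std = pd.Series(x).std()  # pandas default: ddof=1

print(f"ddof=0: {population_std:.4f}")
print(f"ddof=1: {sample_std:.4f}")
print(f"pandas: {pandas_std:.4f}")
```

With groups of about two thousand users each, the two versions differ only slightly, but the sample version is the one the formulas in this notebook assume.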


### Computing the degrees of freedom for Student's t-distribution

It is given by the following formula, known as the Welch-Satterthwaite approximation:

$$d = \frac{\left[\frac{s_{c}^2}{n_c} + \frac{s_{v}^2}{n_v} \right]^2}{\frac{(s_{c}^2/n_c)^2}{n_c-1} + \frac{(s_{v}^2/n_v)^2}{n_v-1}}$$

In [15]:
def degrees_of_freedom(n_v, s_v, n_c, s_c):
    s_v_squared = s_v ** 2
    s_c_squared = s_c ** 2
    s_v_n_v = s_v_squared / n_v
    s_c_n_c = s_c_squared / n_c
    numerator = (s_v_n_v + s_c_n_c) ** 2
    denominator = (s_v_n_v ** 2) / (n_v - 1) + (s_c_n_c ** 2) / (n_c - 1)
    dof = numerator / denominator
    
    return dof

In [16]:
d = degrees_of_freedom(n_v, s_v, n_c, s_c)
print(f"The degrees of freedom for Student's t-distribution in this scenario are: {d:.2f}")

The degrees of freedom for Student's t-distribution in this scenario are: 4182.97
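
As a sanity check, the degrees of freedom can be recomputed from the rounded summary statistics printed earlier. The result should be close to the reported value, and it should always fall between $\min(n_c, n_v) - 1$ and $n_c + n_v - 2$, which is a known property of this formula. The function name `welch_dof` is an illustrative choice, and small differences from the reported value are expected because the inputs are rounded to two decimals.

```python
def welch_dof(n_v, s_v, n_c, s_c):
    """Degrees of freedom from the formula above (illustrative re-implementation)."""
    a = s_v ** 2 / n_v
    b = s_c ** 2 / n_c
    return (a + b) ** 2 / (a ** 2 / (n_v - 1) + b ** 2 / (n_c - 1))

# Rounded summary statistics reported earlier in the notebook
n_c, s_c = 2069, 17.54
n_v, s_v = 2117, 18.24

d = welch_dof(n_v, s_v, n_c, s_c)
lower, upper = min(n_c, n_v) - 1, n_c + n_v - 2
print(f"d = {d:.2f}, bounds = [{lower}, {upper}]")
```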


### Computing the t-value

The next step is to calculate the t-value, which in this case is given by the following equation:

$$t = \frac{\left( \overline{X}_v - \overline{X}_c \right)}{\sqrt{\left(\frac{s_v}{\sqrt{n_v}}\right)^2 + \left(\frac{s_c}{\sqrt{n_c}}\right)^2}} = \frac{\left( \overline{X}_v - \overline{X}_c \right)}{\sqrt{\frac{s_v^2}{n_v} + \frac{s_c^2}{n_c}}}$$

In [18]:
def t_value(n_v, x_v, s_v, n_c, x_c, s_c):
    s_v_squared = s_v ** 2
    s_c_squared = s_c ** 2
    s_v_n_v = s_v_squared / n_v
    s_c_n_c = s_c_squared / n_c
    numerator = x_v - x_c
    denominator = (s_v_n_v + s_c_n_c) ** 0.5
    t = numerator / denominator
    return t

In [19]:
t = t_value(n_v, x_v, s_v, n_c, x_c, s_c)
print(f"The t-value for this experiment is: {t:.2f}")

The t-value for this experiment is: 1.64
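
The t-value can be checked in a similar way. The sketch below plugs the rounded summary statistics into the formula above; because the means are rounded to two decimals, the result only approximately matches the value computed from the full data. The name `welch_t` is an illustrative choice.

```python
import math

def welch_t(n_v, x_v, s_v, n_c, x_c, s_c):
    """t-value from the formula above (illustrative re-implementation)."""
    standard_error = math.sqrt(s_v ** 2 / n_v + s_c ** 2 / n_c)
    return (x_v - x_c) / standard_error

# Rounded summary statistics reported earlier in the notebook
t = welch_t(n_v=2117, x_v=33.83, s_v=18.24, n_c=2069, x_c=32.92, s_c=17.54)
print(f"t is approximately {t:.2f}")
```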


### Calculating the p-value

Here we will execute a right-tailed test, because we are investigating whether the background color change increases the time users spend on the website.

The p-value does not depend on the significance level $\alpha$; it is only compared against $\alpha$ afterwards. For this right-tailed test it is given by:

$$p = P(t_d > t) = 1 - \text{CDF}_{t_d}(t)$$

In [20]:
from scipy import stats

def p_value(d, t):
    # Right-tailed p-value P(t_d > t) for Student's t-distribution with d degrees of freedom
    t_d = stats.t(df=d) # Student's t-distribution with d degrees of freedom
    cdf = t_d.cdf(t) # Cumulative distribution function evaluated at the t-value: P(t_d <= t)
    p = 1 - cdf
    return p


In [21]:
print(f"The p-value for t_15 with t-value = 1.10 is: {p_value(15, 1.10):.4f}")
print(f"The p-value for t_30 with t-value = 1.10 is: {p_value(30, 1.10):.4f}")

The p-value for t_15 with t-value = 1.10 is: 0.1443
The p-value for t_30 with t-value = 1.10 is: 0.1400
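
For reference, SciPy also exposes the survival function `sf`, which returns $1 - \text{CDF}$ directly and tends to be more accurate for very small tail probabilities. A minimal sketch, using the rounded values $d = 4182.97$ and $t = 1.64$ reported earlier, so the result is approximate:

```python
from scipy import stats

d = 4182.97  # degrees of freedom reported earlier (rounded)
t = 1.64     # t-value reported earlier (rounded)

p_from_cdf = 1 - stats.t.cdf(t, df=d)  # same computation as p_value
p_from_sf = stats.t.sf(t, df=d)        # survival function: P(t_d > t)

print(f"1 - cdf: {p_from_cdf:.4f}")
print(f"sf:      {p_from_sf:.4f}")
```

With these rounded inputs the p-value lands between 0.05 and 0.06.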


## Make the decision

In the end we need to make a decision: reject the null hypothesis ***$H_0$*** or not, given a significance level $\alpha$.

We reject $H_0$ if the p-value is less than $\alpha$; otherwise we do not reject it.

In [22]:
def make_decision(X_v, X_c, alpha = 0.05):
    n_v, x_v, s_v = get_stats(X_v)
    n_c, x_c, s_c = get_stats(X_c)
    d = degrees_of_freedom(n_v, s_v, n_c, s_c)
    t = t_value(n_v, x_v, s_v, n_c, x_c, s_c)
    p = p_value(d, t)
    
    if p < alpha:
        return 'Reject H_0'
    else:
        return 'Do not reject H_0'

In [23]:
alphas = [0.06, 0.05, 0.04, 0.01]
for alpha in alphas:
    print(f"For an alpha of {alpha} the decision is to: {make_decision(X_v, X_c, alpha = alpha)}")

For an alpha of 0.06 the decision is to: Reject H_0
For an alpha of 0.05 the decision is to: Do not reject H_0
For an alpha of 0.04 the decision is to: Do not reject H_0
For an alpha of 0.01 the decision is to: Do not reject H_0
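
As a cross-check, SciPy's `ttest_ind` with `equal_var=False` performs the same unequal-variance (Welch) test, and `alternative="greater"` makes it right-tailed (this argument requires SciPy 1.6 or later). The sketch below uses synthetic gamma-distributed data as a stand-in for the real session durations, so its numbers are not the experiment's results; with the notebook's arrays, `X_v` and `X_c` would be passed instead.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (assumption): NOT the real experiment data
rng = np.random.default_rng(0)
sample_c = rng.gamma(shape=3.5, scale=9.4, size=2069)
sample_v = rng.gamma(shape=3.5, scale=9.7, size=2117)

# Manual Welch test, following the steps in this notebook
a = sample_v.var(ddof=1) / len(sample_v)
b = sample_c.var(ddof=1) / len(sample_c)
t_manual = (sample_v.mean() - sample_c.mean()) / np.sqrt(a + b)
d_manual = (a + b) ** 2 / (a ** 2 / (len(sample_v) - 1) + b ** 2 / (len(sample_c) - 1))
p_manual = stats.t.sf(t_manual, df=d_manual)

# SciPy's built-in equivalent: tests whether the first sample's mean is greater
result = stats.ttest_ind(sample_v, sample_c, equal_var=False, alternative="greater")

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```

The two lines should agree up to floating-point error, which gives some confidence that the hand-written functions implement the intended test.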
