# A/B Testing

**A/B Testing, also known as split testing, is a method used to compare two versions of a webpage, app, or other digital content to determine which one performs better. It involves randomly assigning users to either the control group (version A) or the experimental group (version B) and measuring the performance of each version based on predefined metrics, such as click-through rates or conversion rates. This method helps businesses and developers make data-driven decisions to optimize user experience and achieve specific goals by identifying the most effective variations of their digital assets.**

https://www.youtube.com/watch?v=KZe0C0Qq4p0&t=4562s

In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import random

## Create Random Data

In [2]:
size = 1000

User_id = pd.Series(np.arange(size))   # consecutive ids 0 .. size - 1
click = pd.Series(np.random.randint(0, 2, size = size, dtype = 'int'))


conn = ['control'] * 800
exp = ['experiment'] * 200
group = conn + exp

In [3]:
random.shuffle(group)
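Because `random.shuffle` draws from the global random state, every run produces a different assignment. A minimal sketch of a reproducible variant follows; the helper name `assign_groups` and the seed value `42` are illustrative assumptions, not part of the original notebook.

```python
import random


def assign_groups(n_control, n_experiment, seed=42):
    """Return a shuffled list of group labels using a private, seeded RNG."""
    labels = ['control'] * n_control + ['experiment'] * n_experiment
    rng = random.Random(seed)  # local generator: leaves the global state untouched
    rng.shuffle(labels)
    return labels


group_labels = assign_groups(800, 200)
# The split sizes are preserved; only the order is randomized.
print(group_labels.count('control'), group_labels.count('experiment'))
```

Calling `assign_groups` twice with the same seed returns the same ordering, which makes the rest of the analysis repeatable.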

In [4]:
data = pd.concat([User_id, click, pd.Series(group)], axis = 1)
data.columns = ['User_id', 'click', 'group']
data.head()

|   | User_id | click | group |
|---|---------|-------|-------|
| 0 | 0 | 0 | control |
| 1 | 1 | 1 | control |
| 2 | 2 | 1 | control |
| 3 | 3 | 1 | control |
| 4 | 4 | 0 | control |


In [5]:
import datetime

lst_date = []

for i in range(1, 7):    
    lst_date.append(datetime.datetime(2024, 1, i).strftime("%m/%d/%Y"))

In [6]:
lst_date = lst_date * int(np.ceil(1000 / 6))

In [7]:
Dates = pd.Series(lst_date)

In [8]:
data['Times'] = Dates

In [9]:
data.head()

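|   | User_id | click | group | Times |
|---|---------|-------|-------|-------|
| 0 | 0 | 0 | control | 01/01/2024 |
| 1 | 1 | 1 | control | 01/02/2024 |
| 2 | 2 | 1 | control | 01/03/2024 |
| 3 | 3 | 1 | control | 01/04/2024 |
| 4 | 4 | 0 | control | 01/05/2024 |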


In [10]:
total_click = data.groupby(['group'])['click'].sum().reset_index()
total_click

|   | group | click |
|---|-------|-------|
| 0 | control | 400 |
| 1 | experiment | 98 |


In [11]:
# non-clicks = group size - clicks, matched by group label rather than by click value
group_sizes = data['group'].value_counts()
total_click['NoClick'] = total_click['group'].map(group_sizes) - total_click['click']

In [12]:
# click rate per group in percent, matched by group label
total_click['ClickPercentage'] = (total_click['click'] / total_click['group'].map(data['group'].value_counts()) * 100).round(2)

In [13]:
total_click

|   | group | click | NoClick | ClickPercentage |
|---|-------|-------|---------|-----------------|
| 0 | control | 400 | 400 | 50.0 |
| 1 | experiment | 98 | 102 | 49.0 |


# Hypothesis testing for proportion

$$ \huge X \sim B(n, p_x) $$
$$ \huge Y \sim B(m, p_y) $$

### Test Hypotheses

<div style="text-align: center; font-size: 1.5em;">
    H<sub>0</sub>: \( p_x = p_y \) <br>
    H<sub>1</sub>: \( p_x \neq p_y \) <br>
</div>

### Estimating the Proportions

<div style="text-align: left; font-size: 1.1em;">
    The point estimate of a proportion, usually written \( \hat{p} \), is computed as follows: <br><br>
    </div>

<div style="text-align: center; font-size: 1.5em;">
        \( \hat{p}_x = \frac{X}{n} \)<br><br>
     \( \hat{p}_y = \frac{Y}{m} \)<br><br>
     \( \hat{P} = \frac{X + Y}{n + m} \)
</div>

where:
- \( X, Y \) are the numbers of successes (clicks) observed in the control and experiment groups.
- \( n, m \) are the corresponding sample sizes.
- \( \hat{P} \) is the pooled estimate of the common click probability under H<sub>0</sub>.
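
Under H<sub>0</sub> both groups share a single click probability, so the standard error is built from the pooled estimate. Written out, the quantities computed in the Test Statistics section are as follows (the symbols \( SE \) and \( Z \) are introduced here for convenience):

$$
SE = \sqrt{\hat{P}\,(1 - \hat{P})\left(\frac{1}{n} + \frac{1}{m}\right)}, \qquad Z = \frac{\hat{p}_x - \hat{p}_y}{SE}
$$

For large samples, \( Z \) is approximately standard normal when H<sub>0</sub> holds, which is why it is compared against normal quantiles.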


## Parameters of the Model for Power Analysis


* **delta** - The minimum detectable effect: the smallest difference between the treatment and control groups that would be of practical or commercial importance. In a business context, delta determines whether an observed difference is large enough to justify implementing the change. For example, with delta set to 0.1, we want to be able to detect an absolute difference of at least 0.1 (10 percentage points) between the groups' click probabilities at the chosen confidence level.

In [14]:
alpha = 0.05    # probability of a Type I error: rejecting H0 when H0 is actually true
print(f'Alpha -  significance level is {alpha}.')

delta = 0.1   
print(f'Delta - minimum detectable effect is {delta}.')

Alpha -  significance level is 0.05.
Delta - minimum detectable effect is 0.1.


In [15]:
# number of clicks per group (control / experiment)
x_conn = data.groupby('group')['click'].sum().loc['control']  
x_exp = data.groupby('group')['click'].sum().loc['experiment']

print(f'Number of Clicks in control: {x_conn}')
print(f'Number of Clicks in experiment: {x_exp}')

Number of Clicks in control: 400
Number of Clicks in experiment: 98


In [16]:
# group sizes (number of users in each group)
n_conn = len(data[data.group == 'control'])
n_exp = len(data[data.group == 'experiment'])

## Proportion Estimates

In [17]:
p_control_hat = x_conn / n_conn
p_experiment_hat = x_exp / n_exp

print(f'Px estimate - Click Probability in control group: {p_control_hat}')
print(f'Py estimate - Click Probability in experiment group: {p_experiment_hat}')

# pooled estimate across both groups
p_pooled_hat = (x_conn + x_exp)  / (n_conn + n_exp)
print(f'P-global estimate is: {p_pooled_hat}')

Px estimate - Click Probability in control group: 0.5
Py estimate - Click Probability in experiment group: 0.49
P-global estimate is: 0.498


## Test Statistics

In [27]:
from scipy.stats import norm

# standard error
SE = np.sqrt((p_pooled_hat * (1 - p_pooled_hat)) * (1 / n_conn + 1 / n_exp))

# test statistic: take the difference first, then divide by the standard error
Test_Stat = (p_control_hat - p_experiment_hat) / SE
print(f'Test statistic: {Test_Stat:.3f}')

Test statistic: 0.253

## Critical value of Z-Test

In [22]:
# critical value for the two-sided test with alpha = 0.05
Z = norm.ppf(1 - alpha / 2)
print(f'The critical value of Z: {Z:.2f}')

The critical value of Z: 1.96


## Calculating P-Value of Z-Test

**p <= 0.05 indicates strong evidence against the null hypothesis, so we reject H0**

**p > 0.05 indicates weak evidence against the null hypothesis, so we fail to reject H0**

<div style="text-align: center; font-size: 1.5em;">
$$
\begin{align*}
\text{CDF}(x) &= \Phi\left(\frac{x - \mu}{\sigma}\right) \\
\text{SF}(x) &= 1 - \text{CDF}(x) \\
\text{SF}(x) &= 1 - \Phi\left(\frac{x - \mu}{\sigma}\right)
\end{align*}
$$
</div>


In [26]:
# SF -> survival function; two-sided p-value = 2 * P(Z > |T|)
p_value = 2 * norm.sf(abs(Test_Stat)) 

text = 'We reject H0: the difference is statistically significant' if p_value <= alpha else 'We fail to reject H0: the difference is not statistically significant'
text

'We fail to reject H0: the difference is not statistically significant'
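
The whole test can be cross-checked with one self-contained function. This is a minimal sketch that uses only the standard library; the function name `two_proportion_ztest` is an illustrative assumption, and the counts passed in are the ones reported in the tables above. Note the parentheses around the difference of proportions: without them, only the second proportion would be divided by the standard error.

```python
from math import erfc, sqrt


def two_proportion_ztest(x, n, y, m):
    """Two-sided pooled z-test for H0: p_x == p_y.

    x, y are click counts; n, m are the corresponding group sizes.
    """
    p_x, p_y = x / n, y / m
    p_pooled = (x + y) / (n + m)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n + 1 / m))
    z = (p_x - p_y) / se  # subtract first, then divide
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


z, p_value = two_proportion_ztest(400, 800, 98, 200)
print(f'z = {z:.3f}, p-value = {p_value:.2f}')
```

`erfc(|z| / sqrt(2))` is the same quantity as `2 * norm.sf(abs(z))`, so the result should agree with the SciPy-based cells.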

## Confidence Interval

In [33]:
p_hat_difference = p_control_hat - p_experiment_hat 

# unpooled standard error of the difference between the two proportions
interval = np.sqrt(((p_control_hat * (1 - p_control_hat)) / n_conn) + ((p_experiment_hat * (1 - p_experiment_hat)) / n_exp))

high_interval = p_hat_difference + (Z * interval)
low_interval = p_hat_difference - (Z * interval)
print(f'The confidence interval for the difference between proportions is between {low_interval:.2f} and {high_interval:.2f}.')

The confidence interval for the difference between proportions is between -0.07 and 0.09.
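
The interval can also be wrapped in a reusable helper. This is a minimal sketch using only the standard library; the function name `diff_confidence_interval` and its `confidence` parameter are illustrative assumptions. It uses the unpooled standard error, as in the cell above, and reports whether the interval contains 0.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(x, n, y, m, confidence=0.95):
    """Wald confidence interval for p_x - p_y (unpooled standard error)."""
    p_x, p_y = x / n, y / m
    se = sqrt(p_x * (1 - p_x) / n + p_y * (1 - p_y) / m)
    # two-sided critical value, e.g. about 1.96 for 95% confidence
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_x - p_y
    return diff - z_crit * se, diff + z_crit * se


low, high = diff_confidence_interval(400, 800, 98, 200)
status = 'contains 0' if low <= 0 <= high else 'excludes 0'
print(f'[{low:.2f}, {high:.2f}] {status}')
```

An interval that contains 0 means the data are compatible with no difference between the groups. This usually, though not always, agrees with the z-test, because the test uses the pooled standard error while the interval uses the unpooled one.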


# Calculation of Minimum Sample Size for Difference Between Two Proportions

$$
n \geq \large \left( \frac{Z_{\alpha/2} \times \sqrt{p_1 (1 - p_1) + p_2 (1 - p_2)}}{d} \right)^2
$$


In [43]:
minimum_n =  (Z * np.sqrt(p_control_hat * (1 - p_control_hat) + p_experiment_hat * (1 - p_experiment_hat))) / delta
minimum_n = int(np.ceil(minimum_n ** 2))   # round up: n must be at least this value
print(f'The minimum n is: {minimum_n}')
print(f'Thus we would need at least {minimum_n} individuals in each group to detect a difference of {delta} between the two proportions with a confidence level of 95%.')

The minimum n is: 193
Thus we would need at least 193 individuals in each group to detect a difference of 0.1 between the two proportions with a confidence level of 95%.
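
The formula above depends only on the significance level. A common normal-approximation variant also includes a target power through an additional quantile z_β. The sketch below assumes a power of 0.80, which is not a value given in this notebook, and the function name `sample_size_per_group` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, delta, alpha=0.05, power=0.80):
    """Per-group n to detect an absolute difference `delta` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


n_required = sample_size_per_group(0.5, 0.49, delta=0.1)
print(f'Per-group sample size at 80% power: {n_required}')
```

Setting `power=0.5` makes z_β equal to 0, which reduces the expression to the formula used in the cell above. Any higher target power increases the required sample size.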
