In [23]:
import pandas as pd
import numpy as np
import scipy.stats as stats
from statsmodels.stats.proportion import proportions_ztest

## Frequentist A/B testing example: comparing two proportions

A/B testing is essentially a simple randomized trial.

When someone visits a website, they are randomly directed to one of two different landing pages. The purpose is to determine which page has a better conversion rate.

The key principle is that random assignment makes the two groups of visitors comparable, on average, with respect to all characteristics (age, gender, location, etc.), including ones we do not measure; with a large number of visitors, chance imbalances between the groups become small. Consequently, we can compare the two groups and obtain an unbiased estimate of the difference in conversion rate between the two pages.
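To illustrate the balancing effect of random assignment, here is a small simulation. The visitor population and its age distribution are made up for illustration; only the idea of randomly sending each visitor to page A or B comes from the text above. The mean age in the two groups should come out nearly identical.

```python
import numpy as np

# Hypothetical visitor population: 10,000 visitors with a simulated age
# covariate (made-up distribution, for illustration only)
rng = np.random.default_rng(seed=42)
n_visitors = 10_000
age = rng.normal(loc=35, scale=10, size=n_visitors)

# Random assignment: each visitor goes to page A or B with probability 0.5
page = rng.choice(np.array(["A", "B"]), size=n_visitors)

# Compare the average age of the two randomly formed groups
mean_age_a = age[page == "A"].mean()
mean_age_b = age[page == "B"].mean()
print(f"mean age A: {mean_age_a:.2f}, mean age B: {mean_age_b:.2f}")
```

Rerunning with other seeds gives small differences in either direction, which is exactly the chance imbalance that shrinks as the number of visitors grows.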

The table below shows that landing page B has a higher conversion rate than page A, but is the difference statistically significant?

In [8]:
data = pd.DataFrame({
    'landing_page': ['A', 'B'],
    'not_converted': [4514, 4473],
    'converted': [486, 527],
})
# Conversion rate = converted / total visitors to each page
data['conversion_rate'] = data['converted'] / (data['converted'] + data['not_converted'])
data



|   | landing_page | not_converted | converted | conversion_rate |
|---|---|---|---|---|
| 0 | A | 4514 | 486 | 0.0972 |
| 1 | B | 4473 | 527 | 0.1054 |


### Formulate hypothesis

The conversion rate can be thought of as the proportion of visitors who placed an order. Thus, we are comparing two proportions, and our hypotheses become:

- **Null Hypothesis:** there is no difference in proportions $p_a - p_b = 0$
- **Alternative Hypothesis:** there is a difference in proportions $p_a - p_b \ne 0$
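Because each visitor either converts (1) or does not (0), a conversion rate is simply the sample mean of these 0/1 outcomes, which is what lets us apply the usual tools for sample means. A minimal sketch that rebuilds the outcomes from the counts in the table:

```python
import numpy as np

# Each visitor's outcome as a 0/1 indicator: 1 = converted, 0 = not converted
outcomes_a = np.concatenate([np.ones(486), np.zeros(4514)])
outcomes_b = np.concatenate([np.ones(527), np.zeros(4473)])

# The conversion rate is the mean of the indicators
p_hat_a = outcomes_a.mean()
p_hat_b = outcomes_b.mean()
print(p_hat_a, p_hat_b)
```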

### Assumptions

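**1) Sample Size**  
For the normal approximation behind the z-test to be reasonable, each group needs at least 10 successes and 10 failures (the success-failure condition):  
* $n_a\hat{p_a}=486\geq10$  
* $n_a(1-\hat{p_a})=4514\geq10$  
* $n_b\hat{p_b}=527\geq10$  
* $n_b(1-\hat{p_b})=4473\geq10$  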
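The success-failure check can be written as a small helper. `success_failure_ok` is a hypothetical name introduced here, and the threshold of 10 follows the rule of thumb stated above:

```python
def success_failure_ok(successes, n, threshold=10):
    """Hypothetical helper: True if both successes and failures reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# (conversions, visitors) for each landing page, taken from the table
groups = {"A": (486, 5000), "B": (527, 5000)}
for name, (successes, n) in groups.items():
    print(name, successes, n - successes, success_failure_ok(successes, n))
# A 486 4514 True
# B 527 4473 True
```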
 

**2) Random Sample**  
  
By design, the experiment randomly assigns each visitor to one of the two pages, so the two groups can be treated as independent random samples.

### Test Statistic

A test statistic is a single number that summarizes the data for evaluating the null hypothesis. A standard choice here is the z-score, which measures how many standard errors the observed difference in sample proportions lies above or below the difference hypothesized under the null.

$$ \begin{align} z = \frac{(\hat{p_a}-\hat{p_b}) - (p_a-p_b)}{SE(\hat{p_a}-\hat{p_b})} \end{align} $$

 
$\hat{p_a}-\hat{p_b}$: the sample difference in proportions  
$p_a-p_b$: the population difference in proportions (0 under the null)  
$SE(\hat{p_a}-\hat{p_b})$: the standard error of the sample difference in proportions 
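For our data, the numerator of $z$ is just the observed difference in conversion rates, since $p_a-p_b=0$ under the null:

$$ \begin{align} (\hat{p_a}-\hat{p_b}) - (p_a-p_b) = \frac{486}{5000}-\frac{527}{5000} - 0 = 0.0972-0.1054 = -0.0082 \end{align} $$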


$$\begin{align*}
& \text{Standard error of the sample mean } \bar{X} \text{ of } n_x \text{ independent observations} \\
\\
& SE(\bar{X})=\sqrt{\frac{Var(X)}{n_x}} \\
\\
\\ & \text{Variance and covariance are defined} \\
\\
& Var(X) = E[X^2] - E[X]^2 \\
& Cov(X, Y) = E[XY] - E[X]E[Y] \\
\\
\\ & \text{Variance of the difference between X and Y} \\
\\
& Var(X - Y) = E[(X - Y)(X - Y)] - E[X - Y]^2 \\ 
& Var(X - Y) = E[X^2 - 2XY + Y^2] - (E[X] - E[Y])^2 \\  
& Var(X - Y) = E[X^2] - 2E[XY] + E[Y^2] - E[X]^2 + 2E[X]E[Y] - E[Y]^2 \\
& Var(X - Y) = (E[X^2] - E[X]^2) + (E[Y^2] - E[Y]^2) - 2(E[XY] - E[X]E[Y])\\
& Var(X - Y) = Var(X) + Var(Y) - 2Cov(X, Y) \\
\\
\\ & \text{Groups are independent, therefore the covariance is 0} \\
\\
& Var(X - Y) = Var(X) + Var(Y)\\
\\
\\ & \text{Variance of a single Bernoulli outcome } X_a \text{ (1 = converted, 0 = not converted)} \\
\\
& Var(X_a) = p_a (1 - p_a) \\
\\
\\ & \text{Variance and standard error of the sample proportion } \hat{p_a} = \bar{X_a} \\
\\
& Var(\hat{p_a}) = \frac{p_a (1 - p_a)}{n_a} \\
& SE(\hat{p_a}) = \sqrt{\frac{p_a (1 - p_a)}{n_a}} \\
\\
\\ & \text{thus} \\
\\
& Var(\hat{p_a}-\hat{p_b}) = Var(\hat{p_a}) + Var(\hat{p_b}) \\
& Var(\hat{p_a}-\hat{p_b}) = \frac{p_a(1-p_a)}{n_a} + \frac{p_b(1-p_b)}{n_b} \\
& SE(\hat{p_a}-\hat{p_b}) = \sqrt{\frac{p_a(1-p_a)}{n_a} + \frac{p_b(1-p_b)}{n_b}} \\
\\
\\ & \text{Under the null: } p_a=p_b=p \text{, estimated by the pooled proportion} \\
\\
& \hat{p} = \frac{x_a + x_b}{n_a + n_b} \quad (x_a, x_b: \text{number of conversions in each group}) \\
& SE(\hat{p_a}-\hat{p_b}) = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_a}+\frac{1}{n_b}\right)}
\end{align*}$$
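As a sanity check on the derivation, this sketch plugs our counts into the pooled formula and then simulates many experiments in which both pages share the same true conversion rate. The shared rate (set to the pooled estimate), the number of simulations and the random seed are illustrative assumptions, not part of the original analysis. The empirical standard deviation of $\hat{p_a}-\hat{p_b}$ should land close to the formula's value, and $z$ should match the statistic reported by the test function below.

```python
import numpy as np

# Observed counts from the table
success_a, size_a = 486, 5000
success_b, size_b = 527, 5000

# Pooled proportion and pooled standard error (last line of the derivation)
p_pooled = (success_a + success_b) / (size_a + size_b)
se_pooled = np.sqrt(p_pooled * (1 - p_pooled) * (1 / size_a + 1 / size_b))
z = (success_a / size_a - success_b / size_b) / se_pooled
print(f"pooled p = {p_pooled:.4f}, SE = {se_pooled:.6f}, z = {z:.4f}")

# Monte Carlo check under the null: both pages share the same true rate
# (assumed equal to p_pooled); compare the empirical spread of the
# difference in sample proportions with the formula above
rng = np.random.default_rng(seed=0)
n_sims = 100_000
p_hat_a = rng.binomial(size_a, p_pooled, size=n_sims) / size_a
p_hat_b = rng.binomial(size_b, p_pooled, size=n_sims) / size_b
print(f"empirical SD of difference: {np.std(p_hat_a - p_hat_b):.6f}")
```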


### P-Value and hypothesis test outcome

In [32]:
def ztest_proportion_two_samples(success_a, size_a, success_b, size_b, one_sided=False):
    """
    A/B test for two proportions (two-proportion z-test).

    Given the number of successes and the number of trials in groups
    A and B, compute the z statistic and p-value, using the pooled
    proportion to estimate the standard error under the null p_a = p_b.

    Parameters
    ----------
    success_a, success_b : int
        Number of successes in each group

    size_a, size_b : int
        Size, or number of observations, in each group

    one_sided : bool
        False (default) for a two-sided test; True for a one-sided test
        in the direction of the observed difference

    Returns
    -------
    z : float
        test statistic for the two-proportion z-test

    p_value : float
        p-value for the two-proportion z-test
    """
    proportion_a = success_a / size_a
    proportion_b = success_b / size_b

    # Pooled proportion: the common conversion rate assumed under the null
    pooled_proportion = (success_a + success_b) / (size_a + size_b)
    se = np.sqrt(pooled_proportion * (1 - pooled_proportion) * (1 / size_a + 1 / size_b))

    z = (proportion_a - proportion_b) / se
    # Tail probability beyond |z|, doubled for a two-sided test
    p_value = 1 - stats.norm.cdf(abs(z))
    if not one_sided:
        p_value *= 2

    return z, p_value

success_a = 486
size_a = 486 + 4514
success_b = 527
size_b = 527 + 4473

z, p_value = ztest_proportion_two_samples(
    success_a=success_a,
    size_a=size_a,
    success_b=success_b,
    size_b=size_b,
)
print(f"z test statistic: {z}, p-value: {p_value}")

z test statistic: -1.3588507649479744, p-value: 0.17419388311717388
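As a cross-check on the function above, a Pearson chi-square test on the 2×2 table of counts (without Yates' continuity correction) is mathematically equivalent to the pooled two-proportion z-test: its statistic equals $z^2$ and its p-value matches the two-sided one. This sketch uses `scipy.stats.chi2_contingency`; statsmodels' `proportions_ztest(count, nobs)`, imported in the first cell, is another option and by default also uses the pooled proportion.

```python
import numpy as np
from scipy import stats

# 2x2 contingency table: rows = landing page (A, B),
# columns = (converted, not converted)
table = np.array([[486, 4514],
                  [527, 4473]])

# Pearson chi-square test of independence, no continuity correction
chi2, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)

# For a 2x2 table, sqrt(chi2) equals |z| of the pooled two-proportion z-test
print(f"chi2 = {chi2:.4f}, sqrt(chi2) = {np.sqrt(chi2):.4f}, p-value = {p_chi2:.4f}, dof = {dof}")
```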

If the conversion rates of pages A and B were equal (the null hypothesis), we would observe a difference at least as large as this one, in either direction, with a probability of 17.4% (the p-value). The significance threshold is typically set to 5%, so the observed difference does not give us sufficient evidence to reject the null.  
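The same decision can be phrased with a critical value: at $\alpha = 0.05$, a two-sided test rejects the null only if $|z|$ exceeds the 97.5th percentile of the standard normal distribution, about 1.96. A minimal sketch, with the z statistic copied from the output above:

```python
from scipy import stats

alpha = 0.05
z = -1.3588507649479744  # test statistic reported above

# Two-sided critical value: reject H0 if |z| exceeds the (1 - alpha/2) normal quantile
z_crit = stats.norm.ppf(1 - alpha / 2)
reject = abs(z) > z_crit
print(f"critical value: {z_crit:.3f}, |z| = {abs(z):.3f}, reject H0: {reject}")
# critical value: 1.960, |z| = 1.359, reject H0: False
```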
  
**We fail to reject the null**