# AB Testing Final Project Udacity
## Sample Size, Alpha, and Power

#### In this section, I compute the required sample size with my own functions instead of the online calculator.
Note: The values I compute differ slightly from those given by the online calculator.

Project description: https://docs.google.com/document/u/1/d/1aCquhIqsUApgsxQ8-SQBAigFDcfWVVohLEXcV6jWbdI/pub

In [1]:
import pandas as pd
import math
from scipy.stats import norm
# Tutorial of scipy.stats.norm
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html

In [2]:
#### Function - get the Z star (critical value) used as the statistical threshold
def get_Z_star(alpha):
    """
    alpha: the probability of a Type I error, i.e. rejecting the null hypothesis when it is true.
    return: the two-sided critical value of the standard normal distribution, i.e. z at 1 - alpha/2
    """
    # norm is already imported at the top of the notebook
    return norm.ppf(1 - alpha/2)

# Function test
get_Z_star(0.05)

1.959963984540054

In [3]:
#### Function: the standard error used for hypothesis testing of a difference in proportions
def get_SE_of_difference(prob_baseline, size):
    """
    prob_baseline: baseline probability, i.e. the probability expected by default or in the control group.
    size: the number of units in each group (the denominator of the baseline probability)
    return: the standard error (SE) of the difference between control and experiment
    """
    
    # This is the standard error of a difference, so we use the pooled standard error with the same size in each group.
    # We also assume that the two groups (control and experiment) have the same numerator and denominator.
    # Thus, the pooled probability is the same as the baseline probability.
    p_pooled = prob_baseline
    SE = math.sqrt(p_pooled * (1 - p_pooled) * (1/size + 1/size))
    return SE
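
Written out, the quantity returned by `get_SE_of_difference` is the pooled standard error with equal group sizes. Here $n$ stands for `size` and $p$ for `prob_baseline`; both symbols are shorthand introduced for this note.

```latex
SE = \sqrt{p(1-p)\left(\frac{1}{n} + \frac{1}{n}\right)} = \sqrt{\frac{2\,p(1-p)}{n}}
```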

## Size - Gross Conversion, dmin = 0.01
GC = n_enroll / n_click  = Probability of enrolling, given click  = 0.206250

In [4]:
#### Use the online calculator (https://www.evanmiller.org/ab-testing/sample-size.html)
print("Required Sample Size in each group, using online-calculator: N=", 25835)

Required Sample Size in each group, using online-calculator: N= 25835


In [5]:
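#### Parameters
dmin = 0.01    # minimum detectable difference
alpha = 0.05   # probability of a Type I error
beta = 0.2     # probability of a Type II error (power = 1 - beta)
p = 0.20625    # baseline probability

# Get the two-sided Z star for alpha; it does not depend on n, so compute it once.
z_star = get_Z_star(alpha)

#### Iterate through the sample size (n) in one group from 1 to 999,999
for n in range(1, 1000000):
    
    # Get the standard error of the difference between the two groups.
    SE = get_SE_of_difference(prob_baseline=p, size=n)
    
    # Estimate beta: under the alternative, the test statistic is centered at dmin/SE,
    # so beta is the probability that it still falls below z_star.
    # Note: cdf = cumulative distribution function
    estimated_beta = norm.cdf(x=z_star, loc=dmin/SE, scale=1)
    
    # Stop at the first n whose estimated beta is small enough.
    if estimated_beta <= beta:
        print('Required Sample Size in each group: N=', n)
        break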

Required Sample Size in each group: N= 25699
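
The stopping condition of the loop, `norm.cdf(z_star, loc=dmin/SE) <= beta`, can be rearranged to `dmin/SE >= z_star + z_beta`, which gives n directly without iterating. The sketch below assumes the same pooled standard error as `get_SE_of_difference`; the function name `closed_form_sample_size` is hypothetical and not part of the original notebook.

```python
import math
from scipy.stats import norm


def closed_form_sample_size(p, dmin, alpha=0.05, beta=0.2):
    """Smallest n per group satisfying the loop's stopping condition.

    Hypothetical helper: assumes SE = sqrt(2 * p * (1 - p) / n),
    i.e. the pooled standard error with equal group sizes.
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value
    z_beta = norm.ppf(1 - beta)        # quantile matching the desired power
    n = 2 * p * (1 - p) * (z_alpha + z_beta) ** 2 / dmin ** 2
    return math.ceil(n)


print('Required Sample Size in each group: N=',
      closed_form_sample_size(p=0.20625, dmin=0.01))
```

Under these assumptions the result should agree with the value found by the loop for Gross Conversion, and the same call with the other baselines and `dmin` values can serve as a cross-check on the later cells.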


## Size - Retention, dmin = 0.01
RT = (n of payment) / n_enroll = Probability of payment, given enroll = 0.530000

In [6]:
#### Use the online calculator (https://www.evanmiller.org/ab-testing/sample-size.html)
print("Required Sample Size in each group, using online-calculator: N=", 39115)

Required Sample Size in each group, using online-calculator: N= 39115


In [7]:
#### Use my own functions
dmin = 0.01    # minimum detectable difference
alpha = 0.05
beta = 0.2
p = 0.53       # baseline probability

# The Z star does not depend on n, so compute it once.
z_star = get_Z_star(alpha)

for n in range(1, 1000000):
    SE = get_SE_of_difference(prob_baseline=p, size=n)
    estimated_beta = norm.cdf(x=z_star, loc=dmin/SE, scale=1)
    
    if estimated_beta <= beta:
        print('Required Sample Size in each group: N=', n)
        break

Required Sample Size in each group: N= 39104


## Size - Net Conversion, dmin = 0.0075
NC =  (n of payment) / n_click = Probability of payment, given click = 0.109313

In [8]:
#### Use the online calculator (https://www.evanmiller.org/ab-testing/sample-size.html)
print("Required Sample Size in each group, using online-calculator: N=", 27413)

Required Sample Size in each group, using online-calculator: N= 27413


In [9]:
#### Use my own functions
dmin = 0.0075  # minimum detectable difference
alpha = 0.05
beta = 0.2
p = 0.109313   # baseline probability

# The Z star does not depend on n, so compute it once.
z_star = get_Z_star(alpha)

for n in range(1, 1000000):
    SE = get_SE_of_difference(prob_baseline=p, size=n)
    estimated_beta = norm.cdf(x=z_star, loc=dmin/SE, scale=1)
    
    if estimated_beta <= beta:
        print('Required Sample Size in each group: N=', n)
        break

Required Sample Size in each group: N= 27172
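
One possible explanation for the gap between my values and the online calculator is the variance used under the alternative hypothesis. This is an assumption on my part, not something stated in the project description or confirmed from the calculator itself. The sketch below keeps the pooled baseline variance for the null, but uses an unpooled variance for the alternative, taking the more conservative of the two shifted probabilities `p - dmin` and `p + dmin`. The function name `unpooled_sample_size` is hypothetical.

```python
import math
from scipy.stats import norm


def unpooled_sample_size(p, dmin, alpha=0.05, beta=0.2):
    """Sample size per group with a separate variance under the alternative.

    Hypothetical variant (an assumption, not the notebook's method):
    the null uses 2*p*(1-p); the alternative uses p*(1-p) + q*(1-q),
    where q is p shifted by dmin in whichever direction gives the larger variance.
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(1 - beta)
    sd_null = math.sqrt(2 * p * (1 - p))
    sd_alt = max(math.sqrt(p * (1 - p) + q * (1 - q))
                 for q in (p - dmin, p + dmin))
    return (z_alpha * sd_null + z_beta * sd_alt) ** 2 / dmin ** 2


# (metric, baseline p, dmin, value reported by the online calculator above)
cases = [('Gross Conversion', 0.20625, 0.01, 25835),
         ('Retention', 0.53, 0.01, 39115),
         ('Net Conversion', 0.109313, 0.0075, 27413)]

for name, p, dmin, online in cases:
    print(name, '- unpooled variant:', round(unpooled_sample_size(p, dmin), 1),
          '| online calculator:', online)
```

If this assumption holds, the unpooled values should land within a unit or two of the calculator's numbers, with any remaining difference coming from rounding.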
