In [88]:
from scipy.stats import norm
import numpy as np

Resources:
- Trustworthy Online Controlled Experiments by Ron Kohavi: https://www.amazon.com/Trustworthy-Online-Controlled-Experiments-Practical/dp/1108724264 
- Post from Ron Kohavi on the accuracy of A/B test sample size calculations and why, most of the time, it does matter: https://www.linkedin.com/posts/ronnyk_abtest-statisticalpower-experimentguide-activity-6979146492221476864-P_wX?utm_source=share&utm_medium=member_desktop
- Follow-up post from Evan Miller, who created one of the sample size calculators (https://www.evanmiller.org/ab-testing/sample-size.html): https://www.linkedin.com/posts/evanmmiller_abtest-statisticalpower-experimentguide-activity-6980615725342179328-0P_E?utm_source=share&utm_medium=member_desktop

In [89]:
def z_score_for_alpha(alpha):
    # Two-sided critical value: the (1 - alpha/2) quantile of the standard normal
    return norm.ppf(1 - alpha/2)

alpha = 0.05  # Significance level (e.g., 0.05 for 95% confidence interval)
z_score_alpha = z_score_for_alpha(alpha)
print("Z-score for alpha", alpha, ":", z_score_alpha)

Z-score for alpha 0.05 : 1.959963984540054
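`z_score_for_alpha` assumes a two-sided test, which is why it uses `1 - alpha/2`. A minimal sketch comparing it with the one-sided critical value, assuming the same `alpha = 0.05`; a one-sided test has a smaller critical value and therefore needs fewer samples, at the cost of ignoring effects in the other direction.

```python
from scipy.stats import norm

alpha = 0.05

# Two-sided: alpha is split across both tails -> (1 - alpha/2) quantile
z_two_sided = norm.ppf(1 - alpha / 2)
# One-sided: all of alpha sits in one tail -> (1 - alpha) quantile
z_one_sided = norm.ppf(1 - alpha)

# Round trip: the two tails beyond +/- z_two_sided add back up to alpha
tail_mass = 2 * (1 - norm.cdf(z_two_sided))

print(round(z_two_sided, 4), round(z_one_sided, 4), round(tail_mass, 4))
```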


In [90]:
def z_score_for_beta(beta):
    # z-score for the desired power: the (1 - beta) quantile of the standard normal
    return norm.ppf(1 - beta)

beta = 0.20 # 1 - power 
z_score_beta = z_score_for_beta(beta)
print("Z-score for beta", beta, ":", z_score_beta)


Z-score for beta 0.2 : 0.8416212335729143
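With `alpha = 0.05` (two-sided) and `beta = 0.20`, the two z-scores combine into the constant behind the rule of thumb n ≈ 16σ²/δ² per variant used in Kohavi's book. A small sketch, assuming the means formula used later in this notebook, `n = 2σ²(z_alpha + z_beta)²/δ²`, showing where the 16 comes from:

```python
from scipy.stats import norm

alpha, beta = 0.05, 0.20
z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value, ~1.96
z_beta = norm.ppf(1 - beta)        # z for 80% power, ~0.84

# Multiplier in n_per_variant = multiplier * sigma^2 / delta^2
multiplier = 2 * (z_alpha + z_beta) ** 2
print(round(multiplier, 2))  # rounds up to the familiar "16"
```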


In [91]:
def sample_size_mean(mean, sigma, alpha, beta, delta, variant_num):
    # Two-sample z-test for a difference in means, assuming equal variance (sigma^2)
    # and equal allocation across variants:
    #   n per variant = 2 * sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2
    z_alpha = z_score_for_alpha(alpha)
    z_beta = z_score_for_beta(beta)

    # Round to a whole number of users, since partial users do not exist
    n = round((2 * (sigma**2) * ((z_alpha + z_beta) ** 2)) / (delta**2))

    # Total sample size across all variants
    sample_size = variant_num * n

    print(f"""
        You would need a minimum sample size of {sample_size} users ({n} users per variant)
        to detect a {delta} unit change in the mean with a significance level of {alpha} and a power 
        of {1-beta}, assuming a baseline variance of {sigma**2} and mean of {mean}.
        
    """)

    return sample_size
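One way to sanity-check `sample_size_mean` is to invert it: plug the per-variant n back in and compute the power it actually achieves. A minimal sketch, assuming the same two-sided, equal-variance z-test; `achieved_power` is a hypothetical helper, not part of the original notebook, and it ignores the negligible probability of rejecting in the wrong tail.

```python
import numpy as np
from scipy.stats import norm


def achieved_power(n_per_variant, sigma, delta, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a difference in means."""
    z_alpha = norm.ppf(1 - alpha / 2)
    se = sigma * np.sqrt(2 / n_per_variant)  # standard error of the difference in means
    return norm.cdf(delta / se - z_alpha)


# Per-variant n with the values used in the Means section (sigma=3, delta=0.05)
power = achieved_power(56512, sigma=3, delta=0.05)
print(round(power, 3))
```

If the result is close to `1 - beta`, the sample size formula and the power calculation agree.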

In [92]:
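def em_sample_size_proportion(alpha, beta, delta, variant_num, baseline_conversion, type='relative'):
    # Sample size for comparing two conversion rates, following Evan Miller's calculator.
    # type='relative': delta is a fraction of the baseline rate (0.10 = 10% lift)
    # type='absolute': delta is a change in the rate itself (0.01 = 1 percentage point)
    z_alpha = z_score_for_alpha(alpha)
    z_beta = z_score_for_beta(beta)

    if type == 'relative':
        abs_delta = delta * baseline_conversion
    elif type == 'absolute':
        abs_delta = delta
    else:
        raise ValueError("type must be 'relative' or 'absolute'")

    p1 = baseline_conversion
    p2 = p1 + abs_delta

    # Sample size formula (per variant)
    num = (z_alpha * np.sqrt(2 * p1 * (1 - p1)) + z_beta * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    dnm = abs(p2 - p1) ** 2

    # Sample size per variant, rounded to the nearest whole number
    n = round(num / dnm)

    # Total sample size across all variants
    sample_size = variant_num * n

    # delta is printed as entered, so a relative MDE is reported as relative
    print(f"""
        You would need a minimum sample size of {sample_size} ({n} per variant)
        to detect a {round(delta*100, 2)}% {type} change in conversion rate with a significance level of {alpha} and a power 
        of {1-beta}, assuming a baseline conversion rate of {baseline_conversion*100}%.
        
    """)

    return sample_size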

## Proportions

Use the proportion calculation for rate metrics such as % of searches with a view, buyer %, etc.

When the metric is a proportion and historical data is available, use the historical rate as the baseline conversion rate; it sets the variance p(1 - p) that the formula depends on. The function can also compute the sample size needed for either a relative or an absolute change in the rate.
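The relative/absolute distinction matters because the formula only works with the absolute gap between the two rates. A minimal sketch of the conversion, assuming the 4.3% baseline from the example below; `absolute_delta` is a hypothetical helper that mirrors the branch inside `em_sample_size_proportion`, and it uses `kind` rather than `type` to avoid shadowing Python's built-in.

```python
def absolute_delta(delta, baseline_conversion, kind):
    """Convert a minimum detectable effect into an absolute change in the rate."""
    if kind == "relative":
        return delta * baseline_conversion  # fraction of the baseline rate
    if kind == "absolute":
        return delta  # already a change in the rate
    raise ValueError("kind must be 'relative' or 'absolute'")


baseline_conversion = 0.043

# A 10% relative lift on 4.3% is 4.3% -> 4.73% (0.43 percentage points)
relative = absolute_delta(0.10, baseline_conversion, "relative")
# A 1 percentage point absolute lift is 4.3% -> 5.3%
absolute = absolute_delta(0.01, baseline_conversion, "absolute")

print(round(relative, 4), round(absolute, 4))
```

Because n scales with 1/(p2 - p1)², the 10% relative MDE (a 0.43-point gap) needs a considerably larger sample than the 1-point absolute MDE.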

This uses the formula behind the recommended online calculator: https://www.evanmiller.org/ab-testing/sample-size.html
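Written out, the per-variant formula implemented in `em_sample_size_proportion` is the following, where $p_1$ is the baseline conversion rate and $p_2 = p_1 + \delta$ for an absolute MDE or $p_2 = p_1(1 + \delta)$ for a relative one. The total sample size is `variant_num` times $n$.

$$
n = \frac{\left(z_{1-\alpha/2}\sqrt{2p_1(1-p_1)} + z_{1-\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)}\right)^2}{(p_2 - p_1)^2}
$$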

In [96]:
# alpha: significance level, Type I error rate (false positive rate).
# Smaller alpha = higher confidence level --> more samples
alpha = 0.05

# beta: Type II error rate (false negative rate) = 1 - power. Smaller beta = more power --> more samples
beta = 0.20

# delta: minimum detectable effect (MDE), the smallest change that would be meaningful to the business.
# The experiment is sized so that an effect at least this large is detected with the chosen power.
# Smaller delta = smaller detectable effect --> more samples needed
# Here delta is an absolute change in the rate: 0.01 = 1 percentage point
delta = .01

# Baseline conversion rate (historical rate for the control)
baseline_conversion = .043

# Number of experiment groups (control + treatments)
variant_num = 2

_= em_sample_size_proportion(alpha, beta, delta, variant_num, baseline_conversion, type='absolute')


        You would need a minimum sample size of 13338 (6669 per variant)
        to detect a 1.0% absolute change in conversion rate with a significance level of 0.05 and a power 
        of 0.8, assuming a baseline conversion rate of 4.3%.
        
    


## Means 

We use the means formula for per-user metrics (revenue per user, views per viewer, etc.).

The calculation can be validated:
- Using this online calculator: https://www.stat.ubc.ca/~rollin/stats/ssize/n2.html 
- Also, in Emma Ding's video on sample size estimation (includes formula): https://youtube.com/watch?v=FKPec6RoJOg 
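The means formula needs a baseline mean and standard deviation, which normally come from historical per-user data for the control experience. A minimal sketch, assuming such data is available as an array; here it is simulated with NumPy purely for illustration (the values are hypothetical, not real metrics), and the sample size is then computed with the same formula as `sample_size_mean`.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical historical per-user values (e.g. revenue per user) for the control.
# In practice these would come from a pre-experiment query, not a random generator.
rng = np.random.default_rng(0)
historical = rng.normal(loc=10, scale=3, size=50_000)

baseline_mean = historical.mean()
baseline_std_deviation = historical.std(ddof=1)  # sample standard deviation

# Same formula as sample_size_mean, rounded up to a whole user
alpha, beta, delta = 0.05, 0.20, 0.05
z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
n_per_variant = int(np.ceil(2 * baseline_std_deviation**2 * z**2 / delta**2))

print(round(baseline_mean, 2), round(baseline_std_deviation, 2), n_per_variant)
```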

In [94]:
# alpha: significance level, Type I error rate (false positive rate).
# Smaller alpha = higher confidence level --> more samples
alpha = 0.05

# beta: Type II error rate (false negative rate) = 1 - power. Smaller beta = more power --> more samples
beta = .20

# Average of the control metric before the experiment (baseline)
baseline_mean = 10

# Standard deviation of the control metric before running the experiment
baseline_std_deviation = 3

# delta: minimum detectable effect (MDE), the smallest change that would be meaningful to the business.
# The experiment is sized so that an effect at least this large is detected with the chosen power.
# Smaller delta = smaller detectable effect --> more samples needed
# delta must be in the same units as the mean and standard deviation
delta = 0.05

# Number of experiment groups (control + treatments)
variant_num = 2

_= sample_size_mean(baseline_mean, baseline_std_deviation, alpha, beta, delta, variant_num)


        You would need a minimum sample size of 113024 users (56512 users per variant)
        to detect a 0.05 unit change in the mean with a significance level of 0.05 and a power 
        of 0.8, assuming a baseline variance of 9 and mean of 10.
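Here delta is absolute: 0.05 units on a mean of 10, i.e. a 0.5% relative change. Because n is proportional to 1/δ², doubling the MDE cuts the required sample by a factor of four. A small sketch, assuming the same σ = 3 and the formula from `sample_size_mean`; `n_per_variant_mean` is a hypothetical helper that returns the unrounded value.

```python
from scipy.stats import norm


def n_per_variant_mean(sigma, delta, alpha=0.05, beta=0.20):
    """Unrounded per-variant sample size for a difference in means."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return 2 * sigma**2 * z**2 / delta**2


sigma, baseline_mean = 3, 10
for delta in (0.05, 0.10, 0.20):
    n = n_per_variant_mean(sigma, delta)
    print(f"delta={delta} ({delta / baseline_mean:.1%} of the mean): ~{round(n)} per variant")
```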
        
    


In [101]:
p=0
sd= (0.0628 ** 2 ) * 1000
# For a 0/1 (Bernoulli) metric, the variance is p - p^2 = p*(1 - p); the sd is its square root
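The comment above points at the link between the two sections: a 0/1 metric has variance p - p² = p(1 - p), so the Means formula with σ² = p(1 - p) gives an approximation to the Proportions result. A sketch, assuming the values from the Proportions example (4.3% baseline, 1 percentage point absolute MDE); the approximation comes out somewhat lower because it uses the baseline variance for both groups, while the Evan Miller formula uses p2's variance in the power term.

```python
import numpy as np
from scipy.stats import norm

p = 0.043            # baseline conversion rate from the Proportions example
delta = 0.01         # absolute change (1 percentage point)
variance = p - p**2  # Bernoulli variance, equal to p * (1 - p)

z_alpha, z_beta = norm.ppf(0.975), norm.ppf(0.80)

# Means formula with sigma^2 = p(1 - p)
n_means_approx = 2 * variance * (z_alpha + z_beta) ** 2 / delta**2

# Formula used in em_sample_size_proportion
p2 = p + delta
n_em = (z_alpha * np.sqrt(2 * p * (1 - p))
        + z_beta * np.sqrt(p * (1 - p) + p2 * (1 - p2))) ** 2 / delta**2

print(round(n_means_approx), round(n_em))
```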