**Goal** - In this project, we will take a deep dive into the power analysis required for a successful implementation of A/B testing. The focus is on executing power analysis in Python.

I have created a hypothetical business problem inspired by various A/B testing use cases I have worked on in my past organizations. 

**Business Problem** - ABC Co, a monthly subscription based grocery supplies business, is looking to expand its member base. 
The pricing team proposes a 10% discount on the first month's membership fee to attract new members and suggests revamping the current Direct Mail marketing campaign to prominently highlight this offer. They hypothesize that this offer can increase the subscription rate from 15% to 18%.

Let's assume this team has already assessed the cost-effectiveness of this offer and concluded that, even with the 10% discount, the first month of enrollment will be ROI positive if the enrollment rate increases by 3 percentage points.

The marketing team wants to know whether this incentive will indeed increase the enrollment rate from 15% to 18%.

You, the Data Scientist at ABC Co., decide to set up an A/B testing experiment to assess this hypothesis.

Before we dive deeper into the A/B testing experiment setup, let's review some terms and metrics.

**What is A/B testing?**
A/B testing is a way to compare multiple versions of a single variable, for example by testing a subject's response to variant A against variant B, and determining which of the variants is more effective. (Wikipedia Definition)

Let's map this definition to the business problem. We want to compare the enrollment rate (the subject's response) for 2 versions of the marketing Direct Mail (the current version vs the new version with the 10% discount) to know which of these variants is more effective.
 

**Metric Definitions**

Enrollment Rate = Number of audience enrolled/ total audience targeted 

Current Enrollment Rate (baseline) = 15% 

Expected Enrollment Rate = 18% 

Absolute Lift = Expected Enrollment Rate - Current Enrollment Rate

Relative Lift = (Expected Enrollment Rate - Current Enrollment Rate)/ Current Enrollment Rate
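
As a quick worked example of the two lift definitions above, here is a minimal sketch that plugs in the 15% and 18% rates from the business problem. The variable names are illustrative only.

```python
# Rates taken from the business problem
baseline_rate = 0.15   # current enrollment rate
expected_rate = 0.18   # hypothesized enrollment rate with the discount

absolute_lift = expected_rate - baseline_rate
relative_lift = absolute_lift / baseline_rate

print(f"Absolute lift: {absolute_lift:.1%} points")
print(f"Relative lift: {relative_lift:.1%}")
```

The absolute lift is 3 percentage points, while the relative lift is 20%. Stating which one is meant avoids confusion when discussing the target with stakeholders.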


To design the A/B test experiment, follow the steps below- 

**Step 1** - Clearly define the Null hypothesis and Alternate hypothesis 

We have two target audience groups- 

Treatment - will receive Direct Mail with 10% discount offer

Control - will receive current version of Direct Mail (meaning no discounts) 

Let's say we run the marketing campaign and record the enrollment rate at the end of the campaign. Let x be the enrollment rate for Treatment and y be the enrollment rate for Control.

**Null hypothesis (H0): x = y**

The enrollment rates for the Treatment and Control groups are not different. So if you observe a difference, it is likely due to chance.


**Alternate hypothesis (H1): x > y**

The enrollment rate for Treatment is higher than for Control, by more than random chance alone would plausibly produce. This is the challenger to the null hypothesis.


**Step 2** - Identify the minimum sample size (number of target audience members per group) needed to detect the expected lift while keeping the Type I and Type II error rates at acceptable levels

**Type I error** - Incorrectly rejecting the Null hypothesis when it is actually true

**Type II error** - Incorrectly failing to reject the Null hypothesis when it is actually false

Controlling both error rates means that a significant result is unlikely to be a false alarm and that a real lift is unlikely to be missed.

To achieve this, we will implement hypothesis testing, which outputs the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. This is called the **p-value**.
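
To make the p-value concrete, below is a minimal sketch of a one-sided two-proportion z-test written with the Python standard library only. The enrollment counts are hypothetical, made up purely for illustration, and the function name `one_sided_p_value` is not part of any library.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p_value(enrolled_t, n_t, enrolled_c, n_c):
    """One-sided two-proportion z-test for H1: treatment rate > control rate."""
    rate_t = enrolled_t / n_t
    rate_c = enrolled_c / n_c
    # Pooled rate under the null hypothesis that both groups share one rate
    pooled = (enrolled_t + enrolled_c) / (n_t + n_c)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z_stat = (rate_t - rate_c) / std_err
    # Probability of a z statistic at least this large if the null is true
    p_value = 1 - NormalDist().cdf(z_stat)
    return z_stat, p_value

# Hypothetical campaign results, made up purely for illustration
z_stat, p_value = one_sided_p_value(enrolled_t=340, n_t=1890, enrolled_c=284, n_c=1890)
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
print("Reject H0 at the 5% level" if p_value < 0.05 else "Fail to reject H0")
```

If the p-value falls below the chosen significance level, we reject the null hypothesis. Otherwise we fail to reject it, which is not the same as proving it true.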


**The following parameters are key in power analysis -**

**Baseline Rate** - This is the current enrollment rate. In our case, 15%. It feeds into the effect size.

**Effect Size** - This is the magnitude of the difference in enrollment rates between the two groups.

Cohen’s d is one measure of effect size, and it is the one we will use here.

**Sample Size** - This is the minimum number of samples required in each group (Treatment and Control) to reliably detect the effect. This is the value we need to find.

**Statistical Significance level** - This is the probability threshold (alpha) against which we compare the p-value from hypothesis testing. It is typically set to 5% or 1%, and it caps the Type I error rate.

**Statistical Power** - This is the probability of correctly rejecting the null hypothesis when it is false. It is usually set to 80%, which caps the Type II error rate at 20%.

The power analysis function links four of these variables: effect size, sample size, significance level and power. If you know any 3 of them, you can find the value of the 4th.
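
To illustrate how these quantities depend on each other, here is a minimal sketch that solves for power when effect size, sample size and significance level are known. It assumes a one-sided z-test with two equally sized groups and uses the normal approximation. The helper name `power_one_sided` is hypothetical, and the effect size is the rounded value calculated later in this write-up.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(effect_size, n_per_group, alpha=0.05):
    """Power of a one-sided z-test with two equally sized independent groups."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # With equal groups the test statistic shifts by effect_size * sqrt(n / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return 1 - NormalDist().cdf(z_crit - shift)

d = 0.0809  # effect size value reported later in this write-up (rounded)
for n in (500, 1000, 1890, 3000):
    print(f"n per group = {n:>4}: power = {power_one_sided(d, n):.3f}")
```

Power rises as the sample size grows, which is why fixing a target power lets us solve for the smallest sample size that reaches it.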

Below is the power analysis implementation in Python.

Our goal is to find the minimum sample size with a statistical significance level of 5%, statistical power of 80% and the effect size (which we will calculate).
Let's say we want to have a 50/50 distribution of sample audience in each group.





**Effect Size Calculation**

Cohen’s d formula for effect size:
d = (x1 - x2) / s
where x1 is the mean of group 1, x2 is the mean of group 2 and s is the pooled standard deviation of both groups.

In this case, the metric is enrollment rate with a binary outcome (enroll/ not enroll). So, based on the Bernoulli distribution:
Mean of treatment group is expected enrollment rate (p1) = 0.18  

Standard deviation of treatment group (s1) = sqrt(p1(1-p1))

Mean of control group is baseline enrollment rate (p2) = 0.15 

Standard deviation of control group (s2) = sqrt(p2(1-p2))

pooled standard deviation = sqrt((s1^2 + s2^2)/2)
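
Plugging the two rates into the formulas above gives the following worked arithmetic (intermediate values are rounded):

```latex
\begin{aligned}
s_1^2 &= p_1(1-p_1) = 0.18 \times 0.82 = 0.1476 \\
s_2^2 &= p_2(1-p_2) = 0.15 \times 0.85 = 0.1275 \\
s &= \sqrt{\frac{s_1^2 + s_2^2}{2}} = \sqrt{0.13755} \approx 0.3709 \\
d &= \frac{p_1 - p_2}{s} = \frac{0.03}{0.3709} \approx 0.0809
\end{aligned}
```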

Below is the Python function I created to calculate the effect size.

In [2]:
import numpy as np
import pandas as pd

# baseline_rate = 0.15 (15%)
# expected_rate = 0.18 (18%)

def calculate_effect_size(baseline_rate, expected_rate):
    # Calculate standard deviations for both rates
    std_base = np.sqrt(baseline_rate * (1 - baseline_rate))
    std_exp = np.sqrt(expected_rate * (1 - expected_rate))
    
    # Calculate the pooled standard deviation
    pooled_std = np.sqrt((std_base**2 + std_exp**2) / 2)
    
    # Calculate the effect size
    effect_size = (expected_rate - baseline_rate) / pooled_std
    return effect_size

effect_size = calculate_effect_size(0.15, 0.18)
print(f'effect size is {effect_size}')

effect size is 0.0808892776909605
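
As a sanity check on this value, a commonly used alternative effect size for two proportions is Cohen's h, based on the arcsine transformation. The sketch below is not the method used in this write-up; it only compares the two measures for our rates. The function names are illustrative. To my knowledge, statsmodels offers `proportion_effectsize` in `statsmodels.stats.proportion` for the same arcsine-based measure.

```python
from math import asin, sqrt

def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

def cohens_d_pooled(p1, p2):
    """Effect size as defined in this write-up (pooled Bernoulli std)."""
    pooled_std = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p1 - p2) / pooled_std

h = cohens_h(0.18, 0.15)
d = cohens_d_pooled(0.18, 0.15)
print(f"Cohen's h = {h:.4f}, pooled-std d = {d:.4f}")
```

For rates this close together the two measures nearly coincide, so the choice between them has little impact on the resulting sample size here.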


**Power Analysis** 

Here we will use the Python package statsmodels.stats.power to calculate the sample size.

There are different power analysis functions in this package for t-tests, normal-based tests, F-tests and the chi-square goodness-of-fit test.

In our case a Z test is suitable, so we will use the zt_ind_solve_power() function.
(reference - https://www.statsmodels.org/dev/stats.html#proportion) 

We will input the effect size calculated above along with statistical significance level of 5% (alpha) and statistical power of 80%.


In [3]:
import statsmodels.stats.power as smp



# effect_size is reused from the calculation above (Cohen's d, about 0.081)
alpha = 0.05       # Significance level
power = 0.8        # Desired power
nobs1 = None       # Number of observations in group 1 (None = unknown); this is the sample size we solve for

# Additional parameters
ratio = 1  # ratio = nobs2 / nobs1; 1 gives a 50/50 split between Treatment and Control

# The alternative hypothesis is x > y, so we use a one-sided test.
# The default is 'two-sided', which fits an alternative of x not equal to y.
# Use 'smaller' for an alternative of x < y.
alternative = 'larger'

# Perform power analysis
power_analysis = smp.zt_ind_solve_power(effect_size=effect_size, nobs1=nobs1, alpha=alpha, power=power, ratio=ratio, alternative=alternative)

print("Required Sample Size:", power_analysis)



Required Sample Size: 1889.8016605999433


The result above suggests that we need at least 1890 target audience members in each group to detect the expected lift with 80% power at a 5% significance level when the A/B test is conducted.
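
The reported number can be cross-checked against the closed-form sample size formula for a one-sided z-test under the normal approximation. This is a minimal sketch using only the standard library. The helper name `required_n1` is hypothetical, and the formula is my reading of what the solver computes rather than something stated by the package.

```python
from math import ceil
from statistics import NormalDist

def required_n1(effect_size, alpha=0.05, power=0.8, ratio=1.0):
    """Size of group 1 for a one-sided z-test; group 2 has ratio * n1 members."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    return (1 + 1 / ratio) * ((z_alpha + z_power) / effect_size) ** 2

n1 = required_n1(0.0808892776909605)
print(f"Required n per group: {n1:.1f} -> round up to {ceil(n1)}")
```

The sample size is always rounded up, since rounding down would leave the test slightly below the target power.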

So, let's say we have a pool of 60,000 target audience members.
From these we will take a random sample of 2 times 1890, i.e. 3780.

Random sampling helps make the sample representative of the population.

We will then split the sample into two groups, Treatment and Control, following a 50-50 distribution.

Below is example code to achieve this.

In [4]:
#creating a dataframe which stores ID of  60,000 target audience 
df = pd.DataFrame({'id': range(1, 60001)})

print(df.head())

   id
0   1
1   2
2   3
3   4
4   5


In [5]:
#take random sample of size 3780 from this pool 
df_sample = df.sample(3780)
print(df_sample.head())

          id
18158  18159
31122  31123
2831    2832
16058  16059
46307  46308


In [None]:
#Create Treatment and Control group 
Treatment_df = df_sample.sample(1890)

#The control group is all IDs in df_sample excluding the ones in Treatment 
Control_df = df_sample[~df_sample['id'].isin(Treatment_df['id'])]
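
The same split can be sketched without pandas and with a fixed seed, so that the assignment is reproducible and easy to verify. This is an illustrative alternative, not the code used above, and the seed value 42 is an arbitrary choice. In pandas, passing `random_state` to `DataFrame.sample` serves the same purpose.

```python
import random

rng = random.Random(42)  # fixed seed, an arbitrary choice for reproducibility
audience_ids = range(1, 60001)

# Draw 3780 distinct IDs, then split the randomly ordered draw in half
sample_ids = rng.sample(audience_ids, 3780)
treatment_ids = set(sample_ids[:1890])
control_ids = set(sample_ids[1890:])

# Group sizes and the number of IDs that appear in both groups
print(len(treatment_ids), len(control_ids), len(treatment_ids & control_ids))
```

Checking that both groups have the intended size and share no IDs is a cheap safeguard before the campaign is sent out.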


**Callout**

Often, in a business setting, we want to optimize profits while we continue learning what works and what does not via A/B testing.

Hence, in a business case like this we may want to explore an 80-20 distribution of Treatment and Control - meaning 80% of the audience will be in Treatment and get the discount offer, while 20% will be in Control.

For this scenario, we can rerun the previous power analysis function with the ratio parameter changed to 4 and find the new sample size requirements.

Let us run the code below.

In [6]:

# effect_size is reused from the calculation above (Cohen's d, about 0.081)
alpha = 0.05       # Significance level
power = 0.8        # Desired power
nobs1 = None       # Number of observations in group 1 (None = unknown); this is the sample size we solve for

# Additional parameters
ratio = 4  # ratio = nobs2 / nobs1; 4 gives an 80/20 split, with group 1 (Control) as the smaller group

# The alternative hypothesis is x > y, so we use a one-sided test.
# The default is 'two-sided', which fits an alternative of x not equal to y.
# Use 'smaller' for an alternative of x < y.
alternative = 'larger'

# Perform power analysis
power_analysis = smp.zt_ind_solve_power(effect_size=effect_size, nobs1=nobs1, alpha=alpha, power=power, ratio=ratio, alternative=alternative)

print("Required Sample Size:", power_analysis)



Required Sample Size: 1181.1260396328687


**Interpreting the sample size for 80-20 distribution**

In this case, the sample size we get from power analysis is 1181.13, which we round up to 1182. This is the minimum sample size of the smaller group, i.e. Control (nobs1 in the function call).

To get the sample size for Treatment we multiply this by 4 (remember the ratio we set).
So, the sample size for Treatment is 4728.
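
The group sizes for the 80-20 design can be worked out explicitly, rounding the Control size up to the next whole person rather than down so that power does not fall below the target. This is a small arithmetic sketch based on the two outputs reported above. The variable names are illustrative.

```python
from math import ceil

n_control_raw = 1181.1260396328687   # nobs1 reported by the 80-20 power analysis
ratio = 4                            # Treatment size / Control size

n_control = ceil(n_control_raw)      # round up so power does not fall below target
n_treatment = ratio * n_control
total_80_20 = n_control + n_treatment
total_50_50 = 2 * 1890               # from the 50-50 power analysis

print(f"Control: {n_control}, Treatment: {n_treatment}, Total: {total_80_20}")
print(f"Extra audience needed vs 50-50 split: {total_80_20 - total_50_50}")
```

The unbalanced design needs a larger total audience than the balanced one for the same power, which is the price of giving the offer to a bigger share of members during the test.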