# Implementing A/B Tests in Python

A/B testing is one of the most important tools for optimizing the things we interact with on our computers, phones and tablets. From website layouts to social media ads and product features, almost every button, banner and call to action has probably been A/B tested. These tests can be extremely granular: Google famously tested 41 shades of blue to decide which shade to use for links on the Google and Gmail landing pages.

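In this project we run an A/B test on landing-page conversion data: we clean the data, check the required sample size, and then test whether the new page converts better than the old one.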

In [1]:
#import packages
import pandas as pd
import math
import statsmodels.stats.api as sms
import scipy.stats as st



In [2]:
#import data
raw_data = pd.read_csv("/content/ab_data.csv")
df = raw_data.copy()

print("Number of rows: ", df.shape[0], " Number of columns: ", df.shape[1])
df.head()

Number of rows:  262139  Number of columns:  5


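|   | user_id | timestamp | group | landing_page | converted |
|---|---------|-----------|-------|--------------|-----------|
| 0 | 851104 | 11:48.6 | control | old_page | 0.0 |
| 1 | 804228 | 01:45.2 | control | old_page | 0.0 |
| 2 | 661590 | 55:06.2 | treatment | new_page | 0.0 |
| 3 | 853541 | 28:03.1 | treatment | new_page | 0.0 |
| 4 | 864975 | 52:26.2 | control | old_page | 1.0 |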


In [3]:
df["group"].value_counts()

treatment    131137
control      131002
Name: group, dtype: int64

In [4]:
#some control users saw the new_page and some treatment users saw the old_page; drop these mismatched rows
mask1 = (df["group"] == "control") & (df["landing_page"] == "new_page")
index_to_drop1 = df[mask1].index
df = df.drop(index_to_drop1)

mask2 = (df["group"] == "treatment") & (df["landing_page"] == "old_page")
index_to_drop2 = df[mask2].index
df = df.drop(index_to_drop2)

print(df.shape)
df["group"].value_counts()

(258685, 5)


treatment    129379
control      129306
Name: group, dtype: int64
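
The two drop steps above can also be expressed as a single filter. The following is a minimal sketch on hypothetical toy data (the `toy` frame and its values are made up for illustration; only the column names come from the dataset). It maps each group to the page it should have seen and keeps the rows where the actual page matches.

```python
import pandas as pd

# Hypothetical toy data with the same column names as ab_data.csv
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "group": ["control", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page"],
    "converted": [0, 1, 0, 1],
})

# The page each group is supposed to see
expected_page = toy["group"].map({"control": "old_page", "treatment": "new_page"})

# Keep only rows where the page shown matches the assigned group
aligned = toy[toy["landing_page"] == expected_page]
print(aligned)
```

Keeping the matching rows rather than dropping the mismatched ones gives the same result for these two groups and avoids the intermediate index lists.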

In [5]:
#Check how many duplicated users exist
print(df["user_id"].count())
print(df["user_id"].nunique())

258685
258684
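
The counts differ by one, so one user appears twice. Before dropping anything it can be worth looking at the repeated rows, since the copies may disagree on `converted`. A small sketch on hypothetical toy data (the `toy` frame is made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: user 11 appears twice with different outcomes
toy = pd.DataFrame({
    "user_id": [10, 11, 11, 12],
    "converted": [0, 0, 1, 0],
})

# keep=False flags every row of a repeated user_id, not only the later copies
repeated = toy[toy["user_id"].duplicated(keep=False)]
print(repeated)

# keep="first" retains the earliest row per user, as in the cell that follows
deduped = toy.drop_duplicates(subset="user_id", keep="first")
print(deduped)
```

In this toy case `keep="first"` discards the row where the user converted, which shows why the choice of which copy to keep can matter.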


In [7]:
#drop duplicated users
df.drop_duplicates(subset ='user_id',keep ='first',inplace = True)

In [8]:
#Show the % split between users who saw new vs old page
#Count users and conversions in each group
mask = (df["group"] == "control")
conversions_control = df["converted"][mask].sum()
total_users_control = df["converted"][mask].count()

mask = (df["group"] == "treatment")
conversions_treatment = df["converted"][mask].sum()
total_users_treatment = df["converted"][mask].count()

print("Split of control users who saw old page vs treatment users who saw new page: ", 
          round(total_users_control / df["converted"].count() * 100, 2), "% ",
          round((total_users_treatment / df["converted"].count()) * 100, 2), "%")

#count number of users who converted in each group
print("Number of control users who converted on old page: ", conversions_control)
print("Percentage of control users who converted: ", round((conversions_control / total_users_control) * 100, 2), "%")

print("Number of treatment users who converted on new page: ", conversions_treatment)
print("Percentage of treatment users who converted: ", round((conversions_treatment/ total_users_treatment) * 100, 2), "%")

Split of control users who saw old page vs treatment users who saw new page:  49.99 %  50.01 %
Number of control users who converted on old page:  15621.0
Percentage of control users who converted:  12.08 %
Number of treatment users who converted on new page:  15354.0
Percentage of treatment users who converted:  11.87 %
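
The per-group counts and rates above can also be produced in one step with a `groupby`. This is a sketch on hypothetical toy data (the `toy` frame and the output column names `users`, `conversions` and `rate` are chosen here for illustration):

```python
import pandas as pd

# Hypothetical toy data: 4 users per group
toy = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "converted": [1, 0, 0, 0, 1, 1, 0, 0],
})

# One row per group: number of users, number of conversions, conversion rate
summary = toy.groupby("group")["converted"].agg(
    users="count", conversions="sum", rate="mean"
)
print(summary)
```

Because `converted` is 0 or 1, its mean is the conversion rate, so no separate division is needed.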


# Set Test Parameters

In [9]:
#Check what sample size is required
baseline_rate = conversions_control / total_users_control
practical_significance = 0.01 #user defined
confidence_level = 0.05 #user defined significance level (alpha); 0.05 corresponds to a 95% confidence interval
sensitivity = 0.8 #user defined statistical power (1 - beta)

effect_size = sms.proportion_effectsize(baseline_rate, baseline_rate + practical_significance)
sample_size = sms.NormalIndPower().solve_power(effect_size = effect_size, power = sensitivity, 
                                               alpha = confidence_level, ratio=1)
print("Required sample size: ", round(sample_size), " per group")

Required sample size:  17258  per group
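
To make the calculation less of a black box, here is a sketch of the normal-approximation formula for equal group sizes, written with only the standard library. It assumes the effect size is Cohen's h (the difference of arcsine-transformed proportions) and ignores the negligible far-tail term of the two-sided test, so it should land close to the statsmodels result rather than reproduce it exactly. The function name `required_sample_size` and the rounded baseline rate are illustrative choices.

```python
import math
from statistics import NormalDist

def required_sample_size(baseline_rate, min_lift, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided z-test on two proportions."""
    # Cohen's h: difference of the arcsine-transformed proportions
    h = 2 * math.asin(math.sqrt(baseline_rate + min_lift)) \
        - 2 * math.asin(math.sqrt(baseline_rate))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    # Factor 2 accounts for two independent groups of equal size
    return 2 * ((z_alpha + z_power) / h) ** 2

# Baseline rate rounded from the control group above; 0.01 is the practical significance
n = required_sample_size(0.1208, 0.01)
print("Approximate required sample size per group:", round(n))
```

The required size grows with the inverse square of the effect size, so halving the lift we want to detect roughly quadruples the number of users needed per group.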


# A/B Test

In [11]:
#Calculate pooled probability
mask = (df["group"] == "control")
conversions_control = df["converted"][mask].sum()
total_users_control = df["converted"][mask].count()

mask = (df["group"] == "treatment")
conversions_treatment = df["converted"][mask].sum()
total_users_treatment = df["converted"][mask].count()

prob_pooled = (conversions_control + conversions_treatment) / (total_users_control + total_users_treatment)

In [13]:
#Calculate pooled standard error and margin of error
se_pooled = math.sqrt(prob_pooled * (1 - prob_pooled) * (1 / total_users_control + 1 / total_users_treatment))
z_score = st.norm.ppf(1 - confidence_level / 2)
margin_of_error = se_pooled * z_score

#Calculate d_hat, the estimated difference in conversion rate between the treatment and control groups
d_hat = (conversions_treatment / total_users_treatment) - (conversions_control / total_users_control)

#Reject the null hypothesis only if the whole confidence interval lies above the practical significance threshold
lower_bound = d_hat - margin_of_error
upper_bound = d_hat + margin_of_error

if practical_significance < lower_bound:
    print("Reject null hypothesis")
else: 
    print("Do not reject the null hypothesis")
    
print("The lower bound of the confidence interval is ", round(lower_bound * 100, 2), "%")
print("The upper bound of the confidence interval is ", round(upper_bound * 100, 2), "%")

Do not reject the null hypothesis
The lower bound of the confidence interval is  -0.46 %
The upper bound of the confidence interval is  0.04 %
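
The confidence interval above can be complemented with a p-value from the same pooled standard error. The sketch below is a two-sided two-proportion z-test using only the standard library. The function name `two_proportion_z_test` and the counts in the example call are hypothetical, not taken from the dataset.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference rate_b - rate_a."""
    # Pooled conversion rate under the null hypothesis of no difference
    p_pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Probability of a difference at least this extreme in either direction
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 120 of 1000 control users vs 150 of 1000 treatment users converted
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print("z =", round(z, 3), "p-value =", round(p_value, 4))
```

With the same pooled standard error, a p-value above the significance level corresponds to a confidence interval around the difference that contains zero, which is the situation in the output above.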
