# A/B TESTING

## Case:

Facebook recently introduced a new bidding type, 'average bidding', as an alternative to the existing 'maximum bidding' type. Purchase is taken as the final success criterion, and a statistical test is needed to decide whether the average number of purchases differs between the two bidding types.

### Libraries & Packages 

In [1]:
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#pip install statsmodels
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, \
    pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.width', 500)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

### Data 

In [2]:
df_kontrol = pd.read_excel("ab_testing.xlsx", sheet_name="Control Group")  # control: existing maximum bidding
df_test = pd.read_excel("ab_testing.xlsx", sheet_name="Test Group")  # test: new average bidding

In [3]:
def check_df(dataframe, head=5):
    print("##################### Shape #####################")
    print(dataframe.shape)
    print("##################### Types #####################")
    print(dataframe.dtypes)
    print("##################### Head #####################")
    print(dataframe.head(head))
    print("##################### Tail #####################")
    print(dataframe.tail(head))
    print("##################### NA #####################")
    print(dataframe.isnull().sum())
    print("##################### Quantiles #####################")
    print(dataframe.describe([0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.80, 0.95, 0.99, 1]).T)

In [4]:
check_df(df_kontrol) 

##################### Shape #####################
(40, 4)
##################### Types #####################
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
##################### Head #####################
    Impression      Click  Purchase    Earning
0  82529.45927 6090.07732 665.21125 2311.27714
1  98050.45193 3382.86179 315.08489 1742.80686
2  82696.02355 4167.96575 458.08374 1797.82745
3 109914.40040 4910.88224 487.09077 1696.22918
4 108457.76263 5987.65581 441.03405 1543.72018
##################### Tail #####################
     Impression      Click  Purchase    Earning
35 132064.21900 3747.15754 551.07241 2256.97559
36  86409.94180 4608.25621 345.04603 1781.35769
37 123678.93423 3649.07379 476.16813 2187.72122
38 101997.49410 4736.35337 474.61354 2254.56383
39 121085.88122 4285.17861 590.40602 1289.30895
##################### NA #####################
Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64

In [5]:
check_df(df_test) 

##################### Shape #####################
(40, 4)
##################### Types #####################
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
##################### Head #####################
    Impression      Click  Purchase    Earning
0 120103.50380 3216.54796 702.16035 1939.61124
1 134775.94336 3635.08242 834.05429 2929.40582
2 107806.62079 3057.14356 422.93426 2526.24488
3 116445.27553 4650.47391 429.03353 2281.42857
4 145082.51684 5201.38772 749.86044 2781.69752
##################### Tail #####################
     Impression      Click  Purchase    Earning
35  79234.91193 6002.21358 382.04712 2277.86398
36 130702.23941 3626.32007 449.82459 2530.84133
37 116481.87337 4702.78247 472.45373 2597.91763
38  79033.83492 4495.42818 425.35910 2595.85788
39 102257.45409 4800.06832 521.31073 2967.51839
##################### NA #####################
Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64

### Mean purchase of the two groups

In [6]:
df_kontrol["Purchase"].mean() 

550.8940587702316

In [7]:
df_test["Purchase"].mean() 

582.1060966484675

### Concatenate the two data sets

In [8]:
df_kontrol["Project_Name"] = "Maximum Bidding"
df_test["Project_Name"] = "Average Bidding"

df = pd.concat([df_kontrol, df_test])

In [9]:
df.head()

    Impression      Click  Purchase    Earning     Project_Name
0  82529.45927 6090.07732 665.21125 2311.27714  Maximum Bidding
1  98050.45193 3382.86179 315.08489 1742.80686  Maximum Bidding
2  82696.02355 4167.96575 458.08374 1797.82745  Maximum Bidding
3 109914.40040 4910.88224 487.09077 1696.22918  Maximum Bidding
4 108457.76263 5987.65581 441.03405 1543.72018  Maximum Bidding


In [10]:
df.shape

(80, 5)
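
One detail worth noting: `pd.concat` keeps each frame's original index by default, so the labels 0 to 39 appear twice in `df`. The boolean filters used later are unaffected, but a unique index can be obtained with `ignore_index=True`. A minimal sketch on toy frames (the values below are made up for illustration and are not taken from `ab_testing.xlsx`):

```python
import pandas as pd

# Toy stand-ins for df_kontrol and df_test (illustrative values only).
control = pd.DataFrame({"Purchase": [500.0, 520.0, 480.0]})
test = pd.DataFrame({"Purchase": [530.0, 510.0, 560.0]})
control["Project_Name"] = "Maximum Bidding"
test["Project_Name"] = "Average Bidding"

stacked = pd.concat([control, test])                       # index: 0, 1, 2, 0, 1, 2
reindexed = pd.concat([control, test], ignore_index=True)  # index: 0..5

print(stacked.index.is_unique)    # False
print(reindexed.index.is_unique)  # True
```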

### A/B Testing

#### Hypothesis

H0 : μ1 = μ2   # There is no statistically significant difference between the mean purchases of the two bidding types.

H1 : μ1 ≠ μ2   # There is a statistically significant difference between the mean purchases of the two bidding types.

In [11]:
df.groupby("Project_Name").agg({"Purchase": "mean"})

                 Purchase
Project_Name
Average Bidding 582.10610
Maximum Bidding 550.89406


#### Assumptions

##### 1. Normal Distribution 

H0 :  The data are normally distributed (the normality assumption holds).

H1 :  The data are not normally distributed (the normality assumption does not hold).

In [12]:
test_stat, pvalue = shapiro(df.loc[df["Project_Name"] == "Maximum Bidding", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


In [13]:
test_stat, pvalue = shapiro(df.loc[df["Project_Name"] == "Average Bidding", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9589, p-value = 0.1541


In [14]:
# p-value > 0.05 for both groups, so H0 cannot be rejected: there is no evidence against the normality assumption.

##### 2. Homogeneity of variances

H0 :  The variances of the two groups are equal (homogeneity of variances holds).

H1 :  The variances of the two groups are not equal (homogeneity of variances does not hold).

In [15]:
test_stat, pvalue = levene(df.loc[df["Project_Name"] == "Maximum Bidding", "Purchase"],
                           df.loc[df["Project_Name"] == "Average Bidding", "Purchase"])

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


In [16]:
# p-value > 0.05, so H0 cannot be rejected: there is no evidence against homogeneity of variances.

#### Final Test

Since neither the normality assumption nor the homogeneity-of-variance assumption is rejected, an independent two-sample t-test (a parametric test) is applied.
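
The selection rule stated above can be sketched as a small helper. This is only an illustration: `compare_groups` is a hypothetical name, the data are synthetic, and the fallbacks (Mann-Whitney U when normality is rejected, Welch's t-test when only the variances differ) follow a common convention that this notebook does not itself show.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_groups(a, b, alpha=0.05):
    """Hypothetical helper: choose a two-sample test from the assumption checks."""
    # Step 1: Shapiro-Wilk on each group; normality must hold for both.
    normal = shapiro(a)[1] > alpha and shapiro(b)[1] > alpha
    if not normal:
        # Assumed fallback (not shown in the notebook): rank-based test.
        stat, p = mannwhitneyu(a, b, alternative="two-sided")
        return "mannwhitneyu", stat, p
    # Step 2: Levene's test decides between the pooled and Welch t-tests.
    equal_var = levene(a, b)[1] > alpha
    stat, p = ttest_ind(a, b, equal_var=equal_var)
    return ("student_t" if equal_var else "welch_t"), stat, p


# Synthetic data for illustration only (not the ab_testing.xlsx values).
rng = np.random.default_rng(42)
group_a = rng.normal(loc=550, scale=130, size=40)
group_b = rng.normal(loc=580, scale=160, size=40)
name, stat, p = compare_groups(group_a, group_b)
print(name, round(float(stat), 4), round(float(p), 4))
```

Returning the name of the chosen test alongside the statistic keeps the decision visible in the output, which helps when the same check is repeated for other metrics such as `Click` or `Earning`.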

In [17]:
test_stat, pvalue = ttest_ind(df.loc[df["Project_Name"] == "Maximum Bidding", "Purchase"],
                              df.loc[df["Project_Name"] == "Average Bidding", "Purchase"],
                              equal_var=True)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493
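
To make explicit what `ttest_ind` computes with `equal_var=True`, here is a minimal sketch of the pooled-variance t statistic using only the standard library. The function name `pooled_t` and the toy samples are illustrative and do not come from the notebook.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (illustrative helper)."""
    n1, n2 = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


# Toy samples, not the notebook's data.
t_stat, dof = pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 4), dof)  # -1.8974 8
```

The statistic is the first group's mean minus the second's, divided by the standard error, so the negative value reported above simply reflects that the Maximum Bidding mean is lower than the Average Bidding mean.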


In [18]:
# H0 cannot be rejected because p-value > 0.05.
# Therefore, no statistically significant difference was found between the mean purchases of the two bidding types.
# Although the sample means differ (550.89 vs 582.11), the test gives no evidence that this difference is more than chance; it does not prove that the means are equal.
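
The conclusion above can be complemented with a confidence interval for the difference in means. The sketch below is an approximation under stated assumptions: it recovers the standard error from the reported group means and the rounded t statistic rather than from the raw data, and all variable names are illustrative.

```python
from scipy import stats

# Values copied from the outputs above; t_stat is rounded to 4 decimals,
# so the derived standard error is approximate.
mean_control = 550.8940587702316   # Maximum Bidding
mean_test = 582.1060966484675      # Average Bidding
t_stat = -0.9416
n_control = n_test = 40

diff = mean_control - mean_test    # same sign convention as ttest_ind
se = diff / t_stat                 # since t = diff / se
dof = n_control + n_test - 2       # pooled-variance degrees of freedom
t_crit = stats.t.ppf(0.975, dof)   # two-sided 95% critical value
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
print(f"difference = {diff:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Because the p-value exceeds 0.05, the interval necessarily contains zero; its width shows how large a true difference in either direction remains compatible with the data.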