  Business Problem

  One of our customers has decided to test the new feature and wants to run an A/B test
 to see whether averagebidding converts more than maximumbidding.

  The A/B test has been running for 1 month, and the customer is now waiting for you to analyze its results. The ultimate success criterion for the customer is Purchase, so the statistical tests should focus on the Purchase metric.

 Dataset Summary

 The dataset contains a company's website information: the number of advertisements that users saw and clicked, as well as the earnings these generated. There are two separate datasets, one for the control group and one for the test group. maximumbidding was applied to the control group and averagebidding was applied to the test group.

*  Impression: Ad views
*  Click: Number of clicks on the displayed ad
*  Purchase: Number of products purchased after the ads were clicked
*  Earning: Earnings from the purchased products

In [1]:
# import lib

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, pearsonr, spearmanr, kendalltau, f_oneway, kruskal

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)
pd.set_option('display.width', 225)


In [3]:
# datasets

test_data = pd.read_excel("../input/ab-testing/ab_testing.xlsx", sheet_name = "Test Group")
control_data = pd.read_excel("../input/ab-testing/ab_testing.xlsx", sheet_name = "Control Group")

test_data_copy = test_data.copy()
control_data_copy = control_data.copy()

In [4]:
def check_data(dataframe):
    print("###Shape###")
    print(dataframe.shape, "\n")
    print("###Head###")
    print(dataframe.head(10), "\n")
    print("###Info###")
    # info() prints its summary itself and returns None, so call it directly
    dataframe.info()
    print()
    print("###Type###")
    print(dataframe.dtypes, "\n")
    print("###NA###")
    print(dataframe.isnull().sum(), "\n")
    print("###Tail###")
    # tail is a method; without the parentheses only the bound method is printed
    print(dataframe.tail(), "\n")
    print("###Describe###")
    print(dataframe.describe(), "\n")

check_data(control_data)

###Shape###
(40, 4) 

###Head###
    Impression      Click  Purchase    Earning
0  82529.45927 6090.07732 665.21125 2311.27714
1  98050.45193 3382.86179 315.08489 1742.80686
2  82696.02355 4167.96575 458.08374 1797.82745
3 109914.40040 4910.88224 487.09077 1696.22918
4 108457.76263 5987.65581 441.03405 1543.72018
5  77773.63390 4462.20659 519.66966 2081.85185
6  95110.58627 3555.58067 512.92875 1815.00661
7 106649.18307 4358.02704 747.02012 1965.10040
8 122709.71659 5091.55896 745.98568 1651.66299
9  79498.24866 6653.84552 470.50137 2456.30424 


In [5]:
check_data(test_data)

###Shape###
(40, 4) 

###Head###
    Impression      Click  Purchase    Earning
0 120103.50380 3216.54796 702.16035 1939.61124
1 134775.94336 3635.08242 834.05429 2929.40582
2 107806.62079 3057.14356 422.93426 2526.24488
3 116445.27553 4650.47391 429.03353 2281.42857
4 145082.51684 5201.38772 749.86044 2781.69752
5 115923.00695 4213.86862 778.37316 2157.40855
6 106116.43664 3279.47297 491.61453 2560.41120
7 125957.11610 4690.56991 855.71980 2563.57976
8 117442.86465 3907.93924 660.47791 2242.23259
9 131271.71560 4721.18781 532.27934 2368.10857 


In [6]:
#Combining test and control data with concat
df = pd.concat([control_data, test_data])
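`pd.concat` stacks the rows without recording which group each row came from, so one option is to tag each frame before combining. The sketch below is only an illustration: the `Group` column, the `control_demo` and `test_demo` frames, and their values are hypothetical stand-ins, not part of the original dataset.

```python
import pandas as pd

# Hypothetical stand-ins for control_data and test_data (made-up values).
control_demo = pd.DataFrame({"Purchase": [500.0, 600.0]})
test_demo = pd.DataFrame({"Purchase": [550.0, 650.0]})

# Tag each frame with its group before stacking the rows.
control_demo = control_demo.assign(Group="control")
test_demo = test_demo.assign(Group="test")

# ignore_index=True gives the combined frame a fresh 0..n-1 index
# instead of repeating the index of each source frame.
combined = pd.concat([control_demo, test_demo], ignore_index=True)

# With the label in place, per-group summaries come from a single frame.
group_means = combined.groupby("Group")["Purchase"].mean()
print(group_means)
```

With a label column in place, the combined frame can also be passed directly to plotting functions that split by group.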

* Hypotheses
1. H0: M1 = M2 (There is no statistically significant difference between the mean purchases of the two bidding types.)
2. H1: M1 != M2 (There is a statistically significant difference between the mean purchases of the two bidding types.)



In [7]:
purchase_control = control_data["Purchase"].mean()
purchase_test = test_data["Purchase"].mean()

print(f'Control Data Purchase Average = {purchase_control}\nTest Data Purchase Average = {purchase_test}')




Control Data Purchase Average = 550.8940587702316
Test Data Purchase Average = 582.1060966484677


* Normality Assumptions

1. H0: The normality assumption is satisfied.
2. H1: The normality assumption is not satisfied.


*  H0 is rejected if p-value < 0.05.
*  H0 cannot be rejected if p-value > 0.05.
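The decision rule above can be wrapped in a small helper. This is a minimal sketch, not the notebook's own code: the function name `is_normal`, the `alpha` parameter and the two demo samples are assumptions made for illustration, and a p-value exactly equal to 0.05 is treated here as "cannot be rejected".

```python
import numpy as np
from scipy.stats import norm, shapiro

def is_normal(values, alpha=0.05):
    """Apply the Shapiro-Wilk decision rule (hypothetical helper).

    Returns (looks_normal, pvalue). H0 (normality) is rejected only
    when the p-value is below alpha.
    """
    _, pvalue = shapiro(values)
    return bool(pvalue >= alpha), float(pvalue)

# Demo sample 1: evenly spaced normal quantiles, i.e. an idealized normal sample.
bell_shaped = norm.ppf((np.arange(1, 41) - 0.5) / 40)

# Demo sample 2: exponential growth, i.e. a strongly right-skewed sample.
skewed = np.exp(np.linspace(0, 10, 40))

for name, sample in [("bell_shaped", bell_shaped), ("skewed", skewed)]:
    looks_normal, pvalue = is_normal(sample)
    print(f"{name}: normal={looks_normal}, pvalue={pvalue:.5f}")
```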

In [8]:
test_stat, pvalue = shapiro(test_data["Purchase"])
print(f'Test Stat= {test_stat}, pvalue= {pvalue}')

Test Stat= 0.9589452147483826, pvalue= 0.15413185954093933


In [9]:
test_stat, pvalue = shapiro(control_data["Purchase"])
print(f'Test Stat= {test_stat}, pvalue= {pvalue}')



Test Stat= 0.9772694110870361, pvalue= 0.5891125202178955


H0 cannot be rejected for either group because both p-values are greater than 0.05, so the normality assumption holds for both groups.

* Variance Homogeneity

1. H0: Variances are homogeneous.
2. H1: Variances are not homogeneous.


In [10]:
test_stat, pvalue = levene(test_data["Purchase"], control_data["Purchase"])
print(f'Test Stat= {test_stat}, pvalue={pvalue}')

Test Stat= 2.6392694728747363, pvalue=0.10828588271874791


H0 cannot be rejected because the p-value is greater than 0.05, so the variances are treated as homogeneous.

Both assumptions are satisfied, so the independent two-sample t-test is applied for the A/B test.
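The way the assumption checks feed into the choice of test can be sketched as one function. This is an illustration under stated assumptions rather than the notebook's method: `compare_groups` and its returned labels are hypothetical names, and the fallbacks (Welch's t-test when the variances differ, Mann-Whitney U when normality is rejected) are a common convention that this analysis did not need.

```python
import numpy as np
from scipy.stats import levene, mannwhitneyu, norm, shapiro, ttest_ind

def compare_groups(a, b, alpha=0.05):
    """Pick a two-sample test from the assumption checks (hypothetical helper).

    Returns (label, statistic, pvalue).
    """
    _, p_norm_a = shapiro(a)
    _, p_norm_b = shapiro(b)
    if p_norm_a < alpha or p_norm_b < alpha:
        # Normality rejected for at least one group: non-parametric test.
        stat, pvalue = mannwhitneyu(a, b, alternative="two-sided")
        return "mannwhitneyu", stat, pvalue

    _, p_var = levene(a, b)
    if p_var < alpha:
        # Normal but unequal variances: Welch's t-test.
        stat, pvalue = ttest_ind(a, b, equal_var=False)
        return "welch_ttest", stat, pvalue

    # Both assumptions hold: standard independent two-sample t-test.
    stat, pvalue = ttest_ind(a, b, equal_var=True)
    return "ttest_ind", stat, pvalue

# Made-up demo data: two idealized normal samples with equal spread.
group_a = 550 + 100 * norm.ppf((np.arange(1, 41) - 0.5) / 40)
group_b = group_a + 30

label, stat, pvalue = compare_groups(group_a, group_b)
print(label, stat, pvalue)
```

Called as `compare_groups(test_data["Purchase"], control_data["Purchase"])`, it would be expected to take the `ttest_ind` branch, given the Shapiro-Wilk and Levene p-values reported above.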

In [11]:
test_stat, pvalue = ttest_ind(test_data["Purchase"], control_data["Purchase"])
print(f'Test stat= {test_stat}, pvalue = {pvalue}')

Test stat= 0.9415584300312964, pvalue = 0.34932579202108416


Since the p-value (0.349) is greater than 0.05, H0 cannot be rejected. There is no statistically significant difference in mean Purchase between the two groups.
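A confidence interval for the difference in means can complement the p-value by showing the range of plausible differences. The sketch below computes a pooled-variance 95% interval by hand. The helper name `mean_diff_ci` and the randomly generated demo samples are assumptions for illustration only; they are not the notebook's data, and no result is claimed for the real groups.

```python
import numpy as np
from scipy import stats

def mean_diff_ci(a, b, confidence=0.95):
    """Pooled-variance CI for mean(a) - mean(b) (hypothetical helper).

    Assumes equal variances, matching ttest_ind's default.
    Returns (diff, lower, upper).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    diff = a.mean() - b.mean()
    dof = n_a + n_b - 2
    # Pooled sample variance across both groups.
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / dof
    std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # Two-sided critical value of the t distribution.
    margin = stats.t.ppf((1 + confidence) / 2, dof) * std_err
    return diff, diff - margin, diff + margin

# Made-up samples standing in for the two Purchase columns.
rng = np.random.default_rng(42)
test_demo = rng.normal(580, 160, 40)
control_demo = rng.normal(550, 130, 40)

diff, lower, upper = mean_diff_ci(test_demo, control_demo)
print(f"diff = {diff:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

When the interval contains 0, the two-sided t-test at the 0.05 level does not reject H0, so the two views agree by construction.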