# AB TESTING FOR BIDDING METHODS

## Without A Grounding In Statistics, A Data Scientist Is A Lab Data Assistant

## AB Testing - Independent Two-Sample T-Test

![images](images.png)

1. Determine the hypotheses
2. Check the assumptions
    - 1. Normality (shapiro)
    - 2. Variance homogeneity (levene)
3. Test the hypotheses
    - 1. If both normality and variance homogeneity hold, a parametric test is used (t-test).
    - 2. If normality does not hold, a non-parametric test is used (mannwhitneyu).
4. Interpret the results according to the p-value

Note:
- If the data are not normally distributed, the non-parametric test is used directly. If the data are normally distributed but the variances are not homogeneous, the parametric test is still used, with the equal_var=False parameter (Welch's t-test).
- It may be beneficial to check and fix outliers before the normality test.
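
The outlier note above does not say how outliers should be fixed. One common option is to cap values at the interquartile-range fences, sketched below. The helper name `cap_outliers_iqr`, the multiplier `k=1.5` and the toy values are assumptions made for illustration; they are not part of this notebook's dataset or workflow.

```python
import pandas as pd


def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Toy values (made up): five ordinary observations and one extreme one.
purchases = pd.Series([420.0, 450.0, 480.0, 500.0, 530.0, 5000.0])
capped = cap_outliers_iqr(purchases)
print(capped.tolist())
```

Capping keeps the sample size unchanged, which matters with small groups. Dropping the rows instead is another option, at the cost of fewer observations.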





## Importing Modules & Datasets

In [1]:
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# !pip install statsmodels
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, \
    pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [6]:
test_df = pd.read_excel("ab_testing.xlsx", sheet_name="Test Group")
control_df = pd.read_excel("ab_testing.xlsx", sheet_name="Control Group")

In [7]:
test_df.head()

Unnamed: 0,Impression,Click,Purchase,Earning
0,120103.5038,3216.54796,702.16035,1939.61124
1,134775.94336,3635.08242,834.05429,2929.40582
2,107806.62079,3057.14356,422.93426,2526.24488
3,116445.27553,4650.47391,429.03353,2281.42857
4,145082.51684,5201.38772,749.86044,2781.69752


In [8]:
control_df.head()

Unnamed: 0,Impression,Click,Purchase,Earning
0,82529.45927,6090.07732,665.21125,2311.27714
1,98050.45193,3382.86179,315.08489,1742.80686
2,82696.02355,4167.96575,458.08374,1797.82745
3,109914.4004,4910.88224,487.09077,1696.22918
4,108457.76263,5987.65581,441.03405,1543.72018


## Exploratory Data Analysis

### Determining the conversion rates for impression to click and click to purchase in both control and test groups

In [36]:
# Calculate the conversion rates and add them as new columns
control_df["Impression_to_Click"] = (control_df["Click"] / control_df["Impression"]).round(2)
control_df["Click_to_Purchase"] = (control_df["Purchase"] / control_df["Click"]).round(2)
control_df["Impression_to_Purchase"] = (control_df["Purchase"] / control_df["Impression"]).round(2)
test_df["Impression_to_Click"] = (test_df["Click"] / test_df["Impression"]).round(2)
test_df["Click_to_Purchase"] = (test_df["Purchase"] / test_df["Click"]).round(2)
test_df["Impression_to_Purchase"] = (test_df["Purchase"] / test_df["Impression"]).round(2)


In [37]:
control_df.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Impression_to_Click,Click_to_Purchase,Impression_to_Purchase
0,82529.45927,6090.07732,665.21125,2311.27714,0.07,0.11,0.01
1,98050.45193,3382.86179,315.08489,1742.80686,0.03,0.09,0.0
2,82696.02355,4167.96575,458.08374,1797.82745,0.05,0.11,0.01
3,109914.4004,4910.88224,487.09077,1696.22918,0.04,0.1,0.0
4,108457.76263,5987.65581,441.03405,1543.72018,0.06,0.07,0.0


In [38]:
test_df.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Impression_to_Click,Click_to_Purchase,Impression_to_Purchase
0,120103.5038,3216.54796,702.16035,1939.61124,0.03,0.22,0.01
1,134775.94336,3635.08242,834.05429,2929.40582,0.03,0.23,0.01
2,107806.62079,3057.14356,422.93426,2526.24488,0.03,0.14,0.0
3,116445.27553,4650.47391,429.03353,2281.42857,0.04,0.09,0.0
4,145082.51684,5201.38772,749.86044,2781.69752,0.04,0.14,0.01
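
In the two outputs above, Impression_to_Purchase only takes the values 0.0 and 0.01. This is a side effect of `.round(2)`: the underlying rates are well below 0.01, so rounding to two decimals collapses them. The sketch below reproduces this using only the five Control Group rows printed earlier. The variable names are illustrative.

```python
import pandas as pd

# First five Control Group rows, copied from the head() output above.
control_head = pd.DataFrame({
    "Impression": [82529.45927, 98050.45193, 82696.02355, 109914.4004, 108457.76263],
    "Purchase": [665.21125, 315.08489, 458.08374, 487.09077, 441.03405],
})

raw_rate = control_head["Purchase"] / control_head["Impression"]
rounded_rate = raw_rate.round(2)

# Compare how many distinct values survive with and without rounding.
print("distinct raw values:    ", raw_rate.nunique())
print("distinct rounded values:", rounded_rate.nunique())
```

Keeping the unrounded rate (or rounding to more decimals) preserves the variation that the later tests rely on. Rounding is better left to the display step.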


## AB Testing

### Testing The Purchase Columns

In [30]:
#Check Normality
test_stat, pvalue = shapiro(test_df.Purchase)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9589, p-value = 0.1541


In [31]:
test_stat, pvalue = shapiro(control_df.Purchase)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


In [32]:
#Check homogeneity
test_stat, pvalue = levene(control_df.Purchase,
                           test_df.Purchase)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


### Normality and variance homogeneity both hold for these data (all p-values are above 0.05), so the parametric test is used.

In [35]:
test_stat, pvalue = ttest_ind(control_df.Purchase,
                              test_df.Purchase,
                              equal_var=True)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493


### According to the t-test, the p-value is 0.3493, which is higher than the significance level of 0.05, so the H0 hypothesis (no difference between the group means) cannot be rejected: there is no statistically significant difference between the Purchase values of the control and test groups.
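
The sequence used for the Purchase column (Shapiro on each group, then Levene, then the chosen test) can be wrapped into one helper that follows the steps listed at the top of this notebook. This is a minimal sketch. The function name `compare_groups`, the returned dictionary layout and the randomly generated toy samples are assumptions made for illustration, not the notebook's data.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_groups(control, test, alpha=0.05):
    """Choose and run a two-sample test based on the assumption checks."""
    _, p_norm_control = shapiro(control)
    _, p_norm_test = shapiro(test)

    if p_norm_control > alpha and p_norm_test > alpha:
        # Normality holds: t-test, Welch variant if variances differ.
        _, p_levene = levene(control, test)
        equal_var = bool(p_levene > alpha)
        stat, pvalue = ttest_ind(control, test, equal_var=equal_var)
        name = "ttest_ind" if equal_var else "ttest_ind (Welch)"
    else:
        # Normality rejected for at least one group: non-parametric test.
        stat, pvalue = mannwhitneyu(control, test, alternative="two-sided")
        name = "mannwhitneyu"

    return {
        "test": name,
        "stat": float(stat),
        "pvalue": float(pvalue),
        "reject_h0": bool(pvalue < alpha),
    }


# Toy samples (randomly generated, not the ab_testing.xlsx data).
rng = np.random.default_rng(0)
control = rng.normal(loc=550, scale=130, size=40)
test = rng.normal(loc=580, scale=160, size=40)

result = compare_groups(control, test)
print(result)
```

With the real data, the call would look like `compare_groups(control_df.Purchase, test_df.Purchase)`.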

### Testing The Conversion Rate of Impression to Purchase

In [39]:
#Check Normality
test_stat, pvalue = shapiro(test_df.Impression_to_Purchase)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.6336, p-value = 0.0000


In [40]:
test_stat, pvalue = shapiro(control_df.Impression_to_Purchase)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.6290, p-value = 0.0000


In [41]:
#Check homogeneity
test_stat, pvalue = levene(control_df.Impression_to_Purchase,
                           test_df.Impression_to_Purchase)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.0496, p-value = 0.8244


### These data are not normally distributed (the Shapiro p-value is below 0.05 for both groups), so the non-parametric test is used. Variance homogeneity holds, but it does not change this choice.

In [42]:
test_stat, pvalue = mannwhitneyu(control_df.Impression_to_Purchase,
                                 test_df.Impression_to_Purchase)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 900.0000, p-value = 0.2688


### According to the Mann-Whitney U test, the p-value is 0.2688, which is higher than the significance level of 0.05, so the H0 hypothesis cannot be rejected: there is no statistically significant difference between the Impression_to_Purchase rates of the control and test groups.
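
The `mannwhitneyu` call above relies on the default for `alternative`, which has not been the same in every SciPy release. Passing it explicitly keeps the reported p-value two-sided regardless of the installed version. The sketch below uses randomly generated toy rates, not the notebook's data. The variable names and the beta-distribution parameters are assumptions chosen only to produce small, skewed values.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Toy conversion rates (randomly generated): small, right-skewed values.
rng = np.random.default_rng(42)
control_rate = rng.beta(2, 300, size=40)
test_rate = rng.beta(2, 300, size=40)

# State the alternative explicitly instead of relying on the default.
stat, pvalue = mannwhitneyu(control_rate, test_rate, alternative="two-sided")
print("Test Stat = %.4f, p-value = %.4f" % (stat, pvalue))
```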