# Comparison of Bidding Methods with A/B Testing

## What is A/B Testing?

> A/B testing, also known as split testing, is a marketing experiment in which you show two different versions of a product or a piece of marketing content to your audience to determine which version performs better.

## A/B Testing Steps
1. Formulate the hypotheses
2. Check the assumptions
   1. Normality (Shapiro-Wilk test, `shapiro`)
   2. Homogeneity of variance (Levene's test, `levene`)
3. Apply the hypothesis test
   1. Independent two-sample t-test if the assumptions are met (parametric test)
   2. Mann-Whitney U test (`mannwhitneyu`) if the assumptions are not met (non-parametric test)
4. Interpret the results based on the p-value

Note:
- If normality does not hold, use option 2 of step 3 (Mann-Whitney U test). If only homogeneity of variance does not hold, use option 1 of step 3 with the argument `equal_var=False` (Welch's t-test).
- It may be helpful to detect and treat outliers before testing for normality.
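The steps above can be sketched as a single helper function. This is a minimal illustration, not the notebook's own code: the name `ab_test`, its parameters, and the returned tuple are hypothetical, and a significance level of `alpha = 0.05` is assumed. It uses the same SciPy functions named in the steps (`shapiro`, `levene`, `ttest_ind`, `mannwhitneyu`).

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def ab_test(control, test, alpha=0.05):
    """Hypothetical helper: choose and run a two-sample test following the steps above.

    Returns (method, statistic, p_value, reject_h0).
    """
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)

    # Step 2.1: Shapiro-Wilk on each group (H0: the sample is normally distributed)
    _, p_control = shapiro(control)
    _, p_test = shapiro(test)
    normal = p_control >= alpha and p_test >= alpha

    if normal:
        # Step 2.2: Levene's test (H0: the variances are equal)
        _, p_levene = levene(control, test)
        equal_var = p_levene >= alpha
        # Step 3.1: Student's t-test, or Welch's t-test when the variances differ
        stat, p = ttest_ind(control, test, equal_var=equal_var)
        method = "Student t-test" if equal_var else "Welch t-test"
    else:
        # Step 3.2: non-parametric Mann-Whitney U test
        stat, p = mannwhitneyu(control, test, alternative="two-sided")
        method = "Mann-Whitney U"

    # Step 4: reject H0 (no difference between the groups) when p < alpha
    return method, float(stat), float(p), bool(p < alpha)


# Usage on synthetic data (illustrative values, not the notebook's dataset)
rng = np.random.default_rng(0)
print(ab_test(rng.normal(550, 135, 40), rng.normal(580, 160, 40)))
```

Checking normality first mirrors the note above: a failed normality test sends the data straight to the non-parametric test, while a failed Levene test only switches the t-test to Welch's variant.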

## Imports and Dataframe Analysis

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
# mannwhitneyu is the non-parametric fallback listed in the steps above
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu
```

```python
# Show all columns, at most 10 rows, and floats with 5 decimals
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)
```

```python
# Each group is stored on its own sheet of the workbook
xls = pd.ExcelFile('ab_testing.xlsx')
control = pd.read_excel(xls, sheet_name='Control Group')
test = pd.read_excel(xls, sheet_name='Test Group')
```

### Control Group Analysis

```python
control.describe().T
```

| Metric | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Impression | 40.0 | 101711.44907 | 20302.15786 | 45475.94296 | 85726.69035 | 99790.70108 | 115212.81654 | 147539.33633 |
| Click | 40.0 | 5100.65737 | 1329.9855 | 2189.75316 | 4124.30413 | 5001.2206 | 5923.8036 | 7959.12507 |
| Purchase | 40.0 | 550.89406 | 134.1082 | 267.02894 | 470.09553 | 531.20631 | 637.95709 | 801.79502 |
| Earning | 40.0 | 1908.5683 | 302.91778 | 1253.98952 | 1685.8472 | 1975.16052 | 2119.80278 | 2497.29522 |


```python
control.dtypes
```

```
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
```

### Test Group Analysis

```python
test.describe().T
```

| Metric | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Impression | 40.0 | 120512.41176 | 18807.44871 | 79033.83492 | 112691.97077 | 119291.30077 | 132050.57893 | 158605.92048 |
| Click | 40.0 | 3967.54976 | 923.09507 | 1836.62986 | 3376.81902 | 3931.3598 | 4660.49791 | 6019.69508 |
| Purchase | 40.0 | 582.1061 | 161.15251 | 311.62952 | 444.62683 | 551.35573 | 699.86236 | 889.91046 |
| Earning | 40.0 | 2514.89073 | 282.73085 | 1939.61124 | 2280.53743 | 2544.66611 | 2761.5454 | 3171.48971 |


```python
test.dtypes
```

```
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
```

The Purchase averages of the two groups, taken from the tables above, are:

- 550.8940587702316 -> Maximum Bidding (control group)
- 582.1060966484677 -> Average Bidding (test group)
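The size of that gap can be expressed in absolute and relative terms. This short sketch uses the two averages above; computing the relative lift with the control (maximum bidding) group as the baseline is an assumption about how the comparison is framed.

```python
control_mean = 550.8940587702316  # Maximum Bidding (control group)
test_mean = 582.1060966484677     # Average Bidding (test group)

diff = test_mean - control_mean   # absolute difference in average Purchase
lift = diff / control_mean        # relative to the control group

print(f"Absolute difference: {diff:.2f}")  # Absolute difference: 31.21
print(f"Relative lift: {lift:.2%}")        # Relative lift: 5.67%
```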

So the averages differ, but is this difference really caused by the bidding method, or could it be due to random sampling variation?

Let's find out.
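One way to build intuition for this question is a simulation in which the bidding method has no effect at all. The sketch below is illustrative and rests on assumptions that are not part of the analysis: both groups are drawn from the same normal distribution with mean 565 and standard deviation 150 (values chosen to roughly match the tables above), with 40 observations each. It counts how often two such groups differ by at least the observed gap purely by chance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed population: both groups share the same mean and standard deviation,
# i.e. the bidding method has NO effect (values roughly match the tables above).
mu, sigma, n = 565.0, 150.0, 40
observed_diff = 582.1060966484677 - 550.8940587702316  # gap seen in this experiment

n_sims = 10_000
a = rng.normal(mu, sigma, size=(n_sims, n))  # simulated "control" groups
b = rng.normal(mu, sigma, size=(n_sims, n))  # simulated "test" groups
diffs = b.mean(axis=1) - a.mean(axis=1)

# Share of no-effect experiments whose means differ at least as much as observed
share = np.mean(np.abs(diffs) >= observed_diff)
print(f"{share:.1%} of simulated no-effect experiments show a gap >= {observed_diff:.2f}")
```

Under these assumptions roughly a third of the no-effect experiments show a gap at least this large, which is why a formal test is needed before attributing the difference to the bidding method.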

## Defining the Hypotheses

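H0 states that there is no statistically significant difference between the Purchase averages of the control group (maximum bidding) and the test group (average bidding). With M1 and M2 denoting the mean Purchase of the control and test groups:

H0: M1 = M2 -> There is no difference

H1: M1 != M2 -> There is a difference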

### Normal Distribution

H0: The Purchase values follow a normal distribution (the normality assumption holds).

H1: The Purchase values do not follow a normal distribution (the normality assumption does not hold).

```python
test_stat, pvalue = shapiro(control["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))
```

```
Test Stat = 0.9773, p-value = 0.5891
```


```python
test_stat, pvalue = shapiro(test["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))
```

```
Test Stat = 0.9589, p-value = 0.1541
```


Both p-values (0.5891 for the control group and 0.1541 for the test group) are greater than the significance level alpha = 0.05, so H0 cannot be rejected for either group and the normality assumption is considered met.
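For contrast, the sketch below shows what a rejected normality test looks like. The samples are synthetic and illustrative only: one is drawn from a normal distribution and one from a strongly right-skewed lognormal distribution, both with n = 40 to match the group sizes here. For the skewed sample the Shapiro-Wilk p-value falls below 0.05, which is the situation in which the steps above switch to the Mann-Whitney U test.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
alpha = 0.05

# Synthetic samples (illustrative only), same size as each group here
samples = {
    "normal": rng.normal(loc=550, scale=135, size=40),
    "lognormal (right-skewed)": rng.lognormal(mean=6.0, sigma=1.5, size=40),
}

p_values = {}
for name, x in samples.items():
    stat, p = shapiro(x)
    p_values[name] = p
    verdict = "reject H0: not normal" if p < alpha else "cannot reject H0"
    print(f"{name}: W = {stat:.4f}, p = {p:.4f} -> {verdict}")
```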

### Homogeneity of Variance

H0: Variances are homogeneous.

H1: Variances are not homogeneous.

```python
test_stat, pvalue = levene(control["Purchase"],
                           test["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))
```

```
Test Stat = 2.6393, p-value = 0.1083
```


The p-value (0.1083) is greater than alpha = 0.05, so H0 cannot be rejected and the variances are considered homogeneous.


Since both assumptions are met, we apply the parametric independent two-sample t-test with equal variances assumed (`equal_var=True`).
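For reference, the statistic that `ttest_ind` computes with `equal_var=True` is the pooled-variance (Student) t-statistic, with $n_1 + n_2 - 2 = 78$ degrees of freedom here (subscript 1 is the control group, 2 the test group). Plugging in the rounded Purchase summary statistics from the describe() tables reproduces the test statistic obtained in the next cell:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
$$

$$
s_p^2 = \frac{39 \cdot 134.108^2 + 39 \cdot 161.153^2}{78} \approx 21977.6,
\qquad
t = \frac{550.894 - 582.106}{\sqrt{21977.6}\,\sqrt{\frac{1}{40} + \frac{1}{40}}} \approx \frac{-31.212}{33.149} \approx -0.9416
$$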

## Application of the Test

```python
test_stat, pvalue = ttest_ind(control["Purchase"],
                              test["Purchase"],
                              equal_var=True)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))
```

```
Test Stat = -0.9416, p-value = 0.3493
```
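As a cross-check that needs only the summary tables, SciPy's `ttest_ind_from_stats` runs the same test from means, standard deviations, and sample sizes. The values below are copied (rounded) from the describe() output; pandas reports the sample standard deviation (ddof=1), which is what this function expects. The sketch also runs Welch's variant (`equal_var=False`) to show how much the equal-variance choice matters here.

```python
from scipy.stats import ttest_ind_from_stats

# Purchase summary statistics copied from the describe() tables above
control_stats = dict(mean1=550.89406, std1=134.1082, nobs1=40)
test_stats = dict(mean2=582.1061, std2=161.15251, nobs2=40)

# Student's t-test (pooled variance) and Welch's t-test (separate variances)
student = ttest_ind_from_stats(**control_stats, **test_stats, equal_var=True)
welch = ttest_ind_from_stats(**control_stats, **test_stats, equal_var=False)

print("Student: t = %.4f, p = %.4f" % (student.statistic, student.pvalue))
print("Welch:   t = %.4f, p = %.4f" % (welch.statistic, welch.pvalue))
```

Because both groups have 40 observations, the Welch t-statistic is identical to the Student one; only the degrees of freedom, and therefore the p-value, change slightly. In this case the conclusion does not depend on the equal-variance assumption.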


The p-value (0.3493) is greater than alpha = 0.05, so H0 cannot be rejected: there is no statistically significant difference between the Purchase averages of the control group (maximum bidding) and the test group (average bidding).

---



## Summary and Recommendations

The test found no statistically significant difference between the two bidding methods, even though the sample Purchase averages of the two groups differ. Repeating the experiment with a larger sample (the current data contain 40 observations per group) might reveal a difference and show one bidding method to be superior. For now, the data give no reason to change the bidding method.
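To put a number on "a larger sample", a rough sample-size calculation helps. This sketch rests on assumptions that are not part of the original analysis: the observed difference of about 31.2 purchases is the true effect, the pooled standard deviation from the tables is the true standard deviation, the test is two-sided at alpha = 0.05, and the target power is 80%. It uses the standard normal-approximation formula for comparing two means, so the result is approximate.

```python
import math
from scipy.stats import norm

# Assumptions: observed gap = true effect, pooled SD = true SD,
# two-sided alpha = 0.05, target power = 0.80
diff = 582.1061 - 550.89406
pooled_sd = math.sqrt((134.1082**2 + 161.15251**2) / 2)  # equal group sizes
alpha, power = 0.05, 0.80

effect_size = diff / pooled_sd       # standardized difference (Cohen's d)
z_alpha = norm.ppf(1 - alpha / 2)    # critical value for a two-sided test
z_beta = norm.ppf(power)             # quantile for the desired power

# Normal-approximation sample size per group for a two-sample comparison
n_per_group = math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)
print(f"Cohen's d = {effect_size:.3f}, n per group ~ {n_per_group}")
```

Under these assumptions roughly 355 observations per group would be needed, far more than the 40 available, which is consistent with the test above not detecting the difference.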