<a href="https://www.kaggle.com/code/ebruiserisobay/conversion-rates-with-a-b-testing?scriptVersionId=181745352" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Comparison of Conversion Rates of Bidding Methods 

## 1. Business Problem

Facebook recently introduced a new type of bid called “average bid” as an alternative to the current type of bid called “maximum bid”. An A/B test was conducted to see if the average bid converts more than the maximum bid. The A/B test consists of data collected over a month. The ultimate measure of success is the purchase. Therefore, the test focused on the “purchase” metric.


## 2. About Dataset

This dataset, which contains a company's website information, includes the number of advertisements that users saw and clicked, as well as the earnings from these advertisements. There are two separate data sets, one for the control group and one for the test group, stored on separate sheets of the same Excel file. Maximum bidding was applied to the control group and average bidding was applied to the test group.

* **Impression:** Number of ad views
* **Click:** Number of clicks on the displayed ad
* **Purchase:** Number of products purchased after clicking the ads
* **Earning:** Revenue obtained from purchased products

## 3. Data Preparation & Understanding

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
!pip install statsmodels
import statsmodels.stats.api as sms
from scipy.stats import shapiro, levene, ttest_ind
from statsmodels.stats.proportion import proportions_ztest
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", 10)
pd.set_option("display.float_format", lambda x: "%.5f" % x)



In [2]:
# assigning control and test group data to separate variables

df_ = pd.read_excel("/kaggle/input/ab-testing-data/ab_testing.xlsx", sheet_name="Control Group")
df_max = df_.copy()

df2 = pd.read_excel("/kaggle/input/ab-testing-data/ab_testing.xlsx", sheet_name="Test Group")
df_avg = df2.copy()

Control Group:

In [3]:
df_max.head()  # display the first 5 rows

Unnamed: 0,Impression,Click,Purchase,Earning
0,82529.45927,6090.07732,665.21125,2311.27714
1,98050.45193,3382.86179,315.08489,1742.80686
2,82696.02355,4167.96575,458.08374,1797.82745
3,109914.4004,4910.88224,487.09077,1696.22918
4,108457.76263,5987.65581,441.03405,1543.72018


In [4]:
df_max.shape #display the shape of the dataset

(40, 4)

In [5]:
df_max.describe().T #descriptive statistics

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Impression,40.0,101711.44907,20302.15786,45475.94296,85726.69035,99790.70108,115212.81654,147539.33633
Click,40.0,5100.65737,1329.9855,2189.75316,4124.30413,5001.2206,5923.8036,7959.12507
Purchase,40.0,550.89406,134.1082,267.02894,470.09553,531.20631,637.95709,801.79502
Earning,40.0,1908.5683,302.91778,1253.98952,1685.8472,1975.16052,2119.80278,2497.29522


In [6]:
df_max.info() #info about dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Impression  40 non-null     float64
 1   Click       40 non-null     float64
 2   Purchase    40 non-null     float64
 3   Earning     40 non-null     float64
dtypes: float64(4)
memory usage: 1.4 KB


In [7]:
df_max.columns # display the names of columns

Index(['Impression', 'Click', 'Purchase', 'Earning'], dtype='object')

Test Group:

In [8]:
df_avg.head()  # display the first 5 rows

Unnamed: 0,Impression,Click,Purchase,Earning
0,120103.5038,3216.54796,702.16035,1939.61124
1,134775.94336,3635.08242,834.05429,2929.40582
2,107806.62079,3057.14356,422.93426,2526.24488
3,116445.27553,4650.47391,429.03353,2281.42857
4,145082.51684,5201.38772,749.86044,2781.69752


In [9]:
df_avg.shape  #display the shape of the dataset

(40, 4)

In [10]:
df_avg.describe().T #descriptive statistics

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Impression,40.0,120512.41176,18807.44871,79033.83492,112691.97077,119291.30077,132050.57893,158605.92048
Click,40.0,3967.54976,923.09507,1836.62986,3376.81902,3931.3598,4660.49791,6019.69508
Purchase,40.0,582.1061,161.15251,311.62952,444.62683,551.35573,699.86236,889.91046
Earning,40.0,2514.89073,282.73085,1939.61124,2280.53743,2544.66611,2761.5454,3171.48971


In [11]:
df_avg.info() #info about dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Impression  40 non-null     float64
 1   Click       40 non-null     float64
 2   Purchase    40 non-null     float64
 3   Earning     40 non-null     float64
dtypes: float64(4)
memory usage: 1.4 KB


In [12]:
df_avg.columns # display the names of columns

Index(['Impression', 'Click', 'Purchase', 'Earning'], dtype='object')

In [13]:
# combining the control and test group data using the concat method after the analysis process

df_max.columns = [f"{col}_max" for col in df_max.columns] # changing column names (control)
df_avg.columns = [f"{col}_avg" for col in df_avg.columns] # changing column names (test)

df = pd.concat([df_max,df_avg], axis=1)
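The side-by-side `concat` above places control and test rows on the same line, although the two groups are independent samples. As an alternative layout, the following minimal sketch stacks the groups into a long format with a `group` column. It is an illustration only, not what the notebook does: it uses just the first two `Purchase` values shown in the `head()` outputs above, and the names `control`, `test`, `long_df`, and `group_means` are hypothetical.

```python
import pandas as pd

# First two Purchase values from each head() output above (illustration only)
control = pd.DataFrame({"Purchase": [665.21125, 315.08489]})
test = pd.DataFrame({"Purchase": [702.16035, 834.05429]})

# Stack the groups vertically and label each row with its group
long_df = pd.concat(
    [control.assign(group="control"), test.assign(group="test")],
    ignore_index=True,
)

# Per-group mean of the Purchase column
group_means = long_df.groupby("group")["Purchase"].mean()
print(group_means)
```

In this layout each row is one observation, so per-group summaries come from a single `groupby` rather than from suffixed column names.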

In [14]:
df.head()  # display the first 5 rows

Unnamed: 0,Impression_max,Click_max,Purchase_max,Earning_max,Impression_avg,Click_avg,Purchase_avg,Earning_avg
0,82529.45927,6090.07732,665.21125,2311.27714,120103.5038,3216.54796,702.16035,1939.61124
1,98050.45193,3382.86179,315.08489,1742.80686,134775.94336,3635.08242,834.05429,2929.40582
2,82696.02355,4167.96575,458.08374,1797.82745,107806.62079,3057.14356,422.93426,2526.24488
3,109914.4004,4910.88224,487.09077,1696.22918,116445.27553,4650.47391,429.03353,2281.42857
4,108457.76263,5987.65581,441.03405,1543.72018,145082.51684,5201.38772,749.86044,2781.69752


## 4. Defining the Hypothesis of the A/B Test

There is no statistically significant difference between the purchase averages of “maximum bidding” and “average bidding”.

* H0: m0 = m1  **(null hypothesis)**

There is a statistically significant difference between the purchase averages of “maximum bidding” and “average bidding”.

* H1: m0 != m1 **(alternative hypothesis)**

Here m0 is the mean purchase of the control group (maximum bidding) and m1 is the mean purchase of the test group (average bidding).

In [15]:
# mean purchase for the control group (maximum bidding)

df["Purchase_max"].mean() 

550.8940587702316

In [16]:
# mean purchase for the test group (average bidding)

df["Purchase_avg"].mean()

582.1060966484677
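To put the two sample means in context, the short sketch below computes their absolute and relative difference. It hard-codes the two means printed above; the variable names are illustrative. The observed gap is roughly 31 purchases, or a little under 6% of the control mean, and the hypothesis test is what decides whether a gap of that size could plausibly be due to chance.

```python
# Sample means reported in the two cells above
mean_max = 550.8940587702316  # control group (maximum bidding)
mean_avg = 582.1060966484677  # test group (average bidding)

abs_diff = mean_avg - mean_max        # absolute difference in mean purchases
rel_diff = abs_diff / mean_max * 100  # relative difference, in percent of control

print(f"Absolute difference: {abs_diff:.4f}")
print(f"Relative difference: {rel_diff:.2f}%")
```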

## 5. Checking Assumptions with Hypothesis Tests

**Normality Assumption:**

* H0: “Purchase_max” is normally distributed.
* H1: “Purchase_max” is not normally distributed.

In [17]:
test_stat,pvalue = shapiro(df["Purchase_max"]) #shapiro test
print('Test Stat = %.4f, p-value = %.4f' % (test_stat,pvalue))

Test Stat = 0.9773, p-value = 0.5891


In [18]:
#shapiro test result:

alpha = 0.05
if pvalue < alpha :
    print('“Purchase_max” does not meet the assumption of normal distribution. (reject H0)')
else:
    print('“Purchase_max” satisfies the assumption of normal distribution (fail to reject H0)')

“Purchase_max” satisfies the assumption of normal distribution (fail to reject H0)
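The check above covers only “Purchase_max”, while the two-sample t-test assumes normality in both groups. The sketch below shows one way to run the same Shapiro-Wilk test on each group in a loop. It uses synthetic stand-in data generated with NumPy, not the notebook's data, so its printed values say nothing about the real “Purchase_avg” column; the `samples` and `normality` names are hypothetical.

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic stand-in data (NOT the notebook's data): two samples of 40 values
rng = np.random.default_rng(0)
samples = {
    "Purchase_max": rng.normal(loc=550, scale=135, size=40),
    "Purchase_avg": rng.normal(loc=580, scale=160, size=40),
}

alpha = 0.05
normality = {}
for name, values in samples.items():
    test_stat, pvalue = shapiro(values)
    # True means "fail to reject H0", i.e. no evidence against normality
    normality[name] = bool(pvalue >= alpha)
    print(f"{name}: Test Stat = {test_stat:.4f}, p-value = {pvalue:.4f}")
```

With the real data, `samples` would instead hold `df["Purchase_max"]` and `df["Purchase_avg"]`.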


**Variance Homogeneity Assumption:**

* H0: the variances of “Purchase_max” and “Purchase_avg” are homogeneous.
* H1: the variances of “Purchase_max” and “Purchase_avg” are not homogeneous.

In [19]:
test_stat,pvalue = levene(df["Purchase_max"], df["Purchase_avg"] ) #levene test
print('Test Stat = %.4f, p-value = %.4f' % (test_stat,pvalue))

Test Stat = 2.6393, p-value = 0.1083


In [20]:
# levene test result:

alpha = 0.05
if pvalue < alpha :
    print('variances are not homogeneous. (reject H0)')
else:
    print('variances are homogeneous. (fail to reject H0)')

variances are homogeneous. (fail to reject H0)
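The outcome of the assumption checks determines which two-sample test is appropriate. The sketch below encodes one common convention, which is an assumption here rather than something the notebook states: if either group fails the normality check, use the non-parametric Mann-Whitney U test; otherwise use a t-test, with Welch's correction (`equal_var=False`) when Levene's test rejects equal variances. The helper `compare_groups` is hypothetical, and the demo data are synthetic.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu

def compare_groups(a, b, alpha=0.05):
    """Hypothetical helper: pick a two-sample test from the assumption checks."""
    # Normality must hold in both groups for the parametric route
    normal = shapiro(a).pvalue >= alpha and shapiro(b).pvalue >= alpha
    if not normal:
        res = mannwhitneyu(a, b, alternative="two-sided")
        return "mannwhitneyu", res.statistic, res.pvalue
    # Levene's test decides between the pooled-variance and Welch t-test
    equal_var = levene(a, b).pvalue >= alpha
    res = ttest_ind(a, b, equal_var=equal_var)
    name = "ttest_ind" if equal_var else "ttest_ind (Welch)"
    return name, res.statistic, res.pvalue

# Synthetic stand-in data (NOT the notebook's data)
rng = np.random.default_rng(1)
group_a = rng.normal(loc=550, scale=135, size=40)
group_b = rng.normal(loc=580, scale=160, size=40)

test_name, test_stat, pvalue = compare_groups(group_a, group_b)
print(f"{test_name}: Test Stat = {test_stat:.4f}, p-value = {pvalue:.4f}")
```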


## 6. Conducting the Hypothesis Test and Result

In [21]:
# parametric test was performed since the assumptions were met :

# independent two-sample T-Test

test_stat, pvalue = ttest_ind(df["Purchase_max"], df["Purchase_avg"], equal_var=True)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat,pvalue))

Test Stat = -0.9416, p-value = 0.3493
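As a cross-check, the same pooled-variance t-test can be reproduced from summary statistics alone using `scipy.stats.ttest_ind_from_stats`. The sketch below plugs in the `Purchase` means, standard deviations, and counts from the two `describe()` outputs above; because those values are rounded to five decimals, the result should agree with the output above only up to rounding.

```python
from scipy.stats import ttest_ind_from_stats

# Purchase mean, std, and count taken from the describe() outputs above
test_stat, pvalue = ttest_ind_from_stats(
    mean1=550.89406, std1=134.1082, nobs1=40,   # control (maximum bidding)
    mean2=582.1061, std2=161.15251, nobs2=40,   # test (average bidding)
    equal_var=True,                             # pooled variance, as in ttest_ind above
)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))
```

The negative sign of the statistic only reflects the argument order: the control mean is listed first and is smaller than the test mean.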


In [22]:
# test result:

alpha = 0.05
if pvalue < alpha :
    print("There is a statistically significant difference between the purchase averages of maximum bidding and average bidding. (reject H0)")
else:
    print("There is no statistically significant difference between the purchase averages of maximum bidding and average bidding. (fail to reject H0)")

There is no statistically significant difference between the purchase averages of maximum bidding and average bidding. (fail to reject H0)
