# __Comparison of Conversion Rates Between Bidding Methods Using A/B Testing__

## __Story of Dataset__
This dataset contains information from a company’s website, including the number of ads shown 
to users, the number of clicks on those ads, and the revenue generated from purchases. 
There are two separate datasets for the control and test groups, located on different sheets of the "ab_testing.xlsx" Excel file. 
The control group was exposed to Maximum Bidding, while the test group experienced Average Bidding.


__Variables__

- Impression: Number of ad impressions
- Click: Number of clicks on the displayed ads
- Purchase: Number of products purchased after clicking the ads
- Earning: Revenue generated from the purchased products

### __Preparing and Analyzing the Data__

In [10]:

import pandas as pd
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, \
    pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

__Loading the Data__

In [11]:
df_control = pd.read_excel("datasets/ab_testing.xlsx",  sheet_name="Control Group")
df_test = pd.read_excel("datasets/ab_testing.xlsx",  sheet_name="Test Group")

In [12]:
df_test.columns = ["Impression_test", "Click_test", "Purchase_test","Earning_test"]
df_control.columns =  ["Impression_control", "Click_control", "Purchase_control","Earning_control"]

In [13]:
df = pd.concat([df_control,df_test], axis=1)


In [14]:
df.head()

| | Impression_control | Click_control | Purchase_control | Earning_control | Impression_test | Click_test | Purchase_test | Earning_test |
|---|---|---|---|---|---|---|---|---|
| 0 | 82529.45927 | 6090.07732 | 665.21125 | 2311.27714 | 120103.5038 | 3216.54796 | 702.16035 | 1939.61124 |
| 1 | 98050.45193 | 3382.86179 | 315.08489 | 1742.80686 | 134775.94336 | 3635.08242 | 834.05429 | 2929.40582 |
| 2 | 82696.02355 | 4167.96575 | 458.08374 | 1797.82745 | 107806.62079 | 3057.14356 | 422.93426 | 2526.24488 |
| 3 | 109914.4004 | 4910.88224 | 487.09077 | 1696.22918 | 116445.27553 | 4650.47391 | 429.03353 | 2281.42857 |
| 4 | 108457.76263 | 5987.65581 | 441.03405 | 1543.72018 | 145082.51684 | 5201.38772 | 749.86044 | 2781.69752 |


In [15]:
df.isnull().sum()

Impression_control    0
Click_control         0
Purchase_control      0
Earning_control       0
Impression_test       0
Click_test            0
Purchase_test         0
Earning_test          0
dtype: int64

In [16]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Impression_control | 40.0 | 101711.44907 | 20302.15786 | 45475.94296 | 85726.69035 | 99790.70108 | 115212.81654 | 147539.33633 |
| Click_control | 40.0 | 5100.65737 | 1329.9855 | 2189.75316 | 4124.30413 | 5001.2206 | 5923.8036 | 7959.12507 |
| Purchase_control | 40.0 | 550.89406 | 134.1082 | 267.02894 | 470.09553 | 531.20631 | 637.95709 | 801.79502 |
| Earning_control | 40.0 | 1908.5683 | 302.91778 | 1253.98952 | 1685.8472 | 1975.16052 | 2119.80278 | 2497.29522 |
| Impression_test | 40.0 | 120512.41176 | 18807.44871 | 79033.83492 | 112691.97077 | 119291.30077 | 132050.57893 | 158605.92048 |
| Click_test | 40.0 | 3967.54976 | 923.09507 | 1836.62986 | 3376.81902 | 3931.3598 | 4660.49791 | 6019.69508 |
| Purchase_test | 40.0 | 582.1061 | 161.15251 | 311.62952 | 444.62683 | 551.35573 | 699.86236 | 889.91046 |
| Earning_test | 40.0 | 2514.89073 | 282.73085 | 1939.61124 | 2280.53743 | 2544.66611 | 2761.5454 | 3171.48971 |


## __A/B Testing__

<span style="color:darkblue; font-weight:bold;"> Analyze the average Purchase values for the control and test groups.</span>


In [17]:
df["Purchase_control"].mean()

550.8940587702316

In [18]:
df["Purchase_test"].mean()

582.1060966484677


<span style="color:red; font-weight:normal;">There is a numerical difference between the mean of Purchase_test (582.11) and the mean of Purchase_control (550.89). Whether this difference is statistically significant still has to be tested.
</span>

### __Define the Hypothesis__
 - H0: M1 = M2 (There is no statistically significant difference in mean purchases between Maximum Bidding and Average Bidding.)
 
 - H1: M1 ≠ M2 (There is a statistically significant difference in mean purchases between Maximum Bidding and Average Bidding.)


### __Assumption Checks:__

- __Normality Assumption__

  H0: The normality assumption is satisfied.

  H1: The normality assumption is not satisfied.
 
- __Homogeneity of Variance__

  H0: Variances are homogeneous.

  H1: Variances are not homogeneous.

### __Step 1: Normality Assumption__

In [19]:
test_stat, pvalue = shapiro(df["Purchase_control"])
print("test_stat = %.4f, p-value = %.4f" %(test_stat,pvalue))

test_stat = 0.9773, p-value = 0.5891



<span style="color:red; font-weight:normal;">p-value > 0.05 </span><br>
<span style="color:red; font-weight:normal;">H0 cannot be rejected</span><br>
<span style="color:red; font-weight:normal;"> The normality assumption is satisfied for the purchase control variable </span>


In [20]:
test_stat, pvalue = shapiro(df["Purchase_test"])
print("test_stat = %.4f, p-value = %.4f" % (test_stat, pvalue))

test_stat = 0.9589, p-value = 0.1541



<span style="color:red; font-weight:normal;">p-value > 0.05 </span><br>
<span style="color:red; font-weight:normal;">H0 cannot be rejected</span><br>
<span style="color:red; font-weight:normal;"> The normality assumption is satisfied for the purchase test variable </span>


### __Step 2: Homogeneity of Variance__

In [21]:
test_stat, pvalue = levene(df["Purchase_control"],df["Purchase_test"])
print("test_stat = %.4f, p-value = %.4f" % (test_stat, pvalue))

test_stat = 2.6393, p-value = 0.1083



<span style="color:red; font-weight:normal;">p-value > 0.05 </span><br>
<span style="color:red; font-weight:normal;">H0 cannot be rejected</span><br>
<span style="color:red; font-weight:normal;"> The homogeneity of variance assumption is satisfied </span>

### __Selecting the test based on the results of the assumptions__

<span style="color:darkblue; font-weight:bold;"> Since neither the normality nor the homogeneity of variance assumption is rejected, the independent two-sample t-test with pooled variance (equal_var=True) will be used.</span>
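The selection rule applied here can be sketched as a small helper. This is only a minimal sketch of a common workflow, not code from the notebook: the function name `choose_and_run_test`, the fallback to Welch's t-test when variances differ, and the fallback to the Mann-Whitney U test when normality is rejected are all assumptions, and the skewed samples are synthetic data used purely for illustration.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def choose_and_run_test(a, b, alpha=0.05):
    """Pick a two-sample test from the assumption checks (hypothetical helper)."""
    # Shapiro-Wilk on each group: p > alpha means normality is not rejected.
    normal_a = shapiro(a)[1] > alpha
    normal_b = shapiro(b)[1] > alpha
    if not (normal_a and normal_b):
        # Assumed fallback: non-parametric test when normality is rejected.
        stat, p = mannwhitneyu(a, b, alternative="two-sided")
        return "mannwhitneyu", stat, p
    # Levene: p > alpha means equal variances are not rejected.
    equal_var = levene(a, b)[1] > alpha
    stat, p = ttest_ind(a, b, equal_var=equal_var)
    name = "t-test (pooled variance)" if equal_var else "t-test (Welch)"
    return name, stat, p


# Synthetic, strongly skewed samples (illustration only, not the A/B data).
rng = np.random.default_rng(0)
skewed_a = rng.exponential(scale=1.0, size=200)
skewed_b = rng.exponential(scale=1.2, size=200)
name, stat, p = choose_and_run_test(skewed_a, skewed_b)
print(name, round(float(p), 4))
```

Called with `df["Purchase_control"]` and `df["Purchase_test"]`, the same helper would follow the pooled-variance t-test branch, given the Shapiro-Wilk and Levene p-values reported in the steps above.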

In [22]:
test_stat,pvalue = ttest_ind(df["Purchase_control"],df["Purchase_test"], equal_var=True)
print("test_stat = %.4f, p-value = %.4f" % (test_stat,pvalue))

test_stat = -0.9416, p-value = 0.3493


<span style="color:red; font-weight:normal;">
Since the p-value is greater than 0.05, H0 cannot be rejected, meaning that there is no statistically significant difference 
in average purchases between the control and test groups.
</span>
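As a cross-check, the same test can be recomputed from the summary statistics alone, together with a standardized effect size. The sketch below is an illustration, not part of the original analysis: the means, standard deviations, and counts are copied from the `df.describe()` output, and reporting Cohen's d is an added assumption about what is useful here.

```python
import math
from scipy.stats import ttest_ind_from_stats

# Means, sample standard deviations and counts copied from df.describe().
mean_c, std_c, n_c = 550.89406, 134.10820, 40   # Purchase_control
mean_t, std_t, n_t = 582.10610, 161.15251, 40   # Purchase_test

# Pooled-variance t-test computed from summary statistics only.
result = ttest_ind_from_stats(mean_c, std_c, n_c, mean_t, std_t, n_t, equal_var=True)
t_stat, p_value = result.statistic, result.pvalue

# Cohen's d: mean difference divided by the pooled standard deviation.
pooled_sd = math.sqrt(((n_c - 1) * std_c**2 + (n_t - 1) * std_t**2) / (n_c + n_t - 2))
cohens_d = (mean_c - mean_t) / pooled_sd

print("t = %.4f, p = %.4f, Cohen's d = %.4f" % (t_stat, p_value, cohens_d))
```

The statistic and p-value should agree with the `ttest_ind` output up to rounding of the summary values. The effect size comes out at roughly -0.21, which is small by the usual rule of thumb.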

### __Conclusion__

Based on the business problem, we conducted assumption checks and determined that the necessary conditions were met.  
Therefore, an independent two-sample t-test (a parametric test) was applied.

The results indicated that there is **no statistically significant difference** in mean purchases between the Average Bidding and Maximum Bidding methods at the 5% significance level (95% confidence level).

Given this outcome, we recommend the client:

- to consider **extending the test duration** and/or **expanding the dataset** for more robust conclusions,
- and to test these bidding methods on the other available metrics (e.g., Click, Earning) to capture broader effects.

In the absence of a significant difference, the **most efficient method in terms of cost, manageability, or operational performance** can be preferred.
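To give the recommendation on expanding the dataset a rough scale, the sketch below estimates how many observations per group a follow-up test might need. It is a back-of-the-envelope calculation under stated assumptions, not a result of the analysis: it uses the normal approximation to the two-sample test, a target power of 80%, and an effect size of 0.21, taken as the absolute standardized mean difference implied by the summary statistics. The helper name `n_per_group` is hypothetical.

```python
import math
from scipy.stats import norm


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample test.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size**2)


# Assumed effect size: absolute standardized difference in mean purchases.
observed_d = 0.21
required_n = n_per_group(observed_d)
print(required_n)
```

Under these assumptions the required size is several hundred observations per group, far more than the 40 available in each sheet, which is consistent with the non-significant result.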