
## Analyzing the Effectiveness of Different Bidding Strategies Through A/B Testing

## Business Problem
* Facebook recently introduced a new bidding type called "average bidding" as an alternative to the existing "maximum bidding" type. Our client, bombabomba.com, has decided to test this new feature and wants to conduct an A/B test to determine whether average bidding brings in more conversions than maximum bidding.
* The A/B test has been running for 1 month, and bombabomba.com now expects you to analyze the results of this A/B test.
* Purchase is the ultimate success metric for bombabomba.com. Therefore, the Purchase metric should be focused on for statistical tests.


**"Average bidding"** and **"maximum bidding"** are terms commonly used in the context of online advertising and digital marketing to describe bidding strategies.

**Average Bidding:** In this strategy, the advertiser sets a specific budget and makes an average bid for each click. The advertising platform automatically adjusts the bids based on the frequency of ad impressions and clicks, optimizing them to result in an average cost. This strategy provides advertisers with more control and allows them to manage their budget more effectively.

**Maximum Bidding:** In this strategy, the advertiser sets a maximum bid for each click. The advertising platform automatically adjusts the bids to reach the specified maximum bid. This strategy allows advertisers to bid more aggressively to increase the visibility and likelihood of clicks for their ads.

## Data Story
* This data set contains information about a company's website, including the number of ads seen and clicked by users and the revenue generated from these ads.
* There are two separate data sets, one for the Control group and one for the Test group, stored on separate sheets of the ab_testing.xlsx Excel file.
* Maximum Bidding is applied to the control group, and Average Bidding is applied to the test group.
* Both data sets share the same four columns:

    * **Impression:** Number of ad impressions
    * **Click:** Number of clicks on ads
    * **Purchase:** Number of products purchased after clicking on ads
    * **Earning:** Revenue generated from the purchased products


## Project Tasks
### AB Testing (Independent Two-Sample T Test)
1. Formulate hypotheses
2. Assumption Check
    1. Normality Assumption (shapiro)
    2. Homogeneity of Variances (levene)
3. Application of the Hypothesis
    1. Independent two-sample t-test if assumptions are met
    2. Mann-Whitney U test if assumptions are not met
4. Interpret the results based on the p-value

Note:
- If normality is not satisfied, go directly to step 3.2 (Mann-Whitney U test). If normality is satisfied but homogeneity of variances is not, stay with step 3.1 and pass equal_var=False to ttest_ind (Welch's t-test).
- Before normality examination, outlier examination and correction may be useful.
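
The workflow above can be sketched as a single helper. This is a minimal illustration rather than the notebook's own code: the function name `compare_two_groups`, the `alpha` parameter, and the synthetic input data are assumptions made for the example.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_two_groups(a, b, alpha=0.05):
    """Pick and run a two-sample test following the task list above.

    Hypothetical helper: name and signature are illustrative only.
    """
    # Step 2.1: Shapiro-Wilk normality check on each group
    normal = shapiro(a)[1] > alpha and shapiro(b)[1] > alpha
    if not normal:
        # Step 3.2: non-parametric fallback when normality is rejected
        stat, pvalue = mannwhitneyu(a, b, alternative="two-sided")
        return "mann-whitney-u", stat, pvalue
    # Step 2.2: Levene test for homogeneity of variances
    equal_var = levene(a, b)[1] > alpha
    # Step 3.1: Student's t-test, or Welch's t-test if variances differ
    stat, pvalue = ttest_ind(a, b, equal_var=equal_var)
    return ("t-test" if equal_var else "welch-t-test"), stat, pvalue


# Synthetic data for demonstration only (not the ab_testing.xlsx values)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=550, scale=130, size=40)
group_b = rng.normal(loc=580, scale=160, size=40)
test_name, stat, pvalue = compare_two_groups(group_a, group_b)
print(test_name, round(float(stat), 4), round(float(pvalue), 4))

# Strongly skewed data should be routed to the Mann-Whitney U test
skewed_name = compare_two_groups(rng.exponential(size=200),
                                 rng.exponential(size=200))[0]
print(skewed_name)
```

Returning the name of the chosen test alongside the statistic makes it explicit which branch of the workflow produced the p-value.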

In [1]:
import pandas as pd

# Only the tests used in this workflow are imported
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

## Task 1: Data Preparation and Analysis
### Step 1: 
* Read the data set consisting of control and test groups from the file ab_testing.xlsx.
* Assign the control and test group data to separate variables.

In [2]:
# Open the workbook once and get the sheet names
excel_file = pd.ExcelFile('/kaggle/input/ab-testing/ab_testing.xlsx')
sheet_names = excel_file.sheet_names
print("Sheet Names in the Excel File:", sheet_names)

Sheet Names in the Excel File: ['Control Group', 'Test Group']


In [3]:
# Read the control and test group data
# Adjust the 'sheet_name' or 'usecols' parameters according to the structure of the Excel file
df_control = pd.read_excel(excel_file, sheet_name='Control Group')
df_test = pd.read_excel(excel_file, sheet_name='Test Group')

In [4]:
# Display the control and test group data
print("Control Group Data:")
print(df_control.head())
print("\nTest Group Data:")
print(df_test.head())

Control Group Data:
    Impression      Click  Purchase    Earning
0  82529.45927 6090.07732 665.21125 2311.27714
1  98050.45193 3382.86179 315.08489 1742.80686
2  82696.02355 4167.96575 458.08374 1797.82745
3 109914.40040 4910.88224 487.09077 1696.22918
4 108457.76263 5987.65581 441.03405 1543.72018

Test Group Data:
    Impression      Click  Purchase    Earning
0 120103.50380 3216.54796 702.16035 1939.61124
1 134775.94336 3635.08242 834.05429 2929.40582
2 107806.62079 3057.14356 422.93426 2526.24488
3 116445.27553 4650.47391 429.03353 2281.42857
4 145082.51684 5201.38772 749.86044 2781.69752


### Step 2: Analyze the control and test group data.

In [5]:
# Statistics of the control group
print("Statistics of the Control Group:")
print(df_control.describe())
# Statistics of the test group
print("\nBasic Statistics of the Test Group:")
print(df_test.describe())

Statistics of the Control Group:
        Impression      Click  Purchase    Earning
count     40.00000   40.00000  40.00000   40.00000
mean  101711.44907 5100.65737 550.89406 1908.56830
std    20302.15786 1329.98550 134.10820  302.91778
min    45475.94296 2189.75316 267.02894 1253.98952
25%    85726.69035 4124.30413 470.09553 1685.84720
50%    99790.70108 5001.22060 531.20631 1975.16052
75%   115212.81654 5923.80360 637.95709 2119.80278
max   147539.33633 7959.12507 801.79502 2497.29522

Basic Statistics of the Test Group:
        Impression      Click  Purchase    Earning
count     40.00000   40.00000  40.00000   40.00000
mean  120512.41176 3967.54976 582.10610 2514.89073
std    18807.44871  923.09507 161.15251  282.73085
min    79033.83492 1836.62986 311.62952 1939.61124
25%   112691.97077 3376.81902 444.62683 2280.53743
50%   119291.30077 3931.35980 551.35573 2544.66611
75%   132050.57893 4660.49791 699.86236 2761.54540
max   158605.92048 6019.69508 889.91046 3171.48971
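
The project notes suggest an outlier examination before the normality check. One common option is the 1.5 x IQR rule (Tukey fences), sketched below on toy values. The helper name `iqr_bounds`, the factor `k`, and the sample numbers are assumptions for illustration, not part of the original notebook. Applied by hand to the quartiles printed above, the Purchase minimum and maximum of both groups fall inside these fences.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return (lower, upper) Tukey fences for a numeric Series.

    Hypothetical helper: name and default factor are illustrative.
    """
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy values for illustration; 2000 is planted as an obvious outlier
purchase = pd.Series([470.0, 510.0, 530.0, 560.0, 600.0, 640.0, 2000.0])
lower, upper = iqr_bounds(purchase)
outliers = purchase[(purchase < lower) | (purchase > upper)]
print(lower, upper, outliers.tolist())
```

With the real data, the same call could be made on df_control['Purchase'] and df_test['Purchase'].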


### Step 3: After the analysis process, concatenate the control and test group data using the concat method.


In [6]:
# Add a column named 'Group' and assign group names
df_control['Group'] = 'control'
df_test['Group'] = 'test'

In [7]:
# Concatenate the control and test group data
df_combined = pd.concat([df_control, df_test], axis=0)

# Show the first few rows of the concatenated data set
print(df_combined.head())
print("\n")
print(df_combined.tail())

    Impression      Click  Purchase    Earning    Group
0  82529.45927 6090.07732 665.21125 2311.27714  control
1  98050.45193 3382.86179 315.08489 1742.80686  control
2  82696.02355 4167.96575 458.08374 1797.82745  control
3 109914.40040 4910.88224 487.09077 1696.22918  control
4 108457.76263 5987.65581 441.03405 1543.72018  control


     Impression      Click  Purchase    Earning Group
35  79234.91193 6002.21358 382.04712 2277.86398  test
36 130702.23941 3626.32007 449.82459 2530.84133  test
37 116481.87337 4702.78247 472.45373 2597.91763  test
38  79033.83492 4495.42818 425.35910 2595.85788  test
39 102257.45409 4800.06832 521.31073 2967.51839  test
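
Note that the tail above still shows row labels 35 to 39: concat keeps each frame's original index, so labels 0 to 39 appear twice in df_combined. The sketch below, on toy frames whose names (`control_demo`, `test_demo`, `combined_demo`) and values are illustrative assumptions, shows how ignore_index=True renumbers the rows and how the Group column allows a per-group summary.

```python
import pandas as pd

# Toy stand-ins for df_control and df_test (values are made up)
control_demo = pd.DataFrame({"Purchase": [500.0, 600.0]})
test_demo = pd.DataFrame({"Purchase": [550.0, 650.0]})
control_demo["Group"] = "control"
test_demo["Group"] = "test"

# ignore_index=True renumbers rows 0..n-1 instead of repeating each index
combined_demo = pd.concat([control_demo, test_demo], ignore_index=True)

# Per-group summary of the success metric
summary = combined_demo.groupby("Group")["Purchase"].agg(["count", "mean", "std"])
print(summary)
```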


## Task 2: Definition of A/B Testing Hypothesis
### Step 1: Define the hypothesis.
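        H0: M1 = M2   (the mean Purchase of the control group equals that of the test group)
        H1: M1 != M2  (the mean Purchase of the two groups differs)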
### Step 2: Analyze the Purchase (conversion) means for the control and test groups.

In [8]:
# Mean purchase of the control group
purchase_mean_control = df_control['Purchase'].mean()

# Mean purchase of the test group
purchase_mean_test = df_test['Purchase'].mean()

print("Mean 'Purchase' of the Control Group: %.4f" % purchase_mean_control)
print("Mean 'Purchase' of the Test Group: %.4f" % purchase_mean_test)

Mean 'Purchase' of the Control Group: 550.8941
Mean 'Purchase' of the Test Group: 582.1061


## Task 3: Implementation of Hypothesis Testing
AB Testing (Independent Two-Sample T Test)
### Step 1: Perform assumption checks before hypothesis testing. These are Normality Assumption and Homogeneity of Variances.
Test whether the Purchase variable in the control and test groups conforms to the normality assumption (H0: the data are normally distributed).

In [9]:
from scipy.stats import shapiro

# Normality assumption for the control group
test_stat, pvalue = shapiro(df_control['Purchase'])
print('Shapiro Test for Control Group - Purchase: Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Shapiro Test for Control Group - Purchase: Test Stat = 0.9773, p-value = 0.5891


In [10]:
# Normality assumption for the test group
test_stat, pvalue = shapiro(df_test['Purchase'])
print('Shapiro Test for Test Group - Purchase: Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Shapiro Test for Test Group - Purchase: Test Stat = 0.9589, p-value = 0.1541


## Interpretation
* Since the p-values for both groups are greater than 0.05, the null hypothesis of normality cannot be rejected for either group.
* The test finds no evidence that Purchase departs from a normal distribution, so the normality assumption is treated as met. This is not proof that the data are normal.
* With the normality assumption met, an independent two-sample t-test (parametric test) is a candidate.
* However, before making a final decision, a test of homogeneity of variances should also be performed.

In [11]:
from scipy.stats import levene

# Homogeneity of variances test
test_stat, pvalue = levene(df_control['Purchase'], df_test['Purchase'])
print('Levene Test for Homogeneity of Variances: Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Levene Test for Homogeneity of Variances: Test Stat = 2.6393, p-value = 0.1083


## Interpretation
* Since the p-value is greater than 0.05, the null hypothesis of equal variances cannot be rejected: there is no statistically significant difference between the variances of the two groups.
* Therefore, the assumption of homogeneity of variances is treated as satisfied.
* Thus, an independent two-sample t-test (parametric test) with equal_var=True can be applied.

### Step 2: Select the appropriate test based on the results of the Normality Assumption and Homogeneity of Variances.
### Independent Two-Sample T Test (When Assumptions are Satisfied)

In [12]:
from scipy.stats import ttest_ind

test_stat, pvalue = ttest_ind(df_control['Purchase'], df_test['Purchase'], equal_var=True)
print('Independent Two-Sample T Test: Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Independent Two-Sample T Test: Test Stat = -0.9416, p-value = 0.3493
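
As a cross-check, the same statistic can be recomputed from the summary values printed by describe() in Step 2 (mean, sample standard deviation, and count of Purchase per group). The sketch below uses scipy's ttest_ind_from_stats; the numbers are copied from that output, and the variable names are illustrative.

```python
from scipy.stats import ttest_ind_from_stats

# Purchase summary statistics copied from the describe() output
# (pandas reports the sample standard deviation, ddof=1)
t_stat, p_value = ttest_ind_from_stats(
    mean1=550.89406, std1=134.10820, nobs1=40,  # control group
    mean2=582.10610, std2=161.15251, nobs2=40,  # test group
    equal_var=True,
)
print("t = %.4f, p = %.4f" % (t_stat, p_value))
```

Agreement with the values above confirms that the result depends only on the group means, standard deviations, and sample sizes.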


### Step 3: Interpret the test results based on the obtained p-value.
## Interpretation
* Since the p-value (0.3493) is greater than 0.05, H0 cannot be rejected: the test group's mean Purchase is higher (582.11 vs. 550.89), but the difference between the control and test groups is not statistically significant.
* Therefore, this test provides no statistical evidence that average bidding (test group) brings in more conversions than maximum bidding (control group).
* On the basis of Purchase alone, there is currently no reason for the client to prefer one bidding method over the other.
* However, it may be advisable to conduct a more comprehensive analysis that takes into account the client's long-term performance and the other metrics in the data set (Impression, Click, Earning).
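
To put the non-significant result in context, a confidence interval for the difference in means and a standardized effect size can be derived from the same summary statistics. The sketch below is one possible follow-up, not part of the original analysis; it assumes equal variances (as in the t-test above), and all variable names are illustrative.

```python
import math
from scipy.stats import t

# Purchase summary statistics copied from the describe() output
mean_c, std_c, n_c = 550.89406, 134.10820, 40  # control (maximum bidding)
mean_t, std_t, n_t = 582.10610, 161.15251, 40  # test (average bidding)

diff = mean_t - mean_c
dof = n_c + n_t - 2

# Pooled standard deviation and standard error of the difference
pooled_sd = math.sqrt(((n_c - 1) * std_c**2 + (n_t - 1) * std_t**2) / dof)
std_err = pooled_sd * math.sqrt(1 / n_c + 1 / n_t)

# Two-sided 95% confidence interval for (test mean - control mean)
t_crit = t.ppf(0.975, dof)
ci_low, ci_high = diff - t_crit * std_err, diff + t_crit * std_err

# Cohen's d: difference expressed in pooled standard deviations
cohens_d = diff / pooled_sd

print("difference = %.2f, 95%% CI = (%.2f, %.2f), d = %.2f"
      % (diff, ci_low, ci_high, cohens_d))
```

Because the two-sided p-value is above 0.05, the 95% interval necessarily includes zero; its width indicates how large a true difference the data could still be consistent with.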
