<a href="https://www.kaggle.com/code/bsrsrc/ab-testing-independent-two-sample-t-test?scriptVersionId=181146058" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div class="alert alert-primary" style="margin-top: 20px">


<h1><center>AB Testing</center></h1>

</div>

## Business Problem

---

Facebook recently introduced a new bidding type called "average bidding" as an alternative to the existing "maximum bidding" method. One of our clients, bombabomba.com, has decided to test this new feature and wants to conduct an A/B test to determine if average bidding yields higher conversion rates compared to maximum bidding. The A/B test has been ongoing for 1 month, and bombabomba.com now seeks your analysis of the test results. The ultimate success metric for bombabomba.com is Purchase. Therefore, the Purchase metric should be the focus for statistical tests.

---

## Dataset Story

---

This dataset contains website information for a company, including the number of ads seen and clicked by users and the revenue generated. There are two separate datasets, the Control and Test groups, stored on separate sheets of the ab_testing.xlsx Excel file. Maximum Bidding was applied to the Control group, and Average Bidding was applied to the Test group.

| Variable   | Description                                       |
|------------|---------------------------------------------------|
| Impression | Number of ad impressions                          |
| Click      | Number of clicks on ads                           |
| Purchase   | Number of purchases after clicking on ads         |
| Earning    | Revenue generated from purchases                  |


## AB Testing (Independent Two-Sample T Test)
---
1. Formulate Hypotheses

2. Assumption Check

* 1. Normality Assumption (shapiro)

* 2. Homogeneity of Variances (levene)

3. Applying Hypothesis Test

* 1. If assumptions are met, apply independent two-sample t test

* 2. If assumptions are not met, apply Mann-Whitney U test

4. Interpret Results based on p-value

---

## Note:

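- If the normality assumption is not met, skip the t test and apply the Mann-Whitney U test directly (step 3, option 2). If normality holds but homogeneity of variances does not, still use the t test (step 3, option 1), but pass `equal_var=False` so that Welch's t test is run.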

- Prior to normality assessment, it may be beneficial to inspect and correct outliers.
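
The decision flow described in the outline and notes above can be sketched as a single function. This is a minimal illustration, not the notebook's own code: the helper name `compare_groups`, its return format, and the synthetic data are assumptions made for the example.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_groups(control, test, alpha=0.05):
    """Pick and run a two-sample test following the outline above.

    `compare_groups` is a hypothetical helper, not part of the notebook.
    """
    # Step 2.1: Shapiro-Wilk on each group (H0: the sample is normal).
    normal = all(shapiro(group)[1] > alpha for group in (control, test))

    if not normal:
        # Step 3.2: non-parametric alternative when normality is rejected.
        stat, pvalue = mannwhitneyu(control, test, alternative="two-sided")
        return "mannwhitneyu", stat, pvalue

    # Step 2.2: Levene's test (H0: the variances are equal).
    equal_var = bool(levene(control, test)[1] > alpha)

    # Step 3.1: pooled t test if variances look equal, Welch's t test otherwise.
    stat, pvalue = ttest_ind(control, test, equal_var=equal_var)
    name = "ttest_ind (pooled)" if equal_var else "ttest_ind (Welch)"
    return name, stat, pvalue


# Synthetic data for illustration only; not the ab_testing.xlsx values.
rng = np.random.default_rng(0)
control = rng.normal(loc=550, scale=130, size=40)
test = rng.normal(loc=580, scale=160, size=40)

name, stat, pvalue = compare_groups(control, test)
print(f"{name}: stat = {stat:.4f}, p-value = {pvalue:.4f}")
```

Returning the name of the test alongside the statistic makes it explicit which branch was taken, which matters when reporting the result.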

---
## Task 1: Data Preparation and Analysis
---

In [1]:
#libraries
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import itertools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

pd.set_option("display.float_format", lambda x: "%.5f" % x)

In [2]:
#dataset
df_control = pd.read_excel("/kaggle/input/ab-testing-exercise/ab_testing.xlsx", sheet_name="Control Group")
df_test = pd.read_excel("/kaggle/input/ab-testing-exercise/ab_testing.xlsx", sheet_name="Test Group")
df_control.head()
df_test.head()

|   | Impression | Click | Purchase | Earning |
|---|------------|-------|----------|---------|
| 0 | 82529.45927 | 6090.07732 | 665.21125 | 2311.27714 |
| 1 | 98050.45193 | 3382.86179 | 315.08489 | 1742.80686 |
| 2 | 82696.02355 | 4167.96575 | 458.08374 | 1797.82745 |
| 3 | 109914.4004 | 4910.88224 | 487.09077 | 1696.22918 |
| 4 | 108457.76263 | 5987.65581 | 441.03405 | 1543.72018 |


|   | Impression | Click | Purchase | Earning |
|---|------------|-------|----------|---------|
| 0 | 120103.5038 | 3216.54796 | 702.16035 | 1939.61124 |
| 1 | 134775.94336 | 3635.08242 | 834.05429 | 2929.40582 |
| 2 | 107806.62079 | 3057.14356 | 422.93426 | 2526.24488 |
| 3 | 116445.27553 | 4650.47391 | 429.03353 | 2281.42857 |
| 4 | 145082.51684 | 5201.38772 | 749.86044 | 2781.69752 |


In [3]:
def check_df(dataframe, head=5):
    """Print shape, dtypes, head, tail, missing values, and quantiles of a DataFrame."""
    # The parameter is named generically so it does not shadow the global df_control.
    print("#################### Shape ###################")
    print(dataframe.shape)
    print("#################### Types ###################")
    print(dataframe.dtypes)
    print("#################### Head ###################")
    print(dataframe.head(head))
    print("#################### Tail ###################")
    print(dataframe.tail(head))
    print("#################### NA ###################")
    print(dataframe.isnull().sum())
    print("#################### Quantiles ###################")
    print(dataframe.describe(percentiles=[0, 0.05, 0.50, 0.95, 1]).T)

check_df(df_control)

#################### Shape ###################
(40, 4)
#################### Types ###################
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
#################### Head ###################
    Impression      Click  Purchase    Earning
0  82529.45927 6090.07732 665.21125 2311.27714
1  98050.45193 3382.86179 315.08489 1742.80686
2  82696.02355 4167.96575 458.08374 1797.82745
3 109914.40040 4910.88224 487.09077 1696.22918
4 108457.76263 5987.65581 441.03405 1543.72018
#################### Tail ###################
     Impression      Click  Purchase    Earning
35 132064.21900 3747.15754 551.07241 2256.97559
36  86409.94180 4608.25621 345.04603 1781.35769
37 123678.93423 3649.07379 476.16813 2187.72122
38 101997.49410 4736.35337 474.61354 2254.56383
39 121085.88122 4285.17861 590.40602 1289.30895
#################### NA ###################
Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64
#############

In [4]:
# check_df is already defined in the previous cell, so reuse it for the test group.
check_df(df_test)

#################### Shape ###################
(40, 4)
#################### Types ###################
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
#################### Head ###################
    Impression      Click  Purchase    Earning
0 120103.50380 3216.54796 702.16035 1939.61124
1 134775.94336 3635.08242 834.05429 2929.40582
2 107806.62079 3057.14356 422.93426 2526.24488
3 116445.27553 4650.47391 429.03353 2281.42857
4 145082.51684 5201.38772 749.86044 2781.69752
#################### Tail ###################
     Impression      Click  Purchase    Earning
35  79234.91193 6002.21358 382.04712 2277.86398
36 130702.23941 3626.32007 449.82459 2530.84133
37 116481.87337 4702.78247 472.45373 2597.91763
38  79033.83492 4495.42818 425.35910 2595.85788
39 102257.45409 4800.06832 521.31073 2967.51839
#################### NA ###################
Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64
#############

In [5]:
df_control.columns = [i+"_control" for i in df_control.columns]
df_test.columns = [i+"_test" for i in df_test.columns]
df_control.head()
df_test.head()

|   | Impression_control | Click_control | Purchase_control | Earning_control |
|---|--------------------|---------------|------------------|-----------------|
| 0 | 82529.45927 | 6090.07732 | 665.21125 | 2311.27714 |
| 1 | 98050.45193 | 3382.86179 | 315.08489 | 1742.80686 |
| 2 | 82696.02355 | 4167.96575 | 458.08374 | 1797.82745 |
| 3 | 109914.4004 | 4910.88224 | 487.09077 | 1696.22918 |
| 4 | 108457.76263 | 5987.65581 | 441.03405 | 1543.72018 |


|   | Impression_test | Click_test | Purchase_test | Earning_test |
|---|-----------------|------------|---------------|--------------|
| 0 | 120103.5038 | 3216.54796 | 702.16035 | 1939.61124 |
| 1 | 134775.94336 | 3635.08242 | 834.05429 | 2929.40582 |
| 2 | 107806.62079 | 3057.14356 | 422.93426 | 2526.24488 |
| 3 | 116445.27553 | 4650.47391 | 429.03353 | 2281.42857 |
| 4 | 145082.51684 | 5201.38772 | 749.86044 | 2781.69752 |


In [6]:
df = pd.concat([df_control, df_test], axis=1)
df.head()
df.shape

|   | Impression_control | Click_control | Purchase_control | Earning_control | Impression_test | Click_test | Purchase_test | Earning_test |
|---|--------------------|---------------|------------------|-----------------|-----------------|------------|---------------|--------------|
| 0 | 82529.45927 | 6090.07732 | 665.21125 | 2311.27714 | 120103.5038 | 3216.54796 | 702.16035 | 1939.61124 |
| 1 | 98050.45193 | 3382.86179 | 315.08489 | 1742.80686 | 134775.94336 | 3635.08242 | 834.05429 | 2929.40582 |
| 2 | 82696.02355 | 4167.96575 | 458.08374 | 1797.82745 | 107806.62079 | 3057.14356 | 422.93426 | 2526.24488 |
| 3 | 109914.4004 | 4910.88224 | 487.09077 | 1696.22918 | 116445.27553 | 4650.47391 | 429.03353 | 2281.42857 |
| 4 | 108457.76263 | 5987.65581 | 441.03405 | 1543.72018 | 145082.51684 | 5201.38772 | 749.86044 | 2781.69752 |


(40, 8)

## Task 2: Defining the Hypothesis of the A/B Test
---
* H0: M1 = M2 (There is no statistically significant difference between the mean Purchase of the control group, Maximum Bidding, and that of the test group, Average Bidding.)

* H1: M1 != M2 (There is a statistically significant difference between the mean Purchase of the control group, Maximum Bidding, and that of the test group, Average Bidding.)


In [7]:
df[["Purchase_control", "Purchase_test"]].mean()

Purchase_control   550.89406
Purchase_test      582.10610
dtype: float64
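
To put the two means in context, the gap can be expressed as an absolute difference and as a relative lift over the control group. The short sketch below only reuses the two means printed above; the variable names are chosen for illustration. Whether a gap of this size is statistically meaningful is what Task 3 tests.

```python
# Means copied from the output above.
mean_control = 550.89406  # Purchase_control (Maximum Bidding)
mean_test = 582.10610     # Purchase_test (Average Bidding)

# Absolute gap in average purchases and the lift relative to the control group.
abs_diff = mean_test - mean_control
rel_lift = abs_diff / mean_control

print(f"Absolute difference: {abs_diff:.5f}")
print(f"Relative lift: {rel_lift:.2%}")
```

The test group averages about 31.2 more purchases, a lift of roughly 5.7% over the control group.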

---
## Task 3: Performing the Hypothesis Test
---

In [8]:
test_stat, pvalue = shapiro(df_control["Purchase_control"])
print("Test Stat = %.4f, p-value = %.4f" % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


In [9]:
test_stat, pvalue = shapiro(df_test["Purchase_test"])
print("Test Stat = %.4f, p-value = %.4f" %(test_stat, pvalue))

Test Stat = 0.9589, p-value = 0.1541


---
* H0: The data are normally distributed (the normality assumption is met).

* H1: The data are not normally distributed (the normality assumption is not met).

---

* p-value < 0.05, reject H0

* p-value > 0.05, cannot reject H0.

---

Both p-values (0.5891 for the control group and 0.1541 for the test group) are greater than 0.05, so H0 cannot be rejected for either group and the normality assumption is treated as met. The next step is to test the homogeneity of variances.

---

In [10]:
test_stat, pvalue = levene(df_control["Purchase_control"], df_test["Purchase_test"])
print("Test Stat= %.4f, p-value=%.4f" % (test_stat, pvalue))

Test Stat= 2.6393, p-value=0.1083


---
* H0: The variances are homogeneous.

* H1: The variances are not homogeneous.
---
* p-value < 0.05, reject H0

* p-value > 0.05, cannot reject H0.
---
Since the p-value (0.1083) is greater than 0.05, H0 cannot be rejected, so the variances are treated as homogeneous.

Since the assumptions are met, an independent two-sample t-test (parametric test) is conducted.
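
For readers who want to see what the pooled-variance t test computes, here is a minimal sketch that builds the statistic by hand and compares it with `ttest_ind(..., equal_var=True)`. The helper `pooled_t_test` and the synthetic data are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from scipy.stats import t, ttest_ind


def pooled_t_test(a, b):
    """Two-sided Student's t test with pooled variance (hypothetical helper)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
    std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (a.mean() - b.mean()) / std_err
    # Two-sided p-value from the survival function of the t distribution.
    p_value = 2 * t.sf(abs(t_stat), dof)
    return t_stat, p_value


# Synthetic data for illustration only; not the ab_testing.xlsx values.
rng = np.random.default_rng(42)
a = rng.normal(550, 130, size=40)
b = rng.normal(580, 160, size=40)

manual = pooled_t_test(a, b)
reference = ttest_ind(a, b, equal_var=True)
print("manual: t = %.4f, p = %.4f" % manual)
print("scipy : t = %.4f, p = %.4f" % (reference.statistic, reference.pvalue))
```

The sign of the statistic follows the argument order: passing the control group first gives a negative value when the test group has the higher mean.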

---

In [11]:
test_stat, pvalue = ttest_ind(df_control["Purchase_control"],
                             df_test["Purchase_test"],
                             equal_var=True)
print("Test Stat= %.4f, p-value= %.4f" % (test_stat, pvalue))

Test Stat= -0.9416, p-value= 0.3493


---
* p-value < 0.05, reject H0

* p-value > 0.05, cannot reject H0.
---
Since the p-value (0.3493) is greater than 0.05, H0 cannot be rejected. The difference in mean purchases between the control group (Maximum Bidding, 550.89) and the test group (Average Bidding, 582.11) is therefore not statistically significant at the 5% level.
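
A confidence interval for the difference in means can complement the p-value, because it shows the range of differences that remain plausible given the data. The sketch below is one possible way to compute it under the same pooled-variance assumption; `mean_diff_ci` is a hypothetical helper and the data are synthetic, so the printed numbers are not the notebook's results.

```python
import numpy as np
from scipy.stats import t, ttest_ind


def mean_diff_ci(a, b, alpha=0.05):
    """Pooled-variance CI for mean(a) - mean(b); a hypothetical helper."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
    std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = a.mean() - b.mean()
    # Half-width of the interval from the t critical value.
    margin = t.ppf(1 - alpha / 2, dof) * std_err
    return diff - margin, diff + margin


# Synthetic data for illustration only; not the ab_testing.xlsx values.
rng = np.random.default_rng(7)
control = rng.normal(550, 130, size=40)
test = rng.normal(580, 160, size=40)

low, high = mean_diff_ci(control, test)
pvalue = ttest_ind(control, test, equal_var=True).pvalue
print(f"95% CI for mean difference: [{low:.2f}, {high:.2f}], p-value = {pvalue:.4f}")
```

When the 95% interval contains zero, the two-sided pooled t test at the 5% level does not reject H0, so the two views always agree.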

---