# A / B Testing
A/B testing (also known as bucket testing or split-run testing) is a user experience research methodology. A/B tests consist of a randomized experiment with two variants, A and B. It includes application of statistical hypothesis testing or "two-sample hypothesis testing" as used in the field of statistics. A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective.

<font color = 'blue'>
Content: 

1. [Business Problem](#1)
1. [Variables](#2)
1. [Libraries](#3)
1. [Load Data](#4)
1. [Which System Enables More Purchases?](#5)
1. [Independent Two Sample T-Test](#6)
    * [6.1. Hypothesis Testing](#7)
    * [6.2. Assumption Control](#8)
        * [6.2.1. Normality Assumption (Shapiro-Wilk-W-Test)](#9)
        * [6.2.2. Variance Homogeneity Assumption](#10)
    * [6.3. Independent Two Sample T-Test](#11)
1. [References](#12)



<a id = "1"></a><br>
## 1. Business Problem
A ...... company recently introduced a new bidding type, “average bidding”, as an alternative to its existing bidding type, called “maximum bidding”. One of our clients, --------.com, has decided to test this new feature and wants to conduct an A/B test to understand whether average bidding brings more conversions than maximum bidding.

In this A/B test, --------.com randomly splits its audience into two equally sized groups, i.e. the test group and the control group. A --------- company ad campaign with “maximum bidding” is served to the “control group” and another campaign with “average bidding” is served to the “test group”.
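The random, equal-sized split described above can be sketched as follows. This is only an illustration: the function name `split_audience`, the integer user IDs, and the fixed seed are assumptions, not details taken from the client's setup.

```python
import random


def split_audience(user_ids, seed=42):
    """Shuffle the IDs and cut the list in half: returns (control, test)."""
    rng = random.Random(seed)      # fixed seed only to make the sketch repeatable
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical audience of 1000 users identified by integers
control_ids, test_ids = split_audience(range(1000))
print(len(control_ids), len(test_ids))
```

Because every user is assigned by the shuffle alone, neither group is favoured by any user attribute, which is what lets a later difference in purchases be attributed to the bidding type.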

The A/B test has run for 1 month and --------.com now expects you to analyze and present the results of this A/B test.


You should answer the following questions in your presentation:

* How would you define the hypothesis of this A/B test?
* Can we conclude statistically significant results?
* Which statistical test did you use, and why?
* Based on your answer to Question 2, what would be your recommendation to the client?

<a id = "2"></a><br>
## 2. Variables

* **Impression:** Number of ad views.
* **Click:** Number of clicks on the displayed ad.
* **Purchase:** Number of products purchased after the ad was clicked.
* **Earning:** Earnings from the purchased products.

<a id = "3"></a><br>
## 3. Libraries

In [1]:
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, pearsonr, spearmanr, kendalltau, \
    f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

# installation required
!pip install openpyxl # for excel file

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/ab-testing/ab_testing.xlsx


<a id = "4"></a><br>
## 4. Load Data 

In [2]:
# control data
df_control = pd.read_excel("../input/ab-testing/ab_testing.xlsx", sheet_name="Control Group")
df_control.head()

|   | Impression | Click | Purchase | Earning |
|---|---|---|---|---|
| 0 | 82529.45927 | 6090.07732 | 665.21125 | 2311.27714 |
| 1 | 98050.45193 | 3382.86179 | 315.08489 | 1742.80686 |
| 2 | 82696.02355 | 4167.96575 | 458.08374 | 1797.82745 |
| 3 | 109914.4004 | 4910.88224 | 487.09077 | 1696.22918 |
| 4 | 108457.76263 | 5987.65581 | 441.03405 | 1543.72018 |


In [3]:
# test data
df_test = pd.read_excel("../input/ab-testing/ab_testing.xlsx", sheet_name="Test Group")
df_test.head()

|   | Impression | Click | Purchase | Earning |
|---|---|---|---|---|
| 0 | 120103.5038 | 3216.54796 | 702.16035 | 1939.61124 |
| 1 | 134775.94336 | 3635.08242 | 834.05429 | 2929.40582 |
| 2 | 107806.62079 | 3057.14356 | 422.93426 | 2526.24488 |
| 3 | 116445.27553 | 4650.47391 | 429.03353 | 2281.42857 |
| 4 | 145082.51684 | 5201.38772 | 749.86044 | 2781.69752 |


<a id = "5"></a><br>
## 5. Which System Enables More Purchases?
Comparison of Purchase Means

In [4]:
df_control["Purchase"].describe().T

count    40.00000
mean    550.89406
std     134.10820
min     267.02894
25%     470.09553
50%     531.20631
75%     637.95709
max     801.79502
Name: Purchase, dtype: float64

In [5]:
df_test["Purchase"].describe().T

count    40.00000
mean    582.10610
std     161.15251
min     311.62952
25%     444.62683
50%     551.35573
75%     699.86236
max     889.91046
Name: Purchase, dtype: float64

In [6]:
sms.DescrStatsW(df_control["Purchase"]).tconfint_mean()

(508.0041754264924, 593.7839421139709)

In [7]:
sms.DescrStatsW(df_test["Purchase"]).tconfint_mean()

(530.5670226990062, 633.6451705979289)

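At first glance, the new system (average bidding) looks better because its mean purchase count (582.11) is higher than that of the old one (550.89). However, the two 95% confidence intervals overlap to a large extent.

So the question to ask is: is the difference between the means statistically significant?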
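The two intervals printed above can be reproduced from the summary statistics alone, which makes it easier to see where they come from. The sketch below assumes the default 95% level of `tconfint_mean` and plugs in the means and standard deviations from the `describe()` outputs; the helper name `mean_confidence_interval` is illustrative, not part of any library.

```python
import math

from scipy import stats


def mean_confidence_interval(mean, std, n, confidence=0.95):
    """t-based confidence interval for a mean, from summary statistics."""
    standard_error = std / math.sqrt(n)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean - t_crit * standard_error, mean + t_crit * standard_error


# Values copied from the describe() outputs of the Purchase column
control_ci = mean_confidence_interval(550.89406, 134.10820, 40)
test_ci = mean_confidence_interval(582.10610, 161.15251, 40)

# Two intervals overlap when each one starts before the other one ends
overlap = control_ci[0] < test_ci[1] and test_ci[0] < control_ci[1]
print(control_ci, test_ci, overlap)
```

Overlapping intervals do not settle the question by themselves, but they are a hint that the difference in means should be checked with a formal test.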


<a id = "6"></a><br>
## 6. Independent Two Sample T-Test


<a id = "7"></a><br>
### 6.1 Hypothesis Testing
Let's state our hypotheses.
* **H0: M1 = M2 :** there is no difference between the mean purchases of the control group and the test group
* **H1: M1 != M2 :** there is a difference between the mean purchases of the control group and the test group

<a id = "8"></a><br>
### 6.2 Assumption Control
Before running the test, we should check two assumptions:
1. Normality Assumption
2. Variance Homogeneity Assumption
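How the two assumption checks steer the choice of test can be sketched as a small helper. This is a common workflow rather than the only valid one: the function name `compare_groups`, the fallback to the Mann-Whitney U test when normality is rejected, and the fallback to Welch's t-test when variances differ are assumptions of this sketch, and the input data are synthetic.

```python
import numpy as np
from scipy.stats import levene, mannwhitneyu, shapiro, ttest_ind


def compare_groups(a, b, alpha=0.05):
    """Pick a two-sample test from the assumption checks and run it."""
    _, p_norm_a = shapiro(a)
    _, p_norm_b = shapiro(b)
    if p_norm_a < alpha or p_norm_b < alpha:
        # Normality rejected for at least one group: non-parametric test
        stat, p = mannwhitneyu(a, b, alternative="two-sided")
        return "mann-whitney-u", stat, p
    _, p_var = levene(a, b)
    equal_var = p_var >= alpha
    # equal_var=False switches ttest_ind to Welch's t-test
    stat, p = ttest_ind(a, b, equal_var=equal_var)
    return ("t-test" if equal_var else "welch-t-test"), stat, p


# Synthetic data for illustration only, not the notebook's dataset
rng = np.random.default_rng(0)
group_a = rng.normal(550, 135, size=40)
group_b = rng.normal(580, 160, size=40)
name, stat, p = compare_groups(group_a, group_b)
print(name, round(float(stat), 4), round(float(p), 4))
```

In this notebook both assumptions hold, so the analysis ends up on the `equal_var=True` branch.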

<a id = "9"></a><br>
#### 6.2.1. Normality Assumption (Shapiro-Wilk-W test)

* **H0:** The data are normally distributed.

* **H1:** The data are not normally distributed.

H0 is rejected if p-value < 0.05; otherwise it cannot be rejected.

In [8]:
# control group
test_stat, pvalue = shapiro(df_control["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


H0 cannot be rejected because p-value = 0.5891 > 0.05.

The test finds no evidence against normality, so we treat the control group data as normally distributed.

In [9]:
# test group
test_stat, pvalue = shapiro(df_test["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9589, p-value = 0.1541


H0 cannot be rejected because p-value = 0.1541 > 0.05.

The test finds no evidence against normality, so we treat the test group data as normally distributed.

<a id = "10"></a><br>
#### 6.2.2. Variance Homogeneity Assumption

Variance Homogeneity Assumption (Levene Test)

**H0:** Variances are homogeneous.

**H1:** Variances are not homogeneous.

H0 is rejected if p-value < 0.05; otherwise it cannot be rejected.

In [10]:
test_stat, pvalue = levene(df_control["Purchase"], df_test["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


H0 cannot be rejected because p-value = 0.1083 > 0.05.

The test finds no evidence that the variances differ, so we treat the variances of the test and control groups as homogeneous.

<a id = "11"></a><br>
### 6.3. Independent Two Sample T-Test
What were our hypotheses? Let's recall them.
* H0: M1 = M2 (there is no difference between the means of the two groups)
* H1: M1 != M2 (there is a difference between the means of the two groups)
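
For reference, the statistic behind these hypotheses, in the equal-variance (pooled) form of the test, can be written out as below. The symbols are introduced here for illustration and are not used elsewhere in the notebook: $\bar{x}_i$, $s_i$ and $n_i$ denote the sample mean, sample standard deviation and size of group $i$.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
$$

Under H0, $t$ follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom, which is 78 for two groups of 40 observations.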


#### What the p-value indicates
When we use the independent two-sample t-test, we interpret the p-value as follows:
* p-value < 0.05: H0 is rejected
* p-value >= 0.05: H0 cannot be rejected

In [11]:
# Independent Two Sample T-Test
test_stat, pvalue = ttest_ind(df_control["Purchase"], df_test["Purchase"], equal_var=True)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493


The p-value is 0.3493 > 0.05, so the H0 hypothesis cannot be rejected. There is no statistically significant difference between the mean purchases of the two groups.
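
As a cross-check, the same test can be recomputed from the summary statistics alone with `scipy.stats.ttest_ind_from_stats`. The sketch below assumes the rounded means and standard deviations from the earlier `describe()` outputs, so the result matches the full-data test only up to rounding; the 0.05 threshold is the one used throughout this notebook.

```python
from scipy.stats import ttest_ind_from_stats

# Means, standard deviations and counts copied from the describe() outputs
t_stat, p_value = ttest_ind_from_stats(
    mean1=550.89406, std1=134.10820, nobs1=40,   # control group
    mean2=582.10610, std2=161.15251, nobs2=40,   # test group
    equal_var=True,                              # pooled-variance t-test
)

alpha = 0.05
reject_h0 = p_value < alpha
print('Test Stat = %.4f, p-value = %.4f, reject H0 = %s' % (t_stat, p_value, reject_h0))
```

The sign of the statistic is negative only because the control group is passed first and has the lower mean.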

<a id = "12"></a><br>
## 7. References
* https://github.com/mvahit
* https://github.com/mathchi
* https://www.veribilimiokulu.com/
* https://www.linkedin.com/in/vahitkeskin/
* https://en.wikipedia.org/wiki/A/B_testing