## Background

Every statistical hypothesis test starts with formulating the hypotheses.

For example, to test whether a newly designed advertisement (treatment) converts better than the existing advertisement (control), we state the hypotheses as follows. They are written as a two-sided test, so a difference in either direction can be detected:

1. Null hypothesis (H0): There is no difference between control and treatment.
    - In our example, H0 is 'There is no difference in conversion rate between the existing advertisement and the newly designed advertisement'.

2. Alternative hypothesis (H1): There is a difference between control and treatment.
    - In our example, H1 is 'There is a difference in conversion rate between the existing advertisement and the newly designed advertisement'.

Note that the null and alternative hypotheses are mutually exclusive and, taken together, must account for all possible outcomes.
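In symbols, the two hypotheses can be sketched as follows. The notation $\mu_C$ and $\mu_T$ for the mean conversion rates of the control and treatment groups is introduced here for illustration and does not come from the dataset:

```latex
\begin{aligned}
H_0 &: \mu_T = \mu_C \\
H_1 &: \mu_T \neq \mu_C
\end{aligned}
```

If only an improvement is of interest, a one-sided pair such as $H_0: \mu_T \le \mu_C$ versus $H_1: \mu_T > \mu_C$ would also cover all possibilities.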

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as ss 

In [2]:
data = pd.read_csv('dataset/ads_conversion_rate.csv')

In [3]:
print('Total records:', len(data))
print('Total variables:', len(data.columns))
print('Percentage of null in each column:')
round(data.isnull().sum()/len(data)*100,2)

Total records: 10000
Total variables: 10
Percentage of null in each column:


Ad 1     0.0
Ad 2     0.0
Ad 3     0.0
Ad 4     0.0
Ad 5     0.0
Ad 6     0.0
Ad 7     0.0
Ad 8     0.0
Ad 9     0.0
Ad 10    0.0
dtype: float64

In [4]:
data.head()

|   | Ad 1 | Ad 2 | Ad 3 | Ad 4 | Ad 5 | Ad 6 | Ad 7 | Ad 8 | Ad 9 | Ad 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.232 | 0.359 | 0.054 | 0.048 | 0.056 | 0.095 | 0.113 | 0.316 | 0.172 | 0.03 |
| 1 | 0.007 | 0.03 | 0.005 | 0.112 | 0.05 | 0.223 | 0.117 | 0.129 | 0.216 | 0.026 |
| 2 | 0.248 | 0.324 | 0.236 | 0.2 | 0.112 | 0.241 | 0.062 | 0.017 | 0.162 | 0.193 |
| 3 | 0.027 | 0.447 | 0.166 | 0.198 | 0.132 | 0.02 | 0.006 | 0.111 | 0.245 | 0.13 |
| 4 | 0.131 | 0.296 | 0.194 | 0.199 | 0.116 | 0.094 | 0.195 | 0.342 | 0.074 | 0.179 |


In [5]:
data.describe()

|   | Ad 1 | Ad 2 | Ad 3 | Ad 4 | Ad 5 | Ad 6 | Ad 7 | Ad 8 | Ad 9 | Ad 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 0.124012 | 0.251815 | 0.125499 | 0.125215 | 0.10063 | 0.123909 | 0.125614 | 0.174963 | 0.125426 | 0.125316 |
| std | 0.072231 | 0.141946 | 0.071719 | 0.072364 | 0.028602 | 0.072826 | 0.071553 | 0.101372 | 0.072347 | 0.071927 |
| min | 0.0 | 0.01 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.062 | 0.128 | 0.064 | 0.062 | 0.076 | 0.061 | 0.065 | 0.086 | 0.063 | 0.063 |
| 50% | 0.124 | 0.249 | 0.126 | 0.126 | 0.101 | 0.124 | 0.127 | 0.177 | 0.126 | 0.125 |
| 75% | 0.186 | 0.376 | 0.187 | 0.187 | 0.125 | 0.187 | 0.187 | 0.263 | 0.188 | 0.187 |
| max | 0.25 | 0.5 | 0.25 | 0.25 | 0.15 | 0.25 | 0.25 | 0.35 | 0.25 | 0.25 |


Let's assume the data from 'Ad 1' is from the control group whereas the data from 'Ad 2' is from the treatment group.
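The summary statistics above show a standard deviation of about 0.072 for 'Ad 1' and about 0.142 for 'Ad 2', which matters because the classic two-sample t-test assumes equal variances. One possible way to check that assumption is Levene's test, sketched below. The helper name `variances_look_equal` and the synthetic uniform data are assumptions made for illustration; they stand in for `data['Ad 1']` and `data['Ad 2']` and are not the real dataset.

```python
import numpy as np
import scipy.stats as ss


def variances_look_equal(control, treatment, alpha=0.05):
    """Run Levene's test, whose H0 is that both groups have equal variance.

    Returns the statistic, the p-value, and True if H0 is not rejected.
    """
    stat, p = ss.levene(control, treatment)  # default center='median'
    return stat, p, bool(p > alpha)


# Synthetic stand-in data (an assumption, NOT the real ads dataset):
# ranges loosely mimic the min/max reported by data.describe().
rng = np.random.default_rng(0)
control = rng.uniform(0.0, 0.25, size=10_000)
treatment = rng.uniform(0.01, 0.50, size=10_000)

stat, p, equal = variances_look_equal(control, treatment)
print('Levene statistic=%.3f, p=%.5f' % (stat, p))
print('Equal variances plausible:', equal)
```

With the real data, the same call would be `variances_look_equal(data['Ad 1'], data['Ad 2'])`. If the test rejects equal variances, Welch's t-test is usually the safer choice.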

In [6]:
# Two-sample t-test comparing control (Ad 1) with treatment (Ad 2)
t_stat, p_value = ss.ttest_ind(data['Ad 1'], data['Ad 2'])

print('Statistics=%.3f, p=%.5f' % (t_stat, p_value))

# Interpret the p-value at the 5% significance level
alpha = 0.05

if p_value > alpha:
    print('No statistically significant difference in conversion rate between Ad 1 and Ad 2 (fail to reject H0)')
else:
    print('There is a statistically significant difference in conversion rate between Ad 1 and Ad 2 (reject H0)')

Statistics=-80.244, p=0.00000
There is a statistically significant difference in conversion rate between Ad 1 and Ad 2 (reject H0)
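Rejecting H0 in a two-sided test says only that the two ads differ, not which one is better or by how much. The sketch below is one possible follow-up under stated assumptions: it uses Welch's t-test (`equal_var=False`), a one-sided alternative (the `alternative` argument requires SciPy 1.6 or later), and a confidence interval for the difference in means. The helper `welch_comparison` and the synthetic data are hypothetical stand-ins, not part of the original analysis.

```python
import numpy as np
import scipy.stats as ss


def welch_comparison(control, treatment, alpha=0.05):
    """Welch's t-test plus a (1 - alpha) CI for mean(treatment) - mean(control)."""
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)

    # Two-sided Welch's t-test (does not assume equal variances)
    t_stat, p_two_sided = ss.ttest_ind(control, treatment, equal_var=False)

    # One-sided test: alternative='less' means H1 is mean(control) < mean(treatment)
    _, p_one_sided = ss.ttest_ind(control, treatment,
                                  equal_var=False, alternative='less')

    # Standard error and Welch-Satterthwaite degrees of freedom
    v_c = control.var(ddof=1) / control.size
    v_t = treatment.var(ddof=1) / treatment.size
    se = np.sqrt(v_c + v_t)
    df = (v_c + v_t) ** 2 / (v_c ** 2 / (control.size - 1)
                             + v_t ** 2 / (treatment.size - 1))

    diff = treatment.mean() - control.mean()
    margin = ss.t.ppf(1 - alpha / 2, df) * se
    return {
        't_stat': t_stat,
        'p_two_sided': p_two_sided,
        'p_one_sided': p_one_sided,
        'diff': diff,
        'ci_low': diff - margin,
        'ci_high': diff + margin,
    }


# Synthetic stand-in data (an assumption, NOT the real ads dataset)
rng = np.random.default_rng(0)
control = rng.uniform(0.0, 0.25, size=10_000)
treatment = rng.uniform(0.01, 0.50, size=10_000)

result = welch_comparison(control, treatment)
print('t=%.3f, one-sided p=%.5f' % (result['t_stat'], result['p_one_sided']))
print('Mean difference=%.4f, 95%% CI=(%.4f, %.4f)'
      % (result['diff'], result['ci_low'], result['ci_high']))
```

With the real data, the call would be `welch_comparison(data['Ad 1'], data['Ad 2'])`. When both groups have the same size, as they do here, the Welch and Student t statistics coincide and only the degrees of freedom differ, so the statistic would match the value printed above.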


---

### Reference

https://www.analyticsvidhya.com/blog/2020/10/ab-testing-data-science/