# A/B Hypothesis Testing

**Background information:**
An e-commerce company has redesigned its landing page following analysis and research. Before rolling the new page out to a wider audience, the company wants to run an experiment to check whether it yields a better conversion rate.

**Data:**
We were given the experiment results for the control group and the experimental (treatment) group. Our hypothesis is that the new page (treatment group) yields a better conversion rate than the old page (control group).

**Goal:**
Test whether the data provide statistically significant support for this hypothesis.

In [1]:
import pandas as pd
import numpy as np

In [2]:
url = 'https://docs.google.com/spreadsheets/d/1XZ6SjnbAs_bHGdznm8sUEYqmPbNlrMabIueGvpmxi2E/edit#gid=842283717'
url = url.replace('/edit#gid=', '/export?format=csv&gid=')
df = pd.read_csv(url)
df.head()

,user_id,timestamp,group,landing_page,converted
0,851104,2017-01-21 22:11:49,control,old_page,0
1,804228,2017-01-12 8:01:45,control,old_page,0
2,661590,2017-01-11 16:55:06,treatment,new_page,0
3,853541,2017-01-08 18:28:03,treatment,new_page,0
4,864975,2017-01-21 1:52:26,control,old_page,1


## Data Preparation & Cleaning

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   user_id       5000 non-null   int64 
 1   timestamp     5000 non-null   object
 2   group         5000 non-null   object
 3   landing_page  5000 non-null   object
 4   converted     5000 non-null   int64 
dtypes: int64(2), object(3)
memory usage: 195.4+ KB


In [4]:
print('Number of missing data for each column:')
print(df.isnull().sum())
print('\nNumber of duplicated data:', df.duplicated().sum())

Number of missing data for each column:
user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64

Number of duplicated data: 0
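
Note that `df.duplicated()` only flags rows that are identical in every column, so a user who appears twice with different timestamps would not be counted. The following is a minimal sketch of a per-user check. The `toy` DataFrame and its values are hypothetical and are not taken from the experiment data.

```python
import pandas as pd

# Hypothetical toy data: user 102 appears twice with different timestamps
toy = pd.DataFrame({
    'user_id': [101, 102, 102, 103],
    'timestamp': ['2017-01-02 10:00:00', '2017-01-03 11:00:00',
                  '2017-01-04 12:00:00', '2017-01-05 13:00:00'],
    'converted': [0, 1, 1, 0],
})

# Rows identical in every column
full_row_dupes = int(toy.duplicated().sum())

# Rows whose user_id has already appeared earlier in the data
repeat_users = int(toy['user_id'].duplicated().sum())

print('Fully duplicated rows:', full_row_dupes)
print('Repeated user_id rows:', repeat_users)

# One possible policy (an assumption): keep each user's first record only
deduped = toy.drop_duplicates(subset='user_id', keep='first')
print('Rows after de-duplicating by user_id:', len(deduped))
```

Whether to keep the first record, the last record, or drop repeated users entirely is a design decision that depends on how the experiment assigned users.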


In [5]:
print('Checking typos in column group and landing_page:')
print('Column group:', df['group'].unique())
print('Column landing_page:', df['landing_page'].unique())

Checking typos in column group and landing_page:
Column group: ['control' 'treatment']
Column landing_page: ['old_page' 'new_page']


In [6]:
# Drop the columns that are not needed for the analysis
df.drop(columns=['user_id', 'timestamp'], inplace=True)

In [7]:
# Keep only control-group users who were shown the old page
control = (df['group'] == 'control') & (df['landing_page'] == 'old_page')
df_control = df[control].copy()
df_control.head()

,group,landing_page,converted
0,control,old_page,0
1,control,old_page,0
4,control,old_page,1
5,control,old_page,0
7,control,old_page,0


In [8]:
# Keep only treatment-group users who were shown the new page
treatment = (df['group'] == 'treatment') & (df['landing_page'] == 'new_page')
df_treatment = df[treatment].copy()
df_treatment.head()

,group,landing_page,converted
2,treatment,new_page,0
3,treatment,new_page,0
6,treatment,new_page,1
8,treatment,new_page,1
9,treatment,new_page,1
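
The two filters above keep only rows where the group and the landing page agree (control with `old_page`, treatment with `new_page`). A quick way to see how many rows such filters would discard is to cross-tabulate the two columns. This is a minimal sketch on a hypothetical `toy` DataFrame in which two rows are deliberately mismatched; none of the values come from the experiment data.

```python
import pandas as pd

# Hypothetical toy data: rows 1 and 3 have a group/page mismatch
toy = pd.DataFrame({
    'group': ['control', 'control', 'treatment', 'treatment', 'control', 'treatment'],
    'landing_page': ['old_page', 'new_page', 'new_page', 'old_page', 'old_page', 'new_page'],
    'converted': [0, 1, 0, 1, 1, 0],
})

# Count every group/page combination; off-diagonal cells are mismatches
counts = pd.crosstab(toy['group'], toy['landing_page'])
print(counts)

# Flag rows whose page is not the one expected for their group
expected_page = toy['group'].map({'control': 'old_page', 'treatment': 'new_page'})
mismatched = toy['landing_page'] != expected_page
print('Mismatched rows:', int(mismatched.sum()))
```

Running the same cross-tabulation on the real `df` would show how many rows, if any, fall outside `df_control` and `df_treatment`.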


## EDA

In [16]:
print('Conversion rate in control group:', (df_control['converted'].mean()*100).round(2),'%')
print('Conversion rate in treatment group:', (df_treatment['converted'].mean()*100).round(2),'%')

Conversion rate in control group: 12.61 %
Conversion rate in treatment group: 13.17 %
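
Because `converted` is a 0/1 column, its mean is the conversion rate. The rates are easier to interpret next to the number of users and conversions behind them, since the same difference in rates can be significant or not depending on sample size. The sketch below shows one way to produce such a summary with `groupby`. The `toy` data and the column names `users`, `conversions`, `rate`, and `rate_pct` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: 5 users per group
toy = pd.DataFrame({
    'group': ['control'] * 5 + ['treatment'] * 5,
    'converted': [0, 0, 1, 0, 0, 1, 0, 1, 0, 0],
})

# Per-group sample size, number of conversions, and conversion rate
summary = toy.groupby('group')['converted'].agg(
    users='count',
    conversions='sum',
    rate='mean',
)
summary['rate_pct'] = (summary['rate'] * 100).round(2)
print(summary)
```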


## Hypothesis Testing

**Null Hypothesis (H0)**: The conversion rate of the new page is **less than or equal to** the conversion rate of the old page

**Alternative Hypothesis (H1)**: The conversion rate of the new page is **greater than** the conversion rate of the old page

In [26]:
from statsmodels.stats.weightstats import ztest
# One-sided test: is the mean of the treatment group larger than the mean of the control group?
(stat, pvalue) = ztest(df_treatment['converted'], df_control['converted'], alternative='larger')
print('Z-score =', stat)
print('p-value =', pvalue)
if (pvalue<0.05):
    print('The p-value is less than 0.05, we have enough evidence to reject the null hypothesis')
else:
    print('The p-value is not less than 0.05, we do not have enough evidence to reject the null hypothesis')

Z-score = 0.5787432513872296
p-value = 0.2813812136899914
The p-value is not less than 0.05, we do not have enough evidence to reject the null hypothesis
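
To make the test less of a black box, the sketch below computes a textbook pooled two-proportion z-test by hand. It tests in the same direction as the cell above (treatment rate larger than control rate). The user and conversion counts are hypothetical, chosen only to give rates near those observed, because the group sizes are not shown in this notebook. The result should be close to, but need not exactly match, the output of `ztest`, which estimates the variance slightly differently.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts for illustration only (not taken from the dataset)
n_control, conv_control = 2500, 315      # 12.60 % conversion
n_treatment, conv_treatment = 2500, 329  # 13.16 % conversion

p_control = conv_control / n_control
p_treatment = conv_treatment / n_treatment

# Pooled conversion rate under H0 (no difference between the pages)
p_pooled = (conv_control + conv_treatment) / (n_control + n_treatment)

# Standard error of the difference in proportions under H0
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_treatment))

# z-statistic: observed difference measured in standard errors
z = (p_treatment - p_control) / se

# One-sided p-value for H1: treatment rate > control rate
p_value = 1 - NormalDist().cdf(z)

print(f'z = {z:.3f}, one-sided p-value = {p_value:.3f}')
```

With these assumed counts, the difference is well under one standard error, which is why the one-sided p-value stays far above 0.05.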


**Result**

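The p-value (about 0.28) is above the 0.05 significance level, so we **fail to reject the null hypothesis.** This does not prove the null hypothesis; it means the observed difference (13.17% vs. 12.61%) is not statistically significant, so the experiment does not show that the new landing page converts better than the old one.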

## Recommendation
Since the experiment shows no statistically significant improvement in conversion rate, the company can keep using the old landing page.