# A/B Testing - Web Page

![bell curves](bell_curves.png)

For our data, we’ll use a [dataset from Kaggle](https://www.kaggle.com/zhangluyuan/ab-testing?select=ab_data.csv) which contains the results of an A/B test comparing what appear to be two designs of a web page (`old_page` vs. `new_page`).

Let’s imagine you work on the product team at a medium-sized online e-commerce business. The UX designer worked really hard on a new version of the product page, with the hope that it will lead to a higher conversion rate. The product manager (PM) told you that the current conversion rate is about 13% on average throughout the year, and that the team would be happy with an increase of 2 percentage points, meaning that the new design will be considered a success if it raises the conversion rate to 15%.
Before rolling out the change, the team would be more comfortable testing it on a small number of users to see how it performs, so you suggest running an A/B test on a subset of your user base.

We are interested in capturing the conversion rate. One way to encode it is to label each user session with a binary variable:

* `0` - The user did not buy the product during this user session
* `1` - The user bought the product during this user session
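
With this coding, the conversion rate of a group is simply the mean of its 0/1 values. A minimal sketch on made-up sessions (the `sessions` list is hypothetical, not taken from the dataset):

```python
# Hypothetical outcomes for ten user sessions: 1 = bought, 0 = did not buy
sessions = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

# The mean of a 0/1 variable is the share of sessions that converted
conversion_rate = sum(sessions) / len(sessions)
print(f"conversion rate: {conversion_rate:.0%}")
```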

There are 294478 rows in the DataFrame, each representing a user session, and 5 columns:
* `user_id` - The user ID of each session
* `timestamp` - Timestamp for the session
* `group` - Which group the user was assigned to for that session {`control`, `treatment`}
* `landing_page` - Which design each user saw on that session {`old_page`, `new_page`}
* `converted` - Whether the session ended in a conversion or not (binary, `0`=not converted, `1`=converted)

In [188]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.stats.api as sms
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from math import ceil
from numpy import std, mean, sqrt

# Check the data

In [94]:
dataset = pd.read_csv('ab_data.csv')
dataset.head()

| | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 0 | 851104 | 2017-01-21 22:11:48.556739 | control | old_page | 0 |
| 1 | 804228 | 2017-01-12 08:01:45.159739 | control | old_page | 0 |
| 2 | 661590 | 2017-01-11 16:55:06.154213 | treatment | new_page | 0 |
| 3 | 853541 | 2017-01-08 18:28:03.143765 | treatment | new_page | 0 |
| 4 | 864975 | 2017-01-21 01:52:26.210827 | control | old_page | 1 |


In [95]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   user_id       294478 non-null  int64 
 1   timestamp     294478 non-null  object
 2   group         294478 non-null  object
 3   landing_page  294478 non-null  object
 4   converted     294478 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.2+ MB


In [96]:
dataset['group'].value_counts()

treatment    147276
control      147202
Name: group, dtype: int64

In [97]:
treatment_mask = dataset['group'] == 'treatment'
control_mask = dataset['group'] == 'control'
treatment = dataset[treatment_mask]
control = dataset[control_mask]
# Do any user_ids appear in both the control and the treatment group?
mask_cross = control['user_id'].isin(treatment['user_id'])
user_id_to_avoid = control.loc[mask_cross,'user_id']

print('We have',len(user_id_to_avoid),'user_ids to avoid because they appear in both groups')
print("Let's drop duplicated user_ids to avoid this problem")

We have 1895 user_ids to avoid because they appear in both groups
Let's drop duplicated user_ids to avoid this problem


In [98]:
before = len(dataset)
dataset = dataset.drop_duplicates(subset = 'user_id')  # default keep='first': one row per user_id survives
after = len(dataset)
print(before - after, 'rows were dropped')
treatment_mask = dataset['group'] == 'treatment'
control_mask = dataset['group'] == 'control'
treatment = dataset[treatment_mask]
control = dataset[control_mask]

3894 rows were dropped
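
Dropping duplicates this way keeps one session for each repeated user rather than removing that user entirely. The sketch below contrasts the two behaviours in plain Python on a toy list (the `rows` data is hypothetical and only illustrates the logic):

```python
from collections import Counter

# Hypothetical sessions as (user_id, group); user 2 shows up in both groups
rows = [(1, "control"), (2, "control"), (2, "treatment"), (3, "treatment")]

# Behaviour of drop_duplicates(subset='user_id') with its default keep='first':
# the first row seen for each user survives, later rows are discarded
seen = set()
keep_first = []
for user_id, group in rows:
    if user_id not in seen:
        seen.add(user_id)
        keep_first.append((user_id, group))

# Stricter alternative: remove every user that appears more than once
counts = Counter(user_id for user_id, _ in rows)
drop_all = [(u, g) for u, g in rows if counts[u] == 1]

print(keep_first)
print(drop_all)
```

In pandas, the stricter variant corresponds to `drop_duplicates(subset='user_id', keep=False)`. It would remove more rows than the number reported here, so every downstream count would change.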


In [100]:
display(pd.crosstab(dataset['group'], dataset['landing_page']))
print("We also have control users who saw the new page and treatment users who saw the old page")

| group | new_page | old_page |
|---|---|---|
| control | 1006 | 144226 |
| treatment | 144314 | 1038 |

We also have control users who saw the new page and treatment users who saw the old page


In [101]:
# Keep only the sessions where the page shown matches the assigned group
control = control[control['landing_page'] != 'new_page']
treatment = treatment[treatment['landing_page'] != 'old_page']
new_dataset = pd.concat([control,treatment], axis = 0)
pd.crosstab(new_dataset['group'], new_dataset['landing_page'])

| group | new_page | old_page |
|---|---|---|
| control | 0 | 144226 |
| treatment | 144314 | 0 |
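
The same consistency rule can be written as a lookup from group to expected page. This is a small plain-Python sketch on toy data; the `rows` list and the `expected_page` mapping are illustrative names, not part of the notebook:

```python
# Hypothetical sessions as (group, landing_page)
rows = [
    ("control", "old_page"),
    ("control", "new_page"),    # mismatch: control user served the new page
    ("treatment", "new_page"),
    ("treatment", "old_page"),  # mismatch: treatment user served the old page
]

# The page each group is supposed to see
expected_page = {"control": "old_page", "treatment": "new_page"}

# Keep only the sessions where assignment and page agree
consistent = [(g, p) for g, p in rows if expected_page[g] == p]
print(consistent)
```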


# Calculating the required sample size (n) and sampling

* Statistical power: 1 - β = 0.8 (the probability of detecting a difference when one exists)
* Significance level α: 0.05
* We want a two-sided test

In [41]:
effect_size = sms.proportion_effectsize(0.13, 0.15)    # Cohen's h for the baseline (13%) and target (15%) rates
required_n = sms.NormalIndPower().solve_power( # Solve for the per-group sample size of a two-sided test
    effect_size, 
    power=0.8, 
    alpha=0.05, 
    ratio=1,
    alternative='two-sided'
    )            

required_n = ceil(required_n) # Round up to the next whole number

print('we need at least', required_n, 'people in each group')

we need at least 4720 people in each group
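
To see where this number comes from, the calculation can be approximated by hand with the standard library only. This is a sketch, assuming the usual closed form for two equally sized groups; it ignores the negligible far-tail term that `solve_power` accounts for, and the variable names are illustrative:

```python
from math import asin, sqrt, ceil
from statistics import NormalDist

p_baseline, p_target = 0.13, 0.15
alpha, power = 0.05, 0.8

# Cohen's h: difference between the arcsine-transformed proportions
h = 2 * asin(sqrt(p_target)) - 2 * asin(sqrt(p_baseline))

# Standard normal quantiles for a two-sided test and the desired power
z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
z_power = NormalDist().inv_cdf(power)

# Per-group sample size for two equally sized groups, rounded up
n_per_group = ceil(2 * ((z_alpha + z_power) / h) ** 2)
print(n_per_group)
```

With these inputs the closed form agrees with the 4720 sessions per group obtained from statsmodels.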


In [167]:
control_sample = control.sample(required_n, random_state=7)
treatment_sample = treatment.sample(required_n, random_state=7)
df_sample = pd.concat([control_sample, treatment_sample], axis =0)

In [171]:
display(df_sample.groupby('group')['converted'].agg([np.mean, np.std, stats.sem]).style.format('{:.3f}') )

print('At first glance, the new page seems to perform about the same as the old one')

| group | mean | std | sem |
|---|---|---|---|
| control | 0.122 | 0.328 | 0.005 |
| treatment | 0.127 | 0.333 | 0.005 |

At first glance, the new page seems to perform about the same as the old one
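
The `sem` column can be checked by hand, since the standard error of a sample proportion is the square root of p(1 - p) / n. A quick sketch using the rounded control rate, so the result is approximate:

```python
from math import sqrt

p_control = 0.122   # rounded control conversion rate from the summary table
n = 4720            # sessions sampled per group

# Standard error of a sample proportion
se = sqrt(p_control * (1 - p_control) / n)
print(f"{se:.3f}")
```

Rounded to three decimals this gives 0.005, in line with the table.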


# Hypothesis testing
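
Stated formally, with $p_c$ and $p_t$ the true conversion rates of the old and new page, the two-sided test compares the hypotheses below. The pooled z statistic is the textbook form of a two-proportion z-test; the notation ($x_c$, $x_t$ for conversions, $n_c$, $n_t$ for sample sizes) is introduced here for illustration and is not from the original notebook.

```latex
\begin{align*}
H_0 &: p_c = p_t \\
H_1 &: p_c \neq p_t \\[6pt]
z &= \frac{\hat{p}_c - \hat{p}_t}
          {\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_c} + \frac{1}{n_t}\right)}},
\qquad
\hat{p} = \frac{x_c + x_t}{n_c + n_t}
\end{align*}
```

Because the difference is taken as control minus treatment, a negative z means the treatment group converted at a higher rate in the sample.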

In [172]:
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

In [179]:
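# Number of sessions and number of conversions in each sampled group
n_con = control_sample['converted'].count()
n_treat = treatment_sample['converted'].count()
successes = [control_sample['converted'].sum(), treatment_sample['converted'].sum()]
nobs = [n_con, n_treat]

# Two-sample z-test for proportions (two-sided by default)
z_stat, pval = proportions_ztest(successes, nobs=nobs)
# 95% confidence interval for each group's conversion rate (normal approximation by default)
(lower_con, lower_treat), (upper_con, upper_treat) = proportion_confint(successes, nobs=nobs, alpha=0.05)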

print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_treat:.3f}, {upper_treat:.3f}]')

z statistic: -0.65
p-value: 0.513
ci 95% for control group: [0.113, 0.132]
ci 95% for treatment group: [0.117, 0.136]
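
For intuition, the pooled two-proportion z-test can be written out with the standard library alone. This is a sketch of the textbook formula rather than the statsmodels implementation; the helper name `two_proportion_ztest` and the counts passed to it are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Pooled two-sided z-test for the difference between two proportions."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: probability of a |z| at least this large under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts, NOT the ones from the sample above
z, p = two_proportion_ztest(120, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p:.3f}")
```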


In [200]:
verbatim = '''
The p-value (0.513) is well above α = 0.05, so we fail to reject the null hypothesis:
the new page's conversion rate is not significantly different from the old page's.
The treatment group's 95% CI [0.117, 0.136] also stays below the 15% target.
'''
print(verbatim)


The p-value (0.513) is well above α = 0.05, so we fail to reject the null hypothesis:
the new page's conversion rate is not significantly different from the old page's.
The treatment group's 95% CI [0.117, 0.136] also stays below the 15% target.
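
A confidence interval for the difference between the two rates tells the same story. The sketch below uses the rounded rates from the summary table, because the exact conversion counts are not shown, so the bounds are approximate:

```python
from math import sqrt
from statistics import NormalDist

# Rounded rates from the summary table; exact counts are not shown in the
# notebook, so this interval is approximate
p_control, p_treatment = 0.122, 0.127
n = 4720  # sessions per group

diff = p_treatment - p_control
# Unpooled standard error of the difference between two proportions
se = sqrt(p_control * (1 - p_control) / n + p_treatment * (1 - p_treatment) / n)
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

lower, upper = diff - z_crit * se, diff + z_crit * se
print(f"difference: {diff:.3f}, 95% CI: [{lower:.3f}, {upper:.3f}]")
```

The interval contains zero, and its upper bound sits below the 2-percentage-point lift the team was hoping for.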

