**Scenario:**

An online bicycle store has changed its home page interface to encourage visitors to click through to its loyalty programme sign-up page. It hopes the new interface will encourage more visitors to access the loyalty programme page, to see what benefits the programme brings, and to sign up. The current click-through rate (CTR) is around 50% annually, and the company hopes the new design will push this to at least 55%.



## 1. Conduct power analysis 

In [3]:
# import necessary libraries
import statsmodels.stats.api as sms
from statsmodels.stats.power import TTestIndPower

# define parameters
alpha = 0.05
power = 0.80
ratio = 1.0
effect = sms.proportion_effectsize(0.50,0.55)

# specify instance for TTestIndPower
analysis = TTestIndPower()

# Calculate sample size and power analysis by using the function solve_power()
results = analysis.solve_power(effect,
                               alpha = alpha,
                               power = power,
                               ratio = ratio,
                               nobs1 = None)

# Print sample size 
print('Sample Size: %.3f' % results)

Sample Size: 1565.490
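
As a rough cross-check on the figure above, the sample size can be approximated by hand from Cohen's h and the normal distribution. This is only a sketch using the standard library. It is not what `TTestIndPower` does internally (that class works with the noncentral t distribution), so the result should land a little under the value reported above. The helper names `cohen_h` and `n_per_group` are illustrative and do not come from statsmodels.

```python
import math
from statistics import NormalDist


def cohen_h(p1, p2):
    """Cohen's h effect size for two proportions (arcsine transform)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group.

    Assumes a two-sided test and equal group sizes (ratio = 1.0).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    h = cohen_h(p1, p2)
    return 2 * ((z_alpha + z_beta) / h) ** 2


h = cohen_h(0.50, 0.55)
n = n_per_group(0.50, 0.55)

print(f"Cohen's h: {h:.4f}")
print(f"Approximate sample size per group: {n:.1f}")
```

Either way the requirement rounds up to roughly 1,566 visitors per group, which is the sample size used in the sampling step later on.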




## 2. Prepare data in Python

In [5]:
# Import required libraries, packages and classes
import pandas as pd
import numpy as np
import math
import statsmodels.stats.api as sms
import scipy.stats as st
import matplotlib as mpl
import matplotlib.pyplot as plt

In [7]:
# import required dataset and create dataframe
df = pd.read_csv('bike_shop.csv')

# View dataset
df.head()

Unnamed: 0,RecordID,IP Address,LoggedInFlag,ServerID,VisitPageFlag
0,1,39.13.114.2,1,2,0
1,2,13.3.25.8,1,1,0
2,3,247.8.211.8,1,1,0
3,4,124.8.220.3,0,3,0
4,5,60.10.192.7,0,2,0


In [8]:
# View metadata
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 184588 entries, 0 to 184587
Data columns (total 5 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   RecordID       184588 non-null  int64 
 1   IP Address     184588 non-null  object
 2   LoggedInFlag   184588 non-null  int64 
 3   ServerID       184588 non-null  int64 
 4   VisitPageFlag  184588 non-null  int64 
dtypes: int64(4), object(1)
memory usage: 7.0+ MB


In [22]:
# Create a new dataframe to store the cleaned data
# Rename columns to remove unnecessary spaces
df_new = df.rename(columns={'IP Address':'IPAddress',
                            'LoggedInFlag':'LoyaltyPage'})

# Remove duplicate values 
df_new.drop_duplicates(subset = 'IPAddress',
                      keep = False,
                      inplace = True) 

# Remove unneeded columns
df_final = df_new.drop(['RecordID','VisitPageFlag'], axis = 1)

# View final dataframe
df_final.head()

Unnamed: 0,IPAddress,LoyaltyPage,ServerID
7,97.6.126.6,0,3
12,188.13.62.2,0,3
14,234.1.239.1,0,2
15,167.15.157.7,0,2
16,123.12.229.8,0,1
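
The `keep = False` argument matters here: it drops every row whose IP address occurs more than once, rather than keeping one representative per address. That is consistent with the frame shrinking from 184,588 rows to 39,608. The toy data below is invented purely to illustrate the difference between `keep=False` and `keep='first'`.

```python
import pandas as pd

# Toy data (hypothetical): address 'a' appears twice, 'b' and 'c' once each
toy = pd.DataFrame({'IPAddress': ['a', 'b', 'a', 'c'],
                    'LoyaltyPage': [1, 0, 0, 1]})

# keep=False removes all rows that share a repeated IPAddress
no_repeats = toy.drop_duplicates(subset='IPAddress', keep=False)

# keep='first' would instead retain the first row for each IPAddress
first_only = toy.drop_duplicates(subset='IPAddress', keep='first')

print(no_repeats['IPAddress'].tolist())   # ['b', 'c']
print(first_only['IPAddress'].tolist())   # ['a', 'b', 'c']
```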


In [30]:
# View metadata 
print(df_final.shape)
df_final.info()
df_final.head()

(39608, 4)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 39608 entries, 7 to 184584
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   IPAddress    39608 non-null  object
 1   LoyaltyPage  39608 non-null  int64 
 2   ServerID     39608 non-null  int64 
 3   Group        39608 non-null  object
dtypes: int64(2), object(2)
memory usage: 1.5+ MB


Unnamed: 0,IPAddress,LoyaltyPage,ServerID,Group
7,97.6.126.6,0,3,Control
12,188.13.62.2,0,3,Control
14,234.1.239.1,0,2,Control
15,167.15.157.7,0,2,Control
16,123.12.229.8,0,1,Treatment


## 3. Perform random sampling with Pandas

In [32]:
# Apply mapping using map() method
df_final['Group'] = df_final['ServerID'].map({1: 'Treatment',
                                             2: 'Control',
                                             3: 'Control'})

# View dataframe
print(df_final.shape)
df_final.head()

(39608, 4)


Unnamed: 0,IPAddress,LoyaltyPage,ServerID,Group
7,97.6.126.6,0,3,Control
12,188.13.62.2,0,3,Control
14,234.1.239.1,0,2,Control
15,167.15.157.7,0,2,Control
16,123.12.229.8,0,1,Treatment


In [49]:
df_final['Group'].value_counts()

Control      26310
Treatment    13298
Name: Group, dtype: int64

In [54]:
# Draw a simple random sample of 1566 rows from each of the control and treatment groups
# Set random_state to an arbitrary value of 42 so the sample is reproducible

control = df_final[df_final['Group'] == 'Control'].sample(n=1566,
                                                          random_state = 42)

treatment = df_final[df_final['Group'] == 'Treatment'].sample(n=1566,
                                                          random_state = 42)

# Join both dataframes using concat
ab_test = pd.concat([control,treatment], axis = 0)

# reset index for ab_test
ab_test.reset_index(drop = True, inplace = True)

ab_test

Unnamed: 0,IPAddress,LoyaltyPage,ServerID,Group
0,25.16.126.2,1,3,Control
1,106.13.67.3,1,3,Control
2,169.11.137.7,0,2,Control
3,164.9.86.8,1,2,Control
4,112.12.25.7,0,2,Control
...,...,...,...,...
3127,187.4.117.9,1,1,Treatment
3128,134.0.112.5,1,1,Treatment
3129,7.3.242.7,0,1,Treatment
3130,118.14.226.4,0,1,Treatment
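
An equal-sized sample per group can also be drawn in a single step with `groupby(...).sample(...)`, assuming pandas 1.1 or later. The sketch below uses invented toy data and is an alternative to the filter-and-concat approach above, not a reproduction of it, so it should not be expected to select the same rows.

```python
import pandas as pd

# Toy data (hypothetical): 10 control rows and 6 treatment rows
toy = pd.DataFrame({'Group': ['Control'] * 10 + ['Treatment'] * 6,
                    'LoyaltyPage': [0, 1] * 8})

# Draw the same number of rows from each group, reproducibly
balanced = (toy.groupby('Group')
               .sample(n=3, random_state=42)
               .reset_index(drop=True))

print(balanced['Group'].value_counts())
```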


## 4. Perform A/B test (analyse dataset)

In [55]:
# import required packages
from scipy.stats import sem

# Group data by group and aggregate by loyalty page
conversion_rates = ab_test.groupby('Group')['LoyaltyPage']

# Helpers for the population (ddof=0) standard deviation and standard error
# Note: these are defined here but not used by the aggregation below
STD_p = lambda x: np.std(x, ddof=0)
SE_p = lambda x: st.sem(x, ddof=0)

# Aggregate data using mean, std and sem
conversion_rates = conversion_rates.agg([np.mean, np.std, sem])

# Assign column names
conversion_rates.columns = ['conversion_rate',
                            'std',
                            'sem']
# Convert to a pandas dataframe
cr = pd.DataFrame(conversion_rates)

# View output
cr

Group,conversion_rate,std,sem
Control,0.531928,0.499139,0.012613
Treatment,0.483397,0.499884,0.012632
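
For 0/1 data, the standard deviation and standard error follow directly from the conversion rate, which gives a quick sanity check on the table. The sketch below plugs in the control rate and the per-group sample size of 1566. Matching the reported figures requires the sample (ddof=1) standard deviation, which suggests that is what the aggregation returned rather than the population (ddof=0) version.

```python
import math

n = 1566               # observations per group, from the sampling step
p_control = 0.531928   # control conversion rate, from the table

# Population standard deviation of 0/1 data with mean p (ddof=0)
pop_std = math.sqrt(p_control * (1 - p_control))

# Sample standard deviation (ddof=1) rescales by sqrt(n / (n - 1))
sample_std = pop_std * math.sqrt(n / (n - 1))

# Standard error of the mean based on the sample standard deviation
std_err = sample_std / math.sqrt(n)

print(f"ddof=0 std: {pop_std:.6f}")
print(f"ddof=1 std: {sample_std:.6f}")
print(f"sem:        {std_err:.6f}")
```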


In [56]:
# Import required libraries to calculate p-value and confidence intervals
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

# Create subsets of control and treatment results
control_results = ab_test[ab_test['Group'] == 'Control']['LoyaltyPage']
treatment_results = ab_test[ab_test['Group'] == 'Treatment']['LoyaltyPage']

# Count the observations in the control and treatment groups and assign them to their respective variables
n_con = control_results.count()
n_treat = treatment_results.count()

# Create variable to store sum of both sub-sets in a list
successes = [control_results.sum(), treatment_results.sum()]

# Create variable to store count of both subsets in a list
nobs = [n_con, n_treat]

# Use the imported libraries to calculate the statistical values.
z_test, pval = proportions_ztest(successes, nobs = nobs)
(lower_con, lower_treat),(upper_con,upper_treat) = proportion_confint(successes,
                                                                       nobs = nobs,
                                                                       alpha = 0.05)

# Print the z statistic, p-value and 95% confidence intervals for the control and treatment groups
print(f"Z test stat: {z_test:.3f}")
print(f"P-value: {pval:.3f}")
print(f"Confidence Interval of 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]")
print(f"Confidence Interval of 95% for treatment group: [{lower_treat:.3f}, {upper_treat:.3f}]")

Z test stat: 2.716
P-value: 0.007
Confidence Interval of 95% for control group: [0.507, 0.557]
Confidence Interval of 95% for treatment group: [0.459, 0.508]
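
To see where these numbers come from, the test can be approximated by hand from the two conversion rates and the group size. This is a sketch under stated assumptions: a pooled-variance two-proportion z-test and normal-approximation (Wald) intervals, which as far as I know are the defaults for `proportions_ztest` and `proportion_confint`. The helper `wald_ci` is illustrative. The last line adds a one-sided p-value for the hypothesis the company actually cares about, namely that the treatment rate is higher.

```python
import math
from statistics import NormalDist

n = 1566                               # observations per group
p_con, p_treat = 0.531928, 0.483397    # conversion rates from the table above

# Pooled proportion under H0 (groups are equal in size, so a simple average)
p_pool = (p_con + p_treat) / 2
se_pool = math.sqrt(p_pool * (1 - p_pool) * (2 / n))

# z statistic in the same direction as the test above (control minus treatment)
z = (p_con - p_treat) / se_pool
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

# One-sided p-value for H1: treatment rate > control rate
p_treat_higher = NormalDist().cdf(z)


def wald_ci(p, n, alpha=0.05):
    """Normal-approximation confidence interval for a single proportion."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


ci_con = wald_ci(p_con, n)
ci_treat = wald_ci(p_treat, n)

print(f"z = {z:.3f}, two-sided p = {p_two_sided:.3f}")
print(f"control CI:   [{ci_con[0]:.3f}, {ci_con[1]:.3f}]")
print(f"treatment CI: [{ci_treat[0]:.3f}, {ci_treat[1]:.3f}]")
print(f"one-sided p (treatment higher): {p_treat_higher:.3f}")
```

The positive z statistic reflects the order of the groups (control first), so it indicates that the control rate is the higher of the two.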


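**Conclusion:**
The new design decreased the click-through rate to the loyalty programme page: 48.3% in the treatment group versus 53.2% in the control group.
The p-value (0.007) is below the 5% significance level, so we reject the null hypothesis that the two rates are equal. The difference is statistically significant, but it runs in the opposite direction to the one the company hoped for, and the treatment rate falls well short of the 55% target.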