# A/B Testing for the Montana State University Library

## Exposition

See: https://github.com/H2605/Montana-Project/tree/main/Sources

## Defined metrics

<ins>Click-through rate (CTR) for the homepage.</ins> Defined as the number of clicks on the category button divided by the total number of visits to the page. Selected as a measure of the initial ability of the category title to attract users.

<ins>Drop-off rate for the category pages.</ins> Percentage of visitors who leave the site from a given page, selected as a measure of the ability of the category page to meet user expectations.


<ins>Homepage-return rate for the category pages.</ins> Percentage of users who navigated from the library homepage to the category page and then returned to the homepage. This sequence of actions provides clues as to whether a user discovered the desired option on the category page; if not, the user would likely return to the homepage to continue navigating. Homepage-return rate was therefore selected as a measure of the ability of the category page to meet user expectations.
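
To make the two category-page metrics concrete, here is a minimal sketch of how they could be computed from raw navigation paths. The `sessions` list, the page labels `"home"`, `"category"` and `"find"`, and the helper `reached_category` are hypothetical illustrations; the study data used below only contains aggregated click counts, not paths.

```python
# Hypothetical page-view sequences, one list per visit (not data from the study).
sessions = [
    ["home", "category", "home"],  # returned to the homepage
    ["home", "category", "find"],  # continued from the category page
    ["home", "category"],          # left the site on the category page
    ["home", "find"],              # never reached the category page
]


def reached_category(path):
    """True if the visit went from the homepage straight to the category page."""
    return path[:2] == ["home", "category"]


category_visits = [p for p in sessions if reached_category(p)]
returned = [p for p in category_visits if len(p) > 2 and p[2] == "home"]
dropped = [p for p in category_visits if len(p) == 2]

# Both rates use the visits that reached the category page as the denominator.
homepage_return_rate = len(returned) / len(category_visits)
drop_off_rate = len(dropped) / len(category_visits)

print(f"homepage-return rate: {homepage_return_rate:.2%}")
print(f"drop-off rate: {drop_off_rate:.2%}")
```

In this sketch a drop-off is assumed to be a visit whose last recorded page is the category page; an analytics tool may define it differently.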

## Loading data

In [6]:
import pandas as pd
from scipy import stats

In [7]:
hp_v1 = pd.read_csv('/Users/huyduc/Downloads/CrazyEgg/Homepage Version 1 - Interact, 5-29-2013/Element list Homepage Version 1 - Interact, 5-29-2013.csv')
hp_v2 = pd.read_csv('/Users/huyduc/Downloads/CrazyEgg/Homepage Version 2 - Connect, 5-29-2013/Element list Homepage Version 2 - Connect, 5-29-2013.csv')
hp_v3 = pd.read_csv('/Users/huyduc/Downloads/CrazyEgg/Homepage Version 3 - Learn, 5-29-2013/Element list Homepage Version 3 - Learn, 5-29-2013.csv')
hp_v4 = pd.read_csv('/Users/huyduc/Downloads/CrazyEgg/Homepage Version 4 - Help, 5-29-2013/Element list Homepage Version 4 - Help, 5-29-2013.csv')
hp_v5 = pd.read_csv('/Users/huyduc/Downloads/CrazyEgg/Homepage Version 5 - Services, 5-29-2013/Element list Homepage Version 5 - Services, 5-29-2013.csv')

In [8]:
hp_v1.head(5)
#interact 

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,128,area,Montana State University - Home,1291,False,Homepage Version 1 - Interact • http://www...
1,69,a,FIND,842,True,created 5-29-2013 • 20 days 4 hours 21 min...
2,61,input,s.q,508,True,
3,67,a,lib.montana.edu/find/,166,True,
4,78,a,REQUEST,151,True,


In [9]:
hp_v2.head(5)
#connect

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,74,a,FIND,502,True,Homepage Version 2 - Connect • http://www....
1,66,input,s.q,357,True,created 5-29-2013 • 20 days 7 hours 34 min...
2,72,a,lib.montana.edu/find/,171,True,
3,133,area,Montana State University Libraries - Home,83,False,
4,103,a,Hours,74,True,


In [10]:
hp_v3.head(5)
#learn

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,69,a,FIND,587,True,Homepage Version 3 - Learn • http://www.li...
1,61,input,s.q,325,True,created 5-29-2013 • 20 days 12 hours 21 mi...
2,67,a,lib.montana.edu/find/,142,True,
3,128,area,Montana State University - Home,83,False,
4,98,a,Hours,76,True,


In [11]:
hp_v4.head(5)
#help

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,74,a,FIND,631,True,Homepage Version 4 - Help • http://www.lib...
1,66,input,s.q,364,True,created 5-29-2013 • 20 days 4 hours 59 min...
2,72,a,lib.montana.edu/find/,139,True,
3,133,area,Montana State University - Home,122,False,
4,83,a,REQUEST,72,True,


In [12]:
hp_v5.head(5)
#services

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,69,a,FIND,397,True,Homepage Version 5 - Services • http://www...
1,61,input,s.q,323,True,created 5-29-2013 • 20 days 4 hours 59 min...
2,67,a,lib.montana.edu/find/,106,True,
3,62,button,Search,85,True,
4,98,a,Hours,81,True,


## Calculating the Click-through Rate

To calculate the click-through rate, we need the number of clicks and non-clicks for each version.
The number of clicks on the changed button is given in each table.

In [13]:
hp_v1.query("Name=='INTERACT'")
#interact 

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
9,87,a,INTERACT,42,True,


In [14]:
hp_v2.query("Name=='CONNECT'")
#connect

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
6,92,a,CONNECT,53,True,


In [15]:
hp_v3.query("Name=='LEARN'")
#learn

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
10,87,a,LEARN,21,True,


In [16]:
hp_v4.query("Name=='HELP'")
#help

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
7,92,a,HELP,38,True,


In [17]:
hp_v5.query("Name=='SERVICES'")
#services

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
7,87,a,SERVICES,45,True,


For the non-clicks, we subtract the number of clicks from the number of visits.

In [18]:
visits_v1=10283
visits_v2=2742
visits_v3=2747
visits_v4=3180
visits_v5=2064

In [19]:
no_cl_1=visits_v1-42
no_cl_2=visits_v2-53
no_cl_3=visits_v3-21
no_cl_4=visits_v4-38
no_cl_5=visits_v5-45
no_cl_1,no_cl_2,no_cl_3,no_cl_4,no_cl_5

(10241, 2689, 2726, 3142, 2019)

In [20]:
crt_v1=42/visits_v1
crt_v2=53/visits_v2
crt_v3=21/visits_v3
crt_v4=38/visits_v4
crt_v5=45/visits_v5

In [21]:
crts_1 = [crt_v1,crt_v2,crt_v3,crt_v4,crt_v5]
Table_1= pd.DataFrame([crts_1], 
                 columns = ["Version 1 (Interact)", "Version 2 (Connect)","Version 3 (Learn)", "Version 4 (Help)","Version 5 (Services)"],
                 index = ["Click-through rate"])

In [22]:
Table_1

Unnamed: 0,Version 1 (Interact),Version 2 (Connect),Version 3 (Learn),Version 4 (Help),Version 5 (Services)
Click-through rate,0.004084,0.019329,0.007645,0.01195,0.021802


Out of the five homepage versions, Version 5 (Services) has the highest click-through rate, followed by Version 2 (Connect).
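
Because each rate pairs one click count with one visit count, the rates can also be derived from a single mapping, which makes it harder to divide by the wrong denominator. The following is a minimal sketch using the click and visit numbers listed above; the names `counts`, `ctr` and `best` are our own and not part of the notebook.

```python
# (clicks on the category button, total homepage visits) per version,
# taken from the cells above.
counts = {
    "Version 1 (Interact)": (42, 10283),
    "Version 2 (Connect)": (53, 2742),
    "Version 3 (Learn)": (21, 2747),
    "Version 4 (Help)": (38, 3180),
    "Version 5 (Services)": (45, 2064),
}

# Click-through rate = clicks / visits, computed pairwise.
ctr = {name: clicks / visits for name, (clicks, visits) in counts.items()}

# Print the versions from highest to lowest CTR.
for name, rate in sorted(ctr.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.4%}")

best = max(ctr, key=ctr.get)
print("Highest CTR:", best)
```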

## Chi Square Test

Null Hypothesis: 
The 5 versions of the button are equally likely to receive clicks, and the observed differences are due to chance.

Alternative Hypothesis: 
The observed differences are not due to chance: there are versions of the button that are more likely to receive clicks (i.e. they have a better CTR, a better performance).

The desired confidence level was set at 90%, which corresponds to a significance level alpha of 10%.

In [23]:
alpha=0.1

### Approach Nr.1

In [24]:

clicks_1 = [42,53,21,38,45]
non_click_1 = [10241, 2689, 2726, 3142, 2019]

In [25]:
obs_1= pd.DataFrame([clicks_1,non_click_1], 
                 columns = ["Version 1 (Interact)", "Version 2 (Connect)","Version 3 (Learn)", "Version 4 (Help)","Version 5 (Services)"],
                 index = ["Clicks","No-Click"])
obs_1

Unnamed: 0,Version 1 (Interact),Version 2 (Connect),Version 3 (Learn),Version 4 (Help),Version 5 (Services)
Clicks,42,53,21,38,45
No-Click,10241,2689,2726,3142,2019


In [26]:
test_statistic_mt_1, p_value_mt_1, df_mt_1, expected_mt_1 = stats.chi2_contingency(obs_1)
test_statistic_mt_1.round(4), p_value_mt_1.round(4), df_mt_1

(96.7432, 0.0, 4)

In [27]:
if p_value_mt_1 > alpha:
  print('Do not reject the Null Hypothesis.')
else:
  print('Start Approach Nr.2')

Start Approach Nr.2
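
The statistic returned by `stats.chi2_contingency` for the five-version table can be reproduced by hand, which shows what the test actually compares. The following is a minimal pure-Python sketch, not the library's implementation; the variable names are our own, and the closed-form p-value in the last step is only valid for 4 degrees of freedom.

```python
import math

clicks = [42, 53, 21, 38, 45]
non_clicks = [10241, 2689, 2726, 3142, 2019]
observed = [clicks, non_clicks]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under the null hypothesis:
# row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected.
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(col_totals) - 1)

# For 4 degrees of freedom the chi-square survival function has the
# closed form exp(-x/2) * (1 + x/2).
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

print(round(statistic, 4), dof, p_value)
```

The statistic and degrees of freedom should agree with the values reported above; the p-value is so small that it rounds to 0.0 at four decimals.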


As the p-value is lower than the given α, we reject the null hypothesis and drop Version 1 (Interact), since it has the lowest CTR of the five versions.

### Approach Nr.2

In [28]:
crts_2 = [crt_v2,crt_v3,crt_v4,crt_v5]
Table_2= pd.DataFrame([crts_2], 
                 columns = [ "Version 2 (Connect)","Version 3 (Learn)", "Version 4 (Help)","Version 5 (Services)"],
                 index = ["Click-through rate"])
Table_2

Unnamed: 0,Version 2 (Connect),Version 3 (Learn),Version 4 (Help),Version 5 (Services)
Click-through rate,0.019329,0.007645,0.01195,0.021802


In [29]:
clicks_2 = [53,21,38,45]
non_click_2 = [2689, 2726, 3142, 2019]
obs_2= pd.DataFrame([clicks_2,non_click_2], 
                 columns = ["Version 2 (Connect)","Version 3 (Learn)", "Version 4 (Help)","Version 5 (Services)"],
                 index = ["Clicks","No-Click"])
obs_2

Unnamed: 0,Version 2 (Connect),Version 3 (Learn),Version 4 (Help),Version 5 (Services)
Clicks,53,21,38,45
No-Click,2689,2726,3142,2019


In [30]:
test_statistic_mt_2,p_value_mt_2,df_mt_2,expected_mt_2=stats.chi2_contingency(obs_2)
test_statistic_mt_2.round(4),p_value_mt_2.round(4),df_mt_2

(22.451, 0.0001, 3)

In [31]:
if p_value_mt_2 > alpha:
  print('Do not reject the Null Hypothesis.')
else:
  print('Start Approach Nr.3')

Start Approach Nr.3


### Approach Nr.3

As the p-value is still lower than the given α, we drop Version 3 (Learn), since it has the lowest CTR of the remaining four versions.

In [32]:
#Approach nr3 
clicks_3 = [53,38,45]
non_click_3 = [2689, 3142, 2019]
obs_3= pd.DataFrame([clicks_3,non_click_3], 
                 columns = ["Version 2 (Connect)", "Version 4 (Help)","Version 5 (Services)"],
                 index = ["Clicks","No-Click"])
obs_3

Unnamed: 0,Version 2 (Connect),Version 4 (Help),Version 5 (Services)
Clicks,53,38,45
No-Click,2689,3142,2019


In [33]:
test_statistic_mt_3, p_value_mt_3, df_mt_3, expected_mt_3 = stats.chi2_contingency(obs_3)

In [34]:
test_statistic_mt_3.round(4),p_value_mt_3.round(4),df_mt_3

(8.5768, 0.0137, 2)

In [35]:
if p_value_mt_3 > alpha:
  print('Do not reject the Null Hypothesis.')
else:
  print('Start Approach Nr 4')

Start Approach Nr 4


### Approach Nr.4

As the p-value is still lower than the given α, we drop Version 4 (Help), since it has the lowest CTR of the remaining three versions.

In [36]:
clicks_4 = [53,45]
non_click_4 = [2689, 2019]
obs_4= pd.DataFrame([clicks_4,non_click_4], 
                 columns = ["Version 2 (Connect)","Version 5 (Services)"],
                 index = ["Clicks","No-Click"])
obs_4

Unnamed: 0,Version 2 (Connect),Version 5 (Services)
Clicks,53,45
No-Click,2689,2019


In [37]:
crts_4 = [crt_v2,crt_v5]
Table_4= pd.DataFrame([crts_4], 
                 columns = [ "Version 2 (Connect)","Version 5 (Services)"],
                 index = ["Click-through rate"])
Table_4

Unnamed: 0,Version 2 (Connect),Version 5 (Services)
Click-through rate,0.019329,0.021802


In [38]:
test_statistic_mt_4, p_value_mt_4, df_mt_4, expected_mt_4 = stats.chi2_contingency(obs_4)

In [39]:
test_statistic_mt_4.round(4),p_value_mt_4.round(4),df_mt_4

(0.2474, 0.6189, 1)
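
One detail worth knowing for this last comparison: for a 2x2 table (1 degree of freedom), `stats.chi2_contingency` applies Yates' continuity correction by default (`correction=True`), so the statistic is smaller than the plain Pearson value. The sketch below recomputes both variants in pure Python to make the difference visible; the helper names `chi_square` and `p_value_1dof` are our own.

```python
import math

# Observed 2x2 table: Version 2 (Connect) vs. Version 5 (Services).
observed = [[53, 45], [2689, 2019]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed, expected, yates):
    """Pearson statistic; with yates=True each |O - E| is reduced by 0.5 (floored at 0)."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / e
    return total


def p_value_1dof(statistic):
    """Survival function of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


corrected = chi_square(observed, expected, yates=True)
uncorrected = chi_square(observed, expected, yates=False)

print("with correction:   ", round(corrected, 4), round(p_value_1dof(corrected), 4))
print("without correction:", round(uncorrected, 4), round(p_value_1dof(uncorrected), 4))
```

The corrected variant should match the output above, while the uncorrected statistic is roughly 0.36. Both p-values are well above α = 0.1, so the conclusion does not depend on the correction.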

In [41]:
if p_value_mt_4 > alpha:
  print('Do not reject the Null Hypothesis.')
else:
  print('Reject the Null Hypothesis.')

Do not reject the Null Hypothesis.


In Approach Nr.4 the p-value is higher than the significance level α, so we do not reject the null hypothesis: the difference in CTR between the two remaining versions is not statistically significant. As a result, the CTR alone cannot decide between Version 2 (Connect) and Version 5 (Services).
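
The step-wise elimination followed in Approaches 1 to 4 can be condensed into one loop: test all remaining versions, and while the p-value stays at or below α, drop the version with the lowest CTR and test again. The following pure-Python sketch is one possible way to write this, not the notebook's own code. The helpers `chi2_sf` and `chi2_test` are our own stand-ins for `stats.chi2_contingency`, and the continuity correction for two versions is included only to mirror scipy's default.

```python
import math

ALPHA = 0.1
counts = {  # version: (clicks, visits), as used in the approaches above
    "Version 1 (Interact)": (42, 10283),
    "Version 2 (Connect)": (53, 2742),
    "Version 3 (Learn)": (21, 2747),
    "Version 4 (Help)": (38, 3180),
    "Version 5 (Services)": (45, 2064),
}


def chi2_sf(x, dof):
    """Chi-square survival function for integer dof, built up with
    Q(a+1, y) = Q(a, y) + y**a * exp(-y) / gamma(a+1), where y = x/2."""
    y = x / 2
    if dof % 2:  # odd dof: start from Q(1/2, y) = erfc(sqrt(y))
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:        # even dof: start from Q(1, y) = exp(-y)
        a, q = 1.0, math.exp(-y)
    while a < dof / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def chi2_test(table):
    """table: list of (clicks, visits). Returns (statistic, p_value, dof)."""
    observed = [[c for c, _ in table], [v - c for c, v in table]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    dof = len(table) - 1
    stat = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n
            diff = abs(o - e)
            if dof == 1:  # Yates' correction, mirroring scipy's default for 2x2
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / e
    return stat, chi2_sf(stat, dof), dof


remaining = dict(counts)
while len(remaining) > 1:
    stat, p, dof = chi2_test(list(remaining.values()))
    print(f"{len(remaining)} versions: chi2={stat:.4f}, p={p:.4f}, dof={dof}")
    if p > ALPHA:
        break  # no significant difference among the remaining versions
    # Drop the version with the lowest click-through rate and test again.
    worst = min(remaining, key=lambda name: remaining[name][0] / remaining[name][1])
    del remaining[worst]

print("Remaining:", list(remaining))
```

Note that repeating the test several times on shrinking subsets is done without any adjustment for multiple comparisons, which matches the approach taken in this notebook.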

In addition, we were provided with the drop-off rate and the homepage-return rate for each version.

The bar chart shows that the drop-off and homepage-return rates for Version 2 (Connect) are clearly higher than for Version 5 (Services).
Reference: https://platform.wbscodingschool.com/courses/data-science/10968/

## Interpret the data 

Given that, in our opinion, the drop-off rate is a clearer metric and leaves less room for interpretation than the homepage-return rate, we conclude that the University should choose Homepage Version 2 (Connect).


Sample solution: https://drive.google.com/file/d/1h8ZG8gZtFChCdDqccuvy2987sefU8JI_/view?usp=sharing