## Eniac A/B Testing
### 1. Objective
#### To determine which button version (color and text) leads to the highest click-through rate (CTR) on the homepage.



### 2. Hypotheses

#### •	Null Hypothesis (H0): All versions have the same CTR.
#### •	Alternative Hypothesis (H1): There is a difference in the CTR for the different versions.
#### alpha = 0.05

### 3. Determine Metrics to Track
##### •	Click-through Rate (CTR): Number of clicks on the button divided by the total number of visits to the page.
##### •	Drop-off Rate: Percentage of visitors who initiate a conversion process but do not complete it.
##### •	Homepage-return Rate: Measures how often users return to the homepage after clicking the button.
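
The three metrics above can be written as small helper functions. This is a minimal sketch: the function names are hypothetical, the inputs in the example call are illustrative rather than measured, and using button clicks as the denominator of the homepage-return rate is an assumption, since the definition above does not name one.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage (0.0 if the denominator is 0)."""
    return 100.0 * numerator / denominator if denominator else 0.0


def click_through_rate(button_clicks: int, page_visits: int) -> float:
    """Clicks on the button divided by total visits to the page."""
    return rate(button_clicks, page_visits)


def drop_off_rate(started: int, completed: int) -> float:
    """Share of visitors who started a conversion process but did not complete it."""
    return rate(started - completed, started)


def homepage_return_rate(returns: int, button_clicks: int) -> float:
    """Share of button clicks followed by a return to the homepage (assumed denominator)."""
    return rate(returns, button_clicks)


# Illustrative inputs only, not data from the experiment
example_ctr = click_through_rate(button_clicks=50, page_visits=1000)
print(f"Example CTR: {example_ctr:.2f}%")
```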

### 4. Experiment Parameters

##### •	Versions to Test:
##### •	White “SHOP NOW”
##### •	Red “SHOP NOW”
##### •	White “SEE DEALS”
##### •	Red “SEE DEALS”

### 5. Data Collection
#### •	Test Period: November 2, 2021, to November 16, 2021 (14 days)


### 6. Analysis Plan
##### 1. Calculate CTR: compute the click-through rate for each button version.
##### 2. Determine the Winner: select the version with the highest CTR.
##### 3. Statistical Significance Test: run a chi-square test to check whether the CTR differences are statistically significant.
##### 4. Additional Metrics: calculate the drop-off rate and homepage-return rate.


In [148]:
import pandas as pd
import re
from scipy.stats import chi2_contingency

# Define file paths
eniac_a_file = '/Users/vee/Downloads/eniac_a.csv'
eniac_b_file = '/Users/vee/Downloads/eniac_b.csv'
eniac_c_file = '/Users/vee/Downloads/eniac_c.csv'
eniac_d_file = '/Users/vee/Downloads/eniac_d.csv'


In [87]:
# Load data into pandas DataFrames
df_a = pd.read_csv(eniac_a_file)
df_b = pd.read_csv(eniac_b_file)
df_c = pd.read_csv(eniac_c_file)
df_d = pd.read_csv(eniac_d_file)


In [121]:
# Set display options
pd.set_option('display.max_columns', None) 
pd.set_option('display.max_colwidth', None) 
pd.set_option('display.width', 1000)  

In [122]:
df_a.head()

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,48,h1,ENIAC,269,True,Homepage Version A - white SHOP NOW • https://eniac.com/index-a.php
1,25,div,mySidebar,309,True,"created 2021-09-14 • 14 days 0 hours 34 mins • 25326 visits, 23174 clicks"
2,4,a,Mac,279,True,
3,69,a,iPhone,246,True,
4,105,a,Accessories,1235,True,


In [126]:
df_b.head()

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,48,h1,ENIAC,236,True,Homepage Version B - red SHOP NOW • https://eniac.com/index-b.php
1,25,div,mySidebar,304,True,"created 2021-10-27 • 14 days 0 hours 34 mins • 24747 visits, 22592 clicks"
2,4,a,Mac,268,True,
3,69,a,iPhone,260,True,
4,105,a,Accessories,1214,True,


In [124]:
df_c.head()

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,48,h1,ENIAC,288,True,Homepage Version C - white SEE DEALS • https://eniac.com/index-c.php
1,25,div,mySidebar,283,True,"created 2021-10-27 • 14 days 0 hours 34 mins • 24876 visits, 23031 clicks"
2,4,a,Mac,262,True,
3,69,a,iPhone,234,True,
4,105,a,Accessories,1288,True,


In [125]:
df_d.head()

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,48,h1,ENIAC,285,True,Homepage Version D - red SEE DEALS • https://eniac.com/index-d.php
1,25,div,mySidebar,305,True,"created 2021-10-27 • 14 days 0 hours 34 mins • 25233 visits, 23062 clicks"
2,4,a,Mac,274,True,
3,69,a,iPhone,243,True,
4,105,a,Accessories,1267,True,


In [93]:
df_a.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Element ID            57 non-null     int64 
 1   Tag name              57 non-null     object
 2   Name                  57 non-null     object
 3   No. clicks            57 non-null     int64 
 4   Visible?              57 non-null     bool  
 5   Snapshot information  2 non-null      object
dtypes: bool(1), int64(2), object(3)
memory usage: 2.4+ KB


In [94]:
df_b.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Element ID            57 non-null     int64 
 1   Tag name              57 non-null     object
 2   Name                  57 non-null     object
 3   No. clicks            57 non-null     int64 
 4   Visible?              57 non-null     bool  
 5   Snapshot information  2 non-null      object
dtypes: bool(1), int64(2), object(3)
memory usage: 2.4+ KB


In [95]:
df_c.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Element ID            57 non-null     int64 
 1   Tag name              57 non-null     object
 2   Name                  57 non-null     object
 3   No. clicks            57 non-null     int64 
 4   Visible?              57 non-null     bool  
 5   Snapshot information  2 non-null      object
dtypes: bool(1), int64(2), object(3)
memory usage: 2.4+ KB


In [96]:
df_d.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Element ID            57 non-null     int64 
 1   Tag name              57 non-null     object
 2   Name                  57 non-null     object
 3   No. clicks            57 non-null     int64 
 4   Visible?              57 non-null     bool  
 5   Snapshot information  2 non-null      object
dtypes: bool(1), int64(2), object(3)
memory usage: 2.4+ KB


In [140]:
# Extracting the number of clicks for each version
eniac_a_clicks = df_a.loc[df_a["Name"]=="SHOP NOW", "No. clicks"].iloc[0]
eniac_b_clicks = df_b.loc[df_b["Name"]=="SHOP NOW", "No. clicks"].iloc[0]
eniac_c_clicks = df_c.loc[df_c["Name"]=="SEE DEALS", "No. clicks"].iloc[0]
eniac_d_clicks = df_d.loc[df_d["Name"]=="SEE DEALS", "No. clicks"].iloc[0]

In [144]:
eniac_a_clicks, eniac_b_clicks, eniac_c_clicks, eniac_d_clicks

(512, 281, 527, 193)

In [141]:
# Number of visits per version, copied manually from the "Snapshot information" column
eniac_a_visits = 25326
eniac_b_visits = 24747
eniac_c_visits = 24876
eniac_d_visits = 25233
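
Instead of typing the visit counts by hand, they could be read from the snapshot text. The sketch below assumes the text always follows the "N visits, M clicks" pattern seen in the `head()` outputs; `parse_snapshot_totals` is a hypothetical helper name, not part of the original notebook.

```python
import re
from typing import Tuple


def parse_snapshot_totals(snapshot: str) -> Tuple[int, int]:
    """Extract (visits, total clicks) from a snapshot description string."""
    match = re.search(r"(\d+)\s+visits,\s+(\d+)\s+clicks", snapshot)
    if match is None:
        raise ValueError(f"No visit/click totals found in: {snapshot!r}")
    return int(match.group(1)), int(match.group(2))


# Snapshot text for version A, copied from the df_a.head() output
snapshot_a = "created 2021-09-14 • 14 days 0 hours 34 mins • 25326 visits, 23174 clicks"
visits_a, total_clicks_a = parse_snapshot_totals(snapshot_a)
print(visits_a, total_clicks_a)
```

In the notebook, the same string should be available as `df_a["Snapshot information"].iloc[1]`, since the `head()` output shows it in row 1.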

In [143]:
# Calculate non-clicks
eniac_a_non_clicks = eniac_a_visits - eniac_a_clicks
eniac_b_non_clicks = eniac_b_visits - eniac_b_clicks
eniac_c_non_clicks = eniac_c_visits - eniac_c_clicks
eniac_d_non_clicks = eniac_d_visits - eniac_d_clicks
eniac_a_non_clicks, eniac_b_non_clicks, eniac_c_non_clicks, eniac_d_non_clicks

(24814, 24466, 24349, 25040)

In [155]:
# creating a contingency table with clicks and non-clicks
clicks = [eniac_a_clicks, eniac_b_clicks, eniac_c_clicks, eniac_d_clicks]
noclicks = [eniac_a_non_clicks, eniac_b_non_clicks, eniac_c_non_clicks, eniac_d_non_clicks]

observed_results = pd.DataFrame(data=[clicks, noclicks],
                                columns=["Version_A", "Version_B", "Version_C", "Version_D"],
                                index=["Click", "No-click"])
print("Observed Results:")
observed_results

Observed Results:


Unnamed: 0,Version_A,Version_B,Version_C,Version_D
Click,512,281,527,193
No-click,24814,24466,24349,25040


In [154]:
# perform chi-square test and calculate the results
chi2, p, dof, expected = chi2_contingency(observed_results)

alpha = 0.05

print(f"\nChi-square test result: chi2 = {chi2}, p-value = {p}")

if p > alpha:
    print("Do not reject the null hypothesis")
else:
    print("Reject the null hypothesis")


Chi-square test result: chi2 = 224.01877488058412, p-value = 2.7161216607868712e-48
Reject the null hypothesis
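
To make the test statistic less of a black box, it can be recomputed by hand. The sketch below uses plain Python and the click and visit counts from the cells above: each expected count is the row total times the column total divided by the grand total, and the statistic sums (observed - expected)² / expected over all eight cells. The variable names are illustrative.

```python
# Click and visit counts per version (A, B, C, D), taken from the cells above
clicks = [512, 281, 527, 193]
visits = [25326, 24747, 24876, 25233]
non_clicks = [v - c for v, c in zip(visits, clicks)]

observed = [clicks, non_clicks]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell of the table
chi2_stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected_count = row_totals[i] * col_totals[j] / grand_total
        chi2_stat += (obs - expected_count) ** 2 / expected_count

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi2 = {chi2_stat:.2f}, dof = {dof}")
```

The result should agree with the `chi2_contingency` output above, because the continuity correction in SciPy only applies when there is a single degree of freedom.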


In [157]:
# Calculate CTR for each version
ctr_a = (eniac_a_clicks / eniac_a_visits) * 100
ctr_b = (eniac_b_clicks / eniac_b_visits) * 100
ctr_c = (eniac_c_clicks / eniac_c_visits) * 100
ctr_d = (eniac_d_clicks / eniac_d_visits) * 100

# Create a dictionary with CTR values
ctr_dict = {
    "White SHOP NOW": ctr_a,
    "Red SHOP NOW": ctr_b,
    "White SEE DEALS": ctr_c,
    "Red SEE DEALS": ctr_d
}
winner = max(ctr_dict, key=ctr_dict.get)
print(f"\nThe winning version is '{winner}' with a CTR of {ctr_dict[winner]:.2f}%")


The winning version is 'White SEE DEALS' with a CTR of 2.12%
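
A point estimate alone does not show how much the CTRs could vary by chance. As a rough sketch, a 95% Wilson score interval can be computed for each version with the standard library only; the interval method and the `wilson_interval` helper are additions for illustration, not part of the original analysis.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959964):
    """Return the (lower, upper) Wilson score interval for a proportion."""
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)
    ) / denom
    return center - half_width, center + half_width


# (clicks, visits) per version, taken from the cells above
data = {
    "White SHOP NOW": (512, 25326),
    "Red SHOP NOW": (281, 24747),
    "White SEE DEALS": (527, 24876),
    "Red SEE DEALS": (193, 25233),
}

intervals = {name: wilson_interval(c, n) for name, (c, n) in data.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {low * 100:.2f}% to {high * 100:.2f}%")
```

Overlapping intervals between two versions are a hint that their CTRs may not be distinguishable with this amount of traffic.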


In [163]:
# Visualize with Plotly
import plotly.graph_objects as go
versions = list(ctr_dict.keys())
ctr_values = list(ctr_dict.values())
colors = ['lightblue' if version != winner else 'brown' for version in versions]

fig = go.Figure(data=[go.Bar(
    x=versions,
    y=ctr_values,
    marker_color=colors
)])

fig.update_layout(
    title="CTR Comparison of Different Versions",
    xaxis_title="Version",
    yaxis_title="Click-Through Rate (CTR, %)",
    yaxis=dict(tickformat=".2f", ticksuffix="%"),  # CTR values are already percentages
    showlegend=False,
    width=800, 
    height=500 
)

fig.show()


### Results Analysis
#### Winning Version: “White SEE DEALS”
#### •	CTR: 2.12%
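#### •	Significance: The chi-square test (p ≈ 2.7e-48) shows that CTR differs across the four versions, and “White SEE DEALS” has the highest observed CTR. Its lead over “White SHOP NOW” (2.02%) is small, so these two versions should be compared directly before “White SEE DEALS” is treated as the clear winner.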
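
The chi-square test covers all four versions at once and does not say which pairs differ. One possible follow-up, sketched below, is a two-proportion z-test for every pair with a Bonferroni-adjusted threshold. This is an added technique, not part of the original analysis plan; for a 2x2 table it is equivalent to a chi-square test without continuity correction. The helper name is hypothetical.

```python
import math
from itertools import combinations

# (clicks, visits) per version, taken from the notebook cells above
data = {
    "White SHOP NOW": (512, 25326),
    "Red SHOP NOW": (281, 24747),
    "White SEE DEALS": (527, 24876),
    "Red SEE DEALS": (193, 25233),
}


def two_proportion_p_value(clicks_1, visits_1, clicks_2, visits_2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (clicks_1 + clicks_2) / (visits_1 + visits_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_1 + 1 / visits_2))
    z = (clicks_1 / visits_1 - clicks_2 / visits_2) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


pairs = list(combinations(data, 2))
alpha = 0.05
bonferroni_alpha = alpha / len(pairs)  # stricter threshold for 6 comparisons

pairwise_p = {}
for first, second in pairs:
    p_value = two_proportion_p_value(*data[first], *data[second])
    pairwise_p[(first, second)] = p_value
    verdict = "significant" if p_value < bonferroni_alpha else "not significant"
    print(f"{first} vs {second}: p = {p_value:.3g} ({verdict})")
```

With these counts, the comparison between the two white buttons comes out far from significant (p of roughly 0.45), while every other pair stays below the adjusted threshold. This suggests that button color matters more than the button text in this data.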