## Website A/B Testing
The goal of this analysis is to determine whether an experimental homepage design drives more user engagement than the existing control homepage, as measured by the click-through rate (referred to below as the conversion rate). We first explore the dataset to understand it and check it for anomalies, then run a statistical test on the difference in conversion rates between the two groups.

In [11]:
#loading required packages
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns
from statsmodels.stats.proportion import proportions_ztest

In [2]:
#loading dataset
#raw string so the backslashes in the Windows path are not read as escape sequences
homepage = pd.read_csv(r"D:\Programming\Datasets\homepage_actions.csv")
homepage.head()

| Unnamed: 0 | timestamp | id | group | action |
|---|---|---|---|---|
| 0 | 2016-09-24 17:42:27.839496 | 804196 | experiment | view |
| 1 | 2016-09-24 19:19:03.542569 | 434745 | experiment | view |
| 2 | 2016-09-24 19:36:00.944135 | 507599 | experiment | view |
| 3 | 2016-09-24 19:59:02.646620 | 671993 | control | view |
| 4 | 2016-09-24 20:26:14.466886 | 536734 | experiment | view |


### Exploratory Analysis

#### 1. Investigating the id column


In [5]:
#Number of unique ids
total_viewers = homepage["id"].nunique()
print(f"Total unique viewers: {total_viewers}")

Total unique viewers: 6328


In [6]:
# Number of unique users who clicked
clicked_viewers = homepage[homepage["action"] == "click"]["id"].nunique()
print(f"Total unique viewers who clicked: {clicked_viewers}")

Total unique viewers who clicked: 1860


In [8]:
#Checking for anomalies: ids that clicked but did not view
viewed_id = set(homepage[homepage["action"] == "view"]["id"])
clicked_id = set(homepage[homepage["action"] == "click"]["id"])

#ids that clicked but did not view
anomalies = clicked_id - viewed_id

print(f"Ids that clicked but did not view: {anomalies}")


Ids that clicked but did not view: set()


#### 2. Check for overlap between control and experiment groups

In [9]:
#Checking for overlap between control and experiment group
control_id = set(homepage[homepage["group"] == "control"]["id"])
experiment_id = set(homepage[homepage["group"] == "experiment"]["id"])

#Finding overlapping ids
overlap_ids = control_id & experiment_id

print(f"Ids in both the control and experiment groups: {overlap_ids}")

Ids in both the control and experiment groups: set()
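
The two checks above compare sets of ids. Since the conversion rates later in this analysis are computed from row counts, it may also be worth confirming that no user is recorded twice for the same action. The following is a minimal sketch on a made-up toy DataFrame: `toy` and its values are hypothetical and only mirror the `id`, `group` and `action` columns of the real dataset.

```python
import pandas as pd

# Toy data standing in for homepage_actions.csv (values are made up for illustration).
toy = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "group": ["control", "control", "experiment", "control", "control", "control"],
    "action": ["view", "click", "view", "view", "click", "click"],
})

# Rows that repeat an (id, action) pair already seen earlier in the frame.
# In this toy data, id 3 is recorded as clicking twice.
duplicate_rows = toy[toy.duplicated(subset=["id", "action"])]

# Number of distinct groups each id appears in; anything above 1 is an overlap.
groups_per_id = toy.groupby("id")["group"].nunique()
overlapping_ids = groups_per_id[groups_per_id > 1].index.tolist()

print(f"Duplicate (id, action) rows: {len(duplicate_rows)}")
print(f"Ids in more than one group: {overlapping_ids}")
```

If the same check on `homepage` reports no duplicate rows, counting rows and counting unique users give the same click and view totals.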


### Conducting a Statistical Test

#### 1. Defining the hypothesis

- Null hypothesis ($\text{H}_0$): The conversion rate of the experiment group is equal to that of the control group.
- Alternative hypothesis ($\text{H}_a$): The conversion rate of the experiment group is different from that of the control group.
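
In symbols, the hypotheses can be written as below. The notation is introduced here for illustration and is not part of the original notebook: $p_c$ and $p_e$ are the true conversion rates of the control and experiment groups, $x$ denotes clicks, $n$ denotes views, and hats mark sample estimates.

$$
\text{H}_0: p_e = p_c \qquad \text{H}_a: p_e \neq p_c
$$

Under $\text{H}_0$ both groups share one rate, so the usual form of the two-proportion z statistic pools the two samples to estimate it:

$$
z = \frac{\hat{p}_c - \hat{p}_e}{\sqrt{\hat{p}\,(1-\hat{p})\left(\dfrac{1}{n_c} + \dfrac{1}{n_e}\right)}}, \qquad \hat{p} = \frac{x_c + x_e}{n_c + n_e}
$$

Because $\text{H}_a$ is two-sided, the p-value is the probability of a standard normal value at least as extreme as $|z|$ in either direction.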

#### 2. Calculate Conversion Rates

In [12]:
#Grouping data to calculate clicks and views for control and experiment
grouped_data = homepage.groupby(["group", "action"]).size().unstack()

#views and clicks for each group
view_control = grouped_data.loc["control", "view"]
click_control = grouped_data.loc["control", "click"]

view_experiment = grouped_data.loc["experiment", "view"]
click_experiment = grouped_data.loc["experiment", "click"]


#The conversion rates
conversion_control = click_control / view_control
conversion_experiment = click_experiment / view_experiment

print(f"Control Conversion Rate: {conversion_control:.4f}")
print(f"Experiment Conversion Rate: {conversion_experiment:.4f}")

Control Conversion Rate: 0.2797
Experiment Conversion Rate: 0.3097


#### 3. Conduct Two-Proportion Z-Test

In [14]:
#success (clicks) and trial (views) for both groups
success = [click_control, click_experiment]
trial = [view_control, view_experiment]

#performing a two-proportion z-test
z_stat, p_value = proportions_ztest(success, trial)

print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")

Z-statistic: -2.6186
P-value: 0.0088
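
As a cross-check, the same statistic can be computed by hand with only the standard library. This is a sketch, not the notebook's own method: `two_proportion_ztest` is a hypothetical helper, and the counts passed to it are illustrative assumptions chosen to be consistent with the totals and rates printed above, not values shown in the notebook output. In practice, pass the click and view counts taken from `grouped_data`.

```python
import math

def two_proportion_ztest(successes, trials):
    """Pooled two-sided z-test for two proportions; returns (z, p_value)."""
    (x1, x2), (n1, n2) = successes, trials
    p1, p2 = x1 / n1, x2 / n2
    # Under the null hypothesis both groups share one rate, estimated by pooling.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts (control first, then experiment); replace with the real ones.
z_check, p_check = two_proportion_ztest([932, 928], [3332, 2996])
print(f"Z-statistic: {z_check:.4f}")
print(f"P-value: {p_check:.4f}")
```

The sign of the statistic depends only on the order of the groups: with control listed first, a negative value means the control rate is the lower of the two.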


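#### 4. Interpreting the result
- The p-value (0.0088) is below the 0.05 significance level, so we reject the null hypothesis that the two conversion rates are equal.
- The z-statistic is negative only because the control group is listed first in `success` and `trial`; its sign reflects the control rate minus the experiment rate.
- The experiment group's conversion rate (30.97%) is higher than the control group's (27.97%), a difference of about 3 percentage points. This suggests that the experimental homepage is more effective at driving user engagement (clicks) than the original homepage.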
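
The test says whether a difference exists; a confidence interval gives a rough sense of how large it might be. Below is a minimal sketch of a 95% Wald interval for the difference in conversion rates. `diff_confidence_interval` is a hypothetical helper, and the counts are illustrative assumptions consistent with the rates reported above rather than values printed in the notebook.

```python
import math

def diff_confidence_interval(x_c, n_c, x_e, n_e, z_crit=1.96):
    """Wald interval for (experiment rate - control rate); returns (diff, low, high)."""
    p_c, p_e = x_c / n_c, x_e / n_e
    diff = p_e - p_c
    # Unpooled standard error: each group keeps its own estimated variance.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_e * (1 - p_e) / n_e)
    return diff, diff - z_crit * se, diff + z_crit * se

# Hypothetical counts: control clicks, control views, experiment clicks, experiment views.
diff, low, high = diff_confidence_interval(932, 3332, 928, 2996)
print(f"Difference: {diff:.4f}, 95% CI: ({low:.4f}, {high:.4f})")
```

An interval that excludes zero agrees with rejecting the null hypothesis at the 0.05 level, and its width shows how precisely the size of the lift is known.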

