# Evaluating Website Redesign Impact with A/B Testing


## Business Context and Problem Statement


This project evaluates a controlled A/B experiment to determine whether a new website landing page meaningfully improves user conversion compared to the existing design. Users are randomly assigned to either a control group (old landing page) or a treatment group (new landing page), and conversion outcomes are observed.

The objective of this analysis is to assess whether the observed difference in conversion performance between the two groups is statistically significant and practically meaningful, and to provide a data-driven recommendation on whether the redesigned page should be adopted.

## Dataset Overview

This analysis is based on user-level data from a controlled A/B experiment designed to evaluate the impact of a website landing page redesign on conversion behavior. Each observation represents an individual user interaction recorded during the experiment.

The dataset is structured to allow a comparison between a control group and a treatment group, with conversion outcome as the primary metric of interest. A detailed inspection of the dataset structure and variables is conducted after loading the data.

## Dataset Source

The dataset used in this project was obtained from Kaggle:

- **Source**: Kaggle — A/B Testing Dataset  
- **Link**: https://www.kaggle.com/datasets/zhangluyuan/ab-testing

### Import Required Libraries

In [2]:
import pandas as pd
import numpy as np

pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 120)

### Load Dataset

In [3]:
df = pd.read_csv("../data/ab_data.csv")
df.head()

| | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 0 | 851104 | 2017-01-21 22:11:48.556739 | control | old_page | 0 |
| 1 | 804228 | 2017-01-12 08:01:45.159739 | control | old_page | 0 |
| 2 | 661590 | 2017-01-11 16:55:06.154213 | treatment | new_page | 0 |
| 3 | 853541 | 2017-01-08 18:28:03.143765 | treatment | new_page | 0 |
| 4 | 864975 | 2017-01-21 01:52:26.210827 | control | old_page | 1 |


### Dataset Shape

In [4]:
df.shape

(294478, 5)

### Column Names

In [5]:
df.columns.tolist()

['user_id', 'timestamp', 'group', 'landing_page', 'converted']

### Data Types

In [6]:
df.dtypes

user_id          int64
timestamp       object
group           object
landing_page    object
converted        int64
dtype: object

### Missing Values Check

In [7]:
df.isna().sum()

user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64

### Duplicate Rows Check

In [8]:
df.duplicated().sum()

0

### Observations from Initial Data Inspection

The dataset contains 294,478 user-level observations and five variables relevant to a controlled A/B experiment. `user_id` and `converted` are stored as integers, while `timestamp`, `group`, and `landing_page` are stored as strings (`object`). No missing values and no fully duplicated rows are observed.
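Two follow-up checks are suggested by the inspection above: `timestamp` is loaded as text, and `df.duplicated()` only flags rows that are identical in every column, so a user could still appear more than once with different timestamps. The sketch below illustrates both checks on a small hypothetical frame (`sample` and its values are invented for illustration and are not taken from the dataset); the same calls could be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; values are illustrative only.
sample = pd.DataFrame({
    "user_id": [1, 2, 2],
    "timestamp": [
        "2017-01-02 10:00:00.500000",
        "2017-01-03 11:30:00.250000",
        "2017-01-04 09:15:00.750000",
    ],
})

# Convert the text column into proper datetimes so it can be sorted and filtered.
sample["timestamp"] = pd.to_datetime(sample["timestamp"])

# Count users that appear more than once, even if their rows differ elsewhere.
repeated_users = int(sample["user_id"].duplicated().sum())

# Earliest and latest observation, i.e. the time window covered by the data.
window = (sample["timestamp"].min(), sample["timestamp"].max())
```

Whether repeated users should be dropped, and which of their rows to keep, is an analysis decision that this sketch does not make.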

### Unique Values in Key Columns

In [9]:
df["group"].unique(), df["landing_page"].unique(), df["converted"].unique()

(array(['control', 'treatment'], dtype=object),
 array(['old_page', 'new_page'], dtype=object),
 array([0, 1], dtype=int64))

### Distribution of Observations by Group

In [10]:
df["group"].value_counts()

group
treatment    147276
control      147202
Name: count, dtype: int64

### Distribution of Observations by Landing Page

In [11]:
df["landing_page"].value_counts()

landing_page
old_page    147239
new_page    147239
Name: count, dtype: int64

### Group vs Landing Page Cross-Tabulation

In [12]:
pd.crosstab(df["group"], df["landing_page"])

| group | new_page | old_page |
|---|---|---|
| control | 1928 | 145274 |
| treatment | 145311 | 1965 |


### Count of Mismatched Group–Page Assignments


In [13]:
mismatch_mask = (
    ((df["group"] == "control") & (df["landing_page"] != "old_page")) |
    ((df["group"] == "treatment") & (df["landing_page"] != "new_page"))
)

mismatch_mask.sum()

3893

### Experiment Hygiene Observations

A small number of observations exhibit inconsistencies between the experimental group assignment and the landing page actually shown to users. Specifically, 3,893 observations (1,928 control users exposed to the new page and 1,965 treatment users exposed to the old page) are misaligned.


### Create Clean Experiment Dataset

In [14]:
df_clean = df[~mismatch_mask].copy()
df_clean.shape

(290585, 5)

### Clean Experiment Dataset Confirmation

After excluding observations with inconsistent group assignment and page exposure, the final analysis dataset contains 290,585 user-level observations. This cleaned dataset preserves about 98.7% of the original rows while ensuring a valid comparison between users exposed to the old and new landing pages.
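A quick sanity check after filtering can confirm that each group now sees exactly one page and report the share of rows retained. The following is a minimal sketch on a hypothetical toy frame (`toy`, `expected_page`, and the values are assumptions made for illustration); it expresses the same filter as the mismatch mask by mapping each group to its expected page.

```python
import pandas as pd

# Hypothetical toy data with two deliberately misaligned rows.
toy = pd.DataFrame({
    "group": ["control", "control", "treatment", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "new_page"],
    "converted": [0, 1, 0, 0, 1],
})

# Page each group is supposed to see under the experiment design.
expected_page = toy["group"].map({"control": "old_page", "treatment": "new_page"})

# Keep only rows where the page shown matches the assigned group.
toy_clean = toy[toy["landing_page"] == expected_page].copy()

# After cleaning, every group should map to exactly one landing page.
pages_per_group = toy_clean.groupby("group")["landing_page"].nunique()
is_aligned = bool((pages_per_group == 1).all())

# Fraction of the original rows that survived the filter.
retained_share = len(toy_clean) / len(toy)
```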


## Experiment Definition and Hypotheses

### Objective
Evaluate whether the redesigned landing page (`new_page`) leads to a meaningful change in user conversion.

### Unit of Analysis and Groups
- **Unit of analysis**: individual user  
- **Control**: users exposed to `old_page`  
- **Treatment**: users exposed to `new_page`

### Primary Metric
- **Conversion rate (CR)**: the proportion of users who converted  
$$
CR = \frac{\text{Number of converted users}}{\text{Total users}}
$$

### Hypotheses
Let $p_{old}$ denote the conversion rate for the existing landing page and $p_{new}$ denote the conversion rate for the redesigned landing page.

- **Null hypothesis**:  
  $H_0: p_{new} = p_{old}$

- **Alternative hypothesis**:  
  $H_1: p_{new} \neq p_{old}$

### Significance Level
The significance level for the hypothesis test is set to $\alpha = 0.05$.


### Test Type
A two-sided hypothesis test is used, as the redesigned landing page could plausibly lead to either an increase or a decrease in conversion rate.
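One common way to test these hypotheses is a pooled two-proportion z-test, which is appropriate when both samples are large. The sketch below is an assumption about how the test could be carried out, not necessarily the method used later in this analysis; the function name `two_proportion_ztest` and the example counts are hypothetical.

```python
import math

def two_proportion_ztest(conv_old, n_old, conv_new, n_new):
    """Pooled two-sided z-test for H0: p_new == p_old.

    Takes conversion counts and sample sizes; returns (z, p_value).
    """
    p_old = conv_old / n_old
    p_new = conv_new / n_new

    # Under H0 both groups share one rate, estimated from the pooled data.
    p_pool = (conv_old + conv_new) / (n_old + n_new)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))

    z = (p_new - p_old) / se

    # Two-sided p-value from the standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative counts only; these are not taken from the dataset.
z_stat, p_value = two_proportion_ztest(conv_old=120, n_old=1000,
                                       conv_new=130, n_new=1000)
reject_null = p_value < 0.05  # compare against alpha = 0.05
```

Applied to the experiment, the inputs would be the number of converted users and the total number of users for each landing page in `df_clean`.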

## Conversion Rate Analysis

### Conversion Rate by Landing Page


In [18]:
# Compute conversion rates by landing page using the cleaned dataset
conversion_rates = (
    df_clean
    .groupby("landing_page")["converted"]
    .mean()
)

conversion_rates

landing_page
new_page    0.118807
old_page    0.120386
Name: converted, dtype: float64

### Observed Conversion Rates

The observed conversion rate for the existing landing page is 12.04%, while the redesigned landing page exhibits a conversion rate of 11.88%. At a descriptive level, the redesigned page shows a slightly lower conversion rate compared to the existing design.


### Absolute Difference and Relative Change

In [19]:
p_old = conversion_rates["old_page"]
p_new = conversion_rates["new_page"]

absolute_diff = p_new - p_old
relative_change = absolute_diff / p_old

absolute_diff, relative_change

(-0.0015790565976871451, -0.0131165800315857)

### Conversion Rate Difference and Practical Impact

The redesigned landing page exhibits an absolute conversion rate difference of −0.16 percentage points relative to the existing design, corresponding to a relative decrease of approximately 1.3%. While the magnitude of this difference is small, its direction suggests that the redesigned page performs slightly worse than the existing version at a descriptive level. A descriptive difference alone does not establish statistical significance.
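To judge practical impact, the point estimate can be paired with a confidence interval for the difference in conversion rates. The sketch below uses a Wald (unpooled normal-approximation) interval, which is one reasonable choice rather than the only one; the function name `diff_confidence_interval` is hypothetical. The rates are the rounded values printed above, and the group sizes are the aligned cells of the group-by-page cross-tabulation.

```python
import math
from statistics import NormalDist

def diff_confidence_interval(p_old, n_old, p_new, n_new, alpha=0.05):
    """Wald confidence interval for p_new - p_old at level 1 - alpha."""
    diff = p_new - p_old

    # Unpooled standard error: each group contributes its own variance.
    se = math.sqrt(p_old * (1 - p_old) / n_old + p_new * (1 - p_new) / n_new)

    # Critical value of the standard normal (about 1.96 for alpha = 0.05).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z_crit * se, diff + z_crit * se

# Rounded rates and group sizes as reported earlier in the notebook.
ci_low, ci_high = diff_confidence_interval(
    p_old=0.120386, n_old=145274,
    p_new=0.118807, n_new=145311,
)
```

If the resulting interval contains zero, the data are compatible with no difference at the chosen level; the width of the interval indicates how large an effect in either direction remains plausible.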