
# Did This Feature Actually Improve Conversions?

# Introduction
Product teams often release new features believing they will improve user conversion. However, without proper A/B testing, it is unclear whether the change truly helped users or whether an observed improvement happened by chance. This case study analyzes an A/B experiment to determine whether a new product feature led to a **real and statistically significant improvement in conversion rate**.

---

#### Why This Case Study
- Many product decisions are made based on assumptions instead of data  
- A/B testing is a core skill for **product analytics, growth, and business analytics roles**
- This case study demonstrates how data can **validate or reject a product decision**

#### What We Are Solving
- Did the new feature perform better than the existing one?
- Is the difference in conversion **statistically significant**?
- Should the product team **roll out or revert** the feature?

---

## Approach
1. Split users into **Control (A)** and **Variant (B)** groups  
2. Compare conversion rates between both groups  
3. Perform statistical testing to measure significance  
4. Interpret results from a **business decision perspective**

The final outcome focuses not just on numbers but on what action the product team should take.

# LET'S DO IT!!!
![funny gif](https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExaGs3eW91dXdpZTFmNm1jYzhlMmRnN2tlc3Fqb2xwanBwNWU5cjE2biZlcD12MV9naWZzX3NlYXJjaCZjdD1n/5RNNQvq3fhYlOYDIQ2/giphy.gif)

### Dataset Description

This dataset contains user-level transaction data including demographics and purchase behavior. Each row represents a purchase made by a user on an e-commerce platform.

```python
import kagglehub

path = kagglehub.dataset_download("refiaozturk/online-shopping-dataset")

print("Path to dataset files:", path)
```

```
Path to dataset files: /kaggle/input/online-shopping-dataset
```


```python
import pandas as pd
import numpy as np

df = pd.read_csv("/kaggle/input/online-shopping-dataset/dataset.csv")
df.describe()
```

|       | User ID | Age | Purchase Amount |
|-------|---------|-----|-----------------|
| count | 15000.0 | 13500.0 | 13200.0 |
| mean | 7500.5 | 43.396 | 253.21772 |
| std | 4330.271354 | 14.927082 | 143.113919 |
| min | 1.0 | 18.0 | 5.05 |
| 25% | 3750.75 | 31.0 | 130.335 |
| 50% | 7500.5 | 43.0 | 253.645 |
| 75% | 11250.25 | 56.0 | 378.585 |
| max | 15000.0 | 69.0 | 499.95 |


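```python
df.head(3)
```

|   | User ID | Age | Gender | Country | Purchase Amount | Purchase Date | Product Category |
|---|---------|-----|--------|---------|-----------------|---------------|------------------|
| 0 | 1 | 56.0 | Female | USA | 331.79 | 2021-11-21 | Sports |
| 1 | 2 | 69.0 | Male | Australia | 335.72 | 2022-03-05 | Home & Kitchen |
| 2 | 3 | 46.0 | NaN | Germany | 493.18 | NaN | Books |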


```python
# Check missing values
df.isna().sum()
```

```
User ID                0
Age                 1500
Gender               750
Country             1200
Purchase Amount     1800
Purchase Date       1050
Product Category     900
dtype: int64
```

```python
df = df.dropna(subset=["Age", "Purchase Amount", "Purchase Date"])

# Fill missing gender as 'Unknown'
df["Gender"] = df["Gender"].fillna("Unknown")
df["Purchase Date"] = pd.to_datetime(df["Purchase Date"])
df.shape
```

```
(11035, 7)
```

## Experiment Setup
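Since the dataset does not contain explicit A/B test labels, we simulate a real-world product experiment by randomly assigning users to Control and Treatment groups. The experiment assumes a new product feature aimed at improving purchase behavior. Because the groups are formed purely at random and no real feature is applied, any difference between them reflects chance alone.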

```python
# Create A/B test groups
np.random.seed(42)

df["experiment_group"] = np.random.choice(["Control", "Treatment"], size=len(df), p=[0.5, 0.5])
df["experiment_group"].value_counts()
```

```
experiment_group
Control      5580
Treatment    5455
Name: count, dtype: int64
```
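Before comparing outcomes, it can be worth checking that the split itself looks healthy. One common check is a sample ratio mismatch (SRM) test: a chi-square goodness-of-fit test of the observed group sizes against the intended 50/50 split. The sketch below uses only the standard library; the helper name `srm_p_value` is hypothetical, and a very small p-value would suggest the assignment is broken rather than that the feature had an effect.

```python
import math


def srm_p_value(observed_a: int, observed_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-group split (1 degree of freedom)."""
    total = observed_a + observed_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


# Group sizes taken from the value_counts() output above
p = srm_p_value(5580, 5455)
print(f"SRM p-value: {p:.3f}")
```

With these counts the p-value comes out at roughly 0.23, so the 5580 vs. 5455 imbalance is well within what a fair coin flip produces.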

## Conversion Definition
A user is considered converted if they made at least one purchase. Each row is a purchase and each `User ID` appears only once, so every user remaining after cleaning is a converter (`converted = 1`).

## Conversion Limitation & Assumption
The dataset only contains users who completed a purchase, so on its own every user would have a 100% conversion rate. To simulate a realistic product experiment, additional non-converting users are introduced to represent users who were exposed to the feature but did not purchase, assuming an overall conversion rate of 20%.
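The padding step is simple arithmetic. With $n_{\text{conv}} = 11035$ purchasers after cleaning and the assumed 20% conversion rate:

$$
N_{\text{total}} = \frac{n_{\text{conv}}}{0.20} = \frac{11035}{0.20} = 55175,
\qquad
n_{\text{non-conv}} = N_{\text{total}} - n_{\text{conv}} = 55175 - 11035 = 44140
$$

By construction the pooled conversion rate across both groups is exactly $11035 / 55175 = 0.20$; only how conversions fall between Control and Treatment is left to chance.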

```python
df["converted"] = 1
```

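```python
# Build a user-level table (one row per user) with the columns used below.
# Each User ID appears once, so this mostly renames "Purchase Amount".
user_level_df = (
    df.groupby(["User ID", "experiment_group"], as_index=False)
      .agg(converted=("converted", "max"),
           total_purchase_amount=("Purchase Amount", "sum"))
)

# Current purchasers
purchased_users = user_level_df.copy()
num_purchased = len(purchased_users)

# Estimate total exposed users, assuming a 20% conversion rate
total_users = int(num_purchased / 0.20)
num_non_converted = total_users - num_purchased

num_purchased, num_non_converted
```

```
(11035, 44140)
```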

```python
np.random.seed(42)

non_converted_users = pd.DataFrame({
    "User ID": range(
        user_level_df["User ID"].max() + 1,
        user_level_df["User ID"].max() + 1 + num_non_converted
    ),
    "experiment_group": np.random.choice(
        ["Control", "Treatment"],
        size=num_non_converted,
        p=[0.5, 0.5]
    ),
    "converted": 0,
    "total_purchase_amount": 0
})
```

```python
ab_final_df = pd.concat([purchased_users, non_converted_users], ignore_index=True)
ab_final_df.head()
```

|   | User ID | experiment_group | converted | total_purchase_amount |
|---|---------|------------------|-----------|-----------------------|
| 0 | 1 | Control | 1 | 331.79 |
| 1 | 2 | Treatment | 1 | 335.72 |
| 2 | 4 | Treatment | 1 | 80.97 |
| 3 | 7 | Treatment | 1 | 222.2 |
| 4 | 8 | Control | 1 | 217.27 |


## A/B Test Metrics
We compare conversion rate and average revenue per user between the Control and Treatment groups to evaluate the impact of the new product feature. Average revenue includes non-converters, who contribute 0.

```python
ab_final_df.groupby("experiment_group").agg(
    users=("User ID", "count"),
    conversion_rate=("converted", "mean"),
    avg_revenue=("total_purchase_amount", "mean")
)
```

| experiment_group | users | conversion_rate | avg_revenue |
|------------------|-------|-----------------|-------------|
| Control | 27666 | 0.201692 | 50.961479 |
| Treatment | 27509 | 0.198299 | 50.18721 |
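Because non-converters contribute 0, average revenue per user factors into conversion rate times average revenue per converter. The sketch below backs out that implied revenue per converter from the table above (values copied by hand), which helps show whether a revenue gap comes from conversion or from order size.

```python
# Values copied from the groupby output above
metrics = {
    "Control": {"conversion_rate": 0.201692, "avg_revenue": 50.961479},
    "Treatment": {"conversion_rate": 0.198299, "avg_revenue": 50.18721},
}

# avg_revenue = conversion_rate * revenue_per_converter, since non-converters add 0
implied_aov = {
    group: m["avg_revenue"] / m["conversion_rate"] for group, m in metrics.items()
}

for group, aov in implied_aov.items():
    print(f"{group}: implied revenue per converter = {aov:.2f}")
```

Both groups land near 253, close to the mean purchase amount in `df.describe()`, so the small revenue gap mirrors the small conversion gap rather than a change in order size.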


## Statistical Significance Testing

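A two-sided two-proportion z-test is used to determine whether the difference in conversion
rates between Control and Treatment is statistically significant. The null hypothesis is that
both groups share the same conversion rate; the alternative is that the rates differ.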
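For reference, the pooled two-proportion z statistic can be written as follows, with $x$ the number of conversions and $n$ the number of users in each group (subscript $C$ for Control, $T$ for Treatment). Plugging in the counts from the summary table reproduces the $z \approx 0.996$ reported by `proportions_ztest`.

$$
\hat p_C = \frac{x_C}{n_C}, \qquad
\hat p_T = \frac{x_T}{n_T}, \qquad
\hat p = \frac{x_C + x_T}{n_C + n_T}
$$

$$
z = \frac{\hat p_C - \hat p_T}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_C} + \frac{1}{n_T}\right)}}
= \frac{\frac{5580}{27666} - \frac{5455}{27509}}{\sqrt{0.2 \cdot 0.8 \left(\frac{1}{27666} + \frac{1}{27509}\right)}}
\approx \frac{0.003393}{0.003406} \approx 0.996
$$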

```python
from statsmodels.stats.proportion import proportions_ztest

# Aggregate conversions
summary = ab_final_df.groupby("experiment_group").agg(
    conversions=("converted", "sum"),
    users=("User ID", "count"),
)
summary
```

| experiment_group | conversions | users |
|------------------|-------------|-------|
| Control | 5580 | 27666 |
| Treatment | 5455 | 27509 |


```python
# Z-test
count = summary["conversions"].values
nobs = summary["users"].values

z_stat, p_value = proportions_ztest(count, nobs)

z_stat, p_value
```

```
(0.9961997873769245, 0.31915308270624365)
```
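To make the test less of a black box, here is a minimal from-scratch sketch of the pooled two-sided two-proportion z-test using only the standard library. The function name `two_proportion_ztest` is hypothetical; it is meant to mirror the statistic above, not to replace `statsmodels`.

```python
import math


def two_proportion_ztest(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Control first, Treatment second, matching the summary table above
z, p = two_proportion_ztest(5580, 27666, 5455, 27509)
print(f"z = {z:.4f}, p = {p:.4f}")
```

The result agrees with `proportions_ztest` to about three decimal places (z near 0.996, p near 0.319).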

### Interpretation

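The p-value (0.32) is greater than the standard significance level of 0.05, so we fail to reject the null hypothesis that Control and Treatment have the same conversion rate.

There is no statistically significant difference in conversion rates between
the Control (20.17%) and Treatment (19.83%) groups; Treatment's observed rate was in fact slightly lower.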
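A p-value alone does not say how large an effect the data can rule out. One common complement is a 95% Wald confidence interval for the difference in conversion rates (Treatment minus Control) using the unpooled standard error. This is a sketch using the counts from the summary table; the helper name `diff_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(x_ctrl: int, n_ctrl: int, x_trt: int, n_trt: int,
                             level: float = 0.95) -> tuple[float, float, float]:
    """Wald CI for p_trt - p_ctrl with an unpooled standard error."""
    p_ctrl, p_trt = x_ctrl / n_ctrl, x_trt / n_trt
    diff = p_trt - p_ctrl
    se = math.sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_trt * (1 - p_trt) / n_trt)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return diff, diff - z_crit * se, diff + z_crit * se


diff, low, high = diff_confidence_interval(5580, 27666, 5455, 27509)
print(f"Difference: {diff:+.4f} (95% CI {low:+.4f} to {high:+.4f})")
```

With these counts the interval spans zero, running from roughly a 1.0 percentage-point drop to about a 0.3 percentage-point lift, which is consistent with the non-significant z-test.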

## Product Decision
Based on the results, the new feature does not demonstrate a statistically
significant improvement in conversion rate.

The recommended action is to not roll out the feature in its current form.
Further iterations, longer experiment duration, or targeting specific user
segments may be required before re-testing.
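Whether a longer experiment is worthwhile depends on how small a lift the team cares about. A standard normal-approximation sample-size formula for comparing two proportions gives the number of users needed per group. In this sketch the 20% baseline matches the simulated conversion rate, while the 1 percentage-point target lift, 5% significance level, and 80% power are hypothetical choices; the function name is also hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_baseline: float, p_target: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per group for a two-sided two-proportion test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_target - p_baseline) ** 2
    return math.ceil(n)


# Hypothetical target: detect a lift from 20% to 21% conversion
n = sample_size_per_group(0.20, 0.21)
print(f"Users needed per group: {n}")
```

This gives roughly 25,600 users per group. Each group here already has about 27,500 users, so under these assumptions the experiment was large enough to detect a 1-point lift with at least about 80% power; because the required sample grows with the inverse square of the effect, detecting a 0.5-point lift would need roughly four times as many users.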

## Final Summary & Takeaways

This case study evaluated whether a newly introduced product feature led to a
meaningful improvement in user conversion using an A/B testing framework.

Users were split into Control (A) and Treatment (B) groups, and conversion behavior
was analyzed at the user level. A two-proportion z-test was conducted to determine
whether the observed difference in conversion rates was statistically significant.

The results showed **no statistically significant difference** between the Control
and Treatment groups (p-value = 0.32). This indicates that the new feature did not
have a measurable impact on conversion during the experiment period.

From a product decision perspective, the data does not support rolling out the
feature in its current form. The recommended action is to iterate on the feature,
run the experiment for a longer duration, or test it on specific user segments
before making a rollout decision.

Overall, this analysis demonstrates how A/B testing can be used to validate product
decisions, avoid false positives, and ensure that changes are driven by data rather
than assumptions.