In [None]:
## 📘 Introduction

This case study evaluates a marketing A/B test designed to measure the effectiveness of ad campaigns in converting users. The dataset includes control (PSA) and experimental (ad) groups, and we aim to:

- Analyze group performance and test whether ad exposure improves conversion.
- Assess statistical significance using a Chi-square test.
- Estimate potential revenue uplift from successful ad conversions.
- Provide actionable insights for marketing strategy improvements.


In [None]:
# Load libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import chi2_contingency

# Style (set_theme is the current name for the older sns.set alias)
sns.set_theme(style="whitegrid")

# Load dataset
df = pd.read_csv("marketing_AB.csv")
df.head()


In [None]:
# Basic structure
df.info()
df.describe(include="all")


In [None]:
# Check for missing values
print("Missing values:\n", df.isnull().sum())

# Drop rows with critical missing data if any
df.dropna(subset=['user id', 'test group', 'converted'], inplace=True)

# Check duplicates
print(f"Duplicate rows: {df.duplicated().sum()}")

# Drop duplicates
df = df.drop_duplicates()

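# Cast columns to appropriate types.
# Note: astype(bool) assumes 'converted' was parsed as booleans or 0/1;
# non-empty strings such as "False" would all become True.
df['converted'] = df['converted'].astype(bool)
df['test group'] = df['test group'].astype('category')
df['most ads day'] = df['most ads day'].astype('category')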


In [None]:
## 📊 Exploratory Data Analysis

We start by examining the basic distribution of ads, conversions, and test group behavior.
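
Before plotting, it can help to look at group sizes next to conversion rates, because a rate computed on a small control group is noisier than one computed on a large treatment group. The following is a minimal sketch on a tiny made-up DataFrame; only the column names (`test group`, `converted`) come from this dataset, and the rows are illustrative, not real data.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the real dataset.
toy = pd.DataFrame({
    "test group": ["ad", "ad", "ad", "ad", "psa", "psa"],
    "converted": [True, False, False, True, False, True],
})

# Group size, number of conversions, and conversion rate per test group.
summary = (
    toy.groupby("test group")["converted"]
       .agg(users="count", conversions="sum", conversion_rate="mean")
)
print(summary)
```

The same aggregation should work on `df` once the cleaning cell has run, assuming `converted` holds booleans.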


In [None]:
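# Conversion rate by group
# observed=True makes the grouping on a categorical column explicit
# and avoids the pandas FutureWarning about the changing default.
group_conv = df.groupby("test group", observed=True)["converted"].mean().reset_index()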

sns.barplot(data=group_conv, x="test group", y="converted", palette="viridis")
plt.title("Conversion Rate by Test Group")
plt.ylim(0, 0.03)
plt.show()


In [None]:
# Total ads vs conversion
sns.boxplot(data=df, x="converted", y="total ads", palette="Set2")
plt.yscale("log")
plt.title("Total Ads Seen by Conversion Outcome")
plt.show()


In [None]:
# Conversion by day
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
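# Mean conversion per day, reordered from Monday to Sunday
day_conv = (
    df.groupby("most ads day", observed=True)["converted"]
      .mean()
      .reindex(day_order)
      .reset_index()
)
sns.barplot(data=day_conv, x="most ads day", y="converted", palette="crest")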
plt.title("Conversion Rate by Day of Most Ads Seen")
plt.xticks(rotation=45)
plt.show()


In [None]:
# Conversion by hour
hourly = df.groupby("most ads hour")["converted"].mean().reset_index()
sns.lineplot(data=hourly, x="most ads hour", y="converted", marker='o', color='teal')
plt.title("Conversion Rate by Hour of Most Ads Seen")
plt.xticks(range(0, 24))
plt.show()


In [None]:
## 📐 Hypothesis Testing (A/B Test)

We use a Chi-square test to determine if the observed differences in conversion rates between the ad and PSA groups are statistically significant.

**Null Hypothesis (H₀)**: Conversion rate is the same for both groups.  
**Alternative Hypothesis (H₁)**: Conversion rate is different between groups.
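
A p-value alone does not say how large the difference is, so it is worth reporting an effect size next to the test. The sketch below runs the same kind of Chi-square test on a 2x2 table and adds the absolute and relative lift. The counts are hypothetical and are not taken from `marketing_AB.csv`; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts, NOT taken from marketing_AB.csv.
# Rows: ad, psa. Columns: not converted, converted.
table = np.array([
    [9_750, 250],  # ad group: 10,000 users
    [4_920, 80],   # psa group: 5,000 users
])

# For a 2x2 table, SciPy applies Yates' continuity correction by default.
chi2, p, dof, expected = chi2_contingency(table)

# Effect size: absolute and relative difference in conversion rates.
rates = table[:, 1] / table.sum(axis=1)
abs_lift = rates[0] - rates[1]
rel_lift = abs_lift / rates[1]

print(f"chi2={chi2:.2f}, dof={dof}, p={p:.3g}")
print(f"ad rate={rates[0]:.2%}, psa rate={rates[1]:.2%}")
print(f"absolute lift={abs_lift:.2%}, relative lift={rel_lift:.1%}")
```

Because H₁ is two-sided, a significant result only says the rates differ; the sign of the lift shows which group converted more.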


In [None]:
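# 2x2 table of test group vs. conversion outcome
contingency = pd.crosstab(df['test group'], df['converted'])
# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default
chi2, p, dof, expected = chi2_contingency(contingency)

print(f"Chi2 Statistic: {chi2:.2f}")
# .4g keeps very small p-values readable instead of printing 0.0000
print(f"P-value: {p:.4g}")
if p < 0.05:
    print("✅ Statistically significant difference in conversion rates between groups.")
else:
    print("❌ No statistically significant difference found.")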


In [None]:
## 💰 Revenue Estimation

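Assuming a fixed revenue of $50 per successful conversion, we estimate the gross revenue from conversions in the ad group. This figure is not incremental: some of these users might have converted without seeing an ad.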
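
One way to approximate the revenue attributable to the ads themselves is to treat the PSA conversion rate as the baseline and count only the conversions above it. The sketch below assumes the same $50 per conversion; the function name `incremental_revenue` and all counts are hypothetical and not taken from the dataset.

```python
def incremental_revenue(ad_users, ad_conversions, psa_users, psa_conversions,
                        revenue_per_conversion=50.0):
    """Estimate revenue attributable to ads, using the PSA rate as baseline."""
    psa_rate = psa_conversions / psa_users
    # Conversions the ad group would be expected to produce without ads.
    baseline_conversions = psa_rate * ad_users
    extra_conversions = ad_conversions - baseline_conversions
    return extra_conversions * revenue_per_conversion


# Hypothetical counts, not taken from the dataset.
gross = 250 * 50.0
uplift = incremental_revenue(ad_users=10_000, ad_conversions=250,
                             psa_users=5_000, psa_conversions=80)
print(f"Gross revenue: ${gross:,.2f}")
print(f"Incremental revenue: ${uplift:,.2f}")
```

The gap between the two numbers is the revenue that the baseline conversion rate would be expected to produce anyway.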


In [None]:
revenue_per_conversion = 50
ad_conversions = contingency.loc['ad', True]
estimated_revenue = ad_conversions * revenue_per_conversion

print(f"Estimated Revenue from Ads: ${estimated_revenue:,.2f}")


In [None]:
## 📈 Insights and Business Recommendations

### ✅ Key Insights:
- Ads resulted in a **higher conversion rate** than PSA, supported by statistical testing.
- Most conversions occurred during **midday hours (11 AM – 3 PM)** and on **weekdays**, indicating optimal ad delivery windows.
- Users who converted saw a moderate number of ads, suggesting a **sweet spot for ad frequency**.
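
The ad-frequency observation can be checked more directly by bucketing users by the number of ads seen and comparing conversion rates per bucket. The sketch below uses a made-up DataFrame with the dataset's column names (`total ads`, `converted`); the bucket edges and the `ads bucket` column are arbitrary illustrative choices.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the real dataset.
toy = pd.DataFrame({
    "total ads": [1, 3, 8, 15, 40, 60, 120, 300],
    "converted": [False, False, False, True, False, True, True, False],
})

# Bucket edges are an arbitrary illustrative choice (right-inclusive bins).
bins = [0, 10, 50, 100, float("inf")]
labels = ["1-10", "11-50", "51-100", "101+"]
toy["ads bucket"] = pd.cut(toy["total ads"], bins=bins, labels=labels)

# Users and conversion rate per exposure bucket.
by_bucket = (
    toy.groupby("ads bucket", observed=False)["converted"]
       .agg(users="count", conversion_rate="mean")
)
print(by_bucket)
```

Ad exposure was not randomized within the ad group, so differences between buckets describe an association rather than a causal effect of frequency.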

### 💡 Recommendations:
1. **Scale the ad campaign** to audiences that currently see only the PSA.
2. **Target ad delivery during peak conversion hours** (e.g., lunch breaks).
3. **Optimize ad frequency** to avoid oversaturation but maintain visibility.
4. **Run segmentation experiments** (e.g., by demographics or behavior) to fine-tune targeting.
