AndyNavarro77/ab-testing-framework

🧪 A/B Testing Framework — E-Commerce Conversion Rate Optimization

End-to-end experimentation framework that analyzes 500K synthetic user sessions with frequentist and Bayesian methods. On this dataset the new checkout design wins: conversion rate rises 15% (p<0.001), the posterior probability that Treatment beats Control rounds to 100%, and the projected annual revenue lift is $10.6M.

Python · SciPy · Statsmodels · MySQL · Power BI


🧠 The Business Problem

An e-commerce company redesigned its checkout page to improve conversion — streamlined layout, mobile-optimized, reduced friction. Before rolling out to 100% of traffic, the product team needs to answer three questions:

  • Is the improvement real or just random variation?
  • How large is the effect — and is it practically meaningful?
  • Which user segments benefit the most from the change?

Making the wrong call has a real cost:

  • Ship a losing variant → lose revenue at scale
  • Hold back a winning variant → leave millions on the table
  • Ignore segment differences → miss optimization opportunities

This framework answers those questions with statistical rigor.


✅ The Solution

A complete A/B testing pipeline that:

  • generates realistic synthetic experiment data (500K users, industry-standard parameters)
  • runs both frequentist (Z-test, T-test) and Bayesian statistical analysis
  • validates experiment integrity
  • performs segmentation analysis across device, country, channel, and user type
  • projects business impact at scale

From experiment design to ship/no-ship decision: p<0.001 on the primary metric, a posterior probability of about 100% that Treatment beats Control, and $10.6M in projected annual revenue lift.


📐 Architecture Overview

┌─────────────────────┐    ┌──────────────────────┐    ┌─────────────────────┐
│  Synthetic Data     │───▶│   Python Analysis    │───▶│     MySQL DB        │
│  500K users         │    │  Stats · Bayesian    │    │  4 structured tables│
│  Industry params    │    │  Segmentation        │    │  ab_results         │
└─────────────────────┘    └──────────────────────┘    │  ab_summary         │
                                                        │  statistical_results│
                                                        │  business_impact    │
                                                        └──────────┬──────────┘
                                                                   │
                                                    ┌──────────────▼──────────────┐
                                                    │      Power BI Dashboard      │
                                                    │  Experiment Results Monitor  │
                                                    └─────────────────────────────┘

🔄 Framework — Step by Step

| Step | Action | Technology | Business Value |
|------|--------|------------|----------------|
| 1 | Experiment design: sample size, power, duration | statsmodels | Ensure experiment is adequately powered before launch |
| 2 | Data generation: 500K users, realistic parameters | Python · numpy | Industry-standard CR, AOV, device and segment distributions |
| 3 | EDA: group balance, daily trends, segment breakdown | Python · pandas · seaborn | Validate randomization and understand baseline metrics |
| 4 | SRM check: sample ratio mismatch detection | scipy | Detect broken randomization before analysis |
| 5 | Z-test: conversion rate significance | scipy · statsmodels | Frequentist test for primary metric |
| 6 | T-test: revenue per user & AOV significance | scipy | Validate secondary metrics |
| 7 | Effect size: Cohen's h | scipy | Measure practical significance beyond p-value |
| 8 | Confidence intervals: 95% CI for all metrics | statsmodels | Quantify uncertainty around observed lifts |
| 9 | Bayesian analysis: Beta-Binomial model | numpy | P(Treatment > Control) with credible intervals |
| 10 | Segmentation: device, country, channel, user type | Python · pandas | Identify where the effect is strongest |
| 11 | Business impact: monthly/annual revenue projection | Python · pandas | Translate statistics into executive decisions |

📊 Key Results

| Metric | Control | Treatment | Lift | Significant |
|--------|---------|-----------|------|-------------|
| Conversion Rate | 3.022% | 3.475% | +15.0% | ✅ p<0.001 |
| Avg Order Value | $85.40 | $91.16 | +6.7% | ✅ p<0.001 |
| Revenue per User | $2.58 | $3.17 | +22.8% | ✅ p<0.001 |

| Business Projection | Value |
|---------------------|-------|
| Extra conversions/month | +6,798 |
| Revenue lift/month | +$880,732 |
| Revenue lift/year | +$10,568,782 |
| Conservative estimate/year | +$5,284,391 |
| Bayesian confidence | 100% |
| Recommendation | ✅ SHIP IT |
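
The projection figures can be cross-checked with simple arithmetic. The sketch below uses only numbers from the tables above. The implied monthly traffic is an inference from those numbers, not a value stated in this README, and the variable names are illustrative.

```python
# Figures copied from the Key Results tables above.
cr_control = 0.03022
cr_treatment = 0.03475
extra_conversions_per_month = 6_798
revenue_lift_per_month = 880_732
reported_annual_lift = 10_568_782
reported_conservative = 5_284_391

# Annualize the monthly lift; a gap of a few dollars is rounding in the source.
annual_from_monthly = revenue_lift_per_month * 12

# Share of the full projection that the conservative estimate represents.
conservative_share = reported_conservative / reported_annual_lift

# Inferred, not stated in the README: the monthly traffic that would turn the
# observed conversion-rate difference into the reported extra conversions.
implied_monthly_users = extra_conversions_per_month / (cr_treatment - cr_control)

print(f"Annual lift from monthly figure: ${annual_from_monthly:,}")
print(f"Conservative share of annual lift: {conservative_share:.0%}")
print(f"Implied monthly users: {implied_monthly_users:,.0f}")
```

Read this way, the conservative estimate is half of the full annual projection, and the extra conversions imply roughly 1.5M users per month.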

🔬 Statistical Methods

Frequentist approach:

  • Z-test for conversion rate (primary metric) — p=1.60e-19
  • T-test for revenue per user — p=2.82e-31
  • T-test for average order value — p=8.62e-15
  • Cohen's h effect size: small in magnitude, but large in practical impact at this traffic volume
  • 95% confidence interval for the conversion rate difference: [+0.355pp, +0.551pp], entirely above zero
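
As a rough illustration of the conversion rate test, the sketch below runs a pooled two-proportion Z-test, a Wald confidence interval, and Cohen's h by hand with scipy. The conversion counts are assumptions reconstructed from the rounded rates above (3.022% and 3.475% of 250,000 users per group), so the output should land close to the reported values without matching them digit for digit. The notebook itself may use statsmodels helpers instead.

```python
import math
from scipy.stats import norm

# Group sizes from the README; conversion counts are ASSUMED, reconstructed
# from the rounded conversion rates (3.022% and 3.475%).
n_c = n_t = 250_000
conv_c, conv_t = 7_555, 8_688

p_c, p_t = conv_c / n_c, conv_t / n_t
diff = p_t - p_c

# Two-proportion Z-test with a pooled standard error under H0: p_c == p_t.
p_pool = (conv_c + conv_t) / (n_c + n_t)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z_stat = diff / se_pool
p_value = 2 * norm.sf(abs(z_stat))  # two-sided

# 95% Wald confidence interval for the difference (unpooled standard error).
se_diff = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
z_crit = norm.ppf(0.975)
ci_low, ci_high = diff - z_crit * se_diff, diff + z_crit * se_diff

# Cohen's h: difference of arcsine-transformed proportions.
cohens_h = 2 * math.asin(math.sqrt(p_t)) - 2 * math.asin(math.sqrt(p_c))

print(f"z = {z_stat:.2f}, p = {p_value:.2e}")
print(f"95% CI: [{ci_low * 100:+.3f}pp, {ci_high * 100:+.3f}pp]")
print(f"Cohen's h = {cohens_h:.4f}")
```

Cohen's h here falls well below the conventional 0.2 threshold for a "small" effect, which is why the result is statistically decisive only because of the large sample.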

Bayesian approach (Beta-Binomial model):

  • Prior: Beta(1,1) — uninformative
  • P(Treatment > Control): 100.0% (rounded)
  • 95% Credible Interval for the relative lift in conversion rate: [+11.56%, +18.54%]
  • Expected loss if shipping Treatment: 0.000000pp
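
A minimal sketch of the Beta-Binomial analysis, assuming Monte Carlo sampling from the two posteriors with numpy. The conversion counts are again assumptions reconstructed from the rounded rates, and the seed and number of draws are arbitrary choices, so the results approximate the figures above.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

# Group sizes from the README; conversion counts are ASSUMED from rounded rates.
n_c = n_t = 250_000
conv_c, conv_t = 7_555, 8_688
draws = 200_000  # arbitrary number of posterior samples

# Beta(1, 1) prior + binomial likelihood -> Beta(1 + successes, 1 + failures).
post_c = rng.beta(1 + conv_c, 1 + n_c - conv_c, size=draws)
post_t = rng.beta(1 + conv_t, 1 + n_t - conv_t, size=draws)

# Probability that Treatment has the higher true conversion rate.
prob_treatment_better = (post_t > post_c).mean()

# 95% credible interval for the relative lift.
rel_lift = post_t / post_c - 1
ci_low, ci_high = np.percentile(rel_lift, [2.5, 97.5])

# Expected loss of shipping Treatment: average shortfall when Control is better.
expected_loss = np.maximum(post_c - post_t, 0).mean()

print(f"P(Treatment > Control) = {prob_treatment_better:.4f}")
print(f"95% credible interval for lift: [{ci_low:+.2%}, {ci_high:+.2%}]")
print(f"Expected loss: {expected_loss * 100:.6f}pp")
```

With posteriors this far apart, no sampled draw is likely to favor Control, which is how the probability rounds to 100% and the expected loss to zero.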

Experiment validation:

  • Sample Ratio Mismatch (SRM): ✅ Passed
  • Statistical power achieved: ≈100% (minimum required: 23,993 users per group · actual sample: 250,000 per group)
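
Both validation checks can be sketched with scipy alone. The SRM check below is a chi-square goodness-of-fit test against a 50/50 split. The sample size uses the normal approximation with Cohen's h. The design inputs (α = 0.05, 80% power, rates taken from the observed results) and the SRM alert threshold of 0.001 are assumptions, since the README does not state them, and the function names are illustrative.

```python
import math
from scipy.stats import chisquare, norm

def srm_p_value(n_control, n_treatment):
    """Chi-square goodness-of-fit test against an intended 50/50 split."""
    expected = (n_control + n_treatment) / 2
    _, p = chisquare([n_control, n_treatment], f_exp=[expected, expected])
    return p

def required_n_per_group(p_base, p_target, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-proportion test (normal approx.)."""
    h = 2 * math.asin(math.sqrt(p_target)) - 2 * math.asin(math.sqrt(p_base))
    z_total = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * (z_total / h) ** 2)

def achieved_power(p_base, p_target, n_per_group, alpha=0.05):
    """Approximate power at a given per-group sample size."""
    h = 2 * math.asin(math.sqrt(p_target)) - 2 * math.asin(math.sqrt(p_base))
    return norm.cdf(abs(h) * math.sqrt(n_per_group / 2) - norm.ppf(1 - alpha / 2))

SRM_ALPHA = 0.001  # ASSUMED alert threshold

p_srm = srm_p_value(250_000, 250_000)          # split reported in the README
p_srm_broken = srm_p_value(250_000, 247_000)   # hypothetical imbalanced split
n_required = required_n_per_group(0.03022, 0.03475)
power_now = achieved_power(0.03022, 0.03475, 250_000)

print(f"SRM p-value: {p_srm:.3f} ({'passed' if p_srm >= SRM_ALPHA else 'FAILED'})")
print(f"Hypothetical imbalanced split p-value: {p_srm_broken:.2e}")
print(f"Required n per group: {n_required:,}")
print(f"Achieved power at 250,000 per group: {power_now:.4f}")
```

Under these assumptions the required sample size comes out in the same range as the reported 23,993 per group. The exact figure depends on the baseline rate and minimum detectable effect chosen at design time.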

🎯 Segmentation Insights

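| Segment | Lift | Significant |
|---------|------|-------------|
| Mobile (device) | +17.0% | |
| Social (channel) | +20.5% | |
| Australia (country) | +21.9% | |
| New users | +16.8% | |
| Loyal users | +16.3% | |
| Direct traffic | +4.8% | Not significant |
| UK (country) | +8.3% | ✅ (lower) |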

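Key insight: Mobile users show the highest lift among device types (+17%), so the mobile-optimized design delivers the intended improvement. Across all dimensions, the largest lifts appear for Australia (+21.9%) and Social traffic (+20.5%). Direct traffic shows no significant effect (+4.8%): users who already know the site are less affected by UX changes.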
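
A segment-level comparison like the one above can be sketched with a pandas groupby followed by a per-segment two-proportion Z-test. Everything in this example is hypothetical: the column names (`group`, `device`, `converted`), the base rates, and the uniform 15% uplift are illustrative stand-ins for the generated experiment data.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 200_000

# Hypothetical user-level data: random assignment plus a device attribute.
df = pd.DataFrame({
    "group": rng.choice(["control", "treatment"], size=n),
    "device": rng.choice(["mobile", "desktop", "tablet"], size=n, p=[0.6, 0.3, 0.1]),
})
base_rate = df["device"].map({"mobile": 0.025, "desktop": 0.040, "tablet": 0.030})
uplift = np.where(df["group"] == "treatment", 1.15, 1.0)  # assumed +15% everywhere
df["converted"] = rng.random(n) < base_rate * uplift

def segment_lift(data, segment_col):
    """Conversion rate, relative lift, and Z-test p-value for each segment."""
    counts = (
        data.groupby([segment_col, "group"])["converted"]
        .agg(["sum", "count"])
        .unstack("group")
    )
    conv, size = counts["sum"], counts["count"]
    cr_c = conv["control"] / size["control"]
    cr_t = conv["treatment"] / size["treatment"]
    pooled = (conv["control"] + conv["treatment"]) / (size["control"] + size["treatment"])
    se = np.sqrt(pooled * (1 - pooled) * (1 / size["control"] + 1 / size["treatment"]))
    z = (cr_t - cr_c) / se
    return pd.DataFrame({
        "cr_control": cr_c,
        "cr_treatment": cr_t,
        "lift_pct": (cr_t / cr_c - 1) * 100,
        "p_value": 2 * norm.sf(np.abs(z)),
    })

result = segment_lift(df, "device")
print(result.round(4))
```

Testing many segments inflates the chance of a false positive, so a multiple-comparison correction such as Bonferroni may be worth applying before acting on any single segment result.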


🔍 Analysis Deep Dive

Experiment Overview & EDA

[Figure: Experiment Overview]

Conversion rate by group (3.02% vs 3.48%), revenue per user ($2.58 vs $3.17), daily CR trend over 14 days, CR by device and user segment, and revenue distribution for converted users showing AOV shift from $85 to $91.

Bayesian Analysis & Segmentation

[Figure: Bayesian Segmentation]

Beta-Binomial posterior distributions showing zero overlap between control and treatment, P(Treatment > Control) = 100%, segmentation heatmap across all dimensions, and CR lift by device.

Executive Summary & Business Recommendations

[Figure: Executive Summary]

Key metrics comparison, relative lift across CR/AOV/RPU, 12-month cumulative revenue projection ($5.3M–$10.6M range), statistical tests summary, top segments by lift, and business recommendation table.


🛠️ Tech Stack

| Layer | Technology | Purpose |
|-------|------------|---------|
| Data generation | Python · numpy | Synthetic experiment data with industry parameters |
| Analysis | Python · pandas | Data manipulation, segmentation, business impact |
| Frequentist stats | scipy · statsmodels | Z-test, T-test, confidence intervals, power analysis |
| Bayesian stats | numpy (Beta-Binomial) | Probabilistic decision framework |
| Visualization | matplotlib · seaborn | Analysis charts and executive reporting |
| Database | MySQL 8.0 · SQLAlchemy | Structured storage with indexed tables |
| ETL | Python · pymysql | Automated data loading pipeline |
| Dashboard | Power BI · DAX | Interactive experiment results monitor |

📁 Repository Structure

ab-testing-framework/
│
├── notebooks/
│   └── 01_ab_testing.ipynb           # Full analysis: design, stats, Bayesian, segmentation
├── scripts/
│   └── load_to_mysql.py              # ETL: experiment results → MySQL
├── dashboard/
│   └── ab_testing.pbix               # Power BI dashboard
├── data/                             # Generated experiment data (not tracked in git)
├── img/                              # Analysis and dashboard screenshots
├── .env.example                      # Environment variables template
├── .gitignore
├── requirements.txt
├── LICENSE
└── README_ES.md                      # Spanish version

👤 Author

Andrés Navarro

Data Analyst · Experimentation · Statistical Analysis · Python · SQL

GitHub LinkedIn Portfolio


Built to demonstrate production-grade experimentation capabilities — experiment design, frequentist and Bayesian statistical analysis, segmentation, and business impact quantification — skills directly applicable to e-commerce, fintech, SaaS, and any product-led company running data-driven experiments.
