# A/B Test Experimental Analysis Framework

This notebook provides a framework for analyzing experimental data from your A/B test.
Use this template alongside the downloaded CSV data to work through the analysis questions.

## Setup and Data Import
First, let's import the necessary libraries and load the experimental data.


In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Libraries imported successfully!")


In [None]:
# Load the experimental data
# Replace 'experiment_data.csv' with the actual filename you downloaded
df = pd.read_csv('experiment_data.csv')

print("Data loaded successfully!")
print(f"Dataset shape: {df.shape}")
print(f"\nColumn names: {list(df.columns)}")


In [None]:
# Explore the data structure
print("First 5 rows:")
print(df.head())

print("\nData types:")
print(df.dtypes)

print("\nBasic statistics:")
print(df.describe())


## Data Overview
Let's examine the experimental groups and overall conversion patterns.


In [None]:
# Check the distribution of users across groups
print("Group distribution:")
print(df['group'].value_counts())

print("\nConversion distribution:")
print(df['converted'].value_counts())

print("\nConversion by group:")
print(pd.crosstab(df['group'], df['converted'], margins=True))


## Question 1: Control Group Conversion Rate

Calculate the conversion rate for the control group.
**Formula**: Conversions ÷ Total Users in Control Group


In [None]:
# Filter control group data
control_data = df[df['group'] == 'control']

# Calculate control group metrics
control_total_users = len(control_data)  # Total users in control group
control_conversions = control_data['converted'].sum()  # Number of conversions in control group
control_conversion_rate = control_conversions / control_total_users  # Calculate: conversions / total_users

print("Control Group Analysis:")
print(f"Total users: {control_total_users:,}")
print(f"Conversions: {control_conversions:,}")
print(f"Conversion rate: {control_conversion_rate:.3%}")
print(f"Conversion rate (as percentage): {control_conversion_rate * 100:.3f}%")


## Question 2: Treatment Group Conversion Rate

Calculate the conversion rate for the treatment group.
**Formula**: Conversions ÷ Total Users in Treatment Group


In [None]:
# Filter treatment group data
treatment_data = df[df['group'] == 'treatment']

# Calculate treatment group metrics
treatment_total_users = len(treatment_data)  # Total users in treatment group
treatment_conversions = treatment_data['converted'].sum()  # Number of conversions in treatment group
treatment_conversion_rate = treatment_conversions / treatment_total_users  # Calculate: conversions / total_users

print("Treatment Group Analysis:")
print(f"Total users: {treatment_total_users:,}")
print(f"Conversions: {treatment_conversions:,}")
print(f"Conversion rate: {treatment_conversion_rate:.3%}")
print(f"Conversion rate (as percentage): {treatment_conversion_rate * 100:.3f}%")
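
As a cross-check on Questions 1 and 2, both conversion rates can be computed in one pass with `groupby`. The sketch below assumes the same `group` and `converted` (0/1) columns used above; `toy_df` and its values are made up for illustration and stand in for the real CSV.

```python
import pandas as pd

# Hypothetical toy data standing in for experiment_data.csv
toy_df = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "converted": [0, 1, 0, 0, 1, 1, 0, 0],
})

# One row per group: user count, conversion count, and conversion rate
# (the mean of a 0/1 column is the share of converted users)
summary = toy_df.groupby("group")["converted"].agg(
    users="count",
    conversions="sum",
    conversion_rate="mean",
)
print(summary)
```

If the rates in this table differ from the per-group calculations, the filter values (`'control'`, `'treatment'`) probably do not match the labels in the data.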


## Question 3: Absolute Lift

Calculate the absolute lift between treatment and control groups.
**Formula**: Treatment Rate - Control Rate (in percentage points)


In [None]:
# Calculate the absolute lift
absolute_lift = treatment_conversion_rate - control_conversion_rate  # Calculate: treatment_rate - control_rate

print("Absolute Lift Analysis:")
print(f"Treatment rate: {treatment_conversion_rate:.3%}")
print(f"Control rate: {control_conversion_rate:.3%}")
print(f"Absolute lift (as proportion): {absolute_lift:.5f}")
print(f"Absolute lift: {absolute_lift * 100:.3f} percentage points")


## Question 4: Relative Lift

Calculate the relative lift between treatment and control groups.
**Formula**: (Treatment Rate - Control Rate) ÷ Control Rate × 100


In [None]:
# Calculate the relative lift
relative_lift = (treatment_conversion_rate - control_conversion_rate) / control_conversion_rate * 100  # Calculate: (treatment_rate - control_rate) / control_rate * 100

print("Relative Lift Analysis:")
print(f"Treatment rate: {treatment_conversion_rate:.3%}")
print(f"Control rate: {control_conversion_rate:.3%}")
print(f"Relative lift: {relative_lift:.1f}%")
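
Absolute and relative lift are easy to confuse because both are often quoted as "percent". The sketch below applies the formulas from Questions 3 and 4 to made-up rates (10% control, 12% treatment); the helper name `compute_lift` is hypothetical and not part of the notebook.

```python
def compute_lift(control_rate, treatment_rate):
    """Return (absolute lift in percentage points, relative lift in percent).

    Both inputs are proportions between 0 and 1.
    """
    if control_rate <= 0:
        raise ValueError("control_rate must be positive to compute relative lift")
    absolute = treatment_rate - control_rate
    absolute_pp = absolute * 100                   # percentage points
    relative_pct = absolute / control_rate * 100   # percent of the control rate
    return absolute_pp, relative_pct


# Made-up example rates: 10% control, 12% treatment
abs_pp, rel_pct = compute_lift(0.10, 0.12)
print(f"Absolute lift: {abs_pp:.2f} percentage points")
print(f"Relative lift: {rel_pct:.1f}%")
```

With these inputs the absolute lift is about 2 percentage points while the relative lift is about 20%, so a business target should always state which of the two it refers to.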


## Question 5: Statistical Significance (P-value)

Perform a two-proportion z-test to determine if the difference is statistically significant.
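**Formula**: z = (p1 - p2) ÷ √(p̂ × (1 - p̂) × (1/n1 + 1/n2)), where p̂ is the pooled conversion rate across both groups; the p-value is the two-sided tail probability of z under the standard normal distribution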


In [None]:
# Perform two-proportion z-test
from statsmodels.stats.proportion import proportions_ztest
import numpy as np

# Prepare data for the test
successes = np.array([treatment_conversions, control_conversions])
samples = np.array([treatment_total_users, control_total_users])

# Perform the test
z_stat, p_value = proportions_ztest(successes, samples)

print("Statistical Significance Analysis:")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.6f}")
print(f"Significant at α=0.05? {'Yes' if p_value < 0.05 else 'No'}")
print(f"Significant at α=0.01? {'Yes' if p_value < 0.01 else 'No'}")
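
To make the test less of a black box, the same statistic can be computed by hand. This is a minimal sketch assuming the pooled-variance form of the two-proportion z-test, which is what `proportions_ztest` uses by default, so the two results should agree. The function name `two_proportion_ztest` and the counts in the example are made up for illustration.

```python
import numpy as np
from scipy import stats


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided, pooled two-proportion z-test.

    x1, x2 are conversion counts; n1, n2 are group sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Pooled conversion rate under the null hypothesis p1 == p2
    p_pool = (x1 + x2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


# Made-up counts: 120/1000 treatment conversions vs 100/1000 control
z_manual, p_manual = two_proportion_ztest(120, 1000, 100, 1000)
print(f"Z-statistic: {z_manual:.4f}")
print(f"P-value: {p_manual:.6f}")
```

Passing `treatment_conversions`, `treatment_total_users`, `control_conversions`, and `control_total_users` in that order keeps the sign of z consistent with the cell above.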


## Question 6: Confidence Interval for Difference

Calculate the 95% confidence interval for the difference between treatment and control groups.
**Formula**: (p1 - p2) ± z_α/2 × SE(p1 - p2)


In [None]:
# Calculate 95% confidence interval for the difference
p1, p2 = treatment_conversion_rate, control_conversion_rate
n1, n2 = treatment_total_users, control_total_users

# Standard error of the difference (unpooled)
se_diff = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# 95% CI for difference (z_0.025 = 1.96)
diff = p1 - p2
margin_error = 1.96 * se_diff
ci_lower = (diff - margin_error) * 100  # Convert to percentage points
ci_upper = (diff + margin_error) * 100  # Convert to percentage points

print("Confidence Interval Analysis:")
print(f"Difference: {diff:.3%}")
print(f"Standard Error: {se_diff:.6f}")
print(f"Margin of Error: {margin_error * 100:.2f} percentage points")
print(f"95% CI: [{ci_lower:.2f}%, {ci_upper:.2f}%]")
print(f"CI includes zero? {'Yes' if ci_lower <= 0 <= ci_upper else 'No'}")


## Question 7: Rollout Decision

Decide whether to roll out the treatment by comparing the confidence interval from Question 6 with the business target.


In [None]:
# Business target (from experiment design)
# ci_lower and ci_upper from Question 6 are in percentage points,
# so the absolute target must be entered in the same unit
business_target_absolute = None  # Fill in: Business target absolute lift in percentage points (e.g., 3.0 for 3 percentage points)
business_target_relative = None  # Fill in: Business target relative lift (e.g., 0.20 for 20%)

# Decision logic (uses ci_lower and ci_upper computed in Question 6)
if business_target_absolute is not None:
    if ci_upper < business_target_absolute:
        rollout_decision = "do_not_proceed"
        reasoning = "Upper bound is below business target - target unlikely to be achieved"
    elif ci_lower >= business_target_absolute:
        rollout_decision = "proceed_with_confidence"
        reasoning = "Lower bound is at or above business target - target very likely achievable"
    else:
        rollout_decision = "proceed_with_caution"
        reasoning = "Target is within CI bounds - proceed with caution"
else:
    rollout_decision = "need_more_data"
    reasoning = "Insufficient data for decision"

print(f"Rollout Decision: {rollout_decision}")
print(f"Reasoning: {reasoning}")
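
The Question 6 interval calculation can also be wrapped in a reusable function that derives the critical value from the confidence level instead of hard-coding 1.96. This is a sketch under the same unpooled standard error as Question 6; the name `diff_confidence_interval` and the example counts are made up for illustration.

```python
import numpy as np
from scipy import stats


def diff_confidence_interval(x1, n1, x2, n2, confidence=0.95):
    """Wald CI for p1 - p2, returned in percentage points.

    x1, x2 are conversion counts; n1, n2 are group sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error of the difference
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-sided critical value, about 1.96 for a 95% interval
    z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return (diff - z_crit * se) * 100, (diff + z_crit * se) * 100


# Made-up counts: 120/1000 treatment conversions vs 100/1000 control
lower_pp, upper_pp = diff_confidence_interval(120, 1000, 100, 1000)
print(f"95% CI: [{lower_pp:.2f}, {upper_pp:.2f}] percentage points")
```

Because the bounds come back in percentage points, any business target compared against them should be expressed in percentage points as well.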


## Final Summary

Your complete experimental analysis results:

### Key Metrics
- **Control conversion rate**: [Your answer]%
- **Treatment conversion rate**: [Your answer]%
- **Absolute lift**: [Your calculation] percentage points
- **Relative lift**: [Your calculation]%

### Statistical Analysis
- **P-value**: [Your result]
- **95% Confidence Interval**: [Your result]%
- **Statistically significant**: [Yes/No]

### Business Decision
- **Business target lift**: [Your target from the experiment design] percentage points
- **Rollout recommendation**: [proceed_with_confidence/proceed_with_caution/do_not_proceed]
- **Reasoning**: [Your explanation based on CI vs target]

### Sample Sizes
- **Control group**: [Your answer] users
- **Treatment group**: [Your answer] users
- **Total experiment size**: [Your calculation] users


## Resources
- [Two-proportion z-test explanation](https://stattrek.com/hypothesis-test/difference-in-proportions)
- [Statistical significance calculator](https://www.evanmiller.org/ab-testing/chi-squared.html)
- [Effect size interpretation](https://en.wikipedia.org/wiki/Effect_size)
