# 🧪 A/B Testing for Social Media Marketing Optimization

## Social Media ROI Attribution & Influencer Performance Analyzer

This notebook applies rigorous statistical A/B testing to answer key marketing questions:

| # | Test | Business Question |
|---|------|-------------------|
| 1 | Sponsored vs Organic | Does sponsorship hurt engagement? |
| 2 | CTA vs No CTA | Do CTAs drive more conversions? |
| 3 | Discount Code Impact | Do discounts boost sales or lower AOV? |
| 4 | Reels vs Carousels | Which format drives more purchase intent? |
| 5 | Micro vs Macro Influencers | Where should brands invest? |
| 6 | Weekday vs Weekend Posting | When should brands post? |

**Statistical Methods Used:**
- Two-sample t-test (Welch's)
- Mann-Whitney U test (non-parametric)
- Chi-squared test (proportions)
- Cohen's d (effect size)
- Bootstrap confidence intervals
- Bonferroni correction (multiple testing)
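
Of the methods listed, the bootstrap is the one with the most room for interpretation, so here is a minimal sketch of a percentile bootstrap interval for the *difference* between two group means. The helper name `bootstrap_diff_ci`, its parameters, and the synthetic data are assumptions made for illustration; they are not part of the notebook's dataset or helpers.

```python
import numpy as np

def bootstrap_diff_ci(a, b, n_bootstrap=5000, ci=0.95, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Resample each group independently, with replacement
    idx_a = rng.integers(0, len(a), size=(n_bootstrap, len(a)))
    idx_b = rng.integers(0, len(b), size=(n_bootstrap, len(b)))
    diffs = a[idx_a].mean(axis=1) - b[idx_b].mean(axis=1)
    lower, upper = np.percentile(diffs, [(1 - ci) / 2 * 100, (1 + ci) / 2 * 100])
    return lower, upper

# Synthetic engagement rates, for illustration only
rng = np.random.default_rng(42)
group_a = rng.normal(loc=5.0, scale=1.0, size=400)
group_b = rng.normal(loc=4.5, scale=1.2, size=350)

low, high = bootstrap_diff_ci(group_a, group_b)
observed_diff = group_a.mean() - group_b.mean()
print(f"Observed difference: {observed_diff:.3f}")
print(f"95% bootstrap CI:    ({low:.3f}, {high:.3f})")
```

If the interval excludes zero, the data are consistent with a real difference in means at roughly the chosen confidence level. Fixing the seed keeps the interval reproducible between runs.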

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['font.size'] = 11

print("✅ Libraries loaded!")

In [None]:
# Load data
data_dir = Path("../data/raw")
posts = pd.read_csv(data_dir / "posts.csv")
conversions = pd.read_csv(data_dir / "conversions.csv")
influencers = pd.read_csv(data_dir / "influencers.csv")

# Parse dates
posts['post_date'] = pd.to_datetime(posts['post_date'])
conversions['conversion_date'] = pd.to_datetime(conversions['conversion_date'])

# Compute engagement rate
posts['total_engagement'] = posts['likes'] + posts['comments'] + posts['shares'] + posts['saves']
posts['engagement_rate'] = posts['total_engagement'] / posts['reach'] * 100

# Merge influencer tier into posts
posts = posts.merge(influencers[['influencer_id', 'tier', 'follower_count', 'avg_collaboration_cost']], on='influencer_id', how='left')

print(f"📊 Loaded {len(posts):,} posts, {len(conversions):,} conversions, {len(influencers):,} influencers")

---
## Helper Functions

In [None]:
def cohens_d(group1, group2):
    """Calculate Cohen's d effect size."""
    n1, n2 = len(group1), len(group2)
    pooled_std = np.sqrt(((n1 - 1) * group1.std()**2 + (n2 - 1) * group2.std()**2) / (n1 + n2 - 2))
    return (group1.mean() - group2.mean()) / pooled_std if pooled_std > 0 else 0

def interpret_effect(d):
    """Interpret Cohen's d."""
    d = abs(d)
    if d < 0.2: return "Negligible"
    elif d < 0.5: return "Small"
    elif d < 0.8: return "Medium"
    else: return "Large"

def bootstrap_ci(data, n_bootstrap=10000, ci=0.95):
    """Calculate bootstrap confidence interval for the mean."""
    boot_means = [np.random.choice(data, size=len(data), replace=True).mean() for _ in range(n_bootstrap)]
    lower = np.percentile(boot_means, (1 - ci) / 2 * 100)
    upper = np.percentile(boot_means, (1 + ci) / 2 * 100)
    return lower, upper

def run_ab_test(group_a, group_b, label_a, label_b, metric_name, alpha=0.05):
    """Run a complete A/B test with t-test, Mann-Whitney, and effect size."""
    print(f"\n{'='*60}")
    print(f"📊 {metric_name}")
    print(f"{'='*60}")
    
    # Sample sizes
    print(f"\n   {label_a}: n={len(group_a):,}, mean={group_a.mean():.4f}, std={group_a.std():.4f}")
    print(f"   {label_b}: n={len(group_b):,}, mean={group_b.mean():.4f}, std={group_b.std():.4f}")
    
    # Welch's t-test (does not assume equal variances)
    t_stat, t_pval = stats.ttest_ind(group_a, group_b, equal_var=False)
    print(f"\n   📐 Welch's t-test: t={t_stat:.4f}, p={t_pval:.6f}")

    # Mann-Whitney U test (non-parametric)
    u_stat, u_pval = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
    print(f"   📐 Mann-Whitney U: U={u_stat:,.0f}, p={u_pval:.6f}")

    # Cohen's d
    d = cohens_d(group_a, group_b)
    print(f"   📐 Cohen's d: {d:.4f} ({interpret_effect(d)} effect)")
    
    # Difference in means, with a bootstrap CI for each group's mean
    diff = group_a.mean() - group_b.mean()
    ci_a = bootstrap_ci(group_a.values)
    ci_b = bootstrap_ci(group_b.values)
    print(f"\n   📊 Difference in means: {diff:.4f}")
    print(f"   📊 95% CI for {label_a}: ({ci_a[0]:.4f}, {ci_a[1]:.4f})")
    print(f"   📊 95% CI for {label_b}: ({ci_b[0]:.4f}, {ci_b[1]:.4f})")

    # Decision (based on the two-sided Welch's t-test)
    if t_pval < alpha:
        winner = label_a if diff > 0 else label_b
        print(f"\n   ✅ SIGNIFICANT (p < {alpha}): {winner} has the higher mean")
    else:
        print(f"\n   ⚪ NOT SIGNIFICANT (p >= {alpha}): No statistically significant difference detected")
    
    return {'t_stat': t_stat, 't_pval': t_pval, 'u_pval': u_pval, 'cohens_d': d, 'diff': diff}

def plot_ab_comparison(group_a, group_b, label_a, label_b, metric_name, ax=None):
    """Visualize A/B test results."""
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 5))
    
    data = pd.DataFrame({
        'value': pd.concat([group_a, group_b]),
        'group': [label_a]*len(group_a) + [label_b]*len(group_b)
    })
    
    sns.boxplot(data=data, x='group', y='value', ax=ax, palette=['#4C72B0', '#DD8452'])
    ax.set_title(metric_name, fontweight='bold', fontsize=13)
    ax.set_xlabel('')
    ax.set_ylabel(metric_name)
    
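    # Add means as diamonds, in the same left-to-right order as the boxes.
    # groupby sorts labels alphabetically, which can differ from the plot order
    # (e.g. 'Reels' vs 'Carousels'), so reindex to [label_a, label_b].
    means = data.groupby('group')['value'].mean().reindex([label_a, label_b])
    for i, (group, mean_val) in enumerate(means.items()):
        ax.scatter(i, mean_val, marker='D', color='red', s=60, zorder=5, label='Mean' if i == 0 else '')
    ax.legend()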

print("✅ Helper functions defined!")

---
## Test 1: Sponsored vs Organic Posts

**Hypothesis:** Sponsored posts have lower engagement rates than organic posts because audiences perceive them as ads.

- **H₀**: μ_sponsored = μ_organic (no difference in engagement)
- **H₁**: μ_sponsored ≠ μ_organic (there is a difference)

In [None]:
# Split groups
sponsored = posts[posts['is_sponsored'] == True]['engagement_rate']
organic = posts[posts['is_sponsored'] == False]['engagement_rate']

result_1 = run_ab_test(organic, sponsored, 'Organic', 'Sponsored', 'Engagement Rate: Sponsored vs Organic')

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Box plot
plot_ab_comparison(organic, sponsored, 'Organic', 'Sponsored', 'Engagement Rate (%)', ax=axes[0])

# Distribution overlay
axes[1].hist(organic, bins=50, alpha=0.6, label=f'Organic (n={len(organic):,})', color='#4C72B0', density=True)
axes[1].hist(sponsored, bins=50, alpha=0.6, label=f'Sponsored (n={len(sponsored):,})', color='#DD8452', density=True)
axes[1].set_title('Engagement Rate Distribution', fontweight='bold', fontsize=13)
axes[1].set_xlabel('Engagement Rate (%)')
axes[1].set_ylabel('Density')
axes[1].legend()

plt.tight_layout()
plt.savefig('../data/ab_test1_sponsored.png', dpi=150, bbox_inches='tight')
plt.show()

---
## Test 2: CTA vs No CTA

**Hypothesis:** Posts with a Call-to-Action drive more saves (purchase intent proxy).

- **H₀**: μ_cta = μ_no_cta (no difference in save rate)
- **H₁**: μ_cta > μ_no_cta (CTA increases saves)
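
The alternative hypothesis here is directional, while `run_ab_test` reports a two-sided p-value. As a hedged sketch on synthetic data (the group values below are made up, not drawn from `posts.csv`), a one-sided Welch's t-test can be requested through the `alternative` argument of `scipy.stats.ttest_ind`, which is available in SciPy 1.6 and later:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic save rates (%), for illustration only
with_cta = rng.normal(loc=12.0, scale=4.0, size=500)
without_cta = rng.normal(loc=11.0, scale=4.0, size=500)

# Two-sided: H1 is "the means differ"
_, p_two_sided = stats.ttest_ind(with_cta, without_cta, equal_var=False)

# One-sided: H1 is "mean(with_cta) > mean(without_cta)"
t_stat, p_one_sided = stats.ttest_ind(
    with_cta, without_cta, equal_var=False, alternative='greater'
)

print(f"t = {t_stat:.3f}")
print(f"two-sided p = {p_two_sided:.6f}")
print(f"one-sided p = {p_one_sided:.6f}")
```

When the observed difference points in the hypothesized direction, the one-sided p-value is half the two-sided one. The direction should be fixed before looking at the data, otherwise the halving overstates the evidence.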

In [None]:
# Save rate = saves / likes (proxy for purchase intent)
posts['save_rate'] = posts['saves'] / posts['likes'].replace(0, 1) * 100

cta = posts[posts['has_cta'] == True]['save_rate']
no_cta = posts[posts['has_cta'] == False]['save_rate']

result_2 = run_ab_test(cta, no_cta, 'With CTA', 'Without CTA', 'Save Rate: CTA vs No CTA')

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

plot_ab_comparison(cta, no_cta, 'With CTA', 'Without CTA', 'Save Rate (%)', ax=axes[0])

# Also test engagement rate
cta_eng = posts[posts['has_cta'] == True]['engagement_rate']
no_cta_eng = posts[posts['has_cta'] == False]['engagement_rate']
plot_ab_comparison(cta_eng, no_cta_eng, 'With CTA', 'Without CTA', 'Engagement Rate (%)', ax=axes[1])

plt.tight_layout()
plt.savefig('../data/ab_test2_cta.png', dpi=150, bbox_inches='tight')
plt.show()

---
## Test 3: Discount Code Impact

**Hypothesis:** Discount codes increase conversion count but lower average order value.

- **H₀**: μ_discount = μ_no_discount (no difference in order value)
- **H₁**: μ_discount < μ_no_discount (discounts lower AOV)
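
The first half of the hypothesis (discounts increase conversion count) is a claim about proportions rather than means, which is where a chi-squared test fits. Testing it needs the number of non-converting users in each group, which is assumed here; the counts below are invented for illustration and do not come from `conversions.csv`. A minimal sketch using `scipy.stats.chi2_contingency`:

```python
import numpy as np
from scipy import stats

# Hypothetical counts, for illustration only.
# Rows are groups; columns are [converted, did not convert].
observed = np.array([
    [180, 1820],   # exposed to a discount code
    [140, 1860],   # not exposed to a discount code
])

# For a 2x2 table, SciPy applies Yates' continuity correction by default
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

rates = observed[:, 0] / observed.sum(axis=1)
print(f"Conversion rate with discount:    {rates[0]:.2%}")
print(f"Conversion rate without discount: {rates[1]:.2%}")
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```

The `expected` array holds the counts that would be seen if conversion were independent of discount exposure; the statistic measures how far the observed table departs from it.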

In [None]:
discount_used = conversions[conversions['discount_code_used'] == True]['order_value']
no_discount = conversions[conversions['discount_code_used'] == False]['order_value']

result_3 = run_ab_test(no_discount, discount_used, 'No Discount', 'With Discount', 'Order Value: Discount Code Impact')

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

plot_ab_comparison(no_discount, discount_used, 'No Discount', 'With Discount', 'Order Value ($)', ax=axes[0])

# Conversion counts per group (descriptive only; no chi-squared test is run here)
total_with_discount = len(conversions[conversions['discount_code_used'] == True])
total_without_discount = len(conversions[conversions['discount_code_used'] == False])

# Revenue comparison
rev_data = pd.DataFrame({
    'Group': ['With Discount', 'No Discount'],
    'Total Revenue': [discount_used.sum(), no_discount.sum()],
    'Avg Order Value': [discount_used.mean(), no_discount.mean()],
    'Conversions': [total_with_discount, total_without_discount]
})
rev_data.plot(x='Group', y=['Total Revenue', 'Avg Order Value'], kind='bar', ax=axes[1],
              color=['#4C72B0', '#DD8452'], secondary_y='Avg Order Value')
axes[1].set_title('Revenue Impact of Discount Codes', fontweight='bold', fontsize=13)
axes[1].tick_params(axis='x', rotation=0)

plt.tight_layout()
plt.savefig('../data/ab_test3_discount.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\n📊 Discount Code Summary:")
print(f"   With discount: {total_with_discount:,} conversions, ${discount_used.sum():,.2f} revenue")
print(f"   Without discount: {total_without_discount:,} conversions, ${no_discount.sum():,.2f} revenue")

---
## Test 4: Reels vs Carousels (Instagram)

**Hypothesis:** Reels generate higher engagement than carousels on Instagram.

- **H₀**: μ_reels = μ_carousels
- **H₁**: μ_reels ≠ μ_carousels

In [None]:
# Filter Instagram only
ig_posts = posts[posts['platform'] == 'Instagram']

reels = ig_posts[ig_posts['content_type'] == 'reel']['engagement_rate']
carousels = ig_posts[ig_posts['content_type'] == 'carousel']['engagement_rate']

result_4 = run_ab_test(reels, carousels, 'Reels', 'Carousels', 'Instagram Engagement: Reels vs Carousels')

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

plot_ab_comparison(reels, carousels, 'Reels', 'Carousels', 'Engagement Rate (%)', ax=axes[0])

# Also compare save rates (purchase intent)
reels_saves = ig_posts[ig_posts['content_type'] == 'reel']['save_rate']
carousel_saves = ig_posts[ig_posts['content_type'] == 'carousel']['save_rate']
plot_ab_comparison(reels_saves, carousel_saves, 'Reels', 'Carousels', 'Save Rate (Purchase Intent %)', ax=axes[1])

plt.tight_layout()
plt.savefig('../data/ab_test4_reels_vs_carousels.png', dpi=150, bbox_inches='tight')
plt.show()

---
## Test 5: Micro vs Macro Influencers

**Hypothesis:** Micro-influencers deliver better ROI than macro-influencers despite lower reach.

- **H₀**: ROI_micro = ROI_macro
- **H₁**: ROI_micro > ROI_macro

In [None]:
# Engagement rate comparison
micro_eng = posts[posts['tier'] == 'micro']['engagement_rate']
macro_eng = posts[posts['tier'] == 'macro']['engagement_rate']

result_5a = run_ab_test(micro_eng, macro_eng, 'Micro', 'Macro', 'Engagement Rate: Micro vs Macro Influencers')

In [None]:
# Cost efficiency: engagement per dollar
micro_posts = posts[posts['tier'] == 'micro'].copy()
macro_posts = posts[posts['tier'] == 'macro'].copy()

micro_posts['engagement_per_dollar'] = micro_posts['total_engagement'] / micro_posts['avg_collaboration_cost'].replace(0, 1)
macro_posts['engagement_per_dollar'] = macro_posts['total_engagement'] / macro_posts['avg_collaboration_cost'].replace(0, 1)

result_5b = run_ab_test(
    micro_posts['engagement_per_dollar'], 
    macro_posts['engagement_per_dollar'],
    'Micro', 'Macro', 'Cost Efficiency: Engagement per Dollar Spent'
)

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

plot_ab_comparison(micro_eng, macro_eng, 'Micro', 'Macro', 'Engagement Rate (%)', ax=axes[0])
plot_ab_comparison(
    micro_posts['engagement_per_dollar'], macro_posts['engagement_per_dollar'],
    'Micro', 'Macro', 'Engagement per Dollar', ax=axes[1]
)

# Average cost comparison
tier_costs = posts.groupby('tier')['avg_collaboration_cost'].mean()
tier_costs = tier_costs.reindex(['nano', 'micro', 'mid', 'macro', 'mega'])
axes[2].bar(tier_costs.index, tier_costs.values, color=sns.color_palette('RdYlGn_r', 5))
axes[2].set_title('Avg Cost per Post by Tier', fontweight='bold', fontsize=13)
axes[2].set_ylabel('Cost ($)')
axes[2].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.savefig('../data/ab_test5_micro_vs_macro.png', dpi=150, bbox_inches='tight')
plt.show()

---
## Test 6: Weekday vs Weekend Posting

**Hypothesis:** Weekday posts get higher engagement than weekend posts.

- **H₀**: μ_weekday = μ_weekend
- **H₁**: μ_weekday ≠ μ_weekend

In [None]:
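# Derive day_of_week from post_date if the column is not already in the data
# (Monday=0 ... Sunday=6)
if 'day_of_week' not in posts.columns:
    posts['day_of_week'] = posts['post_date'].dt.dayofweek

weekday = posts[posts['day_of_week'] < 5]['engagement_rate']
weekend = posts[posts['day_of_week'] >= 5]['engagement_rate']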

result_6 = run_ab_test(weekday, weekend, 'Weekday', 'Weekend', 'Engagement Rate: Weekday vs Weekend')

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

plot_ab_comparison(weekday, weekend, 'Weekday', 'Weekend', 'Engagement Rate (%)', ax=axes[0])

# Day-by-day breakdown
day_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
daily_eng = posts.groupby('day_of_week')['engagement_rate'].mean()
colors = ['#4C72B0']*5 + ['#DD8452']*2
axes[1].bar([day_names[i] for i in daily_eng.index], daily_eng.values, color=colors)
axes[1].set_title('Avg Engagement Rate by Day', fontweight='bold', fontsize=13)
axes[1].set_ylabel('Engagement Rate (%)')
axes[1].axhline(y=daily_eng.mean(), color='red', linestyle='--', alpha=0.5, label='Overall Mean')
axes[1].legend()

plt.tight_layout()
plt.savefig('../data/ab_test6_weekday_weekend.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 📊 Results Summary & Multiple Testing Correction
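
Seven comparisons are summarized below, so the per-test threshold under Bonferroni drops from 0.05 to 0.05 / 7 ≈ 0.0071. The sketch below shows the equivalent view in terms of adjusted p-values; the `bonferroni` helper and the p-values are made up for illustration and are not the notebook's results.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) under Bonferroni (hypothetical helper)."""
    m = len(p_values)
    # Multiply each p-value by the number of tests, capping at 1
    adjusted = [min(p * m, 1.0) for p in p_values]
    reject = [p_adj < alpha for p_adj in adjusted]
    return adjusted, reject

# Made-up p-values, for illustration only
p_values = [0.001, 0.008, 0.02, 0.04, 0.30, 0.0005, 0.06]
adjusted, reject = bonferroni(p_values)
threshold = 0.05 / len(p_values)

print(f"Per-test threshold: {threshold:.4f}")
for p, p_adj, r in zip(p_values, adjusted, reject):
    print(f"p = {p:.4f} -> adjusted = {p_adj:.4f} -> reject H0: {r}")
```

Comparing the adjusted p-value with 0.05 gives the same decisions as comparing the raw p-value with 0.05 / 7. Note that a raw p-value of 0.008 would pass at 0.05 but fails after correction.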

In [None]:
# Compile all results
results = [
    {'Test': 'Sponsored vs Organic', 'Metric': 'Engagement Rate', 'p-value': result_1['t_pval'], "Cohen's d": result_1['cohens_d'], 'Diff': result_1['diff']},
    {'Test': 'CTA vs No CTA', 'Metric': 'Save Rate', 'p-value': result_2['t_pval'], "Cohen's d": result_2['cohens_d'], 'Diff': result_2['diff']},
    {'Test': 'Discount Code Impact', 'Metric': 'Order Value', 'p-value': result_3['t_pval'], "Cohen's d": result_3['cohens_d'], 'Diff': result_3['diff']},
    {'Test': 'Reels vs Carousels', 'Metric': 'Engagement Rate', 'p-value': result_4['t_pval'], "Cohen's d": result_4['cohens_d'], 'Diff': result_4['diff']},
    {'Test': 'Micro vs Macro (Engagement)', 'Metric': 'Engagement Rate', 'p-value': result_5a['t_pval'], "Cohen's d": result_5a['cohens_d'], 'Diff': result_5a['diff']},
    {'Test': 'Micro vs Macro (Efficiency)', 'Metric': 'Engmt per Dollar', 'p-value': result_5b['t_pval'], "Cohen's d": result_5b['cohens_d'], 'Diff': result_5b['diff']},
    {'Test': 'Weekday vs Weekend', 'Metric': 'Engagement Rate', 'p-value': result_6['t_pval'], "Cohen's d": result_6['cohens_d'], 'Diff': result_6['diff']}
]

results_df = pd.DataFrame(results)

# Bonferroni correction for multiple testing
n_tests = len(results_df)
alpha = 0.05
bonferroni_alpha = alpha / n_tests

results_df['Significant (α=0.05)'] = results_df['p-value'] < alpha
results_df['Significant (Bonferroni)'] = results_df['p-value'] < bonferroni_alpha
results_df['Effect Size'] = results_df["Cohen's d"].apply(lambda d: interpret_effect(d))

print("📊 A/B TESTING RESULTS SUMMARY")
print("=" * 80)
print(f"\n   Bonferroni-corrected α: {bonferroni_alpha:.4f} (original: {alpha})")
print(f"   Number of tests: {n_tests}\n")
print(results_df[['Test', 'Metric', 'p-value', "Cohen's d", 'Effect Size', 'Significant (Bonferroni)']].to_string(index=False))

In [None]:
# Visualization of all test results
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# P-values (log scale)
colors = ['green' if sig else 'gray' for sig in results_df['Significant (Bonferroni)']]
axes[0].barh(results_df['Test'], -np.log10(results_df['p-value']), color=colors)
axes[0].axvline(x=-np.log10(bonferroni_alpha), color='red', linestyle='--', label=f'Bonferroni α={bonferroni_alpha:.4f}')
axes[0].axvline(x=-np.log10(0.05), color='orange', linestyle='--', label='α=0.05')
axes[0].set_xlabel('-log₁₀(p-value)', fontsize=12)
axes[0].set_title('Statistical Significance of A/B Tests', fontweight='bold', fontsize=14)
axes[0].legend()

# Effect sizes
effect_colors = []
for d in results_df["Cohen's d"].abs():
    if d >= 0.8: effect_colors.append('#d62728')
    elif d >= 0.5: effect_colors.append('#ff7f0e')
    elif d >= 0.2: effect_colors.append('#2ca02c')
    else: effect_colors.append('#7f7f7f')

axes[1].barh(results_df['Test'], results_df["Cohen's d"].abs(), color=effect_colors)
axes[1].axvline(x=0.2, color='green', linestyle=':', alpha=0.5, label='Small (0.2)')
axes[1].axvline(x=0.5, color='orange', linestyle=':', alpha=0.5, label='Medium (0.5)')
axes[1].axvline(x=0.8, color='red', linestyle=':', alpha=0.5, label='Large (0.8)')
axes[1].set_xlabel("|Cohen's d|", fontsize=12)
axes[1].set_title('Effect Sizes of A/B Tests', fontweight='bold', fontsize=14)
axes[1].legend()

plt.tight_layout()
plt.savefig('../data/ab_test_summary.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 💡 Business Recommendations

In [None]:
print("="*60)
print("💡 STRATEGIC RECOMMENDATIONS FROM A/B TESTS")
print("="*60)

significant_tests = results_df[results_df['Significant (Bonferroni)'] == True]
insignificant_tests = results_df[results_df['Significant (Bonferroni)'] == False]

print(f"\n📊 {len(significant_tests)} of {n_tests} tests were statistically significant after Bonferroni correction.")

print("\n" + "-"*60)
print("ACTIONABLE INSIGHTS:")
print("-"*60)

for _, row in results_df.iterrows():
    significant = bool(row['Significant (Bonferroni)'])
    sig_marker = "✅" if significant else "⬜"
    # Read Cohen's d first: escaped quotes inside an f-string expression
    # are a syntax error before Python 3.12
    d_val = row["Cohen's d"]
    print(f"\n{sig_marker} {row['Test']}")
    print(f"   p-value: {row['p-value']:.6f} | Effect: {row['Effect Size']} (d={d_val:.3f})")

    if not significant:
        # Do not read a direction into a difference that failed the corrected threshold
        print("   → No significant difference after Bonferroni correction. Treat the direction as inconclusive.")
        continue

    if row['Test'] == 'Sponsored vs Organic':
        if row['Diff'] > 0:
            print("   → Organic posts outperform sponsored. Make sponsored content feel more native.")
        else:
            print("   → Sponsored posts match or beat organic. Sponsorship doesn't hurt engagement.")
    elif row['Test'] == 'CTA vs No CTA':
        print("   → CTAs are associated with a difference in save behavior. Test softer CTAs vs aggressive ones.")
    elif row['Test'] == 'Discount Code Impact':
        if row['Diff'] > 0:
            print("   → Full-price purchases yield higher AOV. Reserve discounts for acquisition campaigns.")
        else:
            print("   → Discounted orders match or exceed full-price AOV. Safe to use for volume growth.")
    elif row['Test'] == 'Reels vs Carousels':
        if row['Diff'] > 0:
            print("   → Reels outperform carousels. Shift content mix toward short-form video.")
        else:
            print("   → Carousels match or beat Reels. Maintain a balanced content mix.")
    elif 'Micro vs Macro' in row['Test']:
        if row['Diff'] > 0:
            print("   → Micro-influencers deliver better value. Reallocate budget from macro to micro.")
        else:
            print("   → Macro-influencers perform better on this metric. Do not shift budget to micro on this evidence.")
    elif row['Test'] == 'Weekday vs Weekend':
        if row['Diff'] > 0:
            print("   → Weekdays outperform weekends. Concentrate posting on weekdays.")
        else:
            print("   → Weekends outperform weekdays. Consider shifting posts toward the weekend.")

print("\n" + "="*60)

---
## ✅ A/B Testing Complete!

**Statistical Methods Applied:**
- Welch's t-test (unequal variances)
- Mann-Whitney U (non-parametric backup)
- Cohen's d (practical significance)
- Bootstrap confidence intervals
- Bonferroni correction (multiple comparisons)

**Charts Saved:**
- `ab_test1_sponsored.png` through `ab_test6_weekday_weekend.png`
- `ab_test_summary.png` (overall results)