# Analyzing A/B Test Results

This notebook covers all result analysis methods in pyexpstats:
- Conversion (binary) analysis
- Revenue/magnitude (continuous) analysis
- Multi-variant testing
- Confidence intervals
- Difference-in-differences

In [1]:
from pyexpstats import conversion, magnitude

## Conversion Analysis

In [2]:
# Analyze a standard A/B test on conversion rate
result = conversion.analyze(
    control_visitors=8000,
    control_conversions=400,    # 5.0%
    variant_visitors=8000,
    variant_conversions=456,    # 5.7%
)

print(f"Control rate:  {result.control_rate:.2%}")
print(f"Variant rate:  {result.variant_rate:.2%}")
print(f"Lift:          {result.lift_percent:+.2f}%")
print(f"P-value:       {result.p_value:.4f}")
print(f"Significant:   {result.is_significant}")
print(f"Winner:        {result.winner}")
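# The interval bounds appear to be proportions (0.01 = 1 percentage point),
# so the trailing "%" in this format string is cosmetic, not a unit.
print(f"95% CI:        [{result.confidence_interval_lower:+.2f}%, {result.confidence_interval_upper:+.2f}%]")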

Control rate:  5.00%
Variant rate:  5.70%
Lift:          +14.00%
P-value:       0.0491
Significant:   True
Winner:        variant
95% CI:        [+0.00%, +0.01%]
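
The figures above can be sanity-checked by hand. The following is a minimal sketch of a pooled two-proportion z-test with a Wald interval on the absolute difference, using only the standard library. The helper name `two_proportion_ztest` is hypothetical and not part of pyexpstats, and the choice of pooled versus unpooled standard errors is an assumption, since the notebook does not state which method the library uses internally. The sketch reports the interval in percentage points, which may be easier to read than bounds rounded to two decimals.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(n_a, x_a, n_b, x_b, confidence=0.95):
    """Two-sided z-test for the difference between two conversion rates.

    Hypothetical helper for illustration; not a pyexpstats function.
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a
    # Pooled standard error for the test statistic (null: equal rates)
    p_pool = (x_a + x_b) / (n_a + n_b)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled (Wald) standard error for the interval on the difference
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "lift_percent": 100 * diff / p_a,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
    }

check = two_proportion_ztest(8000, 400, 8000, 456)
print(f"Lift:    {check['lift_percent']:+.2f}%")
print(f"P-value: {check['p_value']:.4f}")
print(f"95% CI:  [{check['ci'][0]:+.2%}, {check['ci'][1]:+.2%}]")
```

Under these assumptions the p-value should land close to the 0.0491 reported above, and the lower bound of the interval sits only just above zero, which is consistent with a result that barely clears the 5% threshold.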


In [3]:
# Readable markdown summary
print(conversion.summarize(result, test_name="Homepage CTA Test"))

## 📊 Homepage CTA Test Results

### ✅ Significant Result

**The test variant performed significantly higher than the control.**

- **Control conversion rate:** 5.00% (400 / 8,000)
- **Variant conversion rate:** 5.70% (456 / 8,000)
- **Relative lift:** +14.0% increase
- **P-value:** 0.0491
- **Confidence level:** 95%

### 📝 What This Means

With 95% confidence, the difference is statistically significant. 
The p-value of **0.0491** indicates there's only a **4.91%** chance 
this result is due to random variation. 
The variant shows a **14.0%** improvement over control.


## Revenue / Magnitude Analysis

In [4]:
# Analyze average order value (continuous metric)
rev_result = magnitude.analyze(
    control_visitors=3000,
    control_mean=75.20,
    control_std=28.50,
    variant_visitors=3000,
    variant_mean=79.80,
    variant_std=30.10,
)

print(f"Control mean:  ${rev_result.control_mean:.2f}")
print(f"Variant mean:  ${rev_result.variant_mean:.2f}")
print(f"Lift:          {rev_result.lift_percent:+.2f}%")
print(f"P-value:       {rev_result.p_value:.4f}")
print(f"Significant:   {rev_result.is_significant}")
print(f"Winner:        {rev_result.winner}")

Control mean:  $75.20
Variant mean:  $79.80
Lift:          +6.12%
P-value:       0.0000
Significant:   True
Winner:        variant
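
A p-value printed as 0.0000 is a rounding effect, not a literal zero. As a rough cross-check, the sketch below runs a large-sample test on the same summary statistics with a Welch-style standard error and a normal approximation. The helper `mean_difference_test` is hypothetical, and whether pyexpstats uses a t distribution or a normal approximation is not stated in this notebook; with 3,000 visitors per arm the two are practically indistinguishable.

```python
from math import sqrt
from statistics import NormalDist

def mean_difference_test(n_a, mean_a, std_a, n_b, mean_b, std_b):
    """Large-sample test for a difference in means from summary statistics.

    Hypothetical helper for illustration; not a pyexpstats function.
    """
    diff = mean_b - mean_a
    # Welch-style standard error: the two variances are not assumed equal
    se = sqrt(std_a**2 / n_a + std_b**2 / n_b)
    z = diff / se
    # Normal approximation to the t distribution; reasonable for large n
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, z, p_value

aov_diff, aov_se, aov_z, aov_p = mean_difference_test(
    3000, 75.20, 28.50, 3000, 79.80, 30.10
)
print(f"Difference: ${aov_diff:.2f} (SE ${aov_se:.2f})")
print(f"z = {aov_z:.2f}, p = {aov_p:.2e}")
```

The test statistic comes out at roughly 6 standard errors, so the p-value is far below anything four decimal places can show.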


In [5]:
print(magnitude.summarize(
    rev_result,
    test_name="Upsell Widget Test",
    metric_name="Average Order Value",
    currency="$"
))

## 📊 Upsell Widget Test Results

### ✅ Significant Result

**The test variant's average order value is significantly higher than control.**

- **Control average order value:** $75.20 (n=3,000, std=$28.50)
- **Variant average order value:** $79.80 (n=3,000, std=$30.10)
- **Relative lift:** +6.1% increase
- **Absolute difference:** $+4.60
- **P-value:** 0.0000
- **Confidence level:** 95%

### 📝 What This Means

With 95% confidence, the difference is statistically significant. 
The p-value of **0.0000** indicates there's only a **0.00%** chance 
this result is due to random variation. 
The variant shows a **$4.60** (6.1%) improvement over control.


## Multi-Variant Testing

In [6]:
# Compare multiple variants at once (conversion)
multi_result = conversion.analyze_multi(
    variants=[
        {"name": "Control",   "visitors": 5000, "conversions": 250},
        {"name": "Variant A", "visitors": 5000, "conversions": 290},
        {"name": "Variant B", "visitors": 5000, "conversions": 310},
    ],
    confidence=95,
    correction="bonferroni",
)

print(f"Overall significant: {multi_result.is_significant}")
print(f"P-value: {multi_result.p_value:.4f}")
print(f"Best variant: {multi_result.best_variant}")
print(f"\nPairwise comparisons:")
for comp in multi_result.pairwise_comparisons:
    sig = "*" if comp.is_significant else " "
    print(f"  {comp.variant_a} vs {comp.variant_b}: "
          f"lift={comp.lift_percent:+.1f}%, p={comp.p_value_adjusted:.4f} {sig}")

Overall significant: True
P-value: 0.0304
Best variant: Variant B

Pairwise comparisons:
  Control vs Variant A: lift=+16.0%, p=0.2303  
  Control vs Variant B: lift=+24.0%, p=0.0272 *
  Variant A vs Variant B: lift=+6.9%, p=1.0000  
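
The adjusted p-values above are much larger than an uncorrected pairwise test would give, because Bonferroni multiplies each raw p-value by the number of comparisons (three here) and caps the result at 1. A minimal sketch of that adjustment follows, assuming a pooled two-proportion z-test for each pair; the helper `pooled_z_pvalue` is hypothetical and the library's actual pairwise test is not confirmed by this notebook.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

# (visitors, conversions) per arm, copied from the cell above
arms = {"Control": (5000, 250), "Variant A": (5000, 290), "Variant B": (5000, 310)}

def pooled_z_pvalue(n_a, x_a, n_b, x_b):
    """Two-sided p-value from a pooled two-proportion z-test (illustrative)."""
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (x_b / n_b - x_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

pairs = list(combinations(arms, 2))
adjusted = {}
for a, b in pairs:
    raw = pooled_z_pvalue(*arms[a], *arms[b])
    # Bonferroni: multiply by the number of comparisons, cap at 1
    adjusted[(a, b)] = min(1.0, raw * len(pairs))
    print(f"{a} vs {b}: raw p={raw:.4f}, adjusted p={adjusted[(a, b)]:.4f}")
```

Control vs Variant A would look borderline on its raw p-value alone, but it is nowhere near significant once the correction is applied, which matches the output above.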


In [7]:
print(conversion.summarize_multi(multi_result, test_name="Landing Page Variants"))

## 📊 Landing Page Variants Results

### ✅ Significant Differences Detected

**At least one variant performs differently from the others.**

### Variant Performance

| Variant | Visitors | Conversions | Rate |
|---------|----------|-------------|------|
| Variant B üèÜ | 5,000 | 310 | 6.20% |
| Variant A | 5,000 | 290 | 5.80% |
| Control | 5,000 | 250 | 5.00% |

### Overall Test (Chi-Square)

- **Test statistic:** 6.98
- **Degrees of freedom:** 2
- **P-value:** 0.0304
- **Confidence level:** 95%

### Significant Pairwise Differences

- **Variant B** beats **Control** by 24.0% (p=0.0272)

### 📝 What This Means

With 95% confidence, there are real differences between your variants. 
**Variant B** has the highest conversion rate. 
The pairwise comparisons above show which specific differences are statistically significant 
(adjusted for multiple comparisons using Bonferroni correction).
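
The overall chi-square statistic in the summary can be reproduced from the raw counts. The sketch below builds the 3x2 table of conversions and non-conversions, compares it with the counts expected under a single shared conversion rate, and converts the statistic to a p-value. It relies on the fact that the chi-square survival function has the closed form exp(-x/2) when there are exactly 2 degrees of freedom, so it does not generalize to other numbers of variants without a proper chi-square CDF.

```python
from math import exp

# (visitors, conversions) for Control, Variant A, Variant B
observed = [(5000, 250), (5000, 290), (5000, 310)]

total_n = sum(n for n, _ in observed)
total_x = sum(x for _, x in observed)
pooled_rate = total_x / total_n  # shared rate under the null hypothesis

chi2 = 0.0
for n, x in observed:
    exp_conv = n * pooled_rate        # expected conversions
    exp_non = n * (1 - pooled_rate)   # expected non-conversions
    chi2 += (x - exp_conv) ** 2 / exp_conv
    chi2 += ((n - x) - exp_non) ** 2 / exp_non

dof = len(observed) - 1
# Valid only for dof == 2: survival function of chi-square is exp(-x / 2)
chi2_p = exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {chi2_p:.4f}")
```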


In [8]:
# Multi-variant revenue analysis
rev_multi = magnitude.analyze_multi(
    variants=[
        {"name": "Control",   "visitors": 2000, "mean": 75.0, "std": 28.0},
        {"name": "Variant A", "visitors": 2000, "mean": 79.5, "std": 30.0},
        {"name": "Variant B", "visitors": 2000, "mean": 73.0, "std": 27.0},
    ],
    confidence=95,
)

print(f"Best variant: {rev_multi.best_variant}")
print(f"Significant: {rev_multi.is_significant}")
for comp in rev_multi.pairwise_comparisons:
    print(f"  {comp.variant_a} vs {comp.variant_b}: lift={comp.lift_percent:+.1f}%")

Best variant: Variant A
Significant: True
  Control vs Variant A: lift=+6.0%
  Control vs Variant B: lift=-2.7%
  Variant A vs Variant B: lift=-8.2%
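
The sign of each pairwise lift depends on which variant is treated as the baseline. The output above is consistent with the second-named variant being measured relative to the first, as this short sketch shows (the dictionary and loop are illustrative, not library code):

```python
from itertools import combinations

means = {"Control": 75.0, "Variant A": 79.5, "Variant B": 73.0}

lifts = {}
for a, b in combinations(means, 2):
    # Lift of b relative to a, in percent
    lifts[(a, b)] = 100 * (means[b] - means[a]) / means[a]
    print(f"{a} vs {b}: lift={lifts[(a, b)]:+.1f}%")
```

Read this way, "Variant A vs Variant B: lift=-8.2%" means Variant B's mean is 8.2% below Variant A's, not the reverse.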


## Confidence Intervals

In [9]:
# Standalone confidence interval for a conversion rate
ci = conversion.confidence_interval(visitors=5000, conversions=250)
print(f"Rate: {ci.rate:.2%}")
print(f"95% CI: [{ci.lower:.4f}, {ci.upper:.4f}]")
print(f"Margin of error: +/- {ci.margin_of_error:.4f}")

# For a continuous metric
ci_rev = magnitude.confidence_interval(visitors=3000, mean=75.20, std=28.50)
print(f"\nMean: ${ci_rev.mean:.2f}")
print(f"95% CI: [${ci_rev.lower:.2f}, ${ci_rev.upper:.2f}]")
print(f"Margin of error: +/- ${ci_rev.margin_of_error:.2f}")

Rate: 5.00%
95% CI: [0.0443, 0.0564]
Margin of error: +/- 0.0060

Mean: $75.20
95% CI: [$74.18, $76.22]
Margin of error: +/- $1.02
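
Note that the conversion interval above is not symmetric around 5.00%, which suggests a Wilson score interval instead of the simpler Wald interval. That is an inference from the printed numbers, not something the notebook states. The sketch below implements a Wilson interval for the rate and a normal-approximation interval for the mean; both helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(n, x, confidence=0.95):
    """Wilson score interval for a binomial proportion (illustrative helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def mean_interval(n, mean, std, confidence=0.95):
    """Normal-approximation interval for a mean from summary statistics."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * std / sqrt(n)
    return mean - half, mean + half

rate_lo, rate_hi = wilson_interval(5000, 250)
mean_lo, mean_hi = mean_interval(3000, 75.20, 28.50)
print(f"Rate 95% CI: [{rate_lo:.4f}, {rate_hi:.4f}]")
print(f"Mean 95% CI: [${mean_lo:.2f}, ${mean_hi:.2f}]")
```

The Wilson interval is generally preferred for low conversion rates because it never extends below zero and keeps closer to nominal coverage than the Wald interval.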


## Difference-in-Differences

In [10]:
# Compare pre/post treatment effect while controlling for time trends
did = conversion.diff_in_diff(
    control_pre_visitors=5000,
    control_pre_conversions=250,     # 5.0%
    control_post_visitors=5000,
    control_post_conversions=260,    # 5.2% (natural drift)
    treatment_pre_visitors=5000,
    treatment_pre_conversions=245,   # 4.9%
    treatment_post_visitors=5000,
    treatment_post_conversions=290,  # 5.8% (treatment + drift)
)

print(f"Control change:   {did.control_change:+.2%}")
print(f"Treatment change: {did.treatment_change:+.2%}")
print(f"DiD effect:       {did.diff_in_diff:+.4f} ({did.diff_in_diff_percent:+.1f}%)")
print(f"Significant:      {did.is_significant}")
print(f"P-value:          {did.p_value:.4f}")

Control change:   +0.20%
Treatment change: +0.90%
DiD effect:       +0.0070 (+14.3%)
Significant:      False
P-value:          0.2660
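
The difference-in-differences estimate is the treatment group's change minus the control group's change, and its uncertainty combines the sampling noise of all four cells. A minimal sketch of that calculation follows, assuming the four cells are independent samples so that their variances add. The variable names are illustrative and the library's internal method is not confirmed here.

```python
from math import sqrt
from statistics import NormalDist

def rate_and_var(n, x):
    """Conversion rate and its binomial sampling variance."""
    p = x / n
    return p, p * (1 - p) / n

cells = {
    "control_pre": (5000, 250), "control_post": (5000, 260),
    "treat_pre": (5000, 245), "treat_post": (5000, 290),
}
cell_stats = {name: rate_and_var(*counts) for name, counts in cells.items()}

did_effect = (
    (cell_stats["treat_post"][0] - cell_stats["treat_pre"][0])
    - (cell_stats["control_post"][0] - cell_stats["control_pre"][0])
)
# Independent cells: the variance of the estimate is the sum of the four
did_se = sqrt(sum(var for _, var in cell_stats.values()))
did_z = did_effect / did_se
did_p = 2 * (1 - NormalDist().cdf(abs(did_z)))
z_crit = NormalDist().inv_cdf(0.975)
did_ci = (did_effect - z_crit * did_se, did_effect + z_crit * did_se)

print(f"DiD effect: {did_effect:+.2%}, z = {did_z:.2f}, p = {did_p:.4f}")
print(f"95% CI: [{did_ci[0]:+.2%}, {did_ci[1]:+.2%}]")
```

Because four noisy rates feed into one estimate, the standard error is larger than in a simple two-group comparison of the same size, which is why a 0.7 point effect is not significant here.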


In [11]:
print(conversion.summarize_diff_in_diff(did, test_name="Checkout Redesign (DiD)"))

## 📊 Checkout Redesign (DiD)

### ⏳ No Significant Treatment Effect

**The treatment effect is not statistically significant.**

### Conversion Rates

| Group | Pre-Period | Post-Period | Change |
|-------|------------|-------------|--------|
| Control | 5.00% | 5.20% | +0.20% |
| Treatment | 4.90% | 5.80% | +0.90% |

### Difference-in-Differences Estimate

- **DiD Effect:** +0.70% (+14.3% relative)
- **95% CI:** [-0.53%, 1.93%]
- **Z-statistic:** 1.11
- **P-value:** 0.2660
- **Confidence level:** 95%

### Sample Sizes

| Group | Pre-Period | Post-Period |
|-------|------------|-------------|
| Control | 5,000 | 5,000 |
| Treatment | 5,000 | 5,000 |

### 📝 What This Means

The treatment group's conversion rate changed by **+0.90%** 
while the control group changed by **+0.20%**. 
The difference (+0.70%) is not statistically significant. 
This could mean the treatment had no real effect, or the sample size is insufficient to detect it.
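
To judge whether sample size is the likely limitation, it helps to estimate the smallest effect this design could reliably detect. The sketch below uses the common approximation MDE ≈ (z for significance + z for power) × standard error, with 80% power as an assumed target and the standard error taken from the observed cells. It is a rough planning figure under those assumptions, not a pyexpstats result.

```python
from math import sqrt
from statistics import NormalDist

def did_standard_error(cells):
    """Standard error of a DiD estimate from four (visitors, conversions) cells."""
    return sqrt(sum((x / n) * (1 - x / n) / n for n, x in cells))

mde_se = did_standard_error([(5000, 250), (5000, 260), (5000, 245), (5000, 290)])
z_alpha = NormalDist().inv_cdf(0.975)  # two-sided test at the 5% level
z_power = NormalDist().inv_cdf(0.80)   # 80% power (assumed target)
mde = (z_alpha + z_power) * mde_se
print(f"Approximate minimum detectable effect: {mde:.2%}")
```

With 5,000 visitors per cell the detectable effect is well above the observed 0.70 point difference, so a real effect of that size could easily go undetected in this test.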
