# Sequential & Bayesian Testing

Beyond fixed-horizon frequentist tests, pyexpstats supports:
- **Sequential testing**: peek at results early without inflating the false positive rate
- **Bayesian testing**: get probability statements about which variant is better

This notebook demonstrates both approaches.

In [1]:
from pyexpstats.methods import sequential, bayesian

## Sequential Testing with O'Brien-Fleming Boundaries

In [2]:
# O'Brien-Fleming: very strict boundaries at early looks that relax toward the
# fixed-horizon 1.96 at the final look; use it to guard against stopping on early noise
result_obf = sequential.analyze(
    control_visitors=3000,
    control_conversions=150,
    variant_visitors=3000,
    variant_conversions=180,
    expected_visitors_per_variant=10000,
    method="obrien-fleming",
)

print(f"Decision: {result_obf.decision}")
print(f"Can stop: {result_obf.can_stop}")
print(f"Lift: {result_obf.lift_percent:+.1f}%")
print(f"Z-statistic: {result_obf.z_statistic:.3f}")
print(f"Upper boundary: {result_obf.upper_boundary:.3f}")
print(f"Lower boundary: {result_obf.lower_boundary:.3f}")
print(f"Information fraction: {result_obf.information_fraction:.1%}")
print(f"P(variant better): {result_obf.confidence_variant_better:.1f}%")

Decision: keep_running
Can stop: False
Lift: +20.0%
Z-statistic: 1.699
Upper boundary: 3.578
Lower boundary: -3.578
Information fraction: 30.0%
P(variant better): 95.5%

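To see where these numbers come from, here is a stdlib-only sketch that recomputes them by hand. It assumes, without confirmation from the pyexpstats source, a pooled two-proportion z-test, an information fraction equal to the share of the planned sample seen so far, an O'Brien-Fleming-shaped boundary of 1.96/√t, and "P(variant better)" equal to Φ(z), i.e. one minus a one-sided p-value rather than a Bayesian posterior probability. `normal_cdf` and `sequential_check` are hypothetical helpers, not library functions.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sequential_check(n_c, x_c, n_v, x_v, expected_per_variant, z_final=1.96):
    """Hypothetical hand re-derivation of the O'Brien-Fleming readout above."""
    p_c, p_v = x_c / n_c, x_v / n_v
    # Pooled two-proportion z-test (assumed; the unpooled SE also gives ~1.699 here)
    p_pool = (x_c + x_v) / (n_c + n_v)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    z = (p_v - p_c) / se
    # Information fraction: share of the planned sample observed so far
    t = (n_c + n_v) / (2 * expected_per_variant)
    upper = z_final / math.sqrt(t)  # O'Brien-Fleming-shaped boundary
    return {
        "lift_percent": (p_v - p_c) / p_c * 100,
        "z": z,
        "information_fraction": t,
        "upper_boundary": upper,
        "can_stop": abs(z) >= upper,
        "p_variant_better": 100 * normal_cdf(z),
        "remaining_visitors": 2 * expected_per_variant - (n_c + n_v),
    }

check = sequential_check(3000, 150, 3000, 180, expected_per_variant=10000)
print(f"Lift: {check['lift_percent']:+.1f}%")
print(f"Z-statistic: {check['z']:.3f}")
print(f"Upper boundary: {check['upper_boundary']:.3f}")
print(f"Information fraction: {check['information_fraction']:.1%}")
print(f"Can stop: {check['can_stop']}")
print(f"P(variant better): {check['p_variant_better']:.1f}%")
print(f"Remaining visitors: {check['remaining_visitors']:,}")
```

With these inputs the sketch gives z ≈ 1.699, a boundary of ≈ 3.578 at a 30% information fraction, P ≈ 95.5% and 14,000 remaining visitors, matching the output above. That agreement suggests, but does not prove, that these are the formulas pyexpstats uses.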

In [3]:
# Summarize the sequential result
print(sequential.summarize(result_obf))

## Sequential A/B Test - Sequential Analysis

### **KEEP RUNNING**

**Progress:** [######--------------] 30%

### Current Results

| Metric | Control | Variant |
|--------|---------|---------|
| Visitors | 3,000 | 3,000 |
| Conversions | 150 | 180 |
| Rate | 5.00% | 6.00% |

### Key Metrics

- **Lift:** +20.0%
- **Confidence variant is better:** 95.5%
- **Z-statistic:** 1.70
- **Upper boundary:** 3.58
- **Lower boundary:** -3.58

### Recommendation

Keep running the test

Results are not yet conclusive. You're 30% through the planned test.

Current results (not final):
- Control rate: 5.00%
- Variant rate: 6.00%
- Observed lift: +20.0%
- Current confidence: 1.0% that variant is better

Why you can't stop yet:
Stopping now would inflate your false positive rate. The observed difference hasn't crossed the statistical threshold needed for a valid conclusion.

Estimated remaining: ~14,000 more visitors needed.



## Sequential Testing with Pocock Boundaries

In [4]:
# Pocock: the same boundary at every look, which makes stopping early easier
result_pocock = sequential.analyze(
    control_visitors=3000,
    control_conversions=150,
    variant_visitors=3000,
    variant_conversions=180,
    expected_visitors_per_variant=10000,
    method="pocock",
)

print(f"Decision: {result_pocock.decision}")
print(f"Can stop: {result_pocock.can_stop}")
print(f"Upper boundary: {result_pocock.upper_boundary:.3f}")
print(f"Lower boundary: {result_pocock.lower_boundary:.3f}")
print(f"Adjusted alpha: {result_pocock.adjusted_alpha:.4f}")

Decision: keep_running
Can stop: False
Upper boundary: 2.394
Lower boundary: -2.394
Adjusted alpha: 0.0150


In [5]:
# Compare boundaries at different information fractions
print("Information Fraction | O'Brien-Fleming | Pocock")
print("-" * 47)
for frac in [0.2, 0.4, 0.6, 0.8, 1.0]:
    obf = sequential.get_boundaries(frac, method="obrien-fleming")
    poc = sequential.get_boundaries(frac, method="pocock")
    # Right-align each value under its column header so the rows line up
    print(f"{frac:>20.0%} | {obf.upper:>15.3f} | {poc.upper:>6.3f}")

Information Fraction | O'Brien-Fleming | Pocock
-----------------------------------------------
                 20% |           4.383 |  2.394
                 40% |           3.099 |  2.394
                 60% |           2.530 |  2.394
                 80% |           2.191 |  2.394
                100% |           1.960 |  2.394

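Why do the boundaries sit above 1.96 at interim looks? A quick simulation makes the point. The sketch below is a Monte Carlo estimate under the null hypothesis (no true difference) and does not call pyexpstats. It assumes five equally spaced looks at the information fractions in the table and reuses the printed boundary values. At look k the cumulative z-statistic is the running sum of k independent standard-normal increments divided by √k, the usual Brownian-motion approximation for a group sequential test. A naive 1.96 at every look is included for comparison.

```python
import math
import random

# Assumption: five equally spaced looks, matching the information fractions in the table.
FRACTIONS = [0.2, 0.4, 0.6, 0.8, 1.0]

def false_positive_rate(boundaries, n_sims=20_000, seed=0):
    """Estimate P(|z| crosses a boundary at any look) when there is no true effect.

    Each look adds one independent N(0, 1) increment of evidence; the cumulative
    z-statistic at look k is the running sum divided by sqrt(k).
    """
    rng = random.Random(seed)  # same seed per scheme: common random numbers
    rejections = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for k, bound in enumerate(boundaries, start=1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(k)
            if abs(z) >= bound:  # two-sided: upper and lower boundaries are symmetric
                rejections += 1
                break
    return rejections / n_sims

fpr_naive = false_positive_rate([1.96] * len(FRACTIONS))
fpr_obf = false_positive_rate([1.96 / math.sqrt(t) for t in FRACTIONS])  # O'Brien-Fleming column
fpr_pocock = false_positive_rate([2.394] * len(FRACTIONS))               # Pocock column

print(f"Naive 1.96 at every look: {fpr_naive:.1%}")
print(f"O'Brien-Fleming:          {fpr_obf:.1%}")
print(f"Pocock:                   {fpr_pocock:.1%}")
```

Naive peeking should land well above the nominal 5%, while both boundary sets stay much closer to it. The exact figures depend on the seed, the number of simulations and the assumed number of looks.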

## Bayesian A/B Testing

In [6]:
# Bayesian analysis with uniform prior
bayes = bayesian.analyze(
    control_visitors=5000,
    control_conversions=250,
    variant_visitors=5000,
    variant_conversions=290,
)

print(f"Control rate: {bayes.control_rate:.2%}")
print(f"Variant rate: {bayes.variant_rate:.2%}")
print(f"Lift: {bayes.lift_percent:+.1f}%")
print(f"\nP(variant better): {bayes.probability_variant_better:.1f}%")
print(f"P(control better): {bayes.probability_control_better:.1f}%")
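# Expected loss is in percentage points: the average amount by which the chosen
# arm's conversion rate falls short of the other arm's (zero when it is better)
print(f"\nExpected loss (choosing variant): {bayes.expected_loss_choosing_variant:.4f}")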
print(f"Expected loss (choosing control): {bayes.expected_loss_choosing_control:.4f}")
print(f"\nHas winner: {bayes.has_winner}")
print(f"Winner: {bayes.winner}")

Control rate: 5.00%
Variant rate: 5.80%
Lift: +16.0%

P(variant better): 96.2%
P(control better): 3.8%

Expected loss (choosing variant): 0.0068
Expected loss (choosing control): 0.8072

Has winner: True
Winner: variant

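The probability and expected-loss figures can be approximated by sampling from Beta posteriors. This sketch assumes the "uniform prior" is Beta(1, 1), so each arm's posterior is Beta(1 + conversions, 1 + non-conversions), and reports expected loss in percentage points. pyexpstats may compute these quantities differently (for example in closed form or with a different draw count), so treat this as an illustration, not its implementation. `posterior_summary` is a hypothetical helper.

```python
import random

def posterior_summary(n_c, x_c, n_v, x_v, prior_a=1.0, prior_b=1.0, draws=50_000, seed=0):
    """Monte Carlo P(variant better) and expected losses from Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    loss_variant = 0.0  # shortfall if we ship the variant but control is better
    loss_control = 0.0  # shortfall if we keep control but the variant is better
    for _ in range(draws):
        p_c = rng.betavariate(prior_a + x_c, prior_b + n_c - x_c)
        p_v = rng.betavariate(prior_a + x_v, prior_b + n_v - x_v)
        if p_v > p_c:
            wins += 1
        loss_variant += max(p_c - p_v, 0.0)
        loss_control += max(p_v - p_c, 0.0)
    return {
        "p_variant_better": wins / draws * 100,
        "loss_variant_pp": loss_variant / draws * 100,  # convert to percentage points
        "loss_control_pp": loss_control / draws * 100,
    }

mc = posterior_summary(5000, 250, 5000, 290)
print(f"P(variant better): {mc['p_variant_better']:.1f}%")
print(f"Expected loss (choosing variant): {mc['loss_variant_pp']:.4f} pp")
print(f"Expected loss (choosing control): {mc['loss_control_pp']:.4f} pp")
```

With 50,000 draws the estimates should land close to the 96.2%, 0.0068 and 0.8072 reported above. The asymmetry in expected loss is the point of the metric: picking the variant risks very little, while keeping control gives up about 0.8 percentage points on average.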

In [7]:
# 95% credible intervals (the Bayesian analogue of confidence intervals).
# The rate intervals are proportions; the lift interval is already in percent.
ci_control = bayes.control_credible_interval
ci_variant = bayes.variant_credible_interval
ci_lift = bayes.lift_credible_interval
print(f"Control 95% credible interval: [{ci_control[0]:.2%}, {ci_control[1]:.2%}]")
print(f"Variant 95% credible interval: [{ci_variant[0]:.2%}, {ci_variant[1]:.2%}]")
print(f"Lift 95% credible interval:    [{ci_lift[0]:+.1f}%, {ci_lift[1]:+.1f}%]")

Control 95% credible interval: [4.43%, 5.64%]
Variant 95% credible interval: [5.19%, 6.48%]
Lift 95% credible interval:    [-1.6%, +36.8%]

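Equal-tailed credible intervals can be read off posterior samples as the 2.5th and 97.5th percentiles. This sketch again assumes Beta(1, 1) priors and defines lift as (p_v − p_c) / p_c in percent; both are assumptions about how pyexpstats reports these numbers. `sampled_credible_intervals` is a hypothetical helper.

```python
import random
import statistics

def sampled_credible_intervals(n_c, x_c, n_v, x_v, draws=50_000, seed=1):
    """Equal-tailed 95% intervals from Beta(1 + conversions, 1 + non-conversions) posteriors."""
    rng = random.Random(seed)
    control, variant, lift = [], [], []
    for _ in range(draws):
        p_c = rng.betavariate(1 + x_c, 1 + n_c - x_c)
        p_v = rng.betavariate(1 + x_v, 1 + n_v - x_v)
        control.append(p_c)
        variant.append(p_v)
        lift.append((p_v - p_c) / p_c * 100)  # relative lift in percent

    def central_95(samples):
        # n=40 gives 39 cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two
        cuts = statistics.quantiles(samples, n=40)
        return cuts[0], cuts[-1]

    return central_95(control), central_95(variant), central_95(lift)

ci_c, ci_v, ci_l = sampled_credible_intervals(5000, 250, 5000, 290)
print(f"Control: [{ci_c[0]:.2%}, {ci_c[1]:.2%}]")
print(f"Variant: [{ci_v[0]:.2%}, {ci_v[1]:.2%}]")
print(f"Lift:    [{ci_l[0]:+.1f}%, {ci_l[1]:+.1f}%]")
```

Note that the lift interval includes zero even though P(variant better) is 96.2%: the lower 2.5% tail of the lift posterior dips slightly below zero, which is consistent with a 3.8% chance that control is better.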

In [8]:
# With an informative prior (e.g., from historical data)
bayes_informed = bayesian.analyze(
    control_visitors=5000,
    control_conversions=250,
    variant_visitors=5000,
    variant_conversions=290,
    prior_alpha=50,     # ~50 prior conversions...
    prior_beta=950,     # ...and ~950 prior non-conversions: a 5% rate worth ~1,000 visitors
    confidence_threshold=0.95,
)

print("With informative prior (alpha=50, beta=950):")
print(f"  P(variant better): {bayes_informed.probability_variant_better:.1f}%")
print(f"  Winner: {bayes_informed.winner}")

With informative prior (alpha=50, beta=950):
  P(variant better): 94.8%
  Winner: none

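Why does the informative prior pull the probability below the 95% threshold? With a conjugate Beta prior, the posterior is Beta(α + conversions, β + non-conversions), so Beta(50, 950) behaves like 1,000 extra visitors at a 5% rate, assuming pyexpstats applies the same prior to both arms. The sketch computes posterior means exactly to show the resulting shrinkage; `posterior_mean` is a hypothetical helper.

```python
from fractions import Fraction

def posterior_mean(visitors, conversions, prior_alpha, prior_beta):
    """Mean of Beta(prior_alpha + conversions, prior_beta + visitors - conversions)."""
    a = prior_alpha + conversions
    b = prior_beta + visitors - conversions
    return Fraction(a, a + b)

priors = {"uniform Beta(1, 1)": (1, 1), "informative Beta(50, 950)": (50, 950)}
for label, (alpha, beta) in priors.items():
    m_c = posterior_mean(5000, 250, alpha, beta)
    m_v = posterior_mean(5000, 290, alpha, beta)
    lift = (m_v - m_c) / m_c * 100  # lift between posterior means, in percent
    print(f"{label}: control {float(m_c):.3%}, variant {float(m_v):.3%}, lift {float(lift):+.1f}%")
```

The variant's advantage in posterior means shrinks from about 0.80 to about 0.67 percentage points (lift from +15.9% to +13.3%), which is enough to move P(variant better) from 96.2% to 94.8%, just under the 0.95 threshold.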

In [9]:
print(bayesian.summarize(bayes))

## Bayesian A/B Test

### **Winner: Variant** (96.2% confidence)

**Probability Variant Wins:** ███████████████████░ 96.2%
**Probability Control Wins:** ░░░░░░░░░░░░░░░░░░░░ 3.8%

### Results

| Metric | Control | Variant |
|--------|---------|---------|
| Visitors | 5,000 | 5,000 |
| Conversions | 250 | 290 |
| Rate | 5.00% | 5.80% |
| 95% CI | [4.43%, 5.64%] | [5.19%, 6.48%] |

### Lift Analysis

- **Observed lift:** +16.0%
- **95% credible interval:** -1.6% to +36.8%

### Risk Analysis

- Expected loss if choosing variant: 0.007 percentage points
- Expected loss if choosing control: 0.807 percentage points

### Interpretation

With **96.2%** confidence, 
the **variant** performs better. This exceeds the 95% threshold.


## Bayesian Multi-Variant Testing

In [10]:
# Compare 3+ variants with Bayesian analysis
bayes_multi = bayesian.analyze_multi(
    variants=[
        {"name": "Control",   "visitors": 5000, "conversions": 250},
        {"name": "Variant A", "visitors": 5000, "conversions": 290},
        {"name": "Variant B", "visitors": 5000, "conversions": 275},
    ],
)

print(f"Best variant: {bayes_multi.best_variant}")
print(f"\nProbability each is best:")
for name, prob in bayes_multi.probabilities_best.items():
    print(f"  {name}: {prob:.1f}%")
print(f"\nExpected loss for each:")
for name, loss in bayes_multi.expected_losses.items():
    print(f"  {name}: {loss:.4f}")

Best variant: Variant A

Probability each is best:
  Control: 1.9%
  Variant A: 73.1%
  Variant B: 25.0%

Expected loss for each:
  Control: 0.8747
  Variant A: 0.0746
  Variant B: 0.3748

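For several arms, "probability each is best" and "expected loss" generalize directly: draw one rate per arm from its posterior, record which arm has the highest draw, and accumulate how far each arm falls short of that best draw. This sketch assumes Beta(1, 1) priors and reports losses in percentage points, the same scale as the two-arm results; it is an illustration, not pyexpstats' code. `multi_arm_summary` is a hypothetical helper.

```python
import random

def multi_arm_summary(arms, draws=50_000, seed=2):
    """arms: dict mapping name -> (visitors, conversions)."""
    rng = random.Random(seed)
    wins = {name: 0 for name in arms}
    shortfall = {name: 0.0 for name in arms}
    for _ in range(draws):
        # One posterior draw per arm: Beta(1 + conversions, 1 + non-conversions)
        sample = {name: rng.betavariate(1 + x, 1 + n - x) for name, (n, x) in arms.items()}
        best_name = max(sample, key=sample.get)
        best_rate = sample[best_name]
        wins[best_name] += 1
        for name, rate in sample.items():
            shortfall[name] += best_rate - rate  # zero for the arm that won this draw
    prob_best = {name: w / draws * 100 for name, w in wins.items()}
    expected_loss_pp = {name: s / draws * 100 for name, s in shortfall.items()}
    return prob_best, expected_loss_pp

arms = {"Control": (5000, 250), "Variant A": (5000, 290), "Variant B": (5000, 275)}
prob_best, expected_loss_pp = multi_arm_summary(arms)
for name in arms:
    print(f"{name}: P(best) {prob_best[name]:.1f}%, expected loss {expected_loss_pp[name]:.4f} pp")
```

A common decision rule is to ship the arm with the lowest expected loss once that loss falls below a tolerance you choose in advance; here that arm is Variant A.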

## When to Use Each Method

| Method | Best For | Key Advantage |
|--------|----------|---------------|
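| **Frequentist** (default) | Standard tests with a sample size fixed in advance | Familiar, well-understood error guarantees |
| **Sequential** | When you need to look at results before the planned end | Controls the overall false positive rate across interim looks |
| **Bayesian** | When you want probability statements about the variants | Direct statements such as "96% chance the variant is better" (these depend on the prior, as the informative-prior example shows) |

**Sequential method choice:**
- **O'Brien-Fleming**: Very strict early boundaries (4.383 at 20% of the data) that relax to 1.960 at the end; best for protecting against premature decisions
- **Pocock**: The same boundary (2.394) at every look, so stopping early is easier, at the cost of a stricter final threshold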