
# Pricing A/B Testing Playbook — High vs Low Price Variant

This notebook is a **complete, end-to-end analysis** of a pricing A/B test for an e-commerce or subscription product.

- The company experimented with **two price points** (e.g. 9.99 vs 7.99).  
- Each user was randomly assigned to either **control (old price)** or **treatment (new price)**.  
- The main question: *Should we change the price, given the impact on conversion and revenue?*

All Markdown is in **English**, and the code is written to be clear, typed, and reusable.



## 0) Data and experiment setup

We assume a pricing A/B dataset called `pricing_ab.csv` with the following columns:

- `user_id` — unique user identifier (string or integer).  
- `group` — `"control"` or `"treatment"` (pricing buckets).  
- `price` — price shown to the user (float).  
- `purchased` — 1 if the user purchased, 0 otherwise (binary).  
- `revenue` — total revenue from the user during the experiment window (float; 0 for non-buyers).  
- *(Optional)* `segment` — user segment (e.g., `"new"`, `"returning"`, `"high_value"`).  
- *(Optional)* `date` or `timestamp` — when the user saw the price (for day-level checks).

> This structure is common across many public pricing A/B datasets.  
> If your file uses different column names, you can adapt the small parts where they are referenced.
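
If you do not have a real file at hand, a small synthetic frame with the same columns is enough to exercise the rest of the notebook. The sketch below is only an illustration: the function name `make_synthetic_pricing_ab`, the price points, and the conversion rates are all made up.

```python
import numpy as np
import pandas as pd


def make_synthetic_pricing_ab(n_users: int = 2_000, seed: int = 0) -> pd.DataFrame:
    """Build a toy frame with the columns described above (all values are made up)."""
    rng = np.random.default_rng(seed)
    group = rng.choice(["control", "treatment"], size=n_users)
    # Hypothetical price points and conversion rates, chosen only for illustration.
    price = np.where(group == "control", 9.99, 7.99)
    conv_rate = np.where(group == "control", 0.05, 0.06)
    purchased = rng.binomial(1, conv_rate)
    return pd.DataFrame(
        {
            "user_id": np.arange(n_users),
            "group": group,
            "price": price,
            "purchased": purchased,
            "revenue": purchased * price,  # 0 for non-buyers
        }
    )


toy = make_synthetic_pricing_ab()
print(toy.head())
```

Writing it out with `toy.to_csv("pricing_ab.csv", index=False)` would produce a file in the expected layout.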



## 1) Setup

We import the usual scientific Python stack, then define typed helper functions for proportions,
t-tests, SRM checks, and power/MDE calculations.


In [None]:

from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (7, 4.5)
plt.rcParams["axes.grid"] = True

# Optional: used later for GLM (logit)
try:
    import statsmodels.api as sm  # type: ignore
except Exception as e:  # pragma: no cover
    sm = None
    print("statsmodels not available; GLM cells will be skipped.", e)



### 1.1 Proportion helpers (conversion) and SRM


In [None]:

@dataclass(frozen=True)
class PropSummary:
    """Summary of a Bernoulli proportion.

    Attributes
    ----------
    p : float
        Sample proportion x / n.
    n : int
        Sample size.
    x : int
        Number of successes.
    """
    p: float
    n: int
    x: int


def summarize_prop(x: int, n: int) -> PropSummary:
    """Validate and summarize a proportion sample.

    Parameters
    ----------
    x : int
        Number of successes, in [0, n].
    n : int
        Sample size, must be positive.

    Returns
    -------
    PropSummary
        Dataclass with p, n, x.
    """
    if n <= 0:
        raise ValueError("n must be positive.")
    if not (0 <= x <= n):
        raise ValueError("x must satisfy 0 <= x <= n.")
    return PropSummary(p=x / n, n=n, x=x)


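def invPhi(u: float) -> float:
    """Inverse standard normal CDF (quantile function).

    Uses ``statistics.NormalDist`` from the standard library, because the
    ``math`` module provides ``erf`` and ``erfc`` but no inverse of either.

    Parameters
    ----------
    u : float
        Probability in (0, 1).

    Returns
    -------
    float
        z such that Phi(z) = u.
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must be in (0,1).")
    # Local import keeps this helper self-contained.
    from statistics import NormalDist

    return NormalDist().inv_cdf(u)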


def two_prop_ztest(
    x1: int,
    n1: int,
    x2: int,
    n2: int,
    two_sided: bool = True,
) -> Tuple[float, float]:
    """Two-sample z-test for proportions with pooled variance.

    Tests H0: p1 = p2 vs H1: p1 != p2 (two-sided by default).

    Parameters
    ----------
    x1, n1, x2, n2 : int
        Success counts and sample sizes for groups 1 and 2.
    two_sided : bool
        If True, compute a two-sided p-value. If False, right-sided (p2 > p1).

    Returns
    -------
    z : float
        z-statistic (signed).
    p_value : float
        Corresponding p-value.
    """
    s1, s2 = summarize_prop(x1, n1), summarize_prop(x2, n2)
    p_pool = (s1.x + s2.x) / (s1.n + s2.n)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / s1.n + 1.0 / s2.n))
    if se == 0.0:
        raise ZeroDivisionError("Standard error is zero; check inputs.")
    z = (s2.p - s1.p) / se
    # standard normal tail via erf
    if two_sided:
        p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    else:
        p = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return float(z), float(p)


def chisq_srm(nA: int, nB: int) -> float:
    """Chi-square SRM (sample ratio mismatch) test for 50/50 split.

    Parameters
    ----------
    nA, nB : int
        Sample sizes for arms A and B.

    Returns
    -------
    float
        p-value of the chi-square(1) goodness-of-fit statistic.
    """
    n = nA + nB
    if n <= 0:
        raise ValueError("Total sample size must be positive.")
    exp_A = exp_B = n / 2.0
    chi2 = (nA - exp_A) ** 2 / exp_A + (nB - exp_B) ** 2 / exp_B
    # For 1 degree of freedom, P(chi2 > c) = erfc(sqrt(c / 2)); erfc avoids the
    # cancellation in 1 - erf(...) when the p-value is very small.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return float(p)



### 1.2 T-test helpers (revenue) and power/MDE


In [None]:

def welch_ttest(
    x: np.ndarray,
    y: np.ndarray,
) -> Tuple[float, float]:
    """Welch t-test for difference in means (two-sided).

    Parameters
    ----------
    x, y : np.ndarray
        Samples from two groups (e.g., revenue per user in control vs treatment).

    Returns
    -------
    t_stat : float
        Welch t-statistic.
    p_value : float
        Two-sided p-value under approximate normality.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    if n1 < 2 or n2 < 2:
        raise ValueError("Need at least 2 observations per group.")
    m1, m2 = float(x.mean()), float(y.mean())
    v1, v2 = float(x.var(ddof=1)), float(y.var(ddof=1))
    se = math.sqrt(v1 / n1 + v2 / n2)
    if se == 0.0:
        raise ZeroDivisionError("Standard error is zero; check variance.")
    t_stat = (m2 - m1) / se
    # Use normal approximation for p-value (large n)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t_stat) / math.sqrt(2.0))))
    return float(t_stat), float(p)


def mde_for_n(
    p_baseline: float,
    n_per_arm: int,
    alpha: float = 0.05,
    power: float = 0.8,
    two_sided: bool = True,
) -> float:
    """Compute absolute MDE on a proportion metric for given n per arm.

    Uses normal approximation for a two-sample proportion test.

    Parameters
    ----------
    p_baseline : float
        Baseline conversion rate (in (0, 1)).
    n_per_arm : int
        Sample size per arm.
    alpha : float
        Significance level.
    power : float
        Desired power (1 - beta).
    two_sided : bool
        If True, uses two-sided z_{alpha/2}.

    Returns
    -------
    float
        Approximate minimal detectable effect (absolute difference in p).
    """
    if not 0.0 < p_baseline < 1.0:
        raise ValueError("p_baseline must be in (0,1).")
    if n_per_arm <= 0:
        raise ValueError("n_per_arm must be positive.")

    z_alpha = abs(invPhi(1.0 - alpha / 2.0)) if two_sided else abs(invPhi(1.0 - alpha))
    z_beta = abs(invPhi(power))
    se = math.sqrt(2.0 * p_baseline * (1.0 - p_baseline))
    return float((z_alpha + z_beta) * se / math.sqrt(n_per_arm))



## 2) Load and inspect the pricing A/B dataset

We expect `pricing_ab.csv` in the same directory as this notebook. If the file name or path differs,
adjust `DATA_PATH` below.


In [None]:

from pathlib import Path

DATA_PATH = Path("pricing_ab.csv")

if not DATA_PATH.exists():
    raise FileNotFoundError(
        "pricing_ab.csv not found in the current directory.\n"
        "Place your pricing A/B dataset here or update DATA_PATH."
    )

df_raw = pd.read_csv(DATA_PATH)
df_raw.head()



### 2.1 Basic cleaning and SRM check

We:

1. Keep only `group` in {`control`, `treatment`}.  
2. Drop duplicate `user_id` rows.  
3. Confirm that prices align with expectations (e.g., one main price per group or a consistent pattern).  
4. Run an SRM check on the group sizes.
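
One caveat on step 2: dropping duplicates with `keep="first"` retains a user who was logged in both arms and assigns them to whichever row happens to come first. A possible extra check, sketched here on a made-up toy frame with a hypothetical helper name, is to find such users and exclude them entirely.

```python
import pandas as pd


def users_in_multiple_groups(df: pd.DataFrame) -> pd.Index:
    """Return user_ids that appear under more than one value of `group`."""
    groups_per_user = df.groupby("user_id")["group"].nunique()
    return groups_per_user[groups_per_user > 1].index


# Toy frame (made-up values): user 2 was logged in both arms.
toy = pd.DataFrame(
    {
        "user_id": [1, 2, 2, 3],
        "group": ["control", "control", "treatment", "treatment"],
    }
)
contaminated = users_in_multiple_groups(toy)
clean = toy[~toy["user_id"].isin(contaminated)]
print(list(contaminated), len(clean))
```

Whether to exclude these users or keep their first exposure is a judgment call; excluding them is the more conservative choice when the count is small.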


In [None]:

df = df_raw.copy()

# Filter to the two main groups
df = df[df["group"].isin(["control", "treatment"])].copy()

# Drop duplicate users, if any
df = df.drop_duplicates(subset=["user_id"], keep="first").reset_index(drop=True)

# Basic sanity checks
print("Groups:", df["group"].value_counts())
print("Price per group (describe):")
print(df.groupby("group")["price"].describe().round(4))

n_control = (df["group"] == "control").sum()
n_treat = (df["group"] == "treatment").sum()
p_srm = chisq_srm(n_control, n_treat)

n_control, n_treat, p_srm



**Reading SRM**

- Very small SRM p-values (e.g., < 0.01) can indicate randomization or logging issues.  
- A healthy A/B test for pricing often targets either a 50/50 split or another pre-defined ratio.
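
The `chisq_srm` helper above assumes a 50/50 plan. For a different pre-defined ratio, the same chi-square(1) goodness-of-fit idea applies with unequal expected counts. The following is a minimal sketch; `srm_pvalue` and its `expected_treat_share` parameter are names introduced here for illustration, and the counts are hypothetical.

```python
import math


def srm_pvalue(n_control: int, n_treat: int, expected_treat_share: float = 0.5) -> float:
    """Chi-square(1) goodness-of-fit p-value for an arbitrary planned split."""
    if not 0.0 < expected_treat_share < 1.0:
        raise ValueError("expected_treat_share must be in (0, 1).")
    n = n_control + n_treat
    if n <= 0:
        raise ValueError("Total sample size must be positive.")
    exp_treat = n * expected_treat_share
    exp_control = n - exp_treat
    chi2 = (n_control - exp_control) ** 2 / exp_control + (n_treat - exp_treat) ** 2 / exp_treat
    # For 1 degree of freedom, P(chi2 > c) = erfc(sqrt(c / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


print(srm_pvalue(9_000, 1_000, expected_treat_share=0.1))  # counts match a 90/10 plan exactly
print(srm_pvalue(5_200, 4_800))  # hypothetical counts against a 50/50 plan
```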



## 3) EDA: price and metrics overview

We explore:

- Price distributions by group.  
- Basic conversion and revenue metrics.  
- Optional segment breakdown if a `segment` column exists.


In [None]:

# Price distribution by group
plt.figure()
for g in ["control", "treatment"]:
    vals = df.loc[df["group"] == g, "price"].values
    plt.hist(vals, bins=40, alpha=0.5, label=g)
plt.title("Price distribution by group")
plt.xlabel("price")
plt.ylabel("count")
plt.legend()
plt.tight_layout()
plt.show()


In [None]:

# Basic metrics by group
metrics = (
    df.assign(
        revenue_per_user=df["revenue"].astype(float),
        conversion=df["purchased"].astype(int),
    )
    .groupby("group")[["conversion", "revenue_per_user"]]
    .agg(["mean", "median", "std", "count"])
)
metrics



If a `segment` column exists, we can quickly inspect conversion by segment and group.


In [None]:

if "segment" in df.columns:
    seg_summary = (
        df.groupby(["segment", "group"])["purchased"]
        .agg(["mean", "count"])
        .rename(columns={"mean": "conversion_rate"})
        .reset_index()
    )
    display(seg_summary.head(20))  # a bare expression inside an if-block is not echoed
else:
    print("No 'segment' column found; skipping segment breakdown.")



## 4) Primary inference: conversion rate

We treat `purchased` as a Bernoulli metric and compare **conversion rates**:

\[
\text{CR}_g = \mathbb{P}(\text{purchased}=1 \mid group=g).
\]

We run a two-proportion z-test and compute a 95% CI for the difference in conversion rates.


In [None]:

conv_by_group = (
    df.groupby("group")["purchased"]
    .agg(["sum", "count", "mean"])
    .rename(columns={"sum": "x", "count": "n", "mean": "rate"})
)
conv_by_group


In [None]:

xA = int(conv_by_group.loc["control", "x"])
nA = int(conv_by_group.loc["control", "n"])
xB = int(conv_by_group.loc["treatment", "x"])
nB = int(conv_by_group.loc["treatment", "n"])

sA = summarize_prop(xA, nA)
sB = summarize_prop(xB, nB)

z_conv, p_conv = two_prop_ztest(xA, nA, xB, nB, two_sided=True)

# Normal-approximation CI for the difference (B - A)
diff_conv = sB.p - sA.p
alpha = 0.05
z_alpha = abs(invPhi(1.0 - alpha / 2.0))
se_diff_conv = math.sqrt(
    (sA.p * (1.0 - sA.p)) / sA.n + (sB.p * (1.0 - sB.p)) / sB.n
)
ci_lo_conv = diff_conv - z_alpha * se_diff_conv
ci_hi_conv = diff_conv + z_alpha * se_diff_conv

pd.DataFrame(
    {
        "arm": ["control", "treatment"],
        "n": [sA.n, sB.n],
        "x": [sA.x, sB.x],
        "rate": [sA.p, sB.p],
        "diff_B_minus_A": [diff_conv, diff_conv],
        "diff_CI95_lo": [ci_lo_conv, ci_lo_conv],
        "diff_CI95_hi": [ci_hi_conv, ci_hi_conv],
        "z_stat": [z_conv, z_conv],
        "p_value": [p_conv, p_conv],
    }
)



**Interpretation (conversion)**

- `diff_B_minus_A` is the absolute lift in conversion when showing the **treatment price**.  
- The 95% CI shows which lifts are compatible with the data.  
- The p-value tests the null hypothesis of **no difference** in conversion rate.
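
Note that the z-test uses a pooled standard error (valid under the null of equal rates), while the confidence interval uses an unpooled one, so the two can disagree slightly near the significance boundary. The worked example below makes the arithmetic explicit; the counts are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical counts, chosen only to illustrate the arithmetic.
x_a, n_a = 500, 10_000  # control conversions / users
x_b, n_b = 450, 10_000  # treatment conversions / users

p_a, p_b = x_a / n_a, x_b / n_b
diff = p_b - p_a

# Pooled standard error: used by the z-test under H0 (p_a == p_b).
p_pool = (x_a + x_b) / (n_a + n_b)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = diff / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Unpooled standard error: used for the confidence interval on the difference.
se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z_crit = NormalDist().inv_cdf(0.975)
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

print(f"diff={diff:.4f}, z={z:.3f}, p={p_value:.4f}, CI95=({ci[0]:.4f}, {ci[1]:.4f})")
```

With these made-up counts the interval straddles zero and the p-value sits above 0.05, so both views agree that the drop in conversion is not statistically clear.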

Next we look at **revenue per user**, which is the more direct business objective in pricing tests.



## 5) Revenue per user: Welch t-test

Our key business metric is often **revenue per eligible user** (RPU), defined as:

\[
\text{RPU}_g = \mathbb{E}[\text{revenue} \mid group = g].
\]

We use a **Welch t-test** on revenue per user to compare groups.


In [None]:

df["revenue"] = df["revenue"].astype(float)
revenue_control = df.loc[df["group"] == "control", "revenue"].to_numpy()
revenue_treat = df.loc[df["group"] == "treatment", "revenue"].to_numpy()

t_rev, p_rev = welch_ttest(revenue_control, revenue_treat)

mean_rev_A = float(revenue_control.mean())
mean_rev_B = float(revenue_treat.mean())

# Approximate CI using normal approximation around mean difference
diff_rev = mean_rev_B - mean_rev_A
varA = float(revenue_control.var(ddof=1))
varB = float(revenue_treat.var(ddof=1))
se_diff_rev = math.sqrt(varA / revenue_control.size + varB / revenue_treat.size)
ci_lo_rev = diff_rev - z_alpha * se_diff_rev
ci_hi_rev = diff_rev + z_alpha * se_diff_rev

pd.DataFrame(
    {
        "arm": ["control", "treatment"],
        "mean_revenue": [mean_rev_A, mean_rev_B],
        "diff_B_minus_A": [diff_rev, diff_rev],
        "diff_CI95_lo": [ci_lo_rev, ci_lo_rev],
        "diff_CI95_hi": [ci_hi_rev, ci_hi_rev],
        "t_stat": [t_rev, t_rev],
        "p_value": [p_rev, p_rev],
    }
)



**Interpretation (revenue)**

- If conversion goes down but revenue per user goes up, a **higher price** can still be beneficial.  
- The decision should focus on RPU (and downstream margin), not only on conversion.
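
Revenue per user is usually zero-inflated and right-skewed, so the normal approximation behind the interval above is worth cross-checking. One common option is a percentile bootstrap on the difference in means. The sketch below uses synthetic data and a hypothetical helper name, `bootstrap_diff_means_ci`; it is a robustness check under those assumptions, not a replacement for the Welch test.

```python
import numpy as np


def bootstrap_diff_means_ci(
    control: np.ndarray,
    treatment: np.ndarray,
    n_boot: int = 2_000,
    level: float = 0.95,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap CI for mean(treatment) - mean(control)."""
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each arm independently, with replacement.
        c = rng.choice(control, size=control.size, replace=True)
        t = rng.choice(treatment, size=treatment.size, replace=True)
        diffs[b] = t.mean() - c.mean()
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(diffs, [tail, 1.0 - tail])
    return float(lo), float(hi)


# Toy zero-inflated revenue (made-up rates and prices).
rng = np.random.default_rng(1)
rev_control = rng.binomial(1, 0.05, size=5_000) * 9.99
rev_treat = rng.binomial(1, 0.06, size=5_000) * 7.99
print(bootstrap_diff_means_ci(rev_control, rev_treat))
```

If the bootstrap interval and the normal-approximation interval differ noticeably, a few very large orders are probably driving the result and deserve a closer look.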

Next, we connect price differences with demand differences via a simple **elasticity** calculation.



## 6) Price elasticity (coarse estimate)

A very rough **price elasticity of demand** can be approximated using the two price points and conversion rates:

\[
\epsilon \approx \frac{\Delta Q / Q}{\Delta P / P},
\]

where:

- \(Q\) is demand proxy (e.g., conversion rate),  
- \(P\) is price.

We use control as baseline and treatment as the alternative price.


In [None]:

price_control = float(df.loc[df["group"] == "control", "price"].mean())
price_treat = float(df.loc[df["group"] == "treatment", "price"].mean())

Q_control = float(conv_by_group.loc["control", "rate"])
Q_treat = float(conv_by_group.loc["treatment", "rate"])

dP_over_P = (price_treat - price_control) / price_control if price_control != 0 else float("nan")
dQ_over_Q = (Q_treat - Q_control) / Q_control if Q_control != 0 else float("nan")

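# NaN never compares equal to itself, so test for it with math.isnan rather than `in`.
elasticity = (
    dQ_over_Q / dP_over_P
    if (dP_over_P != 0.0 and not math.isnan(dP_over_P))
    else float("nan")
)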

{
    "price_control": price_control,
    "price_treat": price_treat,
    "conv_control": Q_control,
    "conv_treat": Q_treat,
    "dP_over_P": dP_over_P,
    "dQ_over_Q": dQ_over_Q,
    "elasticity": elasticity,
}



**Reading elasticity**

- A negative elasticity (common) means that increasing price reduces demand.  
- Values with larger absolute magnitude indicate **more sensitive demand**.  
- This is a **very coarse** estimate; a full pricing project often uses more price points and regression models.
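
A quick worked example helps connect elasticity to revenue. The numbers below are hypothetical, the helper name `baseline_elasticity` is introduced here, and revenue per visitor is computed as price times conversion, which assumes one unit sold per purchase.

```python
import math


def baseline_elasticity(p_old: float, p_new: float, q_old: float, q_new: float) -> float:
    """Elasticity relative to the (old price, old demand) baseline; NaN if undefined."""
    if p_old == 0.0 or q_old == 0.0 or p_new == p_old:
        return float("nan")
    return ((q_new - q_old) / q_old) / ((p_new - p_old) / p_old)


# Hypothetical numbers: price cut 9.99 -> 7.99, conversion rises 5% -> 6%.
eps = baseline_elasticity(9.99, 7.99, 0.05, 0.06)
rpv_old = 9.99 * 0.05  # revenue per visitor at the old price
rpv_new = 7.99 * 0.06  # revenue per visitor at the new price
print(f"elasticity={eps:.3f}, RPV old={rpv_old:.4f}, RPV new={rpv_new:.4f}")
```

In this made-up case the elasticity is close to -1, yet revenue per visitor still falls, because a 20% price cut paired with a 20% demand gain multiplies to less than 1. This is one reason the elasticity figure should be read alongside RPU rather than instead of it.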

Next we check whether the experiment was capable of detecting a meaningful conversion change (MDE).



## 7) Power and MDE for conversion

Using the baseline conversion rate and the smaller of the two group sizes, we compute the **MDE** at 80% power and 5% alpha.


In [None]:

p_baseline = sA.p  # control conversion
n_per_arm = min(sA.n, sB.n)
mde_80 = mde_for_n(p_baseline, n_per_arm, alpha=0.05, power=0.8, two_sided=True)

{
    "baseline_conversion_control": p_baseline,
    "n_per_arm": n_per_arm,
    "MDE_abs_at_80pct_power": mde_80,
}



If your observed conversion lift is much smaller (in absolute value) than this MDE, the test might be
**underpowered** to detect such subtle effects. In pricing, this is common: small price changes can have
small but important effects that require large sample sizes to pick up.
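
The same normal approximation can be inverted for planning: given a target absolute lift, solve for the users needed per arm. The sketch below mirrors the formula in `mde_for_n`; the function name `n_per_arm_for_mde` and the planning inputs are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm_for_mde(
    p_baseline: float,
    mde_abs: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Approximate users per arm needed to detect an absolute lift `mde_abs` (two-sided)."""
    if not 0.0 < p_baseline < 1.0:
        raise ValueError("p_baseline must be in (0, 1).")
    if mde_abs <= 0.0:
        raise ValueError("mde_abs must be positive.")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance_term = 2.0 * p_baseline * (1.0 - p_baseline)
    return math.ceil(variance_term * (z_alpha + z_beta) ** 2 / mde_abs**2)


# Hypothetical planning inputs: 5% baseline, 0.5 percentage-point target lift.
print(n_per_arm_for_mde(0.05, 0.005))
```

Because the required sample size scales with the inverse square of the target lift, halving the MDE roughly quadruples the users needed.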



## 8) Executive summary template

When you run this notebook on your real `pricing_ab.csv`, you can use the following structure for a decision memo:

1. **Sanity checks**
   - SRM p-value and any obvious data-quality issues.

2. **Conversion and revenue results**
   - Conversion rate per group and absolute lift (with 95% CI and p-value).  
   - Revenue per user per group and absolute lift (with 95% CI and p-value).  
   - Simple elasticity estimate and whether it is plausible from a business standpoint.

3. **Risk and upside**
   - If price increases: trade-off between lower conversion and higher RPU.  
   - If price decreases: higher conversion vs lower per-order revenue, and overall RPU impact.

4. **Power / MDE context**
   - Whether the test could realistically detect the effect sizes you care about.

5. **Decision and rollout**
   - Ship / hold / roll back and why.  
   - If ship: ramp strategy (e.g. 20% → 50% → 100%) and monitoring.  
   - If hold: whether you need more data, more segments, or a different price grid.

This keeps the pricing experiment focused on **business outcomes** rather than only statistical significance.



## 8a) CUPED with a pre-period metric (if available)

In pricing experiments it is common to have a **pre-period** metric per user, for example:

- `pre_revenue`: revenue for this user in the 30 days before the test, or  
- `past_purchases`: number of past purchases before the test.

If this covariate is predictive of **revenue during the experiment**, we can use **CUPED** to reduce
variance on our revenue per user metric.

The CUPED transform is:

\[
Y_i^{*} = Y_i - \theta (X_i - \bar X), \quad 
\theta = \frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}(X)},
\]

where:
- \(Y_i\) is the outcome (here: revenue per user during the test),  
- \(X_i\) is a pre-period covariate,  
- \(\bar X\) is the sample mean of \(X\).

We then compare groups on \(Y^{*}\) instead of \(Y\). When \(X\) is correlated with \(Y\), the variance
of group means (and their difference) shrinks.


In [None]:

import numpy as np

def cuped_adjust(y: np.ndarray, x: np.ndarray) -> tuple[np.ndarray, float]:
    """Apply CUPED adjustment Y* = Y - θ (X - mean(X)).

    Parameters
    ----------
    y : np.ndarray
        Outcome vector (shape (n,)).
    x : np.ndarray
        Covariate vector (shape (n,)), ideally pre-experiment.

    Returns
    -------
    y_adj : np.ndarray
        CUPED-adjusted outcome.
    theta : float
        Estimated CUPED coefficient Cov(Y,X)/Var(X).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if y.shape != x.shape:
        raise ValueError("y and x must have the same shape.")
    if y.ndim != 1:
        raise ValueError("y and x must be 1D arrays.")
    vx = float(np.var(x, ddof=1))
    if vx == 0.0:
        # No variation in X: no adjustment
        return y.copy(), 0.0
    cov_yx = float(np.cov(y, x, ddof=1)[0, 1])
    theta = cov_yx / vx
    x_centered = x - float(np.mean(x))
    y_adj = y - theta * x_centered
    return y_adj, theta


# Try to find a reasonable pre-period covariate
pre_candidates: list[str] = [
    c for c in ["pre_revenue", "past_purchases"] if c in df.columns
]
pre_candidates


In [None]:

if not pre_candidates:
    print(
        "No pre-period covariate found (expected 'pre_revenue' or 'past_purchases').\n"
        "To use CUPED here, add one of these columns to pricing_ab.csv and rerun."
    )
else:
    pre_col = pre_candidates[0]
    print(f"Using pre-period covariate: {pre_col!r}")

    y = df["revenue"].to_numpy(dtype=float)
    x = df[pre_col].to_numpy(dtype=float)

    y_adj, theta_hat = cuped_adjust(y, x)
    df["revenue_cuped"] = y_adj

    print(f"theta_hat = {theta_hat:.4f}")  # a bare expression inside a block is not echoed


In [None]:

if "revenue_cuped" in df.columns:
    # Compare group-level stats before and after CUPED
    cuped_summary = (
        df.assign(revenue_raw=df["revenue"].astype(float))
          .groupby("group")[["revenue_raw", "revenue_cuped"]]
          .agg(["mean", "var", "count"])
    )
    display(cuped_summary)

    # A/B comparison on CUPED-adjusted revenue (difference in means, Welch t-test)
    rev_A_c = df.loc[df["group"] == "control", "revenue_cuped"].to_numpy(dtype=float)
    rev_B_c = df.loc[df["group"] == "treatment", "revenue_cuped"].to_numpy(dtype=float)

    t_cuped, p_cuped = welch_ttest(rev_A_c, rev_B_c)

    mean_A_c = float(rev_A_c.mean())
    mean_B_c = float(rev_B_c.mean())
    diff_c = mean_B_c - mean_A_c

    varA_c = float(rev_A_c.var(ddof=1))
    varB_c = float(rev_B_c.var(ddof=1))
    se_diff_c = math.sqrt(varA_c / rev_A_c.size + varB_c / rev_B_c.size)
    z_alpha_local = abs(invPhi(1.0 - 0.05 / 2.0))
    ci_lo_c = diff_c - z_alpha_local * se_diff_c
    ci_hi_c = diff_c + z_alpha_local * se_diff_c

    # Inside an if-block the dict would not be echoed, so display it explicitly.
    display(
        {
            "theta_hat": theta_hat,
            "diff_raw_revenue": float(
                df.loc[df["group"] == "treatment", "revenue"].mean()
                - df.loc[df["group"] == "control", "revenue"].mean()
            ),
            "diff_cuped_revenue": diff_c,
            "cuped_CI95": (ci_lo_c, ci_hi_c),
            "t_cuped": t_cuped,
            "p_cuped": p_cuped,
        }
    )
else:
    print("No 'revenue_cuped' column — CUPED not applied.")



**Interpretation (CUPED)**

- When a true pre-period metric is available and correlated with experiment revenue, CUPED reduces
  the variance of revenue per user by a factor of about \(1 - \rho^2\), where \(\rho\) is the correlation
  between \(X\) and \(Y\); the required sample size shrinks by roughly the same factor.  
- In this notebook we only apply CUPED if a column like `pre_revenue` or `past_purchases` exists.
  You can plug in any numeric pre-period feature you trust.  
- The logic is identical for other metrics (e.g., conversion) — simply change `y` accordingly.
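
A standard property of the CUPED transform is that the sample variance of \(Y^{*}\) equals \((1 - \rho^2)\) times the sample variance of \(Y\), where \(\rho\) is the sample correlation between \(X\) and \(Y\), while the mean is unchanged. The sketch below checks this on synthetic data; the data-generating choices are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Synthetic data (assumption): outcome y depends linearly on a pre-period covariate x.
x = rng.gamma(shape=2.0, scale=5.0, size=n)
y = 0.6 * x + rng.normal(scale=4.0, size=n)

# CUPED coefficient and adjusted outcome.
theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
y_adj = y - theta * (x - x.mean())

rho = np.corrcoef(y, x)[0, 1]
var_ratio = np.var(y_adj, ddof=1) / np.var(y, ddof=1)
print(f"rho^2={rho**2:.3f}, var ratio={var_ratio:.3f}, 1 - rho^2={1 - rho**2:.3f}")
```

The printed variance ratio and \(1 - \rho^2\) should agree to floating-point precision, which gives a quick way to judge in advance whether a candidate covariate is worth using.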



## 9) Bayesian pricing view — posterior over RPU difference

For pricing, the key quantity is often **revenue per user (RPU)** in each arm:

\[
\mu_A = \mathbb{E}[\text{revenue} \mid \text{control}], \quad
\mu_B = \mathbb{E}[\text{revenue} \mid \text{treatment}].
\]

We want a distribution for the **difference**:

\[
\Delta_\mu = \mu_B - \mu_A.
\]

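Assuming the sample mean of revenue per user is approximately **normal** in each arm (reasonable for
large samples by the central limit theorem, even though per-user revenue itself is zero-inflated and
skewed), we can use a simple Bayesian approximation: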

- Prior: vague / flat for each mean \(\mu_g\).  
- Likelihood: Normal with unknown variance, approximated using the sample variance.  
- Posterior (approximate):

\[
\mu_g \mid \text{data} \approx \mathcal{N}\left(\bar y_g,\ \frac{s_g^2}{n_g}\right),
\]

where \(\bar y_g\) and \(s_g^2\) are the sample mean and variance for arm \(g\).

We then draw Monte Carlo samples of \(\mu_A, \mu_B\) and derive a posterior distribution for \(\Delta_\mu\).


In [None]:

def sample_posterior_rpu(
    revenue_A: np.ndarray,
    revenue_B: np.ndarray,
    n_draws: int = 50_000,
    seed: int | None = 123,
) -> pd.DataFrame:
    """Approximate posterior for mean revenue per user in each arm.

    Assumes a normal model with unknown variance approximated by the sample variance.

    Parameters
    ----------
    revenue_A, revenue_B : np.ndarray
        Revenue per user in control and treatment arms.
    n_draws : int
        Number of Monte Carlo draws.
    seed : int | None
        Random seed for reproducibility.

    Returns
    -------
    DataFrame
        Columns: mu_A, mu_B, lift (mu_B - mu_A).
    """
    rng = np.random.default_rng(seed)
    revenue_A = np.asarray(revenue_A, dtype=float)
    revenue_B = np.asarray(revenue_B, dtype=float)

    nA = revenue_A.size
    nB = revenue_B.size
    if nA < 2 or nB < 2:
        raise ValueError("Need at least 2 observations per group.")

    mean_A = float(revenue_A.mean())
    mean_B = float(revenue_B.mean())
    var_A = float(revenue_A.var(ddof=1))
    var_B = float(revenue_B.var(ddof=1))

    # Posterior approx: Normal(mean_g, var_g / n_g)
    mu_A_draws = rng.normal(loc=mean_A, scale=math.sqrt(var_A / nA), size=n_draws)
    mu_B_draws = rng.normal(loc=mean_B, scale=math.sqrt(var_B / nB), size=n_draws)
    lift = mu_B_draws - mu_A_draws

    return pd.DataFrame({"mu_A": mu_A_draws, "mu_B": mu_B_draws, "lift": lift})


post_rpu = sample_posterior_rpu(revenue_control, revenue_treat, n_draws=60_000, seed=2025)
post_rpu.describe(percentiles=[0.025, 0.5, 0.975])


In [None]:

# Posterior probability that treatment RPU is higher
prob_rpu_better: float = float((post_rpu["lift"] > 0).mean())
ci_lo_rpu, ci_hi_rpu = np.quantile(post_rpu["lift"], [0.025, 0.975])

plt.figure()
plt.hist(post_rpu["lift"], bins=60, density=True)
plt.axvline(0.0, linestyle="--")
plt.title("Posterior distribution of RPU lift (μ_B - μ_A)")
plt.xlabel("RPU lift")
plt.ylabel("density")
plt.tight_layout()
plt.show()

{
    "posterior_prob_RPU_treatment_higher": prob_rpu_better,
    "RPU_lift_cred_int_95": (ci_lo_rpu, ci_hi_rpu),
}



**Reading the Bayesian RPU view**

- `posterior_prob_RPU_treatment_higher` is the probability (under this model) that **treatment pricing**
  yields higher revenue per user than control.  
- The credible interval for the RPU lift tells you which **revenue deltas** are plausible
  given your data and assumptions.  
- For a more detailed demand view, you can combine this with a **Beta–Binomial** posterior on conversion,
  as in the e-commerce notebook.

This Bayesian view is useful when stakeholders want answers like
“**How likely is it that the new price is better, in revenue terms?**”
instead of only a p-value.
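
Under the normal approximation used here, the Monte Carlo step has a closed-form counterpart: the lift \(\mu_B - \mu_A\) is itself normal with mean \(\bar y_B - \bar y_A\) and variance \(s_A^2/n_A + s_B^2/n_B\), so the probability that it is positive is a single normal CDF evaluation. The sketch below uses a hypothetical helper name and made-up summary statistics, and can serve as a sanity check on the sampled value.

```python
import math
from statistics import NormalDist


def prob_lift_positive(
    mean_a: float, var_a: float, n_a: int,
    mean_b: float, var_b: float, n_b: int,
) -> float:
    """P(mu_B - mu_A > 0) when each mean has posterior Normal(mean_g, var_g / n_g)."""
    se = math.sqrt(var_a / n_a + var_b / n_b)
    if se == 0.0:
        raise ZeroDivisionError("Posterior standard deviation is zero.")
    return NormalDist().cdf((mean_b - mean_a) / se)


# Hypothetical summary statistics for two arms.
print(prob_lift_positive(0.50, 4.0, 10_000, 0.55, 4.5, 10_000))
```

The sampled and closed-form probabilities should match up to Monte Carlo noise; a large gap would point to a bug rather than a modeling difference.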



## 10) DR / causal block — non-random discount assignment (simulation)

In many real systems, **pricing** or **discounts** are **not assigned randomly**:

- A model decides who sees a discount based on features (segment, RFM, device, etc.).  
- High-value users may be shown higher prices or fewer coupons.  
- New users may automatically see a discount banner.

In such cases, **naive comparisons** of discounted vs non-discounted users are biased:
there is **confounding** by user features.

Here we simulate a simple scenario where:

- User features \(X\) influence both **discount assignment** \(D\) and **purchase outcome** \(Y\).  
- We then compare:
  - **Naive difference in means**,
  - **IPW** using the true propensities,
  - **DR (doubly-robust)** estimator with logistic outcome models.
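
For reference, the two adjusted estimators can be written as follows, with \(e(X_i)\) the discount propensity and \(\hat m_d(X_i)\) a fitted outcome model for \(\mathbb{E}[Y \mid X, D = d]\). The IPW version shown is the normalized (Hájek) form, which matches the weighted averages computed in the code for this section.

\[
\hat\tau_{\text{IPW}} =
\frac{\sum_i D_i Y_i / e(X_i)}{\sum_i D_i / e(X_i)}
- \frac{\sum_i (1 - D_i) Y_i / (1 - e(X_i))}{\sum_i (1 - D_i) / (1 - e(X_i))},
\]

\[
\hat\tau_{\text{DR}} =
\frac{1}{n} \sum_{i=1}^{n} \left[
\hat m_1(X_i) - \hat m_0(X_i)
+ \frac{D_i \,(Y_i - \hat m_1(X_i))}{e(X_i)}
- \frac{(1 - D_i)\,(Y_i - \hat m_0(X_i))}{1 - e(X_i)}
\right].
\]

The first two terms of \(\hat\tau_{\text{DR}}\) are the outcome-model prediction of the effect; the last two are propensity-weighted residuals that correct it where the outcome model is off.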


In [None]:

from sklearn.linear_model import LogisticRegression

def sigmoid(z: np.ndarray | float) -> np.ndarray | float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    # exp(-log(1 + exp(-z))) avoids overflow in exp(-z) for large negative z.
    return np.exp(-np.logaddexp(0.0, -z))


def simulate_discount_confounded(
    n: int = 20_000,
    seed: int | None = 123,
) -> pd.DataFrame:
    """Simulate discount assignment and purchase outcome with confounding via X.

    Data generating process
    -----------------------
    - Features X ~ N(0, I_d) with d = 4.
    - Discount propensity e(x) = sigmoid(bias_e + w_e^T x).
    - Discount D ~ Bernoulli(e(x)).
    - Outcome logit: logit P(Y=1 | X, D) = bias_y + w_y^T x + tau_true * D.

    Parameters
    ----------
    n : int
        Number of samples.
    seed : int | None
        Random seed.

    Returns
    -------
    DataFrame
        Columns: x1..x4, D, Y, e_true (true propensity of discount).
    """
    rng = np.random.default_rng(seed)
    d = 4
    X = rng.normal(size=(n, d))

    # Propensity model (drives confounding)
    w_e = np.array([0.8, -0.4, 0.6, 0.3])
    bias_e = -0.2
    e = sigmoid(bias_e + X @ w_e)

    D = rng.binomial(1, e)

    # Outcome model
    w_y = np.array([0.4, 0.2, -0.3, 0.1])
    bias_y = -1.0
    tau_true = 0.35  # log-odds lift from being discounted

    lin = bias_y + X @ w_y + tau_true * D
    p = sigmoid(lin)
    Y = rng.binomial(1, p)

    cols = {f"x{j+1}": X[:, j] for j in range(d)}
    df_sim = pd.DataFrame(cols)
    df_sim["D"] = D
    df_sim["Y"] = Y
    df_sim["e_true"] = e
    return df_sim


def naive_ate_discount(df_sim: pd.DataFrame) -> float:
    """Naive ATE: difference in purchase rates E[Y|D=1] - E[Y|D=0], ignoring confounding."""
    m1 = float(df_sim.loc[df_sim["D"] == 1, "Y"].mean())
    m0 = float(df_sim.loc[df_sim["D"] == 0, "Y"].mean())
    return m1 - m0


def ipw_ate_discount(df_sim: pd.DataFrame, e_col: str = "e_true") -> float:
    """IPW ATE estimator using a known or estimated propensity column."""
    if not {"D", "Y", e_col}.issubset(df_sim.columns):
        raise ValueError(f"df_sim must contain D, Y, and {e_col}.")

    d = df_sim["D"].to_numpy()
    y = df_sim["Y"].to_numpy()
    e = np.clip(df_sim[e_col].to_numpy(), 1e-6, 1.0 - 1.0e-6)

    w1 = d / e
    w0 = (1.0 - d) / (1.0 - e)

    p1_hat = (w1 * y).sum() / max(w1.sum(), 1e-12)
    p0_hat = (w0 * y).sum() / max(w0.sum(), 1e-12)

    return float(p1_hat - p0_hat)


def dr_logistic_ate_discount(df_sim: pd.DataFrame, e_col: str = "e_true") -> float:
    """Doubly-robust ATE with logistic outcome models for discounted vs non-discounted.

    - Fit logistic models m1(x) and m0(x) for Y|X,D=1 and Y|X,D=0.
    - Combine them with propensity-adjusted residuals.
    """
    feature_cols = [c for c in df_sim.columns if c.startswith("x")]
    if not feature_cols:
        raise ValueError("df_sim must contain feature columns starting with 'x'.")
    if not {"D", "Y", e_col}.issubset(df_sim.columns):
        raise ValueError("df_sim must contain D, Y, and the propensity column.")

    X = df_sim[feature_cols].to_numpy()
    d = df_sim["D"].to_numpy()
    y = df_sim["Y"].to_numpy()
    e = np.clip(df_sim[e_col].to_numpy(), 1e-6, 1.0 - 1.0e-6)

    # Outcome models for Y|X,D=1 and Y|X,D=0
    X1 = X[d == 1]
    y1 = y[d == 1]
    X0 = X[d == 0]
    y0 = y[d == 0]

    mdl1 = LogisticRegression(max_iter=1000).fit(X1, y1)
    mdl0 = LogisticRegression(max_iter=1000).fit(X0, y0)

    m1_hat = mdl1.predict_proba(X)[:, 1]
    m0_hat = mdl0.predict_proba(X)[:, 1]

    # DR formula
    term = (m1_hat - m0_hat) + (d * (y - m1_hat) / e) - ((1.0 - d) * (y - m0_hat) / (1.0 - e))
    return float(np.mean(term))


# Single simulation demo
df_sim_disc = simulate_discount_confounded(n=30_000, seed=777)
ate_naive = naive_ate_discount(df_sim_disc)
ate_ipw = ipw_ate_discount(df_sim_disc, e_col="e_true")
ate_dr = dr_logistic_ate_discount(df_sim_disc, e_col="e_true")

{"ATE_naive": ate_naive, "ATE_IPW_true": ate_ipw, "ATE_DR_logistic": ate_dr}



### 10.1 Repeated simulation: bias and variance of estimators

We now repeat the confounded discount experiment many times and compare the sampling distributions of:

- naive difference in purchase rates,  
- IPW with true propensities,  
- DR with logistic outcome models.


In [None]:

def compare_discount_estimators(
    R: int = 120,
    n: int = 20_000,
    seed: int | None = 999,
) -> pd.DataFrame:
    """Compare naive, IPW, and DR estimators of ATE over R simulated experiments."""
    rng = np.random.default_rng(seed)
    rows: list[tuple[float, float, float]] = []
    for _ in range(R):
        s = int(rng.integers(0, 10_000_000))
        df_s = simulate_discount_confounded(n=n, seed=s)
        rows.append(
            (
                naive_ate_discount(df_s),
                ipw_ate_discount(df_s, e_col="e_true"),
                dr_logistic_ate_discount(df_s, e_col="e_true"),
            )
        )
    return pd.DataFrame(rows, columns=["naive", "ipw", "dr_logit"])


res_disc = compare_discount_estimators(R=120, n=20_000, seed=2025)
res_disc.agg(["mean", "std", "min", "max"])


In [None]:

ax = res_disc.plot(kind="hist", bins=40, alpha=0.5)
ax.set_title("ATE estimators under confounded discount assignment")
ax.set_xlabel("Estimated ATE on purchase probability")
ax.set_ylabel("Frequency")
plt.tight_layout()
plt.show()



**Key message for real pricing systems**

- If discounts or price variants are **not randomized**, naive group comparisons are usually **biased**.  
- IPW corrects for this confounding when the propensity model is well specified; DR remains consistent
  if either the propensity model or the outcome model is well specified.  
- The same pattern can be extended from **purchase probability** to **revenue** (with suitable
  outcome models for revenue).  
- For production systems, you would:
  - log features \(X\), propensities \(e(X)\), discount/price actually shown,
  - fit outcome models \(m_t(X)\) for conversion or revenue,
  - use DR-style estimators to evaluate counterfactual pricing or discount policies.
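
To judge bias in the simulation, the estimates need a reference value. The parameter `tau_true = 0.35` is a log-odds shift, not the ATE on the probability scale, so the benchmark has to be computed from the data-generating process. The sketch below does this by Monte Carlo, with coefficients copied from `simulate_discount_confounded`; the function name `true_ate_probability_scale` is introduced here for illustration.

```python
import numpy as np


def true_ate_probability_scale(n_mc: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo ATE = E_X[P(Y=1 | X, D=1) - P(Y=1 | X, D=0)] for the simulated DGP."""
    rng = np.random.default_rng(seed)
    # Coefficients copied from simulate_discount_confounded.
    w_y = np.array([0.4, 0.2, -0.3, 0.1])
    bias_y = -1.0
    tau_true = 0.35
    X = rng.normal(size=(n_mc, 4))
    base = bias_y + X @ w_y
    p1 = 1.0 / (1.0 + np.exp(-(base + tau_true)))  # purchase probability with discount
    p0 = 1.0 / (1.0 + np.exp(-base))               # purchase probability without discount
    return float(np.mean(p1 - p0))


ate_true = true_ate_probability_scale()
print(f"True ATE on the probability scale: {ate_true:.4f}")
```

Comparing the mean of each estimator's sampling distribution with this value separates bias from variance: an unbiased estimator should center on it, while the naive difference is expected to sit away from it.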



## 11) Bayesian demand and price elasticity

So far we looked at:

- **Frequentist** conversion and revenue tests,
- A Bayesian view on **RPU**,
- A coarse, point-estimate elasticity.

Here we build a **Bayesian model for demand** (conversion probability) and derive a posterior
distribution for the **price elasticity of demand** itself.

We use:

- A **Beta–Binomial** model for conversion in each arm,
- The observed mean price in each arm,
- A Monte Carlo approximation for the posterior distribution of elasticity.

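Let:

- \(p_A\) and \(p_B\) be the conversion probabilities at prices \(P_A\) and \(P_B\),
- control be the baseline (arm A) and treatment the alternative (arm B).

The elasticity we approximate is anchored at the control arm (a baseline percentage-change
elasticity, not a midpoint "arc" elasticity):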

\[
\epsilon = \frac{(p_B - p_A)/p_A}{(P_B - P_A)/P_A},
\]

which answers: *“For a given relative price change, what is the relative change in demand?”*
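
As a quick numeric check of the formula, the sketch below plugs in made-up numbers: the example prices 9.99 and 7.99 from the introduction and hypothetical conversion rates of 10% and 12%. The helper names `baseline_elasticity` and `midpoint_elasticity` are illustrative; the midpoint variant, which divides each change by the average of the two arms, is included only for comparison.

```python
def baseline_elasticity(p_a: float, p_b: float, price_a: float, price_b: float) -> float:
    """Relative demand change over relative price change, both anchored at arm A."""
    return ((p_b - p_a) / p_a) / ((price_b - price_a) / price_a)


def midpoint_elasticity(p_a: float, p_b: float, price_a: float, price_b: float) -> float:
    """Same ratio, but each change is divided by the average of the two arms."""
    dq = (p_b - p_a) / ((p_a + p_b) / 2.0)
    dp = (price_b - price_a) / ((price_a + price_b) / 2.0)
    return dq / dp


# Hypothetical inputs: 10% conversion at 9.99, 12% conversion at 7.99
eps_base = baseline_elasticity(0.10, 0.12, 9.99, 7.99)
eps_mid = midpoint_elasticity(0.10, 0.12, 9.99, 7.99)
print(round(eps_base, 3), round(eps_mid, 3))
```

With these inputs the baseline-anchored value is close to -1, while the midpoint value is about -0.82, so the choice of anchor matters when the price change is large.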

We will obtain a **posterior distribution of \(\epsilon\)** instead of a single point estimate.


In [None]:

def posterior_demand_elasticity(
    xA: int,
    nA: int,
    xB: int,
    nB: int,
    price_A: float,
    price_B: float,
    alpha0: float = 1.0,
    beta0: float = 1.0,
    n_draws: int = 50_000,
    seed: int | None = 42,
) -> pd.DataFrame:
    """Sample from the posterior of demand and price elasticity.

    We assume a Beta-Binomial model for each arm:

    - p_A ~ Beta(alpha0, beta0),  p_B ~ Beta(alpha0, beta0)
    - Data: xA successes out of nA, xB out of nB.
    - Posterior:
        p_A | data ~ Beta(alpha0 + xA, beta0 + nA - xA)
        p_B | data ~ Beta(alpha0 + xB, beta0 + nB - xB)

    We then define a percentage-change elasticity anchored at control (not a midpoint arc elasticity):

        eps = ((p_B - p_A) / p_A) / ((P_B - P_A) / P_A)

    Parameters
    ----------
    xA, nA, xB, nB : int
        Conversions and sample sizes for control (A) and treatment (B).
    price_A, price_B : float
        Mean prices in control and treatment arms.
    alpha0, beta0 : float
        Hyperparameters of the Beta prior (default: Beta(1, 1), uniform).
    n_draws : int
        Number of Monte Carlo draws from the posterior.
    seed : int | None
        Random seed for reproducibility.

    Returns
    -------
    DataFrame
        Columns: pA, pB, elasticity with n_draws rows.
    """
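    if nA <= 0 or nB <= 0:
        raise ValueError("Sample sizes nA and nB must be positive.")
    if not (0 <= xA <= nA and 0 <= xB <= nB):
        raise ValueError("Conversions must satisfy 0 <= xA <= nA and 0 <= xB <= nB.")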
    if price_A <= 0.0:
        raise ValueError("price_A must be positive.")
    if price_B == price_A:
        raise ValueError("price_B must differ from price_A to define elasticity.")

    rng = np.random.default_rng(seed)

    # Posterior parameters
    alphaA = alpha0 + xA
    betaA = beta0 + nA - xA
    alphaB = alpha0 + xB
    betaB = beta0 + nB - xB

    # Sample posterior conversion probabilities
    pA_draws = rng.beta(alphaA, betaA, size=n_draws)
    pB_draws = rng.beta(alphaB, betaB, size=n_draws)

    # Relative price change (baseline: control)
    dP_over_P = (price_B - price_A) / price_A

    # Elasticity per draw, vectorized. dP_over_P is non-zero (checked above);
    # draws with p_A == 0 would divide by zero, so they are mapped to NaN.
    pA_safe = np.where(pA_draws > 0.0, pA_draws, np.nan)
    elasticity = ((pB_draws - pA_safe) / pA_safe) / dP_over_P

    return pd.DataFrame(
        {
            "pA": pA_draws,
            "pB": pB_draws,
            "elasticity": elasticity,
        }
    )


# Use observed conversion counts and prices from earlier sections
xA_el = xA  # control conversions
nA_el = nA  # control users
xB_el = xB  # treatment conversions
nB_el = nB  # treatment users

price_A = float(df.loc[df["group"] == "control", "price"].mean())
price_B = float(df.loc[df["group"] == "treatment", "price"].mean())

post_elasticity = posterior_demand_elasticity(
    xA=xA_el,
    nA=nA_el,
    xB=xB_el,
    nB=nB_el,
    price_A=price_A,
    price_B=price_B,
    alpha0=1.0,
    beta0=1.0,
    n_draws=60_000,
    seed=2025,
)
post_elasticity.describe(percentiles=[0.025, 0.5, 0.975])


In [None]:

# Drop NaNs, if any (extremely rare corner cases)
post_elasticity_valid = post_elasticity.dropna(subset=["elasticity"]).copy()

# Posterior summaries
el_mean = float(post_elasticity_valid["elasticity"].mean())
el_median = float(post_elasticity_valid["elasticity"].median())
el_lo, el_hi = np.quantile(post_elasticity_valid["elasticity"], [0.025, 0.975])

# Probability of "elastic" demand in absolute value
prob_elastic = float((post_elasticity_valid["elasticity"].abs() > 1.0).mean())

plt.figure()
plt.hist(post_elasticity_valid["elasticity"], bins=80, density=True)
plt.axvline(0.0, linestyle="--")
plt.title("Posterior distribution of price elasticity of demand")
plt.xlabel("elasticity (ε)")
plt.ylabel("density")
plt.tight_layout()
plt.show()

{
    "elasticity_mean": el_mean,
    "elasticity_median": el_median,
    "elasticity_CI95": (el_lo, el_hi),
    "prob_abs_elasticity_gt_1": prob_elastic,
}



**Interpreting Bayesian demand elasticity**

From the posterior distribution of elasticity:

- Values **< 0** are the usual case: higher prices reduce demand.  
- The **mean / median** give central estimates, but you should look at the **95% credible interval**
  to understand uncertainty.  
- `prob_abs_elasticity_gt_1` is the probability that demand is **elastic in magnitude**
  (|ε| > 1), meaning demand changes more than proportionally to price changes.
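
To connect elasticity to the pricing decision, the sketch below compares the probability of elastic demand with the probability that expected revenue per visitor (conversion probability times price) is higher in arm B. It is a simplified illustration under stated assumptions: the counts and prices are hypothetical, the prior is Beta(1, 1), revenue per visitor ignores costs and basket size, and `revenue_gain_summary` is an illustrative name, not a function defined earlier in the notebook.

```python
import numpy as np


def revenue_gain_summary(x_a, n_a, x_b, n_b, price_a, price_b,
                         n_draws=50_000, seed=0):
    """Posterior P(|eps| > 1) and P(p_B * P_B > p_A * P_A) under Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    p_a = rng.beta(1 + x_a, 1 + n_a - x_a, size=n_draws)
    p_b = rng.beta(1 + x_b, 1 + n_b - x_b, size=n_draws)
    # Elasticity anchored at arm A, as in the formula above
    eps = ((p_b - p_a) / p_a) / ((price_b - price_a) / price_a)
    return {
        "prob_abs_elasticity_gt_1": float(np.mean(np.abs(eps) > 1.0)),
        "prob_revenue_gain": float(np.mean(p_b * price_b > p_a * price_a)),
    }


# Hypothetical experiment: 1,000 / 10,000 buyers at 9.99 vs 1,300 / 10,000 at 7.99
summary = revenue_gain_summary(1_000, 10_000, 1_300, 10_000, 9.99, 7.99)
print(summary)
```

The two probabilities need not agree. For a finite price cut with relative change \(\Delta P / P < 0\), revenue per visitor rises only when \(\epsilon < -1 / (1 + \Delta P / P)\), which is a stricter condition than \(|\epsilon| > 1\).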

This is often more informative for pricing decisions than a single elasticity point estimate,
because you can make statements such as the following (numbers are illustrative):

> “Given this experiment and our prior, there is a 70% probability that demand is elastic
>  (|ε| > 1), and the 95% credible interval for ε is [−1.8, −0.4].”

You can adapt the prior (`alpha0`, `beta0`) to incorporate historical data on conversion rates
at similar price points, or extend the model to more than two prices via regression.
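
One possible way to turn historical data into `alpha0` and `beta0` is to pick a historical conversion rate and a prior strength, the number of pseudo-observations the prior should count for. The sketch below assumes that approach. The helper `beta_prior_from_history`, the 5% rate, the strength of 200, and the small-sample counts are all hypothetical.

```python
import math


def beta_prior_from_history(hist_rate: float, prior_strength: float) -> tuple[float, float]:
    """Beta prior with mean hist_rate that weighs as much as prior_strength users."""
    if not 0.0 < hist_rate < 1.0:
        raise ValueError("hist_rate must be strictly between 0 and 1.")
    if prior_strength <= 0.0:
        raise ValueError("prior_strength must be positive.")
    return hist_rate * prior_strength, (1.0 - hist_rate) * prior_strength


# Hypothetical history: about 5% conversion, trusted like 200 observed users
alpha0_hist, beta0_hist = beta_prior_from_history(0.05, 200.0)

# Hypothetical small arm: 12 conversions out of 150 users
x_small, n_small = 12, 150
post_mean_uniform = (1.0 + x_small) / (2.0 + n_small)
post_mean_informed = (alpha0_hist + x_small) / (alpha0_hist + beta0_hist + n_small)
print(round(post_mean_uniform, 4), round(post_mean_informed, 4))
```

The resulting pair can be passed as `alpha0` and `beta0` to `posterior_demand_elasticity`. The informed prior pulls a small-sample estimate toward the historical rate, and its influence fades as the arm's sample size grows relative to the prior strength.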
