<span style="color:#333333; font-size:24px; font-weight:bold"> Compiled by <a href=https://github.com/cyterat style="color:#00b2b7;">cyterat</a></span>

# 1. Estimating a Proportion (e.g., conversion rate, click rate)
__Use case:__ Estimate the proportion of users who click a button, convert, churn, etc. Determine how many users to survey to estimate satisfaction rate within a margin of error.

__Technique:__ Use the normal approximation to the binomial distribution for a sample proportion.

__Formula:__

## $n = \frac{{Z^2 \cdot p \cdot (1 - p)}}{e^2}$

__Where:__

- 𝑛 = required sample size

- 𝑍 = Z-score for desired confidence level (e.g., 1.96 for 95%)

- 𝑝 = estimated proportion (use 0.5 if unknown: it maximizes 𝑝(1 − 𝑝) and therefore gives the largest, most conservative sample size)

- 𝑒 = desired margin of error
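
To make the Z-score bullet concrete, the sketch below derives the two-sided critical value for a few common confidence levels. It uses the standard library's `statistics.NormalDist` rather than `scipy.stats.norm` only so that it runs without extra dependencies; `z_score` is an illustrative helper name, not part of the original notebook.

```python
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> Z = {z_score(level):.3f}")
# 90% confidence -> Z = 1.645
# 95% confidence -> Z = 1.960
# 99% confidence -> Z = 2.576
```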



In [None]:
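import math
from scipy.stats import norm

def sample_size_proportion(p=0.5, margin_error=0.05, confidence_level=0.95):
    # Two-sided critical value, e.g. 1.96 for 95% confidence
    z = norm.ppf(1 - (1 - confidence_level) / 2)
    # Round up: truncating would leave the sample slightly too small
    return math.ceil((z**2 * p * (1 - p)) / margin_error**2)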
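
As a worked example of the proportion formula, the sketch below evaluates the worst case 𝑝 = 0.5 at 95% confidence for three margins of error. The helper name `required_n_proportion` is hypothetical, and `statistics.NormalDist` stands in for `scipy.stats.norm` so the block runs on the standard library alone.

```python
import math
from statistics import NormalDist

def required_n_proportion(p, margin_error, confidence_level=0.95):
    """n = Z^2 * p * (1 - p) / e^2, rounded up to a whole user."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin_error**2)

# Worst case p = 0.5 at 95% confidence for three margins of error
for e in (0.05, 0.03, 0.01):
    print(f"margin +/-{e:.0%}: n = {required_n_proportion(0.5, e)}")
# margin +/-5%: n = 385
# margin +/-3%: n = 1068
# margin +/-1%: n = 9604
```

Because 𝑒 enters the formula squared, halving the margin of error roughly quadruples the required sample.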

# 2. Estimating a Mean (e.g., time on site, revenue)
__Use case:__ Estimate average user session duration, revenue, or retention time with desired precision.

__Technique:__ Use the normal approximation for the sample mean (central limit theorem).

__Formula:__

## $n = \left( \frac{{Z \cdot \sigma}}{e} \right)^2$

__Where:__

- 𝜎 = estimated standard deviation (from past data or pilot)

- 𝑒 = desired margin of error, in the same units as 𝜎


In [None]:
import math
from scipy.stats import norm

def sample_size_mean(std_dev, margin_error=1.0, confidence_level=0.95):
    z = norm.ppf(1 - (1 - confidence_level) / 2)
    return math.ceil((z * std_dev / margin_error) ** 2)  # round up to a whole observation
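
For illustration, suppose a pilot suggests session duration has a standard deviation of about 15 minutes and the mean should be estimated to within ±2 minutes. These numbers are made up for the example, and `required_n_mean` is a hypothetical helper that uses `statistics.NormalDist` in place of `scipy.stats.norm`.

```python
import math
from statistics import NormalDist

def required_n_mean(std_dev, margin_error, confidence_level=0.95):
    """n = (Z * sigma / e)^2, rounded up to a whole observation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil((z * std_dev / margin_error) ** 2)

# Assumed pilot values: sigma = 15 minutes, target margin = 2 minutes
print(required_n_mean(std_dev=15.0, margin_error=2.0))  # 217
# Tightening the margin to 1 minute roughly quadruples the requirement
print(required_n_mean(std_dev=15.0, margin_error=1.0))  # 865
```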

# 3. A/B Testing – Difference in Proportions (Binary Outcomes)
__Use case:__ Detect a difference in conversion rates between two variants. Measure impact of a feature on click-through or signup rate.

__Technique:__ Two-sample Z-test for proportions, assuming equal sample size per group (𝑛 below is the size of each group, not the total).

__Formula:__

## $n = \frac{2 \cdot (Z_{1-\alpha/2} + Z_{1-\beta})^2 \cdot p(1 - p)}{d^2}$

__Where:__

- 𝑝 = pooled baseline rate (or average of p₁ and p₂)

- 𝑑 = minimum detectable effect (𝑝₂−𝑝₁)

- α = Type I error (e.g., 0.05)

- β = Type II error (e.g., 0.2 for 80% power)


In [None]:
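import math
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def sample_size_ab_test(p1, p2, alpha=0.05, power=0.8):
    # Cohen's h (arcsine-transformed difference); abs() makes the order of p1 and p2 irrelevant
    effect_size = abs(proportion_effectsize(p1, p2))
    analysis = NormalIndPower()
    # Required sample size per group; differs slightly from the pooled-variance formula above
    n = analysis.solve_power(effect_size=effect_size, power=power, alpha=alpha, alternative='two-sided')
    return math.ceil(n)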
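
The closed-form formula in this section can also be evaluated directly, which is a useful sanity check on the library result. The sketch below is a minimal version under the stated assumptions (equal group sizes, 𝑝 taken as the average of 𝑝₁ and 𝑝₂); `n_per_group_proportions` is a hypothetical name, and the 10% versus 12% rates are example inputs only.

```python
import math
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.8):
    """n = 2 * (Z_{1-alpha/2} + Z_{1-beta})^2 * p * (1 - p) / d^2 per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided Type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    p = (p1 + p2) / 2                              # average of the two rates
    d = p2 - p1                                    # minimum detectable effect
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / d**2)

# Example: detect a lift from 10% to 12% conversion
print(n_per_group_proportions(0.10, 0.12))  # 3843
```

A power calculation based on Cohen's h (the arcsine transform) will usually land close to this value but not exactly on it, because the two approaches approximate the variance differently.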


# 4. A/B Testing – Difference in Means (e.g., revenue, time)
__Use case:__ Detect difference in average purchase value, engagement time, or retention days.

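__Technique:__ Two-sample t-test for means, assuming equal sample size per group. The power calculation uses Cohen’s d with a pooled standard deviation, so it assumes roughly equal variances; treat the result as an approximation if you plan to analyze with Welch’s t-test.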

__Effect size definition (Cohen’s d):__

## $d = \frac{\mu_1 - \mu_2}{s_{\text{pooled}}}$

In [None]:
import math
from statsmodels.stats.power import TTestIndPower

def sample_size_diff_means(effect_size, alpha=0.05, power=0.8):
    analysis = TTestIndPower()
    # Required sample size per group, rounded up
    n = analysis.solve_power(effect_size=effect_size, power=power, alpha=alpha, alternative='two-sided')
    return math.ceil(n)
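
The effect size passed to the function is Cohen’s d, which usually has to be derived from raw means and standard deviations first. The sketch below shows one way to do that, assuming equal group sizes for the pooled SD, and then applies the normal approximation n ≈ 2(Z₁₋α/₂ + Z₁₋β)² / d² as a quick cross-check. Both helper names and the input numbers are hypothetical, and the normal approximation is a swapped-in shortcut, not the t-based calculation itself.

```python
import math
from statistics import NormalDist

def cohens_d(mean1, mean2, sd1, sd2):
    """Cohen's d with a pooled SD, assuming equal group sizes."""
    s_pooled = math.sqrt((sd1**2 + sd2**2) / 2)
    return (mean1 - mean2) / s_pooled

def n_per_group_means_approx(effect_size, alpha=0.05, power=0.8):
    """Normal approximation: n = 2 * (Z_{1-alpha/2} + Z_{1-beta})^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size**2)

# Assumed example: average order value 55 vs 50, SD of 20 in both groups
d = cohens_d(mean1=55.0, mean2=50.0, sd1=20.0, sd2=20.0)
print(d)                            # 0.25
print(n_per_group_means_approx(d))  # 252
```

The t-based calculation tends to return a slightly larger number, since it accounts for estimating the variance from the sample.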

# 5. Finite Population Correction (Optional Adjustment)
__Use case:__ Your population is relatively small (e.g., sampling from 10,000 users), so the sample is a noticeable fraction of it. The correction reduces the required sample size while keeping the same margin of error.

__Formula:__

## $n_{\text{adj}} = \frac{n}{1 + \left(\frac{n - 1}{N}\right)}$

__Where:__

- 𝑛 = sample size before adjustment (from one of the formulas above)

- 𝑁 = population size

In [None]:
import math

def finite_population_correction(n, population_size):
    return math.ceil(n / (1 + (n - 1) / population_size))  # round up to stay at or above target
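
To show how much the correction matters, the sketch below applies it to an unadjusted sample of 385 (the worst-case proportion estimate at ±5% and 95% confidence) for three population sizes. The function name `fpc` and the population sizes are illustrative choices.

```python
import math

def fpc(n, population_size):
    """n_adj = n / (1 + (n - 1) / N), rounded up."""
    return math.ceil(n / (1 + (n - 1) / population_size))

for N in (1_000, 10_000, 1_000_000):
    print(f"N = {N:,}: adjusted n = {fpc(385, N)}")
# N = 1,000: adjusted n = 279
# N = 10,000: adjusted n = 371
# N = 1,000,000: adjusted n = 385
```

The adjustment is substantial when the sample is a large share of the population and fades to nothing as 𝑁 grows.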