## A/B Testing

A playground for experimenting with A/B testing statistics.

Resources:
- http://vanbelle.org/chapters/webchapter2.pdf
- https://towardsdatascience.com/required-sample-size-for-a-b-testing-6f6608dd330a
- https://towardsdatascience.com/understanding-power-analysis-in-ab-testing-14808e8a1554

```python
import scipy.stats
```

### Designing a Test

When designing an A/B test, the following factors are considered:

- Null Hypothesis ($H_0$): The design change has no effect on the test outcome
- Type I Error ($\alpha$): Probability of rejecting the null hypothesis when it should *not* be rejected (false positive)
- Type II Error ($\beta$): Probability of not rejecting the null hypothesis when it should be rejected (false negative)
- Power = 1 - $\beta$: Probability of rejecting the null hypothesis when it should be rejected (true positive)
- $\sigma^2_0$ and $\sigma^2_1$: Variances under the null and alternative hypotheses (can be the same)
- $\mu_0$ and $\mu_1$: Means under the null and alternative hypotheses
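To make $\alpha$ and power concrete, here is a small Monte Carlo sketch: it simulates many experiments and counts how often a one-tailed z-test rejects $H_0$. It assumes normally distributed observations with known standard deviations, 35 samples per group, and the same illustrative values (0.15, 0.18, 0.05) used in the example calculation later in this notebook. `rejection_rate` is a hypothetical helper written for this illustration, not part of any library.

```python
import numpy as np
from scipy import stats


def rejection_rate(mu0, mu1, sigma0, sigma1, n, alpha, n_sims=20_000, seed=0):
    """Fraction of simulated experiments in which a one-tailed z-test rejects H0.

    Hypothetical helper: assumes normal data with known sigma0 and sigma1.
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated experiment with n observations per group
    control = rng.normal(mu0, sigma0, size=(n_sims, n))
    treatment = rng.normal(mu1, sigma1, size=(n_sims, n))
    # z statistic for the difference in group means (variances assumed known)
    se = np.sqrt(sigma0**2 / n + sigma1**2 / n)
    z = (treatment.mean(axis=1) - control.mean(axis=1)) / se
    # Reject H0 when z exceeds the one-tailed critical value
    z_crit = stats.norm.ppf(1 - alpha)
    return np.mean(z > z_crit)


# Under H0 (no true difference) the rejection rate estimates alpha (Type I error rate)
type_i = rejection_rate(0.15, 0.15, 0.05, 0.05, n=35, alpha=0.05)
# Under H1 (true difference of 0.03) the rejection rate estimates power = 1 - beta
power = rejection_rate(0.15, 0.18, 0.05, 0.05, n=35, alpha=0.05)
print(f"Estimated Type I error rate: {type_i:.3f}")
print(f"Estimated power: {power:.3f}")
```

The first estimate should land near 0.05 and the second near 0.80; with a finite number of simulations both carry some sampling noise.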

For e-commerce applications, Type I Error means deploying a feature change when it actually has no positive effect on conversion rate. Type II Error means *not* deploying a feature change when it actually does have a positive effect on conversion.

One of the first questions in designing the experiment is "How large should the sample size be?" To answer it, the following factors must be estimated:
1. Minimum Detectable Effect Size
2. Baseline Conversion Rate
3. Significance ($\alpha$)
4. Power
5. Sample standard deviations
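Under a normal approximation, these factors combine into the per-group sample size below. This is the formula that `min_sample_size` implements, where $z_p$ denotes the $p$-quantile of the standard normal distribution, $\mu_0$ is the baseline conversion rate, and $\mu_1 - \mu_0$ is the minimum detectable effect. For a two-tailed test, $z_{1-\alpha}$ is replaced by $z_{1-\alpha/2}$.

$$
n = \frac{(\sigma_0^2 + \sigma_1^2)\,(z_{1-\alpha} + z_{1-\beta})^2}{(\mu_1 - \mu_0)^2}
$$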

### Example Calculation

Let's try an example from one of the towardsdatascience articles listed above:
- Assume the mean daily conversion rate for the past 6 months is 0.15.
- With the new feature, we expect to see a 3-percentage-point (absolute) increase in conversion rate. Thus, the conversion rate for the treatment group will be 0.18.
- Assume the sample standard deviation for the two groups is 0.05.
- Assume alpha is 0.05 and power is 0.80 (typical values), and use a one-tailed test.
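As a rough hand check, plug these values into the normal-approximation sample-size formula $n = (\sigma_0^2 + \sigma_1^2)(z_{1-\alpha} + z_{1-\beta})^2 / (\mu_1 - \mu_0)^2$. For a one-tailed test, $z_{0.95} \approx 1.645$ and $z_{0.80} \approx 0.842$:

$$
n = \frac{(0.05^2 + 0.05^2)\,(1.645 + 0.842)^2}{(0.18 - 0.15)^2} = \frac{0.005 \times 6.185}{0.0009} \approx 34.4
$$

The rounded quantiles make this differ slightly from the value computed with exact quantiles.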

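```python
def min_sample_size(alpha, beta, one_tailed, sigma0, sigma1, mu0, mu1):
    """Return the minimum sample size per group for an A/B test.

    Uses the normal approximation
    n = (sigma0^2 + sigma1^2) * (z_alpha + z_beta)^2 / (mu1 - mu0)^2.
    The result is not rounded; round it up to get a whole number of samples.
    """
    if one_tailed:
        # z-critical value, one-tailed
        z_alpha = scipy.stats.norm.ppf(1 - alpha)
    else:
        # z-critical value, two-tailed
        z_alpha = scipy.stats.norm.ppf(1 - alpha / 2)

    # z-value corresponding to the desired power (1 - beta)
    z_beta = scipy.stats.norm.ppf(1 - beta)

    var = sigma0**2 + sigma1**2
    delta = abs(mu1 - mu0)  # minimum detectable effect
    n = (var * (z_alpha + z_beta)**2) / delta**2
    return n
```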
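The two branches of `min_sample_size` differ only in the critical value. This short sketch compares the one- and two-tailed critical values for the same $\alpha$. Because $n$ scales with $(z_\alpha + z_\beta)^2$, it also shows how many more samples a two-tailed test needs. That ratio depends only on $\alpha$ and power, not on the variances or the effect size.

```python
from scipy import stats

alpha = 0.05
power = 0.8

# One-tailed: all of alpha sits in the upper tail
z_one = stats.norm.ppf(1 - alpha)
# Two-tailed: alpha is split evenly between both tails
z_two = stats.norm.ppf(1 - alpha / 2)
# z-value for the desired power, 1 - beta
z_beta = stats.norm.ppf(power)

# n is proportional to (z_alpha + z_beta)^2, so the ratio of the two
# sample sizes depends only on alpha and power
ratio = ((z_two + z_beta) / (z_one + z_beta)) ** 2
print(f"one-tailed z = {z_one:.3f}, two-tailed z = {z_two:.3f}")
print(f"two-tailed / one-tailed sample size = {ratio:.3f}")
```

With $\alpha = 0.05$ and power 0.80, a two-tailed test needs roughly 27% more samples per group than a one-tailed test.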

```python
# Inputs
alpha = 0.05
power = 0.8
beta = 1 - power
one_tailed = True
sigma0 = 0.05
sigma1 = 0.05
mu0 = 0.15
mu1 = 0.18

n = min_sample_size(alpha, beta, one_tailed, sigma0, sigma1, mu0, mu1)
print(f'Min sample size: {n:.2f}')
```

```
Min sample size: 34.35
```

Rounding up, this is 35 samples per group.
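Since a fractional sample size has to be rounded up (here to 35 per group), it is worth checking the power actually achieved at that size. This minimal sketch inverts the same normal-approximation formula to get power from $n$. `achieved_power` is a hypothetical helper. For two-tailed tests it ignores the negligible chance of rejecting in the wrong tail.

```python
import math
from scipy import stats


def achieved_power(n, alpha, one_tailed, sigma0, sigma1, mu0, mu1):
    """Approximate power of a z-test with n samples per group (normal approximation)."""
    if one_tailed:
        z_alpha = stats.norm.ppf(1 - alpha)
    else:
        z_alpha = stats.norm.ppf(1 - alpha / 2)
    # Standard error of the difference in group means
    se = math.sqrt((sigma0**2 + sigma1**2) / n)
    delta = abs(mu1 - mu0)
    # Solve n = var * (z_alpha + z_beta)^2 / delta^2 for z_beta, then convert to a probability
    z_beta = delta / se - z_alpha
    return stats.norm.cdf(z_beta)


n_rounded = 35  # 34.35 rounded up
p = achieved_power(n_rounded, 0.05, True, 0.05, 0.05, 0.15, 0.18)
print(f"n = {n_rounded}, power = {p:.3f}")
```

Because power increases with $n$, rounding up yields a power slightly above the 0.80 target.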
