# A/B/n Testing Sample Size Calculator
The scripts below show different ways to calculate the minimum sample size in Python, and when to use each one.

#### **1**. ***A/B/n Test Size for Discrete Variables***
Use only for discrete variables (binary outcomes measured as a rate), such as:
- **UI change**: Test different layouts of a landing page to see which one increases user engagement, measured as the add-to-cart click-through rate.
- **Email marketing**: Test different email subject lines or content to determine which version maximizes the email open rate or webinar sign-ups.
- **Feature rollout**: Measure the impact of different user interface changes on impulse purchases.
- **Pricing changes**: Evaluate the effect of different pricing structures on whether the sale is completed.


The function returns the sample size required when you have a control group and one or more test groups. You set the allocation ratio of the control group and of each test group. A custom allocation lets you limit how much a failed test affects the expected results of the business.

If you want the control and test groups to receive the same traffic, give each the same allocation ratio (for example, 50% and 50%).
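
As a quick illustration of the equal-split case, the sketch below sizes a single control-versus-test comparison with a 50/50 allocation. It is a minimal sketch, not part of the original scripts: `equal_split_sample_size` is a hypothetical helper, and it uses the standard library's `NormalDist().inv_cdf` in place of `scipy.stats.norm.ppf` so that it runs without extra dependencies. It also rounds up, which is an assumption of this sketch.

```python
from math import ceil
from statistics import NormalDist


def equal_split_sample_size(baseline_rate, minimum_effect, alpha=0.05, power=0.8):
    """Hypothetical helper: per-group size for one control and one test group, split 50/50."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test, single comparison
    z_beta = NormalDist().inv_cdf(power)
    variant_rate = baseline_rate + minimum_effect   # minimum_effect is an absolute change
    # Sum of the binomial variances of the two groups
    variance_sum = baseline_rate * (1 - baseline_rate) + variant_rate * (1 - variant_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / minimum_effect ** 2)


per_group = equal_split_sample_size(baseline_rate=0.1, minimum_effect=0.02)
print(f"Per-group sample size with a 50/50 split: {per_group}")
```

With only one test group there is a single comparison, so no multiple-comparison correction is applied here.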

In [1]:
# Import library
from scipy.stats import norm

# Create a function to calculate the sample size of tests with discrete values
def calculate_sample_size_abn_test_discrete(baseline_rate, minimum_effect, allocation_ratios, alpha=0.05, power=0.8):

    """
    Calculate sample size for an A/B/n test with discrete variables and custom allocation ratios.

    Parameters:
        - baseline_rate (float): Known/estimated baseline value of the metric being measured (e.g., the conversion rate of the control group).
        - minimum_effect (float): The smallest absolute change in baseline_rate that you want the test to detect (e.g., 0.02 for a 2 percentage-point increase in click-through rate).
        - allocation_ratios (list): List of allocation ratios for each group (e.g., [0.75, 0.10, 0.15] for 75% control, 10% test_1, 15% test_2).
        - alpha (float): Original significance level (default 0.05).
        - power (float): Percent of the time the minimum effect size will be detected, assuming it exists (default 0.8).
    
    Returns:
        tuple: (total_sample_size, sample_sizes), where sample_sizes is a dict with the required size of the control and of each test group.

    Use example:
        allocation_ratios = [0.7, 0.20, 0.10]
        baseline_conversion_rate = 0.1
        minimum_detectable_effect = 0.02

    Example output:
        Total sample size required: 25637
        Sample size per group:
        * Control sample size: 19703
        * Test_1 sample size: 3120
        * Test_2 sample size: 2814
    """

    # Ensure allocation_ratios is a list
    if not isinstance(allocation_ratios, list):
        raise TypeError("allocation_ratios must be a list of floats representing the allocation proportions.")

    # Make sure the group ratios add up to 1 (with a small tolerance for floating-point rounding)
    if abs(sum(allocation_ratios) - 1) > 1e-9:
        raise ValueError(f"The sum of the ratios is not 100%. Total is {sum(allocation_ratios)}. Please update it before continuing.")

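    # Number of variants for Bonferroni correction
    num_variants = len(allocation_ratios) - 1        # Excluding the control
    if num_variants < 1:
        raise ValueError("allocation_ratios must contain the control group and at least one test group.")
    corrected_alpha = alpha / num_variants      # Adjust alpha for multiple comparisons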

    # Z-scores for corrected alpha and power
    z_alpha = norm.ppf(1 - corrected_alpha / 2)     # Two-sided test with Bonferroni correction
    z_beta = norm.ppf(power)
    
    # Conversion rate for each variant (baseline + minimum detectable effect)
    variant_rate = baseline_rate + minimum_effect

    # Calculate total variance for unequal split
    sample_sizes = {}
    control_ratio = allocation_ratios[0]
    control_variance = baseline_rate * (1 - baseline_rate) / control_ratio

    # Calculate the required sample size for each variant
    required_totals = []
    for i, ratio in enumerate(allocation_ratios[1:], start=1):       # Skip control group
        variant_variance = variant_rate * (1 - variant_rate) / ratio
        numerator = (z_alpha + z_beta) ** 2 * (control_variance + variant_variance)
        denominator = (variant_rate - baseline_rate) ** 2
        total_sample_size = numerator / denominator
        required_totals.append(total_sample_size)
        sample_sizes[f'Test_{i}'] = int(total_sample_size * ratio)

    # Control group sample size: use the most demanding comparison,
    # so the result does not depend on the order of the test groups
    sample_sizes['Control'] = int(max(required_totals) * control_ratio)

    # Total sample size across all groups
    total_sample_size = sum(sample_sizes.values())

    # Sort the dictionary keys alphabetically so "Control" appears first
    sample_sizes = dict(sorted(sample_sizes.items()))

    return total_sample_size, sample_sizes

In [2]:
#### Input parameters to calculate your sample size ####
allocation_ratios = [0.7, 0.20, 0.10]   # Allocation ratio for the control group, followed by one ratio per test group
baseline_conversion_rate = 0.1          # Known/estimated baseline conversion rate of control
minimum_detectable_effect = 0.02        # We want to detect at least a 2 percentage-point change

#### Run the function ####
total_sample_size, sample_sizes = calculate_sample_size_abn_test_discrete(
    baseline_rate=baseline_conversion_rate,
    minimum_effect=minimum_detectable_effect,
    allocation_ratios=allocation_ratios,
)

#### Print results ####
print(f"Total sample size required: {total_sample_size}")
print("Sample size per group:")
for group, size in sample_sizes.items():
    print(f" * {group} sample size: {size}")

Total sample size required: 25637
Sample size per group:
 * Control sample size: 19703
 * Test_1 sample size: 3120
 * Test_2 sample size: 2814
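
For reference, the per-comparison calculation in the function above can be written as a single formula. The symbols are introduced here for illustration and do not appear in the code: $p_0$ is the baseline rate, $p_1 = p_0 + \text{minimum effect}$, $r_0$ and $r_i$ are the allocation ratios of the control and of test group $i$, and $k$ is the number of test groups.

$$
N_i = \frac{\left(z_{1-\alpha/(2k)} + z_{\text{power}}\right)^2 \left[\dfrac{p_0(1-p_0)}{r_0} + \dfrac{p_1(1-p_1)}{r_i}\right]}{(p_1 - p_0)^2}
$$

Plugging in the example values for Test_2 ($p_0 = 0.1$, $p_1 = 0.12$, $r_0 = 0.7$, $r_2 = 0.1$, $k = 2$) gives, approximately:

$$
N_2 \approx \frac{(2.2414 + 0.8416)^2 \,(0.1286 + 1.0560)}{0.02^2} \approx \frac{9.505 \times 1.1846}{0.0004} \approx 28{,}148
$$

so Test_2 receives about $0.10 \times N_2 \approx 2{,}814$ observations and the control group about $0.70 \times N_2 \approx 19{,}703$, which matches the output above.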


#### **2**. ***A/B/n Test Size for Continuous Variables***

Use only for continuous variables, such as:
- **UI change**: Test different layouts of a landing page to see which one increases user engagement, measured as the average time spent on the page.
- **Email marketing**: Test different email subject lines or content to determine which version maximizes the time recipients spend reading the email.
- **Feature rollout**: Measure the impact of different user interface changes on user session duration.
- **Pricing changes**: Evaluate the effect of different pricing structures on the time users spend on pricing pages or the checkout process.



The function returns the sample size required when you have a control group and one or more test groups. You set the allocation ratio of the control group and of each test group. A custom allocation lets you limit how much a failed test affects the expected results of the business.

If you want the control and test groups to receive the same traffic, give each the same allocation ratio (for example, 50% and 50%).

When to Use a Different Approach:
- **Count Data**: If your metric is a count (for example, the number of clicks) and fits a Poisson or negative binomial distribution better, use statistical methods tailored to count data.
- **Non-Normal Distribution**: If your data is highly skewed or not normally distributed, you might need non-parametric methods or a transformation so that the assumptions of the test hold.
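
One way to check the second point before planning a test is to look at the skewness of historical data, with and without a transformation. The sketch below is illustrative only: the data is synthetic (log-normal), `sample_skewness` is a hypothetical helper, and the log transform is just one possible choice.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Hypothetical helper: moment-based skewness (close to 0 for symmetric data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


random.seed(42)
# Synthetic session durations in seconds (log-normal, for illustration only)
durations = [random.lognormvariate(4.0, 0.8) for _ in range(5000)]
log_durations = [math.log(d) for d in durations]

skew_raw = sample_skewness(durations)
skew_log = sample_skewness(log_durations)
print(f"Skewness of raw durations: {skew_raw:.2f}")
print(f"Skewness after log transform: {skew_log:.2f}")
# The standard deviation on the transformed scale
print(f"Standard deviation on the log scale: {statistics.stdev(log_durations):.2f}")
```

If you plan the test on a transformed scale, both the standard deviation and the minimum effect passed to the calculator need to be expressed on that same scale.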

In [3]:
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Create a function to calculate the sample size of tests with continuous values
def calculate_sample_size_abn_test_continuous(minimum_effect, std_control, allocation_ratios, alpha=0.05, power=0.8):
    """
    Calculate sample size for an A/B/n test with continuous variables and custom allocation ratios.

    Parameters:
        minimum_effect (float): Minimum detectable difference in the continuous variable (e.g., time spent).
        std_control (float): Known or estimated standard deviation of the control group.
        allocation_ratios (list): List of allocation ratios for each group (e.g., [0.75, 0.10, 0.15] for 75% control, 10% test_1, 15% test_2).
        alpha (float): Original significance level (default 0.05).
        power (float): Percent of the time the minimum effect size will be detected, assuming it exists (default 0.8).
    
    Returns:
        dict: Required sample sizes for control and each test group.

    Use example:
        allocation_ratios = [0.75, 0.10, 0.15]
        minimum_effect = 5
        std_control = 15

    Example output:
        Total sample size required: 603
        Sample Sizes per Group:
        * Control sample size: 451
        * Test 1 sample size: 61
        * Test 2 sample size: 91
    """
    # Ensure allocation_ratios is a list
    if not isinstance(allocation_ratios, list):
        raise TypeError("allocation_ratios must be a list of floats representing the allocation proportions.")
    
    
    # Make sure the group ratios add up to 1 (with a small tolerance for floating-point rounding)
    if abs(sum(allocation_ratios) - 1) > 1e-9:
        raise ValueError(f"The sum of the ratios is not 100%. Total is {sum(allocation_ratios)}. Please update it before continuing.")
    
    # Number of test groups is the number of allocations minus one (control group)
    num_tests = len(allocation_ratios) - 1  # Number of comparisons (test groups vs. control)
    if num_tests < 1:
        raise ValueError("allocation_ratios must contain the control group and at least one test group.")

    # Adjust alpha for multiple comparisons (Bonferroni correction)
    adjusted_alpha = alpha / num_tests

    # Control group is the first element in allocation_ratios
    control_ratio = allocation_ratios[0]
    test_ratios = allocation_ratios[1:]

    # Initialize power analysis object
    analysis = TTestIndPower()

    # Solve for the sample size using the average test-to-control ratio.
    # Note: solve_power returns nobs1 (the size of the first group, with nobs2 = nobs1 * ratio);
    # this function then splits that figure across all groups by allocation ratio.
    avg_test_ratio = np.mean(test_ratios)
    total_sample_size = analysis.solve_power(
        effect_size=minimum_effect / std_control,  # Standardized effect size
        alpha=adjusted_alpha,
        power=power,
        ratio=avg_test_ratio / control_ratio  # Average ratio of test to control
    )

    # Calculate individual group sizes based on allocation ratios
    sample_sizes = [int(np.ceil(total_sample_size * r)) for r in allocation_ratios]

    # Prepare output with control and test groups named accordingly
    result = {'total_sample_size': sum(sample_sizes)}
    result['Control'] = sample_sizes[0]
    for i, n in enumerate(sample_sizes[1:], start=1):
        result[f'Test {i}'] = n

    return result


In [4]:
#### Input parameters to calculate your sample size ####
allocation_ratios = [0.75, 0.10, 0.15]      # Allocation ratio for the control group, followed by one ratio per test group
std_control = 15                            # Known/estimated standard deviation of the control group
effect_size = 5                             # We want to detect at least a 5-second increase

#### Run the function ####
sample_sizes = calculate_sample_size_abn_test_continuous(effect_size, std_control, allocation_ratios, alpha=0.05, power=0.8)

#### Print results ####
print(f"Total sample size required: {sample_sizes['total_sample_size']}")
print("Sample Sizes per Group:")
for group, size in sample_sizes.items():
    if group != "total_sample_size":        # Avoid printing the total sample size again
        print(f" * {group} sample size: {size}")

Total sample size required: 603
Sample Sizes per Group:
 * Control sample size: 451
 * Test 1 sample size: 61
 * Test 2 sample size: 91
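
In statsmodels, `TTestIndPower.solve_power` solves for `nobs1`, the size of the first group, with the second group sized as `nobs1 * ratio`. An alternative that may be worth comparing against is to size each test-versus-control comparison on its own and keep the largest total, so that the smallest test group is still adequately powered. The sketch below does this under stated assumptions: `continuous_sizes_per_comparison` is a hypothetical helper, it assumes equal variances in all groups, and it swaps in a normal (z-test) approximation from the standard library instead of the t-based solver, so its numbers will differ somewhat from those of `TTestIndPower`.

```python
from math import ceil
from statistics import NormalDist


def continuous_sizes_per_comparison(minimum_effect, std, allocation_ratios, alpha=0.05, power=0.8):
    """Hypothetical helper: size every test-vs-control comparison, keep the largest total."""
    control_ratio, test_ratios = allocation_ratios[0], allocation_ratios[1:]
    adjusted_alpha = alpha / len(test_ratios)             # Bonferroni correction
    z_alpha = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Total N needed by each comparison under a z-test with equal variances
    totals = [
        (z_alpha + z_beta) ** 2 * std ** 2 * (1 / control_ratio + 1 / r) / minimum_effect ** 2
        for r in test_ratios
    ]
    total = max(totals)                                   # the smallest test group drives the total
    return [ceil(total * r) for r in allocation_ratios]


sizes = continuous_sizes_per_comparison(5, 15, [0.75, 0.10, 0.15])
print(dict(zip(["Control", "Test 1", "Test 2"], sizes)))
```

For the example inputs this approach asks for roughly 970 observations in total, noticeably more than the figure above, because the 10% test group is sized explicitly rather than through an average ratio.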
