# A/B/n Testing Statistical Significance Calculator
The scripts below show how to check whether an observed change between a control group and one or more test variants is statistically significant.

#### **1**. ***A/B/n Test Statistical Calculator for Discrete Variables***
Use only for discrete (yes/no) outcomes, such as:
- **UI change**: Test different layouts of a landing page to see which one increases user engagement, measured as the add-to-cart click-through rate.
- **Email marketing**: Test different email subject lines or content to determine which version maximizes the open rate or the webinar subscription rate.
- **Feature rollout**: Measure the impact of different user interface changes on impulse purchases.
- **Pricing changes**: Evaluate the effect of different pricing structures on whether a sale is made.

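The two-proportion z-test used here relies on the normal approximation to the binomial distribution, so every group needs a reasonably large number of conversions and non-conversions. Otherwise, another method should be used.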
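As a rough way to check this requirement, a common rule of thumb treats the normal approximation as acceptable when every group has at least about 10 successes and 10 failures. The sketch below applies that rule to the `[clicks, total views]` format used in this section. The helper name `check_normal_approximation`, the threshold of 10, and the `'Tiny test'` group are assumptions added for illustration; they are not part of the original scripts.

```python
def check_normal_approximation(data, min_count=10):
    """Return {group: bool}, True when the group has enough successes and failures.

    min_count=10 is a rule-of-thumb threshold (an assumption, not a hard rule).
    """
    result = {}
    for group, (clicks, total) in data.items():
        failures = total - clicks
        result[group] = clicks >= min_count and failures >= min_count
    return result


data = {
    'Control': [387, 4113],    # [clicks, total views]
    'Test 1': [30, 216],       # [clicks, total views]
    'Tiny test': [3, 20],      # hypothetical small group, added to show a failing check
}

checks = check_normal_approximation(data)
for group, ok in checks.items():
    status = "OK" if ok else "too few observations for a z-test"
    print(f"{group}: {status}")
```

For groups that fail the check, an exact test such as Fisher's exact test is one commonly used alternative.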

##### Create a function to calculate statistical significance of control groups and tests

In [25]:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Create a function to analyze the statistical significance of tests with discrete values
def analyze_abn_test_discrete(data):
    """
        # Explanation:
            data: A dictionary mapping each group name to [clicks, total views].
                  It must contain a 'Control' entry, for example:
            data = {
                'Control': [387, 4113],     # [clicks, total views]
                'Test 1': [30, 216],        # [clicks, total views]
                'Test 2': [90, 500],        # [clicks, total views]
                'Test 3': [60, 300]         # [clicks, total views]
            }

            - z_stat: The z-statistic calculated to compare the two groups.
            - p_value: The two-sided p-value corresponding to the z-statistic, used to assess statistical significance.


        # Key points to consider:
            - If the p-value is smaller than 0.05, you reject the null hypothesis and conclude that there is a significant difference between the groups.
            - If the p-value is 0.05 or greater, you fail to reject the null hypothesis: the data do not show a significant difference (this is not proof that the groups are equal).

        # Example data: successes and total sample sizes for control and test 1
            data = {
                'Control': [387, 4113],     # [clicks, total views]
                'Test 1': [30, 216]         # [clicks, total views]
                }

            Result:
                Conversion rates:
                    * Control: 9.41%
                    * Test 1: 13.89%

                Comparison of Control vs. Test 1:
                    * z-statistic: 2.1751
                    * p-value: 0.0296
                    * Test 1 shows a significant improvement over control by 47.61%.

                The best-performing test is **Test 1** with a conversion rate of 13.89%.


    """
    # Calculate click-through rates for each test
    ctr = {test: clicks / total for test, (clicks, total) in data.items()}
    print("Conversion rates:")
    for test, rate in ctr.items():
        print(f"    * {test}: {rate:.2%}")
    
    # Conduct pairwise z-tests between each test and the control
    control_clicks, control_total = data['Control']
    for test, (clicks, total) in data.items():
        if test != 'Control':
            count = np.array([clicks, control_clicks])
            nobs = np.array([total, control_total])
            z_stat, p_value = proportions_ztest(count, nobs)
            
            improvement = (ctr[test] - ctr['Control']) / ctr['Control'] * 100
            
            print(f"\nComparison of Control vs. {test}:")
            print(f"    * z-statistic: {z_stat:.4f}")
            print(f"    * p-value: {p_value:.4f}")
            if p_value < 0.05:
                if improvement > 0:
                    print(f"    * {test} shows a significant improvement over control by {improvement:.2f}%.")
                else:
                    # Report the size of the drop as a positive percentage
                    print(f"    * {test} performs significantly worse than control by {abs(improvement):.2f}%.")
            else:
                # The test is two-sided, so this covers both directions
                print(f"    * {test} does not show a significant difference from control.")
    
    # Find the group with the highest conversion rate (this can be the control, and it is not a significance test by itself)
    best_test = max(ctr, key=ctr.get)
    print(f"\nThe best-performing test is **{best_test}** with a conversion rate of {ctr[best_test]:.2%}.")
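To make the pairwise comparison less of a black box, the sketch below recomputes the z-statistic and two-sided p-value by hand for the Control and Test 1 figures from the docstring example, using the pooled-proportion standard error. It assumes this matches the default settings of `proportions_ztest` (two-sided alternative, pooled variance). The function name `two_proportion_ztest` is hypothetical and only the standard library is used.

```python
import math


def two_proportion_ztest(clicks_a, total_a, clicks_b, total_b):
    """Two-sided z-test for the difference between two proportions (pooled variance)."""
    p_a = clicks_a / total_a
    p_b = clicks_b / total_b
    # Pooled proportion under the null hypothesis that both rates are equal
    pooled = (clicks_a + clicks_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Test 1 (30 clicks / 216 views) against Control (387 clicks / 4113 views)
z_stat, p_value = two_proportion_ztest(30, 216, 387, 4113)
print(f"z-statistic: {z_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

The two printed values should agree with the z-statistic and p-value listed for Test 1 in the docstring example.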



##### Input the test results and run the function

In [31]:
#### Input the results of your control and test groups ####
data = {
    'Control': [387, 4113],     # [clicks, total views]
    'Test 1': [30, 216],        # [clicks, total views]
    'Test 2': [90, 500],        # [clicks, total views]
    'Test 3': [60, 300]         # [clicks, total views]
    }
            
#### Run the function and print results ####
analyze_abn_test_discrete(data)


Conversion rates:
    * Control: 9.41%
    * Test 1: 13.89%
    * Test 2: 18.00%
    * Test 3: 20.00%

Comparison of Control vs. Test 1:
    * z-statistic: 2.1751
    * p-value: 0.0296
    * Test 1 shows a significant improvement over control by 47.61%.

Comparison of Control vs. Test 2:
    * z-statistic: 5.9572
    * p-value: 0.0000
    * Test 2 shows a significant improvement over control by 91.30%.

Comparison of Control vs. Test 3:
    * z-statistic: 5.8696
    * p-value: 0.0000
    * Test 3 shows a significant improvement over control by 112.56%.

The best-performing test is **Test 3** with a conversion rate of 20.00%.
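Each variant is compared with the control at the same 0.05 threshold, so the chance of at least one false positive grows with the number of variants. One common and conservative way to account for this is the Bonferroni correction, which multiplies each p-value by the number of comparisons. The sketch below is an optional add-on rather than part of the original function; `bonferroni_adjust` is a hypothetical helper, and the inputs are the rounded p-values printed in the output above.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return {name: min(p * m, 1.0) for name, p in p_values.items()}


# Rounded p-values taken from the printed comparisons against the control
p_values = {'Test 1': 0.0296, 'Test 2': 0.0000, 'Test 3': 0.0000}

adjusted = bonferroni_adjust(p_values)
for name, p_adj in adjusted.items():
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"{name}: adjusted p-value {p_adj:.4f} ({verdict})")
```

With three comparisons, the Test 1 p-value of 0.0296 becomes 0.0888 and no longer clears the 0.05 threshold, while Test 2 and Test 3 remain significant.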


#### **2**. ***A/B/n Test Statistical Calculator for Continuous Variables***

Use only for continuous variables, such as:
- **UI change**: Test different layouts of a landing page to see which one increases user engagement, measured as the average time spent on the page.
- **Email marketing**: Test different email subject lines or content to determine which version maximizes the time recipients spend reading the email.
- **Feature rollout**: Measure the impact of different user interface changes on user session duration.
- **Pricing changes**: Evaluate the effect of different pricing structures on the time users spend on pricing pages or the checkout process.

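The t-test used here assumes that the group means are approximately normally distributed, which generally holds for large samples. For small samples or heavily skewed data, another method should be used.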
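When the normality assumption is doubtful and the raw observations are available, one option is a rank-based test such as the Mann-Whitney U test, which does not assume normality (it compares the two distributions rather than their means). The sketch below uses `scipy.stats.mannwhitneyu`; the session durations and variable names are made up for illustration.

```python
from scipy import stats

# Hypothetical raw session durations in seconds (illustrative values only)
control_times = [12.1, 8.4, 30.2, 9.9, 11.3, 7.8, 95.0, 10.5, 13.2, 9.1]
test_times = [15.4, 14.8, 42.0, 16.1, 18.9, 13.7, 120.5, 17.3, 19.8, 15.0]

# Two-sided test: are values in one group systematically larger than in the other?
u_stat, p_value = stats.mannwhitneyu(test_times, control_times, alternative='two-sided')

print(f"U-statistic: {u_stat:.1f}")
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("The two groups differ significantly.")
else:
    print("No significant difference detected.")
```

Unlike the function in this section, this approach needs the individual observations, not just the mean, standard deviation, and sample size.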

##### Create a function to calculate statistical significance of control groups and tests

In [29]:
import math
from scipy import stats

# Create a function to analyze the statistical significance of tests with continuous values
def analyze_abn_test_continous(control_mean, control_std, control_n, variations_data):
    """
    Compares the performance of variations against a control group for a continuous variable.
    Uses a two-tailed Welch's t-test (unequal variances), computed from each group's mean, standard deviation, and sample size.
    
    Parameters:
    control_mean (float): Mean of the control group.
    control_std (float): Standard deviation of the control group.
    control_n (int): Sample size of the control group.
    variations_data (list of tuples): Each tuple contains (mean, std_dev, n) for each variation.
    

    # Example data: mean, standard deviation, and sample size for control and test 1
        control_mean = 50  # Control group mean
        control_std = 10   # Control group standard deviation
        control_n = 1000   # Control group sample size

        variations_data = [
            (52, 10, 800)  # Test 1 (mean, std, sample size)
        ]

    Result:
        Control Group Mean: 50.0000, Standard Deviation: 10.0000, Sample Size: 1000

        Test 1 Mean: 52.0000, Standard Deviation: 10.0000, Sample Size: 800
            * t-statistic: 4.2164
            * p-value: 0.0000
            * Statistically significant difference: 2.0000

        The best performing variation is **Test 1**, with a mean improvement of 2.0000.
    """
    
    print(f"Control Group Mean: {control_mean:.4f}, Standard Deviation: {control_std:.4f}, Sample Size: {control_n}")
    
    best_difference = 0  # Best observed difference in means so far
    best_variant = None  # Best performing test (None until one is found)

    # Iterate over all variations
    for idx, (mean_variation, std_variation, n_variation) in enumerate(variations_data):
        # Calculate the standard error
        se = math.sqrt((control_std**2 / control_n) + (std_variation**2 / n_variation))
        
        # Calculate the t-statistic
        t_stat = (mean_variation - control_mean) / se
        
        # Calculate the degrees of freedom (using Welch-Satterthwaite equation)
        df = ( (control_std**2 / control_n + std_variation**2 / n_variation)**2 ) / \
            ( ( (control_std**2 / control_n)**2 / (control_n - 1) ) + ( (std_variation**2 / n_variation)**2 / (n_variation - 1) ) )
        
        # Calculate the p-value for a two-tailed test
        # (the survival function avoids precision loss for very small p-values)
        p_value = 2 * stats.t.sf(abs(t_stat), df=df)
        
        # Print results for this variation
        print(f"\nTest {idx + 1} Mean: {mean_variation:.4f}, Standard Deviation: {std_variation:.4f}, Sample Size: {n_variation}")
        print(f"    * t-statistic: {t_stat:.4f}")
        print(f"    * p-value: {p_value:.4f}")
        
        # Check if this variant shows a significant difference from the control
        if p_value < 0.05:
            mean_difference = (mean_variation - control_mean)
            print(f"    * Statistically significant difference: {mean_difference:.4f}")
            
            # Track the best variant by the largest positive difference
            # (assumes a higher value of the metric is better)
            if mean_difference > best_difference:
                best_difference = mean_difference
                best_variant = idx + 1
        else:
            print("    * No significant difference compared to the control group.")

    # Conclusion: Best performing variant
    if best_variant is not None:
        print(f"\nThe best performing variation is **Test {best_variant}**, with a mean improvement of {best_difference:.4f}.")
    else:
        print("\nNo variation significantly improved over the control group.")
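As a sanity check on the manual calculation, SciPy can run the same kind of test directly from summary statistics. The sketch below calls `scipy.stats.ttest_ind_from_stats` with `equal_var=False` (Welch's t-test) on the Control and Test 1 figures from the docstring example. It is meant as a cross-check, not as a replacement for the function above.

```python
from scipy import stats

# Test 1 (mean=52, std=10, n=800) against Control (mean=50, std=10, n=1000)
result = stats.ttest_ind_from_stats(
    mean1=52, std1=10, nobs1=800,
    mean2=50, std2=10, nobs2=1000,
    equal_var=False,  # Welch's t-test: do not assume equal variances
)
t_stat, p_value = result.statistic, result.pvalue

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

The t-statistic should match the value reported for Test 1 in the docstring example (4.2164).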


##### Input the test results and run the function

In [30]:
#### Input the summary statistics of your control and test groups ####
control_mean = 50  # Control group mean
control_std = 10   # Control group standard deviation
control_n = 1000   # Control group sample size

variations_data = [
    (52, 10, 800),   # Test 1 (mean, std, sample size)
    (60, 50, 1000),  # Test 2 (mean, std, sample size)
    (48, 50, 600)    # Test 3 (mean, std, sample size)
]

#### Run the function and print results ####
analyze_abn_test_continous(control_mean, control_std, control_n, variations_data)

Control Group Mean: 50.0000, Standard Deviation: 10.0000, Sample Size: 1000

Test 1 Mean: 52.0000, Standard Deviation: 10.0000, Sample Size: 800
    * t-statistic: 4.2164
    * p-value: 0.0000
    * Statistically significant difference: 2.0000

Test 2 Mean: 60.0000, Standard Deviation: 50.0000, Sample Size: 1000
    * t-statistic: 6.2017
    * p-value: 0.0000
    * Statistically significant difference: 10.0000

Test 3 Mean: 48.0000, Standard Deviation: 50.0000, Sample Size: 600
    * t-statistic: -0.9682
    * p-value: 0.3333
    * No significant difference compared to the control group.

The best performing variation is **Test 2**, with a mean improvement of 10.0000.
