### A/B testing case study

Recent history shows about 3250 unique visitors per day, with slightly more visitors Friday through Monday than during the rest of the week. There are about 520 software downloads per day (a .16 rate) and about 65 license purchases per day (a .02 rate). Ideally, both the download rate and the license purchase rate increase with the new homepage. A statistically significant decrease in either metric is a signal not to deploy the homepage change. If only one of the two metrics shows a statistically significant increase, that is still enough to deploy the new homepage.
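The deployment rule above can be written down as a small function. This is only a sketch under stated assumptions: the name `should_deploy`, the use of z-scores as inputs, and the even split of the overall 5% error rate across the two metrics are illustrative choices rather than part of the original analysis, and the z-scores in the usage lines are made up.

```python
from statistics import NormalDist

def should_deploy(z_downloads, z_licenses, alpha=0.05):
    """Return True when the decision rule described above says to deploy.

    Hypothetical helper: z-scores are experiment minus control, so a positive
    value means the new homepage did better on that metric.
    """
    # Assumption: split the overall alpha evenly across the two metrics
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_scores = (z_downloads, z_licenses)

    # A significant drop in either metric blocks deployment
    if any(z < -z_crit for z in z_scores):
        return False

    # Otherwise a significant gain in at least one metric is enough
    return any(z > z_crit for z in z_scores)

# made-up z-scores, for illustration only
print(should_deploy(2.4, 0.3))   # one significant gain, no significant drop
print(should_deploy(2.4, -2.5))  # a gain, but also a significant drop
print(should_deploy(1.0, 0.5))   # nothing significant
```

Checking each metric in both directions is slightly stricter than the one-sided tests used later in this case study; it is done here only to make the "do not deploy on a significant decrease" branch explicit.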

### Detecting Change in Downloads
Let's say that we want to detect an increase of 50 downloads per day (up to 570 per day, or a .175 rate). How many days of data would we need to collect in order to get enough visitors to detect this new rate at an overall 5% Type I error rate and at 80% power?
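For reference, the sample size calculation in the next cell can be written as a formula. The symbols are introduced here for illustration only: $p_0$ and $p_1$ are the null and alternative rates, $\alpha'$ is the per-metric Type I error rate after the multiple-comparison correction, $\beta$ is the Type II error rate, $z_q$ is the $q$ quantile of the standard normal distribution, and $v$ is the number of visitors per day across both groups.

```latex
n_{\text{per group}} =
\left(
  \frac{ z_{1-\alpha'} \sqrt{2\,p_0 (1 - p_0)}
       + z_{1-\beta} \sqrt{p_0 (1 - p_0) + p_1 (1 - p_1)} }
       { p_1 - p_0 }
\right)^{2},
\qquad
\text{days} = \left\lceil \frac{2 \, \lceil n_{\text{per group}} \rceil}{v} \right\rceil
```

The code uses `z_alt = stats.norm.ppf(beta)`, which equals $-z_{1-\beta}$, so the plus sign in the numerator above appears there as a subtraction.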

In [24]:
# function that calculates the total observations and days needed to run the experiment

def experiment_days(p_null, p_alt, alpha = 0.05, beta = 0.2, obs_per_day = 3250):
    """
    Calculate the sample size and number of days required to run an A/B test

    Parameters:
    p_null (float): the conversion rate under the null hypothesis
    p_alt (float): the conversion rate under the alternative hypothesis
    alpha (float): the overall Type I error rate, shared by the two metrics
    beta (float): the Type II error rate (power = 1 - beta)
    obs_per_day (int): the number of observations per day, both groups combined

    Returns:
    float: the total number of observations required across both groups
    float: the number of days required to run the experiment (unrounded)

    """
    import numpy as np
    from scipy import stats

    # Bonferroni correction: the two evaluation metrics share the overall alpha
    alpha = alpha / 2

    # Z-scores for a one-sided test: critical value under the null, and the
    # beta quantile under the alternative (negative whenever beta < 0.5)
    z_null = stats.norm.ppf(1 - alpha)
    z_alt = stats.norm.ppf(beta)

    # Standard deviations of the difference in proportions under each hypothesis
    sd_null = np.sqrt(p_null * (1 - p_null) + p_null * (1 - p_null))
    sd_alt = np.sqrt(p_null * (1 - p_null) + p_alt * (1 - p_alt))

    # Effect size: the change in conversion rate we want to detect
    effect_size = p_alt - p_null

    # Sample size required for each group
    n_per_group = ((z_null * sd_null - z_alt * sd_alt) / effect_size) ** 2

    # Total number of samples needed across control and experiment
    total_samples = np.ceil(n_per_group) * 2

    # Number of days required at the given daily traffic
    days_required = total_samples / obs_per_day

    print(f"Number of samples needed: {total_samples}")
    print(f"Number of days required: {np.ceil(days_required)}")

    return total_samples, days_required
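
The download scenario described at the start of this section can be checked with the same formula. The sketch below is a compact re-implementation that uses `statistics.NormalDist` from the standard library in place of `scipy.stats`, so it runs on its own. The name `sample_size_days` and the `n_metrics` parameter are hypothetical and not part of the original notebook.

```python
import math
from statistics import NormalDist

def sample_size_days(p_null, p_alt, alpha=0.05, beta=0.2, obs_per_day=3250, n_metrics=2):
    """Return (total observations, whole days) for a one-sided test of p_alt vs p_null."""
    norm = NormalDist()
    # Bonferroni correction across the evaluation metrics
    z_null = norm.inv_cdf(1 - alpha / n_metrics)
    z_alt = norm.inv_cdf(1 - beta)
    # standard deviations of the difference in proportions under each hypothesis
    sd_null = math.sqrt(2 * p_null * (1 - p_null))
    sd_alt = math.sqrt(p_null * (1 - p_null) + p_alt * (1 - p_alt))
    n_per_group = ((z_null * sd_null + z_alt * sd_alt) / (p_alt - p_null)) ** 2
    total = 2 * math.ceil(n_per_group)
    return total, math.ceil(total / obs_per_day)

# download scenario from the text: .16 baseline rate, .175 target rate
downloads_total, downloads_days = sample_size_days(0.16, 0.175)
print(f"Downloads: {downloads_total} observations, {downloads_days} days")
```

Because the download effect is much larger relative to its baseline than the license effect, the download test should need far fewer days, which leaves the license metric as the one that determines how long the experiment has to run.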
    

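### Detecting Change in Licenses

For licenses, we want to detect an increase in the purchase rate from .02 to .023, at the same overall 5% Type I error rate and 80% power.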

In [23]:
# define parameters
null = 0.02 # conversion rate under the null hypothesis
alt = 0.023 # conversion rate under the alternative hypothesis
alpha = 0.05 # overall significance level (before the Bonferroni correction)
beta = 0.2 # Type II error rate (power = 1 - beta = 0.8)
obs = 3250 # number of observations per day

# run function
samples, days = experiment_days(null, alt, alpha, beta, obs)

Number of samples needed: 69860.0
Number of days required: 22.0


### Detect Statistical Difference in Invariant Metric

First, we should check our invariant metric: the number of cookies assigned to each group. If we detect a statistically significant difference, we shouldn't move on to the evaluation metrics right away. We would first need to dig deeper to see whether there was an issue with the group assignment procedure, or whether something about the manipulation affected the number of cookies observed, before we can feel secure analyzing and interpreting the evaluation metrics.

### Import the data

In [30]:
import pandas as pd

# read in the CSV in the appropriate file directory
data = pd.read_csv('../case_study/homepage-experiment-data.csv')

# preview 3 rows of sample data
data.sample(3)

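| Unnamed: 0 | Day | Control Cookies | Control Downloads | Control Licenses | Experiment Cookies | Experiment Downloads | Experiment Licenses |
|---|---|---|---|---|---|---|---|
| 22 | 23 | 1631 | 249 | 26 | 1517 | 272 | 35 |
| 6 | 7 | 1534 | 262 | 5 | 1555 | 276 | 8 |
| 23 | 24 | 1489 | 241 | 29 | 1466 | 279 | 31 |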


### What is the p-value for the test on the number of cookies assigned to each group?

We need a one-sample z-test on a proportion, which tests whether a sample proportion differs significantly from a known or expected value. Here the expected value is 0.5: if group assignment is working correctly, about half of all cookies should land in the control group.

In [78]:
# define parameters for the test
n_control = data['Control Cookies'].sum()
n_experiment = data['Experiment Cookies'].sum()
n_obs = n_control + n_experiment

# sanity check levels before test
print(f"Number of observations: {n_obs}")
print(f"Total Control Cookies: {n_control}")
print(f"Total Experiment Cookies: {n_experiment}")

Number of observations: 94197
Total Control Cookies: 46851
Total Experiment Cookies: 47346


In [79]:
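# create reusable function
def one_sample_ztest(p, n_obs, n_control):
    """
    Calculate the Z-score and two-sided p-value for a one-sample z-test on a
    proportion, using the normal approximation with a continuity correction

    Parameters:
    p (float): the expected proportion of observations in the control group
    n_obs (int): the total number of observations
    n_control (int): the total number of control group observations

    Returns:
    float: the Z-score
    float: the two-sided p-value

    """
    import numpy as np
    from scipy import stats

    # Expected count and standard deviation of the control count under the null
    expected = p * n_obs
    sd = np.sqrt(p * (1 - p) * n_obs)

    # Continuity correction: move the observed count 0.5 toward the expected count
    correction = 0.5 * np.sign(expected - n_control)

    # Calculate the Z-score
    z = (n_control + correction - expected) / sd

    # Two-sided p-value, valid whichever side of the expected count we land on
    p_value = 2 * stats.norm.cdf(-abs(z))

    return z, p_value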

In [80]:
# run test
# expected proportion in the control group is 0.5 (not the license rate stored in `null`)
z_score, p_value = one_sample_ztest(0.5, n_obs, n_control)

# print results
print(f"Z-score: {z_score}")
print(f"P-value: {p_value}")

Z-score: -1.6095646049678511
P-value: 0.10749294050130412
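
As a cross-check on the normal approximation, the same question can be answered with an exact binomial calculation. The sketch below uses only the standard library and the cookie totals printed above. The helper name `binom_two_sided_p` is hypothetical, and doubling the lower tail is appropriate only because the null proportion is 0.5, which makes the binomial distribution symmetric.

```python
import math

def binom_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials under p = 0.5."""
    k = min(k, n - k)  # by symmetry, work with the smaller tail
    log_norm = math.lgamma(n + 1) + n * math.log(0.5)

    def log_pmf(i):
        # log of C(n, i) * 0.5**n, computed in log space to avoid overflow
        return log_norm - math.lgamma(i + 1) - math.lgamma(n - i + 1)

    tail = math.fsum(math.exp(log_pmf(i)) for i in range(k + 1))
    return min(1.0, 2 * tail)

# cookie totals taken from the sanity-check output above
n_control, n_experiment = 46851, 47346
p_exact = binom_two_sided_p(n_control, n_control + n_experiment)
print(f"Exact two-sided p-value: {p_exact:.4f}")
```

The exact value should land close to the normal-approximation p-value reported above, in which case both methods lead to the same conclusion: there is no statistically significant imbalance in group assignment.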


### What is the p-value for the test on the download rate between groups?

We need a two-proportion z-test. First we define the parameters for the test, then we create a reusable function.

In [84]:
# define parameters for the test

# get the counts for cookies in the control and experiment groups
control_cookies = data['Control Cookies'].sum()
experiment_cookies = data['Experiment Cookies'].sum()
total_cookies = control_cookies + experiment_cookies

# get the counts for downloads in the control and experiment groups
control_downloads = data['Control Downloads'].sum()
experiment_downloads = data['Experiment Downloads'].sum()
total_downloads = control_downloads + experiment_downloads

In [95]:
# create reusable function for tests
def two_proportion_z_test(control_downloads, experiment_downloads, control_cookies, experiment_cookies):
    """
    Calculate a Z-score and one-sided (upper-tail) p-value for whether the
    experiment group converts at a higher rate than the pooled rate

    Parameters:
    control_downloads (int): the number of conversions in the control group
    experiment_downloads (int): the number of conversions in the experiment group
    control_cookies (int): the number of cookies in the control group
    experiment_cookies (int): the number of cookies in the experiment group

    Returns:
    float: the Z-score
    float: the one-sided p-value

    """
    # import required libraries
    from scipy import stats
    import numpy as np

    # calculate the overall (pooled) conversion rate
    p = (control_downloads + experiment_downloads) / (control_cookies + experiment_cookies)

    # calculate the Z-score: observed experiment conversions against the count
    # expected at the pooled rate, scaled by the binomial standard deviation
    # (this treats the pooled rate as fixed and ignores control-group variance)
    z = (experiment_downloads - p * experiment_cookies) / np.sqrt(p * (1 - p) * experiment_cookies)

    # calculate the one-sided p-value for an increase in the experiment group
    p_value = 1 - stats.norm.cdf(z)

    print(f"Z-score: {z}")
    print(f"P-value: {p_value}")

    return z, p_value

In [96]:
# execute function
z, p_value = two_proportion_z_test(control_downloads, experiment_downloads, control_cookies, experiment_cookies)

Z-score: 5.5508773906714515
P-value: 1.4211969712185635e-08


### Checking the Evaluation Metric II
What is the p-value for the test on the license purchasing rate between groups?

First we will define the parameters for the test, then we will reuse the two-proportion z-test function defined above.

In [97]:
# define parameters for the test

# number of observations in control group (21 days of 28)
n_control_21 = data.query('Day < 22')['Control Cookies'].sum()

# number of conversions for licenses in control group
n_control_licenses = data['Control Licenses'].sum()

# conversion rate for licenses in control group
p_control_licenses = n_control_licenses / n_control_21

# number of observations in experiment group (21 days of 28)
n_experiment_21 = data.query('Day < 22')['Experiment Cookies'].sum()

# number of conversions for licenses in experiment group
n_experiment_licenses = data['Experiment Licenses'].sum()

# conversion rate for licenses in experiment group
p_experiment_licenses = n_experiment_licenses / n_experiment_21

print(f"Conversion rate for control group: {p_control_licenses :.3f}")
print(f"Conversion rate for experiment group: {p_experiment_licenses :.3f}")


Conversion rate for control group: 0.021
Conversion rate for experiment group: 0.021


In [98]:
# execute two proportion z-test
z, p_value = two_proportion_z_test(n_control_licenses, n_experiment_licenses, n_control_21, n_experiment_21)

Z-score: 0.1821302274513653
P-value: 0.4277402636830543


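### Alternative Standard Error Calculation

The next function calculates the standard error differently: it standardizes the difference between the two group rates by the pooled standard error of that difference, which is the standard pooled two-proportion z-test. The earlier function compares the experiment group's count with the pooled rate and only accounts for sampling variability in the experiment group, so its Z-score is smaller by a factor of sqrt(n_control / (n_control + n_experiment)), about 0.7 when the groups are nearly equal in size.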

In [102]:
# create reusable function for tests
def ztest(n_control_licenses, n_experiment_licenses, n_control_21, n_experiment_21):
    """
    Calculate the Z-score and one-sided (upper-tail) p-value for a pooled
    two-proportion z-test

    Parameters:
    n_control_licenses (int): the number of licenses purchased in the control group
    n_experiment_licenses (int): the number of licenses purchased in the experiment group
    n_control_21 (int): the number of cookies in the control group
    n_experiment_21 (int): the number of cookies in the experiment group

    Returns:
    float: the Z-score
    float: the one-sided p-value

    """
    # import required libraries
    from scipy import stats
    import numpy as np

    # calculate each group's conversion rate from the arguments, rather than
    # relying on variables defined outside the function
    p_control = n_control_licenses / n_control_21
    p_experiment = n_experiment_licenses / n_experiment_21

    # calculate the overall (pooled) conversion rate
    p = (n_control_licenses + n_experiment_licenses) / (n_control_21 + n_experiment_21)

    # calculate the pooled standard error of the difference in rates
    se = np.sqrt(p * (1 - p) * (1 / n_control_21 + 1 / n_experiment_21))

    # calculate the Z-score
    z = (p_experiment - p_control) / se

    # calculate the one-sided p-value for an increase in the experiment group
    p_value = 1 - stats.norm.cdf(z)

    print(f"Z-score: {z}")
    print(f"P-value: {p_value}")

    return z, p_value

In [104]:
# execute function
z, p_value = ztest(n_control_licenses, n_experiment_licenses, n_control_21, n_experiment_21)


Z-score: 0.2586750111658684
P-value: 0.3979430008399871
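
The gap between the two Z-scores above follows from how each statistic is standardized. The sketch below computes both with made-up counts (the numbers are hypothetical, not the experiment data) and compares their ratio with sqrt(n_control / (n_control + n_experiment)). The function names are illustrative.

```python
import math

def z_vs_pooled_rate(x_c, x_e, n_c, n_e):
    """Experiment count compared with the pooled rate (the first function's statistic)."""
    p = (x_c + x_e) / (n_c + n_e)
    return (x_e - p * n_e) / math.sqrt(p * (1 - p) * n_e)

def z_pooled_two_proportion(x_c, x_e, n_c, n_e):
    """Difference in rates over the pooled standard error (the second function's statistic)."""
    p = (x_c + x_e) / (n_c + n_e)
    se = math.sqrt(p * (1 - p) * (1 / n_c + 1 / n_e))
    return (x_e / n_e - x_c / n_c) / se

# hypothetical counts, for illustration only
x_c, x_e, n_c, n_e = 700, 730, 34000, 34500
z1 = z_vs_pooled_rate(x_c, x_e, n_c, n_e)
z2 = z_pooled_two_proportion(x_c, x_e, n_c, n_e)
print(f"ratio of Z-scores:       {z1 / z2:.4f}")
print(f"sqrt(n_c / (n_c + n_e)): {math.sqrt(n_c / (n_c + n_e)):.4f}")
```

The two printed values agree, which is consistent with the ratio of roughly 0.70 between the Z-scores of 0.182 and 0.259 reported above. For the license metric neither statistic is close to significance, so the conclusion is the same either way.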
