## Experiment

## Experiment Design

### Metric Choice

To select appropriate metrics for our experiment, a funnel diagram like the one below is useful:

View the course overview page

              |

Click the "Start free trial" button

              |

Complete checkout

              |

Remain enrolled past the 14-day boundary

Based on the funnel analysis we have the following choice of metrics:

#### Invariant metric
- Number of cookies: That is, number of unique cookies to view the course overview page. (dmin=3000)
- Number of clicks: That is, number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered). (dmin=240)
- Click-through-probability: That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page. (dmin=0.01)
#### Evaluation metric
- Gross conversion: That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button. (dmin=0.01)
- Retention: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout. (dmin=0.01)
- Net conversion: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button. (dmin=0.0075)

For all three evaluation metrics, we expect to see them increase, if the tested change does make a difference.
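
As a quick sanity check on these definitions, net conversion should equal gross conversion times retention, because the enrollment count cancels. A minimal sketch using the baseline figures from `BaselineValues.csv` (loaded later in this notebook); the variable names are only illustrative:

```python
# Baseline figures copied from BaselineValues.csv.
clicks_per_day = 3200.0         # unique cookies clicking "Start free trial"
enrollments_per_day = 660.0     # user-ids completing checkout
p_payment_given_enroll = 0.53   # baseline retention

# Gross conversion = enrollments / clicks
gross_conversion = enrollments_per_day / clicks_per_day

# Net conversion = payments / clicks
#                = (enrollments / clicks) * (payments / enrollments)
net_conversion = gross_conversion * p_payment_given_enroll

print(gross_conversion, net_conversion)
```

Both values should match the "given click" probabilities listed in the baseline table.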

### Measuring Standard Deviation

Since each of our evaluation metrics is a probability, we can assume its sampling distribution is Binomial. We estimate the standard deviation of each metric for a sample of 5000 cookies using the given baseline data.

In [1]:
import numpy as np
import pandas as pd

In [3]:
baseline = pd.read_csv("BaselineValues.csv", header=None)
baseline

| | 0 | 1 |
|---|---|---|
| 0 | Unique cookies to view course overview page pe... | 40000.0 |
| 1 | Unique cookies to click "Start free trial" per... | 3200.0 |
| 2 | Enrollments per day: | 660.0 |
| 3 | Click-through-probability on "Start free trial": | 0.08 |
| 4 | Probability of enrolling, given click: | 0.20625 |
| 5 | Probability of payment, given enroll: | 0.53 |
| 6 | Probability of payment, given click | 0.109313 |


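The standard deviation of a Binomial proportion is given by sqrt(p(1-p)/N). The tricky part is to choose the correct N for each evaluation metric, that is, the number of units in the metric's denominator that corresponds to 5000 cookies: clicks for gross and net conversion, enrollments for retention.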

In [10]:
# read the table
n = baseline.iloc[0,1]
n_click = baseline.iloc[1,1]
n_enroll = baseline.iloc[2,1]
p_gc = baseline.iloc[4,1]
p_rt = baseline.iloc[5,1]
p_nc = baseline.iloc[6,1]
sample_size = 5000

In [14]:
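# gross and net conversion are per click: N = expected clicks among sample_size cookies
std_gc = np.sqrt(p_gc*(1-p_gc) / ((n_click/n)*sample_size))
std_nc = np.sqrt(p_nc*(1-p_nc) / ((n_click/n)*sample_size))
# retention is per enrollment: N = expected enrollments among sample_size cookies
std_rt = np.sqrt(p_rt*(1-p_rt) / ((n_enroll/n)*sample_size))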

digits = 4
print("STD of gross conversion probability: ", round(std_gc, digits))
print("STD of net conversion probability: ", round(std_nc, digits))
print("STD of retention probability: ", round(std_rt, digits))

STD of gross conversion probability:  0.0202
STD of net conversion probability:  0.0156
STD of retention probability:  0.0549


Since all of our evaluation metrics are probabilities, it's probably safe to calculate their standard deviations analytically.
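
One way to gain some confidence in the analytical estimate is to compare it with a simulated one. The sketch below assumes each click is an independent Bernoulli trial with the baseline probability, so it only checks the formula under the Binomial assumption, not against real traffic. The seed, the number of simulated samples, and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

p_gross = 0.20625                    # baseline probability of enrolling, given click
clicks = int(5000 * 3200 / 40000)    # expected clicks in a sample of 5000 cookies

# Analytical standard deviation of the proportion
std_analytic = np.sqrt(p_gross * (1 - p_gross) / clicks)

# Empirical standard deviation over many simulated samples of `clicks` trials
simulated = rng.binomial(clicks, p_gross, size=100_000) / clicks
std_empirical = simulated.std(ddof=1)

print(round(std_analytic, 4), round(std_empirical, 4))
```

If the two numbers agree closely, the formula is at least internally consistent; a bootstrap on real per-day data would be needed to check the Binomial assumption itself.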

### Sizing

#### Choosing number of samples given power

In [56]:
from scipy.stats import norm
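def get_beta(N, alpha, dmin, s0):
    # standard deviation of the difference between two groups of size N each
    std = s0 * np.sqrt(2/N)
    # two-sided critical value under the null
    z = norm.ppf(1-alpha/2)
    # probability that the observed difference stays inside the acceptance
    # region when the true difference is dmin
    beta = norm.cdf(z*std, dmin, std) - norm.cdf(-z*std, dmin, std)
    return beta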

def test_size(alpha, beta, dmin, s0, N0=1):
    """
    alpha: probability of rejecting the null when the null is true
    beta: probability of failing to reject the null when the null is false
    dmin: practical significance boundary
    s0: standard deviation when N=1
    N0: size at which to start the search
    
    return: number of samples required per group
    This linear search is slow if the required size is very large; inverting
    get_beta numerically would find N faster.
    """
    N = N0
    while get_beta(N, alpha, dmin, s0) > beta:
        N += 1
    return N
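
As the docstring notes, the linear search becomes slow when the required size is large. One possible speed-up, sketched here rather than taken from the original analysis, is to treat `get_beta(N) - beta` as a continuous function of N and hand it to a root finder such as `scipy.optimize.brentq`. The name `solve_size` and the search bracket `[1, n_max]` are assumptions made for this example, and `get_beta` is repeated so the block runs on its own.

```python
import math

import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def get_beta(N, alpha, dmin, s0):
    # Same definition as in the cell above, repeated for self-containment.
    std = s0 * np.sqrt(2 / N)
    z = norm.ppf(1 - alpha / 2)
    return norm.cdf(z * std, dmin, std) - norm.cdf(-z * std, dmin, std)


def solve_size(alpha, beta, dmin, s0, n_max=1e12):
    """Smallest per-group N with get_beta(N) <= beta, found by root finding.

    Assumes get_beta decreases with N and that the answer lies in [1, n_max].
    """
    root = brentq(lambda N: get_beta(N, alpha, dmin, s0) - beta, 1, n_max)
    return math.ceil(root)


p_gc = 0.20625  # baseline gross conversion
print(solve_size(0.05 / 3, 0.2, 0.01, np.sqrt(p_gc * (1 - p_gc))))
```

Under those assumptions this should return the same per-group size as the linear search, up to rounding at the boundary, in a handful of function evaluations.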

In [61]:
# we impose an overall alpha = 0.05 by using the Bonferroni correction
alpha = 0.05/3
# we require beta = 0.2 for each metric
# the result of each metric needs to be converted to the total number of cookies to be collected
N_gc = test_size(alpha, 0.2, 0.01, np.sqrt(p_gc*(1-p_gc)), N0=n_click) / (n_click/n)
N_nc = test_size(alpha, 0.2, 0.0075, np.sqrt(p_nc*(1-p_nc)), N0=n_click) / (n_click/n)
N_rt = test_size(alpha, 0.2, 0.01, np.sqrt(p_rt*(1-p_rt)), N0=n_enroll) / (n_enroll/n)
# the max number gives the total number of pageviews required for this experiment
tot_cookies = int(max(N_gc, N_nc, N_rt)*2)
print("Total number of pageviews needed: ", tot_cookies)

Total number of pageviews needed:  6322181


In [60]:
def analytical_size(alpha, beta, dmin, s0):
    nomin = 2*(norm.ppf(1-alpha/2)+norm.ppf(1-beta))**2
    denom = (dmin/s0)**2
    return nomin/denom # per group
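
`analytical_size` is the usual closed-form approximation N ≈ 2 (z_{1-α/2} + z_{1-β})² s0² / dmin², which ignores the small probability of rejecting in the wrong direction. As a rough check, assuming the baseline gross conversion of 0.20625 and the Bonferroni-corrected alpha used above, it can be called as follows and compared with the search-based result. The function is repeated so the example runs on its own, and the helper variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm


def analytical_size(alpha, beta, dmin, s0):
    # Repeated from the cell above so the example is self-contained.
    numer = 2 * (norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)) ** 2
    denom = (dmin / s0) ** 2
    return numer / denom  # per group


p_gc = 0.20625               # baseline gross conversion
click_rate = 3200 / 40000    # clicks per pageview

n_group = analytical_size(0.05 / 3, 0.2, 0.01, np.sqrt(p_gc * (1 - p_gc)))
pageviews = 2 * n_group / click_rate  # both groups, converted to pageviews
print(round(n_group), round(pageviews))
```

The result should land very close to the total that the iterative search reports for gross conversion.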

Since the change will most likely affect the number of enrollments, and thus the amount of payments, there is real business risk, so we shouldn't run the experiment on all traffic. We also do not want the experiment to run for too long, but we need at least 14 days to measure net conversion and retention. A maximum duration of 30 days therefore seems a reasonable constraint.

In [66]:
max_days = 30
max_cookies = max_days * n
print("Maximum number of cookies that can be collected in 30 days: ", max_cookies)

Maximum number of cookies that can be collected in 30 days:  1200000.0


In [67]:
# now if we look at the number of cookies required for each metric
print(2*N_gc, 2*N_nc, 2*N_rt)

856975.0 906075.0 6322181.818181817
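
To relate these totals to the 30-day limit, each one can be divided by the daily traffic. A small sketch, assuming 100% of the 40000 daily pageviews is diverted to the experiment and using the totals printed above; the dictionary and its keys exist only for illustration.

```python
daily_pageviews = 40000.0
max_days = 30

# Total pageviews required per metric (both groups), copied from the output above.
required = {
    "gross conversion": 856975.0,
    "net conversion": 906075.0,
    "retention": 6322181.818181817,
}

for metric, pageviews in required.items():
    days = pageviews / daily_pageviews
    verdict = "fits" if days <= max_days else "does not fit"
    print(f"{metric}: {days:.1f} days at full traffic ({verdict} in {max_days} days)")
```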


Even if we ran the experiment on 100% of traffic for a month, the samples wouldn't be enough to test the change in retention. So let's relax the requirements: drop the Bonferroni correction and use alpha = 0.05 for each metric. Note that the cell below also uses dmin = 0.01 for net conversion instead of 0.0075, while dmin for retention stays at 0.01.

In [74]:
# no Bonferroni correction this time: alpha = 0.05 for each metric
alpha = 0.05
# we require beta = 0.2 for each metric
# the result of each metric needs to be converted to the total number of cookies to be collected
N_gc = test_size(alpha, 0.2, 0.01, np.sqrt(p_gc*(1-p_gc)), N0=n_click) / (n_click/n)
N_nc = test_size(alpha, 0.2, 0.01, np.sqrt(p_nc*(1-p_nc)), N0=n_click) / (n_click/n)
N_rt = test_size(alpha, 0.2, 0.01, np.sqrt(p_rt*(1-p_rt)), N0=n_enroll) / (n_enroll/n)
print(2*N_gc,2*N_nc,2*N_rt)

642475.0 382100.0 4739878.787878788


It's still not enough for retention. One option is to remove this metric. Since retention and net conversion share the same numerator (user-ids that remain enrolled past the 14-day boundary) and are therefore quite similar, I think this is acceptable.

In [75]:
# if we remove the retention metric, the fraction of traffic we need over 30 days is
print(2*max(N_gc, N_nc) / max_cookies)

0.5353958333333333


## Experiment Analysis

## Follow up experiment