### Experimental Setup
After clicking "Start free trial", students would see a screen asking how many hours a week they can commit to studying. The screen is meant to improve the overall student experience by reducing the number of frustrated students. Students who answer fewer than 5 hours a week receive a message saying that Udacity courses usually require a greater time commitment and suggesting that they access the course materials for free instead. They can then choose either to access the materials for free or to sign up for the free trial anyway. More serious learners, those who answer 5 or more hours a week, would be taken through the checkout process as usual and get a free trial.

The idea is that directing students to the appropriate level of learning improves the overall student experience and coaches' capacity to support students.

**Null Hypothesis**: Adding the screen does not significantly reduce early course cancellations.

The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who would drop out from the free trial because they didn't have enough time—without significantly reducing the number of students to continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.

The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.



### Experimental Design

#### Metric Choice
**Invariant metrics**: They are metrics that remain invariant throughout the experiment. We use them for sanity checks by testing to see whether these metrics remain the same in both the control and experiment groups. 
- Number of cookies: That is, number of unique cookies to view the course overview page.
- Number of clicks: That is, the number of unique cookies to click the “Start free trial” button (which happens before the free trial screener is triggered).
- Click-through-probability: That is, number of unique cookies to click the “Start free trial” button divided by number of unique cookies to view the course overview page.

**Evaluation metrics**: They are metrics that we care about. These are the metrics to watch for, to see if they move in the direction that we have theorized prior to the experiment and if the change is sizable enough to continue with the experiment.

- Gross conversion: That is, the number of user-ids to complete checkout and enroll in the free trial divided by the number of unique cookies to click the “Start free trial” button. (dmin= 0.01)
- Retention: That is, the number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout. (dmin=0.01)
- Net conversion: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the “Start free trial” button. (dmin= 0.0075)


#### Calculating Standard Deviation
Udacity provides a table of baseline values.


| Metric | Baseline Value |
| --- | --- | 
| Unique cookies to view course overview page per day | 40000|
| Unique cookies to click "Start free trial" per day | 3200|
| Enrollments per day | 660 |
| Click-through-probability on "Start free trial" |  0.08 |
| Probability of enrolling, given click | 0.20625 |
| Probability of payment, given enroll | 0.53 |
| Probability of payment, given click | 0.1093125 |
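The probabilities in the table follow from the daily counts, so it is worth confirming that they agree before using them downstream. Below is a minimal sketch that uses only the values in the table above; the variable names are illustrative and not part of the original notebook.

```python
import math

# Baseline daily counts and probabilities copied from the table above.
cookies_per_day = 40000
clicks_per_day = 3200
enrollments_per_day = 660
p_payment_given_enroll = 0.53

# Click-through probability = clicks / pageviews.
ctp = clicks_per_day / cookies_per_day
# Probability of enrolling, given click = enrollments / clicks.
gross_conversion = enrollments_per_day / clicks_per_day
# Probability of payment, given click =
#   P(enroll | click) * P(payment | enroll).
net_conversion = gross_conversion * p_payment_given_enroll

print(ctp, gross_conversion, net_conversion)
```

The three printed values should agree with the table's 0.08, 0.20625, and 0.1093125 up to floating-point rounding.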

For each evaluation metric, I calculate an analytical estimate of its standard deviation, given a sample of 5,000 cookies visiting the course overview page.

We assume that each enrollment or payment event follows a binomial distribution.
The formula to estimate the standard deviation of the estimated probability is below.

Standard Deviation of $\hat{P}$ = $\sqrt{\frac{P(1-P)}{N}}$

- Standard deviation for Estimated Probability of Gross conversion is equal to $\sqrt{\frac{0.20625(1-0.20625)}{5000*0.08}}$ = 0.0202
- Standard deviation for Estimated Probability of Retention is equal to $\sqrt{\frac{0.53(1-0.53)}{5000*(660/40000)}}$ = 0.0549
- Standard deviation for Estimated Probability of Net Conversion is equal to $\sqrt{\frac{0.1093125(1-0.1093125)}{5000*0.08}}$ = 0.0156

In [172]:
# Standard deviation for Estimated Probability of Gross conversion
# N = number of unique cookies to click the “Start free trial” button:
# 5000 cookies times the click-through probability of 0.08.
# P = 0.20625 (probability of enrolling, given click).
# binom.std returns the standard deviation of the count of successes,
# so we divide by n to get the standard deviation of the proportion.
from scipy.stats import binom
n = 5000*0.08
p = 0.20625
print((binom.std(n,p))/n)


0.020230604137049392


In [173]:
# Standard deviation for Estimated Probability for Retention
# To calculate N, we'll need to multiply 5000 cookies by baseline fraction of the cookies enrolling, which is 5000*(660/40000)
# P = 0.53
# we'll use the fraction or proportion of successes so we'll divide by n
from scipy.stats import binom
n = 5000*(660/40000)
p = 0.53
print((binom.std(n,p))/n)

0.05494901217850908


In [174]:
# Standard deviation for Estimated Probability of Net Conversion 
# N = number of unique cookies to click the “Start free trial” button where we have 5000 cookies and probability of them 
# clicking "Start free trial" is 0.08
# P = 0.1093125
n = 5000*(0.08)
p = 0.1093125
print((binom.std(n,p))/n)

0.015601544582488459


### Determination of the Sample Size
We'll calculate the sample size for the selected evaluation metrics using $\alpha$ = 0.05 and $\beta$ = 0.20.

We'll use the [online sample size calculator](https://www.evanmiller.org/ab-testing/sample-size.html). 
- Clicks required for Gross Conversion: 25,835
    - Baseline conversion rate: 0.20625, minimum detectable effect (dmin = 0.01)
- Enrollments required for Retention: 39,115
    - Baseline conversion rate: 0.53, minimum detectable effect (dmin = 0.01)
- Clicks required for Net Conversion: 27,413
    - Baseline conversion rate: 0.1093125, minimum detectable effect (dmin = 0.0075)
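The calculator's figures can be approximated in code. The sketch below uses a common closed-form sample-size formula for a two-sided test on two proportions. It is an assumption that this corresponds to what the online calculator does internally, so the results should land close to the numbers above but not necessarily on them. `sample_size_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p, dmin, alpha=0.05, beta=0.20):
    """Approximate units needed per group to detect an absolute change dmin.

    Assumes a two-sided test at level alpha with power 1 - beta.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    # Standard deviation terms under the null and under the alternative.
    sd_null = math.sqrt(2 * p * (1 - p))
    sd_alt = math.sqrt(p * (1 - p) + (p + dmin) * (1 - p - dmin))
    return math.ceil(((z_alpha * sd_null + z_beta * sd_alt) / dmin) ** 2)

n_gross = sample_size_per_group(0.20625, 0.01)
n_retention = sample_size_per_group(0.53, 0.01)
n_net = sample_size_per_group(0.1093125, 0.0075)
print(n_gross, n_retention, n_net)
```

Any gap between these values and the calculator's output is a reason to prefer the calculator's numbers, which are the ones used in the rest of this analysis.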
    
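Because these are per-group sizes, we multiply them by 2 to cover both the control and experiment groups. We then divide by the fraction of pageviews that produce a click (0.08) or, for retention, an enrollment (660/40000) to convert them into pageviews.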

| Metrics | # Pageviews Required |
| --- | --- | 
|Gross Conversion |(25835 X 2)/0.08 = 645875|
|Retention|(39115 X 2)/(660/40000) = 4741212|
|Net Conversion|(27413 X 2)/0.08 = 685325|

To test all three metrics, we'll need 4,741,212 pageviews. 
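The conversion from per-group sample sizes to pageviews can be sketched as follows. The sample sizes and rates come from the text and the baseline table; the dictionary names are illustrative.

```python
# Per-group sample sizes reported by the online calculator.
sample_sizes = {
    "Gross Conversion": 25835,
    "Retention": 39115,
    "Net Conversion": 27413,
}

# Units of analysis generated per pageview, from the baseline table.
units_per_pageview = {
    "Gross Conversion": 0.08,      # clicks per pageview
    "Retention": 660 / 40000,      # enrollments per pageview
    "Net Conversion": 0.08,        # clicks per pageview
}

# Two groups (control and experiment), then scale up to pageviews.
pageviews_required = {
    metric: round(2 * n / units_per_pageview[metric])
    for metric, n in sample_sizes.items()
}

print(pageviews_required)
print(max(pageviews_required.values()))
```

The largest requirement drives the total, which is why retention dominates the pageview count.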

### Duration and Exposure
Keeping in mind the number of pageviews required to detect an effect, we need to decide how long to run the experiment.

With 40,000 pageviews a day, the experiment would take about 119 days (4741212/40000) even if all traffic were diverted to it. Because the retention metric requires far more pageviews than the other two, we can limit the test to the **gross conversion** and **net conversion** metrics. This brings the duration down to about 17 days (685325/40000) at 100% diversion and about 34 days at 50% diversion. Diverting somewhere between 50% and 100% of traffic keeps the experiment between 17 and 34 days, which is still long enough to observe the 14-day enrollment boundary.
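The trade-off between diversion and duration is simple arithmetic, sketched below. `days_required` is a hypothetical helper, and the 75% diversion level is only an example value.

```python
def days_required(pageviews_needed, fraction_diverted, pageviews_per_day=40000):
    """Days needed to collect the pageviews at a given traffic fraction."""
    return pageviews_needed / (pageviews_per_day * fraction_diverted)

# All three metrics (retention drives the total) at full diversion.
print(round(days_required(4741212, 1.0), 1))

# Gross and net conversion only, at several diversion levels.
for fraction in (1.0, 0.75, 0.5):
    print(fraction, round(days_required(685325, fraction), 1))
```

A lower diversion fraction limits exposure to a possibly harmful change at the cost of a longer run.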



### Experiment Analysis
The data for analysis is [here](https://docs.google.com/spreadsheets/d/1wTXzNVkMgVG2bzTGJeQ95jbMlfcAiVrQBRANzUdHGe0/edit?usp=sharing)
#### Sanity Checks

Sanity checks are tests to ensure that the invariant metrics are statistically equivalent in the control and experiment groups. Our invariant metrics are number of cookies, number of clicks, and click-through probability. We test for equal diversion at the 95% confidence level. 

See below for the detailed calculations, which use the margin of error for a 95% confidence interval (z = 1.96). For pageviews and clicks, the observed control-group fraction falls within the confidence interval around the expected value of 0.5. For click-through probability, the confidence interval around the observed difference contains the expected value of 0. All three invariant metrics therefore pass the sanity checks. 

| Metrics | Expected Value | Observed Value | CI Lower Bound| CI Upper Bound | Result|
| --- | --- | --- | --- | --- | --- |
|Number of Pageviews/Cookies|0.5|0.5006|0.4988|0.5012| Pass|
|Number of clicks on "Start free trial"|0.5|0.5005|0.4959|0.5041|Pass|
|Click-through-probability|0|0.00006|-0.0012|0.0014|Pass|
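The first two rows of the table can be reproduced from group totals alone. The sketch below hard-codes the control and experiment sums that the notebook cells further down print, so it runs without the CSV files. `fraction_sanity_check` is a hypothetical helper name.

```python
import math

def fraction_sanity_check(control_total, experiment_total, p=0.5, z=1.96):
    """Check that the control share of a count is consistent with p."""
    total = control_total + experiment_total
    se = math.sqrt(p * (1 - p) / total)      # standard error of the share
    lower, upper = p - z * se, p + z * se    # 95% CI around the expected p
    observed = control_total / total
    return observed, lower, upper, lower <= observed <= upper

# Group totals copied from the aggregated notebook output.
pageviews = fraction_sanity_check(345543, 344660)
clicks = fraction_sanity_check(28378, 28325)
print(pageviews)
print(clicks)
```

The last element of each tuple is the pass/fail flag reported in the Result column.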


In [175]:
import pandas as pd
import numpy as np

In [176]:
df_control = pd.read_csv("ABTesting_ControlData.csv")
df_experiment = pd.read_csv("ABTesting_ExperimentData.csv")
df_control.head()
df_experiment.head()

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
0,"Sat, Oct 11",7716,686,105.0,34.0
1,"Sun, Oct 12",9288,785,116.0,91.0
2,"Mon, Oct 13",10480,884,145.0,79.0
3,"Tue, Oct 14",9867,827,138.0,92.0
4,"Wed, Oct 15",9793,832,140.0,94.0


In [177]:
# We'll label each group and stack the two dataframes together.
# DataFrame.append was removed in pandas 2.0, so we use pd.concat,
# which keeps each frame's original row index.
df_control['Group'] = "Control"
df_experiment['Group'] = "Experiment"
df_alldata = pd.concat([df_control, df_experiment])
df_alldata

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments,Group
0,"Sat, Oct 11",7723,687,134.0,70.0,Control
1,"Sun, Oct 12",9102,779,147.0,70.0,Control
2,"Mon, Oct 13",10511,909,167.0,95.0,Control
3,"Tue, Oct 14",9871,836,156.0,105.0,Control
4,"Wed, Oct 15",10014,837,163.0,64.0,Control
...,...,...,...,...,...,...
32,"Wed, Nov 12",10042,802,,,Experiment
33,"Thu, Nov 13",9721,829,,,Experiment
34,"Fri, Nov 14",9304,770,,,Experiment
35,"Sat, Nov 15",8668,724,,,Experiment


In [178]:
# Pageviews
aggs_pageviews = df_alldata.groupby('Group')['Pageviews'].agg(['sum', 'mean', 'std', 'var'])

# Fraction of all pageviews that landed in the control group
diff_pageviews =  df_control.Pageviews.sum()/df_alldata.Pageviews.sum()
print(aggs_pageviews)
# Pageviews Standard error
# We expect the control group and the experiment group to each account
# for 50% of the total number of pageviews (cookies).
p = 0.5
SE_pageviews = np.sqrt((p * (1- p))/df_alldata.Pageviews.sum())

## margin of error for 95% confidence interval (z = 1.96)

ME_pageviews = SE_pageviews * 1.96
CI_upper_pageview = p + ME_pageviews
CI_lower_pageview = p - ME_pageviews

print( diff_pageviews, SE_pageviews, ME_pageviews, CI_lower_pageview, CI_upper_pageview)

               sum         mean         std            var
Group                                                     
Control     345543  9339.000000  740.239563  547954.611111
Experiment  344660  9315.135135  708.070781  501364.231231
0.5006396668806133 0.0006018407402943247 0.0011796078509768765 0.49882039214902313 0.5011796078509769


In [179]:
# Clicks
aggs_clicks = df_alldata.groupby('Group')['Clicks'].agg(['sum', 'mean', 'std', 'var'])
print(aggs_clicks)
# Fraction of all clicks that landed in the control group
diff_clicks =  df_control.Clicks.sum()/df_alldata.Clicks.sum()

# Clicks Standard error
# We expect the control group and the experiment group to each account
# for 50% of the total number of clicks.
p = 0.5
SE_clicks = np.sqrt((p * (1- p))/df_alldata.Clicks.sum())
## margin of error for 95% confidence interval (z = 1.96)
ME_clicks = SE_clicks * 1.96
CI_upper_clicks = p + ME_clicks
CI_lower_clicks = p - ME_clicks

# Note the print order: margin of error first, then standard error
print( diff_clicks, ME_clicks, SE_clicks, CI_lower_clicks, CI_upper_clicks)

              sum        mean        std          var
Group                                                
Control     28378  766.972973  68.286767  4663.082583
Experiment  28325  765.540541  64.578374  4170.366366
0.5004673474066628 0.0041155042762105335 0.002099747079699252 0.49588449572378945 0.5041155042762105


To perform the sanity check for click-through probability, we expect the difference between the two groups to be zero.
Standard error of the difference between two proportions = $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$


In [180]:
# click through probability (clicks/cookies)

## control value 
cont_p_hat = df_control.Clicks.sum()/df_control.Pageviews.sum()
print(cont_p_hat)

## experimental value
exp_p_hat = df_experiment.Clicks.sum()/df_experiment.Pageviews.sum()
print(exp_p_hat)

## observed difference
diff_ClickProb = exp_p_hat - cont_p_hat

# We expect the difference to be 0. Let's see if this is the case.

## Standard Error

SE_ClickProb = np.sqrt((cont_p_hat*(1-cont_p_hat)/df_control.Pageviews.sum()) + (exp_p_hat*(1-exp_p_hat))/df_experiment.Pageviews.sum())

## margin of error for 95% confidence interval (z = 1.96)

ME_ClickProb = SE_ClickProb * 1.96

## CI
upper_ClickProb = (exp_p_hat - cont_p_hat) + ME_ClickProb
lower_ClickProb = (exp_p_hat - cont_p_hat) - ME_ClickProb
print(diff_ClickProb, ME_ClickProb, SE_ClickProb, upper_ClickProb, lower_ClickProb)

0.08212581357457682
0.08218244066616376
5.662709158693602e-05 0.0012956797119073678 0.0006610610775037591 0.0013523068034943038 -0.0012390526203204318


#### Effect Size Tests
A metric is statistically significant if the confidence interval does not include 0 (that is, you can be confident there was a change), and it is practically significant if the confidence interval does not include the practical significance boundary (that is, you can be confident there is a change that matters to the business.)

We'll check on the gross conversion and net conversion metrics.
Please see below for detailed calculations.

| Metrics | dmin | Observed Difference | CI Lower Bound| CI Upper Bound | Result|
| --- | --- | --- | --- | --- | --- |
|Gross Conversion|0.01|-0.0205|-0.0291|-0.012|Statistically and practically significant|
|Net Conversion|0.0075|-0.0048|-0.0116|0.0019|Neither statistically nor practically significant|
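The two definitions above can be turned into a small check. The sketch below assumes that "does not include the practical significance boundary" means the whole interval lies beyond +dmin or -dmin. `classify_effect` is a hypothetical helper, and the inputs are the rounded bounds from the table.

```python
def classify_effect(ci_lower, ci_upper, dmin):
    """Return (statistically significant, practically significant)."""
    # Statistically significant: the interval excludes 0.
    stat_sig = ci_lower > 0 or ci_upper < 0
    # Practically significant: the whole interval lies beyond +dmin or -dmin.
    prac_sig = ci_lower > dmin or ci_upper < -dmin
    return stat_sig, prac_sig

gross = classify_effect(-0.0291, -0.0120, dmin=0.01)
net = classify_effect(-0.0116, 0.0019, dmin=0.0075)
print("Gross Conversion:", gross)
print("Net Conversion:", net)
```

Note that the net conversion interval contains -0.0075, so a decrease of practical size cannot be ruled out even though the result is not statistically significant.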

In [181]:
# Gross conversion
"""
Gross conversion: That is, the number of user-ids to complete checkout and 
enroll in the free trial divided by the 
number of unique cookies to click the “Start free trial” button. (dmin= 0.01)
"""

## control value 
cont_grossconversion= df_control.Enrollments.sum()/df_control.loc[df_control['Enrollments'].notnull(), 'Clicks'].sum()

## experimental value
exp_grossconversion= df_experiment.Enrollments.sum()/df_experiment.loc[df_experiment['Enrollments'].notnull(), 'Clicks'].sum()


## observed difference
diff_grossconversion = exp_grossconversion - cont_grossconversion

## Standard Error

SE_grossconversion = np.sqrt((cont_grossconversion*(1-cont_grossconversion)/df_control.loc[df_control['Enrollments'].notnull(), 'Clicks'].sum()) + (exp_grossconversion*(1-exp_grossconversion))/df_experiment.loc[df_experiment['Enrollments'].notnull(), 'Clicks'].sum())

## margin of error for 95% confidence interval (z = 1.96)

ME_grossconversion = SE_grossconversion * 1.96

## CI
upper_grossconversion = (exp_grossconversion - cont_grossconversion) + ME_grossconversion
lower_grossconversion = (exp_grossconversion - cont_grossconversion) - ME_grossconversion
print(diff_grossconversion, ME_grossconversion, SE_grossconversion, upper_grossconversion, lower_grossconversion)


-0.020554874580361565 0.008565445227686982 0.004370125116166828 -0.011989429352674583 -0.029120319808048547


In [182]:

# Net Conversion

"""
Net conversion: That is, number of user-ids to remain enrolled 
    past the 14-day boundary (and thus make at least one payment) 
    divided by the number of unique cookies to click the 
    “Start free trial” button. (dmin= 0.0075)
"""


## control value 
cont_netconversion= df_control.Payments.sum()/df_control.loc[df_control['Enrollments'].notnull(), 'Clicks'].sum()

## experimental value
exp_netconversion= df_experiment.Payments.sum()/df_experiment.loc[df_experiment['Enrollments'].notnull(), 'Clicks'].sum()


## observed difference
diff_netconversion = exp_netconversion - cont_netconversion

## Standard Error

SE_netconversion = np.sqrt((cont_netconversion*(1-cont_netconversion)/df_control.loc[df_control['Enrollments'].notnull(), 'Clicks'].sum()) + (exp_netconversion*(1-exp_netconversion))/df_experiment.loc[df_experiment['Enrollments'].notnull(), 'Clicks'].sum())

## margin of error for 95% confidence interval (z = 1.96)

ME_netconversion = SE_netconversion * 1.96

## CI
upper_netconversion = (exp_netconversion - cont_netconversion) + ME_netconversion
lower_netconversion = (exp_netconversion - cont_netconversion) - ME_netconversion
print(diff_netconversion, ME_netconversion, SE_netconversion, upper_netconversion, lower_netconversion)


-0.0048737226745441675 0.006730587137842911 0.0034339730295116894 0.0018568644632987437 -0.011604309812387078


#### Sign Test
The sign test is another way to validate the results above. I used this [online calculator](https://www.graphpad.com/quickcalcs/binomial1/).


| Metrics | # of successes | # of trials | p-value for sign test | Statistically significant at $\alpha$ = 0.05 | 
| --- | --- | --- | --- | --- | 
|Gross Conversion|4|23|0.0026|Yes|
|Net Conversion|10|23|0.6776|No|

Please see the [Sign Test File](https://docs.google.com/spreadsheets/d/1q7lq2hec5O4jtIIcgW1lU6fx2e2AGTnC/edit?usp=sharing&ouid=108342310459873484202&rtpof=true&sd=true) for more details.
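The p-values from the online calculator can also be computed directly. The sketch below implements a two-sided binomial test against p = 0.5 with the standard library only, using the counts from the table above (4 of 23 and 10 of 23). `sign_test_p_value` is a hypothetical helper name.

```python
from math import comb

def sign_test_p_value(successes, trials):
    """Two-sided binomial test of `successes` out of `trials` against p = 0.5."""
    def cdf(k):
        # P(X <= k) for X ~ Binomial(trials, 0.5); an empty sum gives 0.
        return sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials

    lower_tail = cdf(successes)            # P(X <= successes)
    upper_tail = 1 - cdf(successes - 1)    # P(X >= successes)
    # Double the smaller tail, which is valid because p = 0.5 is symmetric.
    return min(1.0, 2 * min(lower_tail, upper_tail))

p_4_of_23 = sign_test_p_value(4, 23)
p_10_of_23 = sign_test_p_value(10, 23)
print(round(p_4_of_23, 4), round(p_10_of_23, 4))
```

With SciPy 1.7 or later, `scipy.stats.binomtest(4, 23, 0.5).pvalue` should give the same two-sided result.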


### Recommendations
This experiment introduced a screener to help filter out less dedicated students. Gross conversion decreased by a statistically and practically significant amount, so the screener does reduce enrollment by less serious students. Net conversion, however, showed no statistically significant change, and its confidence interval (-0.0116 to 0.0019) includes the negative practical significance boundary of -0.0075, so we cannot rule out a decrease in paying students that would matter to the business. I would therefore not recommend launching this screener.