# Udacity AB Testing Final Project

This is the final project for Udacity course A/B Testing.

## 1. Experiment Overview: Free Trial Screener

At the time of this experiment, Udacity courses had two options on the course overview page: "start free trial" and "access course materials". If the student clicks "start free trial", they will be asked to enter their credit card information, and then they will be enrolled in a free trial for the paid version of the course. After 14 days, they will automatically be charged unless they cancel first. If the student clicks "access course materials", they will be able to view the videos and take the quizzes for free, but they will not receive coaching support or a verified certificate, and they will not submit their final project for feedback.


In the experiment, Udacity tested a change where if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead. This [screenshot](https://drive.google.com/file/d/0ByAfiG8HpNUMakVrS0s4cGN2TjQ/view) shows what the experiment looks like.


The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time—without significantly reducing the number of students to continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.


__The unit of diversion is a cookie__, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.


## 2. Experiment design

### 2.1 Metric Choice

Which of the following metrics would you choose to measure for this experiment and why? For each metric you choose, indicate whether you would use it as an invariant metric or an evaluation metric. The practical significance boundary for each metric, that is, the difference that would have to be observed before that was a meaningful change for the business, is given in parentheses. All practical significance boundaries are given as absolute changes.


Any place "unique cookies" are mentioned, the uniqueness is determined by day. (That is, the same cookie visiting on different days would be counted twice.) User-ids are automatically unique since the site does not allow the same user-id to enroll twice.


* Number of cookies: That is, number of unique cookies to view the course overview page. (dmin=3000)
* Number of user-ids: That is, number of users who enroll in the free trial. (dmin=50)
* Number of clicks: That is, number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered). (dmin=240)
* Click-through-probability: That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page. (dmin=0.01)
* Gross conversion: That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button. (dmin=0.01)
* Retention: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout. (dmin=0.01)
* Net conversion: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button. (dmin=0.0075)
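The ratio metrics above are all built from the same four daily counts. A minimal sketch, using a hypothetical `DailyCounts` container and `compute_metrics` helper (neither is part of the original notebook), plugs in the baseline values from the spreadsheet in Section 2.2; the payments figure is assumed to be 660 × 0.53, implied by the baseline retention rather than observed. It also makes visible that net conversion factors into gross conversion times retention.

```python
from dataclasses import dataclass


@dataclass
class DailyCounts:
    pageviews: float    # unique cookies that viewed the course overview page
    clicks: float       # unique cookies that clicked "Start free trial"
    enrollments: float  # user-ids that completed checkout (enrolled in the free trial)
    payments: float     # user-ids that stayed past the 14-day boundary


def compute_metrics(c: DailyCounts) -> dict:
    """Hypothetical helper: the ratio metrics defined in Section 2.1."""
    return {
        "click_through_probability": c.clicks / c.pageviews,
        "gross_conversion": c.enrollments / c.clicks,
        "retention": c.payments / c.enrollments,
        "net_conversion": c.payments / c.clicks,
    }


# Baseline values; payments = 660 * 0.53 is an assumption implied by baseline retention.
baseline = DailyCounts(pageviews=40000, clicks=3200, enrollments=660, payments=660 * 0.53)
m = compute_metrics(baseline)
print(m)

# Net conversion = gross conversion * retention (payments/clicks = enrollments/clicks * payments/enrollments)
print(abs(m["net_conversion"] - m["gross_conversion"] * m["retention"]) < 1e-12)
```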

#### 2.1.1 Invariant Metrics

_Invariant metrics are the metrics that should not change across the control and experiment groups during the experiment._

* _Number of cookies: This is the unit of diversion, so cookies should be split evenly between the control and experiment groups. The number of cookies can therefore be used as an invariant metric._

* _Number of clicks: Users click "Start free trial" before the free trial screener is triggered, so the number of clicks should not differ between the control and experiment groups and can be used as an invariant metric._

* _Click-through-probability: At this stage users have not yet seen the screener, so we expect the click-through-probability to be the same in both groups; it can be used as an invariant metric._

#### 2.1.2 Evaluation Metrics

_Evaluation metrics are the metrics that are used to measure the difference across the control and experiment groups during the experiment._

* _Gross conversion: This can be used as an evaluation metric. Enrollment happens after the screener has appeared, so if the screener has any effect, it should show up as a change in gross conversion._

* _Retention: If the screener works, students in the experiment group are less likely to drop out at the end of the 14-day trial simply because they could not spend enough time per week on the course. Retention should therefore change, and it can be used as an evaluation metric._

* _Net conversion: In the experiment group, we expect students who do not have enough time to leave before they enroll in the free trial; in the control group, the same students may enroll, become frustrated, and leave during the 14-day trial. Whenever they leave, they do not pay, so the number of students who remain enrolled past the trial should be similar in both groups. In other words, we expect no significant reduction in the number of students who continue past the free trial and eventually complete the course, so net conversion should not change significantly._

### 2.2 Measuring Variability
This [spreadsheet](https://docs.google.com/spreadsheets/d/1MYNUtC47Pg8hdoCjOXaHqF-thheGpUshrFA21BAJnNc/edit#gid=0) contains rough estimates of the baseline values for these metrics (again, these numbers have been changed from Udacity's true numbers).


For each metric you selected as an evaluation metric, estimate its standard deviation analytically. Do you expect the analytic estimates to be accurate? That is, for which metrics, if any, would you want to collect an empirical estimate of the variability if you had time?
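For reference, the analytic estimate used in the next cell treats each evaluation metric as a binomial proportion $\hat p$ whose denominator contains $N$ units in a sample of 5,000 pageviews, with $N$ scaled from the baseline daily counts (clicks for gross and net conversion, enrollments for retention):

$$
\mathrm{SE} = \sqrt{\frac{\hat p\,(1-\hat p)}{N}}, \qquad
N_{\text{clicks}} = 5000 \times \frac{3200}{40000} = 400, \qquad
N_{\text{enroll}} = 5000 \times \frac{660}{40000} = 82.5
$$

For example, for gross conversion:

$$
\mathrm{SE}_{\text{gross}} = \sqrt{\frac{0.20625 \times 0.79375}{400}} \approx 0.0202
$$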

```python
# @hidden_cell
# The project token is an authorization token used to access project resources such as
# data sources and connections; it is also used by the platform APIs.
from project_lib import Project
# Replace the placeholder with your own project access token (do not publish real tokens)
project = Project(project_id='d6948de9-f1e9-45af-96ba-62adf20fe437', project_access_token='<PROJECT_ACCESS_TOKEN>')
pc = project.project_context

import pandas as pd

def get_file_handle(fname):
    # Return a file handle for a data file stored in the project, rewound to the start
    data_path = project.get_file(fname)
    data_path.seek(0)
    return data_path

DATA_PATH = 'Final Project Baseline Values.csv'

data_path = get_file_handle(DATA_PATH)
# The file has no header row; with one name for two columns, pandas uses the first column as the index
baseline_data = pd.read_csv(data_path, names=['Baseline Values'])

baseline_data
```

| | Baseline Values |
|---|---|
| Unique cookies to view course overview page per day: | 40000.0 |
| Unique cookies to click "Start free trial" per day: | 3200.0 |
| Enrollments per day: | 660.0 |
| Click-through-probability on "Start free trial": | 0.08 |
| Probability of enrolling, given click: | 0.20625 |
| Probability of payment, given enroll: | 0.53 |
| Probability of payment, given click | 0.109313 |


```python
# Given a sample of 5,000 pageviews, add a column with the corresponding number of cookies
# that click "Start free trial" and the number of enrollments.
pageviews_key = 'Unique cookies to view course overview page per day:'
clicks_key = 'Unique cookies to click "Start free trial" per day:'
enroll_key = 'Enrollments per day:'

baseline_data['Sample'] = float('nan')
baseline_data.loc[pageviews_key, 'Sample'] = 5000

# Scale factor from daily baseline traffic to the 5,000-pageview sample
scale = baseline_data.loc[pageviews_key, 'Sample'] / baseline_data.loc[pageviews_key, 'Baseline Values']
baseline_data.loc[clicks_key, 'Sample'] = baseline_data.loc[clicks_key, 'Baseline Values'] * scale
baseline_data.loc[enroll_key, 'Sample'] = baseline_data.loc[enroll_key, 'Baseline Values'] * scale

baseline_data
```

| | Baseline Values | Sample |
|---|---|---|
| Unique cookies to view course overview page per day: | 40000.0 | 5000.0 |
| Unique cookies to click "Start free trial" per day: | 3200.0 | 400.0 |
| Enrollments per day: | 660.0 | 82.5 |
| Click-through-probability on "Start free trial": | 0.08 | |
| Probability of enrolling, given click: | 0.20625 | |
| Probability of payment, given enroll: | 0.53 | |
| Probability of payment, given click | 0.109313 | |


```python
import math

# Number of cookies that click "Start free trial" in the 5,000-pageview sample
number_cookies = baseline_data.loc['Unique cookies to click "Start free trial" per day:', 'Sample']
# Number of user-ids that enroll in the free trial in the same sample
number_id_enroll = baseline_data.loc['Enrollments per day:', 'Sample']

# Standard deviation of gross conversion (denominator: clicks)
SE_gc = math.sqrt(0.20625 * (1 - 0.20625) / number_cookies)
# Standard deviation of retention (denominator: enrollments)
SE_re = math.sqrt(0.53 * (1 - 0.53) / number_id_enroll)
# Standard deviation of net conversion (denominator: clicks)
SE_nc = math.sqrt(0.109313 * (1 - 0.109313) / number_cookies)

print('The standard deviation for gross conversion is {}'.format(round(SE_gc, 4)))
print('The standard deviation for retention is {}'.format(round(SE_re, 4)))
print('The standard deviation for net conversion is {}'.format(round(SE_nc, 4)))
```

```
The standard deviation for gross conversion is 0.0202
The standard deviation for retention is 0.0549
The standard deviation for net conversion is 0.0156
```


_I expect the analytic and empirical variances to be close for gross conversion and net conversion, because for both metrics the unit of analysis (cookies that click "Start free trial") matches the unit of diversion (cookies). For retention, however, the unit of analysis is the user-id while the unit of diversion is the cookie, so the analytic and empirical variances are unlikely to match; retention is the metric for which I would collect an empirical estimate of the variability if I had time._

### 2.3 Sizing

#### 2.3.1 Choosing Number of Samples given Power
Using the analytic estimates of variance, how many pageviews total (across both groups) would you need to collect to adequately power the experiment? Use an alpha of 0.05 and a beta of 0.2. Make sure you have enough power for each metric.

_I do not plan to use the Bonferroni correction for this project because it is usually too conservative. I will use an [online sample size calculator](https://www.evanmiller.org/ab-testing/sample-size.html) for the calculation._

_1. Gross conversion metric_

   _Given dmin = 0.01 and a baseline conversion rate of 0.20625, we need 25,835 unique cookies to click the "Start free trial" button in each group, so 25,835 × 2 = 51,670 across the control and experiment groups. From the baseline values, unique cookies to click "Start free trial" per day / unique cookies to view the course overview page per day = 3,200/40,000 = 0.08. To get 51,670 clicks, we need 51,670/0.08 = 645,875 pageviews._

_2. Retention metric_

  _Given dmin = 0.01 and a baseline retention rate of 0.53, we need 39,115 user-ids to complete checkout in each group, so 39,115 × 2 = 78,230 across both groups. Given enrollments per day / unique cookies to view the course overview page per day = 660/40,000 = 0.0165, we need 78,230/0.0165 = 4,741,213 pageviews (rounded up)._

_3. Net conversion metric_

  _Given dmin = 0.0075 and a baseline conversion rate of 0.1093125, we need 27,413 unique cookies to click the "Start free trial" button in each group, so 27,413 × 2 = 54,826 across both groups. With the same click-to-pageview ratio of 3,200/40,000 = 0.08, we need 54,826/0.08 = 685,325 pageviews._
  

_We need at least 4,741,213 pageviews to adequately power the experiment if we use all three metrics._
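The calculator's numbers can be reproduced offline. The sketch below is an assumption, not the calculator's published source: it uses the normal-approximation sample-size formula for comparing two proportions (alpha = 0.05 two-sided, power = 0.8) and reflects a baseline rate above 0.5 to 1 − p, a step included only because it reproduces the 39,115 figure for retention. The function name `sample_size_per_group` is hypothetical. Pageviews are computed from the integer daily counts (40,000 / 3,200 / 660) to avoid floating-point rounding surprises.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p: float, d_min: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Hypothetical helper: per-group n for a two-sided test of p vs. p + d_min.

    Assumption: rates above 0.5 are reflected to 1 - p (this matches the
    calculator's retention figure, but is not confirmed from its source).
    """
    if p > 0.5:
        p = 1.0 - p
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    sd_null = math.sqrt(2 * p * (1 - p))                             # spread if nothing changes
    sd_alt = math.sqrt(p * (1 - p) + (p + d_min) * (1 - p - d_min))  # spread under the minimum effect
    return round(((z_alpha * sd_null + z_beta * sd_alt) / d_min) ** 2)


# Baseline daily counts from the spreadsheet
PAGEVIEWS, CLICKS, ENROLLMENTS = 40000, 3200, 660

evaluation_metrics = {
    # name: (baseline rate, d_min, daily count of the metric's denominator)
    "gross conversion": (0.20625, 0.01, CLICKS),
    "retention": (0.53, 0.01, ENROLLMENTS),
    "net conversion": (0.1093125, 0.0075, CLICKS),
}

pageviews_needed = {}
for name, (p, d_min, denominator) in evaluation_metrics.items():
    n_per_group = sample_size_per_group(p, d_min)
    # Both groups need n units in the denominator; scale back up to pageviews
    pageviews_needed[name] = math.ceil(2 * n_per_group * PAGEVIEWS / denominator)
    print(f"{name}: {n_per_group} per group, {pageviews_needed[name]} pageviews")

print("Pageviews needed for all three metrics:", max(pageviews_needed.values()))
```

The experiment has to be sized for the most demanding metric, which is why the maximum over the three is taken at the end.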


#### 2.3.2 Choosing Duration vs. Exposure

What percentage of Udacity's traffic would you divert to this experiment (assuming there were no other experiments you wanted to run simultaneously)? Is the change risky enough that you wouldn't want to run on all traffic?


Given the percentage you chose, how long would the experiment take to run, using the analytic estimates of variance? If the answer is longer than a few weeks, then this is unreasonably long, and you should reconsider an earlier decision.

_If we use all three evaluation metrics, we need at least 4,741,213 pageviews. Given the baseline of 40,000 pageviews per day, that would take 4,741,213/40,000 ≈ 119 days with 100% of traffic exposed, which is too long and too expensive. To keep the experiment duration within a reasonable range, we can leave out the retention metric and use gross conversion and net conversion as the evaluation metrics. The number of pageviews needed is then 685,325, which takes 685,325/40,000 ≈ 18 days with 100% exposure, or 35 days with 50% exposure._
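The duration arithmetic above can be sketched as a small helper. `days_needed` is a hypothetical name, and the sketch assumes constant traffic of 40,000 pageviews per day with a fixed fraction of it diverted to the experiment:

```python
import math

PAGEVIEWS_PER_DAY = 40000  # baseline daily traffic from the spreadsheet


def days_needed(pageviews_required: int, exposure: float) -> int:
    """Days to collect the required pageviews when a fraction `exposure` of traffic is diverted."""
    return math.ceil(pageviews_required / (PAGEVIEWS_PER_DAY * exposure))


for label, required in [("all three metrics", 4741213), ("gross + net conversion", 685325)]:
    for exposure in (1.0, 0.5):
        print(f"{label}, {exposure:.0%} exposure: {days_needed(required, exposure)} days")
```

Because real traffic varies by day of week, the actual duration could differ slightly from this constant-traffic estimate.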

## 3. Experiment Analysis

The data for you to analyze is [here](https://docs.google.com/spreadsheets/d/1Mu5u9GrybDdska-ljPXyBjTpdZIUev_6i7t4LRDfXM8/edit#gid=0). This data contains the raw information needed to compute the above metrics, broken down day by day. Note that there are two sheets within the spreadsheet - one for the experiment group, and one for the control group.


The meaning of each column is:

* Pageviews: Number of unique cookies to view the course overview page that day.
* Clicks: Number of unique cookies to click the "Start free trial" button that day.
* Enrollments: Number of user-ids to enroll in the free trial that day.
* Payments: Number of user-ids who enrolled on that day and remained enrolled for 14 days, thus making a payment. (Note that the date for this column is the start date, that is, the date of enrollment, rather than the date of the payment. The payment happened 14 days later. Because of this, the enrollments and payments are tracked for 14 fewer days than the other columns.)

### 3.1 Sanity Checks
Start by checking whether your invariant metrics are equivalent between the two groups. If the invariant metric is a simple count that should be randomly split between the 2 groups, you can use a binomial test as demonstrated in Lesson 5. Otherwise, you will need to construct a confidence interval for a difference in proportions using a similar strategy as in Lesson 1, then check whether the difference between group values falls within that confidence interval.


If your sanity checks fail, look at the day by day data and see if you can offer any insight into what is causing the problem.

```python
# Define the file names
DATA_PATH_control = 'Copy of Final Project Results - Control.csv'
DATA_PATH_experiment = 'Copy of Final Project Results - Experiment.csv'

# Read the data with pandas.
# The `Date` column holds date information, so we pass it to parse_dates for easier processing
data_path_control = get_file_handle(DATA_PATH_control)
control_data = pd.read_csv(data_path_control, parse_dates=['Date'])

data_path_experiment = get_file_handle(DATA_PATH_experiment)
experiment_data = pd.read_csv(data_path_experiment, parse_dates=['Date'])

experiment_data
```

| | Date | Pageviews | Clicks | Enrollments | Payments |
|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7716 | 686 | 105.0 | 34.0 |
| 1 | Sun, Oct 12 | 9288 | 785 | 116.0 | 91.0 |
| 2 | Mon, Oct 13 | 10480 | 884 | 145.0 | 79.0 |
| 3 | Tue, Oct 14 | 9867 | 827 | 138.0 | 92.0 |
| 4 | Wed, Oct 15 | 9793 | 832 | 140.0 | 94.0 |
| 5 | Thu, Oct 16 | 9500 | 788 | 129.0 | 61.0 |
| 6 | Fri, Oct 17 | 9088 | 780 | 127.0 | 44.0 |
| 7 | Sat, Oct 18 | 7664 | 652 | 94.0 | 62.0 |
| 8 | Sun, Oct 19 | 8434 | 697 | 120.0 | 77.0 |
| 9 | Mon, Oct 20 | 10496 | 860 | 153.0 | 98.0 |


```python
# Sanity check for pageviews (metric #1)

# Total pageviews in the control and experiment groups
pageviews_con = control_data['Pageviews'].sum()
pageviews_exp = experiment_data['Pageviews'].sum()

# If cookies are split evenly, each pageview lands in the control group with probability 0.5,
# so the control fraction has standard error sqrt(0.5 * 0.5 / N)
SE_pageviews = math.sqrt(0.5 * (1 - 0.5) * (1 / (pageviews_con + pageviews_exp)))

# Margin of error at the 95% confidence level
ME_pageviews = 1.96 * SE_pageviews

print("Confidence interval (pageviews) ({},{})".format(0.5 - ME_pageviews, 0.5 + ME_pageviews))
print("The observed value (pageviews) {}".format(pageviews_con / (pageviews_con + pageviews_exp)))
```

```
Confidence interval (pageviews) (0.49882039214902313,0.5011796078509769)
The observed value (pageviews) 0.5006396668806133
```


```python
# Sanity check for number of clicks (metric #3)

# Total clicks in the control and experiment groups
clicks_con = control_data['Clicks'].sum()
clicks_exp = experiment_data['Clicks'].sum()

# SE of the control fraction of clicks: if cookies are split evenly,
# each click lands in the control group with probability 0.5
SE_clicks = math.sqrt(0.5 * (1 - 0.5) * (1 / (clicks_con + clicks_exp)))

# Margin of error at the 95% confidence level
ME_clicks = 1.96 * SE_clicks

print("Confidence interval (clicks) ({},{})".format(0.5 - ME_clicks, 0.5 + ME_clicks))
print("The observed value (clicks) {}".format(clicks_con / (clicks_con + clicks_exp)))
```

```
Confidence interval (clicks) (0.49588449572378945,0.5041155042762105)
The observed value (clicks) 0.5004673474066628
```


```python
# Sanity check for click-through-probability (metric #4)

# Pooled click-through-probability across both groups
p_pooled = (clicks_con + clicks_exp) / (pageviews_con + pageviews_exp)

# Pooled standard error for the difference in CTP between the two groups
SE_ctp = math.sqrt(p_pooled * (1 - p_pooled) * (1 / pageviews_con + 1 / pageviews_exp))

# Margin of error at the 95% confidence level
ME_ctp = 1.96 * SE_ctp

# Observed difference in CTP (experiment - control); it should fall inside (-ME, ME)
d = clicks_exp / pageviews_exp - clicks_con / pageviews_con

print("Confidence interval (CTP difference) ({},{})".format(-ME_ctp, ME_ctp))
print("The observed value (CTP difference) {}".format(d))
```

```
Confidence interval (CTP difference) (-0.0012956791986518956,0.0012956791986518956)
The observed value (CTP difference) 5.662709158693602e-05
```


_All three invariant metrics passed sanity check._

### 3.2 Result Analysis

#### 3.2.1 Effect Size Tests


Next, for your evaluation metrics, calculate a confidence interval for the difference between the experiment and control groups, and check whether each metric is statistically and/or practically significant. A metric is statistically significant if the confidence interval does not include 0 (that is, you can be confident there was a change), and it is practically significant if the confidence interval does not include the practical significance boundary (that is, you can be confident there is a change that matters to the business).
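The two significance rules can be written as a small check on a confidence interval. This is a sketch with a hypothetical `classify_interval` helper; it reads "does not include the practical significance boundary" as "lies entirely beyond ±dmin in one direction", which is one common interpretation. The usage lines plug in the intervals printed by the gross- and net-conversion cells below.

```python
from typing import Tuple


def classify_interval(lower: float, upper: float, d_min: float) -> Tuple[bool, bool]:
    """Hypothetical helper returning (statistically_significant, practically_significant).

    Statistically significant: the interval excludes 0.
    Practically significant (assumed reading): the whole interval lies beyond the
    boundary in one direction, i.e. upper < -d_min or lower > d_min.
    """
    statistically = not (lower <= 0.0 <= upper)
    practically = upper < -d_min or lower > d_min
    return statistically, practically


# Intervals printed later in this notebook
print("Gross conversion:", classify_interval(-0.0291233583354044, -0.01198639082531873, 0.01))
print("Net conversion:", classify_interval(-0.011604624359891718, 0.001857179010803383, 0.0075))
```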


If you have chosen multiple evaluation metrics, you will need to decide whether to use the Bonferroni correction. When deciding, keep in mind the results you are looking for in order to launch the experiment. Will the fact that you have multiple metrics make those results more likely to occur by chance than the alpha level of 0.05?


```python
# Gross conversion: number of user-ids to complete checkout and enroll in the free trial divided by
# number of unique cookies to click the "Start free trial" button (dmin = 0.01)

# Enrollments are only available for the first 23 days (rows 0-22; .loc slicing is inclusive),
# so clicks are restricted to the same days
enroll_con = control_data.loc[0:22, 'Enrollments'].sum()
enroll_exp = experiment_data.loc[0:22, 'Enrollments'].sum()
clicks_con_slice = control_data.loc[0:22, 'Clicks'].sum()
clicks_exp_slice = experiment_data.loc[0:22, 'Clicks'].sum()

# Pooled proportion and pooled standard error of the difference
p_pooled_gc = (enroll_con + enroll_exp) / (clicks_con_slice + clicks_exp_slice)
SE_gc = math.sqrt(p_pooled_gc * (1 - p_pooled_gc) * (1 / clicks_con_slice + 1 / clicks_exp_slice))
ME_gc = 1.96 * SE_gc
d_gc = enroll_exp / clicks_exp_slice - enroll_con / clicks_con_slice

print("Confidence interval for Gross conversion difference is ({},{})".format(d_gc - ME_gc, d_gc + ME_gc))
print("d difference is {}".format(d_gc))
```

```
Confidence interval for Gross conversion difference is (-0.0291233583354044,-0.01198639082531873)
d difference is -0.020554874580361565
```


```python
# Net conversion: number of user-ids to remain enrolled past the 14-day boundary (and thus make at
# least one payment) divided by the number of unique cookies to click the "Start free trial" button
# (dmin = 0.0075). Uses the same 23 days (and clicks) as the gross conversion cell.
payments_con = control_data.loc[0:22, 'Payments'].sum()
payments_exp = experiment_data.loc[0:22, 'Payments'].sum()

# Pooled proportion and pooled standard error of the difference
p_pooled_nc = (payments_con + payments_exp) / (clicks_con_slice + clicks_exp_slice)
SE_nc = math.sqrt(p_pooled_nc * (1 - p_pooled_nc) * (1 / clicks_con_slice + 1 / clicks_exp_slice))
ME_nc = 1.96 * SE_nc
d_nc = payments_exp / clicks_exp_slice - payments_con / clicks_con_slice

print("Confidence interval for Net conversion difference is ({},{})".format(d_nc - ME_nc, d_nc + ME_nc))
print("d difference is {}".format(d_nc))
```

```
Confidence interval for Net conversion difference is (-0.011604624359891718,0.001857179010803383)
d difference is -0.0048737226745441675
```


_Gross conversion metric:
The confidence interval (-0.0291, -0.0120) includes neither zero nor the practical significance boundary (-0.01). This metric is both statistically and practically significant._

_Net conversion metric:
The confidence interval (-0.0116, 0.0019) includes both zero and the practical significance boundary (-0.0075). This metric is neither statistically nor practically significant._

_I didn't use the Bonferroni correction. The Bonferroni correction can be conservative if there are a large number of tests and/or the test statistics are positively correlated [1](https://en.wikipedia.org/wiki/Bonferroni_correction). We expect our metrics to be correlated, so the correction could be too conservative in our case. With the Bonferroni correction, the alpha for each of the two metrics would be 0.05/2 = 0.025, which lowers the chance of a false positive but raises the chance of missing a real change._
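To see what the correction would change in practice, a rough sketch: recover the standard error from the printed gross-conversion interval (half-width / 1.96) and rebuild the interval with the Bonferroni-adjusted critical value for two metrics. This is an illustration under that assumption, not a re-analysis of the raw data.

```python
from statistics import NormalDist

# Values printed by the gross-conversion cell above
d_gc = -0.020554874580361565
lower_95 = -0.0291233583354044

se_gc = (d_gc - lower_95) / 1.96                    # recover SE from the 95% half-width
n_metrics = 2                                       # gross conversion and net conversion
alpha_bonf = 0.05 / n_metrics                       # 0.025 per metric
z_bonf = NormalDist().inv_cdf(1 - alpha_bonf / 2)   # about 2.24 instead of 1.96

ci_bonf = (d_gc - z_bonf * se_gc, d_gc + z_bonf * se_gc)
print(f"z = {z_bonf:.3f}, Bonferroni-adjusted CI = ({ci_bonf[0]:.4f}, {ci_bonf[1]:.4f})")
```

With these numbers the widened gross-conversion interval still lies entirely below -0.01, and the net-conversion interval, which already contains 0, can only get wider, so neither conclusion would change under the correction.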

#### 3.2.2 Sign Tests

For each evaluation metric, do a sign test using the day-by-day breakdown. If the sign test does not agree with the confidence interval for the difference, see if you can figure out why.

```python
# Daily gross conversion and net conversion for both groups
control_data['Gross conversion'] = control_data['Enrollments'] / control_data['Clicks']
control_data['Net conversion'] = control_data['Payments'] / control_data['Clicks']

experiment_data['Gross conversion'] = experiment_data['Enrollments'] / experiment_data['Clicks']
experiment_data['Net conversion'] = experiment_data['Payments'] / experiment_data['Clicks']

experiment_data
```

| | Date | Pageviews | Clicks | Enrollments | Payments | Gross conversion | Net conversion |
|---|---|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7716 | 686 | 105.0 | 34.0 | 0.153061 | 0.049563 |
| 1 | Sun, Oct 12 | 9288 | 785 | 116.0 | 91.0 | 0.147771 | 0.115924 |
| 2 | Mon, Oct 13 | 10480 | 884 | 145.0 | 79.0 | 0.164027 | 0.089367 |
| 3 | Tue, Oct 14 | 9867 | 827 | 138.0 | 92.0 | 0.166868 | 0.111245 |
| 4 | Wed, Oct 15 | 9793 | 832 | 140.0 | 94.0 | 0.168269 | 0.112981 |
| 5 | Thu, Oct 16 | 9500 | 788 | 129.0 | 61.0 | 0.163706 | 0.077411 |
| 6 | Fri, Oct 17 | 9088 | 780 | 127.0 | 44.0 | 0.162821 | 0.05641 |
| 7 | Sat, Oct 18 | 7664 | 652 | 94.0 | 62.0 | 0.144172 | 0.095092 |
| 8 | Sun, Oct 19 | 8434 | 697 | 120.0 | 77.0 | 0.172166 | 0.110473 |
| 9 | Mon, Oct 20 | 10496 | 860 | 153.0 | 98.0 | 0.177907 | 0.113953 |


```python
# The total number of days with available data
control_data['Gross conversion'].count()
```

```
23
```

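```python
# Compare day by day over the 23 days with enrollment data (rows 0-22); without this slice,
# the NaN rows for the last 14 days would be counted as False
(control_data.loc[0:22, 'Gross conversion'] > experiment_data.loc[0:22, 'Gross conversion']).value_counts()
```

```
True     19
False     4
Name: Gross conversion, dtype: int64
```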

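```python
# Same comparison for net conversion, restricted to the 23 days with payment data
(control_data.loc[0:22, 'Net conversion'] > experiment_data.loc[0:22, 'Net conversion']).value_counts()
```

```
True     13
False    10
Name: Net conversion, dtype: int64
```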

_The sign-test p-values can be computed with an [online calculator](https://www.graphpad.com/quickcalcs/binomial1.cfm)._

_Gross conversion:
On 19 of the 23 days, the control group has a higher gross conversion than the experiment group. The two-tailed p-value for 19 successes out of 23 is 0.0026, which is significant at α = 0.05._

_Net conversion:
On 13 of the 23 days, the control group has a higher net conversion than the experiment group. The two-tailed p-value for 13 successes out of 23 is 0.6776, which is not significant at α = 0.05._
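The same two-tailed p-values can be computed without the online calculator, using the exact binomial distribution with success probability 0.5 and only the Python standard library (`math.comb`, Python 3.8+). A minimal sketch; `sign_test_p_value` is a hypothetical name:

```python
from math import comb


def sign_test_p_value(successes: int, n: int) -> float:
    """Two-tailed exact sign test: probability of a split at least this extreme under Binomial(n, 0.5)."""
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


print(round(sign_test_p_value(19, 23), 4))  # gross conversion → 0.0026
print(round(sign_test_p_value(13, 23), 4))  # net conversion → 0.6776
```

Doubling the larger tail is valid here because the Binomial(n, 0.5) distribution is symmetric; the `min(1.0, ...)` guard handles an exactly even split.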


#### 3.2.3 Summary

_Both the effect size tests and the sign tests show that the change reduces gross conversion, and the reduction is both statistically and practically significant. The change in net conversion is not statistically significant._

### 3.3 Recommendation
Make a recommendation and briefly describe your reasoning.

_The purpose of this experiment is to reduce the number of frustrated students who leave the free trial because they don't have enough time. The results show that gross conversion decreased in the experiment group, and the decrease is both statistically and practically significant: the screener filtered out some students who might otherwise have dropped out during the free trial. Net conversion did not change significantly, so the change did not significantly reduce the number of students who continue past the free trial and eventually complete the course, as the experiment required. Based on these results, I decided to launch the change._

## 4. Follow-Up Experiment

_A follow-up experiment could measure the number of user-ids that complete the course divided by the number of unique cookies that click the "Start free trial" button. Students who do not have enough time for the course may still stay past the 14-day trial, because the first one or two chapters may contain only background and introductions and may not seem very time-consuming. Once they dig deeper into the course, it may take them more time to finish the homework, pass the quizzes, and complete the final project. Students who are not prepared to spend enough time on the course may fail to complete it even after staying past the trial and paying. If that's the case, they will be even more frustrated, because they paid but could not keep up. Students in the experiment group already know how much time the course requires, so they are less likely to pay and then fail to finish._