## Experiment Design
### Metric Choice

Below I list which metrics I will use in this experiment as invariant metrics and which as evaluation metrics.  **Invariant metrics** are metrics that shouldn't differ between the experiment and control groups, so they should be _independent_ of the experiment.  Conversely, **evaluation metrics** are metrics that should change as a direct result of the experiment; they are _dependent_ upon it.

I will also explain, for each metric, why I did or did not use it as an invariant metric and why I did or did not use it as an evaluation metric.

The metrics I chose to use as invariant metrics were:

- number of cookies
- number of clicks
- click-through-probability

The metrics I chose to use as evaluation metrics were:

- gross conversion
- retention
- net conversion

The rationale I had for choosing or not choosing each metric is as follows:

**Number of cookies:** I chose this as an invariant metric because unique cookies are counted when a visitor views the page, before the visitor can see the experiment, so this metric is independent of the experiment.

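**Number of user-ids:** I didn't choose this metric as either an invariant metric or an evaluation metric.  The number of users who enroll in the free trial depends on the experiment, so it cannot be an invariant metric.  As a raw count it is also less informative than gross conversion, which normalizes enrollments by the number of clicks.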

**Number of clicks:** This is a good invariant metric because the number of unique cookies to click the "Start Free Trial" button is independent of the free trial screener (i.e. the click happens before the user sees the experiment).

**Click-through-probability:** This is a good invariant metric because the user clicks before the experiment happens, so the click is independent from the experiment.

**Gross conversion:** I chose this as an evaluation metric because it depends directly on the experiment.  The number of user-ids to enroll in the free trial divided by the number of unique cookies to click the "Start Free Trial" button should theoretically decrease as a result of the experiment, since the screener is meant to deter users who cannot commit enough time.

**Retention:** I chose this as an evaluation metric because it depends on the experiment: users who are asked to honestly assess their own time commitment to the nanodegree (and are able to commit that time) are more likely to stay enrolled past the trial period.

**Net conversion:** I chose this as an evaluation metric because it depends on the effect of the experiment; the number of user-ids that go on to make a payment divided by the number of unique cookies to click the "Start Free Trial" button should increase with the addition of the self-evaluation.
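
As a rough illustration of how the three evaluation metrics relate to each other, the sketch below computes them as plain ratios. The function name `evaluation_metrics` and the payment count of 350 are assumptions made for this example only; the click and enrollment counts are the daily baseline values used later in this notebook.

```python
def evaluation_metrics(clicks, enrollments, payments):
    """Return the three evaluation metrics as plain ratios.

    clicks      -- unique cookies that clicked "Start Free Trial"
    enrollments -- user-ids that enrolled in the free trial
    payments    -- user-ids that stayed enrolled and made a payment
    """
    return {
        "gross_conversion": enrollments / clicks,
        "retention": payments / enrollments,
        "net_conversion": payments / clicks,
    }


# 3200 clicks and 660 enrollments are the daily baseline values;
# 350 payments is a made-up number used only for illustration.
metrics = evaluation_metrics(clicks=3200, enrollments=660, payments=350)
print(metrics)
```

Note that net conversion is the product of gross conversion and retention, which is why retention needs far more pageviews to measure: its unit of analysis (enrollments) is much rarer than clicks.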

### Measuring Standard Deviation

I will list below the standard deviation of each of the chosen evaluation metrics. I will also indicate whether I think the analytic estimate would be comparable to the empirical variability, or whether I expect them to be different.

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from IPython.display import display 

%matplotlib inline

# ignore warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
# get baseline
baseline = pd.read_csv("baseline.csv", index_col=False,header = None, names = ['metric','value'])
display( baseline )

| | metric | value |
|---|---|---|
| 0 | Unique cookies to view page per day: | 40000.0 |
| 1 | Unique cookies to click "Start free trial" per... | 3200.0 |
| 2 | Enrollments per day: | 660.0 |
| 3 | Click-through-probability on "Start free trial": | 0.08 |
| 4 | Probability of enrolling, given click: | 0.20625 |
| 5 | Probability of payment, given enroll: | 0.53 |
| 6 | Probability of payment, given click | 0.109313 |


In [3]:
# given a sample size of 5000 cookies visiting enrollment page
sample_size_cookies = 5000

prob_enrolling = 0.206250
unique_cookies = 40000
unique_cookies_click = 3200

std_gross_conv = round(np.sqrt((prob_enrolling*(1.-prob_enrolling))/    \
                               (sample_size_cookies*unique_cookies_click/unique_cookies)), 4)
print( 'standard deviation of gross conversion:', std_gross_conv )

standard deviation of gross conversion: 0.0202


In [4]:
prob_pmt_enroll = 0.53
enroll_per_day = 660

std_retention = round(np.sqrt((prob_pmt_enroll*(1.-prob_pmt_enroll))/    \
                              (sample_size_cookies*enroll_per_day/unique_cookies)), 4)
print( 'standard deviation of retention:', std_retention )

standard deviation of retention: 0.0549


In [5]:
prob_pmt_click = 0.109313

std_net_conv = round(np.sqrt((prob_pmt_click*(1.-prob_pmt_click))/    \
                              (sample_size_cookies*unique_cookies_click/unique_cookies)), 4)
print( 'standard deviation of net conversion:', std_net_conv )

standard deviation of net conversion: 0.0156
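
The three cells above apply the same analytic formula, sqrt(p(1 - p) / n), where n is the number of units of analysis (clicks or enrollments) expected in a sample of 5000 pageviews. A minimal sketch that factors this into one helper follows; the name `binomial_sd` and its parameters are my own and not part of the original notebook.

```python
import math


def binomial_sd(p, pageviews, units_per_day, pageviews_per_day=40000):
    """Analytic standard deviation of a proportion, assuming a binomial model.

    p                 -- baseline probability of the metric
    pageviews         -- sample size in pageviews (cookies)
    units_per_day     -- baseline daily count of the metric's denominator
    pageviews_per_day -- baseline daily pageviews
    """
    # scale the pageview sample down to the metric's unit of analysis
    n = pageviews * units_per_day / pageviews_per_day
    return math.sqrt(p * (1 - p) / n)


print(round(binomial_sd(0.20625, 5000, 3200), 4))   # gross conversion
print(round(binomial_sd(0.53, 5000, 660), 4))       # retention
print(round(binomial_sd(0.109313, 5000, 3200), 4))  # net conversion
```

With the baseline values this reproduces the three standard deviations printed above (0.0202, 0.0549 and 0.0156).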


### Sizing
#### Number of Samples vs. Power

I will not use the Bonferroni correction during my analysis phase.  To calculate the number of samples needed, I used the calculator at http://www.evanmiller.org/ab-testing/sample-size.html. The pageviews needed for each evaluation metric are as follows:

##### Gross conversion

* Baseline conversion rate = 20.625%
* d_min = 0.01
* alpha = 0.05
* 1 - beta = 0.8 (beta = 0.2)
* calculated samples = 25,835
* required pageviews = (25,835 / 0.08) * 2 = 645,875

##### Retention

* Baseline conversion rate = 53%
* d_min = 0.01
* alpha = 0.05
* 1 - beta = 0.8 (beta = 0.2)
* calculated samples = 39,115
* required pageviews = ((39,115 / 0.08) / 0.20625) * 2 = 4,741,212

##### Net conversion

* Baseline conversion rate = 10.93125%
* d_min = 0.0075
* alpha = 0.05
* 1 - beta = 0.8 (beta = 0.2)
* calculated samples = 27,413
* required pageviews = (27,413 / 0.08) * 2 = 685,325

**Retention** requires the largest number of pageviews at 4,741,212 so the number of pageviews the experiment will require is 4,741,212.
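
The conversion from calculated samples to pageviews can be sketched as below. The helper `pageviews_required` is a hypothetical name; the per-group sample sizes are the calculator outputs listed above, and the result is rounded to the nearest pageview to match the figures in the lists.

```python
def pageviews_required(samples_per_group, fraction_reaching_unit, n_groups=2):
    """Scale a per-group sample size up to total pageviews.

    fraction_reaching_unit -- share of pageviews that become one unit of
                              analysis (a click, or an enrollment)
    """
    return round(samples_per_group / fraction_reaching_unit * n_groups)


ctp = 0.08                      # click-through-probability (baseline)
p_enroll_given_click = 0.20625  # probability of enrolling, given click

needed = {
    "gross conversion": pageviews_required(25835, ctp),
    "retention": pageviews_required(39115, ctp * p_enroll_given_click),
    "net conversion": pageviews_required(27413, ctp),
}
for metric, pageviews in needed.items():
    print(metric, pageviews)

# the experiment must satisfy the most demanding metric
print("pageviews required:", max(needed.values()))
```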

#### Duration vs. Exposure
If we divert 100% of traffic, we'll need 119 days to run the experiment.  I chose to divert 100% of traffic because we need a large number of pageviews for this experiment and I'd like to gather the data as quickly as possible given the large required sample size.

This experiment is not very risky since it will not affect current users, only potential new students.

In [6]:
days_retention = 4741212/unique_cookies
days_gross_conversion = 645875/unique_cookies
days_net_conversion = 685325/unique_cookies

print('At 100% diversion of traffic:')
print("For retention, we'll need:", days_retention, 'days')
print("For gross conversion, we'll need:", days_gross_conversion, 'days')
print("For net conversion, we'll need:", days_net_conversion, 'days')

At 100% diversion of traffic:
For retention, we'll need: 118.5303 days
For gross conversion, we'll need: 16.146875 days
For net conversion, we'll need: 17.133125 days
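
To see how the duration changes when less than 100% of traffic is diverted, a small helper can be used. This is only a sketch: `days_needed` and its `diversion` parameter are names I introduce here, and the 50% case is a hypothetical alternative, not a choice made in this experiment.

```python
import math


def days_needed(pageviews, daily_pageviews=40000, diversion=1.0):
    """Whole days required to collect `pageviews` at a given traffic share."""
    return math.ceil(pageviews / (daily_pageviews * diversion))


print(days_needed(4741212))                 # retention, 100% of traffic
print(days_needed(4741212, diversion=0.5))  # retention, 50% of traffic
print(days_needed(685325))                  # net conversion, 100% of traffic
```

Halving the diversion doubles the duration, which is why I divert all traffic given how many pageviews retention requires.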


## Experiment Analysis
#### Sanity Checks
For each of your invariant metrics, give the 95% confidence interval for the value you expect to observe, the actual observed value, and whether the metric passes your sanity check. (These should be the answers from the "Sanity Checks" quiz.)


For any sanity check that did not pass, explain your best guess as to what went wrong based on the day-by-day data. Do not proceed to the rest of the analysis unless all sanity checks pass.


In [7]:
df_control = pd.read_csv('control.csv')
df_experiment = pd.read_csv('experiment.csv')

df = pd.DataFrame({'control': pd.Series([df_control.Pageviews.sum(),
                                            df_control.Clicks.sum(),
                                            df_control.Enrollments.sum(),
                                            df_control.Payments.sum()]),
                   'experiment': pd.Series([df_experiment.Pageviews.sum(),
                                            df_experiment.Clicks.sum(),
                                            df_experiment.Enrollments.sum(),
                                            df_experiment.Payments.sum()]),
                   'sum_total': pd.Series([df_control.Pageviews.sum()+df_experiment.Pageviews.sum(),
                                          df_control.Clicks.sum()+df_experiment.Clicks.sum(),
                                          df_control.Enrollments.sum()+df_experiment.Enrollments.sum(),
                                          df_control.Payments.sum()+df_experiment.Payments.sum()])
                  
                  })

df.index = ['Pageviews', 'Clicks', 'Enrollments', 'Payments']

display(df)

| | control | experiment | sum_total |
|---|---|---|---|
| Pageviews | 345543.0 | 344660.0 | 690203.0 |
| Clicks | 28378.0 | 28325.0 | 56703.0 |
| Enrollments | 3785.0 | 3423.0 | 7208.0 |
| Payments | 2033.0 | 1945.0 | 3978.0 |


In [17]:
# probability that a user will be in control or experiment group
print( '---Sanity check for number of cookies---')
prob_group = 0.5
SE_pageviews = np.sqrt((prob_group*(1.-prob_group))/(345543.0 + 344660.0))
ME_pageviews = SE_pageviews * 1.96
CI_pageviews = (0.5-ME_pageviews, 0.5+ME_pageviews)

print( 'Confidence Interval for pageviews: ', np.round(CI_pageviews, 4) )

observed_pageviews = np.round(345543.0/690203.0, 4)
pageviews_pass = CI_pageviews[0] < observed_pageviews < CI_pageviews[1]

print('pageview passes?', pageviews_pass, 'with observed rate of:', observed_pageviews)

---Sanity check for number of cookies---
Confidence Interval for pageviews:  [ 0.4988  0.5012]
pageview passes? True with observed rate of: 0.5006


In [20]:
# probability that a user will be in control or experiment group
print( '---Sanity check for number of clicks on "Start free trial"---')
prob_group = 0.5
SE_clicks = np.sqrt((prob_group*(1.-prob_group))/(28378.0 + 28325.0))
ME_clicks = SE_clicks * 1.96
CI_clicks = (0.5-ME_clicks, 0.5+ME_clicks)

print( 'Confidence Interval for clicks on "Start free trial": ', np.round(CI_clicks, 4) )

observed_clicks = np.round(28378.0/56703.0, 4)
clicks_pass = CI_clicks[0] < observed_clicks < CI_clicks[1]

print('clicks on "Start free trial" passes?', clicks_pass, 'with observed rate of:', observed_clicks)

---Sanity check for number of clicks on "Start free trial"---
Confidence Interval for clicks on "Start free trial":  [ 0.4959  0.5041]
clicks on "Start free trial" passes? True with observed rate of: 0.5005
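
Both count-based checks follow the same recipe: under an even split, the fraction of units landing in the control group should fall within 0.5 ± 1.96 standard errors. A minimal sketch that wraps this in one function is shown below; `count_sanity_check` is a name I made up for illustration.

```python
import math


def count_sanity_check(n_control, n_experiment, p=0.5, z=1.96):
    """Check that a count is split between groups as expected.

    Returns (lower, upper, observed, passes), with the first three values
    rounded to 4 decimals for display.
    """
    total = n_control + n_experiment
    margin = z * math.sqrt(p * (1 - p) / total)
    lower, upper = p - margin, p + margin
    observed = n_control / total
    passes = lower < observed < upper
    return round(lower, 4), round(upper, 4), round(observed, 4), passes


print(count_sanity_check(345543, 344660))  # pageviews
print(count_sanity_check(28378, 28325))    # clicks
```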


In [19]:
# click-through-probability is a ratio (clicks / pageviews), so compare the
# difference between groups against a pooled standard error around zero
print( '---Sanity check for click-through-probability---')
ctp_control = 28378.0/345543.0
ctp_experiment = 28325.0/344660.0
ctp_pooled = (28378.0 + 28325.0)/(345543.0 + 344660.0)
SE_ctp = np.sqrt(ctp_pooled*(1.-ctp_pooled)*(1./345543.0 + 1./344660.0))
ME_ctp = SE_ctp * 1.96
CI_ctp = (-ME_ctp, ME_ctp)

print( 'Confidence Interval for difference in click-through-probability: ', np.round(CI_ctp, 4) )

observed_ctp_diff = np.round(ctp_experiment - ctp_control, 4)
ctp_pass = CI_ctp[0] < observed_ctp_diff < CI_ctp[1]

print('click-through-probability passes?', ctp_pass, 'with observed difference of:', observed_ctp_diff)

---Sanity check for click-through-probability---
Confidence Interval for difference in click-through-probability:  [-0.0013  0.0013]
click-through-probability passes? True with observed difference of: 0.0001


#### Result Analysis
##### Effect Size Tests
For each of your evaluation metrics, give a 95% confidence interval around the difference between the experiment and control groups. Indicate whether each metric is statistically and practically significant. (These should be the answers from the "Effect Size Tests" quiz.)
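
One way this interval could be computed is sketched below, assuming a pooled standard error for the difference of two proportions. The function names are hypothetical, and the counts in the usage lines are made-up numbers chosen only to exercise the code; they are not results from this experiment.

```python
import math


def difference_ci(x_control, n_control, x_experiment, n_experiment, z=1.96):
    """Confidence interval for (experiment rate - control rate).

    x_* are success counts (e.g. enrollments), n_* are unit counts (e.g. clicks).
    Returns (difference, lower, upper).
    """
    p_pooled = (x_control + x_experiment) / (n_control + n_experiment)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_experiment))
    diff = x_experiment / n_experiment - x_control / n_control
    return diff, diff - z * se, diff + z * se


def is_statistically_significant(lower, upper):
    """Significant when the interval excludes zero."""
    return not (lower <= 0 <= upper)


def is_practically_significant(lower, upper, d_min):
    """Significant in practice when the whole interval lies beyond d_min."""
    return lower > d_min or upper < -d_min


# hypothetical counts, for illustration only
diff, lower, upper = difference_ci(200, 1000, 150, 1000)
print(round(diff, 4), round(lower, 4), round(upper, 4))
print(is_statistically_significant(lower, upper))
print(is_practically_significant(lower, upper, d_min=0.01))
```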

##### Sign Tests
For each of your evaluation metrics, do a sign test using the day-by-day data, and report the p-value of the sign test and whether the result is statistically significant. (These should be the answers from the "Sign Tests" quiz.)
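
A sign test treats each day as a coin flip: under the null hypothesis, the experiment group beats the control group on any given day with probability 0.5. The sketch below computes a two-sided p-value from the binomial distribution using only the standard library. The function name is my own, and the day counts in the usage line are hypothetical, not taken from the data.

```python
from math import comb


def sign_test_p_value(n_positive, n_days):
    """Two-sided sign test p-value under H0: P(positive day) = 0.5."""
    # use the smaller tail, then double it for a two-sided test
    k = min(n_positive, n_days - n_positive)
    tail = sum(comb(n_days, i) for i in range(k + 1)) / 2 ** n_days
    return min(1.0, 2 * tail)


# hypothetical example: the experiment wins on 4 of 23 days
p_value = sign_test_p_value(4, 23)
print(round(p_value, 4), p_value < 0.05)
```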


##### Summary
State whether you used the Bonferroni correction, and explain why or why not. If there are any discrepancies between the effect size hypothesis tests and the sign tests, describe the discrepancy and why you think it arose.

#### Recommendation
Make a recommendation and briefly describe your reasoning.

## Follow-Up Experiment
Give a high-level description of the follow up experiment you would run, what your hypothesis would be, what metrics you would want to measure, what your unit of diversion would be, and your reasoning for these choices.