# A/B Testing - Udacity Free Trial Screener
## Dana Hagist

## Experiment Overview

At the time of this experiment, Udacity courses have two options on the course overview page: "start free trial", and "access course materials". If the student clicks "start free trial", they will be asked to enter their credit card information, and then they will be enrolled in a free trial for the paid version of the course. After 14 days, they will automatically be charged unless they cancel first. If the student clicks "access course materials", they will be able to view the videos and take the quizzes for free, but they will not receive coaching support or a verified certificate, and they will not submit their final project for feedback.

In the experiment, Udacity tested a change where if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead. This screenshot (https://drive.google.com/file/d/0ByAfiG8HpNUMakVrS0s4cGN2TjQ/view) shows what the experiment looks like.

The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time—without significantly reducing the number of students to continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.

The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

## Metric Choice

### List of Metrics:

- Number of cookies: That is, number of unique cookies to view the course overview page. (dmin=3000)
- Number of user-ids: That is, number of users who enroll in the free trial. (dmin=50)
- Number of clicks: That is, number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered). (dmin=240)
- Click-through-probability: That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page. (dmin=0.01)
- Gross conversion: That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button. (dmin= 0.01)
- Retention: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout. (dmin=0.01)
- Net conversion: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button. (dmin= 0.0075)
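
As a quick illustration of how the ratio metrics above relate to the raw counts, here is a minimal sketch. It uses the rough daily baseline values from the baseline spreadsheet referenced in the Measuring Variability section (40,000 pageviews, 3,200 clicks, 660 enrollments, and a retention of 0.53); the variable names are my own and only illustrative.

```python
# Rough daily baseline values (assumed from the baseline spreadsheet)
pageviews = 40000      # unique cookies viewing the course overview page
clicks = 3200          # unique cookies clicking "Start free trial"
enrollments = 660      # user-ids completing checkout
retention = 0.53       # probability of payment, given enrollment

click_through_probability = clicks / pageviews   # clicks per pageview
gross_conversion = enrollments / clicks          # enrollments per click
net_conversion = gross_conversion * retention    # payments per click

print("Click-through probability:", click_through_probability)
print("Gross conversion:", gross_conversion)
print("Net conversion:", round(net_conversion, 7))
```

Note that net conversion is simply gross conversion multiplied by retention, which is why the analysis can focus on gross and net conversion and leave retention aside.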

#### Invariant Metrics
Invariant metrics are those that should be evenly distributed between the treatment and control groups. That is, they should be unaffected by the experiment. The three metrics below are all measured before the free trial screener is triggered, so they should be invariant:
- Number of Cookies
- Number of clicks
- Click-through-probability 

#### Evaluation Metrics
Evaluation metrics are those we measure to understand the impact of the experiment. In this case, the hope is to reduce gross conversion (students enrolling in the free trial) while keeping net conversion (students remaining enrolled past the 14-day boundary) consistent. The intent, as outlined in the Experiment Overview, is to free up coaching resources during the free trial period for students who are more likely to go on and complete the course because they have ample time to dedicate. Therefore, our evaluation metrics are as follows:
- Gross Conversion: expected to decrease by at least 0.01 (one percentage point)
- Net Conversion: should not decrease significantly (a change of 0.0075 or more would be meaningful)

#### Other Metrics:
The remaining two metrics, number of user-ids and retention, are not needed for the scope of this analysis.

#### What we will be Looking for:
The results we are looking for are a statistically significant decrease in the gross conversion rate of at least the minimum magnitude of 0.01 (one percentage point), and a net conversion rate that DOES NOT DECREASE significantly. The minimum threshold for a meaningful change in the net conversion rate is 0.0075 (0.75 percentage points), but the intent of the change is primarily to reduce gross conversion and NOT REDUCE net conversion.

## Measuring Variability

This spreadsheet (https://docs.google.com/spreadsheets/d/1MYNUtC47Pg8hdoCjOXaHqF-thheGpUshrFA21BAJnNc/edit#gid=0) contains rough estimates of the baseline values for these metrics (these numbers have been changed from Udacity's true numbers).

Because our evaluation metrics are based on cookies, and the unit of diversion is also a cookie, we can use analytical estimates. If the two differed, we would want to consider using empirical estimates.

Furthermore, we expect our metrics to follow binomial distributions, so we estimate the variance of each observed proportion with the binomial formula below (the standard deviation is its square root):
p * (1-p) / n

In [66]:
# Baseline values from spreadsheet - variables are in same order
cookies_crs_ovrw_dly = 40000
cookies_free_trl_dly = 3200
enroll_per_day = 660
click_thru_prob_free_trl = .08
prob_enroll_click = .20625
prob_pymt_enroll = .53
prob_pymt_click = .1093125

# Assuming overall visitors cookie sample of 5000,
# We calculate the total number who are likely to click on free trial
# This will be our experiment sample size
N = 5000 * 3200/40000
print(N)

400.0


In [67]:
# Importing Numpy for Square Root and other calculations
import numpy as np

# Expected Gross Conversions (enrollments) among the N cookies that click
gross_conv_expect = N * prob_enroll_click
# Standard deviation using variance calculation from above
gross_sd = np.sqrt(prob_enroll_click * (1-prob_enroll_click) / N)
# Printing expected gross conversions and standard deviation
print("Expected Gross Conversions: " , gross_conv_expect)
print("Gross Conversion Standard Deviation: " , gross_sd)

# Expected Net Conversions (payments) among the N cookies that click
net_conv_expect = N * prob_pymt_click
net_sd = np.sqrt(prob_pymt_click * (1-prob_pymt_click) / N)
# Printing expected net conversions and standard deviation
print("Expected Net Conversions: " , net_conv_expect)
print("Net Conversion Standard Deviation: " , net_sd)

Expected Gross Conversions:  82.5
Gross Conversion Standard Deviation:  0.020230604137
Expected Net Conversions:  43.725
Net Conversion Standard Deviation:  0.0156015445825


## Sizing
### Choosing Number of Samples given Power

One thing I need to check is how many total pageviews are necessary to adequately power the experiment. A good online calculator for this is available here (http://www.evanmiller.org/ab-testing/sample-size.html).

For purposes of the analysis, we will use a 95% confidence level (alpha = .05) and 80% power (beta = .2).

#### Gross Conversion
- Baseline Conversion Rate: .20625
- Minimum Detectable Effect: .01
- Alpha = .05
- Beta = .2
- Required Sample Size (per group): 25,835
- Total Sample Size = 25,835 * 2 = 51,670
- Total Pageviews Required = 51,670 / .08 (click thru prob) =  645,875
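
For readers who prefer to reproduce the per-group figure offline, here is a minimal sketch using a standard normal-approximation formula for comparing two proportions. That the online calculator uses this exact formula is my assumption, and the function name `sample_size_per_group` is hypothetical; results may differ from the calculator by a unit or two because of rounding.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline, min_effect, alpha=0.05, beta=0.2):
    """Approximate per-group sample size for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = NormalDist().inv_cdf(1 - beta)         # quantile for the desired power
    # Standard deviation term under "no difference"
    sd_null = math.sqrt(2 * baseline * (1 - baseline))
    # Standard deviation term when the rate shifts by the minimum detectable effect
    shifted = baseline + min_effect
    sd_alt = math.sqrt(baseline * (1 - baseline) + shifted * (1 - shifted))
    n = ((z_alpha * sd_null + z_beta * sd_alt) / min_effect) ** 2
    return math.ceil(n)

# Gross conversion: baseline 0.20625, minimum detectable effect 0.01
n_gross = sample_size_per_group(0.20625, 0.01)
# Two groups, and only about 8% of pageviews lead to a click
pageviews_gross = 2 * n_gross / 0.08
print(n_gross, round(pageviews_gross))
```

The same function can be called with a baseline of 0.1093125 and a minimum effect of 0.0075 for net conversion.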

#### Net Conversion
- Baseline Conversion Rate: .1093125
- Minimum Detectable Effect: .0075
- Alpha = .05
- Beta = .2
- Required Sample Size (per group): 27,411
- Total Sample Size = 27,411 * 2 = 54,822
- Total Pageviews Required = 54,822 / .08 = 685,275


### Choosing Duration vs. Exposure

Because this change is very low risk (there is no reasonable expectation of harm to users or of added cost), I see no issue with running all traffic through the experiment.

Because we have approximately 40,000 visitors per day and need a total of 685,275 pageviews to adequately power the experiment, the experiment should run for approximately 17.13 days (rounded up to 18 to be safe).

This seems to be an acceptable amount of time to run the experiment: over a period of roughly two and a half weeks, there are unlikely to be many other factors that could affect our evaluation metrics. However, it would be worth looking at a full month of data to determine whether certain periods have higher gross or net conversion rates than others.
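
The duration arithmetic can be sketched as follows, which also shows how the duration would grow if only part of the traffic were diverted. The helper name `days_needed` and the 50% scenario are my own illustrations; the inputs are the per-group sample size for net conversion (27,411), the click-through probability (0.08), and the 40,000 daily pageviews quoted above.

```python
import math

def days_needed(pageviews_required, daily_pageviews=40000, traffic_fraction=1.0):
    """Whole days needed to collect the required pageviews."""
    return math.ceil(pageviews_required / (daily_pageviews * traffic_fraction))

# Net conversion has the larger requirement: two groups of 27,411 clicks,
# divided by the click-through probability to convert clicks into pageviews
pageviews_required = 27411 * 2 / 0.08

days_all_traffic = days_needed(pageviews_required)
days_half_traffic = days_needed(pageviews_required, traffic_fraction=0.5)
print("Days with all traffic:", days_all_traffic)
print("Days with half of traffic:", days_half_traffic)
```

Diverting less traffic lowers exposure but lengthens the experiment proportionally, which is the trade-off this section is weighing.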

## Analysis

The data for the analysis is here (https://www.google.com/url?q=https://docs.google.com/a/knowlabs.com/spreadsheets/d/1Mu5u9GrybDdska-ljPXyBjTpdZIUev_6i7t4LRDfXM8/edit%23gid%3D0&sa=D&ust=1535379387782000). This data contains the raw information needed to compute the above metrics, broken down day by day. Note that there are two sheets within the spreadsheet - one for the experiment group, and one for the control group.

However, for purposes of the analysis (and organization), I've created a single CSV with a column denoting whether the observation was in the treatment group (treatment = 1) or control group (treatment = 0).

The meaning of each column is:

- Pageviews: Number of unique cookies to view the course overview page that day.
- Clicks: Number of unique cookies to click the "Start free trial" button that day.
- Enrollments: Number of user-ids to enroll in the free trial that day.
- Payments: Number of user-ids who enrolled on that day and remained enrolled for 14 days, thus making a payment. (Note that the date for this column is the start date, that is, the date of enrollment, rather than the date of the payment. The payment happened 14 days later. Because of this, the enrollments and payments are tracked for 14 fewer days than the other columns.)
- Treatment: 1 for set of cookies in treatment group, and 0 for control group.

In [68]:
# Importing pandas to work with the dataset
import pandas as pd

# Reading in the data
df = pd.read_csv('udacity_experiment.csv')

# Creating control and experiment/treatment groups
control = df[df['Treatment']==0]
treatment = df[df['Treatment']==1]

# Printing top rows for treatment and control groups
print(control.head())
print(treatment.head())

          Date  Pageviews  Clicks  Enrollments  Payments  Treatment
0  Sat, Oct 11       7723     687        134.0      70.0          0
1  Sun, Oct 12       9102     779        147.0      70.0          0
2  Mon, Oct 13      10511     909        167.0      95.0          0
3  Tue, Oct 14       9871     836        156.0     105.0          0
4  Wed, Oct 15      10014     837        163.0      64.0          0
           Date  Pageviews  Clicks  Enrollments  Payments  Treatment
37  Sat, Oct 11       7716     686        105.0      34.0          1
38  Sun, Oct 12       9288     785        116.0      91.0          1
39  Mon, Oct 13      10480     884        145.0      79.0          1
40  Tue, Oct 14       9867     827        138.0      92.0          1
41  Wed, Oct 15       9793     832        140.0      94.0          1


### Sanity Checks

For the invariant metrics, I need to ensure that they are evenly distributed between our treatment and control groups, within an acceptable significance level.

#### Cookie Count

In [69]:
# Finding and printing number of cookies in control group
cookies_page_control_cnt =sum(control.Pageviews)
print("Number of cookies in control group: ", cookies_page_control_cnt)

# Finding and printing number of cookies in treatment group
cookies_page_treatment_cnt =sum(treatment.Pageviews)
print("Number of cookies in treatment group: ", cookies_page_treatment_cnt)

# Printing difference between the two groups
print("Difference in cookies between treatment and control: ", cookies_page_control_cnt - cookies_page_treatment_cnt)

# Finding standard deviation of cookie count
cookie_cnt_sd = np.sqrt((0.5 * 0.5) / (cookies_page_control_cnt + cookies_page_treatment_cnt))
print ("Standard deviation: ",cookie_cnt_sd)

# Finding margin of error for cookie count
m= cookie_cnt_sd * 1.96
print("Margin of error:", m)

# Printing lower bound as .5 - margin of error
lower_bound = 0.5 - m
print("Lower bound of the 95% confidence interval: ", lower_bound)

# Printing upper bound as .5 + margin of error
upper_bound = 0.5 + m
print("Upper bound of the 95% confidence interval: ", upper_bound)

# Printing the observed fraction of cookies in control vs total
p_observed = cookies_page_control_cnt / (cookies_page_control_cnt + cookies_page_treatment_cnt)
print("Observed fraction: ", p_observed)

Number of cookies in control group:  345543
Number of cookies in treatment group:  344660
Difference in cookies between treatment and control:  883
Standard deviation:  0.000601840740294
Margin of error: 0.00117960785098
Lower bound of the 95% confidence interval:  0.498820392149
Upper bound of the 95% confidence interval:  0.501179607851
Observed fraction:  0.500639666881


#### Explanation of above: 
Because each cookie should have a 50% chance of falling into the treatment or control group, the expected fraction of cookies in the control group is 0.5, and with this many cookies the standard deviation and margin of error are small. The 95% confidence interval runs from 0.4988 to 0.5012, and the observed fraction of 0.5006 falls inside it, so this invariant metric passes the sanity check.
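
Since the same check is applied to more than one count, it can be wrapped in a small helper. This is only a sketch of the calculation described above; the function name `sanity_check_split` is my own.

```python
import math

def sanity_check_split(control_count, treatment_count, expected=0.5, z=1.96):
    """Check whether the control share of a count is consistent with `expected`."""
    total = control_count + treatment_count
    sd = math.sqrt(expected * (1 - expected) / total)   # binomial standard deviation
    margin = z * sd                                     # margin of error at ~95%
    lower, upper = expected - margin, expected + margin
    observed = control_count / total
    return lower, upper, observed, lower <= observed <= upper

# Pageview cookies: 345,543 in control and 344,660 in treatment
lower, upper, observed, passed = sanity_check_split(345543, 344660)
print(round(lower, 4), round(upper, 4), round(observed, 4), passed)
```

Passing a different `expected` value would cover experiments that do not split traffic evenly.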

#### Number of Clicks

In [70]:
# Printing number of cookies to click 'start free trial' button in control group
cookies_clicks_control_cnt = sum(control.Clicks)
print("Number of unique cookies to click 'start free trial' button in the control group:", cookies_clicks_control_cnt)

# Printing number of cookies to click 'start free trial' button in treatment group
cookies_clicks_treatment_cnt = sum(treatment.Clicks)
print("Number of unique cookies to click 'start free trial' button in the treatment group:", cookies_clicks_treatment_cnt)

# Printing difference between treatment and control groups
print("Difference in cookies clicking free trial button between treatment and control: ",
      cookies_clicks_control_cnt - cookies_clicks_treatment_cnt)

# Cookie click standard deviation
cookie_click_sd = np.sqrt( (0.5 * 0.5) / (cookies_clicks_control_cnt + cookies_clicks_treatment_cnt))
print("Standard deviation:", cookie_click_sd)

# Margin of Error
m = cookie_click_sd * 1.96
print("Margin of error:", m)

# Printing lower bound as .5 - margin of error
lower_bound = 0.5 - m
print("Lower bound of the confidence interval:", lower_bound)


# Printing upper bound as .5 + margin of error
upper_bound = 0.5 + m
print("Upper bound of the confidence interval:", upper_bound)

# Printing the observed fraction of cookies clicking on free trial in control vs total
p_observed = cookies_clicks_control_cnt / (cookies_clicks_control_cnt + cookies_clicks_treatment_cnt)
print("Observed fraction:", p_observed)

Number of unique cookies to click 'start free trial' button in the control group: 28378
Number of unique cookies to click 'start free trial' button in the treatment group: 28325
Difference in cookies clicking free trial button between treatment and control:  53
Standard deviation: 0.0020997470797
Margin of error: 0.00411550427621
Lower bound of the confidence interval: 0.495884495724
Upper bound of the confidence interval: 0.504115504276
Observed fraction: 0.500467347407


#### Explanation of above: 
Because each cookie clicking on the 'start free trial' button should have a 50% chance of falling into the treatment or control group, the expected fraction of clicks in the control group is 0.5, with a small standard deviation and tight margin of error at the 95% confidence level. The 95% confidence interval runs from 0.4959 to 0.5041, and the observed fraction of 0.5005 falls inside it. No issues noted with this invariant metric.

#### Click-Through Probability

In [71]:
# Calculating the click-through probability in control group
# Remember this is total cookies that clicked 'start free trial' / cookies that saw course overview
ctp = cookies_clicks_control_cnt / cookies_page_control_cnt
print("Click-through probability in control group: ", ctp)

# Calculating and printing standard deviation of click-through probability 
ctp_sd = np.sqrt((ctp * (1-ctp)) / (cookies_page_control_cnt))
print("Standard Deviation: ", ctp_sd)

# Calculating margin of error
m = ctp_sd * 1.96

# Lower bound for 95% confidence interval
lower_bound= ctp - m
print("Lower bound: ", lower_bound)

# Upper bound for 95% confidence interval
upper_bound= ctp + m
print("Upper bound: ", upper_bound)

# Calculating click-through probability of treatment group, which should be in 95% confidence interval of control group
p_treatment = cookies_clicks_treatment_cnt / cookies_page_treatment_cnt
print("Click-through probability in treatment group: ", p_treatment)

Click-through probability in control group:  0.0821258135746
Standard Deviation:  0.000467068276555
Lower bound:  0.0812103597525
Upper bound:  0.0830412673966
Click-through probability in treatment group:  0.0821824406662


#### Explanation of above:

In the above, we are ensuring that the probability of a cookie clicking the 'start free trial' button in the treatment group is within the 95% confidence interval of the same probability in the control group. We can see that 0.0822 falls between the bounds of 0.0812 and 0.0830, so no issues here.
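
An alternative way to run this check, which is not the method used above, is to build a confidence interval around the difference in click-through probability with a pooled standard error and confirm that it contains zero. The sketch below uses the click and pageview totals printed in the earlier sanity checks; the function name `diff_ci` is hypothetical.

```python
import math

def diff_ci(x_cont, n_cont, x_exp, n_exp, z=1.96):
    """Confidence interval for (experiment - control) using a pooled standard error."""
    pooled = (x_cont + x_exp) / (n_cont + n_exp)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_cont + 1 / n_exp))
    diff = x_exp / n_exp - x_cont / n_cont
    return diff - z * se, diff + z * se

# Clicks and pageviews: control (28,378 of 345,543), treatment (28,325 of 344,660)
lower, upper = diff_ci(28378, 345543, 28325, 344660)
print("CI for the difference in click-through probability:", lower, upper)
print("Contains zero:", lower <= 0 <= upper)
```

Both approaches lead to the same conclusion here: the click-through probability does not differ meaningfully between the groups.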

We have now tested all of our invariant metrics, and they fall within the acceptable thresholds, which supports that the control and experiment groups are indeed comparable.

### Check for Practical and Statistical Significance

Next, for my evaluation metrics, I will calculate a confidence interval for the difference between the experiment and control groups, and check whether each metric is statistically and/or practically significant. 

A metric is statistically significant if the confidence interval does not include 0 (that is, we can be confident there was a change), and it is practically significant if the confidence interval does not include the practical significance boundary (that is, we can be confident there is a change that matters to the business.)

Because I have chosen two evaluation metrics and the decision depends on both of them meeting our criteria (rather than on any one of them), we don't need to use a Bonferroni correction. A Bonferroni correction can be useful in cases where we are looking for ANY statistically significant result from a number of metrics. The reason for this is that the more metrics you use for evaluation, the more likely you are to get at least one statistically significant result by chance. For example, at a 95% confidence level, on average one out of 20 metrics with no real effect will show up as statistically significant by chance.
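
To make the "by chance" argument concrete, here is a small sketch of the family-wise error rate and the Bonferroni-adjusted significance level. It assumes the metrics are independent, which is a simplification (conversion metrics are usually correlated); the function names are my own.

```python
def family_wise_error_rate(alpha, n_metrics):
    """Chance of at least one false positive across independent metrics."""
    return 1 - (1 - alpha) ** n_metrics

def bonferroni_alpha(alpha, n_metrics):
    """Per-metric significance level under a Bonferroni correction."""
    return alpha / n_metrics

# With 20 independent metrics at alpha = 0.05, a false positive is quite likely
print(round(family_wise_error_rate(0.05, 20), 3))
# If a correction were applied to two metrics, each would be tested at this level
print(bonferroni_alpha(0.05, 2))
```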

In [72]:
# Need to drop the records where we don't have number of enrollments and payments
control2 = control[control.Enrollments.notnull()]
treatment2 = treatment[treatment.Enrollments.notnull()]

# Checking for any more remaining null values
print("--Nulls in Control--")
print(control2.isnull().sum())
print("--Nulls in Treatment--")
print(treatment2.isnull().sum())

--Nulls in Control--
Date           0
Pageviews      0
Clicks         0
Enrollments    0
Payments       0
Treatment      0
dtype: int64
--Nulls in Treatment--
Date           0
Pageviews      0
Clicks         0
Enrollments    0
Payments       0
Treatment      0
dtype: int64


### Analyzing Gross Conversion

In [73]:
# Calculating Control Sample and Enrollment Count
N_cont = sum(control2.Clicks)
X_cont = sum(control2.Enrollments)

# Calculating Experiment Sample and Enrollment Count
N_exp=sum(treatment2.Clicks)
X_exp=sum(treatment2.Enrollments)

# Printing gross conversion for both control and experiment group
print("Gross conversion for control group: ", X_cont / N_cont)
print("Gross conversion for experiment group: ", X_exp / N_exp)

# Calculating the pooled probability (all enrollments divided by clicks on 'start free trial')
pooled_prob = (X_cont + X_exp) / (N_cont + N_exp)
print("Pooled probability: ", pooled_prob)

# Calculating the pooled standard error
pooled_SE = np.sqrt(pooled_prob * (1 - pooled_prob) * (1/N_cont + 1/N_exp))
print("Pool standard error: ", pooled_SE)

# Calculating the margin of error
m = 1.96 * pooled_SE
print("Margin of error: ", m)

# Calculating the observed difference between treatment and control
diff_observed= X_exp/N_exp - X_cont/N_cont
print("Difference observed between control and experiment group ", diff_observed)

# Calculating lower and upper bounds of the confidence interval
lower_bound = diff_observed - m
print("Lower bound of the 95% confidence interval: ", lower_bound)
upper_bound = diff_observed + m
print("Upper bound of the 95% confidence interval: ", upper_bound)

Gross conversion for control group:  0.218874689181
Gross conversion for experiment group:  0.1983198146
Pooled probability:  0.208607067404
Pool standard error:  0.00437167538523
Margin of error:  0.00856848375504
Difference observed between control and experiment group  -0.0205548745804
Lower bound of the 95% confidence interval:  -0.0291233583354
Upper bound of the 95% confidence interval:  -0.0119863908253


#### Explanation of above
In the above, we calculate the observed difference between our treatment and control groups, and then determine the 95% confidence interval around that difference. The interval does not include 0, so the result is statistically significant. Furthermore, the whole interval (from -0.0291 to -0.0120) lies below the practical significance boundary of -0.01 (negative because we are looking for a decrease), so the result is also practically significant.

### Analyzing Net Conversion

In [74]:
# Calculating Control Sample and Payment Count
N_cont = sum(control2.Clicks)
X_cont = sum(control2.Payments)

# Calculating Experiment Sample and Payment Count
N_exp=sum(treatment2.Clicks)
X_exp=sum(treatment2.Payments)

# Printing net conversion for both control and experiment group
print("Net conversion for control group: ", X_cont / N_cont)
print("Net conversion for experiment group: ", X_exp / N_exp)

# Calculating the pooled probability (all students who go on to pay divided by clicks on 'start free trial')
pooled_prob = (X_cont + X_exp) / (N_cont + N_exp)
print("Pooled probability: ", pooled_prob)

# Calculating the pooled standard error
pooled_SE = np.sqrt(pooled_prob * (1 - pooled_prob) * (1/N_cont + 1/N_exp))
print("Pool standard error: ", pooled_SE)

# Calculating the margin of error
m = 1.96 * pooled_SE
print("Margin of error: ", m)

# Calculating the observed difference between treatment and control
diff_observed= X_exp/N_exp - X_cont/N_cont
print("Difference observed between control and experiment group ", diff_observed)

# Calculating lower and upper bounds of the confidence interval
lower_bound = diff_observed - m
print("Lower bound of the 95% confidence interval: ", lower_bound)
upper_bound = diff_observed + m
print("Upper bound of the 95% confidence interval: ", upper_bound)

Net conversion for control group:  0.117562019314
Net conversion for experiment group:  0.11268829664
Pooled probability:  0.115127485312
Pool standard error:  0.00343413351293
Margin of error:  0.00673090168535
Difference observed between control and experiment group  -0.00487372267454
Lower bound of the 95% confidence interval:  -0.0116046243599
Upper bound of the 95% confidence interval:  0.0018571790108


#### Explanation of above
In the above, we once again calculate the observed difference between our treatment and control groups, and then determine the 95% confidence interval around that difference. In this case, the interval DOES include 0, so the result is NOT statistically significant. The practical significance boundary we care about here is 0.75 percentage points (0.0075). The observed difference (-0.0049) is smaller in magnitude than that, but the confidence interval (from -0.0116 to 0.0019) does include -0.0075, so we cannot rule out a decrease in net conversion large enough to matter.
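
The two significance rules used in this section can be written down as a small helper. This is my own reading of the rules, sketched under the assumption that "practically significant" means the entire interval lies beyond the practical significance boundary; the function name `classify` is hypothetical, and the bounds are the ones printed above.

```python
def classify(lower, upper, d_min):
    """Return (statistically significant, practically significant) for a CI on a difference."""
    # Statistically significant: the interval excludes zero
    statistically = not (lower <= 0 <= upper)
    # Practically significant: the whole interval lies beyond +/- d_min
    practically = upper < -d_min or lower > d_min
    return statistically, practically

gross_result = classify(-0.0291233583354, -0.0119863908253, d_min=0.01)
net_result = classify(-0.0116046243599, 0.0018571790108, d_min=0.0075)
print("Gross conversion:", gross_result)
print("Net conversion:", net_result)
```

Gross conversion comes out as significant on both counts, while net conversion is significant on neither.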

### Run Sign Tests

Next, for each evaluation metric, I will do a sign test using the day-by-day breakdown. If the sign test does not agree with the confidence interval for the difference, I will see if I can figure out why.

In [75]:
# Calculating the gross conversion for treatment and control groups
gross_conversion_control = control2.Enrollments/control2.Clicks
gross_conversion_treatment = treatment2.Enrollments/treatment2.Clicks
# Some manipulation to reset the indexes of the above two series
gross_conversion_control.reset_index(drop=True, inplace=True)
gross_conversion_treatment.reset_index(drop=True, inplace=True)

# Creating a dataframe for the sign test results
sign_test_df = pd.concat([gross_conversion_control, gross_conversion_treatment], axis = 1)
# Creating the two columns for control and treatment
sign_test_df.columns = ['gross_conversion_control','gross_conversion_treatment']
# Adding column to flag days where gross conversion is higher in treatment than in control
sign_test_df['sign_test']= sign_test_df.gross_conversion_control < sign_test_df.gross_conversion_treatment

In [76]:
# Printing subset of results
print(sign_test_df)

# Counting and printing number of days of experiment
total_days = len(sign_test_df)
print("Total Days: ", total_days)

# Calculating number of days where experiment > control
true_days = sum(sign_test_df.sign_test == True)

# Printing number of days the gross conversion rate is higher for experiment than control
print("Number of days the gross conversion rate is higher for the experiment \
group than the control group:", true_days)

    gross_conversion_control  gross_conversion_treatment  sign_test
0                   0.195051                    0.153061      False
1                   0.188703                    0.147771      False
2                   0.183718                    0.164027      False
3                   0.186603                    0.166868      False
4                   0.194743                    0.168269      False
5                   0.167679                    0.163706      False
6                   0.195187                    0.162821      False
7                   0.174051                    0.144172      False
8                   0.189580                    0.172166      False
9                   0.191638                    0.177907      False
10                  0.226067                    0.165509      False
11                  0.193317                    0.159800      False
12                  0.190977                    0.190031      False
13                  0.326895                    

#### Calculating Sign Test Statistical Significance
Using the following online calculator to calculate statistical significance (http://graphpad.com/quickcalcs/binomial1.cfm):

Successes = 4

Experiments = 23

Probability = 0.5

The two-tail P-value returned by the calculator is equal to 0.0026, which is statistically significant.

The sign test agrees with the effect size test.
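
The p-value can also be computed directly rather than with the online calculator. The sketch below is an exact two-tailed binomial test that doubles the smaller tail, which is appropriate for the symmetric case of probability 0.5; whether the calculator uses the same two-tail convention is an assumption, and the function name `sign_test_p_value` is my own.

```python
from math import comb

def sign_test_p_value(successes, trials, p=0.5):
    """Two-tailed exact binomial p-value (doubling the smaller tail, suited to p = 0.5)."""
    def pmf(k):
        # Probability of exactly k successes in `trials` independent trials
        return comb(trials, k) * p**k * (1 - p)**(trials - k)
    lower_tail = sum(pmf(k) for k in range(0, successes + 1))
    upper_tail = sum(pmf(k) for k in range(successes, trials + 1))
    return min(1.0, 2 * min(lower_tail, upper_tail))

# 4 days out of 23 with higher gross conversion in the experiment group
print(round(sign_test_p_value(4, 23), 4))
```

The same function can be reused for any other day-by-day sign test by changing the number of successes.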

In [77]:
# Calculating the net conversion for treatment and control groups
net_conversion_control = control2.Payments/control2.Clicks
net_conversion_treatment = treatment2.Payments/treatment2.Clicks
# Some manipulation to reset the indexes of the above two series
net_conversion_control.reset_index(drop=True, inplace=True)
net_conversion_treatment.reset_index(drop=True, inplace=True)

# Creating a dataframe for the sign test results
sign_test_df = pd.concat([net_conversion_control, net_conversion_treatment], axis = 1)
# Creating the two columns for control and treatment
sign_test_df.columns = ['net_conversion_control','net_conversion_treatment']
# Adding column to flag days where net conversion is higher in treatment than in control
sign_test_df['sign_test']= sign_test_df.net_conversion_control < sign_test_df.net_conversion_treatment

In [78]:
# Printing subset of results
print(sign_test_df)

# Counting and printing number of days of experiment
total_days = len(sign_test_df)
print("Total Days: ", total_days)

# Calculating number of days where experiment > control
true_days = sum(sign_test_df.sign_test == True)

# Printing number of days the net conversion rate is higher for experiment than control
print("Number of days the net conversion rate is higher for the experiment \
group than the control group:", true_days)

    net_conversion_control  net_conversion_treatment  sign_test
0                 0.101892                  0.049563      False
1                 0.089859                  0.115924       True
2                 0.104510                  0.089367      False
3                 0.125598                  0.111245      False
4                 0.076464                  0.112981       True
5                 0.099635                  0.077411      False
6                 0.101604                  0.056410      False
7                 0.110759                  0.095092      False
8                 0.086831                  0.110473       True
9                 0.112660                  0.113953       True
10                0.121107                  0.082176      False
11                0.109785                  0.087391      False
12                0.084211                  0.105919       True
13                0.181278                  0.134864      False
14                0.185239              

#### Calculating Sign Test Statistical Significance

Using the following online calculator to calculate statistical significance (http://graphpad.com/quickcalcs/binomial1.cfm):

Successes = 10

Experiments = 23

Probability = 0.5

The two-tail P-value returned by the calculator is equal to 0.6776, which is NOT statistically significant.

The sign test agrees with the effect size test.


### Recommendation

Finally, it is time to make a recommendation.

What we found during this experiment is that asking about the time available for study after the student clicks the 'start free trial' button did reduce free-trial enrollments in a meaningful way (gross conversion fell by an estimated 2 percentage points). This result was both statistically and practically significant.

However, the net conversion rate (those who go on to pay) also decreased. Although the decrease was not statistically significant, the 95% confidence interval did include the negative practical significance boundary (-0.0075), so a meaningful drop in paying students cannot be ruled out.

In conclusion, it would probably make sense to run a follow-up experiment, perhaps including messaging about the benefits of paying and completing the course (to help with the net conversion rate issue). Alternatively, we could revisit the threshold for net conversions. Either way, I would not recommend launching the change at this time.

## Possibility for Follow-Up Experiment: How to Reduce Early Cancellations

If you wanted to reduce the number of frustrated students who cancel early in the course, what experiment would you try? Give a brief description of the change you would make, what your hypothesis would be about the effect of the change, what metrics you would want to measure, and what unit of diversion you would use. Include an explanation of each of your choices.