In [18]:
import pandas as pd
import numpy as np
import math

## A/B Testing to Test Approaches to Reducing Early Udacity Course Cancellation

<strong>Description (from Udacity) </strong>
    
At the time of this experiment, Udacity courses currently have two options on the course overview page: "start free trial", and "access course materials". If the student clicks "start free trial", they will be asked to enter their credit card information, and then they will be enrolled in a free trial for the paid version of the course. After 14 days, they will automatically be charged unless they cancel first. If the student clicks "access course materials", they will be able to view the videos and take the quizzes for free, but they will not receive coaching support or a verified certificate, and they will not submit their final project for feedback.


In the experiment, Udacity tested a change where if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead. This screenshot shows what the experiment looks like.


The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time—without significantly reducing the number of students to continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.


The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

<strong>The new screener question</strong>
![title](screen_shoot.jpg)

<strong>Project Instructions</strong>
https://docs.google.com/document/u/1/d/1aCquhIqsUApgsxQ8-SQBAigFDcfWVVohLEXcV6jWbdI/pub?embedded=True

## Experimental Design

<strong>Unit of Diversion</strong>
For the experiment, the unit of diversion will be a cookie. However, if a student enrolls in the free trial, they will be tracked via user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they are signed in when they visit the course overview page.

## Metric Selection

### Invariant Metrics
These metrics should remain roughly the same between the control and experiment groups, and they should also be in line with the baseline measures for each. We expect a similar distribution for these metrics in both groups.
<ul>
    <li>Number of Cookies: The number of unique cookies to view the course overview page</li>
    <li>
    Number of Clicks: The number of unique cookies to click the "Start Free Trial" button (which happens before the free trial screener is triggered)</li>
    <li>
    Click-Through Probability: The number of unique cookies to click the "Start Free Trial" button divided by the number of unique cookies to view the course overview page. The screener question is shown only AFTER a user clicks the "Start Free Trial" button, so the experience up to that click is identical for the control and experimental groups, and the click-through probability should remain roughly the same.
    </li>
</ul>

### Evaluation Metrics
We will analyze changes in these metrics to determine the effectiveness of the free trial screener at reducing cancellation rates.
<ul>
    <li>Gross Conversion: The number of user-ids to complete checkout and enroll in the free trial divided by the number of unique cookies to click the "Start Free Trial" button.</li>
    <li>Retention: The number of user-ids to remain enrolled past the 14-day boundary (and make at least 1 payment) divided by the number of user-ids to complete checkout</li>
    <li>Net Conversion: The number of user-ids to remain enrolled past the 14-day boundary (and make at least 1 payment) divided by the number of unique cookies to click the "Start Free Trial" button.</li>
</ul>

## Calculating Standard Deviation
For each evaluation metric, the analytical estimate of standard deviation is calculated from the baseline data provided by Udacity, for a sample size of 5,000 cookies visiting the course overview page.
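
As a reference for the calculation, the analytical estimate under a binomial assumption can be written as below. The symbols ($\hat{p}$ for the baseline probability, $N$ for the number of units in the metric's denominator) are notation introduced here for illustration, and the scaling assumes the baseline ratios hold at 5,000 pageviews.

$$
\sigma = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{N}}, \qquad
N_{\text{clicks}} = 5000 \times 0.08 = 400, \qquad
N_{\text{enrollments}} = 5000 \times \frac{660}{40000} = 82.5
$$

Gross Conversion and Net Conversion use $N_{\text{clicks}}$ because both are measured per click; Retention uses $N_{\text{enrollments}}$ because it is measured per enrollment.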

Get the baseline data from the Udacity-provided CSV

In [17]:
# get data into dataframe
df_baseline = pd.read_csv('baseline_data.csv')
df_baseline.head()

lst_baselineMetrics = [
    'unique_cookie_views',
    'unique_cookies_button_click', 
    'enrollments_day',
    'click_probability',
    'prob_enrolling_g_click', 
    'prob_payment_g_enroll', 
    'prob_payment_g_click'
]

metrics = {}

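# map each baseline row to its metric name; the value sits in the second column
for i, row in df_baseline.iterrows():
    metric = lst_baselineMetrics[i]
    val = row.iloc[1]  # second column, selected by position
    metrics[metric] = val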

# given by Udacity    
metrics['n_sample'] = 5000
print(metrics)

{'unique_cookie_views': 40000.0, 'unique_cookies_button_click': 3200.0, 'enrollments_day': 660.0, 'click_probability': 0.08, 'prob_enrolling_g_click': 0.20625, 'prob_payment_g_enroll': 0.53, 'prob_payment_g_click': 0.1093125, 'n_sample': 5000}


Calculate Analytical standard deviation estimates

In [26]:
# set variables
n_clicks = metrics['unique_cookies_button_click']
gross_conversion = metrics['prob_enrolling_g_click']
click_probability = metrics['click_probability']
n_sample = metrics['n_sample']
retention = metrics['prob_payment_g_enroll']
net_conversion = metrics['prob_payment_g_click']
enrollments_day = metrics['enrollments_day']
page_views = metrics['unique_cookie_views']

# gross conversion estimate
std_gross_conversion = math.sqrt((gross_conversion * (1-gross_conversion))/(click_probability*n_sample))

# retention std estimate
std_retention = math.sqrt((retention*(1-retention))/(enrollments_day/page_views*n_sample))

# Net conversion std estimate
std_net_conversion = math.sqrt((net_conversion*(1-net_conversion))/(n_clicks/page_views*n_sample))

print(f'Std Gross Conversion: {std_gross_conversion}')
print(f'Std Retention: {std_retention}')
print(f'Std Net Conversion: {std_net_conversion}')

Std Gross Conversion: 0.020230604137049392
Std Retention: 0.05494901217850908
Std Net Conversion: 0.01560154458248846


## Study Sizing

The required sample size is calculated using the online calculator found <a href='https://www.evanmiller.org/ab-testing/sample-size.html'>here</a>, assuming an alpha of 0.05 and a beta of 0.2 (a power of 0.8) for each metric.
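
For readers who prefer to stay in the notebook, the sizing can be approximated in code. This is a minimal sketch, not the calculator's actual implementation: the function name `sample_size_per_group` is hypothetical, it assumes a two-sided test with the minimum detectable effect added to the baseline as an absolute change, and online calculators may apply slightly different conventions (for example when the baseline is above 50%).

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline, min_effect, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-proportion test.

    Assumption: min_effect is an absolute change added to the baseline rate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 0.8
    shifted = baseline + min_effect
    # standard deviation terms under the null and the alternative hypothesis
    sd_null = math.sqrt(2 * baseline * (1 - baseline))
    sd_alt = math.sqrt(baseline * (1 - baseline) + shifted * (1 - shifted))
    n = ((z_alpha * sd_null + z_power * sd_alt) / min_effect) ** 2
    return math.ceil(n)


# Gross Conversion inputs: 20.625% baseline, 1% minimum detectable effect
n_gross = sample_size_per_group(0.20625, 0.01)
print(n_gross)
```

With the Gross Conversion inputs, the sketch should land on the same per-group figure the calculator reports for that metric.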

### Gross Conversion
> User IDs to complete checkout and enroll/ # of unique cookies to click start free trial button

<ul>
    <li>Baseline Conversion: 20.625%</li>
    <li>Minimum Detectable Effect: 1%</li>
    <li>Alpha: 0.05</li>
    <li>Power (1 - Beta): 1.0-0.2=0.8</li>
    <li>Number of Groups: 2 (Experimental and Control)</li>
    <li>Sample Size per Group: 25,835</li>    
    <li>Total Sample Size: 2 x 25,835 = 51,670</li>
    <li>Pageviews: We need 51,670 total individuals to click the "Start Free Trial" button; with an 8% click/pageview rate we need 51,670/.08 = 645,875 pageviews</li>
     <li>Duration Required (Assuming 100% of traffic): 645,875/40,000 ≈ 16 days</li>
</ul>

### Retention
> Number of User IDs to remain enrolled past 14-day boundary/ Number of User IDs to complete checkout
<ul>
    <li>Baseline Retention: 53.0%</li>
    <li>Minimum Detectable Effect: 1%</li>
    <li>Alpha: 0.05</li>
    <li>Power (1 - Beta): 1.0-0.2=0.8</li>
    <li>Number of Groups: 2 (Experimental and Control)</li>
    <li>Sample Size per Group: 39,115</li>
    <li>Total Sample Size: 2 x 39,115 = 78,230</li>
    <li>Pageviews: We need 78,230 total enrollments; with an 8% click/pageview rate and a 20.625% enrollment/click rate we need 78,230/(.08*.20625) = 4,741,212 pageviews</li>   
    <li>Duration Required (Assuming 100% of traffic): 4,741,212/40,000 ≈ 119 days</li>
</ul>

### Net Conversion
> Number of User IDs to remain enrolled past the 14-day boundary/number of unique cookies to click the "Start Free Trial" button

<ul>
    <li>Baseline Net Conversion: 10.93125%</li>
    <li>Minimum Detectable Effect: 0.75%</li>
    <li>Alpha: 0.05</li>
    <li>Power (1 - Beta): 1.0-0.2 = 0.8</li>
    <li>Number of Groups: 2 (Experimental and Control)</li>
    <li>Sample Size per Group: 27,413</li>
    <li>Total Sample Size: 2 x 27,413 = 54,826</li>
    <li>Pageviews: 54,826/.08 = 685,325</li>
    <li>Duration Required (Assuming 100% of traffic): 685,325/40,000 ≈ 17 days</li>
</ul>

Because we are evaluating several metrics, the experiment must collect the largest number of pageviews required by any single metric. Here the Retention metric requires the most, so the total number of pageviews required for our testing is 4,741,212.

## Duration and Exposure

With a baseline of 40,000 pageviews per day, and diverting 100% of site traffic, it would take roughly 119 days to collect enough data for the experiment. Best practice typically calls for diverting 1%-10% of traffic, both to leave room for other experiments and to minimize customer impact; however, the change is fairly subtle and does not present a significant risk of degrading the user experience. That said, an experiment of 119 days still creates other issues. The duration can be scaled back to roughly 17 days if we focus on net conversion and gross conversion, or 16 days if we limit the scope of the investigation exclusively to gross conversion.
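
The trade-off between exposure and duration can be explored with a small helper. This is an illustrative sketch: `days_needed` and its parameter names are hypothetical, and it assumes daily traffic stays flat at the baseline of 40,000 pageviews.

```python
def days_needed(pageviews, daily_pageviews=40_000, traffic_fraction=1.0):
    """Days required to collect `pageviews` when diverting a fraction of traffic."""
    if not 0 < traffic_fraction <= 1:
        raise ValueError("traffic_fraction must be in (0, 1]")
    return pageviews / (daily_pageviews * traffic_fraction)


# Retention: 78,230 enrollments, 8% click rate, 20.625% enrollment rate per click
retention_pageviews = 78_230 / (0.08 * 0.20625)
gross_pageviews = 645_875

print(round(days_needed(retention_pageviews)))                        # all traffic
print(round(days_needed(retention_pageviews, traffic_fraction=0.5)))  # half of traffic
print(round(days_needed(gross_pageviews)))                            # gross conversion only
```

Rounding to the nearest day matches the figures quoted in this section; a more conservative plan would round up instead.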

## Experiment Results Analysis
### Sanity Checks 
> Prior to analyzing experiment results, we will review the invariant metrics to smell check the overall execution of the experiment. 

<i>Analysis was conducted in a Google Sheet; a summary table is provided below.</i>
<table>
    <thead>
        <tr>
            <td></td>
            <th>Number of Cookies</th>
            <th>Number of Clicks</th>
            <th> Click-Through-Probability</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <th>Control</th>
            <td>345,543 (50.06%)</td>
            <td>28,378 (50.05%)</td>
            <td>8.2126%</td>
        </tr>
        <tr>
            <th>Experiment</th>
            <td>344,660 (49.94%)</td>
            <td>28,325 (49.95%)</td>
            <td>8.2182%</td>
        </tr>
        <tr>
            <th>Total</th>
            <td>690,203</td>
            <td>56,703</td>
            <td>8.2154%</td>
        </tr>
    </tbody>
</table>

For all items, a 95% confidence interval is used.

#### Number of Cookies
> Overall we expect the number of cookies in the control and experimental group to be roughly equal
<ul>
    <li>Standard Error:0.0006018407403</li>
    <li>Margin of Error:0.001179586176</li>
    <li>Lower Bound:0.4988204138</li>
    <li>Upper Bound:0.5011795862</li>
    <li>Observed: 50.06% (control)</li>
    <li>Pass Sanity Check?:Yes</li>
</ul>

#### Number of clicks on "Start Free Trial"
> Overall we expect the number of clicks to be roughly equal across both groups
<ul>
    <li>Standard Error:0.00209974708</li>
    <li>Margin of Error:0.004115428656</li>
    <li>Lower Bound:0.4958845713</li>
    <li>Upper Bound:0.5041154287</li>
    <li>Observed: 50.05% (control)</li>
    <li>Pass Sanity Check?:Yes</li>
</ul>


#### Click-through-probability on "start Free Trial"
> Overall we expect the click-through probability to be roughly equal across both groups
<ul>
    <li>Standard Error:0.0006610608156</li>
    <li>Margin of Error:0.001295655391</li>
    <li>Lower Bound:-0.0013</li>
    <li>Upper Bound:0.0013</li>
    <li>Observed: 0.0001 (experiment-control)</li>
    <li>Pass Sanity Check?:Yes</li>
</ul>

For Number of Cookies and Number of Clicks, standard error is calculated as follows:
SQRT((0.5 × 0.5)/(N1 + N2))
Where N1 and N2 represent the counts (cookies or clicks, depending on the metric) for the control and experimental group, respectively.

For Click-Through Probability, standard error is calculated as follows:
SQRT(PG × (1 - PG) × ((1/N1) + (1/N2)))
Where N1 and N2 are the number of cookies in the control and experimental group, respectively, and PG is the global pooled click-through probability (56,703/690,203).
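
The spreadsheet calculations can also be reproduced in Python. The sketch below applies the two formulas above to the counts in the summary table; the function names `split_check` and `ctp_check` are hypothetical helpers written for this illustration, not part of the original analysis.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence interval


def split_check(n_control, n_experiment, expected=0.5):
    """CI for the expected control share of a count metric (cookies or clicks)."""
    total = n_control + n_experiment
    se = math.sqrt(expected * (1 - expected) / total)
    moe = Z_95 * se
    observed = n_control / total
    passed = expected - moe <= observed <= expected + moe
    return se, moe, observed, passed


def ctp_check(clicks_c, cookies_c, clicks_e, cookies_e):
    """CI around zero for the difference in click-through probability."""
    pooled = (clicks_c + clicks_e) / (cookies_c + cookies_e)
    se = math.sqrt(pooled * (1 - pooled) * (1 / cookies_c + 1 / cookies_e))
    moe = Z_95 * se
    diff = clicks_e / cookies_e - clicks_c / cookies_c
    passed = -moe <= diff <= moe
    return se, moe, diff, passed


cookies = split_check(345_543, 344_660)
clicks = split_check(28_378, 28_325)
ctp = ctp_check(28_378, 345_543, 28_325, 344_660)
for name, result in [("cookies", cookies), ("clicks", clicks), ("ctp", ctp)]:
    print(name, result)
```

Each tuple holds the standard error, the margin of error, the observed value, and whether the observed value falls inside the interval.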

### Effect Size Test
> Now that the sanity checks have passed and suggest the experiment was executed correctly, we can determine the impact of the screener question on the Gross Conversion and Net Conversion rates.

### Calculate Conversion Rates

In [None]:
# load data from csv
df_control = pd.read_csv('Final_Results_Control.csv')
df_exp = pd.read_csv('Final_Results_Experiment.csv')

# make df copy
df_control_copy = df_control.copy()
df_exp_copy = df_exp.copy()

# drop na's
df_control_copy = df_control_copy.dropna()
df_exp_copy = df_exp_copy.dropna()


In [40]:

# calculate gross and net conversion rates
df_control_copy['gross_conversion'] = df_control_copy['Enrollments']/df_control_copy['Clicks']
df_control_copy['net_conversion'] =  df_control_copy['Payments']/df_control_copy['Clicks']

df_exp_copy['gross_conversion'] = df_exp_copy['Enrollments']/df_exp_copy['Clicks']
df_exp_copy['net_conversion'] =  df_exp_copy['Payments']/df_exp_copy['Clicks']

# calculate total and pooled global rates
gross_c = df_control_copy['Enrollments'].sum()/df_control_copy['Clicks'].sum()
net_c = df_control_copy['Payments'].sum()/df_control_copy['Clicks'].sum()

gross_e = df_exp_copy['Enrollments'].sum()/df_exp_copy['Clicks'].sum()
net_e = df_exp_copy['Payments'].sum()/df_exp_copy['Clicks'].sum()

# pooled Global
global_clicks = df_control_copy['Clicks'].sum()+df_exp_copy['Clicks'].sum()
gross_global = (df_control_copy['Enrollments'].sum() + df_exp_copy['Enrollments'].sum())/global_clicks
net_global = (df_control_copy['Payments'].sum() + df_exp_copy['Payments'].sum())/global_clicks

# print Gross and Net conversion rates for control, experimental, and global pooled
print(f'Control, Gross: {gross_c} Net: {net_c}')
print(f'Experimental, Gross: {gross_e} Net: {net_e}')
print(f'Global, Gross: {gross_global} Net: {net_global}')

Control, Gross: 0.2188746891805933 Net: 0.11756201931417337
Experimental, Gross: 0.19831981460023174 Net: 0.1126882966396292
Global, Gross: 0.20860706740369866 Net: 0.1151274853124186


### Calculate Standard Error and Analyze Significance

In [43]:
# set Z value to 1.96 to correspond to alpha of .05
z_value = 1.96
clicks_c = df_control_copy['Clicks'].sum()
clicks_e = df_exp_copy['Clicks'].sum()
# Standard Error and margin of error, Gross Conversion
se_gross = math.sqrt((gross_global*(1-gross_global))*((1/clicks_c)+(1/clicks_e)))
moe_gross = z_value * se_gross
# Standard Error and margin of error, Net Conversion
se_net = math.sqrt((net_global*(1-net_global))*((1/clicks_c)+(1/clicks_e)))
moe_net = z_value * se_net
print(f'SE Gross Conversion: {se_gross} MOE: {moe_gross}')
print(f'SE Net Conversion: {se_net} MOE: {moe_net}')

SE Gross Conversion: 0.004371675385225936 MOE: 0.008568483755042836
SE Net Conversion: 0.0034341335129324238 MOE: 0.0067309016853475505


In [52]:
# diff, gross conversion
d_gross = gross_e - gross_c
# establish 95% confidence interval around d based on MOE
gross_lower = d_gross - moe_gross
gross_upper = d_gross + moe_gross
print('Gross Conversion diff {0}, CI: [{1},{2}]'.format(d_gross,gross_lower,gross_upper))
if gross_lower > 0 or gross_upper < 0:
    print('Statistically significant, CI does not include 0')

# diff, net conversion
d_net = net_e - net_c
net_lower = d_net - moe_net
net_upper = d_net + moe_net
print('Net Conversion diff {0}, CI: [{1},{2}]'.format(d_net,net_lower,net_upper))
if net_lower > 0 or net_upper < 0:
    print('Statistically significant, CI does not include 0')


Gross Conversion diff -0.020554874580361565, CI: [-0.0291233583354044,-0.01198639082531873]
Statistically significant, CI does not include 0
Net Conversion diff -0.0048737226745441675, CI: [-0.011604624359891718,0.001857179010803383]


### Sign Test
Statistical significance of the sign test was calculated using the online calculator here: https://www.graphpad.com/quickcalcs/binomial1.cfm


In [71]:
# create new subset data frame (copy to avoid pandas' SettingWithCopyWarning)
df_sign_test = df_control_copy[['Date','gross_conversion','net_conversion']].copy()
df_sign_test['gross_conversion_exp'] = df_exp_copy['gross_conversion']
df_sign_test['net_conversion_exp'] = df_exp_copy['net_conversion']

# calculate diff in control group minus experimental group
df_sign_test['gross_control_m_exp'] = df_sign_test['gross_conversion'] - df_sign_test['gross_conversion_exp']
df_sign_test['net_control_m_exp'] = df_sign_test['net_conversion'] - df_sign_test['net_conversion_exp']

# get diff values to a list
lst_gross = df_sign_test['gross_control_m_exp'].values.tolist()
lst_net = df_sign_test['net_control_m_exp'].values.tolist()

n_gross = len(lst_gross)
n_net = len(lst_net)

assert n_gross == n_net

# get counts of where control > experimental
gross_control_positive = sum([1 for i in lst_gross if i > 0])
net_control_positive = sum([1 for i in lst_net if i >0])

print(f' Count Gross Control>Exp: {gross_control_positive}, Count Net Control>Exp: {net_control_positive}, N: {n_gross}')

 Count Gross Control>Exp: 19, Count Net Control>Exp: 13, N: 23
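
The p-values for these counts can also be computed directly rather than with an online calculator. The following is a minimal sketch of a two-sided exact binomial test under a 50/50 null; the function name `sign_test_p_value` is hypothetical, and it assumes none of the 23 daily differences is exactly zero.

```python
from math import comb


def sign_test_p_value(successes, trials):
    """Two-sided exact binomial p-value under a null success probability of 0.5."""
    k = max(successes, trials - successes)  # size of the more extreme tail
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


p_gross = sign_test_p_value(19, 23)  # days where control gross conversion > experiment
p_net = sign_test_p_value(13, 23)    # days where control net conversion > experiment
print(f'Sign test p-value, Gross: {p_gross:.4f} Net: {p_net:.4f}')
```

Comparing each p-value with an alpha of 0.05 gives the sign-test verdict for that metric.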




## Final Review and Recommendation: Implement Change

The goal of this analysis was to use the framework of an A/B test to determine the impact of adding a new screener question that appears after a user clicks the 'start free trial' button on the course overview page. The motive for the screener question is to reduce the number of students who enroll in a program but drop out before the end of the 14-day trial, by giving students a better understanding of the time commitment the program requires. The underlying assumption is that students with less time will opt out of the 14-day free trial and only more serious students will commit.

Prior to beginning any analysis, we established three invariant metrics (number of cookies, number of button clicks, and click-through probability) to determine whether the experiment was implemented correctly. We then selected two evaluation metrics, Gross Conversion Rate and Net Conversion Rate, to determine the effectiveness of the new screener question. Additionally, we reviewed baseline values for a number of metrics to determine the exposure and duration of the experiment.

The experiment was conducted, and the final data was collected and organized into CSV files that were then imported into Python for analysis. The analysis started with a sanity check on the invariant metrics: we calculated the standard error and margin of error for number of cookies, number of button clicks, and click-through probability, then formed confidence intervals to compare these metrics between the control and experimental groups. The results showed that the invariant metrics were comparable between the two groups, which suggests that the experiment was carried out correctly. Finally, we analyzed the evaluation metrics, Gross and Net Conversion rates. To analyze the results we:
<ul>
    <li>Calculated Global Pooled Gross and Net Conversion Rates</li>
    <li>Calculated Standard Error Rates</li>
    <li>With Z value of 1.96, Calculated Margin of Error</li>
    <li>Calculated D Hat (difference in gross and net conversion rates between Control and Experimental groups)</li>
    <li>Formed 95% confidence intervals for each evaluation metric</li>
    <li>Evaluated D Hat for statistical and practical significance</li>
</ul>
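
The last two steps can be sketched as a small check on each confidence interval. The helper name `assess` and the constant `D_MIN` are hypothetical: `D_MIN` is an assumed practical significance boundary, set here to the 1% minimum detectable effect used when sizing Gross Conversion, and the difference and margin of error values are copied from the notebook output above.

```python
D_MIN = 0.01  # assumption: practical significance boundary (absolute change)


def assess(diff, moe, d_min=D_MIN):
    """Return (statistically significant, practically significant) for a CI."""
    lower, upper = diff - moe, diff + moe
    statistically = lower > 0 or upper < 0           # CI excludes zero
    practically = lower > d_min or upper < -d_min    # CI lies entirely beyond +/- d_min
    return statistically, practically


gross = assess(-0.020554874580361565, 0.008568483755042836)
net = assess(-0.0048737226745441675, 0.0067309016853475505)
print(f'Gross Conversion: statistical={gross[0]}, practical={gross[1]}')
print(f'Net Conversion: statistical={net[0]}, practical={net[1]}')
```

Because the Net Conversion interval contains zero, its verdict does not depend on the exact boundary chosen.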

The final results showed that the screener question had an effect on Gross Conversion Rates that was both statistically and practically significant. No such effect was found with respect to Net Conversion Rates. Additionally, the non-parametric sign test agreed with the overall results generated by the confidence intervals. 

At a high level, reducing gross conversion seems bad. However, the goal is to reduce overall dropout during the 14-day trial by decreasing the number of time-constrained students who enroll. Retention, a metric not analyzed in this study due to time constraints, most closely measures this; it is defined as the number of user-ids enrolled past the 14-day period divided by the number of user-ids who complete checkout. Reducing gross conversion by screening out time-constrained students reduces the number of user-ids to complete checkout (i.e., the denominator of Retention). As those students self-select out, the remaining pool skews toward committed students, so the share who stay past the trial would likely rise. That said, the change did not have the anticipated effect on Net Conversion: the observed change was small and statistically insignificant. Overall, I recommend implementing the change and continuing to gather and analyze data on retention and net conversion.


