# Udacity A/B Testing

## 1.Preparation

### 1.1 Packages

In [1]:
import numpy as np
import scipy.stats as stats
import math
from scipy.stats import norm
import pandas as pd
import matplotlib.pyplot as plt

### Data Import

### Define Functions

In [2]:
def get_z_star(alpha):
    # two-sided critical value of the standard normal (about 1.96 for alpha=0.05)
    return stats.norm.ppf(1 - alpha / 2)

## 2.Choosing and Characterizing

### 2.1 Experiment Overview

At the time of this experiment, the Udacity course overview page offered two options. Here is what happens after a student clicks each button:  
- **"start free trial"**: the student is asked to enter their credit card information and is then enrolled in a free trial for the paid version of the course. After 14 days, they are automatically charged unless they cancel first.  
- **"access course materials"**: the student can view the videos and take the quizzes for free, but does not receive coaching support or a verified certificate, and cannot submit a final project for feedback.  

In the experiment, Udacity tested a change where `if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead.`  


The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time—without significantly reducing the number of students to continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.    



### 2.2 Business Objective

Maximize the course completion rate of “Free Trial” users by guiding students who do not have enough time toward “access course materials”.


### 2.3 Customer Funnel

<img src="image/19.png" width="700">

### 2.4 Metrics Overview 

#### Data explanation

Here are the data we collected:  



|  item  |  data  |
|  ----  |  ----  |
|  Unique cookies to view course overview page per day  |  40000  |
|  Unique cookies to click "Start free trial" per day   |  3200  |
|  Enrollments per day  | 660  |
|  Click-through-probability on "Start free trial"  |  0.08  |
|  Probability of enrolling, given click  |  0.20625  |
|  Probability of payment, given enroll  |  0.53  |
|  Probability of payment, given click  |  0.1093125  |
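The rates in this table follow from the daily counts, which can be checked with a few lines of arithmetic. This is a minimal sketch; the lowercase variable names are chosen for illustration and are not part of the notebook.

```python
import math

# Daily baseline counts from the table above
n_cookies = 40000   # unique cookies viewing the course overview page
n_clicks = 3200     # unique cookies clicking "Start free trial"
n_enroll = 660      # enrollments
p_pay_given_enroll = 0.53

# Derived funnel rates
ctp = n_clicks / n_cookies                               # click-through probability
p_enroll_given_click = n_enroll / n_clicks               # gross conversion
p_pay_given_click = p_enroll_given_click * p_pay_given_enroll  # net conversion

# Each derived rate should match the corresponding table entry
assert math.isclose(ctp, 0.08)
assert math.isclose(p_enroll_given_click, 0.20625)
assert math.isclose(p_pay_given_click, 0.1093125)
print(ctp, p_enroll_given_click, p_pay_given_click)
```

The last check shows that the net conversion baseline is simply gross conversion multiplied by retention.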



In the following analysis, $d_{min}$ denotes the practical significance boundary: the smallest change in a metric that matters to the business.

#### Metrics' practical significance bar

The following metrics are candidates for invariant metrics or evaluation metrics:  



|  Metrics  | $d_{min}$  |
|  -------- |  --------  |
| **# of Cookies**: # of unique cookies to view the course overview page | 3000 |
| **# of user-ids**: # of users who enroll in the free trial | 50 |
| **# of clicks**: # of unique cookies to click the "start free trial" button (before the free trial screener is triggered) | 240 |
| **CTP** $= \frac{\text{number of unique cookies to click the button}}{\text{number of unique cookies to view the course overview page}}$ | 0.01 |
| **Gross conversion** $= \frac{\text{number of user-ids to complete checkout and enroll in the free trial}}{\text{number of unique cookies to click the button}}$ | 0.01 |
| **Retention** $= \frac{\text{number of user-ids to remain enrolled past 14-day boundary}}{\text{number of user-ids to complete checkout}}$ | 0.01 |
| **Net Conversion** $= \frac{\text{number of user-ids to remain enrolled past 14-day boundary }}{\text{number of unique cookies to click the button}}$ | 0.0075 | 


In [3]:
# collected data:
N_cookies = 40000
N_clicks = 3200
N_enroll_per_day = 660
CTP_sftrial = 0.08
P_enroll_click = 0.20625  # gross conversion
P_pay_enroll = 0.53       # retention
P_pay_click = 0.1093125   # net conversion

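# practical significance levels (d_min) for each metric
d_min_cookies = 3000
d_min_userid = 50    # user-ids that enroll in the free trial
d_min_clicks = 240   # clicks on "start free trial"
d_min_CTP = 0.01
d_min_Gross_conversion = 0.01
d_min_Retention = 0.01
d_min_Net_conversion = 0.0075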

In [6]:
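# per-metric baselines: d_min, baseline probability p, daily count n of the metric's denominator
GC={}
GC['d_min'] = d_min_Gross_conversion
GC['p'] = P_enroll_click
GC['n'] = N_clicks

RT={}
RT['d_min'] = d_min_Retention
RT['p'] = P_pay_enroll
RT['n'] = N_enroll_per_day

NC={}
NC['d_min'] = d_min_Net_conversion
NC['p'] = P_pay_click
NC['n'] = N_clicks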


### 2.5 Hypothesis

$H_0$: For each evaluation metric, there is no difference between the control group and the experiment group.  
$H_1$: For each evaluation metric, there is a difference between the control group and the experiment group. 

## 3.Design the Experiment

### 3.1 Metric Choice

<img src="image/19.png" width="500">

**Invariant Metrics:** 
- Number of cookies:&emsp;&emsp;&emsp;The new screener appears only after cookies are counted, so this metric should remain unchanged.
- Number of clicks:&emsp;&emsp;&emsp;&emsp;The click happens before the screener is triggered, so the same reasoning applies.
- Click Through Probability: Cookies and clicks are both unaffected, so their ratio (CTP) should also remain unchanged.     
      
**Evaluation Metrics:**  
- Gross conversion
- Retention
- Net conversion

**Unit of diversion:** 
- The unit of diversion is a **cookie**, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

### 3.2 Measuring Standard Deviation

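**Standard Error of Evaluation Metrics** (given a sample size of 5000 cookies visiting the course overview page)


$SE=\sqrt{\frac{p(1-p)}{\frac{5000}{N_{cookies}} \cdot N_{unit}}}$

Here $N_{cookies}$ is the daily number of unique cookies viewing the course overview page, and $N_{unit}$ is the daily count of the metric's denominator: $N_{clicks}$ for gross conversion and net conversion, and the daily number of enrollments for retention.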

In [14]:
gc_SE=round(np.sqrt(P_enroll_click*(1-P_enroll_click)/(5000*N_clicks/N_cookies)),4)
ret_SE=round(np.sqrt(P_pay_enroll*(1-P_pay_enroll)/(5000*N_enroll_per_day/N_cookies)),4)
nc_SE=round(math.sqrt(P_pay_click*(1-P_pay_click)/(5000*N_clicks/N_cookies)),4)
print(f'Gross conversion SE: {gc_SE}')
print(f'Retention SE: {ret_SE}')
print(f'Net conversion SE: {nc_SE}')

Gross conversion SE: 0.0202
Retention SE: 0.0549
Net conversion SE: 0.0156


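|  metric  |  analytic SE  |  baseline probability  |
|  ----------------  |  ------  |  -------   |
|  Gross Conversion  |  0.0202  |  0.20625   |
|  Retention         |  0.0549  |  0.53      |
|  Net Conversion    |  0.0156  |  0.1093125 |

Gross conversion and net conversion are measured per cookie, which matches the unit of diversion, so their analytic estimates should be comparable to the empirical variability. Retention is measured per user-id, which differs from the unit of diversion, so its analytic estimate may understate the true variability and is worth checking empirically.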
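One way to sanity-check the analytic formula is a small Monte Carlo simulation. The sketch below assumes independent binomial draws, so it only confirms the formula itself; it cannot reveal extra variability caused by correlated units, which requires real experiment data. The helper `simulated_se` and the seed are illustrative choices, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def simulated_se(p, n_units, n_sims=20000):
    """Standard deviation of the sample proportion over repeated binomial draws."""
    proportions = rng.binomial(n_units, p, size=n_sims) / n_units
    return proportions.std(ddof=1)

# 5000 pageviews at a click-through probability of 0.08 gives 400 clicks
n_clicks_sample = int(5000 * 3200 / 40000)

for name, p in [('Gross conversion', 0.20625), ('Net conversion', 0.1093125)]:
    analytic = np.sqrt(p * (1 - p) / n_clicks_sample)
    simulated = simulated_se(p, n_clicks_sample)
    print(f'{name}: analytic {analytic:.4f}, simulated {simulated:.4f}')
```

Retention is left out here because 5000 pageviews correspond to a non-integer number of enrollments (82.5), which a binomial draw cannot represent directly.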

### 3.3 Sizing

#### **Number of Samples vs. Power**

**Do we need Bonferroni Correction:**  
  This experiment has 3 evaluation metrics: Gross Conversion (GC), Retention (RT), and Net Conversion (NC). Because several metrics are tested at once, we should consider whether Bonferroni Correction needs to be applied.  


In [7]:
N=3 # number of metrics
a=0.05
a_corrected=a/N
print(f'alpha corrected: {round(a_corrected,4)}')

alpha corrected: 0.0167


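Recall that the $d_{min}$ for the 3 evaluation metrics are $[0.01,0.01,0.0075]$. These are effect sizes, while $\alpha_{corrected}$ is an error rate, so comparing them cannot settle the question. What matters is how the metrics enter the launch decision. The hypothesis requires the metrics to move as expected together (fewer enrollments from students who lack time, without fewer students continuing past the free trial), and the three metrics are strongly correlated because they share numerators and denominators. Under such an "all metrics must agree" rule, Bonferroni Correction would be overly conservative. Thus, we do not need Bonferroni Correction here.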

**Calculate the Size**

In [8]:
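# required sample size per group for each metric
# (GC and NC are counted in clicks, RT in enrollments)
n_gc=25835
n_rt=39115
n_nc=27413
n_group=2  # control and experiment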
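The sample sizes above are hard-coded, and the notebook does not show how they were obtained. As a hedged sketch, a common closed-form approximation for comparing two proportions gives values close to them when $\alpha = 0.05$ and power is 0.8. The power value, the helper name `required_sample_size`, and the choice to keep the larger of the two shift directions are all assumptions made for illustration.

```python
import math
from scipy.stats import norm

def required_sample_size(p, d_min, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion test.

    Hypothetical helper: evaluates shifts of +d_min and -d_min from the
    baseline p and returns the larger (more conservative) requirement.
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = norm.ppf(power)           # quantile matching the desired power
    sizes = []
    for p_alt in (p + d_min, p - d_min):
        sd_null = math.sqrt(2 * p * (1 - p))
        sd_alt = math.sqrt(p * (1 - p) + p_alt * (1 - p_alt))
        sizes.append(((z_alpha * sd_null + z_beta * sd_alt) / d_min) ** 2)
    return math.ceil(max(sizes))

# (baseline probability, d_min) for each evaluation metric, taken from section 2.4
baselines = {
    'Gross Conversion': (0.20625, 0.01),
    'Retention': (0.53, 0.01),
    'Net Conversion': (0.1093125, 0.0075),
}
for name, (p, d_min) in baselines.items():
    print(name, required_sample_size(p, d_min))
```

Different calculators round differently, so small deviations from the hard-coded values are expected.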

**Calculate pageviews we need**

In [9]:
#pageview needed:
n1=round(n_gc*n_group/(N_clicks/N_cookies))
n2=round(n_rt*n_group/(N_enroll_per_day/N_cookies))
n3=round(n_nc*n_group/(N_clicks/N_cookies))
sizedata={'Metrics':['Gross Conversion','Retention','Net Conversion'],
          'Sample Size':[n_gc,n_rt,n_nc],
          'Pageview Needed':[n1,n2,n3]}
sizedf=pd.DataFrame(sizedata)
sizedf

|    |  Metrics  |  Sample Size  |  Pageview Needed  |
|  ----  |  ----  |  ----  |  ----  |
|  0  |  Gross Conversion  |  25835  |  645875  |
|  1  |  Retention  |  39115  |  4741212  |
|  2  |  Net Conversion  |  27413  |  685325  |


#### **Duration vs. Exposure**

In [10]:
sizedf['Pageview Needed'] = sizedf['Pageview Needed'].astype(float)

# Calculate the 'duration' column with ceil function
sizedf['duration'] = sizedf['Pageview Needed'].apply(lambda x: math.ceil(x / N_cookies))

# Display the DataFrame
sizedf


|    |  Metrics  |  Sample Size  |  Pageview Needed  |  duration  |
|  ----  |  ----  |  ----  |  ----  |  ----  |
|  0  |  Gross Conversion  |  25835  |  645875.0  |  17  |
|  1  |  Retention  |  39115  |  4741212.0  |  119  |
|  2  |  Net Conversion  |  27413  |  685325.0  |  18  |
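The durations above assume that all 40000 daily cookies are diverted into the experiment. A minimal sketch of how duration grows when only a fraction of traffic is exposed follows; the helper `days_needed` and the exposure levels are illustrative assumptions.

```python
import math

def days_needed(pageviews_needed, daily_pageviews=40000, exposure=1.0):
    """Days required when a fraction `exposure` of daily traffic enters the experiment."""
    return math.ceil(pageviews_needed / (daily_pageviews * exposure))

# Pageviews needed per metric, copied from the sizing table above
pageviews_needed = {
    'Gross Conversion': 645875,
    'Retention': 4741212,
    'Net Conversion': 685325,
}

for exposure in (1.0, 0.75, 0.5):
    durations = {name: days_needed(pv, exposure=exposure)
                 for name, pv in pageviews_needed.items()}
    print(f'exposure {exposure:.2f}: {durations}')
```

Lower exposure reduces the share of users who see the change at any time, at the cost of a proportionally longer experiment.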


## 4.Analyzing Results

### Sanity Checks


In [11]:
control= pd.read_csv('data/Final Project Results - Control.csv')
experiment= pd.read_csv('data/Final Project Results - Experiment.csv')

In [12]:
control.head()

|    |  Date  |  Pageviews  |  Clicks  |  Enrollments  |  Payments  |
|  ----  |  ----  |  ----  |  ----  |  ----  |  ----  |
|  0  |  Sat, Oct 11  |  7723  |  687  |  134.0  |  70.0  |
|  1  |  Sun, Oct 12  |  9102  |  779  |  147.0  |  70.0  |
|  2  |  Mon, Oct 13  |  10511  |  909  |  167.0  |  95.0  |
|  3  |  Tue, Oct 14  |  9871  |  836  |  156.0  |  105.0  |
|  4  |  Wed, Oct 15  |  10014  |  837  |  163.0  |  64.0  |


In [13]:
experiment.head()

|    |  Date  |  Pageviews  |  Clicks  |  Enrollments  |  Payments  |
|  ----  |  ----  |  ----  |  ----  |  ----  |  ----  |
|  0  |  Sat, Oct 11  |  7716  |  686  |  105.0  |  34.0  |
|  1  |  Sun, Oct 12  |  9288  |  785  |  116.0  |  91.0  |
|  2  |  Mon, Oct 13  |  10480  |  884  |  145.0  |  79.0  |
|  3  |  Tue, Oct 14  |  9867  |  827  |  138.0  |  92.0  |
|  4  |  Wed, Oct 15  |  9793  |  832  |  140.0  |  94.0  |


In [40]:
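def sanitycheck(control,experiment,p,alpha):
    """Check that control's share of the total is consistent with the expected share p."""
    total=control+experiment
    p_hat=control/total                       # observed share assigned to control
    sd=round(np.sqrt(p*(1-p)/total),4)        # standard error under the expected split
    z=stats.norm.ppf(1-alpha/2)               # two-sided critical value (about 1.96 for alpha=0.05)
    m=round(sd*z,4)                           # margin of error
    lower_CI=round(p_hat-m,4)
    upper_CI=round(p_hat+m,4)
    result = pd.DataFrame({
        'lower_CI': [lower_CI],
        'upper_CI': [upper_CI],
        'observed': [p_hat],
        # pass when the expected share p lies inside the interval around the observed share
        # (testing p_hat against an interval centered on p_hat would always pass)
        'pass': [lower_CI <= p <= upper_CI],
    })

    return result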

Pageview Sanity Check

In [41]:
pageview_snc=sanitycheck(control['Pageviews'].sum(),experiment['Pageviews'].sum(),0.5,0.05)
print(pageview_snc)

   lower_CI  upper_CI  observed  pass
0    0.4994    0.5018   0.50064  True


Let's look at how the pageviews split between the two groups day by day.

In [39]:
pageview=pd.DataFrame({'control': control['Pageviews'],
                       'experiment': experiment['Pageviews'],
                       'observed': control['Pageviews']/(control['Pageviews']+experiment['Pageviews'])})
pageview.head()

|    |  control  |  experiment  |  observed  |
|  ----  |  ----  |  ----  |  ----  |
|  0  |  7723  |  7716  |  0.500227  |
|  1  |  9102  |  9288  |  0.494943  |
|  2  |  10511  |  10480  |  0.500738  |
|  3  |  9871  |  9867  |  0.500101  |
|  4  |  10014  |  9793  |  0.505579  |


Clicks Sanity Check

In [42]:
click_snc=sanitycheck(control['Clicks'].sum(),experiment['Clicks'].sum(),0.5,0.05)
print(click_snc)

   lower_CI  upper_CI  observed  pass
0    0.4964    0.5046  0.500467  True


CTP Sanity Check

In [35]:
clicks_cont = control['Clicks'].sum()
clicks_exp = experiment['Clicks'].sum()
clicks_total = clicks_cont + clicks_exp
pageviews_cont = control['Pageviews'].sum()
pageviews_exp = experiment['Pageviews'].sum()
pageviews_total = pageviews_cont + pageviews_exp
ctp_cont=clicks_cont/pageviews_cont
ctp_exp=clicks_exp/pageviews_exp
d_hat=round(ctp_exp-ctp_cont,4)
p_pooled=clicks_total/pageviews_total
sd_pooled=np.sqrt(p_pooled*(1-p_pooled)*(1/pageviews_cont+1/pageviews_exp))
m=round(1.96*sd_pooled,4)
CI_lower=round(d_hat-m,4)
CI_upper=round(d_hat+m,4)
print(f'CI: [{CI_lower},{CI_upper}]', f'd_hat: {d_hat}')
# the invariant holds when 0 (no difference in CTP) lies inside the interval around d_hat
CI_lower<=0<=CI_upper

CI: [-0.0012,0.0014] d_hat: 0.0001


True

### Sign Tests 

The sign-test p-value for gross conversion is 0.0026, which is below $\alpha = 0.05$. Therefore, the change in gross conversion is statistically significant.  
The sign-test p-value for net conversion is 0.6776, which is above $\alpha = 0.05$. Therefore, the change in net conversion is not statistically significant.  
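
The code behind these p-values is not shown in this section. As a minimal sketch, a two-sided sign test on paired daily differences could be computed as follows; `sign_test_pvalue` is a hypothetical helper, and the toy differences are made-up numbers rather than experiment data.

```python
import numpy as np
from scipy.stats import binom

def sign_test_pvalue(daily_diffs):
    """Two-sided sign test p-value for paired daily differences (hypothetical helper)."""
    diffs = np.asarray(daily_diffs, dtype=float)
    diffs = diffs[~np.isnan(diffs) & (diffs != 0)]  # drop missing days and ties
    n = diffs.size
    n_pos = int((diffs > 0).sum())                  # days where experiment > control
    tail = min(n_pos, n - n_pos)
    # double the smaller tail probability under Binomial(n, 0.5), capped at 1
    return float(min(1.0, 2 * binom.cdf(tail, n, 0.5)))

toy_diffs = [-0.02, -0.01, -0.03, -0.015, -0.005]   # made-up values for illustration
print(sign_test_pvalue(toy_diffs))
```

For the notebook's data, the daily difference in gross conversion would be `experiment['Enrollments']/experiment['Clicks'] - control['Enrollments']/control['Clicks']`, and the net conversion difference would use `Payments` in place of `Enrollments`. Days with missing enrollments or payments produce NaN and are dropped by the helper.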