
In [0]:
#set up
import numpy as np

## Intro

This A/B test analyzes an experiment conducted by Udacity (the data has been modified). Udacity tested a change where a student who clicked "start free trial" was asked how much time they had available to devote to the course. A student who indicated fewer than 5 hours was advised to access the course materials for free instead; a student who indicated more checked out as usual. The hypothesis was that this would set clearer expectations up front, reducing the number of students who leave the course for lack of time without significantly reducing the number who continue past the free trial. The unit of diversion is a cookie.

This report covers experiment design and experiment analysis. For the design, I explain my choice of invariant and evaluation metrics, estimate their variability, and size the experiment. Because the test passes the sanity checks, I then analyze the evaluation metrics and make a recommendation.


## Experiment Design

### Metric Choice

**Invariant metrics:**

These metrics should not differ significantly between the control group and the experiment group, so they are used for sanity checks.

**Number of cookies.** Number of unique cookies to view the course overview page.

**Number of clicks.** Number of unique cookies to click the "start free trial" button.

**Click-through-probability.** Number of unique cookies to click the "start free trial" button divided by the number of unique cookies to view the course overview page.

**Evaluation metrics:**

These are the proposed evaluation metrics. Given our goal, net conversion matters most: the number of payments divided by the number of clicks on the "start free trial" button.

**Gross conversion.** Number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "start free trial" button.

**Retention.** Number of user-ids to remain enrolled past the 14-day boundary divided by number of user-ids to complete checkout.

**Net conversion.** Number of user-ids to remain enrolled past the 14-day boundary divided by the number of unique cookies to click the "start free trial" button.
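
As a minimal sketch of these three definitions, the helper below computes them from raw counts. The function name `evaluation_metrics` is hypothetical; the example counts are the first row (Sat, Oct 11) of the control data shown later in this notebook.

```python
def evaluation_metrics(clicks, enrollments, payments):
    """Return (gross conversion, retention, net conversion) from raw counts."""
    gross_conversion = enrollments / clicks   # enrolled / clicked
    retention = payments / enrollments        # paid / enrolled
    net_conversion = payments / clicks        # paid / clicked
    return gross_conversion, retention, net_conversion

# Control group, Sat, Oct 11: 687 clicks, 134 enrollments, 70 payments
gross, retention, net = evaluation_metrics(clicks=687, enrollments=134, payments=70)
print(round(gross, 6), round(retention, 6), round(net, 6))
```

Note that net conversion equals gross conversion times retention, which is why the three metrics are not independent of one another.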

### Measuring Standard Deviation

The analytic standard deviation of each evaluation metric, given a sample of 5000 cookies visiting the course overview page:

In [0]:
# Baseline
baseline = {"Cookies":40000,"Clicks":3200,"Enrollments":660,"CTP":0.08,"GConversion":0.20625,
           "Retention":0.53,"NConversion":0.109313}

In [0]:
# Calculate the SD of gross conversion, retention, and net conversion
# First, scale the baseline counts to the sample size
baseline['Cookies']=5000
baseline['Clicks']= baseline['Clicks']*5000/40000
baseline['Enrollments']= baseline['Enrollments']*5000/40000

In [0]:
def calculate_sd(pr, size):
  result= (pr*(1-pr)/(size))**0.5
  print(result)
  return result

In [67]:
sd_GConversion= calculate_sd(baseline['GConversion'], baseline['Clicks'])
sd_Retention= calculate_sd(baseline['Retention'], baseline['Enrollments'])
sd_NConversion= calculate_sd(baseline['NConversion'], baseline['Clicks'])

0.020230604137049392
0.05494901217850908
0.015601575884425905
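
For reference, the three values above follow from the binomial standard deviation of a proportion, evaluated at the scaled sample sizes (5000 cookies give 400 clicks and 82.5 enrollments at the baseline rates):

$$
SD = \sqrt{\frac{p(1-p)}{N}}
$$

$$
\begin{aligned}
SD_{\text{gross}} &= \sqrt{\frac{0.20625 \times 0.79375}{400}} \approx 0.0202 \\
SD_{\text{retention}} &= \sqrt{\frac{0.53 \times 0.47}{82.5}} \approx 0.0549 \\
SD_{\text{net}} &= \sqrt{\frac{0.109313 \times 0.890687}{400}} \approx 0.0156
\end{aligned}
$$

Retention has by far the largest standard deviation because its denominator (enrollments) is the smallest of the three.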


### Sizing

#### Number of Samples vs. Power

Plan for a Bonferroni correction in the analysis phase. I started with three metrics, then dropped retention because it would require too many pageviews. With two metrics left, the corrected alpha is 0.05/2 = 0.025.
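
A minimal sketch of what the correction does to the critical value. It uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of `scipy.stats.norm`, which is an assumption made only to keep the example dependency-free; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
n_metrics = 2  # gross conversion and net conversion
alpha_corrected = alpha / n_metrics

# Two-sided critical values: z = -Phi^{-1}(alpha / 2)
z_uncorrected = -NormalDist().inv_cdf(alpha / 2)
z_corrected = -NormalDist().inv_cdf(alpha_corrected / 2)
print(alpha_corrected, round(z_uncorrected, 3), round(z_corrected, 3))
```

A larger critical value widens every confidence interval, which in turn raises the sample size needed to detect the same minimum effect.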

In [0]:
# Calculate pageview numbers
alpha= 0.05
beta= 0.2

In [0]:
from scipy.stats import norm
def get_z_star(alpha):
  return -norm.ppf(alpha/2)

def get_beta(z_star,s,d_min,N):
  SE=s/(N**0.5)
  return norm.cdf(z_star*SE, loc= d_min, scale=SE)

# s is the pooled standard error for N=1 in each group,
# which is sqrt(p*(1-p)*(1/1 + 1/1))
def required_size(s, d_min, Ns,alpha=0.05,beta=0.2):
  for N in Ns:
    if get_beta(get_z_star(alpha),s,d_min,N)<= beta :
      return N
  return -1

In [70]:
# Calculate separately for three metrics
# For Gross conversion
Ns= np.arange(10000,50000,1).tolist()
s= (2*baseline['GConversion']*(1-baseline['GConversion']))**0.5
#As we have two groups, size *2
c_size= 2*required_size(s,d_min=0.01,Ns=Ns,alpha=alpha,beta=beta)
pageview_size= c_size*40000/3200
pageview_size

642475.0

In [0]:
# For Retention
# Because the required pageview number is too large, I got rid of this metric.
# Ns= np.arange(10000,500000,1).tolist()
# s= (2*baseline['Retention']*(1-baseline['Retention']))**0.5
# c_size= 2*required_size(s,d_min=0.01,Ns=Ns,alpha=alpha,beta=beta)
# pageview_size= c_size*40000/3200
# pageview_size

In [71]:
# For net conversion
Ns= np.arange(10000,50000,1).tolist()
s= (2*baseline['NConversion']*(1-baseline['NConversion']))**0.5
c_size= 2*required_size(s,d_min=0.0075,Ns=Ns,alpha=alpha,beta=beta)
pageview_size= c_size*40000/3200
pageview_size

679300.0
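
The two results above can be cross-checked without the search loop. Solving the power condition for N gives N = (z_alpha/2 + z_beta)^2 * s^2 / d_min^2, rounded up. The sketch below is one way to write that; the function names are hypothetical, and `statistics.NormalDist` stands in for `scipy.stats.norm`.

```python
import math
from statistics import NormalDist

def closed_form_size(p, d_min, alpha=0.05, beta=0.2):
    """Per-group sample size from N = (z_{alpha/2} + z_beta)^2 * s^2 / d_min^2."""
    z_alpha = -NormalDist().inv_cdf(alpha / 2)
    z_beta = -NormalDist().inv_cdf(beta)
    s_squared = 2 * p * (1 - p)  # same s as above: pooled SE for N=1 per group
    return math.ceil((z_alpha + z_beta) ** 2 * s_squared / d_min ** 2)

def pageviews_needed(clicks_per_group):
    # two groups; 3200 clicks per 40000 pageviews at baseline
    return 2 * clicks_per_group * 40000 / 3200

n_gross = closed_form_size(p=0.20625, d_min=0.01)
n_net = closed_form_size(p=0.109313, d_min=0.0075)
print(pageviews_needed(n_gross), pageviews_needed(n_net))
```

Both approaches find the smallest N satisfying the same inequality, so they should agree; the closed form simply avoids having to guess a search range for `Ns`.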

#### Duration vs. Exposure


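The number of pageviews required is 822625. This is the net conversion requirement when the sizing above is rerun with the Bonferroni-corrected alpha of 0.025; the outputs shown above (642475 and 679300) use alpha = 0.05. Diverting 80% of Udacity's daily traffic to the experiment (0.8 × 40000 = 32000 pageviews per day), it would need to run for 26 days (822625 / 32000 ≈ 25.7).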
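
The trade-off between exposure and duration can be sketched as follows. The helper name `days_needed` and the alternative traffic fractions are illustrative assumptions; the pageview requirement and daily traffic come from the text and the baseline above.

```python
import math

pageviews_required = 822625   # from the sizing above
daily_pageviews = 40000       # baseline cookies per day

def days_needed(fraction):
    """Days to collect the required pageviews at a given traffic fraction."""
    return math.ceil(pageviews_required / (daily_pageviews * fraction))

for fraction in (0.5, 0.8, 1.0):
    print(fraction, days_needed(fraction))
```

Diverting more traffic shortens the test but exposes more students to an unproven change, so the fraction is a judgment call rather than a computed quantity.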

## Experiment Analysis

In [72]:
# Set up
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [0]:
import pandas as pd
control= pd.read_csv('/content/gdrive/My Drive/online courses/Control.csv')
experiment= pd.read_csv('/content/gdrive/My Drive/online courses/Experiment.csv')

In [74]:
control

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
0,"Sat, Oct 11",7723.0,687.0,134.0,70.0
1,"Sun, Oct 12",9102.0,779.0,147.0,70.0
2,"Mon, Oct 13",10511.0,909.0,167.0,95.0
3,"Tue, Oct 14",9871.0,836.0,156.0,105.0
4,"Wed, Oct 15",10014.0,837.0,163.0,64.0
5,"Thu, Oct 16",9670.0,823.0,138.0,82.0
6,"Fri, Oct 17",9008.0,748.0,146.0,76.0
7,"Sat, Oct 18",7434.0,632.0,110.0,70.0
8,"Sun, Oct 19",8459.0,691.0,131.0,60.0
9,"Mon, Oct 20",10667.0,861.0,165.0,97.0


### Sanity Checks

In [0]:
# Because our evaluation metrics involve enrollments and payments, keep only complete records: drop rows with NAs.
control.dropna(inplace=True)
experiment.dropna(inplace=True)

In [0]:
# calculate sums for the variables.
# cookie
cookie={}
cookie['control']= control.Pageviews.sum()
cookie['experiment']= experiment.Pageviews.sum()
# click
click={}
click['control']= control.Clicks.sum()
click['experiment']= experiment.Clicks.sum()
# payment
payment={}
payment['control']= control.Payments.sum()
payment['experiment']= experiment.Payments.sum()
# enrollment
enrollment={}
enrollment['control']=control.Enrollments.sum()
enrollment['experiment']= experiment.Enrollments.sum()

In [77]:
# cookie
# Expected share of cookies in control is p=0.5; build a 95% CI (z=1.96) around it
cookie['sd']= (0.5*0.5/(cookie['control']+cookie['experiment']))**0.5
cookie['margin']= cookie['sd']*1.96
cookie['CI_lower']=0.5-cookie['margin']
cookie['CI_upper']=0.5+cookie['margin']
#check whether the control-assign rate is within the CI
print(cookie['CI_lower'],cookie['CI_upper'])
print(cookie['control']/(cookie['control']+cookie['experiment']))

0.49849413322889896 0.501505866771101
0.500945634850363


In [78]:
# click
click['sd']= (0.5*0.5/(click['control']+click['experiment']))**0.5
click['margin']= click['sd']*1.96
click['CI_lower']=0.5-click['margin']
click['CI_upper']=0.5+click['margin']
#check whether the control-assign rate is within the CI
print(click['CI_lower'],click['CI_upper'])
print(click['control']/(click['control']+click['experiment']))

0.4947279053856712 0.5052720946143289
0.5004775272769368
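
The cookie and click checks share the same logic, which could be factored into one helper. This is a sketch only: `count_sanity_check` is a hypothetical name, and the counts in the example are made up for illustration rather than taken from the experiment.

```python
def count_sanity_check(n_control, n_experiment, p=0.5, z=1.96):
    """Check whether the control share of a count lies in the CI around p."""
    total = n_control + n_experiment
    margin = z * (p * (1 - p) / total) ** 0.5
    observed = n_control / total
    return p - margin, p + margin, observed, (p - margin <= observed <= p + margin)

# Hypothetical counts, for illustration only
print(count_sanity_check(5000, 5100))  # nearly even split
print(count_sanity_check(5000, 5500))  # noticeably uneven split
```

The last element of the returned tuple is the pass/fail flag, so the same call works for any count-type invariant.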


In [79]:
#CTP
CTP={}
CTP['p_pool']= (click['control']+click['experiment'])/(cookie['control']+cookie['experiment'])
CTP['p_experiment']=click['experiment']/cookie['experiment']
CTP['p_control']= click['control']/cookie['control']
CTP['d']= CTP['p_experiment']-CTP['p_control']
CTP['SE_pool']= (CTP['p_pool']*(1-CTP['p_pool'])*(1/cookie['control']+1/cookie['experiment']))**0.5
CTP['margin']= 1.96* CTP['SE_pool']
CTP['CI_lower']= -CTP['margin']
CTP['CI_upper']=CTP['margin']
print(CTP['CI_lower'],CTP['CI_upper'])
print(CTP['d'])

-0.0016488088774456579 0.0016488088774456579
0.0001527615025272433
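
Written out, the check above pools the click-through-probability across both groups and builds a 95% interval around a difference of zero. Here $X$ denotes clicks and $N$ cookies, with subscripts $c$ and $e$ for control and experiment (notation introduced here for illustration):

$$
\hat{p}_{pool} = \frac{X_c + X_e}{N_c + N_e}, \qquad
SE_{pool} = \sqrt{\hat{p}_{pool}\,(1-\hat{p}_{pool})\left(\frac{1}{N_c} + \frac{1}{N_e}\right)}, \qquad
\hat{d} = \frac{X_e}{N_e} - \frac{X_c}{N_c}
$$

The invariant passes when $|\hat{d}| \le 1.96 \cdot SE_{pool}$. With the values printed above, $0.00015 \le 0.00165$.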


All three invariant metrics fall within their 95% confidence intervals, so the experiment passes the sanity checks.

### Result Analysis

#### Effect Size Tests

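For gross conversion and net conversion (retention was dropped during sizing), we compute the difference d between the experiment and control groups and a 95% confidence interval around it. A change is statistically significant if the interval excludes 0, and practically significant if the interval also lies entirely beyond the practical significance boundary.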

In [0]:
# Bonferroni correction would be too conservative in this case, so I chose not to use it.
# Calculate the two-sided z for alpha = 0.05
z=get_z_star(0.05)

In [82]:
# gross conversion. 
GConversion={}
GConversion['experiment']= enrollment['experiment']/click['experiment']
GConversion['control']= enrollment['control']/click['control']
GConversion['d']= GConversion['experiment']- GConversion['control']
GConversion['p_pool']= (enrollment['experiment']+enrollment['control'])/(click['experiment']+click['control'])
GConversion['SE_pool']= (GConversion['p_pool']*(1-GConversion['p_pool'])*(1/click['control']+1/click['experiment']))**0.5
GConversion['margin']= GConversion['SE_pool']*z
GConversion['CI_lower']= GConversion['d']-GConversion['margin']
GConversion['CI_upper']= GConversion['d']+ GConversion['margin']
print(GConversion['CI_lower'],GConversion['CI_upper'])
print('practical:0.01')

-0.02912320088750467 -0.011986548273218461
practical:0.01


In [0]:
# # Retention. This metric was dropped during sizing, so the code is kept only for reference
# Retention={}
# Retention['experiment']= payment['experiment']/enrollment['experiment']
# Retention['control']= payment['control']/enrollment['control']
# Retention['d']= Retention['experiment']-Retention['control']
# Retention['p_pool']= (payment['experiment']+payment['control'])/(enrollment['experiment']+enrollment['control'])
# Retention['SE_pool']= (Retention['p_pool']*(1-Retention['p_pool'])*(1/enrollment['experiment']+1/enrollment['control']))**0.5
# Retention['margin']= z*Retention['SE_pool']
# Retention['CI_lower']= Retention['d']-Retention['margin']
# Retention['CI_upper']= Retention['d']+Retention['margin']
# print(Retention['CI_lower'],Retention['CI_upper'])
# print(Retention['d'],'practical:0.01')

In [83]:
# net conversion
NConversion={}
NConversion['experiment']= payment['experiment']/click['experiment']
NConversion['control']=payment['control']/click['control']
NConversion['d']= NConversion['experiment']- NConversion['control']
NConversion['p_pool']= (payment['experiment']+payment['control'])/(click['control']+click['experiment'])
NConversion['SE_pool']= (NConversion['p_pool']*(1-NConversion['p_pool'])*(1/click['experiment']+1/click['control']))**0.5
NConversion['margin']= z*NConversion['SE_pool']
NConversion['CI_lower']= NConversion['d']-NConversion['margin']
NConversion['CI_upper']= NConversion['d']+NConversion['margin']
print(NConversion['CI_lower'],NConversion['CI_upper'])
print('practical:0.0075')

-0.011604500677993734 0.0018570553289054001
practical:0.0075


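In sum, the gross conversion interval (-0.0291, -0.0120) excludes 0 and lies entirely below -0.01, so the decrease is both statistically and practically significant. The net conversion interval (-0.0116, 0.0019) includes 0, so the change is not statistically significant; because the interval also contains -0.0075, a practically significant decrease cannot be ruled out either.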
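
The decision rule applied here can be sketched as a small helper. `classify_effect` is a hypothetical name, and the interval endpoints are the printed values above, rounded to four decimals.

```python
def classify_effect(ci_lower, ci_upper, d_min):
    """Classify a confidence interval for a difference d.

    Statistically significant: the interval excludes 0.
    Practically significant: the whole interval lies beyond +/- d_min.
    """
    statistically = ci_lower > 0 or ci_upper < 0
    practically = ci_lower > d_min or ci_upper < -d_min
    return statistically, practically

# Intervals printed above (rounded)
print(classify_effect(-0.0291, -0.0120, d_min=0.01))    # gross conversion
print(classify_effect(-0.0116, 0.0019, d_min=0.0075))   # net conversion
```

This is a strict reading of practical significance, requiring the entire interval to clear the boundary; a looser rule based only on the point estimate would be a different design choice.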

#### Sign Tests

In [84]:
# combine data sets and calculate signs.
# merge by dates
full_data= control.merge(experiment,on='Date')
full_data.head()

Unnamed: 0,Date,Pageviews_x,Clicks_x,Enrollments_x,Payments_x,Pageviews_y,Clicks_y,Enrollments_y,Payments_y
0,"Sat, Oct 11",7723.0,687.0,134.0,70.0,7716,686,105.0,34.0
1,"Sun, Oct 12",9102.0,779.0,147.0,70.0,9288,785,116.0,91.0
2,"Mon, Oct 13",10511.0,909.0,167.0,95.0,10480,884,145.0,79.0
3,"Tue, Oct 14",9871.0,836.0,156.0,105.0,9867,827,138.0,92.0
4,"Wed, Oct 15",10014.0,837.0,163.0,64.0,9793,832,140.0,94.0


In [0]:
#sign test for GConversion (enrollment/click) and NConversion (payment/click)
full_data['experiment_GConversion']= full_data.Enrollments_y/full_data.Clicks_y
full_data['control_GConversion']= full_data.Enrollments_x/full_data.Clicks_x
full_data['experiment_NConversion']= full_data.Payments_y/full_data.Clicks_y
full_data['control_NConversion']= full_data.Payments_x/full_data.Clicks_x
full_data['sign_GConversion']= np.where(full_data.experiment_GConversion>full_data.control_GConversion,1,0)
full_data['sign_NConversion']= np.where(full_data.experiment_NConversion>full_data.control_NConversion,1,0)

In [86]:
full_data.head()

Unnamed: 0,Date,Pageviews_x,Clicks_x,Enrollments_x,Payments_x,Pageviews_y,Clicks_y,Enrollments_y,Payments_y,experiment_GConversion,control_GConversion,experiment_NConversion,control_NConversion,sign_GConversion,sign_NConversion
0,"Sat, Oct 11",7723.0,687.0,134.0,70.0,7716,686,105.0,34.0,0.153061,0.195051,0.049563,0.101892,0,0
1,"Sun, Oct 12",9102.0,779.0,147.0,70.0,9288,785,116.0,91.0,0.147771,0.188703,0.115924,0.089859,0,1
2,"Mon, Oct 13",10511.0,909.0,167.0,95.0,10480,884,145.0,79.0,0.164027,0.183718,0.089367,0.10451,0,0
3,"Tue, Oct 14",9871.0,836.0,156.0,105.0,9867,827,138.0,92.0,0.166868,0.186603,0.111245,0.125598,0,0
4,"Wed, Oct 15",10014.0,837.0,163.0,64.0,9793,832,140.0,94.0,0.168269,0.194743,0.112981,0.076464,0,1
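
One detail worth noting: `np.where(a > b, 1, 0)` scores a tie as 0, the same as a day on which the experiment group did worse. A minimal sketch in plain Python that keeps ties separate, using the five rows shown above (the helper `daily_signs` is hypothetical):

```python
# (clicks, enrollments, payments) for the first five days shown above
control = [(687, 134, 70), (779, 147, 70), (909, 167, 95), (836, 156, 105), (837, 163, 64)]
experiment = [(686, 105, 34), (785, 116, 91), (884, 145, 79), (827, 138, 92), (832, 140, 94)]

def daily_signs(numerator_index):
    """1 if the experiment rate is higher, 0 if lower, None on a tie."""
    signs = []
    for c, e in zip(control, experiment):
        rate_c = c[numerator_index] / c[0]
        rate_e = e[numerator_index] / e[0]
        signs.append(None if rate_e == rate_c else int(rate_e > rate_c))
    return signs

gross_signs = daily_signs(1)  # enrollments / clicks
net_signs = daily_signs(2)    # payments / clicks
print(gross_signs, net_signs)
```

There are no ties in these five rows, so the result matches the `sign_GConversion` and `sign_NConversion` columns above.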


In [87]:
#A summary of the signs
sign={}
sign['total']= len(full_data.sign_GConversion)
sign['GConversion_1']= full_data.sign_GConversion.sum()
sign['GConversion_1_rate']= sign['GConversion_1']/sign['total']
sign['NConversion_1']= full_data.sign_NConversion.sum()
sign['NConversion_1_rate']= sign['NConversion_1']/sign['total']
sign

{'GConversion_1': 4,
 'GConversion_1_rate': 0.17391304347826086,
 'NConversion_1': 10,
 'NConversion_1_rate': 0.43478260869565216,
 'total': 23}

In [0]:
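# calculate p value for the 'GConversion_1_rate' and for 'NConversion_1_rate'
import math # to calculate n choose x
def get_prob(x,n):
  # probability of exactly x successes in n trials under H0 (p=0.5)
  p=math.factorial(n)/(math.factorial(x)*math.factorial(n-x))*(0.5**n)
  return p

# because we don't pre-assume the change direction of p, we should use two-tailed test.
def get_2_side_p_value(x,n):
  # sum the smaller tail so that doubling is also valid when x > n/2
  x=min(x,n-x)
  p=0
  for i in range(x+1):
    p=p+ get_prob(i,n)
  # cap at 1, since doubling the tail can overshoot when x is near n/2
  return min(1.0,p*2)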

In [89]:
# p-value for GConversion
print(get_2_side_p_value(sign['GConversion_1'],sign['total']),0.05)

0.002599477767944336 0.05


In [90]:
# p-value for NConversion
print(get_2_side_p_value(sign['NConversion_1'],sign['total']),0.05)

0.6776394844055176 0.05
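
As a cross-check on the two p-values above, the same exact binomial test can be written with `math.comb` (Python 3.8+). The function name `sign_test_p_value` is hypothetical; it sums the smaller tail so the result is symmetric in the number of successes.

```python
import math

def sign_test_p_value(successes, n):
    """Exact two-sided sign test p-value under H0: P(success) = 0.5."""
    k = min(successes, n - successes)            # smaller tail
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(sign_test_p_value(4, 23))    # gross conversion: 4 of 23 days
print(sign_test_p_value(10, 23))   # net conversion: 10 of 23 days
```

Because of the symmetry, counting the 19 days on which gross conversion was lower gives the same p-value as counting the 4 days on which it was higher.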


#### Summary

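Gross conversion passes the sign test (p ≈ 0.0026 < 0.05): the experiment group had the higher daily gross conversion on only 4 of 23 days, far fewer than chance would explain. Net conversion does not pass (p ≈ 0.68), which agrees with the effect size tests.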

### Recommendation

I recommend not launching the new UI feature, because the test does not show that the goal was achieved. Gross conversion (enrollments/clicks) dropped by a statistically and practically significant amount, but the change in net conversion (payments/clicks), the metric we care most about, is not statistically significant. Its confidence interval also extends below the practical significance boundary of -0.0075, so a meaningful drop in payments cannot be ruled out.