- Unique cookies to view page per day:	40000.0
- Unique cookies to click "Start free trial" per day:	3200.0
- Enrollments per day:	660.0
- Click-through-probability on "Start free trial":	0.08
- Probability of enrolling, given click:	0.20625
- Probability of payment, given enroll:	0.53
- Probability of payment, given click:	0.1093125

In [139]:
uCookiesViewPageDay = 40000
uCookiesStartFreeDay = 3200
enrollDay = 660
clickThroughProp = 0.08
enrollProp = 0.20625
payPropEnroll = 0.53
payPropClick = 0.1093125
sampleViews = 5000.

In [140]:
import math
import pandas as pd
import numpy as np

def standard_error(prop, size):
    # Analytic standard error of a proportion, rounded to 4 decimal places.
    return round(math.sqrt(prop * (1 - prop) / size), 4)

# Scale the baseline daily clicks and enrollments down to a sample of 5000 pageviews.
sampleCookies = uCookiesStartFreeDay * sampleViews / uCookiesViewPageDay
sampleEnroll = enrollDay * sampleViews / uCookiesViewPageDay

print "SE for Gross Conversion: ", standard_error(enrollProp, sampleCookies)
print "SE for Retention:", standard_error(payPropEnroll, sampleEnroll)
print "SE for Net Conversion:", standard_error(payPropClick, sampleCookies)

SE for Gross Conversion:  0.0202
SE for Retention: 0.0549
SE for Net Conversion: 0.0156
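
As a cross-check, the same standard errors can be worked by hand. This is a sketch that assumes each metric is binomially distributed; the symbols $p$ (baseline probability) and $N$ (units in the 5000-pageview sample) are notation introduced here, not names from the notebook.

$$
SE = \sqrt{\frac{p\,(1-p)}{N}}, \qquad
N_{\text{clicks}} = 3200 \cdot \frac{5000}{40000} = 400, \qquad
N_{\text{enroll}} = 660 \cdot \frac{5000}{40000} = 82.5
$$

$$
SE_{\text{gross}} = \sqrt{\frac{0.20625 \cdot 0.79375}{400}} \approx 0.0202, \qquad
SE_{\text{retention}} = \sqrt{\frac{0.53 \cdot 0.47}{82.5}} \approx 0.0549, \qquad
SE_{\text{net}} = \sqrt{\frac{0.1093125 \cdot 0.8906875}{400}} \approx 0.0156
$$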


## Sample Size without Bonferroni correction
- alpha: 0.05
- beta: 0.2

### Gross Conversion
- base: 0.20625
- size: 25835 cookies

### Retention
- base: 0.53
- size: 39115 users

### Net Conversion
- base: 0.1093125
- size: 27413 cookies
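
The notebook does not say how these sizes were obtained. The sketch below uses one common normal-approximation formula for comparing two proportions, which happens to reproduce them; it is an assumption that this is the method actually used. The function name `sample_size`, the hard-coded normal quantiles, the mirroring of baselines above 0.5, and the retention change of 0.01 are all assumptions made for illustration. The changes of 0.01 for gross conversion and 0.0075 for net conversion are the `Dmin` values used later in the analysis.

```python
from __future__ import division, print_function
import math


def sample_size(base, d_min, z_alpha=1.959964, z_beta=0.841621):
    """Approximate units per group needed to detect an absolute change d_min.

    z_alpha is the two-sided normal quantile for alpha = 0.05 and z_beta the
    one-sided quantile for beta = 0.2 (both hard-coded assumptions).
    """
    # Assumption: baselines above 0.5 are mirrored before applying the formula.
    p = min(base, 1.0 - base)
    sd_null = math.sqrt(2.0 * p * (1.0 - p))
    sd_alt = math.sqrt(p * (1.0 - p) + (p + d_min) * (1.0 - p - d_min))
    return int(round((z_alpha * sd_null + z_beta * sd_alt) ** 2 / d_min ** 2))


gross_size = sample_size(0.20625, 0.01)
retention_size = sample_size(0.53, 0.01)      # d_min = 0.01 is assumed here
net_size = sample_size(0.1093125, 0.0075)
print(gross_size, retention_size, net_size)
```

With these inputs the sketch returns 25835, 39115, and 27413, matching the sizes listed above.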

In [241]:
# Convert the clicks needed per group into pageviews, then double for control + experiment.
grossConversionSize = 2*round(25835. / uCookiesStartFreeDay * uCookiesViewPageDay,0)
print "Minimum sample size for Gross Conversion:", grossConversionSize

retentionSize = 2*round(39115. / enrollDay * uCookiesViewPageDay, 0)
print "Minimum sample size for Retention: ", retentionSize

netConversionSize = 2*round(27413. / uCookiesStartFreeDay * uCookiesViewPageDay,0)
print "Minimum sample size for Net Conversion:", netConversionSize

Minimum sample size for Gross Conversion: 645876.0
Minimum sample size for Retention:  4741212.0
Minimum sample size for Net Conversion: 685326.0


## Sample Size with Bonferroni correction
- alpha: 0.02
- beta: 0.2

### Gross Conversion
- base: 0.20625
- size: 33014 cookies

### Net Conversion
- base: 0.1093125
- size: 35016 cookies

In [242]:
grossConversionSize = 2*round(33014. / uCookiesStartFreeDay * uCookiesViewPageDay,0)
print "Minimum sample size for Gross Conversion:", grossConversionSize

netConversionSize = 2*round(35016. / uCookiesStartFreeDay * uCookiesViewPageDay,0)
print "Minimum sample size for Net Conversion:", netConversionSize

Minimum sample size for Gross Conversion: 825350.0
Minimum sample size for Net Conversion: 875400.0


In [244]:
def calcDuration(fraction, size):
    # Days needed to collect `size` pageviews when `fraction` of daily traffic is diverted.
    return math.ceil(size / (uCookiesViewPageDay * fraction))

print "Duration of experiment for 10%:", calcDuration(.1, netConversionSize)
print "Duration of experiment for 20%:", calcDuration(.2, netConversionSize)
print "Duration of experiment for 50%:", calcDuration(.5, netConversionSize)
print "Duration of experiment for 70%:", calcDuration(.7, netConversionSize)
print "Duration of experiment for 80%:", calcDuration(.8, netConversionSize)

fraction_used = 0.7

Duration of experiment for 10%: 219.0
Duration of experiment for 20%: 110.0
Duration of experiment for 50%: 44.0
Duration of experiment for 70%: 32.0
Duration of experiment for 80%: 28.0


## Sanity Check on invariant metrics

In [257]:
controlData = pd.read_csv("Final Project Results - Control.csv")
experimentData = pd.read_csv("Final Project Results - Experiment.csv")

resultsSum = {"Control": pd.Series([controlData.Pageviews.sum(), controlData.Clicks.sum()],
                                   index=["pageviews", "clicks"]),
              "Experiment": pd.Series([experimentData.Pageviews.sum(), experimentData.Clicks.sum()],
                                      index=["pageviews", "clicks"])}
results = pd.DataFrame(resultsSum)

In [258]:
results['Total']= results.Control + results.Experiment
results['Prob'] = 0.5
results['Days'] = [controlData.Pageviews.count(), controlData.Clicks.count()]
results['ControlProb'] = results.Control / results.Total
results['ExperimentProb'] = results.Experiment / results.Total
results['Difference'] = results.ControlProb - results.ExperimentProb
results['SE'] = np.sqrt(results.Prob * (1 - results.Prob) / results.Total)
results['ME'] = 1.96 * results.SE
results['CIlower'] = results.Prob - results.ME
results['CIupper'] = results.Prob + results.ME
results['Pass'] = results.apply(lambda x: (x.ControlProb >= x.CIlower) and (x.ControlProb <= x.CIupper),axis=1)

cols = results.columns.values

results

Unnamed: 0,Control,Experiment,Total,Prob,Days,ControlProb,ExperimentProb,Difference,SE,ME,CIlower,CIupper,Pass
pageviews,345543,344660,690203,0.5,37,0.50064,0.49936,0.001279,0.000602,0.00118,0.49882,0.50118,True
clicks,28378,28325,56703,0.5,37,0.500467,0.499533,0.000935,0.0021,0.004116,0.495884,0.504116,True
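
The check in the table above can also be written without pandas. This is a minimal sketch that assumes cookies are split evenly between the two groups, so the control share of each count should be close to 0.5; the function name `invariant_check` is hypothetical, and the counts are copied from the table.

```python
from __future__ import division, print_function
import math


def invariant_check(control_count, experiment_count, expected=0.5, z=1.96):
    """Return (observed, lower, upper, passed) for the control share of the total."""
    total = control_count + experiment_count
    se = math.sqrt(expected * (1 - expected) / total)
    margin = z * se
    observed = control_count / total
    lower, upper = expected - margin, expected + margin
    return observed, lower, upper, lower <= observed <= upper


pageviews = invariant_check(345543, 344660)
clicks = invariant_check(28378, 28325)
print(pageviews)
print(clicks)
```

Both invariants pass because the observed control share falls inside the 95% interval around 0.5.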


In [259]:
controlPageviews = results.loc['pageviews','Control']
controlClicks = results.loc['clicks','Control']

experimentPageviews = results.loc['pageviews','Experiment']
experimentClicks = results.loc['clicks', 'Experiment']

controlClickThroughProb = 1.0 * controlClicks / controlPageviews
experimentClickThroughProb = 1.0 * experimentClicks / experimentPageviews

controlClickThroughProbSE = np.sqrt((controlClickThroughProb * (1- controlClickThroughProb))/controlPageviews)
controlClickThroughProbME = 1.96 * controlClickThroughProbSE

experimentClickThroughProbSE = np.sqrt((experimentClickThroughProb * (1- experimentClickThroughProb))/experimentPageviews)
experimentClickThroughProbME = 1.96 * experimentClickThroughProbSE

# Note: the interval is centered on the experiment's click-through probability,
# and the check asks whether the control's value falls inside it.
controlClickThroughProbCIlower = experimentClickThroughProb - experimentClickThroughProbME
controlClickThroughProbCIupper = experimentClickThroughProb + experimentClickThroughProbME
Pass = controlClickThroughProb >= controlClickThroughProbCIlower and controlClickThroughProb <= controlClickThroughProbCIupper

resultsCT = { 'Control':None, 'Experiment':None, 'Total':None, 'Prob':None, 'Days':None,
              'ControlProb':controlClickThroughProb, 'ExperimentProb':experimentClickThroughProb, 
              'Difference':controlClickThroughProb-experimentClickThroughProb,
              'SE':controlClickThroughProbSE, 'ME':controlClickThroughProbME,
              'CIlower':controlClickThroughProbCIlower, 'CIupper':controlClickThroughProbCIupper, 'Pass':Pass }
resultsCT = pd.DataFrame(resultsCT, index=["click through"], columns=cols)

results = pd.concat([results, resultsCT])

results

Unnamed: 0,Control,Experiment,Total,Prob,Days,ControlProb,ExperimentProb,Difference,SE,ME,CIlower,CIupper,Pass
pageviews,345543.0,344660.0,690203.0,0.5,37.0,0.50064,0.49936,0.001279,0.000602,0.00118,0.49882,0.50118,True
clicks,28378.0,28325.0,56703.0,0.5,37.0,0.500467,0.499533,0.000935,0.0021,0.004116,0.495884,0.504116,True
click through,,,,,,0.082126,0.082182,-5.7e-05,0.000467,0.000915,0.081266,0.083099,True


## Evaluation metrics calculation

In [262]:
resultsSum = {"Control": pd.Series([controlData.Pageviews.sum(), controlData.Clicks.sum(),
                                    controlData.Enrollments.sum(), controlData.Payments.sum()],
                                   index=["pageviews", "clicks", "enrollments", "payments"]),
              "Experiment": pd.Series([experimentData.Pageviews.sum(), experimentData.Clicks.sum(),
                                       experimentData.Enrollments.sum(), experimentData.Payments.sum()],
                                      index=["pageviews", "clicks", "enrollments", "payments"])}
results = pd.DataFrame(resultsSum)

In [263]:
results['DaysControl'] = [controlData.Pageviews.count(), controlData.Clicks.count(),
                          controlData.Enrollments.count(), controlData.Payments.count()]
results['DaysExperiment'] = [experimentData.Pageviews.count(), experimentData.Clicks.count(),
                             experimentData.Enrollments.count(), experimentData.Payments.count()]


results

Unnamed: 0,Control,Experiment,DaysControl,DaysExperiment
pageviews,345543,344660,37,37
clicks,28378,28325,37,37
enrollments,3785,3423,23,23
payments,2033,1945,23,23


In [264]:
# Keep only the days for which enrollments (and payments) were recorded.
controlDataNN = controlData[controlData.Enrollments.notnull()]
experimentDataNN = experimentData[experimentData.Enrollments.notnull()]

resultsSumNN = {"Control": pd.Series([controlDataNN.Pageviews.sum(), controlDataNN.Clicks.sum(),
                                      controlDataNN.Enrollments.sum(), controlDataNN.Payments.sum()],
                                     index=["pageviews", "clicks", "enrollments", "payments"]),
                "Experiment": pd.Series([experimentDataNN.Pageviews.sum(), experimentDataNN.Clicks.sum(),
                                         experimentDataNN.Enrollments.sum(), experimentDataNN.Payments.sum()],
                                        index=["pageviews", "clicks", "enrollments", "payments"])}
resultsNN = pd.DataFrame(resultsSumNN)

resultsNN['Total']= resultsNN.Control + resultsNN.Experiment

resultsNN

Unnamed: 0,Control,Experiment,Total
pageviews,212163,211362,423525
clicks,17293,17260,34553
enrollments,3785,3423,7208
payments,2033,1945,3978


In [265]:
experimentClicks = resultsNN.loc["clicks"].Experiment
experimentEnrollments = resultsNN.loc["enrollments"].Experiment
experimentPayments = resultsNN.loc["payments"].Experiment

controlClicks = resultsNN.loc["clicks"].Control
controlEnrollments = resultsNN.loc["enrollments"].Control
controlPayments = resultsNN.loc["payments"].Control

controlGrossConversion = controlEnrollments / controlClicks
experimentGrossConversion = experimentEnrollments / experimentClicks

controlNetConversion = controlPayments / controlClicks
experimentNetConversion = experimentPayments / experimentClicks

GrossConversion = (controlEnrollments+experimentEnrollments) / (controlClicks+experimentClicks)
NetConversion = (controlPayments+experimentPayments) / (controlClicks+experimentClicks)

print "Gross Conversion: ", GrossConversion
print "Net Conversion:", NetConversion



Gross Conversion:  0.208607067404
Net Conversion: 0.115127485312


In [266]:
resultsGC = {
    'Control': controlGrossConversion,
    'Experiment': experimentGrossConversion,
    'Total': GrossConversion,
    'Dmin': 0.01
}
resultsNC = {
    'Control': controlNetConversion,
    'Experiment': experimentNetConversion,
    'Total': NetConversion,
    'Dmin': 0.0075
}
resultsGC = pd.DataFrame(resultsGC, index=["gross conversion"])
resultsNC = pd.DataFrame(resultsNC, index=["net conversion"])

resultsC = pd.concat([resultsGC, resultsNC])

resultsC['SE'] = np.sqrt(resultsC.Total * (1-resultsC.Total) * (1.0/controlClicks + 1.0/experimentClicks))
resultsC['ME'] = resultsC.SE * 1.96
resultsC['Diff'] = resultsC.Experiment - resultsC.Control
resultsC['CIlower'] = resultsC.Diff - resultsC.ME
resultsC['CIupper'] = resultsC.Diff + resultsC.ME
resultsC['StatSig'] = resultsC.apply(lambda x: (x.CIupper < 0.) or (x.CIlower > 0.),axis=1)
resultsC['PractSig'] = resultsC.apply(lambda x: (x.CIlower > x.Dmin) or (x.CIupper < -x.Dmin),axis=1)

resultsC

Unnamed: 0,Control,Dmin,Experiment,Total,SE,ME,Diff,CIlower,CIupper,StatSig,PractSig
gross conversion,0.218875,0.01,0.19832,0.208607,0.004372,0.008568,-0.020555,-0.029123,-0.011986,True,True
net conversion,0.117562,0.0075,0.112688,0.115127,0.003434,0.006731,-0.004874,-0.011605,0.001857,False,False
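
The decision rule behind the `StatSig` and `PractSig` columns can be restated as two small functions. This is a sketch under the same pooled-standard-error assumption as the cell above; the function names `diff_interval` and `significance` are hypothetical, and the counts are taken from the filtered totals shown earlier.

```python
from __future__ import division, print_function
import math


def diff_interval(x_control, n_control, x_experiment, n_experiment, z=1.96):
    """Pooled-SE confidence interval for (experiment rate - control rate)."""
    pooled = (x_control + x_experiment) / (n_control + n_experiment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_experiment))
    diff = x_experiment / n_experiment - x_control / n_control
    return diff - z * se, diff + z * se


def significance(lower, upper, d_min):
    """Statistically significant if the interval excludes 0; practically
    significant if it lies entirely beyond +/- d_min."""
    stat_sig = upper < 0 or lower > 0
    pract_sig = lower > d_min or upper < -d_min
    return stat_sig, pract_sig


gc_low, gc_high = diff_interval(3785, 17293, 3423, 17260)   # enrollments / clicks
nc_low, nc_high = diff_interval(2033, 17293, 1945, 17260)   # payments / clicks
gc_flags = significance(gc_low, gc_high, 0.01)
nc_flags = significance(nc_low, nc_high, 0.0075)
print(gc_flags, nc_flags)
```

Gross conversion is both statistically and practically significant, while the net conversion interval includes zero and also reaches below the negative practical boundary of -0.0075.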


## Sign test

In [236]:
signTestData = pd.merge(controlDataNN,experimentDataNN,on="Date")

signTestData['ControlGC'] = signTestData.Enrollments_x/signTestData.Clicks_x
signTestData['ExperimentGC'] = signTestData.Enrollments_y/signTestData.Clicks_y
signTestData['ControlNC'] = signTestData.Payments_x/signTestData.Clicks_x
signTestData['ExperimentNC'] = signTestData.Payments_y/signTestData.Clicks_y
signTestData['GCsign'] = signTestData.ControlGC < signTestData.ExperimentGC
signTestData['NCsign'] = signTestData.ControlNC < signTestData.ExperimentNC

cols = ['Date', 'ControlGC', 'ExperimentGC', 'ControlNC', 'ExperimentNC', 'GCsign', 'NCsign']

signTestData[cols]

Unnamed: 0,Date,ControlGC,ExperimentGC,ControlNC,ExperimentNC,GCsign,NCsign
0,"Sat, Oct 11",0.195051,0.153061,0.101892,0.049563,False,False
1,"Sun, Oct 12",0.188703,0.147771,0.089859,0.115924,False,True
2,"Mon, Oct 13",0.183718,0.164027,0.10451,0.089367,False,False
3,"Tue, Oct 14",0.186603,0.166868,0.125598,0.111245,False,False
4,"Wed, Oct 15",0.194743,0.168269,0.076464,0.112981,False,True
5,"Thu, Oct 16",0.167679,0.163706,0.099635,0.077411,False,False
6,"Fri, Oct 17",0.195187,0.162821,0.101604,0.05641,False,False
7,"Sat, Oct 18",0.174051,0.144172,0.110759,0.095092,False,False
8,"Sun, Oct 19",0.18958,0.172166,0.086831,0.110473,False,True
9,"Mon, Oct 20",0.191638,0.177907,0.11266,0.113953,False,True


In [240]:
# The p-values below are entered by hand (two-tailed sign test), not computed in this cell.
print "Number of days: ", len(signTestData)
print "Number of positive GC:", len(signTestData[signTestData.GCsign]), "p-value: ", 0.0026
print "Number of positive NC:", len(signTestData[signTestData.NCsign]), "p-value: ", 0.6776

Number of days:  23
Number of positive GC: 4 p-value:  0.0026
Number of positive NC: 10 p-value:  0.6776
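
The p-values above can be reproduced with an exact binomial calculation. This is a minimal sketch that assumes a two-tailed sign test with a null probability of 0.5 for a positive day; the helper names `binom_coeff` and `sign_test_p_value` are hypothetical.

```python
from __future__ import division, print_function
import math


def binom_coeff(n, k):
    """Number of ways to choose k items out of n."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


def sign_test_p_value(successes, trials):
    """Two-tailed exact binomial p-value under H0: P(success) = 0.5."""
    # The distribution is symmetric at 0.5, so double the smaller tail.
    k = min(successes, trials - successes)
    tail = sum(binom_coeff(trials, i) for i in range(k + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


p_gc = sign_test_p_value(4, 23)
p_nc = sign_test_p_value(10, 23)
print(round(p_gc, 4), round(p_nc, 4))
```

For 4 and 10 positive days out of 23, this gives 0.0026 and 0.6776 after rounding to four decimals.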
