# Design an A/B Test

[Cédric Campguilhem](https://github.com/ccampguilhem/Udacity-DataAnalyst), March 2018

<a id="Top">

## Table of contents

- [Introduction](#Introduction)
- [Project organization](#Organisation)
- [Experiment design](#Design)
    - [Metric choice](#Metric)
    - [Measuring standard error](#Standard deviation)
    - [Sizing](#Sizing)
- [Experiment analysis](#Analysis)
    - [Sanity checks](#Sanity)
    - [Result analysis](#Result)
    - [Recommendations](#Recommendations)
- [Follow-up experiment](#Followup)
- [Appendix](#Appendix)

<a id="Introduction">

## Introduction [*top*](#Top)

This project is part of the A/B testing course of the Udacity Data Analyst Nanodegree program. Its purpose is to analyse an experiment run at Udacity.

The experiment tests a change to what happens when a student clicks the "Start free trial" button. A message asks them how much time they would dedicate to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead.

The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time—without significantly reducing the number of students to continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.

<a id="Organisation">

## Project organization [*top*](#Top)

<a id="Design">

## Experiment design [*top*](#Top)

<a id="Metric">

### Metric choice [*Experiment design*](#Design)

The following metrics have been selected as invariants of the analysis (i.e. metrics which should not be affected by the change being analyzed). All of them are captured upstream of the change being analyzed.

Invariant                    | Description                      
:----------------------------|:---------------------------------
Number of cookies            | Number of unique cookies to view the course overview page. 
Number of clicks             | Number of unique cookies to click on "Start free trial" button 
Click-through-probability    | Number of unique cookies to click on "Start free trial" button divided by the number of unique cookies to view the course page overview.

All other metrics (number of user-ids, gross conversion, retention, net conversion) are collected downstream of the change and may be affected by it.

The following metrics have been selected as evaluation metrics because they collect information downstream of the change and relate to the objectives of this A/B test, which are:

- minimizing the proportion of enrolled students quitting during the trial
- keeping the same proportion of students clicking the "Start free trial" button and continuing the course afterwards

Evaluation metrics  | Description | Practical significance boundary | Reasons of choice
:-------------------|:------------|:--------------------------------|:------------------
Gross conversion    | Number of user-ids to enroll in the free trial divided by the number of unique cookies to click on the "Start free trial" button. | ${d}_{min} = 0.01$ | This captures the proportion of students changing their mind after the time commitment warning. In the control group this metric should stay close to its baseline value (0.20625).
Retention           | Number of user-ids to remain enrolled after the trial divided by the number of user-ids enrolled during the trial. | ${d}_{min} = 0.01$ | This metric evaluates the first objective: minimize the number of students quitting during the trial.
Net conversion      | Number of user-ids to remain enrolled after the trial divided by the number of unique cookies to click the "Start free trial" button. | ${d}_{min} = 0.0075$ | This metric evaluates the second objective: keep the same proportion of students enrolled in the long term.

The number of user-ids who enroll in the free trial may be affected by the change, but as a raw count it does not provide any information relevant to the objectives of the analysis.
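
To make the three definitions concrete, here is a minimal Python 3 sketch. The function names are hypothetical and are not part of the original analysis. The counts are the daily baseline values listed in the next section, and the payment count is derived from them, so it is not an integer. The sketch also checks that net conversion is the product of gross conversion and retention.

```python
# Hypothetical helpers illustrating the three evaluation metrics.

def gross_conversion(enrollments, clicks):
    """Enrolled user-ids divided by cookies clicking "Start free trial"."""
    return enrollments / clicks


def retention(payments, enrollments):
    """User-ids still enrolled after the trial divided by enrolled user-ids."""
    return payments / enrollments


def net_conversion(payments, clicks):
    """User-ids still enrolled after the trial divided by clicking cookies."""
    return payments / clicks


# Daily baseline values (see the next section).
clicks = 3200
enrollments = 660
# Assumption: payments derived from the probability of payment given enroll.
payments = 0.53 * enrollments

gc = gross_conversion(enrollments, clicks)
rt = retention(payments, enrollments)
nc = net_conversion(payments, clicks)
print(gc, rt, nc)

# Net conversion factors into the two other metrics.
assert abs(nc - gc * rt) < 1e-12
```

Because of this relationship, a drop in gross conversion leaves net conversion unchanged only if retention increases accordingly, which is what the experiment hopes to observe.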

<a id="Standard deviation">

### Measuring standard error [*Experiment design*](#Design)

The standard error of each evaluation metric will be calculated with the following baseline data:

Parameter | Value
:---------|:------
Unique cookies to view course overview page per day |	40000
Unique cookies to click "Start free trial" per day | 3200
Enrollments per day |	660
Click-through-probability on "Start free trial" |	0.08
Probability of enrolling, given click |	0.20625
Probability of payment, given enroll |	0.53
Probability of payment, given click	| 0.1093125

The *probability of enrolling, given click* is the baseline value of the *gross conversion* metric. The *probability of payment, given enroll* corresponds to the *retention* metric. Finally, the *probability of payment, given click* corresponds to the *net conversion* metric. As we are dealing with probabilities, we will assume a binomial distribution. We can then estimate the standard error of each metric using the binomial standard deviation of a proportion:

\begin{align}
SE = \sqrt{\frac{p(1-p)}{n}}
\end{align}

Where:
- p is the probability of the event
- n is the number of units of analysis in the sample

In [38]:
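# Scale the daily baseline counts down to a sample of 5000 cookies
nb_cookies = 5000.
nb_clicks = nb_cookies * 3200. / 40000.  # clicks expected in the sample
nb_enrollments = nb_clicks * 660. / 3200.  # enrollments expected in the sample
print nb_clicks, nb_enrollments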

400.0 82.5


In [39]:
import math
# Binomial standard error sqrt(p * (1 - p) / n) for each evaluation metric
stddev_gross = math.sqrt(0.20625 * (1 - 0.20625) / nb_clicks)  # n = clicks
stddev_retention = math.sqrt(0.53 * (1 - 0.53) / nb_enrollments)  # n = enrollments
stddev_conversion = math.sqrt(0.1093125 * (1 - 0.1093125) / nb_clicks)  # n = clicks
print stddev_gross, stddev_retention, stddev_conversion

0.020230604137 0.0549490121785 0.0156015445825


The sample size is 5000 cookies. We can then expect 400 clicks on the "Start free trial" button and 82.5 enrollments. The standard errors are reported in the table below:

Evaluation metrics | Unit of analysis (n)  | Estimated standard error
:------------------|:----------------------|:----------------------------
Gross conversion   | cookie (400)          | 0.0202
Retention          | user-id (82.5)        | 0.0549
Net conversion     | cookie (400)          | 0.0156

Gross conversion and net conversion use the cookie as both unit of analysis and unit of diversion, so the analytical standard error calculated here should be quite close to the empirical value. This is not the case for the retention metric: its unit of analysis is the user-id, so the empirical variability may differ from the one estimated above.
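
As an illustration of what "empirical" means here, the Python 3 sketch below compares the analytical standard error of gross conversion with one estimated from simulated samples. It is only a sketch under a strong assumption: every click is an independent draw with probability 0.20625, which is the situation where both estimates are expected to agree. It does not reproduce the retention case, where that assumption may not hold. The function names, the seed and the number of trials are arbitrary choices.

```python
import math
import random


def analytical_se(p, n):
    """Binomial standard error of a proportion."""
    return math.sqrt(p * (1 - p) / n)


def empirical_se(p, n, trials, rng):
    """Standard deviation of the sample proportion over simulated samples."""
    proportions = []
    for _trial in range(trials):
        # Assumption: each of the n units is an independent Bernoulli draw.
        successes = sum(1 for _unit in range(n) if rng.random() < p)
        proportions.append(successes / n)
    mean = sum(proportions) / trials
    variance = sum((x - mean) ** 2 for x in proportions) / (trials - 1)
    return math.sqrt(variance)


rng = random.Random(0)  # fixed seed so that the run is repeatable
p_gross, n_clicks = 0.20625, 400

se_analytical = analytical_se(p_gross, n_clicks)
se_empirical = empirical_se(p_gross, n_clicks, trials=2000, rng=rng)
print(se_analytical, se_empirical)
```

With real data, the empirical estimate would instead come from repeated measurements (for instance A/A tests), and a noticeable gap with the analytical value would suggest that the independence assumption is violated.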

<a id="Sizing">

### Sizing [*Experiment design*](#Design)

#### Number of samples vs power

Assuming that all metrics are independent, the probability of having at least one false positive across three tests at $\alpha = 0.05$ would be $1 - 0.95^3 \approx 0.14$. The Bonferroni correction reduces this overall chance of a false positive by making each individual test stricter (a lower alpha value), and I am going to use it in this experiment. It leads to an alpha value of 1.67% ($0.05 / 3$) per test. I have rounded alpha up to 2% to perform the following calculations.
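
The arithmetic behind these figures can be checked with a few lines of Python 3. This is a minimal sketch: the variable names are mine, and the independence of the three tests is the same assumption as in the paragraph above.

```python
n_metrics = 3   # gross conversion, retention, net conversion
alpha = 0.05    # significance level of each individual test

# Chance of at least one false positive if the tests are independent.
fwer = 1 - (1 - alpha) ** n_metrics

# Bonferroni correction: split alpha evenly across the tests.
alpha_bonferroni = alpha / n_metrics

# Chance of at least one false positive once the correction is applied.
fwer_corrected = 1 - (1 - alpha_bonferroni) ** n_metrics

print(round(fwer, 4), round(alpha_bonferroni, 4), round(fwer_corrected, 4))
```

The corrected family-wise rate stays just below 5%. Note that rounding alpha up to 2% per test relaxes the correction slightly.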

I have used the online calculator provided by [Evan Miller](http://www.evanmiller.org/ab-testing/sample-size.html) to estimate the sample size for the A/B test. The results are provided in the table below:

Metric               | Base conversion rate | Practical significance | $\alpha$ | $1 - \beta$ | Sample size per variation
:--------------------|:---------------------|:-----------------------|:---------|:------------|:-------------------------
Gross conversion     | 20.625 %             | 1.0 %                  | 2.0 %    | 80.0 %      | 33014
Retention            | 53.0 %               | 1.0 %                  | 2.0 %    | 80.0 %      | 50013
Net conversion       | 10.93125 %           | 0.75 %                 | 2.0 %    | 80.0 %      | 35016
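
For readers who want to reproduce these orders of magnitude offline, here is a Python 3 sketch (3.8 or later, for `statistics.NormalDist`). It assumes a common normal-approximation formula for a two-sided test on two proportions, with rates above 0.5 folded to their complement. That this mirrors the online calculator is my assumption, so the results should be read as approximations that may differ slightly from the table. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(p, d_min, alpha, power):
    """Approximate sample size per group to detect an absolute change d_min."""
    # Assumption: rates above 0.5 are folded to their complement.
    if p > 0.5:
        p = 1.0 - p
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    sd_null = math.sqrt(2 * p * (1 - p))
    sd_alt = math.sqrt(p * (1 - p) + (p + d_min) * (1 - p - d_min))
    return round((z_alpha * sd_null + z_beta * sd_alt) ** 2 / d_min ** 2)


metrics = [
    ("Gross conversion", 0.20625, 0.01),
    ("Retention", 0.53, 0.01),
    ("Net conversion", 0.1093125, 0.0075),
]
for name, p, d_min in metrics:
    print(name, sample_size_per_variation(p, d_min, alpha=0.02, power=0.80))
```

Under these assumptions the three values should land close to those in the table. The sample size is counted in units of analysis: clicks for gross and net conversion, enrollments for retention.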

The retention metric is the one requiring the most samples per variation. Because this metric uses the user-id as its unit of analysis, the sample size (counted in enrollments) also needs to be converted to page views, which increases the requirement further: only 8% of views lead to clicks, and only 20.625% of clicks lead to enrollments:

\begin{equation}
\text{pageviews} = \frac{50013 \times 2}{0.08 \times 0.20625}
\end{equation}

The equation above assumes that the control and test groups receive the same number of page views, and leads to 6062182 page views in total.

In [50]:
print 50013. / (0.08 * 0.20625) * 2.

6062181.81818


#### Duration vs exposure

We have 40000 unique cookies to view course overview per day. If we redirect half of the traffic, the duration would be:

\begin{equation}
\text{duration} = \frac{6062182}{40000 \times 0.5}
\end{equation}

The equation above leads to 304 days! That is a long experiment, and Udacity does not want to run it for that long. We need to rework some of the previous decisions.

The retention metric is really demanding in terms of page views. If we drop this metric and update the Bonferroni correction (alpha is now 2.5%, rounded up to 3% in the online calculator), the sizing metric is net conversion, which now requires 31660 clicks per variation, i.e. 791500 page views. If we increase the fraction of diverted traffic to two thirds, this leads to a duration of 30 days, which is much more manageable.

In [53]:
print 6062182 / (40000 * 0.5)

303.1091


In [63]:
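# 31660 clicks per variation (net conversion, alpha = 3%), two groups, 8% click-through
print 31660 / 0.08 * 2.
# Duration in days with two thirds (0.66) of the traffic diverted
print 791500.0 / (40000 * 0.66)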

791500.0
29.9810606061
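
The trade-off between duration and exposure can be tabulated with a short Python 3 sketch. The helper names are hypothetical, and the traffic fractions other than 0.5 and 0.66 are only there for comparison.

```python
import math


def pageviews_from_clicks(clicks_per_variation, click_through=0.08, n_groups=2):
    """Page views needed to collect the required clicks in every group."""
    return clicks_per_variation * n_groups / click_through


def duration_days(pageviews, daily_cookies=40000, traffic_fraction=0.5):
    """Whole days needed when a fraction of daily traffic enters the experiment."""
    return math.ceil(pageviews / (daily_cookies * traffic_fraction))


# 31660 clicks per variation: net conversion sized with alpha = 3%.
pageviews = pageviews_from_clicks(31660)

for fraction in (0.5, 0.66, 1.0):
    print(fraction, duration_days(pageviews, traffic_fraction=fraction))
```

Diverting more traffic shortens the experiment but exposes more students to the change, since half of the diverted traffic sees the new feature.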


This design exposes one third of students to the new feature for one month. The purpose of the feature is to reduce the number of students who start the free trial without being willing to dedicate at least 5 hours a week to the course. It should not change the mind of students who want to take the course and accept the time commitment. So running this test is probably a reasonable risk.

<a id="Analysis">

## Experiment analysis [*top*](#Top)

<a id="Sanity">

### Sanity checks [*Experiment analysis*](#Analysis)


<a id="Result">

### Result analysis [*Experiment analysis*](#Analysis)


<a id="Recommendations">

### Recommendations [*Experiment analysis*](#Analysis)


<a id="Followup">

## Follow-up experiment [*top*](#Top)


<a id="Appendix">

## Appendix [*top*](#Top)


In [2]:
#Convert notebook to html
!jupyter nbconvert --to html --template html_minimal.tpl Design_an_AB_test.ipynb

[NbConvertApp] Converting notebook Design_an_AB_test.ipynb to html
[NbConvertApp] Writing 260626 bytes to Design_an_AB_test.html
