# Statistical Analysis of A/B Experiment on Udacity Free Trial Screener

## Author: Adeyinka J. Oresanya

---------

# **Introduction**

### **Experiment Overview**

Udacity tested a change on their website where if a potential student clicked the "Start Free Trial" button on the Course Overview page, they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they selected fewer than 5 hours per week, a message would appear informing them that Udacity courses usually require a greater time commitment for successful completion and suggesting that the student might like to access the course materials for free. At this point, the student would have the option either to continue enrolling in the free trial, or access the course materials for free instead. The overall business objective for this new feature is to increase the likelihood that students who continue past the free trial will make payments and complete their courses.

The aim of this project is to conduct a statistical analysis of the data gathered from the A/B experiment to determine whether the Free Trial Screener caused a significant change, and to recommend whether or not to launch this feature.

### Hypothesis

The Free Trial Screener will reduce the number of students who leave the free trial because of a lack of time and increase the likelihood that students who continue past the free trial eventually make payments and complete the course.

*H0: There is no significant difference in the selected metrics between students who went through the Free Trial Screener and those who did not.*

*Ha: There is a significant difference in the selected metrics between students who went through the Free Trial Screener and those who did not.*

## Datasets


The datasets used are available [here](https://docs.google.com/spreadsheets/d/1Mu5u9GrybDdska-ljPXyBjTpdZIUev_6i7t4LRDfXM8/edit#gid=0).




## **Selected Metrics**

*   **Click-through rate:** This is the number of clicks divided by the number of pageviews. For this experiment, this metric is not expected to differ between the experiment and control groups because clicks happen before the Screener appears. Thus, it is useful as an invariant metric for checking that the experiment setup is sound.

*   **Retention:** This is the number of users who remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of enrolled users. This metric is expected to increase for the experiment group, provided that the Screener reduces the number of students who start the free trial without reducing the number who remain enrolled past 14 days.

*   **Gross conversion:** This is the number of enrolled users divided by the number of clicks. This metric is expected to decrease for the experiment group because fewer users should enroll after answering the Screener question, given that those who indicate 'Fewer than 5 hours per week' are not encouraged to enroll.

*   **Net conversion:** This is the number of users who remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of clicks. This metric is also expected to increase for the experiment group.
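As a quick illustration of how the four metrics relate to one another, the sketch below computes them from a single set of totals. The counts and the helper name `compute_metrics` are hypothetical and are not taken from the experiment data.

```python
def compute_metrics(pageviews, clicks, enrollments, payments):
    """Return the four rates defined above from raw totals (hypothetical helper)."""
    return {
        "click_through_rate": clicks / pageviews,
        "gross_conversion": enrollments / clicks,
        "retention": payments / enrollments,
        "net_conversion": payments / clicks,
    }


# Made-up totals, chosen only to keep the arithmetic easy to follow
metrics = compute_metrics(pageviews=10_000, clicks=800, enrollments=160, payments=80)
for name, rate in metrics.items():
    print(f"{name}: {rate:.3f}")
```

By construction, net conversion equals gross conversion multiplied by retention (here 0.2 × 0.5 = 0.1), so the three evaluation metrics are not independent of one another.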

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.stats.proportion import proportions_ztest

In [2]:
control =  pd.read_csv('Control.csv')

experiment = pd.read_csv('Experiment.csv')


In [3]:
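# Suffixes label overlapping columns by group; the later cells rely on the _ctl/_exp names
data = pd.merge(control, experiment, on='Date', how='inner', suffixes=('_ctl', '_exp'))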
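
The later cells refer to columns such as `Clicks_ctl` and `Clicks_exp`. A minimal sketch, assuming both CSV files share the same column names (the two small frames below are made up), shows how the `suffixes` argument of `pd.merge` produces names of that form. Without it, pandas labels overlapping columns `_x` and `_y`.

```python
import pandas as pd

# Hypothetical stand-ins for Control.csv and Experiment.csv
control_demo = pd.DataFrame(
    {"Date": ["day 1", "day 2"], "Pageviews": [100, 120], "Clicks": [10, 12]}
)
experiment_demo = pd.DataFrame(
    {"Date": ["day 1", "day 2"], "Pageviews": [105, 118], "Clicks": [11, 9]}
)

# Columns present in both frames (other than the key) receive the suffixes
merged_demo = pd.merge(
    control_demo, experiment_demo, on="Date", how="inner", suffixes=("_ctl", "_exp")
)
print(list(merged_demo.columns))
```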

## **95% confidence interval on click-through rate**

In [4]:
def confidence_interval(x1, x2, n1, n2):
  '''
  Prints and returns the 95% confidence interval for the difference between two proportions (p2 - p1)

  Parameters
  ----------
  x1 {array-like}: The number of successes per day for the first independent sample
  x2 {array-like}: The number of successes per day for the second independent sample
  n1 {array-like}: The number of trials or observations per day for the first sample
  n2 {array-like}: The number of trials or observations per day for the second sample

  Returns
  -------
  lowerbound {float}: Lower bound of the confidence interval
  upperbound {float}: Upper bound of the confidence interval

  '''
  # Aggregate the daily counts before computing each group's proportion
  n1_total = n1.sum()
  n2_total = n2.sum()
  p1 = x1.sum() / n1_total
  p2 = x2.sum() / n2_total
  p_diff = p2 - p1
  # Unpooled standard error of the difference between the two proportions
  standard_error = np.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
  z_critical = stats.norm.ppf(q=0.975)  # two-sided 95% critical value
  margin_of_error = standard_error * z_critical
  lowerbound = p_diff - margin_of_error
  upperbound = p_diff + margin_of_error
  print(f'The confidence interval is between {lowerbound:.4f} and {upperbound:.4f}')
  return lowerbound, upperbound


ci_lower, ci_upper = confidence_interval(data.Clicks_ctl, data.Clicks_exp, data.Pageviews_ctl, data.Pageviews_exp)

The confidence interval is between -0.0012 and 0.0014


It is estimated with 95% confidence that the true difference in click-through rate between the experiment and control groups lies between -0.12 and 0.14 percentage points. Since this interval includes zero, the difference between the two groups is not statistically significant. This is expected because clicks happened before users were shown (or not shown) the Screener.
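
For reference, the interval computed by `confidence_interval` can be written as follows, where $\hat{p}_1 = x_1 / n_1$ and $\hat{p}_2 = x_2 / n_2$ are the click-through rates of the control and experiment groups, each computed from counts summed over all days, and $z_{0.975} \approx 1.96$ is the standard normal critical value:

$$
(\hat{p}_2 - \hat{p}_1) \;\pm\; z_{0.975}\,\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
$$

The square-root term is the unpooled standard error, in which each group's variance is estimated from its own proportion.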

## **Preparing the Metrics**

**Note:** The dataset has observations of pageviews and clicks for 37 days, while records for enrollments and payments were for 23 days. Thus, when working with enrollments and payments, only the corresponding records of pageviews and clicks were used in calculations.

In [5]:
data['ctr_control']= data['Clicks_ctl'] / data['Pageviews_ctl']
data['ctr_experiment']= data['Clicks_exp'] / data['Pageviews_exp']

In [6]:
df = data[data.Enrollments_ctl.notnull()]
data2 = df.copy()

In [7]:
data2['retention_control']= data2['Payments_ctl'] / data2['Enrollments_ctl']
data2['retention_experiment']= data2['Payments_exp'] / data2['Enrollments_exp']

data2['GrossConversion_control']= data2['Enrollments_ctl'] / data2['Clicks_ctl']
data2['GrossConversion_experiment']= data2['Enrollments_exp'] / data2['Clicks_exp']

data2['NetConversion_control']= data2['Payments_ctl'] / data2['Clicks_ctl']
data2['NetConversion_experiment']= data2['Payments_exp'] / data2['Clicks_exp']

## **Conducting Z-test**

In [8]:
Metric = {
          "Click-through Rate": [data['Clicks_ctl'], data['Clicks_exp'], data['Pageviews_ctl'], data['Pageviews_exp']], 
          "Retention": [data2['Payments_ctl'], data2['Payments_exp'], data2['Enrollments_ctl'], data2['Enrollments_exp']],
          "Gross conversion": [data2['Enrollments_ctl'],  data2['Enrollments_exp'], data2['Clicks_ctl'], data2['Clicks_exp']],
          "Net conversion": [data2['Payments_ctl'], data2['Payments_exp'], data2['Clicks_ctl'], data2['Clicks_exp']]
         }



data_list = []
for key, (x_ctl, x_exp, n_ctl, n_exp) in Metric.items():
    # Aggregate daily counts into one total per group before testing
    successes = np.array([x_ctl.sum(), x_exp.sum()])
    total_trials = np.array([n_ctl.sum(), n_exp.sum()])
    z_stat, p_val = proportions_ztest(successes, total_trials, value=0, alternative='two-sided')
    result = "Not significant" if p_val > 0.05 else "Significant"
    data_list.append({"Metric": key, "z-statistic": round(z_stat, 3), "p_value": round(p_val, 3), "Significance": result})

# Build the results table once, after all metrics have been tested
df1 = pd.DataFrame(data_list)
df1

|   | Metric | z-statistic | p_value | Significance |
|---|--------|-------------|---------|--------------|
| 0 | Click-through Rate | -0.086 | 0.932 | Not significant |
| 1 | Retention | -2.651 | 0.008 | Significant |
| 2 | Gross conversion | 4.702 | 0.0 | Significant |
| 3 | Net conversion | 1.419 | 0.156 | Not significant |


The results in the table confirm that there is no significant difference in click-through rate between those who saw the Free Trial Screener and those who did not. In addition, there is no significant difference in net conversion. However, gross conversion and retention differ significantly between the two groups. Because the control group is passed first to `proportions_ztest`, the positive z-statistic for gross conversion indicates a lower rate in the experiment group, and the negative z-statistic for retention indicates a higher rate in the experiment group.
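
To make the calculation behind the z-statistic more concrete, here is a minimal sketch of a pooled two-proportion z-test written with the standard library only. It is intended to mirror what `proportions_ztest` computes with its default settings, but the helper name `two_proportion_ztest` and the counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sided z-test for H0: p1 == p2 (illustrative helper)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # common rate assumed under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se  # first sample minus second sample
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 200 successes out of 1,000 versus 150 out of 1,000
z_demo, p_demo = two_proportion_ztest(200, 1000, 150, 1000)
print(f"z = {z_demo:.3f}, p = {p_demo:.4f}")
```

Because the statistic is the first proportion minus the second, its sign depends on the order in which the groups are supplied.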

## **Conducting T-Test**

In [9]:
Metric = {"Click-through Rate": [data.ctr_control, data.ctr_experiment], 
          "Retention": [data2.retention_control, data2.retention_experiment], 
          "Gross conversion": [data2.GrossConversion_control, data2.GrossConversion_experiment],
          "Net conversion": [data2.NetConversion_control, data2.NetConversion_experiment]}

data_list = []
for key, (rates_ctl, rates_exp) in Metric.items():
    # Each daily rate is treated as one observation per group
    t_stat, p_val = stats.ttest_ind(rates_ctl, rates_exp)
    result = "Not significant" if p_val > 0.05 else "Significant"
    data_list.append({"Metric": key, "t-statistic": round(t_stat, 3), "p_value": round(p_val, 3), "Significance": result})

# Build the results table once, after all metrics have been tested
df2 = pd.DataFrame(data_list)
df2

|   | Metric | t-statistic | p_value | Significance |
|---|--------|-------------|---------|--------------|
| 0 | Click-through Rate | -0.083 | 0.934 | Not significant |
| 1 | Retention | -1.008 | 0.319 | Not significant |
| 2 | Gross conversion | 1.54 | 0.131 | Not significant |
| 3 | Net conversion | 0.539 | 0.593 | Not significant |


There is no significant difference observed in any of the metrics.
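
With its default settings, `stats.ttest_ind` runs Student's t-test with a pooled variance, treating each daily rate as a single observation. The sketch below reproduces that statistic with the standard library. The helper name `pooled_t_statistic` and the daily rates are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's t statistic with pooled variance for two samples (illustrative helper)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / se, df


# Made-up daily rates for four days per group
control_rates = [0.20, 0.22, 0.19, 0.21]
experiment_rates = [0.18, 0.20, 0.17, 0.19]
t_demo, df_demo = pooled_t_statistic(control_rates, experiment_rates)
print(f"t = {t_demo:.3f} with {df_demo} degrees of freedom")
```

The two-sided p-value then comes from the t distribution with that many degrees of freedom, for example via `2 * stats.t.sf(abs(t_demo), df_demo)`. Note that the sample size here is the number of days, not the number of clicks or enrollments.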

## **Observations**

While the z-test recorded statistical significance for gross conversion and retention, neither metric was significant under the t-test.
The difference stems largely from what each test treats as an observation. The z-test pools the counts across all days, so its sample size is the total number of clicks or enrollments, and the normal distribution approximates the sampling distribution of a proportion well at that scale. The t-test, in contrast, was run on daily rates, so each group contributes only 23 observations for retention and gross conversion, which is a small sample (n < 30). With so few observations, any day-to-day variation in the rates widens the standard error, and the t-test has less power to detect a difference.

## **Significance of metrics under different confidence levels (using t-test results)**

In [10]:
alpha = [0.4, 0.1, 0.05, 0.01]

p_value = {"Net conversion": 0.593, "Gross conversion": 0.131, "Retention": 0.319}

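# Use new names so the merged DataFrame `data` and the earlier `df` are not overwritten
rows = []
for a in alpha:
    for key, value in p_value.items():
        result = "Not significant" if value > a else "Significant"
        rows.append({"Metric": key, "alpha": a, "p_value": value, "Statistical Significance": result})

# Build the results table once, after all combinations have been evaluated
df3 = pd.DataFrame(rows)
df3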

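|    | Metric | alpha | p_value | Statistical Significance |
|----|--------|-------|---------|--------------------------|
| 0  | Net conversion | 0.4 | 0.593 | Not significant |
| 1  | Gross conversion | 0.4 | 0.131 | Significant |
| 2  | Retention | 0.4 | 0.319 | Significant |
| 3  | Net conversion | 0.1 | 0.593 | Not significant |
| 4  | Gross conversion | 0.1 | 0.131 | Not significant |
| 5  | Retention | 0.1 | 0.319 | Not significant |
| 6  | Net conversion | 0.05 | 0.593 | Not significant |
| 7  | Gross conversion | 0.05 | 0.131 | Not significant |
| 8  | Retention | 0.05 | 0.319 | Not significant |
| 9  | Net conversion | 0.01 | 0.593 | Not significant |
| 10 | Gross conversion | 0.01 | 0.131 | Not significant |
| 11 | Retention | 0.01 | 0.319 | Not significant |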


Conclusions about the significance of net conversion do not differ across the confidence levels because the p-value, 0.593, is larger than every significance level considered. A p-value of 0.593 means that, if the Screener had no effect, a difference at least as large as the one observed would be expected about 59.3% of the time by chance alone.

In contrast, at the 60% confidence level (alpha = 0.4), retention and gross conversion are statistically significant. However, setting alpha at 0.4 means accepting up to a 40% chance of incorrectly rejecting the null hypothesis when it is true, that is, concluding that these metrics changed when they did not, which would be a waste of resources for Udacity.
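
The cost of a large alpha can be illustrated with a small simulation. The sketch below repeatedly compares two groups drawn from the same underlying rate, so any "significant" result is a false positive. The helper name `false_positive_rate`, the sample sizes, and the base rate are arbitrary assumptions chosen for illustration, not values from the experiment.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rate(alpha, n_experiments=1000, n=400, p=0.1, seed=0):
    """Share of simulated tests with no true difference that a pooled z-test calls significant."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_experiments):
        # Both groups share the same rate p, so H0 is true by construction
        x1 = sum(rng.random() < p for _ in range(n))
        x2 = sum(rng.random() < p for _ in range(n))
        pooled = (x1 + x2) / (2 * n)
        se = sqrt(pooled * (1 - pooled) * 2 / n)
        if se > 0 and (abs(x1 - x2) / n) / se > z_critical:
            rejections += 1
    return rejections / n_experiments


rate_40 = false_positive_rate(0.4)
rate_05 = false_positive_rate(0.05)
print(f"alpha = 0.4:  false positive rate about {rate_40:.3f}")
print(f"alpha = 0.05: false positive rate about {rate_05:.3f}")
```

In this setup the share of false positives should land close to the chosen alpha, which is why a threshold of 0.4 offers little protection against acting on noise.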

## **Conclusion**

Based on the findings, there is insufficient evidence to conclude that the Free Trial Screener increases the likelihood that students continue past the free trial and eventually make payments or complete the course.

## **Recommendation**

Do not launch the change on the website.