# A/B Testing the Udacity Website

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

## Exercise 1
### Import data

In [2]:
control_df = pd.read_csv('https://media.githubusercontent.com/media'+
                         '/nickeubank/MIDS_Data/master/udacity_AB_testing' + 
                         '/control_data.csv')
experiment_df = pd.read_csv('https://media.githubusercontent.com/media'+
                            '/nickeubank/MIDS_Data/master/udacity_AB_testing'+
                            '/experiment_data.csv')

## Exercise 2
### Explore the data
* Each row summarizes activity on the course overview page for a single day, so the unit of observation is one day.

In [3]:
control_df.sample(5)

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
31,"Tue, Nov 11",9880,830,,
7,"Sat, Oct 18",7434,632,110.0,70.0
17,"Tue, Oct 28",9363,736,154.0,91.0
1,"Sun, Oct 12",9102,779,147.0,70.0
27,"Fri, Nov 7",9424,781,,


In [4]:
experiment_df.sample(5)

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
19,"Thu, Oct 30",9308,728,207.0,67.0
30,"Mon, Nov 10",10445,851,,
1,"Sun, Oct 12",9288,785,116.0,91.0
5,"Thu, Oct 16",9500,788,129.0,61.0
20,"Fri, Oct 31",8715,722,182.0,123.0


## Exercise 3
### Stack data into a single dataset

In [5]:
# add treatment column to each dataframe
control_df['treatment'] = 'control'
experiment_df['treatment'] = 'experiment'

# combine the two dataframes
df_whole = pd.concat([control_df, experiment_df])
df_whole.sample(10)

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments,treatment
5,"Thu, Oct 16",9670,823,138.0,82.0,control
28,"Sat, Nov 8",8969,760,,,experiment
27,"Fri, Nov 7",9272,767,,,experiment
18,"Wed, Oct 29",9262,727,201.0,96.0,experiment
4,"Wed, Oct 15",9793,832,140.0,94.0,experiment
35,"Sat, Nov 15",8630,743,,,control
9,"Mon, Oct 20",10496,860,153.0,98.0,experiment
14,"Sat, Oct 25",8687,691,176.0,128.0,control
3,"Tue, Oct 14",9867,827,138.0,92.0,experiment
2,"Mon, Oct 13",10511,909,167.0,95.0,control
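By default `pd.concat` keeps each frame's own row labels, so labels repeat in the stacked frame. A minimal sketch of one alternative, using a few rows copied from the samples above as hypothetical stand-ins (`control_toy` and `experiment_toy` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Toy stand-ins for control_df and experiment_df (rows copied from the samples above).
control_toy = pd.DataFrame({"Date": ["Sun, Oct 12", "Mon, Oct 13"],
                            "Pageviews": [9102, 10511],
                            "Clicks": [779, 909]})
experiment_toy = pd.DataFrame({"Date": ["Sun, Oct 12", "Tue, Oct 14"],
                               "Pageviews": [9288, 9867],
                               "Clicks": [785, 827]})

# assign() returns a labelled copy, so the input frames are left unchanged;
# ignore_index=True gives the stacked frame a fresh 0..n-1 index.
stacked = pd.concat(
    [control_toy.assign(treatment="control"),
     experiment_toy.assign(treatment="experiment")],
    ignore_index=True,
)
print(stacked)
```

Whether to reset the index is a design choice: keeping the original labels preserves the row's position within its source file, while a fresh index avoids ambiguous label-based lookups.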


## Exercise 4
* Given Udacity's goals, the outcome they hope the treatment will reduce is the following Overall Evaluation Criterion (OEC):

* **(Enrollments (number of people enrolling in the trial) - Payments (number of people who eventually pay for the service)) / Clicks (number of users clicking “Start Free Trial”)**
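As a worked example of the OEC arithmetic, the sketch below applies the formula to the control row for "Sun, Oct 12" shown in the sample above (147 enrollments, 70 payments, 779 clicks). The helper name `oec` is hypothetical and only used for illustration.

```python
def oec(enrollments, payments, clicks):
    """Users who enrolled in the trial but never paid, per "Start Free Trial" click."""
    return (enrollments - payments) / clicks

# Control group, "Sun, Oct 12": 147 enrollments, 70 payments, 779 clicks.
value = oec(147, 70, 779)
print(round(value, 6))  # 0.098845
```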


## Exercise 5
* They hope the manipulation will not affect the number of students who continue past the free trial and eventually complete the course, which is measured by **Payments: the number of people who eventually pay for the service.**

## Exercise 6
### Test the quality of the randomization

In [6]:
# check nan in each column
df_whole.isnull().sum()

Date            0
Pageviews       0
Clicks          0
Enrollments    28
Payments       28
treatment       0
dtype: int64

In [7]:
# calculate the average number of pageviews
# for the treated group and for the control group
avg_pv_control = df_whole.loc[
    df_whole['treatment'] == 'control', 'Pageviews'].mean()
avg_pv_experiment = df_whole.loc[
    df_whole['treatment'] == 'experiment', 'Pageviews'].mean()
print('The average number of pageviews for the control group is: ',
       round(avg_pv_control,2))
print('The average number of pageviews for the experiment group is: ',
       round(avg_pv_experiment,2))
print('The difference in the average number of pageviews',
      'between the two groups is: ',
      round(avg_pv_experiment - avg_pv_control,2))


The average number of pageviews for the control group is:  9339.0
The average number of pageviews for the experiment group is:  9315.14
The difference in the average number of pageviews between the two groups is:  -23.86


* **The average number of pageviews looks similar for the treated and control groups**: the two group means differ by only 23.86 pageviews, roughly 0.3% of the control mean.


## Exercise 7
### Use a t-test to test the statistical significance of the differences

In [8]:
# two-sample t-test on daily pageviews (control vs. experiment)
t, p = stats.ttest_ind(control_df['Pageviews'], experiment_df['Pageviews'])
print('The p-value is: ', round(p,2), 'and the t-statistic is: ', round(t,2))

The p-value is:  0.89 and the t-statistic is:  0.14


* With a p-value of 0.89 (> 0.05), the difference between the average number of pageviews for the treated group and for the control group is **not statistically significant**.
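To make explicit what `stats.ttest_ind` computes with its default equal-variance setting, here is a minimal sketch that rebuilds the pooled-variance t-statistic by hand and compares it with SciPy. The arrays are synthetic stand-ins generated from a seeded random number generator, not the Udacity data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic daily pageviews for two groups (assumed values, for illustration only).
control = rng.normal(loc=9300, scale=700, size=37)
experiment = rng.normal(loc=9300, scale=700, size=37)

# Pooled-variance two-sample t statistic computed by hand.
n1, n2 = len(control), len(experiment)
pooled_var = ((n1 - 1) * control.var(ddof=1)
              + (n2 - 1) * experiment.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual = (control.mean() - experiment.mean()) / se
# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

t_scipy, p_scipy = stats.ttest_ind(control, experiment)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

The sign of the statistic depends on argument order: passing the control group first, as the notebook does, yields a positive value when the control mean is larger.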

## Exercise 8
* **Clicks (the number of users clicking “Start Free Trial”)** is also a pre-treatment variable, since users have to click “Start Free Trial” before they get a chance to see the treatment.

## Exercise 9
### Check if 'Clicks' is balanced

In [9]:
avg_click_control = df_whole.loc[
    df_whole['treatment'] == 'control', 'Clicks'].mean()
avg_click_experiment = df_whole.loc[
    df_whole['treatment'] == 'experiment', 'Clicks'].mean()
print('The average number of clicks for the control group is: ',
       round(avg_click_control,2))
print('The average number of clicks for the experiment group is: ',
       round(avg_click_experiment,2))
print('The difference in the average number of clicks',
      'between the two groups is: ',
      round(avg_click_experiment - avg_click_control,2))

The average number of clicks for the control group is:  766.97
The average number of clicks for the experiment group is:  765.54
The difference in the average number of clicks between the two groups is:  -1.43


In [10]:
t1,p1 = stats.ttest_ind(control_df.Clicks.values, experiment_df.Clicks.values)
print('The p-value is: ', 
      round(p1,2), 'and the t-statistic is: ', round(t1,2))

The p-value is:  0.93 and the t-statistic is:  0.09


* **The average number of users clicking “Start Free Trial” (Clicks) looks similar for the treated and control groups**: the two group means differ by only 1.43 clicks.

* With a p-value of 0.93 (> 0.05), the difference between the average number of users clicking “Start Free Trial” (Clicks) for the treated group and for the control group is **not statistically significant**.
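The Pageviews and Clicks balance checks repeat the same steps, so they could be wrapped in one helper. The sketch below is one possible way to do that; `balance_table` is a hypothetical function name, and the data frame is synthetic rather than the Udacity data.

```python
import numpy as np
import pandas as pd
from scipy import stats

def balance_table(df, columns, group_col="treatment"):
    """Compare group means and run a two-sample t-test for each column."""
    rows = []
    for col in columns:
        ctrl = df.loc[df[group_col] == "control", col].dropna()
        expt = df.loc[df[group_col] == "experiment", col].dropna()
        t_stat, p_value = stats.ttest_ind(ctrl, expt)
        rows.append({"metric": col,
                     "control_mean": ctrl.mean(),
                     "experiment_mean": expt.mean(),
                     "difference": expt.mean() - ctrl.mean(),
                     "t": t_stat,
                     "p": p_value})
    return pd.DataFrame(rows).set_index("metric")

# Synthetic stand-in for df_whole (assumed values, for illustration only).
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "treatment": ["control"] * 37 + ["experiment"] * 37,
    "Pageviews": rng.normal(9300, 700, size=74),
    "Clicks": rng.normal(766, 60, size=74),
})
table = balance_table(toy, ["Pageviews", "Clicks"])
print(table.round(2))
```

Applied to the notebook's data, the call would presumably be `balance_table(df_whole, ['Pageviews', 'Clicks'])`.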

## Exercise 10
### Evaluate the effects of the experiment

In [11]:
# check whether Payments and (Enrollments - Payments) per click
# differ between the two groups
# drop rows with missing Enrollments or Payments
df_new = df_whole.dropna().copy()
# add a new column: (Enrollments - Payments) / Clicks
df_new['enroll_no_pay_per_click'] = (
    df_new['Enrollments'] - df_new['Payments'])/df_new['Clicks']
df_new.shape, df_whole.shape, df_new.isnull().sum()


((46, 7),
 (74, 6),
 Date                       0
 Pageviews                  0
 Clicks                     0
 Enrollments                0
 Payments                   0
 treatment                  0
 enroll_no_pay_per_click    0
 dtype: int64)
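A bare `dropna()` removes a row if any column is missing. Restricting it to the outcome columns makes the intent explicit; in this dataset the result should be the same, because only Enrollments and Payments contain missing values. A minimal sketch on four control rows copied from the sample in Exercise 2 (`toy` and `complete` are illustrative names):

```python
import numpy as np
import pandas as pd

# Four control rows copied from the sample output above.
toy = pd.DataFrame({
    "Date": ["Sun, Oct 12", "Sat, Oct 18", "Fri, Nov 7", "Tue, Nov 11"],
    "Clicks": [779, 632, 781, 830],
    "Enrollments": [147.0, 110.0, np.nan, np.nan],
    "Payments": [70.0, 70.0, np.nan, np.nan],
})

# Drop only the rows where an outcome column is missing, then compute the OEC.
complete = toy.dropna(subset=["Enrollments", "Payments"]).copy()
complete["enroll_no_pay_per_click"] = (
    complete["Enrollments"] - complete["Payments"]) / complete["Clicks"]
print(len(toy), len(complete))
print(complete)
```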

In [12]:
# check whether Payments differs between the two groups
avg_pay_control = df_new.loc[
    df_new['treatment'] == 'control', 'Payments'].mean()
avg_pay_experiment = df_new.loc[
    df_new['treatment'] == 'experiment', 'Payments'].mean()
print('The average number of payments for the control group is: ',
       round(avg_pay_control,2))
print('The average number of payments for the experiment group is: ',
       round(avg_pay_experiment,2))
print('The difference in the average number of payments',
      'between the two groups is: ',
      round(avg_pay_experiment - avg_pay_control,2))

The average number of payments for the control group is:  88.39
The average number of payments for the experiment group is:  84.57
The difference in the average number of payments between the two groups is:  -3.83


In [13]:
t2, p2 = stats.ttest_ind(
    df_new.loc[df_new['treatment'] == 'control', 'Payments'].values,
    df_new.loc[df_new['treatment'] == 'experiment', 'Payments'].values)
print('The p-value is: ', round(p2,2),
       'and the t-statistic is: ', round(t2,2))

The p-value is:  0.56 and the t-statistic is:  0.59


In [14]:
# check whether (Enrollments - Payments) per click
# differs between the two groups
avg_enroll_pay_per_click_control = df_new.loc[
    df_new['treatment'] == 'control', 'enroll_no_pay_per_click'].mean()
avg_enroll_pay_per_click_experiment = df_new.loc[
    df_new['treatment'] == 'experiment', 'enroll_no_pay_per_click'].mean()
print('The average number of enrollments-payments per click',
      'for the control group is: ', round(avg_enroll_pay_per_click_control,2))
print('The average number of enrollments-payments per click',
      'for the experiment group is: ',
      round(avg_enroll_pay_per_click_experiment,2))
print('The difference in the average number of enrollments',
      'but no payment students per click between the two groups is: ',
      round(avg_enroll_pay_per_click_experiment
            - avg_enroll_pay_per_click_control,4))

The average number of enrollments-payments per click for the control group is:  0.1
The average number of enrollments-payments per click for the experiment group is:  0.09
The difference in the average number of enrollments but no payment students per click between the two groups is:  -0.0159


In [15]:
t3, p3 = stats.ttest_ind(
    df_new.loc[df_new['treatment'] == 'control',
               'enroll_no_pay_per_click'].values,
    df_new.loc[df_new['treatment'] == 'experiment',
               'enroll_no_pay_per_click'].values)
print('The p-value is: ', round(p3,3), 'and the t-statistic is: ',
       round(t3,3))

The p-value is:  0.132 and the t-statistic is:  1.535


* Given the results above, the metric we do not want affected (Payments) shows no detectable change: average daily payments differ by only 3.83 between the treated and control groups, and with a p-value of 0.56 (> 0.05) that difference is **not statistically significant**.

* For the OEC, (enrollments - payments) / clicks, that is, the number of people who enroll in the trial but never pay for the service, per click, the experiment group's average is 0.0159 lower than the control group's **(about 0.02 fewer enrollments-without-payment per click)**. However, with a p-value of 0.132 (> 0.05), this difference is not statistically significant.

* In short, based on these t-tests alone, there is no statistically significant evidence that Udacity achieved its goal.

## Exercise 11
### Re-estimating the effect of treatment on the OEC using a linear regression

In [16]:
# bivariate regression of the OEC (enroll_no_pay_per_click) on treatment
bi_model = smf.ols(formula =
                    'enroll_no_pay_per_click ~ treatment',
                      data = df_new).fit()
bi_model.summary()

0,1,2,3
Dep. Variable:,enroll_no_pay_per_click,R-squared:,0.051
Model:,OLS,Adj. R-squared:,0.029
Method:,Least Squares,F-statistic:,2.356
Date:,"Mon, 27 Mar 2023",Prob (F-statistic):,0.132
Time:,22:41:49,Log-Likelihood:,89.832
No. Observations:,46,AIC:,-175.7
Df Residuals:,44,BIC:,-172.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.1021,0.007,13.948,0.000,0.087,0.117
treatment[T.experiment],-0.0159,0.010,-1.535,0.132,-0.037,0.005

0,1,2,3
Omnibus:,14.16,Durbin-Watson:,1.908
Prob(Omnibus):,0.001,Jarque-Bera (JB):,15.205
Skew:,1.227,Prob(JB):,0.000499
Kurtosis:,4.383,Cond. No.,2.62


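* The bivariate regression reproduces the t-test from Exercise 10: the coefficient on `treatment` is -0.0159, which is the difference in group means, and the p-value is 0.132. The t-statistic has the opposite sign (-1.535 versus 1.535) only because the t-test subtracted the experiment mean from the control mean.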
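The equivalence between a regression on a single binary indicator and an equal-variance two-sample t-test can be checked directly. The sketch below fits the regression with plain NumPy least squares on synthetic data (assumed values, not the Udacity data) and compares the slope and its t-statistic with `stats.ttest_ind`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic OEC values for 23 days per group (for illustration only).
control = rng.normal(0.10, 0.035, size=23)
experiment = rng.normal(0.085, 0.035, size=23)

# Design matrix: intercept plus a 0/1 treatment indicator.
y = np.concatenate([control, experiment])
d = np.concatenate([np.zeros(23), np.ones(23)])
X = np.column_stack([np.ones_like(d), d])

# OLS coefficients and the classical (nonrobust) standard error of the slope.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (len(y) - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)
t_ols = beta[1] / np.sqrt(cov[1, 1])

# Passing the experiment group first gives the same sign as the regression slope.
t_test, p_test = stats.ttest_ind(experiment, control)
print(np.isclose(beta[1], experiment.mean() - control.mean()))
print(np.isclose(t_ols, t_test))
```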

## Exercise 12
### Add indicator variables for the day of each observation

In [17]:
# add date as indicator
model_new = smf.ols(formula = 'enroll_no_pay_per_click ~ treatment + C(Date)',
                     data = df_new).fit()
model_new.summary()

0,1,2,3
Dep. Variable:,enroll_no_pay_per_click,R-squared:,0.806
Model:,OLS,Adj. R-squared:,0.602
Method:,Least Squares,F-statistic:,3.962
Date:,"Mon, 27 Mar 2023",Prob (F-statistic):,0.000978
Time:,22:41:49,Log-Likelihood:,126.29
No. Observations:,46,AIC:,-204.6
Df Residuals:,22,BIC:,-160.7
Df Model:,23,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.1079,0.016,6.651,0.000,0.074,0.142
treatment[T.experiment],-0.0159,0.007,-2.398,0.025,-0.030,-0.002
"C(Date)[T.Fri, Oct 24]",0.0445,0.022,1.983,0.060,-0.002,0.091
"C(Date)[T.Fri, Oct 31]",-0.0074,0.022,-0.331,0.744,-0.054,0.039
"C(Date)[T.Mon, Oct 13]",-0.0231,0.022,-1.026,0.316,-0.070,0.024
"C(Date)[T.Mon, Oct 20]",-0.0285,0.022,-1.270,0.217,-0.075,0.018
"C(Date)[T.Mon, Oct 27]",0.0328,0.022,1.458,0.159,-0.014,0.079
"C(Date)[T.Sat, Nov 1]",-0.0235,0.022,-1.047,0.306,-0.070,0.023
"C(Date)[T.Sat, Oct 11]",-0.0017,0.022,-0.074,0.941,-0.048,0.045

0,1,2,3
Omnibus:,3.871,Durbin-Watson:,1.863
Prob(Omnibus):,0.144,Jarque-Bera (JB):,3.826
Skew:,-0.0,Prob(JB):,0.148
Kurtosis:,4.413,Cond. No.,27.3


In [18]:
# compute the percentage change in the standard error on treatment
print('the change in the standard error on treatment is: ',
       round((0.0100-0.007)/0.0100*100,2), '%')

the change in the standard error on treatment is:  30.0 %


* After adding indicator variables for the date of each observation, the standard error on the `treatment` coefficient decreases by about 30% (from 0.010 to 0.007), while the coefficient itself stays at -0.0159.
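Rather than typing the rounded standard errors by hand, they can be read from the fitted results through the `bse` attribute. The sketch below illustrates this on synthetic data with strong day-to-day variation (assumed values, not the Udacity data); the names `toy`, `plain`, and `with_days` are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_days = 23
day_effect = rng.normal(0, 0.03, size=n_days)  # shared shock for each day

# One control and one experiment observation per day (balanced design).
rows = []
for day in range(n_days):
    for arm, shift in (("control", 0.0), ("experiment", -0.016)):
        rows.append({"day": f"day_{day:02d}",
                     "treatment": arm,
                     "oec": 0.10 + day_effect[day] + shift + rng.normal(0, 0.01)})
toy = pd.DataFrame(rows)

plain = smf.ols("oec ~ treatment", data=toy).fit()
with_days = smf.ols("oec ~ treatment + C(day)", data=toy).fit()

term = "treatment[T.experiment]"
se_plain, se_days = plain.bse[term], with_days.bse[term]
pct_drop = (se_plain - se_days) / se_plain * 100
print(f"SE without day indicators: {se_plain:.4f}")
print(f"SE with day indicators:    {se_days:.4f}")
print(f"Percentage decrease:       {pct_drop:.1f} %")
```

In the notebook the same calculation would presumably use `bi_model.bse` and `model_new.bse`; the result would differ slightly from 30.0% because the summary table rounds the standard errors to three decimals. In a balanced design like this one, the day indicators leave the treatment coefficient unchanged and only absorb day-level noise.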

## Exercise 13

* Given the result above, the difference in the OEC (enrollments minus payments, per click) between the two groups is statistically significant, with a p-value of 0.025 (< 0.05).

* Therefore, the trial implies that by asking students how much time they had available to devote to the course, Udacity is able to reduce the number of frustrated students who leave the free trial because they don't have enough time, without significantly reducing the number of students who continue past the free trial and eventually complete the course. That is, by launching the new version of the website, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.

## Exercise 14
### Add indicators for day of the week 

In [19]:
# add indicators for the day of the week
# extract the first 3 letters of the Date
df_new['Day'] = df_new['Date'].str[:3]
df_new.sample(5)


Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments,treatment,enroll_no_pay_per_click,Day
1,"Sun, Oct 12",9102,779,147.0,70.0,control,0.098845,Sun
8,"Sun, Oct 19",8434,697,120.0,77.0,experiment,0.061693,Sun
1,"Sun, Oct 12",9288,785,116.0,91.0,experiment,0.031847,Sun
15,"Sun, Oct 26",8896,708,161.0,104.0,control,0.080508,Sun
18,"Wed, Oct 29",9262,727,201.0,96.0,experiment,0.144429,Wed
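Slicing the first three characters works because every `Date` string starts with a three-letter weekday. A slightly more defensive sketch splits on the comma instead and stores the result as an ordered categorical, so the weekdays sort Monday to Sunday rather than alphabetically. The series below reuses a few date strings from the outputs above; `week_order` and `day_cat` are illustrative names.

```python
import pandas as pd

# A few Date strings in the same format as the dataset.
dates = pd.Series(["Sat, Oct 11", "Sun, Oct 12", "Wed, Oct 29", "Fri, Oct 31"])

# Take everything before the comma as the weekday abbreviation.
day = dates.str.split(",").str[0]

# Ordered categorical so that sorting and grouping follow the calendar week.
week_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_cat = pd.Categorical(day, categories=week_order, ordered=True)
print(list(day))
print(list(day_cat.codes))
```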


In [20]:
model_day = smf.ols(formula = 'enroll_no_pay_per_click ~ treatment + C(Day)',
                     data = df_new).fit()
model_day.summary()

0,1,2,3
Dep. Variable:,enroll_no_pay_per_click,R-squared:,0.215
Model:,OLS,Adj. R-squared:,0.07
Method:,Least Squares,F-statistic:,1.487
Date:,"Mon, 27 Mar 2023",Prob (F-statistic):,0.201
Time:,22:41:49,Log-Likelihood:,94.201
No. Observations:,46,AIC:,-172.4
Df Residuals:,38,BIC:,-157.8
Df Model:,7,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.1203,0.015,8.070,0.000,0.090,0.150
treatment[T.experiment],-0.0159,0.010,-1.569,0.125,-0.036,0.005
C(Day)[T.Mon],-0.0186,0.020,-0.940,0.353,-0.059,0.021
C(Day)[T.Sat],-0.0373,0.019,-2.013,0.051,-0.075,0.000
C(Day)[T.Sun],-0.0173,0.019,-0.934,0.356,-0.055,0.020
C(Day)[T.Thu],-0.0019,0.020,-0.095,0.925,-0.042,0.038
C(Day)[T.Tue],-0.0378,0.020,-1.905,0.064,-0.078,0.002
C(Day)[T.Wed],-0.0086,0.020,-0.432,0.668,-0.049,0.032

0,1,2,3
Omnibus:,12.955,Durbin-Watson:,1.745
Prob(Omnibus):,0.002,Jarque-Bera (JB):,13.612
Skew:,1.11,Prob(JB):,0.00111
Kurtosis:,4.474,Cond. No.,9.27
