In [1]:
import math as mt
import numpy as np
import pandas as pd
from scipy.stats import norm

### 1 Experiment Overview 
Experiment Name: "QuickView button" Screener.
This experiment aims to increase the share of users who reach the product details page and complete a purchase.
https://www.wayfair.com/furniture/sb0/bedroom-sets-c46123.html

The general process is as follows:

1. **choose a metric**
2. **review statistics**
3. **design**
4. **analyse**


#### 1.1 Description of Experimented Change

* In the shop list view of the **Wayfair shopping site**, product cards can be displayed in two modes: one shows a quick view button when the mouse hovers over the card, and the other does not.
    <figure class="two">
        <img src="img/1.png" width="40%">
        <img src="img/2.png" width="40%">
    </figure>
    
* Users who see the quick view button can click it to view the details and specific SKU information and to customize combinations, which gives them more information for deciding whether to enter the detail page.
* Users without the quick view button must click through to the detail page to see detailed SKU information.
* <figure class="two">
        <img src="img/3.png" width="40%">
        <img src="img/4.png" width="40%">
    </figure>

#### 1.2 Experiment Hypothesis 
The hypothesis is that the quick view button gives users more information for deciding whether to make a purchase, thereby reducing the number of cases where users with insufficient information click through to the detail page and then leave without buying. The quick view button is expected to significantly increase the conversion rate of the product detail page, increase users' purchase rate, and improve the user experience.

#### 1.3 Experiment Details
The unit of diversion is a cookie. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.


### 2 Metric Choice 
> We need two types of metrics for a successful experiment (or at least, a safe one): invariant and evaluation metrics. Invariant metrics are used for "sanity checks", that is, to make sure our experiment (the way we presented a change to a part of the population, as well as the way we collected the data) is not inherently wrong. Basically, this means we pick metrics which we consider not to change (not to be affected) because of our experiment and later make sure these metrics don't change drastically between our control and experiment groups.
Evaluation metrics, on the other hand, are the metrics in which we expect to see a change, and are relevant to the business goals we aim to achieve. For each metric we state a **Dmin**, which marks the minimum change that is practically significant to the business. For instance, stating that any increase in retention that is under 2%, even if statistically significant, is not practical to the business. -- from the course [A/B_testing](https://en.wikipedia.org/wiki/A/B_testing)



#### 2.1 Invariant Metrics - Sanity Checks

| Metric Name  | Metric Formula  | $Dmin$  | Notation |
|:-:|:-:|:-:|:-:|
| Number of Cookies in goods detail Page  | # unique daily cookies on page | 3000 cookies  | $C_k$ |
| Number of Clicks on quick view Button  | # unique daily cookies who clicked  | 240 clicks | $C_l$ |
| Quick view button Click-Through-Probability  | $\frac{C_l}{C_k}$ | 0.01  | $CTP$ | 

#### 2.2 Evaluation Metrics - Performance Indicators
| Metric Name  | Metric Formula  | $Dmin$  | Notation |
|:-:|:-:|:-:|:-:|
| Gross Conversion   |  $\frac{enrolled}{C_l}$  | 0.01  | $Conversion_{Gross}$ |
| Retention   | $\frac{paid}{enrolled}$  | 0.01  | $Retention$ |
| Net Conversion  |  $\frac{paid}{C_l}$  | 0.0075 | $Conversion_{Net}$ |


### 3 Design
This design follows the baseline data given by Udacity. When A/B testing at Wayfair, engineers could add event tracking to collect the corresponding data. For now, I construct some synthetic baseline data.


| Item | Description  | Estimator  |
|:-:|:-:|:-:|
| Number of cookies | Daily unique cookies to view course overview page  | 100,000  |
| Number of clicks | Daily unique cookies to click Quick view button  | 5,000 |
| Number of enrollments | Quick view enrollments per day  | 800  |
| CTP | CTP on Quick view button  | 0.05  |
| Gross Conversion | Probability of enrolling, given a click  | 0.16  |
| Retention | Probability of payment, given enrollment  | 0.53  |
| Net Conversion | Probability of payment, given click  | 0.109313 |
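
As a quick consistency check, the probabilities in the table can be recomputed from the daily counts. This is a minimal sketch: the variable names are illustrative, and because the table does not list daily payments, only CTP and gross conversion are derived here.

```python
# Daily baseline counts taken from the table above
cookies = 100_000      # daily unique cookies viewing the page
clicks = 5_000         # daily unique cookies clicking the quick view button
enrollments = 800      # daily enrollments

# Derived probabilities
ctp = clicks / cookies                   # click-through probability
gross_conversion = enrollments / clicks  # P(enroll | click)

print("CTP:", ctp)
print("Gross Conversion:", gross_conversion)
```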


In [2]:
# Let's place these estimates into a dictionary for ease of use later
base_line = {"Cookies":100000,"Clicks":5000,"Enrollments":800,"CTP":0.05,"GrossConversion":0.16,
           "Retention":0.53,"NetConversion":0.109313}

In [3]:
# Scale the count estimates
# In this case, from 100000 unique cookies visiting the course overview page per day down to 10000.
base_line["Cookies"] = 10000
base_line["Clicks"]=base_line["Clicks"]*(10000/100000)
base_line["Enrollments"]=base_line["Enrollments"]*(10000/100000)
base_line

{'Cookies': 10000,
 'Clicks': 500.0,
 'Enrollments': 80.0,
 'CTP': 0.05,
 'GrossConversion': 0.16,
 'Retention': 0.53,
 'NetConversion': 0.109313}

These three indicators correspond to the three evaluation metrics in the Udacity case and also apply to our Wayfair case.
* **Gross Conversion** - The baseline probability for Gross Conversion is the number of users who enroll divided by the number of cookies that click the quick view button. In other words, the probability of enrollment given a click. In this case, the unit of diversion (cookies), that is, the element by which we differentiate samples and assign them to control and experiment groups, is equal to the unit of analysis (cookies who click), that is, the denominator of the formula for Gross Conversion (GC). When this is the case, the analytic estimate of variance is sufficient.
* **Retention** - The baseline probability for retention is the number of paying users (who pay after clicking the quick view button) divided by the total number of enrolled users. In other words, the probability of payment given enrollment. The sample size is the number of enrolled users. In this case, the unit of diversion is not equal to the unit of analysis (users who enrolled), so an analytic estimate is not enough. If we had the data for these estimates, we would want to estimate this variance empirically as well.
* **Net Conversion** - The baseline probability for the net conversion is the number of paying users divided by the number of cookies that clicked the quick view button. In other words, the probability of payment given a click. The sample size is the number of cookies that clicked. In this case, the units of analysis and diversion are equal, so we expect the analytic estimate to be good enough.
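
The analytic estimate described above is the standard error of a binomial proportion, $\sqrt{p(1-p)/n}$. A minimal sketch follows; the helper name `binomial_se` is illustrative and not part of the original notebook, and the inputs are the baseline values scaled to 10,000 daily cookies.

```python
import math

def binomial_se(p, n):
    """Analytic standard error of a proportion p estimated from n units."""
    return math.sqrt(p * (1 - p) / n)

# Baseline values scaled to 10,000 daily cookies (500 clicks, 80 enrollments)
print(round(binomial_se(0.16, 500), 4))      # gross conversion
print(round(binomial_se(0.53, 80), 4))       # retention
print(round(binomial_se(0.109313, 500), 4))  # net conversion
```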

In [4]:
# Let's get the p and n we need for Gross Conversion (GC)
# and compute the Standard Deviation (sd) rounded to 4 decimal digits.
GC_INFO={}
GC_INFO["d_min"]=0.01
GC_INFO["p"]=base_line["GrossConversion"]
#p is given in this case - or we could calculate it from enrollments/clicks
GC_INFO["n"]=base_line["Clicks"]
GC_INFO["std"]=round(mt.sqrt((GC_INFO["p"]*(1-GC_INFO["p"]))/GC_INFO["n"]),4)
print("GC_INFO['std'] = {}".format(GC_INFO["std"]))

# Let's get the p and n we need for Retention(R)
# and compute the Standard Deviation (sd) rounded to 4 decimal digits.
R_INFO={}
R_INFO["d_min"]=0.01
R_INFO["p"]=base_line["Retention"]
R_INFO["n"]=base_line["Enrollments"]
R_INFO["std"]=round(mt.sqrt((R_INFO["p"]*(1-R_INFO["p"]))/R_INFO["n"]),4)
print("R_INFO['std'] = {}".format(R_INFO["std"]))

# Let's get the p and n we need for Net Conversion (NC)
# and compute the Standard Deviation (sd) rounded to 4 decimal digits.
NC_INFO={}
NC_INFO["d_min"]=0.0075
NC_INFO["p"]=base_line["NetConversion"]
NC_INFO["n"]=base_line["Clicks"]
NC_INFO["std"]=round(mt.sqrt((NC_INFO["p"]*(1-NC_INFO["p"]))/NC_INFO["n"]),4)
print("NC_INFO['std'] = {}".format(NC_INFO["std"]))

GC_INFO['std'] = 0.0164
R_INFO['std'] = 0.0558
NC_INFO['std'] = 0.014


Given $\alpha=0.05$ (significance level) and $\beta=0.2$ (type II error rate, i.e. power $1-\beta=0.8$), we want to estimate how many total pageviews (cookies who viewed the course overview page) we need in the experiment. This amount will be divided into the two groups: control and experiment. This calculation can be done using an [online calculator](http://www.evanmiller.org/ab-testing/sample-size.html) or by calculating directly using the required formula.
<figure class="two">
        <img src="img/5.png" width="40%">
        <img src="img/6.png" width="40%">
</figure>
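
For reference, the per-group sample size for comparing two proportions, with baseline probability $p$ and minimum detectable difference $d$, can be written as follows. This is the expression that the `get_sample_size` and `get_sds` helpers in this notebook evaluate:

$$
n = \frac{\left(z_{1-\alpha/2}\sqrt{2p(1-p)} + z_{1-\beta}\sqrt{p(1-p) + (p+d)\left(1-(p+d)\right)}\right)^2}{d^2}
$$

Here $n$ counts units of analysis (clicks or enrollments) per group, so it still has to be converted to pageviews and doubled to cover both groups.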

In [5]:

def get_z_score(alpha):
    return norm.ppf(alpha)


def get_sds(p,d):
    sd1=mt.sqrt(2*p*(1-p))
    sd2=mt.sqrt(p*(1-p)+(p+d)*(1-(p+d)))
    sds=[sd1,sd2]
    return sds

def get_sample_size(sds,alpha,beta,d):
    n=pow((get_z_score(1-alpha/2)*sds[0]+get_z_score(1-beta)*sds[1]),2)/pow(d,2)
    return n

In [6]:
GC_INFO["d"]=0.01
R_INFO["d"]=0.01
NC_INFO["d"]=0.0075

In [7]:
# Let's get an integer value for simplicity
GC_INFO["SampSize"]=round(get_sample_size(get_sds(GC_INFO["p"],GC_INFO["d"]),0.05,0.2,GC_INFO["d"]))
GC_INFO["SampSize"]

21255.0

In [8]:
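# Convert clicks per group to total pageviews: divide by the click-through
# probability and double for the two groups.
# Note: 0.08 is used here, whereas the baseline table lists CTP = 0.05.
GC_INFO["SampSize"]=round(GC_INFO["SampSize"]/0.08*2)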
GC_INFO["SampSize"]

531375.0

In [9]:
# Getting a nice integer value
R_INFO["SampSize"]=round(get_sample_size(get_sds(R_INFO["p"],R_INFO["d"]),0.05,0.2,R_INFO["d"]))
R_INFO["SampSize"]

39087.0

In [10]:
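# Enrollments per group -> total pageviews: divide by the click-through probability
# (0.08 here) and by gross conversion (0.16), then double for the two groups.
R_INFO["SampSize"]=R_INFO["SampSize"]/0.08/0.16*2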
R_INFO["SampSize"]

6107343.75

In [11]:
# Getting a nice integer value
NC_INFO["SampSize"]=round(get_sample_size(get_sds(NC_INFO["p"],NC_INFO["d"]),0.05,0.2,NC_INFO["d"]))
NC_INFO["SampSize"]

27413.0

In [12]:
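# Clicks per group -> total pageviews: divide by the click-through probability
# (0.08 here), then double for the two groups.
NC_INFO["SampSize"]=NC_INFO["SampSize"]/0.08*2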
NC_INFO["SampSize"]

685325.0
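
The conversion from units of analysis per group to total pageviews, used in the three cells above, can be sketched as one helper. The function name and parameters are illustrative assumptions; the rates passed in are the ones used in the cells above.

```python
def pageviews_needed(n_per_group, *rates, groups=2):
    """Total pageviews so that each group reaches n_per_group units of analysis.

    Each rate is the probability of moving one step down the funnel
    (e.g. pageview -> click, click -> enrollment).
    """
    pageviews = n_per_group
    for rate in rates:
        pageviews /= rate
    return pageviews * groups

print(pageviews_needed(21255, 0.08))        # gross conversion: clicks per group
print(pageviews_needed(39087, 0.08, 0.16))  # retention: enrollments per group
print(pageviews_needed(27413, 0.08))        # net conversion: clicks per group
```

The largest of the three values determines how many pageviews the experiment needs if all three metrics are kept.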

### Analyse

In [13]:
# we use pandas to load datasets
control=pd.read_csv("data/Control.csv")
experiment=pd.read_csv("data/Experiment.csv")
control.head(10)

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
0,"Sat, Oct 11",10398,673,231.0,109.0
1,"Sun, Oct 12",11118,735,197.0,84.0
2,"Mon, Oct 13",10440,943,179.0,148.0
3,"Tue, Oct 14",10476,662,234.0,57.0
4,"Wed, Oct 15",10302,722,136.0,120.0
5,"Thu, Oct 16",11734,863,137.0,84.0
6,"Fri, Oct 17",9572,726,219.0,114.0
7,"Sat, Oct 18",11080,636,122.0,58.0
8,"Sun, Oct 19",10933,720,228.0,137.0
9,"Mon, Oct 20",10501,859,237.0,104.0


In [14]:
pageviews_cont=control['Pageviews'].sum()
pageviews_exp=experiment['Pageviews'].sum()
pageviews_total=pageviews_cont+pageviews_exp
print ("number of pageviews in control:", pageviews_cont)
print ("number of Pageviews in experiment:" ,pageviews_exp)

number of pageviews in control: 383184
number of Pageviews in experiment: 386149


In [15]:
p=0.5
alpha=0.05
p_hat=round(pageviews_cont/(pageviews_total),4)
sd=mt.sqrt(p*(1-p)/(pageviews_total))
ME=round(get_z_score(1-(alpha/2))*sd,4)
print ("The confidence interval is between",p-ME,"and",p+ME,"; Is",p_hat,"inside this range?")

The confidence interval is between 0.4989 and 0.5011 ; Is 0.4981 inside this range?


In [16]:
clicks_cont=control['Clicks'].sum()
clicks_exp=experiment['Clicks'].sum()
clicks_total=clicks_cont+clicks_exp

p_hat=round(clicks_cont/clicks_total,4)
sd=mt.sqrt(p*(1-p)/clicks_total)
ME=round(get_z_score(1-(alpha/2))*sd,4)
print ("The confidence interval is between",p-ME,"and",p+ME,"; Is",p_hat,"inside this range?")

The confidence interval is between 0.4959 and 0.5041 ; Is 0.4978 inside this range?
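
The two count-based checks above follow the same pattern, so they can be wrapped in one helper that also reports whether the observed share falls inside the interval. This is a sketch under assumptions: the function name is hypothetical, and it uses `statistics.NormalDist` from the standard library in place of `scipy.stats.norm` to stay self-contained.

```python
import math
from statistics import NormalDist

def check_even_split(n_control, n_experiment, alpha=0.05):
    """Check whether the control share of a count is consistent with a 50/50 split."""
    total = n_control + n_experiment
    p = 0.5
    se = math.sqrt(p * (1 - p) / total)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # same value as norm.ppf(1 - alpha / 2)
    margin = z * se
    observed = n_control / total
    lower, upper = p - margin, p + margin
    return lower, upper, observed, lower <= observed <= upper

# Pageview totals printed earlier in this notebook
lower, upper, observed, passed = check_even_split(383184, 386149)
print(f"CI: [{lower:.4f}, {upper:.4f}], observed: {observed:.4f}, passed: {passed}")
```

Returning the pass/fail flag explicitly avoids having to compare the printed numbers by eye.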


In [17]:
ctp_cont=clicks_cont/pageviews_cont
ctp_exp=clicks_exp/pageviews_exp
d_hat=round(ctp_exp-ctp_cont,4)
p_pooled=clicks_total/pageviews_total
sd_pooled=mt.sqrt(p_pooled*(1-p_pooled)*(1/pageviews_cont+1/pageviews_exp))
ME=round(get_z_score(1-(alpha/2))*sd_pooled,4)
print ("The confidence interval is between",0-ME,"and",0+ME,"; Is",d_hat,"within this range?")

The confidence interval is between -0.0012 and 0.0012 ; Is 0.0001 within this range?
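
The click-through-probability check compares two proportions using a pooled standard error. A minimal sketch of that calculation as a reusable function is shown here; the function name is hypothetical, and the counts in the usage example are made up for illustration and are not the experiment's data.

```python
import math
from statistics import NormalDist

def pooled_diff_ci(x_control, n_control, x_experiment, n_experiment, alpha=0.05):
    """Observed difference in proportions and its margin of error (pooled SE)."""
    p_pooled = (x_control + x_experiment) / (n_control + n_experiment)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_experiment))
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se
    diff = x_experiment / n_experiment - x_control / n_control
    return diff, margin

# Hypothetical counts, for illustration only (not the experiment's data)
diff, margin = pooled_diff_ci(500, 10_000, 520, 10_000)
print(f"difference: {diff:.4f}, interval around 0: [{-margin:.4f}, {margin:.4f}]")
```

For an invariant metric the observed difference is expected to fall inside the interval around 0. For an evaluation metric the same margin is instead placed around the observed difference.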


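> Sanity checks: the click count and click-through-probability checks pass. The pageview check does not: the control share of 0.4981 lies just below the interval [0.4989, 0.5011], so the pageview split should be investigated before relying on the results.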

### CI cases from courses on udacity
![](img/7.png)

In [18]:
# Count the total clicks from complete records only
clicks_cont=control["Clicks"].loc[control["Enrollments"].notnull()].sum()
clicks_exp=experiment["Clicks"].loc[experiment["Enrollments"].notnull()].sum()

In [19]:
#Gross Conversion - number of enrollments divided by number of clicks
enrollments_cont=control["Enrollments"].sum()
enrollments_exp=experiment["Enrollments"].sum()

GC_cont=enrollments_cont/clicks_cont
GC_exp=enrollments_exp/clicks_exp
GC_pooled=(enrollments_cont+enrollments_exp)/(clicks_cont+clicks_exp)
GC_sd_pooled=mt.sqrt(GC_pooled*(1-GC_pooled)*(1/clicks_cont+1/clicks_exp))
GC_ME=round(get_z_score(1-alpha/2)*GC_sd_pooled,4)
GC_diff=round(GC_exp-GC_cont,4)
print("The change due to the experiment is",GC_diff*100,"%")
print("Confidence Interval: [",GC_diff-GC_ME,",",GC_diff+GC_ME,"]")
print ("The change is statistically significant if the CI doesn't include 0. In that case, it is practically significant if",-GC_INFO["d_min"],"is not in the CI as well.")

The change due to the experiment is -1.39 %
Confidence Interval: [ -0.022699999999999998 , -0.005099999999999999 ]
The change is statistically significant if the CI doesn't include 0. In that case, it is practically significant if -0.01 is not in the CI as well.


In [20]:
#Net Conversion - number of payments divided by number of clicks
payments_cont=control["Payments"].sum()
payments_exp=experiment["Payments"].sum()

NC_cont=payments_cont/clicks_cont
NC_exp=payments_exp/clicks_exp
NC_pooled=(payments_cont+payments_exp)/(clicks_cont+clicks_exp)
NC_sd_pooled=mt.sqrt(NC_pooled*(1-NC_pooled)*(1/clicks_cont+1/clicks_exp))
NC_ME=round(get_z_score(1-alpha/2)*NC_sd_pooled,4)
NC_diff=round(NC_exp-NC_cont,4)
print("The change due to the experiment is",NC_diff*100,"%")
print("Confidence Interval: [",NC_diff-NC_ME,",",NC_diff+NC_ME,"]")
print ("The change is statistically significant if the CI doesn't include 0. In that case, it is practically significant if",-NC_INFO["d_min"],"is not in the CI as well.")

The change due to the experiment is -1.53 %
Confidence Interval: [ -0.022199999999999998 , -0.0084 ]
The change is statistically significant if the CI doesn't include 0. In that case, it is practically significant if -0.0075 is not in the CI as well.


According to this result, net conversion changed due to the experiment, and the change was both statistically and practically significant: the confidence interval excludes both 0 and -0.0075.


* **Gross Conversion**
A metric is statistically significant if the confidence interval does not include 0 (that is, you can be confident there was a change), and it is practically significant if the confidence interval does not include the practical significance boundary (that is, you can be confident there is a change that matters to the business.)

We have a **negative** change of **1.39%**, against a practical significance boundary of 1%. The Gross Conversion rate of the experiment group (the one shown the quick view button) decreased, and since the confidence interval [-0.0227, -0.0051] excludes 0, the change is statistically significant. The interval does include -0.01, however, so we cannot be confident that the decrease exceeds the practical significance boundary. In short, fewer people enrolled after clicking the quick view button.
* **Net Conversion** 
The hypothesis is the same as before, with net conversion instead of gross. At this point we expect the fraction of payers (out of the clicks) to decrease as well.
In this case we observed a decrease of 1.53%. The confidence interval [-0.0222, -0.0084] excludes both 0 and the practical significance boundary of -0.0075, so the decrease is statistically and practically significant.
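
The decision rule described above can be sketched as a small function. It reads "practically significant" as the whole confidence interval lying beyond the practical significance boundary on one side of zero; that reading and the function name are assumptions made for this illustration. The intervals are the ones printed in the cells above.

```python
def classify_change(ci_lower, ci_upper, d_min):
    """Return (statistically_significant, practically_significant) for a CI.

    Statistically significant: the interval excludes 0.
    Practically significant: the whole interval lies beyond +d_min or -d_min.
    """
    statistically = ci_lower > 0 or ci_upper < 0
    practically = ci_lower > d_min or ci_upper < -d_min
    return statistically, practically

# Intervals as printed in the cells above
gc_result = classify_change(-0.0227, -0.0051, 0.01)
nc_result = classify_change(-0.0222, -0.0084, 0.0075)
print("Gross Conversion:", gc_result)
print("Net Conversion:", nc_result)
```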


### Summary and Recommendation

An experiment was conducted in which visitors to the Wayfair shopping site were diverted by cookie into two groups, experiment and control. The experiment group was shown more SKU information after clicking a "**quick view button**" on the shop list page, whereas the control group was not. Three invariant metrics (Number of Cookies, Number of clicks on "quick view button", and Click-Through-Probability) were chosen for purposes of validation and sanity checking, while Gross Conversion (enrollments/clicks) and Net Conversion (payments/clicks) served as evaluation metrics.
The null hypothesis is that there is no difference in the evaluation metrics between the groups; furthermore, a practical significance threshold was set for each metric. The requirement for launching the change is that the null hypothesis must be rejected for ALL evaluation metrics and that the difference between groups must meet or exceed the practical significance threshold. Because ALL metrics must be significant to launch, the risk of type II errors (false negatives) increases with the number of metrics, so a multiple-comparison correction that controls for false positives is not consistent with our acceptance criteria and is not applied.
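
To illustrate why the false-negative risk grows with the number of metrics: if each metric is tested with power $1-\beta$ and, as a simplifying assumption, the tests are independent, the chance of detecting a real effect on all $k$ metrics is $(1-\beta)^k$. The sketch below is only that arithmetic; the independence assumption and the function name are made up for this illustration and are not part of the experiment.

```python
def power_all_metrics(beta, k):
    """Probability of detecting a true effect on all k metrics, assuming independent tests."""
    return (1 - beta) ** k

# beta = 0.2 as in the design section
for k in (1, 2, 3):
    print(k, round(power_all_metrics(0.2, k), 3))
```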

**Analysis at the 95% confidence level showed the expected equal split between the control and experiment groups for the click count and the click-through-probability, while the control share of pageviews (0.4981) fell just below its interval [0.4989, 0.5011]. The decrease in gross conversion was statistically significant, so the null hypothesis was rejected, but its confidence interval includes the practical significance boundary of -0.01. The decrease in net conversion was both statistically and practically significant.**


This experiment was designed to determine whether showing more information on the shop list page would improve the overall user experience and promote completed purchases. A statistically significant decrease in Gross Conversion was observed, together with a statistically and practically significant decrease in Net Conversion. This translates to a decrease in enrollment and payment. **Considering this, my recommendation is not to launch, but rather to pursue other experiments.**