### 1. Settings and imports

In [284]:
# Imports
import pandas as pd
import numpy as np
#%matplotlib inline
import seaborn as sns
# A single style call is kept: the later "ticks" setting overrode the earlier 'darkgrid' one
sns.set(style="ticks")


In [285]:
baseline = pd.read_csv("data/baseline.csv", index_col=False, header=None, names=['metric', 'baseline_val'])
baseline

Unnamed: 0,metric,baseline_val
0,Unique cookies to view page per day:,40000.0
1,"Unique cookies to click ""Start free trial"" per...",3200.0
2,Enrollments per day:,660.0
3,"Click-through-probability on ""Start free trial"":",0.08
4,"Probability of enrolling, given click:",0.20625
5,"Probability of payment, given enroll:",0.53
6,"Probability of payment, given click",0.109313


## 2. Metric Choice

### Answer student metric choice
To answer this question, let us first recall the definitions:
- An invariant metric is a metric that should not be affected by the change tested in the experiment.
- An evaluation metric is a metric used to measure the impact of the change made in the experiment.

### Invariant metrics:
- Number of cookies: That is, number of unique cookies to view the course overview page.
As this is the unit of diversion, we can expect an even distribution across the control and experiment groups. Hence, it is an appropriate invariant metric.

- Number of clicks: That is, number of unique cookies to click the "Start free trial" button.
As this happens before the free trial screener is triggered, we can again expect an even distribution across the control and experiment groups. In other words: at this stage of the funnel the experience is the same for all users.

- Click-through-probability: That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page.
Same reasoning as for cookies and clicks: we expect this ratio to be the same in the control and experiment groups, because at this stage of the funnel the experience is the same for all users.

### Evaluation metrics
- Gross conversion: That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button.
This metric directly measures the impact of the experiment, that is, whether users who click go on to check out and enroll. Hence, it is an evaluation metric.

- Net conversion: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button.
This metric also measures the impact of the experiment, that is, whether users who click stay enrolled past 14 days and therefore pay (success) or not (failure). Hence, it is an evaluation metric.

### Comment on remaining metrics 
Within the scope of the Final Project quiz, two metrics appear to be unused, so we disregard them for now:
- Number of user-ids: That is, number of users who enroll in the free trial.
- Retention: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout. This could probably also be used as an evaluation metric.


## 3. Measuring Standard Deviation

**Comment User**

We are calculating the analytical standard deviation for our two evaluation metrics: **Gross conversion and Net conversion**

We are using the following formula:

$SD_{GrossConv} = \sqrt{\dfrac{p_{enroll}\,(1 - p_{enroll})}{N_{clicks}}}$

$SD_{NetConv} = \sqrt{\dfrac{p_{pay|click}\,(1 - p_{pay|click})}{N_{clicks}}}$

where $p_{enroll}$ is the probability of enrolling given click, $p_{pay|click}$ is the probability of payment given click, and

$N_{clicks} = \text{Page views} \times \dfrac{\text{Unique cookies to click "Start free trial" per day}}{\text{Unique cookies to view course overview page per day}}$


Set up variables

In [286]:
unique_cookies_view = 40000
unique_cookies_click = 3200
enroll_pd=660
pageviews = 5000
prob_enroll=0.206250
prob_pay = 0.530000
prob_pay_click = 0.109313
ctr_start_free=0.08


In [287]:
GrossConvSTDV = np.sqrt((prob_enroll*(1-prob_enroll))/(pageviews*unique_cookies_click/unique_cookies_view))
GrossConvSTDV_rounded=round(GrossConvSTDV,4)
print("GrossConvSTDV_rounded: " + str(GrossConvSTDV_rounded))

GrossConvSTDV_rounded: 0.0202


In [288]:
NetConvSTDV = np.sqrt((prob_pay_click*(1-prob_pay_click))/(pageviews*unique_cookies_click/unique_cookies_view))
NetConvSTDV_rounded=round(NetConvSTDV,4)
print("NetConvSTDV_rounded: " + str(NetConvSTDV_rounded))

NetConvSTDV_rounded: 0.0156


**Comment User**

Additionally, we calculate the analytical standard deviation of retention. Here the denominator is the expected number of enrollments rather than clicks.

In [289]:
RetentionSTDV = np.sqrt((prob_pay*(1-prob_pay))/(pageviews*enroll_pd/unique_cookies_view))
RetentionSTDV_rounded=round(RetentionSTDV,4)
print("RetentionSTDV_rounded: " + str(RetentionSTDV_rounded))

RetentionSTDV_rounded: 0.0549


**Further User comments**

We also compute the empirical variability of the gross and net conversion as a cross-check. To do so, we need to import the control and experiment data.

After this step, we calculate the gross and net conversion per day and determine the standard deviation of these daily values over the 23 data points. That is: we calculate the mean of the daily conversion rates, take the difference between each daily rate and that mean, square the differences and sum them up over all days. We then divide by the number of samples minus one (pandas' `.std()` uses `ddof=1` by default, so 22 here) and take the square root.
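The calculation described above can be sketched with the standard library alone. This is only an illustration: it uses the first five daily gross-conversion values of the control group that appear in this notebook's tables (not all 23 days), and the helper name `sample_std` is made up for this example.

```python
import math
import statistics

# First five daily gross-conversion values of the control group,
# copied from this notebook's tables (a subset, for illustration only).
daily_gross = [0.195051, 0.188703, 0.183718, 0.186603, 0.194743]

def sample_std(values):
    """Sample standard deviation: divide by n - 1, as pandas' .std() does by default."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))

manual_std = sample_std(daily_gross)
library_std = statistics.stdev(daily_gross)  # also divides by n - 1
print(round(manual_std, 4), round(library_std, 4))
```

Both values agree, which confirms that the step-by-step description and the library call compute the same quantity.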


In [290]:
control  = pd.read_csv("data/control.csv")
control["Grossperday"]= control["Enrollments"]/control["Clicks"]
control["Netperday"]= control["Payments"]/control["Clicks"]

GrossConvSTDV_rounded_empirical=round(control["Grossperday"].std(),4)
NetConvSTDV_rounded_empirical=round(control["Netperday"].std(),4)
print("Control group: GrossConvSTDV_rounded_empirical: " + str(GrossConvSTDV_rounded_empirical))
print("Control group: NetConvSTDV_rounded_empirical: " + str(NetConvSTDV_rounded_empirical))

Control group: GrossConvSTDV_rounded_empirical: 0.044
Control group: NetConvSTDV_rounded_empirical: 0.0294


In [291]:
experiment  = pd.read_csv("data/experiment.csv")
experiment["Grossperday"]= experiment["Enrollments"]/experiment["Clicks"]
experiment["Netperday"]= experiment["Payments"]/experiment["Clicks"]

GrossConvSTDV_rounded_empirical_exp=round(experiment["Grossperday"].std(),4)
NetConvSTDV_rounded_empirical_exp=round(experiment["Netperday"].std(),4)
print("Experiment group: GrossConvSTDV_rounded_empirical_exp: " + str(GrossConvSTDV_rounded_empirical_exp))
print("Experiment group: NetConvSTDV_rounded_empirical_exp: " + str(NetConvSTDV_rounded_empirical_exp))

Experiment group: GrossConvSTDV_rounded_empirical_exp: 0.0475
Experiment group: NetConvSTDV_rounded_empirical_exp: 0.0322


## 4. Sizing

**4.1 Answer student: Bonferroni correction**


The Bonferroni correction is usually used to adjust p-values in order to control the family-wise error rate (FWER) in multiple hypothesis testing. In other words, its purpose is to reduce the probability of a type I error (false positive) when making a larger number of hypothesis tests.

See: https://www.investopedia.com/terms/b/bonferroni-test.asp

In this case we only compare a control and an experiment group, with 23 observations. Hence, the number of tests is fairly small and I will not use the Bonferroni correction.
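For reference, here is a minimal sketch of what the correction would do if it were applied. The function names are hypothetical, and the family-wise error rate formula assumes independent tests, which is a simplifying assumption rather than something established in this notebook.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests

def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Two evaluation metrics (gross and net conversion) tested at alpha = 0.05
adjusted_alpha = bonferroni_alpha(0.05, 2)
uncorrected_fwer = family_wise_error_rate(0.05, 2)
print(adjusted_alpha, round(uncorrected_fwer, 4))
```

With only two tests, the uncorrected family-wise error rate stays below 10%, which is consistent with treating the correction as optional here.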

**4.2 Calculation of the sizing algorithm:**
1. First, we need the **analytical standard deviations** for gross and net conversion (given by part 3)
2. **Minimum Detectable Effect**: given as $d_{min} = 0.01$ for gross conversion and $d_{min} = 0.0075$ for net conversion
3. We also need the **baseline conversion rates** for both metrics, given by "Probability of enrolling, given click" and "Probability of payment, given click"
4. **Sample Size**: the inputs from 2 and 3 allow us to calculate the sample size with a calculator like https://www.evanmiller.org/ab-testing/sample-size.html
5. **Total sample size:** Sample size * number of groups (2: control and experiment)
6. **Number of page views:** Total sample size / Click-through-probability on "Start free trial", for each metric
7. **Maximum number of page views**: take the higher of the two numbers from 6.
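Step 4 can also be approximated in code instead of using the online calculator. The sketch below uses a common closed-form approximation for comparing two proportions; it is not taken from the calculator's documentation, and the significance level (0.05) and power (0.8) are assumptions, since the notebook does not state the values entered into the calculator. Under those assumptions the results should land close to the calculator values used in this notebook.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline, d_min, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided two-proportion test.

    alpha and power are assumed defaults, not values stated in the notebook.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Standard deviation terms under the null and the alternative hypothesis
    sd_null = math.sqrt(2 * baseline * (1 - baseline))
    shifted = baseline + d_min
    sd_alt = math.sqrt(baseline * (1 - baseline) + shifted * (1 - shifted))
    return (z_alpha * sd_null + z_beta * sd_alt) ** 2 / d_min ** 2

n_gross = sample_size_per_group(0.20625, 0.01)    # gross conversion
n_net = sample_size_per_group(0.109313, 0.0075)   # net conversion
print(round(n_gross), round(n_net))
```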




**Calculation**

1. **Analytical standard deviations**

In [292]:
print("GrossConvSTDV_rounded: " + str(GrossConvSTDV_rounded))
print("NetConvSTDV_rounded: " + str(NetConvSTDV_rounded))


GrossConvSTDV_rounded: 0.0202
NetConvSTDV_rounded: 0.0156


2. **Minimum Detectable Effect** 

In [293]:
min_det_eff_gross = 0.01
min_det_eff_net = 0.0075
print("min_det_eff_gross: " + str(min_det_eff_gross))
print("min_det_eff_net: " + str(min_det_eff_net))


min_det_eff_gross: 0.01
min_det_eff_net: 0.0075


3. **Baseline conversion rates**

In [294]:
baseline_gross = prob_enroll
baseline_net = prob_pay_click 
print("baseline_gross: " + str(baseline_gross))
print("baseline_net: " + str(baseline_net))


baseline_gross: 0.20625
baseline_net: 0.109313


4. **Sample Size**

Remark: we use the minimum detectable effect and the baseline conversion rates to calculate the sample size with this sample size calculator:
https://www.evanmiller.org/ab-testing/sample-size.html

In [295]:
sample_size_gross = 25835
sample_size_net = 27413

print("sample_size_gross: " + str(sample_size_gross))
print("sample_size_net: " + str(sample_size_net))



sample_size_gross: 25835
sample_size_net: 27413


**5. Total sample size:**

Sample size * number of groups (2: control and experiment)

In [296]:
total_sample_size_gross = sample_size_gross * 2
total_sample_size_net = sample_size_net * 2

print("total_sample_size_gross: " + str(total_sample_size_gross))
print("total_sample_size_net: " + str(total_sample_size_net))


total_sample_size_gross: 51670
total_sample_size_net: 54826


6. **Number of page views:**

Total sample size / Click-through-probability on "Start free trial", for each metric

In [297]:
number_pageviews_gross = total_sample_size_gross / ctr_start_free
number_pageviews_net = total_sample_size_net / ctr_start_free

print("number_pageviews_gross: " + str(number_pageviews_gross))
print("number_pageviews_net: " + str(number_pageviews_net))


number_pageviews_gross: 645875.0
number_pageviews_net: 685325.0


7. **Maximum number of page view**: 
    
Max from 6

In [298]:
max_number_pageviews =max(number_pageviews_gross,number_pageviews_net)
print("max_number_pageviews: " + str(max_number_pageviews))

max_number_pageviews: 685325.0


## 5. Duration vs. Exposure

**5.1 Comment Student**

Having calculated the maximum number of page views, we now check how many days the experiment needs when 100% of the traffic is used, and then reflect on the risk and feasibility of the result.

**5.2 Calculation of required traffic**

8. **Days required with 100% of the traffic:** Maximum number of page views (from 7) / Unique cookies to view course overview page per day



In [299]:
import math

traffic100 = max_number_pageviews / unique_cookies_view
traffic100_rounded = math.ceil(traffic100)
print("Using 100% of the traffic it would take this many days to run the experiment: " + str(traffic100_rounded))

Using 100% of the traffic it would take this many days to run the experiment: 18


**5.3 Reasoning on traffic selection**

Using 100% of the traffic would be quite risky for several reasons:

- Other experiments: Udacity might want to run other experiments at the same time
- Risk: using 100% of the traffic would mean diverting all visitors into the experiment without first having tested for bugs or problems on a smaller scale
- Technical problems: the new page might still have bugs or smaller issues, so running it on 100% of the traffic seems very risky

**Bottom line:**
It seems quite risky to use the full traffic. Hence, we will use **50% of the traffic**
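To make the trade-off between exposure and duration explicit, the following sketch computes the number of days for several traffic fractions. The function name `days_required` is hypothetical; the inputs are the maximum number of page views and the daily unique cookies from this notebook.

```python
import math

def days_required(pageviews_needed, daily_pageviews, traffic_fraction):
    """Days needed to collect the page views when only a fraction of traffic is diverted."""
    return math.ceil(pageviews_needed / (daily_pageviews * traffic_fraction))

max_pageviews = 685325    # maximum number of page views from the sizing step
daily_pageviews = 40000   # unique cookies to view the page per day

days_by_fraction = {
    fraction: days_required(max_pageviews, daily_pageviews, fraction)
    for fraction in (1.0, 0.75, 0.5)
}
print(days_by_fraction)
```

Halving the exposure roughly doubles the duration, so the choice of 50% trades a longer experiment for a lower risk.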


**5.4 Calculation of required traffic for 50%**


In [300]:
traffic50 = max_number_pageviews / (unique_cookies_view *0.5)
traffic50_rounded = math.ceil(traffic50)
print("Using 50% of the traffic it would take this many days to run the experiment: " + str(traffic50_rounded))

Using 50% of the traffic it would take this many days to run the experiment: 35


## 6. Sanity Checks

**6.1 Purpose**:

In this section we will perform some sanity checks. The general purpose of this is to ensure that our experiment has been set up correctly and  the data is reliable. 

In other words: we can check for issues or even biases this way and assure that data between control and experiment group are statistically similar and comparable before proceeding with the analysis.

We will carry out these checks for our invariant metrics, as defined:
- Number of cookies: That is, number of unique cookies to view the course overview page.
- Number of clicks: That is, number of unique cookies to click the "Start free trial" button.
- Click-through-probability: That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page.

**6.2 Calculation of sanity checks**
1. Probability p: $p = 0.5$
2. P hat: $\hat{p} = N_{Control} / N_{Total}$
3. Standard Error: $SE = \sqrt{p(1-p)/N_{Total}}$
4. Margin of error for the 95% confidence interval: $ME = 1.96 \cdot SE$ for $\alpha = 0.05$
5. Upper and lower bound of the CI: $CI_{upper} = p + ME$, $CI_{lower} = p - ME$
6. Check whether the sanity check passed: the observed value $\hat{p}$ must lie between $CI_{lower}$ and $CI_{upper}$
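The six steps can be bundled into one small function. This is a sketch using only the standard library; the function name `sanity_check_split` is made up for illustration, and the example call uses the page view totals reported in this notebook.

```python
import math

def sanity_check_split(n_control, n_experiment, p=0.5, z=1.96):
    """Check whether the observed control share is compatible with an expected split p."""
    n_total = n_control + n_experiment
    p_hat = n_control / n_total              # observed share of the control group
    se = math.sqrt(p * (1 - p) / n_total)    # standard error under the expected split
    me = z * se                              # margin of error (95% for z = 1.96)
    lower, upper = p - me, p + me
    passed = lower <= p_hat <= upper
    return lower, upper, p_hat, passed

# Page view totals of the control and experiment group
lower, upper, p_hat, passed = sanity_check_split(345543, 344660)
print(round(lower, 4), round(upper, 4), round(p_hat, 4), passed)
```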

**6.3 Sanity check for number of cookies**


**1. Probability p** 


In [301]:
p = 0.5
print("Probability p: " + str(p))


Probability p: 0.5


**2. P hat:**

In [302]:
n_control=control["Pageviews"].sum()
n_exp=experiment["Pageviews"].sum()
n_total = n_control + n_exp
p_hat = n_control  / n_total
p_hat_rounded=round(p_hat,4)


print("n_control: " + str(n_control))
print("n_exp: " + str(n_exp))
print("n_total: " + str(n_total))
print("p_hat_rounded (observed value): " + str(p_hat_rounded))


n_control: 345543
n_exp: 344660
n_total: 690203
p_hat_rounded (observed value): 0.5006


**3. Standard Error:**

In [303]:
SE = np.sqrt(p*(1-p)/n_total)
SE_rounded=round(SE,4)
print("SE_rounded: " + str(SE_rounded))

SE_rounded: 0.0006


**4. Margin Error for 95% confidence interval**

In [304]:
ME = (1.96 * SE)
ME_rounded=round(ME,4)
print("ME: " + str(ME_rounded))

ME: 0.0012


**5. Upper and Lower bound of CI**

In [305]:
CI_upper_cookies= p + ME
CI_upper_cookies_rounded=round(CI_upper_cookies,4)

CI_lower_cookies= p - ME
CI_lower_cookies_rounded=round(CI_lower_cookies,4)

print("CI_lower_cookies_rounded: " + str(CI_lower_cookies_rounded))
print("CI_upper_cookies_rounded: " + str(CI_upper_cookies_rounded))


CI_lower_cookies_rounded: 0.4988
CI_upper_cookies_rounded: 0.5012


**6. Check, if sanity check passed**

 **--> PASSED, the observed value 0.5006 lies within the CI [0.4988, 0.5012] ✔️**
 

**6.4 Sanity check for number of clicks**

**1. Probability p** 


In [306]:
p_clicks = 0.5
print("Probability p for Clicks: " + str(p_clicks))


Probability p for Clicks: 0.5


**2. P hat:**

In [331]:
n_control_clicks=control["Clicks"].sum()
n_exp_clicks=experiment["Clicks"].sum()
n_total_clicks = n_control_clicks + n_exp_clicks
p_hat_clicks = n_control_clicks  / n_total_clicks
p_hat_clicks_rounded=round(p_hat_clicks,4)


print("n_control_clicks: " + str(n_control_clicks))
print("n_exp_clicks: " + str(n_exp_clicks))
print("n_total_clicks: " + str(n_total_clicks))
print("p_hat_clicks_rounded (observed value): " + str(p_hat_clicks_rounded))


n_control_clicks: 28378
n_exp_clicks: 28325
n_total_clicks: 56703
p_hat_clicks_rounded (observed value): 0.5005


**3. Standard Error:**

In [308]:
SE_clicks = np.sqrt(p_clicks*(1-p_clicks)/n_total_clicks)
SE_clicks_rounded=round(SE_clicks,4)
print("SE_clicks_rounded: " + str(SE_clicks_rounded))

SE_clicks_rounded: 0.0021


**4. Margin Error for 95% confidence interval**

In [309]:
ME_clicks = (1.96 * SE_clicks)
# Round the clicks margin of error (previously the cookies value ME was rounded by mistake)
ME_clicks_rounded=round(ME_clicks,4)
print("ME_clicks_rounded: " + str(ME_clicks_rounded))

ME_clicks_rounded: 0.0041


**5. Upper and Lower bound of CI**

In [310]:
CI_upper_clicks= p_clicks + ME_clicks
CI_upper_clicks_rounded=round(CI_upper_clicks,4)

CI_lower_clicks= p_clicks - ME_clicks
CI_lower_clicks_rounded=round(CI_lower_clicks,4)

print("CI_upper_clicks_rounded: " + str(CI_upper_clicks_rounded))
print("CI_lower_clicks_rounded: " + str(CI_lower_clicks_rounded))


CI_upper_clicks_rounded: 0.5041
CI_lower_clicks_rounded: 0.4959


**6. Check, if sanity check passed**

 **--> PASSED, the observed value 0.5005 lies within the CI [0.4959, 0.5041] ✔️**
 

**6.5 Sanity check for click-through-probability (CTR)**

**1. P hat:**

In [360]:
p_ctr_exp_hat = experiment["Clicks"].sum() / experiment["Pageviews"].sum() 
p_ctr_exp_hat_rounded=round(p_ctr_exp_hat,4)

p_ctr_cont_hat = control["Clicks"].sum() / control["Pageviews"].sum() 
p_ctr_cont_hat_rounded=round(p_ctr_cont_hat,4)

print("Probability p hat for CTR, control: " + str(p_ctr_cont_hat_rounded))
print("Probability p hat for CTR, experiment: " + str(p_ctr_exp_hat_rounded))




Probability p hat for CTR, control: 0.0821
Probability p hat for CTR, experiment: 0.0822


**2. Standard Error:**

In [353]:
SE_ctr = np.sqrt(p_ctr_cont_hat*(1-p_ctr_cont_hat)/n_control)
SE_ctr_rounded=round(SE_ctr,4)
print("SE_ctr_rounded: " + str(SE_ctr_rounded))

SE_ctr_rounded: 0.0005


**3. Margin Error for 95% confidence interval**

In [357]:
ME_ctr = (1.96 * SE_ctr)
ME_ctr_rounded=round(ME_ctr,4)
print("ME_ctr_rounded: " + str(ME_ctr_rounded))


ME_ctr_rounded: 0.0009


**4. Upper and Lower bound of CI**

In [365]:
CI_upper_ctr= p_ctr_exp_hat + ME_ctr
CI_upper_ctr_rounded=round(CI_upper_ctr,4)

CI_lower_ctr= p_ctr_exp_hat - ME_ctr
CI_lower_ctr_rounded=round(CI_lower_ctr,4)

print("CI_upper_ctr_rounded: " + str(CI_upper_ctr_rounded))
print("CI_lower_ctr_rounded: " + str(CI_lower_ctr_rounded))

CI_upper_ctr_rounded: 0.0831
CI_lower_ctr_rounded: 0.0813


**5. Check, if sanity check passed**

 **--> PASSED, the control value 0.0821 lies within the CI [0.0813, 0.0831] ✔️**
 

**6.6 Sanity check Summary ✔️**

All three invariant metrics:
- Number of cookies
- Number of clicks
- Click-through-probability

have passed the sanity check, as all observed values were between the upper and lower bounds of the CI. Hence, we can proceed with the Effect Size Tests.

## 7. Effect Size Tests

**7.1 Purpose**

The effect size test tells us how meaningful the relationship between variables or the difference between groups is. In other words, it tells us whether a difference between the control and the experiment group is also practically significant, not just statistically significant.

Sources:
https://www.scribbr.com/statistics/effect-size/#:~:text=Effect%20size%20tells%20you%20how,size%20indicates%20limited%20practical%20applications.

**7.2 Calculation Effect Size Test Algorithm**
1. P hat (pooled probability):
    - $\hat{p}_{GrossConv} = (Enrollments_{Experiment} + Enrollments_{Control}) / (Clicks_{Experiment} + Clicks_{Control})$
    - $\hat{p}_{NetConv} = (Payments_{Experiment} + Payments_{Control}) / (Clicks_{Experiment} + Clicks_{Control})$
2. Gross conversion and net conversion difference:
    - $GrossConversion_{Difference} = GrossConversion_{Experiment} - GrossConversion_{Control} = Enrollments_{Experiment}/Clicks_{Experiment} - Enrollments_{Control}/Clicks_{Control}$
    - $NetConversion_{Difference} = NetConversion_{Experiment} - NetConversion_{Control} = Payments_{Experiment}/Clicks_{Experiment} - Payments_{Control}/Clicks_{Control}$

3. Standard Error: $SE = \sqrt{\hat{p}(1-\hat{p})(1/N_{CONT} + 1/N_{EXP})}$
4. Margin of error for the 95% confidence interval: $ME = z \cdot SE = 1.96 \cdot SE$ for $\alpha = 0.05$
5. Upper and lower bound of the CI:
    - Gross Conversion: $CI_{upper} = GrossConversion_{Difference} + ME$ and $CI_{lower} = GrossConversion_{Difference} - ME$
    - Net Conversion: $CI_{upper} = NetConversion_{Difference} + ME$ and $CI_{lower} = NetConversion_{Difference} - ME$
6. Check,
    - if statistically relevant
        - Gross Conv: the CI does not include zero
        - Net Conv: the CI does not include zero
    - if practically relevant: in addition to excluding zero, the CI includes neither $d_{min}$ nor $-d_{min}$ between its lower and upper bounds
    

Note: We are only counting the clicks for the not null values of enrollments to be consistent.
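The algorithm above can be sketched as a small set of helper functions. The function names are hypothetical, and the counts in the example call are toy numbers chosen for illustration only; they are NOT the experiment's data.

```python
import math

def effect_size_ci(x_cont, n_cont, x_exp, n_exp, z=1.96):
    """CI for the difference in proportions (experiment - control), pooled standard error."""
    p_pool = (x_cont + x_exp) / (n_cont + n_exp)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_cont + 1 / n_exp))
    diff = x_exp / n_exp - x_cont / n_cont
    return diff - z * se, diff + z * se

def is_statistically_significant(lower, upper):
    """True if the CI does not include zero."""
    return not (lower <= 0 <= upper)

def is_practically_significant(lower, upper, d_min):
    """True if the whole CI lies beyond the minimum detectable effect."""
    return upper < -d_min or lower > d_min

# Toy counts (hypothetical): 200 of 1000 clicks enroll in control, 160 of 1000 in experiment
lower, upper = effect_size_ci(x_cont=200, n_cont=1000, x_exp=160, n_exp=1000)
stat_sig = is_statistically_significant(lower, upper)
prac_sig = is_practically_significant(lower, upper, d_min=0.01)
print(round(lower, 4), round(upper, 4), stat_sig, prac_sig)
```

In this toy case the interval excludes zero but still contains $-d_{min}$, so the difference would be statistically but not practically significant.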

**7.3 Calculation Effect Size Test: GrossConversion and NetConversion**


**1. P hat**

Note: We are only counting the clicks for the not null values of enrollments to be consistent.

In [428]:
N_cont = control["Clicks"].loc[control["Enrollments"].notnull()].sum()
N_exp = experiment["Clicks"].loc[experiment["Enrollments"].notnull()].sum()

In [432]:
p_hat_gross = (experiment["Enrollments"].sum() + control["Enrollments"].sum())/(N_exp + N_cont)
p_hat_gross_rounded=round(p_hat_gross,4)
print("p_hat_gross_rounded: " + str(p_hat_gross_rounded))

p_hat_net = (experiment["Payments"].sum() + control["Payments"].sum())/(N_exp + N_cont)
p_hat_net_rounded=round(p_hat_net,4)
print("p_hat_net_rounded: " + str(p_hat_net_rounded))

p_hat_gross_rounded: 0.2086
p_hat_net_rounded: 0.1151


**2. GrossConversion and NetConversion Difference**

Note: We are only counting the clicks for the not null values of enrollments to be consistent.

In [440]:
GrossConvDiff = experiment["Enrollments"].loc[experiment["Enrollments"].notnull()].sum()/N_exp - control["Enrollments"].loc[control["Enrollments"].notnull()].sum()/N_cont
GrossConvDiff_rounded=round(GrossConvDiff,4)
print("GrossConvDiff_rounded: " + str(GrossConvDiff_rounded))

NetConvDiff = experiment["Payments"].loc[experiment["Payments"].notnull()].sum()/N_exp - control["Payments"].loc[control["Payments"].notnull()].sum()/N_cont
NetConvDiff_rounded=round(NetConvDiff,4)
print("NetConvDiff_rounded: " + str(NetConvDiff_rounded))


GrossConvDiff_rounded: -0.0206
NetConvDiff_rounded: -0.0049


**3. Standard Error**

In [444]:
SE_gross = np.sqrt(p_hat_gross*(1-p_hat_gross)*(1/N_cont + 1/N_exp))
SE_gross_rounded=round(SE_gross,4)
print("SE_gross_rounded: " + str(SE_gross_rounded))

SE_net = np.sqrt(p_hat_net*(1-p_hat_net)*(1/N_cont + 1/N_exp))
SE_net_rounded=round(SE_net,4)
print("SE_net_rounded: " + str(SE_net_rounded))

SE_gross_rounded: 0.0044
SE_net_rounded: 0.0034


**4. Margin Error**

In [450]:
ME_gross = (1.96 * SE_gross)
ME_gross_rounded=round(ME_gross,4)
print("ME_gross_rounded: " + str(ME_gross_rounded))

ME_net = (1.96 * SE_net)
ME_net_rounded=round(ME_net,4)
print("ME_net_rounded: " + str(ME_net_rounded))


ME_gross_rounded: 0.0086
ME_net_rounded: 0.0067


**5. Upper and Lower bound of CI**

In [456]:
CI_upper_gross= GrossConvDiff + ME_gross
CI_upper_gross_rounded=round(CI_upper_gross,4)

CI_lower_gross= GrossConvDiff - ME_gross
CI_lower_gross_rounded=round(CI_lower_gross,4)

print("CI_lower_gross_rounded: " + str(CI_lower_gross_rounded))
print("CI_upper_gross_rounded: " + str(CI_upper_gross_rounded))

CI_upper_net= NetConvDiff + ME_net
CI_upper_net_rounded=round(CI_upper_net,4)

CI_lower_net= NetConvDiff - ME_net
CI_lower_net_rounded=round(CI_lower_net,4)

print("CI_lower_net_rounded: " + str(CI_lower_net_rounded))
print("CI_upper_net_rounded: " + str(CI_upper_net_rounded))

CI_lower_gross_rounded: -0.0291
CI_upper_gross_rounded: -0.012
CI_lower_net_rounded: -0.0116
CI_upper_net_rounded: 0.0019


**6. Statistical and practical relevance**

1. Gross Conversion
    - Statistical:   **--> PASSED, CI does not include zero ✔️**
    - Practical: $-d_{min} = -0.01$ is not included between the lower and upper CI bounds; the whole CI lies below it
    
 **--> OK, practically and statistically significant ✔️**

2. Net Conversion
    - Statistical:  **NOT PASSED as zero is included in CI ❌**
    - Practical: $-d_{min} = -0.0075$ is included between the lower and upper CI bounds
    
 **--> KO, NOT practically and statistically significant ❌**


**7. Effect Size Test Summary**

We have seen that the Gross Conversion metric **is both practically and statistically significant**. However, the Net Conversion metric is **neither practically nor statistically significant**.

This is interesting and somewhat surprising, as one would expect gross and net conversion to be correlated.

## 8. Sign test

**8.1 Purpose**

"The Sign test is a non-parametric test that is used to test whether or not two groups are equally sized." In other words, we want to check whether the day-by-day differences between the two groups point in one direction more often than chance alone would explain.

Sources:
https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/sign-test/

**8.2 Calculation of Sign Test algorithm**

1. Create a clean version of the control and experiment data frames without null values
2. Join the control and experiment data frames so that we get the following rates side by side (some renaming is needed):
    - Gross conversion rate (Enrollments/Clicks)
    - Net conversion rate (Payments/Clicks)
3. On the resulting data frame, calculate the daily difference in gross and net conversion rate between the control and experiment group:
    - GrossConversionDiff = Gross Conversion Rate Control - Gross Conversion Rate Experiment
    - NetConversionDiff = Net Conversion Rate Control - Net Conversion Rate Experiment 
4. Get the number of trials and successes to run a sign and binomial test:
    - Trials = number of samples in new data frame
    - Successes:
        - Number of daily GrossConversionDiff < 0
        - Number of daily NetConversionDiff < 0
        
5. Use a binomial test to calculate the p-value, either with https://www.graphpad.com/quickcalcs/binomial1.cfm or with the binomial test from `scipy.stats`
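The p-value of step 5 can also be computed without any external tool. The sketch below implements an exact two-sided binomial test for the special case p = 0.5 using only the standard library; the function name `sign_test_p_value` is hypothetical, and the example calls use the success counts reported in this notebook (4 and 10 out of 23 days).

```python
from math import comb

def sign_test_p_value(successes, trials):
    """Exact two-sided binomial p-value for success probability 0.5.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one (for p = 0.5 this covers both tails symmetrically).
    """
    observed = comb(trials, successes)
    tail = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if comb(trials, k) <= observed
    )
    return tail / 2 ** trials

p_gross = sign_test_p_value(4, 23)    # days with GrossConversionDiff < 0
p_net = sign_test_p_value(10, 23)     # days with NetConversionDiff < 0
print(round(p_gross, 4), round(p_net, 4))
```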

**1. Clean**

In [521]:
experiment_clean = experiment[experiment["Enrollments"].notnull()]
control_clean = control[control["Enrollments"].notnull()]
experiment_clean.head(5)

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments,Grossperday,Netperday
0,"Sat, Oct 11",7716,686,105.0,34.0,0.153061,0.049563
1,"Sun, Oct 12",9288,785,116.0,91.0,0.147771,0.115924
2,"Mon, Oct 13",10480,884,145.0,79.0,0.164027,0.089367
3,"Tue, Oct 14",9867,827,138.0,92.0,0.166868,0.111245
4,"Wed, Oct 15",9793,832,140.0,94.0,0.168269,0.112981


**2a. Join dataframes**

In [487]:
df_merged = pd.merge(control_clean, experiment_clean,  how='inner', on=['Date'])
df_merged.head(5)

Unnamed: 0,Date,Pageviews_x,Clicks_x,Enrollments_x,Payments_x,Grossperday_x,Netperday_x,Pageviews_y,Clicks_y,Enrollments_y,Payments_y,Grossperday_y,Netperday_y
0,"Sat, Oct 11",7723,687,134.0,70.0,0.195051,0.101892,7716,686,105.0,34.0,0.153061,0.049563
1,"Sun, Oct 12",9102,779,147.0,70.0,0.188703,0.089859,9288,785,116.0,91.0,0.147771,0.115924
2,"Mon, Oct 13",10511,909,167.0,95.0,0.183718,0.10451,10480,884,145.0,79.0,0.164027,0.089367
3,"Tue, Oct 14",9871,836,156.0,105.0,0.186603,0.125598,9867,827,138.0,92.0,0.166868,0.111245
4,"Wed, Oct 15",10014,837,163.0,64.0,0.194743,0.076464,9793,832,140.0,94.0,0.168269,0.112981


**2b. Rename**

In [499]:
df_merged_final = df_merged.rename(columns=
                                   {
                                    'Pageviews_y': 'Pageviews_exp',
                                    'Clicks_y': 'Clicks_exp', 
                                    'Enrollments_y': 'Enrollments_exp', 
                                    'Payments_y': 'Payments_exp', 
                                    'Grossperday_y': 'Grossperday_exp', 
                                    'Netperday_y': 'Netperday_exp', 

                                    'Pageviews_x': 'Pageviews_cont',
                                    'Clicks_x': 'Clicks_cont', 
                                    'Enrollments_x': 'Enrollments_cont', 
                                    'Payments_x': 'Payments_cont', 
                                    'Grossperday_x': 'Grossperday_cont', 
                                    'Netperday_x': 'Netperday_cont', 
                                   })
df_merged_final.head(5)

Unnamed: 0,Date,Pageviews_cont,Clicks_cont,Enrollments_cont,Payments_cont,Grossperday_cont,Netperday_cont,Pageviews_exp,Clicks_exp,Enrollments_exp,Payments_exp,Grossperday_exp,Netperday_exp
0,"Sat, Oct 11",7723,687,134.0,70.0,0.195051,0.101892,7716,686,105.0,34.0,0.153061,0.049563
1,"Sun, Oct 12",9102,779,147.0,70.0,0.188703,0.089859,9288,785,116.0,91.0,0.147771,0.115924
2,"Mon, Oct 13",10511,909,167.0,95.0,0.183718,0.10451,10480,884,145.0,79.0,0.164027,0.089367
3,"Tue, Oct 14",9871,836,156.0,105.0,0.186603,0.125598,9867,827,138.0,92.0,0.166868,0.111245
4,"Wed, Oct 15",10014,837,163.0,64.0,0.194743,0.076464,9793,832,140.0,94.0,0.168269,0.112981


**3. Difference for Gross/Net Conversion rate between control and experiment group**

In [503]:
df_merged_final["GrossConversionDiff"] = df_merged_final["Grossperday_cont"] -df_merged_final["Grossperday_exp"]
df_merged_final["NetConversionDiff"] = df_merged_final["Netperday_cont"] -df_merged_final["Netperday_exp"]

df_merged_final.head(5)

Unnamed: 0,Date,Pageviews_cont,Clicks_cont,Enrollments_cont,Payments_cont,Grossperday_cont,Netperday_cont,Pageviews_exp,Clicks_exp,Enrollments_exp,Payments_exp,Grossperday_exp,Netperday_exp,GrossConversionDiff,NetConversionDiff
0,"Sat, Oct 11",7723,687,134.0,70.0,0.195051,0.101892,7716,686,105.0,34.0,0.153061,0.049563,0.04199,0.05233
1,"Sun, Oct 12",9102,779,147.0,70.0,0.188703,0.089859,9288,785,116.0,91.0,0.147771,0.115924,0.040933,-0.026065
2,"Mon, Oct 13",10511,909,167.0,95.0,0.183718,0.10451,10480,884,145.0,79.0,0.164027,0.089367,0.019691,0.015144
3,"Tue, Oct 14",9871,836,156.0,105.0,0.186603,0.125598,9867,827,138.0,92.0,0.166868,0.111245,0.019735,0.014353
4,"Wed, Oct 15",10014,837,163.0,64.0,0.194743,0.076464,9793,832,140.0,94.0,0.168269,0.112981,0.026474,-0.036517


**4. Get the number of trials and successes**


In [520]:
trials = len(df_merged_final)
print("Number of trials: " + str(trials))

successes_gross = df_merged_final.query('GrossConversionDiff < 0')["GrossConversionDiff"].count()
print("Number of successes Gross Conversion: " + str(successes_gross))

successes_net = df_merged_final.query('NetConversionDiff < 0')["NetConversionDiff"].count()
print("Number of successes Net Conversion: " + str(successes_net))

Number of trials: 23
Number of successes Gross Conversion: 4
Number of successes Net Conversion: 10


**5. Binomial test to calculate the p-value**

**5.1 Using the SciPy binomial test**

Source: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom_test.html

In [535]:
# binom_test was deprecated in SciPy 1.10 and removed in 1.12; binomtest is its replacement
from scipy.stats import binomtest

p_value_gross = binomtest(int(successes_gross), trials, p=0.5, alternative='two-sided').pvalue
p_value_gross_rounded=round(p_value_gross,4)
print("p_value_gross_rounded: " + str(p_value_gross_rounded))

p_value_net = binomtest(int(successes_net), trials, p=0.5, alternative='two-sided').pvalue
p_value_net_rounded=round(p_value_net,4)
print("p_value_net_rounded: " + str(p_value_net_rounded))


p_value_gross_rounded: 0.0026
p_value_net_rounded: 0.6776


**5.2 Using Online Calculator**

Source:
https://www.graphpad.com/quickcalcs/binomial1

- For the Gross Conversion: 
    - "The two-tail P value is 0.0026
This is the chance of observing either 4 or fewer successes, or 19 or more successes, in 23 trials."
- For the Net Conversion: 
    - "The two-tail P value is 0.6776
This is the chance of observing either 10 or fewer successes, or 13 or more successes, in 23 trials."

**5.3 Interpretation**
- For the Gross Conversion: 
    - The two-tailed p-value of 0.0026 is smaller than the significance level $\alpha = 0.05$. This means that we have **strong evidence to reject the null hypothesis H0, and the effect on gross conversion is statistically significant**. In other words: the difference between the control and experiment group is unlikely to be due to random chance alone.
    
- For the Net Conversion: 
    - The two-tailed p-value of 0.6776 is larger than the significance level $\alpha = 0.05$. This means that we **do not have enough evidence to reject the null hypothesis H0, and the effect on net conversion is NOT statistically significant**. In other words: the observed difference between the control and experiment group is consistent with random chance.


## 9. External resources

- Bonferroni Correction: https://www.investopedia.com/terms/b/bonferroni-test.asp
- Sample Size calculator: https://www.evanmiller.org/ab-testing/sample-size.html
- Effect size test:  https://www.scribbr.com/statistics/effect-size/#:~:text=Effect%20size%20tells%20you%20how,size%20indicates%20limited%20practical%20applications.
- Sign test: https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/sign-test/
- Sign and binomial test:
    - https://www.graphpad.com/quickcalcs/binomial1.cfm
    - https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom_test.html
 

