# Udacity A/B Testing Project

## Experiment Overview: Free Trial Screener

Udacity is an online course platform specializing in the IT sector. They want to run an experiment on their website with the goal of improving their students' course completion rate.

### Context

Let's take a more in-depth look at how the Udacity environment is set up before the experiment:

* Udacity courses currently have two options on the course overview page: "Start Free Trial", and "Access Course Materials".
* If the student clicks the "Start Free Trial" button, they will be asked to enter their credit card information, and then they will be enrolled in a free trial for the paid version of the course.
* If the student clicks "Access Course Materials", they will be able to view the videos and take the quizzes for free, but they will not receive any bonuses, like coaching services or earning the certificate of the course.

### Description of the Experiment

* In the experiment, Udacity tested a change where, if the student clicked the "Start Free Trial" button, they were asked how much time they had available to devote to the course.
* If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. 
* If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. 
* At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead.

### Hypothesis Testing

The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time—without significantly reducing the number of students to continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.

### Unit of Diversion

The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

## Experiment Design
### Metric Choice

The invariant metrics are the ones that shouldn't differ between the experiment group and the control group. They serve as a sanity check after running the experiment, to test whether anything went wrong in the process.

The evaluation metrics are the target metrics that we expect to change between groups and that are relevant for the business goals. For each metric, a $D_{min}$ is defined, which marks the minimum change that is practically significant to the business. This figure is provided by Udacity for each of the metrics.

#### Invariant Metrics (Sanity Check)

For this case, the following metrics will be chosen as invariant metrics:

* Number of cookies per course page by day. This metric is chosen because the number of cookies in each group should remain the same during the experiment. As the cookie is our unit of diversion, traffic should be split evenly between both groups. It will be represented by $Cookies$ and it has a $D_{min} = 3,000$.

* Number of clicks on the 'Start Free Trial' button by day. As this click happens before the Free Trial Screener message appears, this metric shouldn't be affected by the experiment. It will be represented by $Clicks$ and it has a $D_{min} = 240$.

* Click-Through-Probability on the 'Start Free Trial' button by day. This metric is the ratio of the number of clicks on the 'Start Free Trial' button to the number of cookies on the page. If these two metrics don't change, the CTP shouldn't change either. It will be represented by $\frac{Clicks}{Cookies}$ and it has a $D_{min} = 0.01$.

#### Evaluation Metrics

The metrics that will be used as evaluation metrics are:

* Gross Conversion. This metric is the number of user IDs that complete the checkout and enroll in the free trial divided by the number of unique cookies that click the button. It will be represented by $\frac{User IDs enrolled}{Clicks}$ and it has a $D_{min} = 0.01$.

* Retention. This metric is the number of user IDs that remain enrolled and make at least one payment divided by the number of unique user IDs that complete the checkout. It will be represented by $\frac{User IDs paid}{User IDs enrolled}$ and it has a $D_{min} = 0.01$.

* Net Conversion. This metric is the number of user IDs that remain enrolled and make at least one payment divided by the number of unique cookies that click the button. It will be represented by $\frac{User IDs paid}{Clicks}$ and it has a $D_{min} = 0.0075$.

These three metrics are expected to change because they are measured after the Free Trial Screener message appears. They are also relevant for the business because they help to measure lower-funnel performance and retention.
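As a quick illustration of the three definitions, the sketch below computes them from raw counts. The helper name `funnel_metrics` is hypothetical, and the counts are those of the first control-group day in the results file used later in this analysis.

```python
def funnel_metrics(clicks, enrollments, payments):
    """Return the three evaluation metrics for one group (hypothetical helper)."""
    return {
        "gross_conversion": enrollments / clicks,  # enrolled user IDs / clicks
        "retention": payments / enrollments,       # paying user IDs / enrolled user IDs
        "net_conversion": payments / clicks,       # paying user IDs / clicks
    }

# Counts for "Sat, Oct 11" in the control group
metrics = funnel_metrics(clicks=687, enrollments=134, payments=70)
print(metrics)
```

Note that Net Conversion is simply Gross Conversion multiplied by Retention, which is why the three metrics are not independent of each other.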

### Measuring Standard Deviation

Udacity provides the following rough estimates for these metrics, probably measured with a daily aggregation. This is the baseline for each of the metrics:

* Unique cookies to view course overview page per day: 40,000
* Unique cookies to click "Start free trial" per day: 3,200
* Enrollments per day: 660
* Click-through-probability on "Start free trial": 0.08
* Probability of enrolling, given click (Gross Conversion): 0.20625
* Probability of payment, given enroll (Retention): 0.53
* Probability of payment, given click (Net Conversion): 0.1093125

Now, we need to calculate the standard deviation for each of the evaluation metrics. This step shows whether a metric is highly variable or more robust. On one hand, the more variable a metric is, the harder it is to get significant results. On the other hand, if the metric is too robust, it may be too insensitive to capture a real change.

Udacity assumes a sample size of 5,000 cookies visiting the course overview page per day. As the baseline figures are based on 40,000 cookies per day, we need to rescale the counts to a sample of 5,000 cookies.

In [159]:
# Create a dictionary with the baseline estimates
baseline = {"Total Cookies":40000, "Total Clicks":3200, "Total Enrollments":660, "CTP":0.08, "Gross Conversion":0.20625, "Retention":0.53, "Net Conversion":0.1093125}

# Creating a copy of baseline to add the adjustments
# (.copy() is needed: plain assignment would only alias the same dictionary)
sample_adjusted = baseline.copy()

# Defining the samples
n = 40000
n_adjusted = 5000

In [160]:
# Scale the estimates from a sample size of 40,000 to 5,000

sample_adjusted["Total Cookies"] = 5000
sample_adjusted["Total Clicks"] = (n_adjusted * baseline["Total Clicks"] / n)
sample_adjusted["Total Enrollments"] = (n_adjusted * baseline["Total Enrollments"] / n)
sample_adjusted

{'Total Cookies': 5000,
 'Total Clicks': 400.0,
 'Total Enrollments': 82.5,
 'CTP': 0.08,
 'Gross Conversion': 0.20625,
 'Retention': 0.53,
 'Net Conversion': 0.1093125}

As our evaluation metrics are probabilities, we can assume each of them follows a binomial distribution (which is well approximated by a normal distribution, as we have enough samples). Now, let's calculate the standard deviation with the following formula:

$$SD = \sqrt{\frac{p'(1-p')}{N}}$$

We can make this assumption for the Gross Conversion and the Net Conversion because the unit of diversion is the same as the unit of analysis (the denominator of the metric). In this case, the unit of diversion is the cookie, and so is the unit of analysis (the number of cookies that clicked). We can expect the analytical estimates to be accurate.

However, the Retention metric doesn't have the same unit of analysis and unit of diversion. In this case, the unit of diversion is the cookie but the unit of analysis is the user ID. For this reason, the analytical estimate might not match the empirical one, and it would be worth calculating it empirically too.
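One way to sanity-check an analytical estimate is to simulate it. The sketch below is an illustration under stated assumptions, not part of the original analysis: it simulates 2,000 days (an arbitrary choice) of 400 independent clicks at the baseline Gross Conversion and compares the spread of the simulated rates with the binomial formula. Because every simulated click is independent, it only shows that the formula holds when the unit of analysis matches the unit of diversion; estimating Retention empirically would require real day-level data.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

p = 0.20625      # baseline gross conversion
n_clicks = 400   # clicks expected from 5,000 pageviews
n_days = 2_000   # number of simulated days (arbitrary)

simulated_rates = []
for _ in range(n_days):
    # Each click independently converts with probability p
    enrollments = sum(random.random() < p for _ in range(n_clicks))
    simulated_rates.append(enrollments / n_clicks)

empirical_sd = statistics.stdev(simulated_rates)
analytical_sd = math.sqrt(p * (1 - p) / n_clicks)
print(f"empirical: {empirical_sd:.4f}, analytical: {analytical_sd:.4f}")
```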

In [161]:
# Importing libraries
import numpy as np
import pandas as pd
import scipy.stats as stats

In [162]:
# Creating a function to calculate the standard deviation of a proportion
def sd(p, n):
    return round(np.sqrt(p * (1 - p) / n), 4)

In [163]:
# Let's create three new dictionaries for each metric
gross_conversion = {}
retention = {}
net_conversion = {}

# Adding the Dmin data to each dictionary
gross_conversion["d_min"] = 0.01
retention["d_min"] = 0.01
net_conversion["d_min"] = 0.0075

# Calculating p and n for each metric
gross_conversion["p"] = sample_adjusted["Gross Conversion"]
gross_conversion["n"] = sample_adjusted["Total Clicks"]

retention["p"] = sample_adjusted["Retention"]
retention["n"] = sample_adjusted["Total Enrollments"]

net_conversion["p"] = sample_adjusted["Net Conversion"]
net_conversion["n"] = sample_adjusted["Total Clicks"]

# Using the function created to get the standard deviation
print(sd(gross_conversion["p"], gross_conversion['n']))
print(sd(retention["p"], retention['n']))
print(sd(net_conversion["p"], net_conversion['n']))

0.0202
0.0549
0.0156


### Sizing
#### Number of Samples vs. Power

First of all, we are going to use the formula Evan Miller used on his online calculator to calculate the sample size:

$$n = \frac{(Z_{1-\frac{\alpha}{2}}·sd_{1}+Z_{1-\beta}·sd_{2})^2}{d^2}$$

$$sd_{1} = \sqrt{2p(1-p)}$$
$$sd_{2} = \sqrt{p(1-p)+(p+d)(1-p-d)}$$

Where:

* $p$ is the baseline conversion rate.
* $d$ is the minimum detectable change ($D_{min}$).
* $\alpha$ is the significance level.
* $\beta$ is the probability of a type II error, so $1-\beta$ is the statistical power.
* $Z_{1-\frac{\alpha}{2}}$ is the z-score from the z table that corresponds to a cumulative probability of $1-\frac{\alpha}{2}$.
* $Z_{1-\beta}$ is the z-score from the z table that corresponds to a cumulative probability of $1-\beta$.

During the whole experiment, $\alpha = 0.05$ and $\beta = 0.20$ (that is, a power of $1-\beta = 0.80$). The corresponding z-scores are $Z_{1-\frac{0.05}{2}} = 1.959963985$ and $Z_{1-0.20} = 0.841621234$.
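As a worked check of the formula (with the numerator squared, as in the function defined in the next cell), the Gross Conversion case with $p = 0.20625$ and $d = 0.01$ gives approximately:

$$sd_{1} = \sqrt{2 \cdot 0.20625 \cdot 0.79375} \approx 0.5722$$
$$sd_{2} = \sqrt{0.20625 \cdot 0.79375 + 0.21625 \cdot 0.78375} \approx 0.5772$$
$$n \approx \frac{(1.96 \cdot 0.5722 + 0.8416 \cdot 0.5772)^2}{0.01^2} \approx 25{,}835$$

The intermediate values are rounded for display; carrying the unrounded values through gives 25,835 per group.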

In [164]:
# Create a function to calculate the sample size
def sample_size(p, delta):
    if p > 0.5:
        p = 1.0 - p
    
    z_a = 1.959963985
    z_b = 0.841621234

    sd1 = np.sqrt(2 * p * (1.0 - p))
    sd2 = np.sqrt(p * (1.0 - p) + (p + delta) * (1.0 - p - delta))

    return round((z_a * sd1 + z_b * sd2) * (z_a * sd1 + z_b * sd2) / (delta * delta), 0)
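The two z-scores hard-coded in the function do not have to come from a table. As a small sketch using only Python's standard library (`statistics.NormalDist`), they can be derived from $\alpha$ and $\beta$ directly; the variable names here are illustrative.

```python
from statistics import NormalDist

alpha = 0.05  # significance level (two-sided)
beta = 0.20   # type II error rate, so power = 1 - beta = 0.80

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
z_beta = NormalDist().inv_cdf(1 - beta)        # quantile matching the desired power

print(round(z_alpha, 6), round(z_beta, 6))
```

Deriving the values this way makes it easy to re-run the sizing with a different significance level or power.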

##### Gross Conversion

For Gross Conversion, we will need at least 25,835 cookies that click the 'Start Free Trial' button per group.

In [165]:
gross_conversion['sample_size'] = sample_size(gross_conversion['p'], gross_conversion['d_min'])
gross_conversion['sample_size']

25835.0

Now, we need to estimate the number of pageviews needed to reach those 25,835 clicks per group. To do so, we use the ratio between clicks and pageviews: $400/5000 = 0.08$. We divide the sample size by this ratio and multiply the result by two, as there are two groups in the experiment:

In [166]:
gross_conversion['sample_size'] = (gross_conversion['sample_size']/0.08)*2
gross_conversion['sample_size']

645875.0

We would need 645,875 pageviews in total, counting both groups.

##### Retention

Regarding retention, we will need at least 39,115 enrolled users per group.

In [167]:
retention['sample_size'] = sample_size(retention['p'], retention['d_min'])
retention['sample_size']

39115.0

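Now, we divide this result by the Gross Conversion (0.20625) to know how many cookies need to click the 'Start Free Trial' button, then by the CTP (0.08) to know how many cookies need to view a course overview page, and finally multiply by two for the two groups: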

In [168]:
retention['sample_size'] = ((retention['sample_size']/0.08)/gross_conversion['p'])*2
retention['sample_size']

4741212.121212121

This means we will need 4.74 million pageviews. However, this number is very high, as Udacity attracts only 40,000 cookies per day: the experiment would need to run for about 119 days even with 100% of the traffic. For this reason, we drop this metric from the experiment.

##### Net Conversion

For net conversion, we will need at least 27,413 cookies that click the button per group.

In [169]:
net_conversion['sample_size'] = sample_size(net_conversion['p'], net_conversion['d_min'])
net_conversion['sample_size']

27413.0

Now, we need to calculate how many pageviews we will need by dividing by 0.08 and multiplying by two for the two groups. This way, we will need 685,325 pageviews.

In [170]:
net_conversion['sample_size'] = ((net_conversion['sample_size']/0.08))*2
net_conversion['sample_size']

685325.0

As this number of pageviews is bigger than the one needed for the gross conversion, this is going to be our sample size.
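The click-to-pageview conversions used for the three metrics follow the same pattern, which can be sketched as one helper. `pageviews_needed` is a hypothetical name, and the per-group sample sizes passed in are the ones computed in the cells above.

```python
def pageviews_needed(n_per_group, ctp, enroll_rate=None, groups=2):
    """Convert a per-group sample size into total pageviews (hypothetical helper).

    n_per_group: required units per group (clicks, or enrollments if
                 enroll_rate is given)
    ctp:         click-through-probability (clicks per pageview)
    enroll_rate: probability of enrolling given a click; only needed when
                 the unit of analysis is an enrollment (Retention)
    """
    clicks = n_per_group if enroll_rate is None else n_per_group / enroll_rate
    return clicks / ctp * groups

gross_pv = pageviews_needed(25835, ctp=0.08)
retention_pv = pageviews_needed(39115, ctp=0.08, enroll_rate=0.20625)
net_pv = pageviews_needed(27413, ctp=0.08)
print(gross_pv, retention_pv, net_pv)
```

The experiment has to satisfy every metric that is kept, so the overall requirement is the largest of the retained values.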

#### Duration vs. Exposure

As we said, we will need 685,325 cookies viewing the course overview page. If we divert 100% of the traffic, the experiment will take 18 days. However, we prefer the experiment to run for four full weeks, so that we can see whether there are important variations between weekdays and weekends. With this decision, the experiment will last 28 days, collecting 24,476 cookies per day. This is around 61% of daily traffic.
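The duration and exposure arithmetic can be reproduced with a few lines. This is a minimal sketch that assumes traffic stays at the baseline of 40,000 cookies per day; the variable names are illustrative.

```python
import math

pageviews_required = 685_325  # total pageviews needed (both groups)
daily_pageviews = 40_000      # baseline unique cookies per day

# Shortest possible run: divert 100% of the traffic
min_days = math.ceil(pageviews_required / daily_pageviews)

# Fixed 28-day run: how much traffic must be diverted each day?
days = 28
per_day = math.ceil(pageviews_required / days)
fraction = per_day / daily_pageviews

print(min_days, per_day, round(fraction, 3))
```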

## Experiment Analysis

Now, let's analyze the data once the experiment has been run. First of all, we will create two different dataframes, one for the control group and another for the experiment group.

In [171]:
con = pd.read_excel("final_project_results.xlsx", sheet_name = 'Control')
exp = pd.read_excel("final_project_results.xlsx", sheet_name = 'Experiment')
con.head()

| | Date | Pageviews | Clicks | Enrollments | Payments |
|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7723 | 687 | 134.0 | 70.0 |
| 1 | Sun, Oct 12 | 9102 | 779 | 147.0 | 70.0 |
| 2 | Mon, Oct 13 | 10511 | 909 | 167.0 | 95.0 |
| 3 | Tue, Oct 14 | 9871 | 836 | 156.0 | 105.0 |
| 4 | Wed, Oct 15 | 10014 | 837 | 163.0 | 64.0 |


### Sanity Checks

The first step before analyzing the results is performing a sanity check on the invariant metrics. This will help us determine whether the experiment was run as expected. For each of our invariant metrics, we will check whether it passes the test:

* Number of cookies per course page.
  
* Number of clicks in the 'Start Free Trial' button.
  
* Click-Through-Probability in the 'Start Free Trial' button.

#### Sanity Check for Counts

For a count, we calculate a confidence interval around the fraction of events we expect to be assigned to the control group (0.5), and then check whether the fraction actually observed in the control group falls inside it.

##### Number of Cookies per Course Page

Let's take a look at the total pageviews for each group. The difference between the groups is small, but is it within what we would expect? We need to verify that this difference is not statistically significant and can be attributed to random variation.

As this invariant metric follows a binomial distribution, we can build a binomial confidence interval. The total sample size here is 690,203 cookies, which is definitely enough to use the normal approximation.

In [172]:
# Summing the total number of pageviews for each group
con_pages = con['Pageviews'].sum()
exp_pages = exp['Pageviews'].sum()
print(f"Total number of pageviews for control group: {con_pages} \nTotal number of pageviews for experiment group: {exp_pages}")

# Summing the total sample size
total_pages = con_pages + exp_pages
print(f"The size of the sample is {total_pages}")

Total number of pageviews for control group: 345543 
Total number of pageviews for experiment group: 344660
The size of the sample is 690203


First, we compute the standard error of a binomial proportion with a success probability of 0.5, where a "success" means that a cookie is assigned to the control group.

Then, we multiply the standard error by the z-score to get the margin of error. For a 95% confidence interval the z-score is 1.96, which can be checked in any standard z-score table.

After that, we'll compute a confidence interval around 0.5. If the experiment is set up properly, it's very likely that this observed fraction of successes or cookies in the control group will fall within this confidence interval.

In [173]:
# Calculating the standard error for a normal distribution
se_pages = round(np.sqrt((0.5*0.5)/total_pages), 4)
print(f"The standard error for pageviews is {se_pages}")

# Calculating the margin of error
me_pages = round(se_pages * 1.96, 4)
print(f"The margin of error for pageviews is {me_pages}")

# Calculating the confidence interval
print(f"The upper confidence interval for pageviews is {0.5+me_pages} and the lower confidence interval is {0.5-me_pages}")

The standard error for pageviews is 0.0006
The margin of error for pageviews is 0.0012
The upper confidence interval for pageviews is 0.5012 and the lower confidence interval is 0.4988


Now, let's calculate the fraction of pageviews in the control group to see whether it's inside the interval. As we can see, 0.5006 is within the confidence interval [0.4988, 0.5012]. The difference between the groups is within what we expect from chance, so this metric passes the sanity check.

In [174]:
# Calculating the fraction of the control group
p_hat_pages = round(con_pages/total_pages, 4)
print(f"The fraction of pageviews from the total for the control group is {p_hat_pages}")

The fraction of pageviews from the total for the control group is 0.5006


##### Number of Clicks in 'Start Free Trial' Button

Let's follow the same approach for the total number of clicks on the 'Start Free Trial' button.

In [175]:
# Summing the total number of clicks for each group
con_clicks = con['Clicks'].sum()
exp_clicks = exp['Clicks'].sum()
print(f"Total number of clicks for control group: {con_clicks} \nTotal number of clicks for experiment group: {exp_clicks}")

# Summing the total sample size
total_clicks = con_clicks + exp_clicks
print(f"The size of the sample is {total_clicks}")

Total number of clicks for control group: 28378 
Total number of clicks for experiment group: 28325
The size of the sample is 56703


In [176]:
# Calculating the standard error for a normal distribution
se_clicks = round(np.sqrt((0.5*0.5)/total_clicks), 4)
print(f"The standard error for clicks is {se_clicks}")

# Calculating the margin of error
me_clicks = round(se_clicks * 1.96, 4)
print(f"The margin of error for clicks is {me_clicks}")

# Calculating the confidence interval
print(f"The upper confidence interval for clicks is {0.5+me_clicks} and the lower confidence interval is {0.5-me_clicks}")

The standard error for clicks is 0.0021
The margin of error for clicks is 0.0041
The upper confidence interval for clicks is 0.5041 and the lower confidence interval is 0.4959


In [177]:
# Calculating the fraction of the control group
p_hat_clicks = round(con_clicks/total_clicks, 4)
print(f"The fraction of clicks from the total for the control group is {p_hat_clicks}")

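The fraction of clicks from the total for the control group is 0.5005

As 0.5005 is within the confidence interval [0.4959, 0.5041], the number of clicks also passes the sanity check.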
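Both count checks repeat the same steps, so they can be bundled into one function. The sketch below uses only the standard library; `count_sanity_check` is a hypothetical name, and unlike the cells above it does not round the standard error before computing the margin, so the bounds can differ slightly in the fourth decimal.

```python
import math

def count_sanity_check(n_control, n_experiment, expected=0.5, z=1.96):
    """Check whether the control share of a count is consistent with `expected`.

    Hypothetical helper: returns (lower, upper, observed, passed).
    """
    total = n_control + n_experiment
    se = math.sqrt(expected * (1 - expected) / total)  # binomial standard error
    margin = z * se
    observed = n_control / total
    lower, upper = expected - margin, expected + margin
    return lower, upper, observed, lower <= observed <= upper

pages_check = count_sanity_check(345543, 344660)  # pageviews
clicks_check = count_sanity_check(28378, 28325)   # clicks
print(pages_check)
print(clicks_check)
```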


#### Sanity Check for Probabilities & Rates

For any other type of metric, we should construct a confidence interval for the difference in proportions, then check whether zero falls within that confidence interval.

##### Click-Through-Probability in 'Start Free Trial' Button

For the CTP, we want to make sure the proportion of clicks given a pageview is similar in both groups. We'll need to calculate the confidence interval for the difference. We'll start by calculating the pooled probability, that is, the total number of clicks in both groups divided by the total number of pageviews.

$$p'_{pooled} = \frac{clicks_{con}+clicks_{exp}}{pages_{con}+pages_{exp}}$$

In [178]:
# Calculating the pooled probability
p_pool = round((total_clicks) / (total_pages), 4)
print(f"The pooled probability is {p_pool}")

The pooled probability is 0.0822


Next, we'll calculate the standard error using the following formula:

$$SE_{pool} = \sqrt{p'_{pool}·(1-p'_{pool})·(\frac{1}{N_{con}}+\frac{1}{N_{exp}})}$$

In [179]:
# Calculating the standard error
se_pool = round(np.sqrt(p_pool*(1-p_pool)*(1/con_pages+1/exp_pages)),4)
print(f"The standard error is {se_pool}")

The standard error is 0.0007


We defined the estimated difference as the experimental probability minus the control probability. Let's calculate this difference:

In [180]:
# Calculating the CTP for control and experiment group
con_ctp = con_clicks/con_pages
exp_ctp = exp_clicks/exp_pages

# Calculating difference for the CTP
d_hat=round(exp_ctp-con_ctp,4)
print(f"The difference for the CTP is {d_hat}")

The difference for the CTP is 0.0001


Let's calculate the margin of error and the confidence intervals:

In [181]:
# Calculating the margin of error
me_ctp = round(se_pool * 1.96, 4)
print(f"The margin of error is {me_ctp}")

# Calculating the confidence interval
print(f"The upper confidence interval for CTP is {d_hat+me_ctp} and the lower confidence interval is {d_hat-me_ctp}")

The margin of error is 0.0014
The upper confidence interval for CTP is 0.0015 and the lower confidence interval is -0.0013


As 0 is inside our confidence interval [-0.0013, 0.0015], the difference in CTP is not statistically significant and is consistent with random variation. So, this metric passes the test too!
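The pooled-probability steps used for the CTP can be sketched as a single function. `pooled_diff_ci` is a hypothetical name, the inputs are the totals printed in the cells above, and no intermediate rounding is applied, so the bounds may differ in the fourth decimal from the rounded cell outputs.

```python
import math

def pooled_diff_ci(x_con, n_con, x_exp, n_exp, z=1.96):
    """Confidence interval for (experiment rate - control rate) with a pooled SE.

    Hypothetical helper: returns (difference, lower, upper).
    """
    p_pool = (x_con + x_exp) / (n_con + n_exp)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_con + 1 / n_exp))
    diff = x_exp / n_exp - x_con / n_con
    margin = z * se
    return diff, diff - margin, diff + margin

# Clicks and pageviews for the control and experiment groups
diff, lower, upper = pooled_diff_ci(28378, 345543, 28325, 344660)
passes = lower <= 0 <= upper  # the invariant holds if zero is a plausible difference
print(f"difference={diff:.4f}, CI=[{lower:.4f}, {upper:.4f}], passes={passes}")
```

The same function applies to any metric that is a ratio of two counts, as long as the pooled standard error is an acceptable approximation.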

### Result Analysis
#### Evaluating Practical and Statistical Significance

Next, for our evaluation metrics, we will calculate a confidence interval for the difference between the experiment and control groups, and check whether each metric is statistically and/or practically significant. A metric is statistically significant if the confidence interval does not include 0 (that is, we can be confident there was a change), and it is practically significant if the confidence interval does not include the practical significance boundary (that is, we can be confident there is a change that matters to the business).

For the last 14 days of the experiment, only pageviews and clicks were collected; enrollments and payments are missing. For this reason, we exclude those 14 days when counting clicks, so that the denominators cover only the days with complete records.

In [182]:
# Calculating the count clicks from the completed rows
con_clicks_test = con["Clicks"].loc[con["Enrollments"].notnull()].sum()
exp_clicks_test = exp["Clicks"].loc[exp["Enrollments"].notnull()].sum()

# Calculating the count of enrollments and payments
con_enroll_test = con["Enrollments"].sum()
exp_enroll_test = exp["Enrollments"].sum()
con_pay_test = con["Payments"].sum()
exp_pay_test = exp["Payments"].sum()

Now, let's calculate the confidence interval for each of the evaluation metrics.

##### Gross Conversion

First of all, let's calculate the value of the gross conversion for the total experiment in both groups.

In [183]:
con_gc = con_enroll_test/con_clicks_test
exp_gc = exp_enroll_test/exp_clicks_test

Now, we will calculate the pooled probability and the pooled standard error:

In [184]:
# Calculating the pooled probability
gc_pooled = (con_enroll_test+exp_enroll_test)/(con_clicks_test+exp_clicks_test)
print(f"The pooled probability is {gc_pooled}")

# Calculating the standard error 
gc_se_pooled = np.sqrt((gc_pooled)*(1-gc_pooled)*((1/con_clicks_test)+(1/exp_clicks_test)))
print(f"The standard error is {gc_se_pooled}")

The pooled probability is 0.20860706740369866
The standard error is 0.004371675385225936


Next, we are going to estimate the margin of error, the difference and the confidence intervals.

In [185]:
# Calculating the margin of error
gc_me = round(1.96*gc_se_pooled,4)
print(f"The margin of error is {gc_me}")

# Getting the difference between Gross Conversion in experiment group and control group
gc_d = round(exp_gc-con_gc, 4)
print(f"The difference is {gc_d}")

# Calculating the confidence interval
print(f"The upper confidence interval is {gc_d+gc_me} and the lower confidence interval is {gc_d-gc_me}")

The margin of error is 0.0086
The difference is -0.0206
The upper confidence interval is -0.012 and the lower confidence interval is -0.0292


As zero is not included in the interval [-0.0292, -0.012], we can conclude that the gross conversion experienced a statistically significant change. The change is also practically significant, because the whole interval lies below the practical significance boundary of -0.01.

We required a change of at least 1% in this evaluation metric, and the new version produced a change of -2.06% in gross conversion. The experiment group therefore had a lower gross conversion rate: fewer people enrolled in the free trial after the pop-up was added.

##### Net Conversion

First of all, let's calculate the value of the net conversion for the total experiment in both groups.

In [186]:
con_ec = con_pay_test/con_clicks_test
exp_ec = exp_pay_test/exp_clicks_test

Now, we will calculate the pooled probability and the pooled standard error:

In [187]:
# Calculating the pooled probability
ec_pooled = (con_pay_test+exp_pay_test)/(con_clicks_test+exp_clicks_test)
print(f"The pooled probability is {ec_pooled}")

# Calculating the standard error 
ec_se_pooled = np.sqrt((ec_pooled)*(1-ec_pooled)*((1/con_clicks_test)+(1/exp_clicks_test)))
print(f"The standard error is {ec_se_pooled}")

The pooled probability is 0.1151274853124186
The standard error is 0.0034341335129324238


Next, we are going to estimate the margin of error, the difference and the confidence intervals.

In [188]:
# Calculating the margin of error
ec_me = round(1.96*ec_se_pooled,4)
print(f"The margin of error is {ec_me}")

# Getting the difference between Net Conversion in experiment group and control group
ec_d = round(exp_ec-con_ec, 4)
print(f"The difference is {ec_d}")

# Calculating the confidence interval
print(f"The upper confidence interval is {ec_d+ec_me} and the lower confidence interval is {ec_d-ec_me}")

The margin of error is 0.0067
The difference is -0.0049
The upper confidence interval is 0.0018000000000000004 and the lower confidence interval is -0.0116


In this case, we have a change of -0.49%, which is not statistically significant (0 is included in the confidence interval) and not practically significant (-0.0075 is included in the interval). We needed a change of at least 0.75% to see a practical change, but we observed only 0.49%. We therefore cannot conclude that the net conversion changed after adding the pop-up, although the interval does not rule out a decrease as large as the practical significance boundary.
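The decision rule applied to both metrics can be sketched as a small function. `classify` is a hypothetical helper, and it encodes one reading of the rule: a metric counts as practically significant only when the whole interval lies beyond $-D_{min}$ or $+D_{min}$. The intervals are the rounded ones reported above.

```python
def classify(lower, upper, d_min):
    """Classify a confidence interval for a difference (hypothetical helper).

    statistically significant: the interval excludes 0
    practically significant:   the interval lies entirely beyond -d_min or +d_min
    """
    statistically = not (lower <= 0 <= upper)
    practically = lower > d_min or upper < -d_min
    return statistically, practically

gc_result = classify(-0.0292, -0.0120, d_min=0.01)   # Gross Conversion
nc_result = classify(-0.0116, 0.0018, d_min=0.0075)  # Net Conversion
print(gc_result, nc_result)
```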

#### Sign Tests

Finally, we are going to perform a sign test to see whether the day-by-day data shows a consistent direction of change. First of all, we will compute the metrics per day for both groups and add them to the same DataFrame:

In [189]:
# Merging the control and experiment group in the same DataFrame
full = con.join(exp, how = 'left', lsuffix="_con",rsuffix="_exp")
full.count()

Date_con           37
Pageviews_con      37
Clicks_con         37
Enrollments_con    23
Payments_con       23
Date_exp           37
Pageviews_exp      37
Clicks_exp         37
Enrollments_exp    23
Payments_exp       23
dtype: int64

Now, let's remove the incomplete rows:

In [190]:
full = full.dropna(axis = 0, how = 'any')
full.count()

Date_con           23
Pageviews_con      23
Clicks_con         23
Enrollments_con    23
Payments_con       23
Date_exp           23
Pageviews_exp      23
Clicks_exp         23
Enrollments_exp    23
Payments_exp       23
dtype: int64

Now, let's calculate the daily Gross Conversion and Net Conversion:

In [191]:
full.insert(5, 'GC_con', full['Enrollments_con']/full['Clicks_con'])
full.insert(6, 'NC_con', full['Payments_con']/full['Clicks_con'])

full.insert(12, 'GC_exp', full['Enrollments_exp']/full['Clicks_exp'])
full.insert(13, 'NC_exp', full['Payments_exp']/full['Clicks_exp'])

In [192]:
full.head(5)

| | Date_con | Pageviews_con | Clicks_con | Enrollments_con | Payments_con | GC_con | NC_con | Date_exp | Pageviews_exp | Clicks_exp | Enrollments_exp | Payments_exp | GC_exp | NC_exp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7723 | 687 | 134.0 | 70.0 | 0.195051 | 0.101892 | Sat, Oct 11 | 7716 | 686 | 105.0 | 34.0 | 0.153061 | 0.049563 |
| 1 | Sun, Oct 12 | 9102 | 779 | 147.0 | 70.0 | 0.188703 | 0.089859 | Sun, Oct 12 | 9288 | 785 | 116.0 | 91.0 | 0.147771 | 0.115924 |
| 2 | Mon, Oct 13 | 10511 | 909 | 167.0 | 95.0 | 0.183718 | 0.10451 | Mon, Oct 13 | 10480 | 884 | 145.0 | 79.0 | 0.164027 | 0.089367 |
| 3 | Tue, Oct 14 | 9871 | 836 | 156.0 | 105.0 | 0.186603 | 0.125598 | Tue, Oct 14 | 9867 | 827 | 138.0 | 92.0 | 0.166868 | 0.111245 |
| 4 | Wed, Oct 15 | 10014 | 837 | 163.0 | 64.0 | 0.194743 | 0.076464 | Wed, Oct 15 | 9793 | 832 | 140.0 | 94.0 | 0.168269 | 0.112981 |


Now, let's add a flag for each metric: 1 when the value is higher in the experiment group than in the control group, and 0 otherwise.

In [194]:
full['GC'] = np.where(full['GC_con']<full['GC_exp'], 1, 0)
full['NC'] = np.where(full['NC_con']<full['NC_exp'], 1, 0)
full.head()

| | Date_con | Pageviews_con | Clicks_con | Enrollments_con | Payments_con | GC_con | NC_con | Date_exp | Pageviews_exp | Clicks_exp | Enrollments_exp | Payments_exp | GC_exp | NC_exp | GC | NC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7723 | 687 | 134.0 | 70.0 | 0.195051 | 0.101892 | Sat, Oct 11 | 7716 | 686 | 105.0 | 34.0 | 0.153061 | 0.049563 | 0 | 0 |
| 1 | Sun, Oct 12 | 9102 | 779 | 147.0 | 70.0 | 0.188703 | 0.089859 | Sun, Oct 12 | 9288 | 785 | 116.0 | 91.0 | 0.147771 | 0.115924 | 0 | 1 |
| 2 | Mon, Oct 13 | 10511 | 909 | 167.0 | 95.0 | 0.183718 | 0.10451 | Mon, Oct 13 | 10480 | 884 | 145.0 | 79.0 | 0.164027 | 0.089367 | 0 | 0 |
| 3 | Tue, Oct 14 | 9871 | 836 | 156.0 | 105.0 | 0.186603 | 0.125598 | Tue, Oct 14 | 9867 | 827 | 138.0 | 92.0 | 0.166868 | 0.111245 | 0 | 0 |
| 4 | Wed, Oct 15 | 10014 | 837 | 163.0 | 64.0 | 0.194743 | 0.076464 | Wed, Oct 15 | 9793 | 832 | 140.0 | 94.0 | 0.168269 | 0.112981 | 0 | 1 |


In [201]:
gc_all = full['GC'][full["GC"] == 1].count()
nc_all = full['NC'][full["NC"] == 1].count()
n = full['NC'].count()
print("Number of cases for GC:", gc_all,'\n',
      "Number of cases for NC:", nc_all,'\n',
      "Number of total cases:", n)

Number of cases for GC: 4 
 Number of cases for NC: 10 
 Number of total cases: 23


In this case, we can use an online calculator for the sign test. We only need to enter the number of successes and the total number of cases to get the p-value for a two-tailed test. That's the probability of observing a result at least this extreme by chance.

For the Gross Conversion, we have 4 successes out of 23 cases. If the probability of success in each trial is 0.5, then the p-value is 0.0026. Since it is less than 0.05, we can conclude the sign test agrees with the hypothesis test: this result was unlikely to come about by chance. So, it's significant.

For the Net Conversion, we have 10 successes out of 23 cases. If the probability of success in each trial is 0.5, then the p-value is 0.6776. Since it is higher than 0.05, we can conclude the sign test is not significant.

We reach the same conclusions as in the effect size section of this analysis.
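The p-values from the online calculator can also be reproduced in code. The sketch below uses only the standard library and computes an exact two-sided binomial p-value by doubling the smaller tail, which is one common convention for the sign test; `sign_test_p_value` is a hypothetical name.

```python
from math import comb

def sign_test_p_value(successes, n, p=0.5):
    """Two-sided exact binomial p-value for a sign test (hypothetical helper)."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    lower_tail = sum(pmf(k) for k in range(0, successes + 1))  # P(X <= successes)
    upper_tail = sum(pmf(k) for k in range(successes, n + 1))  # P(X >= successes)
    return min(1.0, 2 * min(lower_tail, upper_tail))

gc_p = sign_test_p_value(4, 23)   # Gross Conversion: 4 of 23 days higher in experiment
nc_p = sign_test_p_value(10, 23)  # Net Conversion: 10 of 23 days higher in experiment
print(round(gc_p, 4), round(nc_p, 4))
```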

### Recommendations

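With the results we have, I wouldn't recommend proceeding with the change. Gross conversion dropped significantly, but the effect on net conversion is inconclusive: its confidence interval includes both zero and the negative practical significance boundary, so we cannot rule out a meaningful decrease in the number of paying students.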