# Lesson 4 - A/B Testing Case Study - Analyzing Data

Let's assume that the experiment was given the green light and that data was collected for 29 days. Recall from the experiment-sizing discussion that a three-week period was needed to collect enough visitors to achieve our desired power level. Eight additional days of collection were added so that visitors from the last week could complete their trials and come back to make a purchase. In the daily data, you will see that it takes about eight days for license purchases to reach their steady level.

The collected data is stored in `data/homepage-experiment-data.csv`. The file reports daily counts of unique cookies, downloads, and license purchases attributed to each group: the experimental group, which saw the new homepage, and the control group, which saw the old homepage. The license counts only include purchases by users who joined after the start of the experiment, so the counts take some time to reach their steady state. As noted earlier, we'll assume that the potentially muddying effects of visits across multiple days, visits by established users, and 'lost' cookie tracking are negligible, at least unless we find reason to doubt our findings.

```python
# import packages

import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.stats import proportion as proptests

import matplotlib.pyplot as plt
%matplotlib inline
```

```python
# import data

data = pd.read_csv('data/homepage-experiment-data.csv')
data.head(10)
```

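|   | Day | Control Cookies | Control Downloads | Control Licenses | Experiment Cookies | Experiment Downloads | Experiment Licenses |
|--:|--:|--:|--:|--:|--:|--:|--:|
| 0 | 1 | 1764 | 246 | 1 | 1850 | 339 | 3 |
| 1 | 2 | 1541 | 234 | 2 | 1590 | 281 | 2 |
| 2 | 3 | 1457 | 240 | 1 | 1515 | 274 | 1 |
| 3 | 4 | 1587 | 224 | 1 | 1541 | 284 | 2 |
| 4 | 5 | 1606 | 253 | 2 | 1643 | 292 | 3 |
| 5 | 6 | 1681 | 287 | 3 | 1780 | 299 | 3 |
| 6 | 7 | 1534 | 262 | 5 | 1555 | 276 | 8 |
| 7 | 8 | 1798 | 331 | 12 | 1787 | 326 | 20 |
| 8 | 9 | 1478 | 223 | 30 | 1553 | 298 | 38 |
| 9 | 10 | 1461 | 236 | 32 | 1458 | 289 | 23 |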

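The ten rows above show the ramp-up in license purchases. The sketch below is a rough check that finds the first day on which each group records at least 10 licenses. The threshold is an arbitrary value chosen for illustration. The counts are copied from the table so the block runs without the CSV file.

```python
import pandas as pd

# License counts for days 1-10, copied from the table above
head = pd.DataFrame({
    'Day': range(1, 11),
    'Control Licenses': [1, 2, 1, 1, 2, 3, 5, 12, 30, 32],
    'Experiment Licenses': [3, 2, 1, 2, 3, 3, 8, 20, 38, 23],
})

THRESHOLD = 10  # hypothetical cutoff for "purchases have started arriving"

first_day = {}
for group in ['Control', 'Experiment']:
    # Days on which this group's license count meets the threshold
    above = head.loc[head[group + ' Licenses'] >= THRESHOLD, 'Day']
    first_day[group] = int(above.min())  # earliest such day

print(first_day)  # → {'Control': 8, 'Experiment': 8}
```

Both groups cross the threshold on day 8. This is consistent with the roughly week-long delay between the start of the experiment and steady license purchases.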

## Checking the Invariant Metric

First, we should check our invariant metric, the number of cookies assigned to each group. If we detect a statistically significant difference, we shouldn't move on to the evaluation metrics right away. We'd first need to dig deeper to see whether there was an issue with the group-assignment procedure, or whether something about the manipulation affected the number of cookies observed. Only then could we feel secure about analyzing and interpreting the evaluation metrics.

```python
# Get number of cookies in each group
n_control = data['Control Cookies'].sum()
n_experiment = data['Experiment Cookies'].sum()
n_total = n_control + n_experiment
print('n_total = {}\nn_control = {}\nn_experiment = {}'.format(n_total, n_control, n_experiment))
```

```
n_total = 94197
n_control = 46851
n_experiment = 47346
```


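```python
# Compute z-score and two-sided p-value for the cookie split.
# Under H0, each cookie is assigned to control with probability 0.5.
p = 0.5
sd = np.sqrt(p * (1 - p) * n_total)
# Add 0.5 as a continuity correction: n_control is below its expected
# value, so the correction moves it toward the mean.
z = ((n_control + 0.5) - p * n_total) / sd

print('z = ', z)
# z is negative, so double the lower-tail probability
print('p = ', 2 * stats.norm.cdf(z))
```

```
z =  -1.6095646049678511
p =  0.10749294050130412
```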
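
As a cross-check on the normal approximation, SciPy's exact binomial test can be run on the same cookie totals. This is a sketch that assumes SciPy 1.7 or later, the version in which `scipy.stats.binomtest` was added. The totals are hard-coded from the output above so the block runs on its own. With this many cookies, the exact p-value should be close to the approximate one.

```python
from scipy import stats

# Cookie totals from the output above
n_control, n_experiment = 46851, 47346
n_total = n_control + n_experiment

# Exact two-sided test of H0: each cookie lands in control with probability 0.5
result = stats.binomtest(n_control, n_total, p=0.5, alternative='two-sided')
print('exact p =', result.pvalue)
```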


## Checking the Evaluation Metrics

The invariant metric passes inspection. A two-sided p-value of about 0.107 is not statistically significant at conventional levels, so there is no evidence that cookies were split unevenly between the groups. We can therefore move on to the evaluation metrics: the download rate and the license purchasing rate. As a refresher, the download rate is the number of downloads divided by the number of cookies, and the license purchasing rate is the number of licenses divided by the number of cookies.

One tricky point is that there is a seven- or eight-day delay between when most people download the software and when they make a purchase. Because results are aggregated by day, there's no direct way to attribute license purchases back to the cookies that produced them. The best we can do is make a justified argument for how to handle the data. For the license purchasing rate, we take only the cookies observed through day 21 as the denominator, treating them as responsible for all of the license purchases observed. (A more informed model of license purchasing could handle the data differently, such as including part of the day 22 cookies in the denominator.) No such correction is needed for the download rate, since the link between homepage visits and downloads is much closer.

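The denominator choice for the license rate can be written as a small function. The sketch below is hypothetical: `license_denominator` and its `partial_weight` parameter are illustrative names, and the weight for day 22 would have to come from a model of purchase delays that this analysis does not build. With the default weight of 0, it reproduces the "cookies through day 21" rule used below.

```python
def license_denominator(daily_cookies, last_full_day=21, partial_weight=0.0):
    """Number of cookies credited with the observed license purchases.

    daily_cookies  -- cookie counts for day 1, 2, 3, ... in day order
    last_full_day  -- days 1..last_full_day are counted in full (21 here)
    partial_weight -- hypothetical share of the following day's cookies
                      to add; 0.0 reproduces the "through day 21" rule
    """
    total = sum(daily_cookies[:last_full_day])
    if partial_weight and len(daily_cookies) > last_full_day:
        # Index last_full_day is the next day (day 22 by default)
        total += partial_weight * daily_cookies[last_full_day]
    return total


# Toy input (not the experiment's data): 29 days of 1,000 cookies each
toy_cookies = [1000] * 29
print(license_denominator(toy_cookies))                      # → 21000
print(license_denominator(toy_cookies, partial_weight=0.5))  # → 21500.0
```

In the notebook, a day-ordered column such as `data['Control Cookies'].tolist()` could be passed as `daily_cookies`.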
### Downloads

```python
# Download rate = downloads / cookies, for each group
n_control_downloads = data['Control Downloads'].sum()
p_control_downloads = n_control_downloads / n_control    # also used as the null rate
print('p_control_downloads = ', p_control_downloads)
n_experiment_downloads = data['Experiment Downloads'].sum()
p_experiment_downloads = n_experiment_downloads / n_experiment
print('p_experiment_downloads = ', p_experiment_downloads)
```

```
p_control_downloads =  0.16123455209067042
p_experiment_downloads =  0.180543234908968
```


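```python
# Compute standard error, z-score, and one-sided p-value.
# The control rate serves as the null proportion in the standard error.
se_p = np.sqrt(p_control_downloads * (1 - p_control_downloads) * (1 / n_control + 1 / n_experiment))
z = (p_experiment_downloads - p_control_downloads) / se_p
print('z = ', z)
# 1 - cdf loses precision this far into the tail, so the printed value is
# only accurate to about machine epsilon; stats.norm.sf(z) is more precise.
print('p = ', 1 - stats.norm.cdf(z))
```

```
z =  8.05723199177085
p =  4.440892098500626e-16
```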
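
The cell above uses the control download rate as the null proportion in the standard error. A common alternative is to pool both groups under the null hypothesis. The sketch below does this with `proportions_ztest` from `statsmodels.stats.proportion`, the module the notebook imports as `proptests` but doesn't use. The download counts are back-calculated from the printed rates and cookie totals (0.161234... × 46851 = 7554 and 0.180543... × 47346 = 8548). The pooled standard error is slightly larger, giving a z-score of about 7.9 rather than 8.06, but the conclusion doesn't change.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Order matters: with alternative='larger' the test is H1: rate[0] > rate[1]
downloads = np.array([8548, 7554])   # [experiment, control]
cookies = np.array([47346, 46851])   # [experiment, control]

# With value=None (the default), the standard error uses the pooled rate
# (8548 + 7554) / (47346 + 46851) for both groups.
z_pooled, p_pooled = proportions_ztest(downloads, cookies, alternative='larger')
print('z =', z_pooled)
print('p =', p_pooled)
```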


### Licenses

```python
# Count cookies only for the first 21 days.
# Filtering on the Day column doesn't depend on row order. A positional
# slice would need [:21] (days 1-21), not [:22], and assumes sorted rows.
first_21_days = data.query('Day < 22')
n_control_21 = first_21_days['Control Cookies'].sum()
n_experiment_21 = first_21_days['Experiment Cookies'].sum()

# License rate: all licenses observed over the 29 days, divided by the
# cookies from days 1-21
n_control_licenses = data['Control Licenses'].sum()
p_control_licenses = n_control_licenses / n_control_21    # also used as the null rate
print('p_control_licenses = ', p_control_licenses)
n_experiment_licenses = data['Experiment Licenses'].sum()
p_experiment_licenses = n_experiment_licenses / n_experiment_21
print('p_experiment_licenses = ', p_experiment_licenses)
```

```
p_control_licenses =  0.021032051661828307
p_experiment_licenses =  0.021317490826489604
```


```python
# Compute standard error, z-score, and one-sided p-value, again using the
# control rate as the null proportion
se_p = np.sqrt(p_control_licenses * (1 - p_control_licenses) * (1 / n_control_21 + 1 / n_experiment_21))
z = (p_experiment_licenses - p_control_licenses) / se_p
print('z = ', z)
print('p = ', 1 - stats.norm.cdf(z))
```

```
z =  0.259539555695547
p =  0.39760948313293754
```
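
Both evaluation tests repeat the same arithmetic, so they could share a helper. The function name `one_sided_ztest_vs_control` is hypothetical. It mirrors the cells above by using the control rate as the null proportion. It also swaps in `stats.norm.sf`, which avoids the precision loss of `1 - stats.norm.cdf` far in the upper tail. The example call uses download counts back-calculated from the printed rates.

```python
import numpy as np
from scipy import stats


def one_sided_ztest_vs_control(x_control, n_control, x_experiment, n_experiment):
    """One-sided z-test of H1: experiment rate > control rate.

    The control rate is used as the null proportion in the standard error,
    as in the notebook cells. Returns (z, p_value).
    """
    p_control = x_control / n_control
    p_experiment = x_experiment / n_experiment
    se = np.sqrt(p_control * (1 - p_control) * (1 / n_control + 1 / n_experiment))
    z = (p_experiment - p_control) / se
    # norm.sf(z) equals 1 - norm.cdf(z) but stays accurate in the far tail
    return z, stats.norm.sf(z)


# Downloads: counts back-calculated from the rates printed above
z, p_value = one_sided_ztest_vs_control(7554, 46851, 8548, 47346)
print('z =', z)        # same z as the downloads cell (about 8.057)
print('p =', p_value)
```

Inside the notebook, the license test would be `one_sided_ztest_vs_control(n_control_licenses, n_control_21, n_experiment_licenses, n_experiment_21)`.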
