<img src="https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png" style="float: left; margin: 15px">

# A/B and R - "The Red or the Green Button"

Week 11 | Lesson 3.x

---

A/B (A/B/C/etc.) split testing is a widely used technique for improving products in industry (particularly the tech industry).

---

## A/B Testing Case Studies

In groups, read one of the A/B testing case studies below, discuss it, and then present its findings and outcome to the class.


#### Case study 1: How Obama raised 60 million dollars

https://blog.optimizely.com/2010/11/29/how-obama-raised-60-million-by-running-a-simple-experiment/

#### Case study 2: How AMD increased social sharing by 3600%

https://vwo.com/blog/amd-3600-social-sharing-increase/

#### Case study 3: Failed A/B tests increase conversion

http://unbounce.com/a-b-testing/failed-ab-test-results/

#### Case study 4: When good design is bad for business

https://vwo.com/blog/good-design-bad-conversion-rate/



---

### Setup of a Split Test

Companies running a split test, and particularly the data scientists responsible for the construction and analysis of the test, should consider in detail the value, purpose, and setup of the test before beginning any _technical_ work. Four essential considerations before beginning a test are:

**1. WHAT ELEMENTS WILL BE CHANGED IN THE PRODUCT?**

Data scientists typically work closely with Product or Project Managers (PMs). You will often have limited say in which elements are changed for a test, but this does not mean you should avoid "weighing in". In fact, it is essential as a data scientist to clearly communicate your opinion of the test, since you are likely the most statistically savvy person involved. At this stage in the process, keeping the changes as small and limited as possible makes it easier to attribute any difference in the results to a specific change, and so yields the most meaningful results.

**2. WHO WILL BE PART OF THE TESTING GROUPS (ARMS) AND BY HOW MUCH?**

Will the test split incoming traffic 50/50 between variants? Should you serve the variant under test to a smaller group? Will the test split change? A common and safe practice is to begin by only showing the new variant(s) to a very small proportion of users to ensure there is nothing very wrong with the change, then performing the actual split test on a larger proportion of users.
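One common way to implement such a split is deterministic, hash-based bucketing, so that a returning user always lands in the same arm. The sketch below is only an illustration under stated assumptions: the function name `assign_arm`, the experiment name, and the 95/5 weights are hypothetical and do not come from any particular testing framework.

```python
import hashlib
from collections import Counter


def assign_arm(user_id, experiment, weights):
    """Map a user to an arm in proportion to `weights` (values sum to 1).

    The same (experiment, user_id) pair always yields the same arm.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Turn the first 8 hex digits into a number in [0, 1).
    point = int(digest[:8], 16) / 16 ** 8
    cumulative = 0.0
    for arm, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return arm
    return arm  # guard against floating-point rounding in the weights


# Hypothetical ramp-up: expose only 5% of users to the new variant.
weights = {"control": 0.95, "variant": 0.05}
counts = Counter(
    assign_arm(user_id, "button-color", weights) for user_id in range(10000)
)
variant_share = counts["variant"] / 10000
print(counts, variant_share)
```

Widening the test later only requires changing the weights. Note that doing so moves some users from one arm to another, which is one reason to treat the ramp-up phase and the actual split test as separate experiments.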

**3. HOW LONG WILL THE TEST RUN?**

This is a very important question to ask. If the test doesn't run long enough, your data won't be useful. If it runs too long, that can impact business needs. _In the standard split test procedure, you cannot check the results multiple times!_ We will examine and explain why in a later section.

**4. IS THE TEST NECESSARY? WHY?**

A/B testing is a gamble, and potentially an expensive one. If the business result of the test is less valuable than the possible negative effects on churn or conversion rate, then it is worth re-evaluating your variants and design.

---

### Requirements of the Standard Split Test

As with any scientific test, there are requirements for ensuring that the experimental design and results are valid. The requirements for constructing a _standard_ split test are described below, though variations exist that attempt to get around one or more of them.

---

**SPLIT TEST REQUIREMENTS**

1. **Randomized design:** users are randomly assigned to one of the arms of the test as they visit the site/app.
2. **Win criteria set in advance:** you must decide before running the test what the condition for accepting a change will be.
3. **One thing changed per test:** the control and experimental conditions differ by only one change. Note that more than one change per arm is not statistically invalid, but it makes results difficult or impossible to interpret.
4. **Split test is evaluated only once:** the test ends at a specific point; there is no "peeking" at the rates as the test runs.

The final point is the most commonly violated and also the biggest pain point when working with other departments in your company. The most popular variations on the standard split test design try to address this issue. We will discuss this more in depth later.
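The cost of peeking can be illustrated with a small simulation. The sketch below runs many A/A tests (both arms share the same true rate, so every "significant" result is a false positive) and compares evaluating once at the end against stopping at the first significant look. All numbers (conversion rate, users per look, number of looks) are hypothetical, and the pooled two-proportion z-test is an assumed evaluation method, not one prescribed by this lesson.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

true_rate = 0.10       # hypothetical conversion rate, identical in both arms
users_per_look = 200   # hypothetical users added to each arm between looks
n_looks = 10
n_sims = 2000
alpha = 0.05

# Cumulative conversions per arm at each look, shape (n_sims, n_looks).
shape = (n_sims, n_looks)
conv_a = rng.binomial(users_per_look, true_rate, size=shape).cumsum(axis=1)
conv_b = rng.binomial(users_per_look, true_rate, size=shape).cumsum(axis=1)

# Users per arm at each look: 200, 400, ..., 2000.
n = users_per_look * np.arange(1, n_looks + 1)

# Pooled two-proportion z-test computed at every look.
pooled = (conv_a + conv_b) / (2 * n)
se = np.sqrt(2 * pooled * (1 - pooled) / n)
z = (conv_a - conv_b) / n / se
p_values = 2 * norm.sf(np.abs(z))

# Evaluate once at the end vs. declare a winner at the first significant look.
single_look_fpr = np.mean(p_values[:, -1] < alpha)
peeking_fpr = np.mean((p_values < alpha).any(axis=1))

print("false-positive rate, one look at the end:", single_look_fpr)
print("false-positive rate, stop at first significant look:", peeking_fpr)
```

With a single evaluation the false-positive rate should stay near the nominal 5%, while the peeking strategy typically comes out well above it, because each additional look is another chance for random noise to cross the significance threshold.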

---

**TWO COMMON SPLIT TEST MISCONCEPTIONS**

1. **Split testing can only compare two versions.** This is not true; multi-arm tests are often referred to as A/B/C, A/B/C/D, etc.
2. **Split test arms must have equal fractions of users.** There is no statistical reason that your arms must have equal splits, although an equal split generally requires the fewest total data points to reach a given power.
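To make the second point concrete, the sketch below estimates how many users are needed in total as the share of traffic sent to the new variant changes. It uses a standard unpooled normal approximation for comparing two proportions. The function name `total_sample_size` and the rates, significance level, and power used here are illustrative assumptions.

```python
from scipy.stats import norm


def total_sample_size(p1, p2, share_b, alpha=0.05, power=0.90):
    """Approximate total users needed when arm B receives `share_b` of traffic.

    Uses the unpooled normal approximation for a two-sided test of
    two proportions.
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    var_a = p1 * (1 - p1)
    var_b = p2 * (1 - p2)
    # With N users in total, the variance of the difference in rates is
    # var_a / (N * (1 - share_b)) + var_b / (N * share_b).
    variance_per_user = var_a / (1 - share_b) + var_b / share_b
    return (z_alpha + z_beta) ** 2 * variance_per_user / (p1 - p2) ** 2


# Hypothetical baseline rate of 3% and a target rate of 3.3%.
for share in (0.5, 0.25, 0.1):
    total = total_sample_size(0.03, 0.033, share)
    print(f"variant share {share:.2f}: about {total:,.0f} users in total")
```

An unequal split is still a valid test; it simply needs more total traffic (and therefore more time) to reach the same power.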


In [2]:
from numpy import *
import scipy as sp
from pandas import *

In [3]:
# !conda install -c r rpy2 

In [4]:
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
import pandas.rpy.common as com

Note: importing `pandas.rpy` raises a deprecation warning. See here for a guide on how to port your code to rpy2: http://pandas.pydata.org/pandas-docs/stable/r_interface.html


In [5]:
ro.r('x=c()')
ro.r('x[1]=22')
ro.r('x[2]=44')
print(ro.r('x'))
print(ro.r['x'])

[1] 22 44

[1] 22 44



### How the Power of a Test Translates into a Sample Size Requirement

In [6]:
# p1 = baseline clickthrough rate
# p2 = smallest improved clickthrough rate we want to be able to detect
# sig.level = 0.05 (significance level), power = 0.90

ro.r('power.prop.test(p1 = .03, p2 = .033, sig.level =0.05, power = .90)')

R object with classes: ('power.htest',) mapped to:
<ListVector - Python:0x117b3f248 / R:0x118859e70>
[Float..., Float..., Float..., ..., StrVe..., StrVe..., StrVe...]
  n: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x117b5a908 / R:0x11816ba38>
[71232.997518]
  p1: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x117b6a7a0 / R:0x11811f938>
[0.030000]
  p2: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x117b6ab90 / R:0x11811f8a8>
[0.033000]
  ...
  n: <class 'rpy2.robjects.vectors.StrVector'>
  R object with classes: ('character',) mapped to:
<StrVector - Python:0x117b6acb0 / R:0x118150a38>
[str]
  p1: <class 'rpy2.robjects.vectors.StrVector'>
  R object with classes: ('character',) mapped to:
<StrVector - Python:0x117b6ad88 / R:0x1181510f8>
[str]
  p2: <class 'rpy2.robjects.vector
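The `n` in the output above is the number of users required in each arm. As a cross-check that does not require R, the sketch below applies a common normal-approximation formula for comparing two proportions with a two-sided test. The function name `sample_size_per_arm` is hypothetical, and the formula is an assumption about what is being computed rather than a transcription of R's source.

```python
from math import sqrt

from scipy.stats import norm


def sample_size_per_arm(p1, p2, alpha=0.05, power=0.90):
    """Approximate users needed per arm to detect p1 vs p2 (two-sided test)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the significance level
    z_beta = norm.ppf(power)            # quantile corresponding to the desired power
    p_bar = (p1 + p2) / 2               # average rate under the null hypothesis
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p1 - p2) ** 2


n = sample_size_per_arm(0.03, 0.033)
print(n)
```

The result should land close to the `n` of 71232.997518 reported by `power.prop.test`: detecting a lift from 3% to 3.3% with 90% power takes roughly 71,000 users in each arm. Because the required sample size scales with one over the squared difference in rates, halving the detectable lift roughly quadruples the traffic needed.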

In [7]:
ro.r('data(mtcars)')

R object with classes: ('character',) mapped to:
<StrVector - Python:0x117b63a70 / R:0x100ce4dc8>
[str]

In [8]:
pydf = com.load_data('mtcars')

In [9]:
pydf.describe()

Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
count,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0
mean,20.090625,6.1875,230.721875,146.6875,3.596563,3.21725,17.84875,0.4375,0.40625,3.6875,2.8125
std,6.026948,1.785922,123.938694,68.562868,0.534679,0.978457,1.786943,0.504016,0.498991,0.737804,1.6152
min,10.4,4.0,71.1,52.0,2.76,1.513,14.5,0.0,0.0,3.0,1.0
25%,15.425,4.0,120.825,96.5,3.08,2.58125,16.8925,0.0,0.0,3.0,2.0
50%,19.2,6.0,196.3,123.0,3.695,3.325,17.71,0.0,0.0,4.0,2.0
75%,22.8,8.0,326.0,180.0,3.92,3.61,18.9,1.0,1.0,4.0,4.0
max,33.9,8.0,472.0,335.0,4.93,5.424,22.9,1.0,1.0,5.0,8.0


In [10]:
ro.r('''fit=lm(mpg ~ wt + cyl, data=mtcars)''')

R object with classes: ('lm',) mapped to:
<ListVector - Python:0x117b5af38 / R:0x102b22a00>
[Float..., Float..., Float..., ..., Vector, Formula, DataF...]
  coefficients: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x117d911b8 / R:0x101c9db70>
[39.686261, -3.190972, -1.507795]
  residuals: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x117d910e0 / R:0x10d5f8f80>
[-1.279145, -0.465447, -3.452026, ..., -2.100499, -1.232131, -3.384179]
  effects: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x117d91248 / R:0x10d5fa490>
[-113.649737, -29.115722, -9.335415, ..., -2.042178, -1.606972, -2.428575]
  ...
  coefficients: <class 'rpy2.robjects.vectors.Vector'>
  R object with classes: ('lm',) mapped to:
<Vector - Python:0x117d914d0 / R:0x1180d1200>
[RNULLType, Vector, Vector]
  residuals: 

In [11]:
print(ro.r('summary(fit)'))


Call:
lm(formula = mpg ~ wt + cyl, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.2893 -1.5512 -0.4684  1.5743  6.1004 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  39.6863     1.7150  23.141  < 2e-16 ***
wt           -3.1910     0.7569  -4.216 0.000222 ***
cyl          -1.5078     0.4147  -3.636 0.001064 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.568 on 29 degrees of freedom
Multiple R-squared:  0.8302,	Adjusted R-squared:  0.8185 
F-statistic: 70.91 on 2 and 29 DF,  p-value: 6.809e-12




In [12]:
from numpy import *
import scipy as sp
from pandas import *
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
import pandas.rpy.common as com

In [13]:
stats = importr('stats')
base = importr('base')
datasets = importr('datasets')

In [14]:
df = com.load_data('mtcars')


In [15]:
rdf = com.convert_to_r_dataframe(df)

In [16]:
formula = 'mpg ~ wt + cyl'

In [17]:
fit_full = stats.lm(formula, data=rdf)
print(base.summary(fit_full))


Call:
(function (formula, data, subset, weights, na.action, method = "qr", 
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
    contrasts = NULL, offset, ...) 
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action", 
        "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- quote(stats::model.frame)
    mf <- eval(mf, parent.frame())
    if (method == "model.frame") 
        return(mf)
    else if (method != "qr") 
            method), domain = NA)
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (!is.null(w) && !is.numeric(w)) 
        stop("'weights' must be a numeric vector")
    offset <- as.vector(model.offset(mf))
    if (!is.null(offset)) {
        if (length(offset) != NROW(y)) 
            stop(gettextf("number of offsets is %d, shou

In [19]:
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
graphics = importr('graphics')
grdevices = importr('grDevices')
base = importr('base')
stats = importr('stats')
# ro.r('sys.putenv("DISPLAY"=":0.0")')

import array

x = array.array('i', range(10))
y = stats.rnorm(10)

# grdevices.X11()

graphics.par(mfrow = array.array('i', [2,2]))
graphics.plot(x, y, ylab = "foo/bar", col = "red")

kwargs = {'ylab':"foo/bar", 'type':"b", 'col':"blue", 'log':"x"}
graphics.plot(x, y, **kwargs)


m = base.matrix(stats.rnorm(100), ncol=5)
pca = stats.princomp(m)
graphics.plot(pca, main="Eigen values")
stats.biplot(pca, main="biplot")

rpy2.rinterface.NULL