# A/B Hands-On Exercises

In [1]:
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import scipy
import seaborn as sns
from scipy import stats

## Exercise 1: Match the functionality with the library
This is a classic matching exercise. The libraries in this course are used quite often, so a quick recall of which one does what will aid in your learning across all lessons.

From the imports above, match which library or libraries:
  1. is used to define models
  2. provides the data container for model results
  3. are used for plotting
  4. is specialized for Bayesian plotting
  5. is specialized for data manipulation and cleaning

## Exercise 2: Estimate conversion for Website A
We've launched a new website and your boss wants you to estimate the conversion rate. Your boss tells you that there wasn't a lot of money for testing the new website, so you only have 20 data points for the new website, **Website A**.

### Part 1: Estimating `p(Conversion)`

Give your boss an estimate of conversion rate in two ways:

1. Calculate the point estimate of the conversion percentage using pandas or plain Python
  * This is what most people would do using Excel
2. Use the model from the A/B testing lesson and give her an estimate with uncertainty
  * Hint: The posterior plot will be quite helpful

In [2]:
websites = pd.read_csv("data/websites.csv")
websites.head(20)

Unnamed: 0,Website A,Website B
0,1,0
1,1,1
2,0,0
3,0,1
4,0,0
5,1,1
6,0,0
7,1,1
8,1,0
9,1,1
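
As a minimal sketch of the point-estimate step, the snippet below uses only the ten Website A values displayed above, typed in by hand, so it is not the answer for the full dataset. The variable name `website_a_shown` is an illustrative assumption and does not appear in the notebook.

```python
# Website A values from the ten rows displayed above (not the full dataset).
website_a_shown = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]

# Each visit is coded 1 (converted) or 0 (did not convert),
# so the mean of the column is the observed conversion rate.
conversions = sum(website_a_shown)
visits = len(website_a_shown)
point_estimate = conversions / visits

print(f"{conversions} conversions in {visits} visits: {point_estimate:.0%}")
```

With a pandas Series such as `website_a`, calling `website_a.mean()` computes the same quantity.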


In [3]:
website_a = websites["Website A"]

In [4]:
# Calculate the point estimate of the conversion percentage

In [5]:
# Fill in the PyMC Model

In [6]:
# Plot the posterior plot

### Part 2: The A/B test

Your boss tells you there also wasn't any money for logging the control website, Website B, so you also only have 20 samples. Run the same analysis as before, but for Website B, and compare your results. Specifically:

1. Calculate the point estimate of the conversion percentage using pandas or plain Python, and draw a conclusion about which website is better.
2. As a professional statistician, you have to provide advice, not just code. Based on the point estimates, would you feel comfortable telling your boss that Website B is better (or worse)?
3. Use your Bayesian model to estimate the conversion rate with uncertainty as well. Use a forest plot to compare the results from the two models. If your boss asked which website has the higher conversion rate, what would you advise now? How did your confidence in your recommendation change?
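
To illustrate why a comparison with uncertainty can tell a different story from two point estimates, here is a rough sketch that does not use the PyMC model from the lesson. It swaps in a conjugate shortcut: assuming a uniform Beta(1, 1) prior (an assumption, the lesson's prior may differ), the posterior for a conversion rate is again a Beta distribution, which the standard library can sample directly. It uses only the ten rows displayed earlier, and the helper names `posterior_draws` and `central_interval` are hypothetical. The interval is equal-tailed, which is not the same thing as the HDI that ArviZ reports.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Values from the ten rows displayed earlier (not the full dataset).
website_a_shown = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
website_b_shown = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]


def posterior_draws(data, n_draws=20_000):
    """Draw from the Beta posterior implied by a uniform Beta(1, 1) prior."""
    successes = sum(data)
    failures = len(data) - successes
    return [random.betavariate(1 + successes, 1 + failures) for _ in range(n_draws)]


def central_interval(draws, mass=0.94):
    """Equal-tailed interval containing roughly `mass` of the draws."""
    ordered = sorted(draws)
    lower = ordered[int((1 - mass) / 2 * len(ordered))]
    upper = ordered[int((1 + mass) / 2 * len(ordered)) - 1]
    return lower, upper


draws_a = posterior_draws(website_a_shown)
draws_b = posterior_draws(website_b_shown)

interval_a = central_interval(draws_a)
interval_b = central_interval(draws_b)

# Share of paired draws in which A's conversion rate exceeds B's.
prob_a_higher = sum(a > b for a, b in zip(draws_a, draws_b)) / len(draws_a)

print(f"Website A 94% interval: {interval_a[0]:.2f} to {interval_a[1]:.2f}")
print(f"Website B 94% interval: {interval_b[0]:.2f} to {interval_b[1]:.2f}")
print(f"P(A converts better than B): {prob_a_higher:.2f}")
```

If the two intervals overlap heavily, the data do not clearly separate the websites, even when the point estimates differ.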

In [7]:
website_b = websites["Website B"]

In [8]:
# Calculate the point estimate of the conversion percentage

In [9]:
# Create a Bayesian model for the conversion of Website B

In [10]:
# Compare the two. Hint: The forest plot is quite useful here

### Part 3: Rerun your analysis with more data
You convince your boss that it's worth getting more samples. You warn her that if she doesn't invest in proper analysis she could lose out on millions of dollars in sales. She reluctantly agrees. 

Now you have 5000 samples for each website. Rerun your analysis from above, and now provide a recommendation as to which website has a better conversion rate.

In [11]:
websites = pd.read_csv("data/websites_5000_samples.csv")
websites.shape

(5000, 2)

In [12]:
website_a = websites["Website A"]

In [13]:
# Create a model for Website A

In [14]:
website_b = websites["Website B"]

In [15]:
# Create a model for Website B

In [16]:
# Compare them again

### Part 4: See the effect of more data
You also want to check your own assumptions. Compare your conversion estimate for Website A from 20 samples with the one from 5000 samples.

1. Before writing any code, pause and ponder for a minute: would you be more certain or less certain with more data?
2. Use a forest plot to compare your estimate from 20 samples with your estimate from 5000 samples. Do the results match your intuition?
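
Once you have formed your own guess, one way to sanity-check it without running a sampler is the closed-form standard deviation of a Beta posterior. The sketch below again assumes a uniform Beta(1, 1) prior, which may differ from the lesson's model, and it uses hypothetical counts (a 50% observed rate at both sample sizes) rather than the real data. The function name `beta_posterior_sd` is illustrative only.

```python
from math import sqrt


def beta_posterior_sd(successes, trials):
    """Standard deviation of the Beta posterior under a uniform Beta(1, 1) prior."""
    a = 1 + successes
    b = 1 + trials - successes
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Hypothetical counts: the same 50% observed rate at both sample sizes.
sd_small = beta_posterior_sd(successes=10, trials=20)
sd_large = beta_posterior_sd(successes=2500, trials=5000)

print(f"n = 20:   posterior sd = {sd_small:.4f}")
print(f"n = 5000: posterior sd = {sd_large:.4f}")
print(f"ratio: {sd_small / sd_large:.1f}")
```

The forest plot from your own models should show the same qualitative pattern in the width of the intervals.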

In [17]:
# Compare the inference data objects 

## Exercise 3: Reading docs
Documentation is an important tool in becoming a proficient computational data scientist or statistician.

Using the [ArviZ documentation](https://arviz-devs.github.io/arviz/), determine how to:
1. Generate a rankplot from the output of `pm.sample()`
2. Convert the Forest Plot into a Ridge Plot
3. **Optional:** Access the posterior group of the returned `InferenceData`, and get the mean of the samples across all dimensions

You can use any of the inference data objects available in these exercises.

In [18]:
# Generate a rank plot

In [19]:
# Generate a ridge plot

In [20]:
# Get the mean of the posterior