**Flowbot pricing test A/B testing data challenge**


# Pricing test

## Goal

Pricing optimization is, unsurprisingly, another area where data science can provide huge value. 
The goal here is to evaluate whether a pricing test running on the site has been successful. As always, focus on user segmentation and provide insights about segments that behave differently, as well as any other insights you find. 


## Summary of results
1.	Should the company sell its software for USD39 or USD59?
    - The company should sell for USD59 because it leads to increased revenue per user despite fewer conversions. With different statistical approaches...
    
2.	What are your main findings looking at the data?
    - Certain segments... ... 
    
3.	After how many days would you have stopped the test?
    - From power analysis...


## Data & Features (TBD)

## Exercise description

Company XYZ sells software for USD39. Since revenue has been flat for some time, the VP of Product has decided to run a test increasing the price. She hopes that this will increase revenue. In the experiment, 66% of the users saw the old price (USD39), while a random sample of 33% of the users saw a higher price (USD59). 
The test has been running for some time and the VP of Product is interested in understanding how it went and whether it would make sense to increase the price for all the users. 

Specifically, she asked you the following questions: 
 - **Should the company sell its software for USD39 or USD59?**

- The VP of Product is interested in having a holistic view of user behavior, especially focusing on actionable insights that might increase conversion rate.
- **What are your main findings looking at the data?**

- [Bonus] The VP of Product feels that the test has been running for too long and that she should have been able to get statistically significant results in a shorter time. Do you agree with her intuition?
- **After how many days would you have stopped the test?** Please explain why. 


## Setup

In [1]:
# Import libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
import seaborn as sns
import pandas_profiling as pp
import etsy_py
from scipy.stats import shapiro
from scipy.stats import skew
from scipy.stats import kurtosis

In [2]:
# Stats
from scipy.stats import mannwhitneyu
from scipy.stats import ttest_ind
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats import multitest

In [18]:
# Code formatting Jupyter black
%load_ext nb_black

# Colors
B_beige = "#CDA577"
B_brown = "#643E34"
B_slate = "#3F5B66"
B_dkgray = "#5A7E8E"
B_ltgray = "#6D949B"
B_green = "#01CB8B"
B_lime = "#D3F04A"

B_colors = [B_beige, B_brown, B_slate, B_dkgray, B_ltgray, B_green, B_lime]
B_colors_cat = [B_beige, B_green, B_brown, B_ltgray, B_slate, B_lime, B_dkgray]


# Initial data analysis

In [4]:
def initial_analysis(df):
    """
    Given a dataframe produces a simple report on initial data analytics
    Params:
        - df 
    Returns:
        - Shape of dataframe records and columns
        - Columns and data types
    """
    print("Report of Initial Data Analysis:\n")
    print(f"Shape of dataframe: {df.shape}")
    print(f"Features and Data Types: \n {df.dtypes}")


In [5]:
def percent_missing(df):
    """
    Given a dataframe it calculates the percentage of missing records per column
    Params:
        - df
    Returns:
        - Dictionary of column name and percentage of missing records
    """
    col = list(df.columns)
    perc = [round(df[c].isna().mean() * 100, 2) for c in col]
    miss_dict = dict(zip(col, perc))
    return miss_dict


In [7]:
# os.getcwd()
os.chdir(
    "/Users/lacar/Documents/Goals_and_careers/Edu_Data_Science/Insight/data_challenges"
)
os.listdir()

['.DS_Store',
 'dc2_credit_card_unsupervised',
 'BL_data_challenge_notebook.ipynb',
 'Building Data Challenge Notebook.ipynb',
 'dc1_breast_cancer_supervised',
 'dc3_product_analytics',
 '.ipynb_checkpoints',
 'brl_data_challenges',
 'flowbot_data_challenge']


## Results df overview

In [8]:
# Load data
df_results = pd.read_csv(
    "flowbot_data_challenge/Pricing_test_DataChallenge/Pricing_Test_data/test_results.csv"
)
df_results.head()

| | user_id | timestamp | source | device | operative_system | test | price | converted |
|---|---|---|---|---|---|---|---|---|
| 0 | 604839 | 2015-05-08 03:38:34 | ads_facebook | mobile | iOS | 0 | 39 | 0 |
| 1 | 624057 | 2015-05-10 21:08:46 | seo-google | mobile | android | 0 | 39 | 0 |
| 2 | 317970 | 2015-04-04 15:01:23 | ads-bing | mobile | android | 0 | 39 | 0 |
| 3 | 685636 | 2015-05-07 07:26:01 | direct_traffic | mobile | iOS | 1 | 59 | 0 |
| 4 | 820854 | 2015-05-24 11:04:40 | ads_facebook | web | mac | 0 | 39 | 0 |



In [10]:
initial_analysis(df_results)

Report of Initial Data Analysis:

Shape of dataframe: (316800, 8)
Features and Data Types: 
 user_id              int64
timestamp           object
source              object
device              object
operative_system    object
test                 int64
price                int64
converted            int64
dtype: object



## Users df overview

In [12]:
df_users = pd.read_csv(
    "flowbot_data_challenge/Pricing_test_DataChallenge/Pricing_Test_data/user_table.csv"
)
df_users.head()

| | user_id | city | country | lat | long |
|---|---|---|---|---|---|
| 0 | 510335 | Peabody | USA | 42.53 | -70.97 |
| 1 | 89568 | Reno | USA | 39.54 | -119.82 |
| 2 | 434134 | Rialto | USA | 34.11 | -117.39 |
| 3 | 289769 | Carson City | USA | 39.15 | -119.74 |
| 4 | 939586 | Chicago | USA | 41.84 | -87.68 |



In [11]:
initial_analysis(df_users)

Report of Initial Data Analysis:

Shape of dataframe: (275616, 5)
Features and Data Types: 
 user_id      int64
city        object
country     object
lat        float64
long       float64
dtype: object



## Merge data

In [13]:
df_merge = pd.merge(df_results, df_users, how="inner", on="user_id")
df_merge.head()

| | user_id | timestamp | source | device | operative_system | test | price | converted | city | country | lat | long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 604839 | 2015-05-08 03:38:34 | ads_facebook | mobile | iOS | 0 | 39 | 0 | Buffalo | USA | 42.89 | -78.86 |
| 1 | 624057 | 2015-05-10 21:08:46 | seo-google | mobile | android | 0 | 39 | 0 | Lakeville | USA | 44.68 | -93.24 |
| 2 | 317970 | 2015-04-04 15:01:23 | ads-bing | mobile | android | 0 | 39 | 0 | Parma | USA | 41.38 | -81.73 |
| 3 | 685636 | 2015-05-07 07:26:01 | direct_traffic | mobile | iOS | 1 | 59 | 0 | Fayetteville | USA | 35.07 | -78.9 |
| 4 | 820854 | 2015-05-24 11:04:40 | ads_facebook | web | mac | 0 | 39 | 0 | Fishers | USA | 39.95 | -86.02 |



In [14]:
# Verify group assignments
df_merge.groupby(["test", "price"]).count()["user_id"]

test  price
0     39       176241
      59          187
1     39          135
      59        99053
Name: user_id, dtype: int64


In [15]:
# Drop the users that are mis-assigned
bool_mis = ((df_merge["test"] == 0) & (df_merge["price"] == 59)) | (
    (df_merge["test"] == 1) & (df_merge["price"] == 39)
)
# bool_mis.sum()
df_merge = df_merge.loc[~bool_mis, :]


In [16]:
df_merge.shape

(275294, 12)
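
The inner merge keeps only users that appear in both tables, so the row count falls from 316,800 in the test results to 275,616 after the merge, and to 275,294 once the mis-assigned users are dropped. A minimal sketch of how the unmatched rows could be counted before they disappear, using a left merge with `indicator=True`. The toy frames `results` and `users` are hypothetical stand-ins for `df_results` and `df_users`.

```python
import pandas as pd

# Hypothetical toy data standing in for df_results and df_users
results = pd.DataFrame({"user_id": [1, 2, 3, 4], "converted": [0, 1, 0, 0]})
users = pd.DataFrame({"user_id": [1, 2, 4], "country": ["USA", "USA", "USA"]})

# A left merge keeps every test row; the _merge column records the match status
audit = results.merge(users, how="left", on="user_id", indicator=True)
match_counts = audit["_merge"].value_counts()

# Rows flagged "left_only" exist in the test results but not in the user table
n_unmatched = int((audit["_merge"] == "left_only").sum())
print(match_counts)
print(f"Test rows without a user record: {n_unmatched}")
```

If the unmatched rows are spread evenly across the two test groups, dropping them should not bias the comparison, but that is worth checking rather than assuming.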


# Exploratory data analysis

In [19]:
# Set context
sns.set_context(
    "talk", rc={"font.size": 14, "axes.titlesize": 14, "axes.labelsize": 14}
)


## Calculate the difference

In [20]:
# Generate revenue per person
df_merge["revenue"] = df_merge["price"] * df_merge["converted"]


In [21]:
df_merge.head()

| | user_id | timestamp | source | device | operative_system | test | price | converted | city | country | lat | long | revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 604839 | 2015-05-08 03:38:34 | ads_facebook | mobile | iOS | 0 | 39 | 0 | Buffalo | USA | 42.89 | -78.86 | 0 |
| 1 | 624057 | 2015-05-10 21:08:46 | seo-google | mobile | android | 0 | 39 | 0 | Lakeville | USA | 44.68 | -93.24 | 0 |
| 2 | 317970 | 2015-04-04 15:01:23 | ads-bing | mobile | android | 0 | 39 | 0 | Parma | USA | 41.38 | -81.73 | 0 |
| 3 | 685636 | 2015-05-07 07:26:01 | direct_traffic | mobile | iOS | 1 | 59 | 0 | Fayetteville | USA | 35.07 | -78.9 | 0 |
| 4 | 820854 | 2015-05-24 11:04:40 | ads_facebook | web | mac | 0 | 39 | 0 | Fishers | USA | 39.95 | -86.02 | 0 |



In [46]:
groupA = df_merge[df_merge["test"] == 0].loc[:, "revenue"].copy()
groupB = df_merge[df_merge["test"] == 1].loc[:, "revenue"].copy()

n_A = len(groupA)
n_B = len(groupB)

se_A = groupA.std() / np.sqrt(n_A)
se_B = groupB.std() / np.sqrt(n_B)


In [34]:
print("Group A mean, SD, SE: ", groupA.mean(), groupA.std(), se_A)
print("Group B mean, SD, SE: ", groupB.mean(), groupB.std(), se_B)

Group A mean, SD, SE:  0.7709670281035628 5.428949361675566 0.01293189937458914
Group B mean, SD, SE:  0.9113302979213148 7.275884764868449 0.02311809240828995



In [36]:
se_of_diff = np.sqrt(se_A ** 2 + se_B ** 2)
print("SE of difference: ", se_of_diff)

SE of difference:  0.026489247215289686



Generate a 95% confidence interval for the difference in mean revenue per user:

$(\bar{x}_A - \bar{x}_B) \pm 1.96 \cdot \mathrm{SE}(\bar{x}_A - \bar{x}_B)$



In [44]:
lower_bound = np.abs(groupA.mean() - groupB.mean()) - 1.96 * (se_of_diff)
upper_bound = np.abs(groupA.mean() - groupB.mean()) + 1.96 * (se_of_diff)

if groupA.mean() > groupB.mean():
    higher_group = "group A"
else:
    higher_group = "group B"

print(
    "Interval for difference is: Between ",
    "{:.3f}".format(lower_bound) + " and ",
    "{:.3f}".format(upper_bound),
)

print(higher_group + " is greater")

Interval for difference is: Between  0.088 and  0.192
group B is greater
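
The interval above is built on the absolute difference, with the direction reported separately. A minimal sketch of a signed alternative (B minus A) that works directly from the summary statistics printed earlier. The helper name `diff_ci` is hypothetical, and the p-value relies on a normal approximation, which is an assumption that seems reasonable at these sample sizes.

```python
import math
from statistics import NormalDist


def diff_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Signed CI for mean_b - mean_a using unpooled standard errors."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    diff = mean_b - mean_a
    return diff, se_diff, (diff - z * se_diff, diff + z * se_diff)


# Summary statistics copied from the printed output above
diff, se_diff, (lower, upper) = diff_ci(
    mean_a=0.7709670281035628, sd_a=5.428949361675566, n_a=176241,
    mean_b=0.9113302979213148, sd_b=7.275884764868449, n_b=99053,
)

# Two-sided p-value from the normal approximation
z_stat = diff / se_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"B - A = {diff:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
print(f"z = {z_stat:.2f}, p = {p_value:.2g}")
```

Because the whole interval lies above zero, the higher price is associated with more revenue per user in this sample, even though fewer users convert.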



## What is the right sample size?



### Same SD for each group, change n for each group

Let's assume the observed means and SDs stay the same at smaller sample sizes (a big assumption, although the samples are large), and keep the approximate ratio of group sizes (B/A is 0.56203).

In [50]:
print("Sample sizes (A and B): ", n_A, n_B)

Sample sizes (A and B):  176241 99053



In [58]:
print(n_B / n_A)

0.5620315363621404



In [54]:
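def return_ci(sample_A_size, sample_B_size):
    """
    Print the approximate 95% CI for the difference in mean revenue,
    assuming the observed group means and SDs hold at the given sample sizes.
    Relies on the global series groupA and groupB.
    """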
    se_A_sample = groupA.std() / np.sqrt(sample_A_size)
    se_B_sample = groupB.std() / np.sqrt(sample_B_size)

    se_pooled_of_diff = np.sqrt(se_A_sample ** 2 + se_B_sample ** 2)

    lower_bound_test = np.abs(groupA.mean() - groupB.mean()) - 1.96 * (
        se_pooled_of_diff
    )
    upper_bound_test = np.abs(groupA.mean() - groupB.mean()) + 1.96 * (
        se_pooled_of_diff
    )

    if groupA.mean() > groupB.mean():
        higher_group = "group A"
    else:
        higher_group = "group B"

    print(
        "Interval for difference is: Between ",
        "{:.3f}".format(lower_bound_test) + " and ",
        "{:.3f}".format(upper_bound_test),
    )

    print(higher_group + " is greater")


In [59]:
# Test different n values
n_testA = len(groupA)
n_testB = round(0.56203 * n_testA)

return_ci(n_testA, n_testB)

Interval for difference is: Between  0.088 and  0.192
group B is greater



In [63]:
for test_size_a in [175000, 150000, 125000, 100000, 75000, 50000, 25000, 10000]:
    n_testA = test_size_a
    n_testB = round(0.56203 * n_testA)
    print("sample sizes: ", n_testA, n_testB)
    return_ci(n_testA, n_testB)
    print("\n")

sample sizes:  175000 98355
Interval for difference is: Between  0.088 and  0.192
group B is greater


sample sizes:  150000 84304
Interval for difference is: Between  0.084 and  0.197
group B is greater


sample sizes:  125000 70254
Interval for difference is: Between  0.079 and  0.202
group B is greater


sample sizes:  100000 56203
Interval for difference is: Between  0.071 and  0.209
group B is greater


sample sizes:  75000 42152
Interval for difference is: Between  0.061 and  0.220
group B is greater


sample sizes:  50000 28102
Interval for difference is: Between  0.043 and  0.238
group B is greater


sample sizes:  25000 14051
Interval for difference is: Between  0.003 and  0.278
group B is greater


sample sizes:  10000 5620
Interval for difference is: Between  -0.078 and  0.358
group B is greater
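
The sweep above shows the interval still excluding zero at 25,000 users in group A but not at 10,000. Under the same assumptions (fixed means, fixed SDs, B/A ratio of 0.56203), the threshold can be solved for directly instead of by trial. The sketch below does that and then converts a required sample size into days. The helper `days_to_reach` and the flat traffic of 2,000 control users per day are hypothetical, for illustration only; real daily counts would come from the `timestamp` column. Because it uses the effect observed at the end of the test, this is a retrospective calculation, not a stopping rule that could have been applied in advance.

```python
import math
from itertools import accumulate

# Observed values copied from the outputs above
sd_a, sd_b = 5.428949361675566, 7.275884764868449
diff = 0.9113302979213148 - 0.7709670281035628
ratio = 0.56203  # n_B / n_A
z = 1.96

# 1.96 * sqrt(sd_a^2 / n_A + sd_b^2 / (ratio * n_A)) < diff, solved for n_A
min_n_a = math.ceil((z / diff) ** 2 * (sd_a ** 2 + sd_b ** 2 / ratio))
min_n_b = math.ceil(ratio * min_n_a)
print("Smallest sample sizes (A, B):", min_n_a, min_n_b)


def days_to_reach(daily_counts, target):
    """Return the 1-based day on which the cumulative count first reaches target."""
    for day, total in enumerate(accumulate(daily_counts), start=1):
        if total >= target:
            return day
    return None  # target never reached


# Hypothetical flat traffic of 2,000 control users per day, for illustration only
print("Days needed:", days_to_reach([2000] * 90, min_n_a))
```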





### Use a sample size for a desired margin of error

$n = \left( \dfrac{z \cdot (\text{est. of variability})}{\text{margin}} \right)^2$


In [None]:
# Using online calculator

# Two-sided test: mu1=groupA_mean, mu2=groupB_mean, "sigma"=pooled_SE, alpha=.05


# One-sided test: mu1=groupA_mean, mu2=groupB_mean, "sigma"=pooled_SE, alpha=.025

