# A/B Test: A New Menu Launch
## Project Overview
You're a business analyst for Round Roasters, a coffee restaurant chain in the United States of America. The executive team conducted a market test with a new menu and needs to figure out whether the new menu can drive enough sales to offset the cost of marketing it. Your job is to analyze the A/B test and write up a recommendation as to whether the Round Roasters chain should launch this new menu.

In [None]:
# Load packages
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
from scipy.stats import ttest_ind, ttest_rel
from sklearn.neighbors import KDTree
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

from statsmodels.tsa.seasonal import seasonal_decompose

import matplotlib.pyplot as plt
# plt.style.use('seaborn-whitegrid')
plt.rcParams['figure.figsize'] = [12, 12]

## Step 1: Plan Your Analysis
To perform the correct analysis, you will need to prepare a data set. Prior to rolling up your sleeves and preparing the data, it’s a good idea to have a plan of what you need to do in order to prepare the correct data set. A good plan will help you with your analysis. Here are a few questions to get you started:

- What is the performance metric you’ll use to evaluate the results of your test?
- What is the test period?
- At what level (day, week, month, etc.) should the data be aggregated?
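The answers to these questions can be pinned down as constants before any data work. The sketch below is minimal. It assumes the values stated in the Step 2 code comments: Gross Margin as the metric, Friday-to-Thursday weeks, a 2016 test window, and a 2015 comparison window. The names `PERFORMANCE_METRIC`, `AGGREGATION_LEVEL`, `test_period`, `pre_period`, and `n_weeks` are hypothetical. The sketch also checks that both windows span exactly 12 weeks.

```python
from datetime import date, timedelta

# Hypothetical plan constants, taken from the comments in the Step 2 code
PERFORMANCE_METRIC = 'Gross Margin'   # summed per store per week
AGGREGATION_LEVEL = 'week'            # weeks run Friday through Thursday
test_period = (date(2016, 4, 29), date(2016, 7, 21))  # treatment (post) period
pre_period = (date(2015, 4, 29), date(2015, 7, 21))   # same weeks one year earlier


def n_weeks(period):
    """Length of an inclusive (start, end) date range, in weeks."""
    start, end = period
    return (end - start + timedelta(days=1)).days / 7


print(n_weeks(test_period), n_weeks(pre_period))
print(test_period[0].strftime('%A'), pre_period[0].strftime('%A'))
```

Both windows cover 84 days. The test window starts on a Friday and the comparison window on a Wednesday, which is why the weekly bins are anchored on Fridays.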

In [None]:
# load Stores data
stores_data = pd.read_csv('round-roaster-stores.csv')
stores_data.info()

In [None]:
stores_data.head(3)

In [None]:
# load Transactions data
# force Invoice Date column to datetime 
transactions_data = pd.read_csv('RoundRoastersTransactions.csv', parse_dates=['Invoice Date'])
transactions_data.info()

In [None]:
transactions_data.head()

In [None]:
# load Treatment Stores data
treatment_stores_data = pd.read_csv('treatment-stores.csv')
treatment_stores_data.info()

In [None]:
treatment_stores_data.head(3)

## Step 2: Clean Up Your Data
In this step, you should prepare the data for steps 3 and 4. You should aggregate the transaction data to the appropriate level and filter on the appropriate date ranges. You can assume that there is no missing, incomplete, duplicate, or dirty data. You’re ready to move on to the next step when you have weekly transaction data for all stores.
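You can check the weekly roll-up on a tiny example before running it on the full transaction file. The sketch below is minimal and uses made-up invoices; the `toy` frame is hypothetical. It assumes weeks run Friday to Thursday and are labeled by their first day. `pd.Grouper` expresses that with `freq='W-FRI'`, `closed='left'`, and `label='left'`.

```python
import pandas as pd

# Hypothetical line items: one store, two invoices in the first week, one in the next
toy = pd.DataFrame({
    'Invoice Date': pd.to_datetime(['2016-04-29', '2016-05-05', '2016-05-05', '2016-05-06']),
    'StoreID': [1, 1, 1, 1],
    'Invoice Number': [100, 101, 101, 102],
    'Gross Margin': [2.0, 3.0, 1.5, 4.0],
})

# Friday-to-Thursday weeks, each labeled by its starting Friday
week = pd.Grouper(key='Invoice Date', freq='W-FRI', closed='left', label='left')
weekly = (
    toy.groupby([week, 'StoreID'])
       .agg({'Gross Margin': 'sum', 'Invoice Number': 'nunique'})
       .rename(columns={'Invoice Number': 'Weekly Foot Traffic'})
       .reset_index()
)
print(weekly)
```

Foot traffic counts unique invoice numbers, so two line items on the same invoice count as one visit.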

In [None]:
# Test cities: Denver and Chicago
# Treatment (post) period: 12 weeks [2016-04-29 to 2016-07-21], starting on a Friday
# Comparison (pre) period: 12 weeks [2015-04-29 to 2015-07-21], the same weeks one year earlier, starting on a Wednesday
# History needed for trend and seasonality: 52 weeks + 12 treatment weeks + 12 pre-period weeks = 76 weeks

data_end_date = datetime(2016, 7, 21)
data_start_date = data_end_date - timedelta(weeks=76)
print(f'Data Start Date: {data_start_date} \nData End Date: {data_end_date}')

In [None]:
# Keep only the analysis window
_trans = transactions_data.query('`Invoice Date` > @data_start_date and `Invoice Date` <= @data_end_date')

# Aggregate to weekly gross margin and weekly foot traffic (count of unique invoices) per store.
# Weeks run Friday to Thursday and are labeled by their first day (the Friday).
agg_rules = {'Gross Margin': 'sum', 'Invoice Number': 'nunique'}
weekly_grouper = pd.Grouper(key='Invoice Date', freq='W-FRI', closed='left', label='left')
_trans = _trans.groupby([weekly_grouper, 'StoreID']).agg(agg_rules).reset_index()
_trans.rename(columns={'Invoice Number': 'Weekly Foot Traffic'}, inplace=True)

# Decompose each store's weekly foot traffic separately (12-week period) into trend and seasonal components
store_frames = []
for _, store_df in _trans.groupby('StoreID'):
    store_df = store_df.sort_values('Invoice Date')
    result = seasonal_decompose(store_df['Weekly Foot Traffic'], period=12, extrapolate_trend='freq')
    store_frames.append(store_df.assign(Trend=result.trend, Seasonal=result.seasonal))
_trans = pd.concat(store_frames, ignore_index=True)

# join weekly transactions and stores data 
stores_columns = ['StoreID', 'Sq_Ft', 'AvgMonthSales', 'Region']
_trans = _trans.merge(stores_data[stores_columns], on='StoreID')

# Add group variables to merged data
_is_treatment = _trans['StoreID'].isin(treatment_stores_data['StoreID'])
weekly_gross_and_traffic = _trans.assign(Group = np.where(_is_treatment, 'Treatment', 'Control'))

# Sanity check: inspect a single store-week
weekly_gross_and_traffic.query('StoreID == 10018 and `Invoice Date` == @datetime(2015, 2, 6)')

In [None]:
# Split the weekly data into the post period (2016 treatment weeks) and the pre period (same weeks in 2015)
post_data = weekly_gross_and_traffic.query('`Invoice Date` >= @datetime(2016, 4, 29)')
pre_data = weekly_gross_and_traffic.query('`Invoice Date` >= @datetime(2015, 4, 29) and `Invoice Date` <= @datetime(2015, 7, 21)')

pre_data.head()

In [None]:
# Aggregate Pre and Post data per store
agg_store_rules = {
    'Gross Margin': 'sum', 
    'Weekly Foot Traffic': 'sum', 
    'Trend': 'sum', 
    'Seasonal': 'sum', 
    'Sq_Ft': 'first',
    'AvgMonthSales': 'first',
    'Region': 'first',
    'Group': 'first'
}

def agg_per_store(df, rules=agg_store_rules):
    """Roll the weekly rows of one period up to a single row per store."""
    return df.groupby('StoreID', as_index=False).agg(rules)

In [None]:
# Pre Data per store
pre_data_per_store = agg_per_store(pre_data)

# Post Data per store
post_data_per_store = agg_per_store(post_data)

In [None]:
post_data_per_store.head()

## Step 3: Match Treatment and Control Units
In this step, you should create the trend and seasonality variables and use them, along with your other control variable(s), to match two control units to each treatment unit. Treatment stores should be matched to control stores in the same region. Note: Calculate the number of transactions per store per week and use 12 periods to calculate trend and seasonality.  

Apart from trend and seasonality...  

- What control variables should be considered? Note: Only consider variables in the RoundRoastersStore file.
- What is the correlation between each potential control variable and your performance metric? (Example of correlation matrix below)
- What control variables will you use to match treatment and control stores?

In [None]:
# Correlation Gross Margin and Stores variables
pre_data_per_store[['Gross Margin', 'Sq_Ft', 'AvgMonthSales']].corr().round(2)


In [None]:
selected_variables = ['Trend', 'Seasonal', 'AvgMonthSales']
selected_variables

## Step 4: Analysis and Writeup
Conduct your A/B analysis and create a short report outlining your results and recommendations.  

In an A/B analysis, we use the correlation matrix to find the control variable most strongly correlated with the performance metric. We then include that variable, alongside trend and seasonality, in the AB Controls tool to find the best matches.
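Picking the matching variable from a correlation matrix amounts to ranking candidates by the absolute value of their correlation with the metric. The sketch below is minimal and uses hypothetical per-store numbers; the `stores` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical pre-period summary for four stores
stores = pd.DataFrame({
    'Gross Margin': [10.0, 20.0, 30.0, 40.0],
    'Sq_Ft': [1000, 1200, 900, 1100],
    'AvgMonthSales': [100, 210, 290, 405],
})

# Correlation of each candidate control variable with the performance metric
corr_with_metric = stores.corr()['Gross Margin'].drop('Gross Margin')

# Rank candidates by strength of correlation, ignoring sign
ranking = corr_with_metric.abs().sort_values(ascending=False)
best_variable = ranking.index[0]
print(ranking)
```

In this toy data `Sq_Ft` is uncorrelated with gross margin, so `AvgMonthSales` ranks first.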

In [None]:
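# Split the pre-period store summary into control and treatment stores
pre_control_stores = pre_data_per_store.query('Group == "Control"')
pre_treatment_stores = pre_data_per_store.query('Group == "Treatment"')
# Regions used for within-region matching
regions = ['Central', 'West']
# Standardize the matching variables (fit on control stores) so each contributes comparably to distances
transformer = ColumnTransformer([('scaler', StandardScaler(), selected_variables)], remainder='drop')
transformer.fit(pre_control_stores)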


In [None]:
# TODO
# Match treatment and control stores (version 2)
# return: 'Treatment store ID', 'Control store ID', 'Pre store gross', 'Post store gross', 'Region'
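
One way to fill in this cell is a nearest-neighbour search within each region. The sketch below is minimal. It assumes a per-store frame with `StoreID`, `Region`, `Group`, and the matching variables; the helper `match_stores` and the toy data are hypothetical. It uses `KDTree` and `StandardScaler`, which are imported at the top of the notebook. Features are scaled on each region's own control stores, unlike the single transformer fitted above; that is a design choice of this sketch. The function returns the two nearest control stores for every treatment store and skips regions without treatment stores.

```python
import pandas as pd
from sklearn.neighbors import KDTree
from sklearn.preprocessing import StandardScaler


def match_stores(stores, features, n_matches=2):
    """Pair each treatment store with its n nearest control stores in the same region."""
    pairs = []
    for region, region_df in stores.groupby('Region'):
        control = region_df[region_df['Group'] == 'Control']
        treatment = region_df[region_df['Group'] == 'Treatment']
        if treatment.empty:
            continue
        # Standardize on this region's control stores so no variable dominates the distance
        scaler = StandardScaler().fit(control[features])
        tree = KDTree(scaler.transform(control[features]))
        # Indices of the nearest control stores, closest first
        _, nearest = tree.query(scaler.transform(treatment[features]), k=n_matches)
        for treatment_id, control_positions in zip(treatment['StoreID'], nearest):
            for pos in control_positions:
                pairs.append({'Treatment store ID': treatment_id,
                              'Control store ID': control['StoreID'].iloc[pos],
                              'Region': region})
    return pd.DataFrame(pairs)


# Hypothetical pre-period store summary
toy_stores = pd.DataFrame({
    'StoreID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Region': ['West'] * 5 + ['Central'] * 4,
    'Group': ['Treatment', 'Control', 'Control', 'Control', 'Control',
              'Treatment', 'Control', 'Control', 'Control'],
    'Trend': [10.0, 11.0, 20.0, 10.5, 50.0, 5.0, 5.0, 30.0, 6.0],
    'AvgMonthSales': [100, 101, 300, 99, 800, 50, 52, 400, 49],
})
matches = match_stores(toy_stores, ['Trend', 'AvgMonthSales'])
print(matches)
```

Pre- and post-period gross margin can then be merged onto `matches` by store ID to produce the remaining columns listed in the comment above.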


In [None]:
# TODO
# t-test: is the change in treatment stores' mean gross margin from pre to post period statistically significant?
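
The same treatment stores are observed in both periods, so a paired t-test (`scipy.stats.ttest_rel`, imported at the top of the notebook) fits this question. The sketch below is minimal and uses hypothetical pre- and post-period gross margin for five treatment stores.

```python
from scipy.stats import ttest_rel

# Hypothetical gross margin per treatment store, summed over each 12-week period
pre_gross = [100.0, 120.0, 90.0, 110.0, 105.0]
post_gross = [130.0, 150.0, 118.0, 141.0, 133.0]

# Paired test: each store is compared with itself, post vs. pre
result = ttest_rel(post_gross, pre_gross)
print(f't = {result.statistic:.2f}, p = {result.pvalue:.4f}')
```

This test shows only whether the treatment stores changed. It does not separate the menu's effect from market-wide movement; the comparison with control stores handles that.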


In [None]:
# TODO
# compare the changes from pre and post between treatment and control
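
A common way to compare the changes is to compute each store's pre-to-post growth and test whether treatment growth differs from control growth. The sketch below is minimal. It assumes hypothetical per-store totals and uses Welch's two-sample t-test (`ttest_ind` with `equal_var=False`). With matched pairs, you could instead average the control growth over each treatment store's two matches.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical per-store gross margin summed over the pre and post periods
stores = pd.DataFrame({
    'StoreID': [1, 2, 3, 4, 5, 6, 7, 8],
    'Group': ['Treatment'] * 4 + ['Control'] * 4,
    'Pre Gross': [100.0, 120.0, 90.0, 110.0, 100.0, 115.0, 95.0, 105.0],
    'Post Gross': [130.0, 150.0, 120.0, 140.0, 102.0, 118.0, 96.0, 108.0],
})

# Relative change from pre to post period for every store
stores['Growth'] = stores['Post Gross'] / stores['Pre Gross'] - 1
treat = stores.loc[stores['Group'] == 'Treatment', 'Growth']
ctrl = stores.loc[stores['Group'] == 'Control', 'Growth']

# Lift: how much more the treatment stores grew than the control stores
lift = treat.mean() - ctrl.mean()
result = ttest_ind(treat, ctrl, equal_var=False)
print(f'lift = {lift:.1%}, p = {result.pvalue:.4f}')
```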
