# A/B Test: A New Menu Launch
## Project Overview
You're a business analyst for Round Roasters, a coffee restaurant chain in the United States of America. The executive team conducted a market test of a new menu and needs to determine whether the new menu can drive enough sales to offset the cost of marketing it. Your job is to analyze the A/B test and write up a recommendation on whether the Round Roasters chain should launch the new menu.

In [2]:
# Load packages
import pandas as pd
import numpy as np
from scipy.stats import ttest_ind
from sklearn.neighbors import KDTree
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

## Step 1: Plan Your Analysis
To perform the correct analysis, you will need to prepare a data set. Before rolling up your sleeves and preparing the data, it’s a good idea to plan what that data set needs to contain; a good plan will guide the rest of your analysis. Here are a few questions to get you started:

- What is the performance metric you’ll use to evaluate the results of your test?
- What is the test period?
- At what level (day, week, month, etc.) should the data be aggregated?

In [4]:
# load Stores data
stores_data = pd.read_csv('round-roaster-stores.csv')
stores_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 133 entries, 0 to 132
Data columns (total 20 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   StoreID                  133 non-null    int64  
 1   Sq_Ft                    133 non-null    int64  
 2   AvgMonthSales            133 non-null    int64  
 3   Right_Name               133 non-null    object 
 4   Phone Number             130 non-null    object 
 5   Street Combined          133 non-null    object 
 6   Street 1                 133 non-null    object 
 7   Street 2                 37 non-null     object 
 8   Street 3                 25 non-null     object 
 9   City                     133 non-null    object 
 10  State                    133 non-null    object 
 11  Postal Code              133 non-null    int64  
 12  Region                   133 non-null    object 
 13  Country                  133 non-null    object 
 14  Coordinates              1

In [7]:
stores_data.head()

Unnamed: 0,StoreID,Sq_Ft,AvgMonthSales,Right_Name,Phone Number,Street Combined,Street 1,Street 2,Street 3,City,State,Postal Code,Region,Country,Coordinates,Latitude,Longitude,Timezone,Current Timezone Offset,Olson Timezone
0,10018,1183,18000,Bellflower & Spring,562-420-1317,"2890 N Bellflower Blvd, #A-1, The Los Altos Ma...",2890 N Bellflower Blvd,#A-1,The Los Altos Marketplace,Long Beach,CA,908151125,West,US,"(33.8085823059082, -118.124931335449)",33.808582,-118.124931,Pacific Standard Time,-480,GMT-08:00 America/Los_Angeles
1,10068,1198,16000,"Foothill & Boston, La Crescenta",818-541-7693,"3747 Foothill Boulevard, A",3747 Foothill Boulevard,A,,La Cresenta,CA,912141700,West,US,"(34.2375450134277, -118.26114654541)",34.237545,-118.261146,Pacific Standard Time,-480,GMT-08:00 America/Los_Angeles
2,10118,1204,13000,Magic Mountain & Tourney,661-260-0844,"25349 Wayne Mills Place, Tourney Retail Plaza",25349 Wayne Mills Place,,Tourney Retail Plaza,Valencia,CA,913551827,West,US,"(34.4237022399902, -118.579261779785)",34.423702,-118.579262,Pacific Standard Time,-480,GMT-08:00 America/Los_Angeles
3,10168,1195,19000,Alameda & Shelton,(818) 557-6604,"1190 Alameda Avenue, Suite G-2",1190 Alameda Avenue,Suite G-2,,Burbank,CA,915062806,West,US,"(34.1625328063965, -118.314529418945)",34.162533,-118.314529,Pacific Standard Time,-480,GMT-08:00 America/Los_Angeles
4,10218,1193,15000,Victoria Gardens,909-646-8562,"12466 North Main Street, Space 3340, Victoria ...",12466 North Main Street,Space 3340,Victoria Gardens,Rancho Cucamonga,CA,917398886,West,US,"(34.1120910644531, -117.533462524414)",34.112091,-117.533462,Pacific Standard Time,-480,GMT-08:00 America/Los_Angeles


In [5]:
# load Transactions data
transactions_data = pd.read_csv('RoundRoastersTransactions.csv')
transactions_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4332333 entries, 0 to 4332332
Data columns (total 10 columns):
 #   Column          Dtype  
---  ------          -----  
 0   StoreID         int64  
 1   Invoice Number  int64  
 2   Invoice Date    object 
 3   SKU             int64  
 4   Category        object 
 5   Product         object 
 6   QTY             int64  
 7   Size            object 
 8   Gross Margin    float64
 9   Sales           float64
dtypes: float64(2), int64(4), object(4)
memory usage: 330.5+ MB


In [6]:
transactions_data.head()

Unnamed: 0,StoreID,Invoice Number,Invoice Date,SKU,Category,Product,QTY,Size,Gross Margin,Sales
0,10018,16296643,2015-01-21,1043,Espresso,Mocha,3,L,6.7365,14.97
1,10018,16296643,2015-01-21,2001,Pastry,Croissant,1,,1.1,2.75
2,10018,16297717,2015-01-21,1021,Espresso,Espresso,3,S,4.185,8.37
3,10018,16297717,2015-01-21,1022,Espresso,Espresso,4,M,5.98,11.96
4,10018,16297717,2015-01-21,1023,Espresso,Espresso,3,L,4.785,9.57
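
`transactions_data.info()` reports `Invoice Date` as `object`, which means the column holds plain strings. If it is left that way, date filtering and weekly grouping will compare text rather than dates. The sketch below parses the column at load time. It runs on the five rows printed above so that it does not need the 4.3-million-row file. `parse_dates` is a standard `pandas.read_csv` parameter.

```python
import io

import pandas as pd

# The five rows printed by transactions_data.head() above, reproduced as CSV text.
sample_csv = """StoreID,Invoice Number,Invoice Date,SKU,Category,Product,QTY,Size,Gross Margin,Sales
10018,16296643,2015-01-21,1043,Espresso,Mocha,3,L,6.7365,14.97
10018,16296643,2015-01-21,2001,Pastry,Croissant,1,,1.1,2.75
10018,16297717,2015-01-21,1021,Espresso,Espresso,3,S,4.185,8.37
10018,16297717,2015-01-21,1022,Espresso,Espresso,4,M,5.98,11.96
10018,16297717,2015-01-21,1023,Espresso,Espresso,3,L,4.785,9.57
"""

# parse_dates converts the column to a datetime dtype while reading.
# With the real data: pd.read_csv('RoundRoastersTransactions.csv', parse_dates=['Invoice Date'])
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=["Invoice Date"])
print(sample.dtypes)
```

If the DataFrame is already loaded, `transactions_data['Invoice Date'] = pd.to_datetime(transactions_data['Invoice Date'])` does the same conversion after the fact.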


In [8]:
# load Treatment Stores data
treatment_stores_data = pd.read_csv('treatment-stores.csv')
treatment_stores_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 20 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   StoreID                  10 non-null     int64  
 1   Sq_Ft                    10 non-null     int64  
 2   AvgMonthSales            10 non-null     int64  
 3   Phone Number             9 non-null      object 
 4   Street Combined          10 non-null     object 
 5   Street 1                 10 non-null     object 
 6   Street 2                 1 non-null      float64
 7   Street 3                 1 non-null      object 
 8   City                     10 non-null     object 
 9   Postal Code              10 non-null     int64  
 10  Region                   10 non-null     object 
 11  Country                  10 non-null     object 
 12  Coordinates              10 non-null     object 
 13  Latitude                 10 non-null     float64
 14  Longitude                10 n

In [9]:
treatment_stores_data.head()

Unnamed: 0,StoreID,Sq_Ft,AvgMonthSales,Phone Number,Street Combined,Street 1,Street 2,Street 3,City,Postal Code,Region,Country,Coordinates,Latitude,Longitude,Timezone,Current Timezone Offset,Olson Timezone,Name,Right_State
0,1664,1475,11000,8478428048,"101 W. Main St., Barrington Village Center",101 W. Main St.,,Barrington Village Center,Barrington,60010,Central,US,"(42.1540565490723, -88.1362915039063)",42.154057,-88.136291,Central Standard Time,-360,GMT-06:00 America/Chicago,Barrington,IL
1,1675,1472,15000,8472531188,90 East Northwest Highway,90 East Northwest Highway,,,Mount Prospect,60056,Central,US,"(42.0633544921875, -87.9355773925781)",42.063354,-87.935577,Central Standard Time,-360,GMT-06:00 America/Chicago,Northwest Hwy & Elmhurst Rd,IL
2,1696,1471,10000,2242232528,1261 East Higgins Road,1261 East Higgins Road,,,Schaumburg,60173,Central,US,"(42.039363861084, -88.048828125)",42.039364,-88.048828,Central Standard Time,-360,GMT-06:00 America/Chicago,Higgins & Meacham,IL
3,1700,1465,15000,(224) 500-9575,17 W 633 Roosevelt Road,17 W 633 Roosevelt Road,,,Oakbrook Terrace,60181,Central,US,"(41.8600807189941, -87.9739685058594)",41.860081,-87.973969,Central Standard Time,-360,GMT-06:00 America/Chicago,Roosevelt & Summit,IL
4,1712,1456,19000,708-403-1461,15858 South LaGrange Road,15858 South LaGrange Road,,,Orland Park,60462,Central,US,"(41.603816986084, -87.8534317016602)",41.603817,-87.853432,Central Standard Time,-360,GMT-06:00 America/Chicago,159th & LaGrange,IL


## Step 2: Clean Up Your Data
In this step, you should prepare the data for steps 3 and 4. You should aggregate the transaction data to the appropriate level and filter on the appropriate date ranges. You can assume that there is no missing, incomplete, duplicate, or dirty data. You’re ready to move on to the next step when you have weekly transaction data for all stores.
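
Here is a minimal sketch of the Step 2 roll-up. It rests on three assumptions. First, the performance metric is the `Gross Margin` column summed per store per week; the source leaves the choice of metric to the analyst. Second, a transaction is one distinct `Invoice Number`. Third, `Invoice Date` is already a datetime column. The function name `weekly_store_summary` and its `start`/`end` bounds are hypothetical, so substitute the analysis window from your own plan. The demo rows are made up.

```python
import pandas as pd


def weekly_store_summary(tx: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Roll transaction lines up to one row per store per week (hypothetical helper).

    Assumes the columns shown by transactions_data.info() and a datetime
    'Invoice Date'. start/end are inclusive bounds chosen by the analyst.
    """
    mask = (tx["Invoice Date"] >= pd.Timestamp(start)) & (tx["Invoice Date"] <= pd.Timestamp(end))
    in_range = tx[mask].copy()
    # Label every line with the Monday that starts its week (Monday-Sunday weeks).
    in_range["Week"] = in_range["Invoice Date"].dt.to_period("W").dt.start_time
    return (
        in_range.groupby(["StoreID", "Week"])
        .agg(
            GrossMargin=("Gross Margin", "sum"),
            Sales=("Sales", "sum"),
            # One invoice can span several lines, so count distinct invoices.
            Transactions=("Invoice Number", "nunique"),
        )
        .reset_index()
    )


# Made-up demo rows; the last one falls outside the window and is dropped.
demo = pd.DataFrame({
    "StoreID": [10018, 10018, 10018, 10068, 10018],
    "Invoice Number": [1, 1, 2, 3, 4],
    "Invoice Date": pd.to_datetime(
        ["2015-01-21", "2015-01-21", "2015-01-23", "2015-01-27", "2015-03-02"]
    ),
    "Gross Margin": [6.7365, 1.1, 4.185, 5.98, 2.0],
    "Sales": [14.97, 2.75, 8.37, 11.96, 5.0],
})
weekly = weekly_store_summary(demo, "2015-01-19", "2015-02-01")
print(weekly)
```

The real call would be `weekly_store_summary(transactions_data, start, end)` with your chosen dates. The same weekly table feeds the trend, seasonality, and matching work in Step 3.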

## Step 3: Match Treatment and Control Units
In this step, you should create the trend and seasonality variables and use them, along with your other control variable(s), to match two control units to each treatment unit. Treatment stores should be matched to control stores in the same region. Note: Calculate the number of transactions per store per week and use 12 periods to calculate trend and seasonality.
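
One way to build the trend and seasonality variables from weekly transaction counts is a classical ratio-to-moving-average decomposition with a 12-period window. This is a sketch under stated assumptions:

- Trend is defined here as the normalized slope of the smoothed series.
- Seasonality is defined here as the average actual-to-smoothed ratio over the last 12 weeks.

Both definitions are illustrative choices and are not necessarily the formulas a dedicated A/B trend tool uses. The function name `trend_and_seasonality` is hypothetical, and the demo series is synthetic.

```python
import numpy as np
import pandas as pd


def trend_and_seasonality(weekly_tx: pd.Series, periods: int = 12) -> pd.Series:
    """Ratio-to-moving-average features for ONE store (illustrative definitions).

    weekly_tx: weekly transaction counts for a single store, sorted by week.
    """
    # A centered moving average over `periods` weeks smooths out week-to-week noise.
    moving_avg = weekly_tx.rolling(periods, center=True).mean()
    smooth = moving_avg.dropna()
    # Trend (assumed definition): slope of a straight line fitted to the smoothed
    # series, divided by the store's mean so large and small stores are comparable.
    slope = np.polyfit(np.arange(len(smooth)), smooth.to_numpy(), 1)[0]
    trend = slope / weekly_tx.mean()
    # Seasonality (assumed definition): mean ratio of actual to smoothed values over
    # the most recent `periods` weeks; about 1.0 means no seasonal lift or dip.
    ratio = (weekly_tx / moving_avg).dropna()
    seasonality = ratio.tail(periods).mean()
    return pd.Series({"Trend": trend, "Seasonality": seasonality})


# Synthetic store whose transactions grow by 2 per week with no seasonal pattern.
weeks = pd.date_range("2015-01-05", periods=52, freq="W-MON")
growing_store = pd.Series(100 + 2 * np.arange(52), index=weeks, dtype=float)
features = trend_and_seasonality(growing_store)
print(features)

# Applied to a weekly table with one row per StoreID and Week:
# store_features = (weekly.sort_values("Week")
#                   .groupby("StoreID")["Transactions"]
#                   .apply(trend_and_seasonality)
#                   .unstack())
```

Normalizing the slope by each store's mean lets a busy store and a quiet store with similar growth rates land near each other during matching.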

Apart from trend and seasonality...  

- What control variables should be considered? Note: Only consider variables in the RoundRoastersStore file.
- What is the correlation between each potential control variable and your performance metric? (Example of correlation matrix below)
- What control variables will you use to match treatment and control stores?
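
The questions above can be sketched together in two steps:

1. Rank the numeric store attributes (`Sq_Ft`, `AvgMonthSales`) by how strongly they correlate with the performance metric.
2. Use the strongest attribute, plus trend and seasonality, to find the two nearest control stores within each treatment store's region.

The sketch assumes a per-store table with these columns:

- `StoreID` and `Region`
- the candidate attributes
- `Trend` and `Seasonality`
- a per-store performance metric, here a hypothetical `GrossMargin` column

Nearest-neighbour matching on z-scored features with scikit-learn's `KDTree` and `StandardScaler` (both imported in the first cell) is one reasonable technique, swapped in for illustration. The helper names and all demo values are made up.

```python
import pandas as pd
from sklearn.neighbors import KDTree
from sklearn.preprocessing import StandardScaler


def rank_control_variables(stores, candidates, metric):
    """Absolute Pearson correlation of each candidate with the metric, strongest first."""
    corr = stores[candidates + [metric]].corr()[metric].drop(metric)
    return corr.abs().sort_values(ascending=False)


def match_controls(stores, treatment_ids, features, k=2):
    """Pair each treatment store with its k nearest control stores in the same Region.

    Features are z-scored first so that AvgMonthSales (thousands) does not
    swamp Trend (fractions). Every region must contain at least k controls.
    """
    scaled = pd.DataFrame(
        StandardScaler().fit_transform(stores[features]),
        index=stores.index,
        columns=features,
    )
    is_treatment = stores["StoreID"].isin(treatment_ids)
    rows = []
    for region, group in stores.groupby("Region"):
        treated = group[is_treatment.loc[group.index]]
        controls = group[~is_treatment.loc[group.index]]
        if treated.empty:
            continue
        # Search only this region's control stores.
        tree = KDTree(scaled.loc[controls.index].to_numpy())
        dist, idx = tree.query(scaled.loc[treated.index].to_numpy(), k=k)
        for i, treatment_id in enumerate(treated["StoreID"]):
            for rank in range(k):
                rows.append({
                    "Region": region,
                    "TreatmentID": treatment_id,
                    "ControlID": controls["StoreID"].iloc[idx[i, rank]],
                    "Distance": dist[i, rank],
                })
    return pd.DataFrame(rows)


# Made-up stores for illustration only (not the Round Roasters data).
demo = pd.DataFrame({
    "StoreID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Region": ["West"] * 4 + ["Central"] * 4,
    "Sq_Ft": [1200, 1190, 1210, 1195, 1470, 1475, 1465, 1480],
    "AvgMonthSales": [15000, 15000, 15500, 30000, 10000, 10000, 11000, 25000],
    "Trend": [0.010, 0.010, 0.011, 0.050, 0.000, 0.000, 0.002, 0.040],
    "Seasonality": [1.00, 1.00, 1.01, 1.20, 0.95, 0.95, 0.96, 1.15],
    "GrossMargin": [4500, 4600, 4700, 9000, 3000, 3050, 3300, 7600],
})

ranking = rank_control_variables(demo, ["Sq_Ft", "AvgMonthSales"], "GrossMargin")
best = ranking.index[0]  # most correlated store attribute
matches = match_controls(demo, treatment_ids=[1, 5], features=[best, "Trend", "Seasonality"])
print(ranking)
print(matches)
```

As written, one control store can be matched to more than one treatment store. If each control should be used only once, remove it from the region's pool after it is assigned.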

## Step 4: Analysis and Writeup
Conduct your A/B analysis and create a short report outlining your results and recommendations.  

In an A/B analysis, we use the correlation matrix to find the variable most correlated with the performance metric and include it in the AB Controls tool to help find the best matches.
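
For the analysis itself, here is a minimal sketch of one common comparison. First, compute each store's growth in the performance metric from a comparison period to the test period. Then use Welch's t-test to ask whether treatment stores grew more than their matched controls. The test is `scipy.stats.ttest_ind` with `equal_var=False`, made one-sided with `alternative="greater"` (available in SciPy 1.6 and later). The growth definition, the function names, and the demo numbers are assumptions for illustration; they are not results from the Round Roasters test.

```python
import numpy as np
from scipy.stats import ttest_ind


def growth(test_period, comparison_period):
    """Per-store relative change in the metric between two periods (assumed definition)."""
    test_period = np.asarray(test_period, dtype=float)
    comparison_period = np.asarray(comparison_period, dtype=float)
    return (test_period - comparison_period) / comparison_period


def lift_test(treatment_growth, control_growth, alpha=0.05):
    """One-sided Welch's t-test: did treatment stores grow more than controls?"""
    t_stat, p_value = ttest_ind(
        treatment_growth, control_growth, equal_var=False, alternative="greater"
    )
    return {
        "lift": float(np.mean(treatment_growth) - np.mean(control_growth)),
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),
    }


# Made-up per-store totals, e.g. gross margin summed over each period.
treatment = growth([1300, 1500, 1150, 1400], [1000, 1200, 900, 1100])
control = growth(
    [1050, 1180, 1000, 1070, 1010, 1060, 1120, 940],
    [1000, 1150, 950, 1050, 980, 1020, 1100, 900],
)
result = lift_test(treatment, control)
print(result)
```

This pooled comparison ignores which controls were matched to which treatment store. Comparing each treatment store against the average of its own two controls is a natural alternative. A significant lift alone does not settle the recommendation: it still has to be large enough to offset the cost of marketing the new menu.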