# Experimentation and uplift testing

#### Extend your analysis from Task 1 to help you identify benchmark stores that allow you to test the impact of the trial store layouts on customer sales.

**Here is your task**<br>
Julia has asked us to evaluate the performance of a store trial which was performed in stores 77, 86 and 88.<br>
We have chosen to complete this task in R; however, you will also find Python to be a useful tool for this piece of analytics. We have also provided an R solution template if you want some assistance in getting through this task.<br>
To get started, use the QVI_data dataset below or your output from Task 1, and consider the monthly sales experience of each store.<br>

This can be broken down by:<br>

- total sales revenue
- total number of customers
- average number of transactions per customer
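
As a rough illustration of the three metrics listed above, the sketch below aggregates a handful of invented transactions to one row per store and month. The column names follow `QVI_data.csv`; the toy values, the `YEARMONTH` column and the `monthly_metrics` name are assumptions made for the example, and base R is used only to keep it dependency-free.

```r
# Toy transactions: column names follow QVI_data.csv, the values are invented.
txns <- data.frame(
  STORE_NBR      = c(1, 1, 1, 2, 2),
  LYLTY_CARD_NBR = c(1001, 1001, 1002, 2001, 2002),
  TXN_ID         = c(1, 2, 3, 4, 5),
  DATE           = c("2019-02-01", "2019-02-15", "2019-02-20",
                     "2019-02-03", "2019-03-09"),
  TOT_SALES      = c(3.0, 4.2, 5.4, 6.0, 2.7)
)
txns$YEARMONTH <- format(as.Date(txns$DATE), "%Y%m")

# Split into store-month groups and compute the three metrics for each group.
groups <- split(txns, list(txns$STORE_NBR, txns$YEARMONTH), drop = TRUE)
monthly_metrics <- do.call(rbind, lapply(groups, function(g) {
  n_customers <- length(unique(g$LYLTY_CARD_NBR))
  data.frame(
    STORE_NBR   = g$STORE_NBR[1],
    YEARMONTH   = g$YEARMONTH[1],
    totSales    = sum(g$TOT_SALES),                        # total sales revenue
    nCustomers  = n_customers,                             # distinct loyalty cards
    nTxnPerCust = length(unique(g$TXN_ID)) / n_customers   # transactions per customer
  )
}))
rownames(monthly_metrics) <- NULL
monthly_metrics
```

Grouping by store and month only (not by customer) is what makes the customer count and the transactions-per-customer ratio meaningful at store level.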

Create a measure to compare different control stores to each of the trial stores. To do this, write a function so that you do not have to redo the analysis for each trial store. Consider using Pearson correlations or a metric such as a magnitude distance,<br>
e.g.<br>
**1 - (observed distance - minimum distance) / (maximum distance - minimum distance)** as a measure.<br>
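
One possible shape for such a function is sketched below. It is not the provided solution template: the helper name `score_controls`, the toy pre-trial figures, the choice to min-max scale the distance within each month across candidate stores, and the equal 0.5 weights for the combined score are all assumptions made for illustration.

```r
# Toy pre-trial monthly sales; store 77 is the trial store, values are invented.
metrics <- data.frame(
  STORE_NBR = rep(c(77, 1, 2), each = 3),
  YEARMONTH = rep(c("201807", "201808", "201809"), times = 3),
  totSales  = c(100, 120, 110,   # store 77 (trial)
                 98, 119, 112,   # store 1: similar level and shape
                200, 180, 210)   # store 2: different level and shape
)

# Hypothetical helper: scores every other store against `trial_store`.
score_controls <- function(metrics, trial_store, metric_col) {
  trial  <- metrics[metrics$STORE_NBR == trial_store, c("YEARMONTH", metric_col)]
  others <- metrics[metrics$STORE_NBR != trial_store,
                    c("STORE_NBR", "YEARMONTH", metric_col)]
  names(trial)[2]  <- "trial_value"
  names(others)[3] <- "control_value"
  pairs <- merge(others, trial, by = "YEARMONTH")

  # Observed distance, then 1 - (obs - min) / (max - min) within each month.
  pairs$dist <- abs(pairs$control_value - pairs$trial_value)
  lo <- ave(pairs$dist, pairs$YEARMONTH, FUN = min)
  hi <- ave(pairs$dist, pairs$YEARMONTH, FUN = max)
  pairs$mag <- ifelse(hi > lo, 1 - (pairs$dist - lo) / (hi - lo), 1)

  # One row per candidate store: Pearson correlation and mean magnitude score.
  by_store <- split(pairs, pairs$STORE_NBR)
  data.frame(
    STORE_NBR = as.numeric(names(by_store)),
    corr      = unname(vapply(by_store, function(p)
                  cor(p$control_value, p$trial_value), numeric(1))),
    magnitude = unname(vapply(by_store, function(p) mean(p$mag), numeric(1)))
  )
}

scores <- score_controls(metrics, trial_store = 77, metric_col = "totSales")
scores$combined <- 0.5 * scores$corr + 0.5 * scores$magnitude  # assumed weights
scores[order(-scores$combined), ]
```

Because the trial store and the metric column are arguments, the same call can be repeated for stores 86 and 88, or for the customer-count metric, without rewriting the analysis.
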
Once you have selected your control stores, compare each trial and control pair during the trial period. You want to test whether total sales are significantly different in the trial period and, if so, check whether the driver of change is more purchasing customers or more purchases per customer.<br>
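
The trial-period comparison could be approached along the following lines. This is a minimal sketch under stated assumptions, not a prescribed method: the monthly figures are invented, the trial months are assumed to be February to April 2019, the control store is scaled to the trial store using pre-trial totals, and the pre-trial spread of the percentage difference is used as the standard deviation for a one-sided t-test.

```r
# Toy monthly sales for one trial/control pair; values are invented.
sales <- data.frame(
  YEARMONTH = c("201811", "201812", "201901", "201902", "201903", "201904"),
  trial     = c(100, 104,  98, 130, 128, 135),
  control   = c(200, 205, 198, 202, 199, 204)
)
trial_months <- c("201902", "201903", "201904")   # assumed trial period
pre <- !(sales$YEARMONTH %in% trial_months)

# Scale the control store to the trial store's level using pre-trial totals.
scale_factor <- sum(sales$trial[pre]) / sum(sales$control[pre])
sales$scaled_control <- sales$control * scale_factor

# Percentage difference between the trial store and the scaled control store.
sales$pct_diff <- abs(sales$trial - sales$scaled_control) / sales$scaled_control

# Null hypothesis: no difference. Pre-trial variation sets the standard deviation.
std_dev  <- sd(sales$pct_diff[pre])
deg_free <- sum(pre) - 1
sales$t_value <- sales$pct_diff / std_dev
critical <- qt(0.95, df = deg_free)

# TRUE where the trial month differs significantly from the scaled control.
sales$significant <- !pre & sales$t_value > critical
sales[!pre, c("YEARMONTH", "t_value", "significant")]
```

With only three pre-trial months the toy example has very few degrees of freedom; a longer pre-trial window gives a more stable estimate. The same steps can be rerun on the customer-count metric to see which driver explains a sales uplift.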

# Libraries and Data

In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.0.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [2]:
data = read.csv("QVI_data.csv")

In [3]:
data$NEW_DATE = format(as.Date(data$DATE, format = "%Y-%m-%d"), "%Y%m")

In [4]:
write.csv(data,"./QVI_W_data.csv")

# Data Processing

In [4]:
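# February 2019 only. Grouping by store AND loyalty card gives one row per
# customer within each store, not one row per store.
stores = data %>%
    filter(NEW_DATE == "201902") %>%
    group_by(STORE_NBR, LYLTY_CARD_NBR) %>%
    summarize(totSales = sum(TOT_SALES),                  # customer's total spend
              nCustomer = n_distinct(LYLTY_CARD_NBR),     # always 1: card is a grouping key
              nTxnPerCust = n_distinct(TXN_ID),           # customer's distinct transactions
              nChipsPerTxn = sum(PROD_QTY),               # total units (a sum, not per transaction)
              avgPricePerUnit = mean(TOT_SALES))          # mean TOT_SALES per row, not per unit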

[1m[22m`summarise()` has grouped output by 'STORE_NBR'. You can override using the
`.groups` argument.


In [5]:
stores

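| STORE_NBR | LYLTY_CARD_NBR | totSales | nCustomer | nTxnPerCust | nChipsPerTxn | avgPricePerUnit |
|---|---|---|---|---|---|---|
| `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<int>` | `<dbl>` |
| 1 | 1024 | 3.0 | 1 | 1 | 1 | 3.00 |
| 1 | 1042 | 4.2 | 1 | 1 | 1 | 4.20 |
| 1 | 1043 | 3.3 | 1 | 1 | 1 | 3.30 |
| 1 | 1046 | 4.2 | 1 | 1 | 1 | 4.20 |
| 1 | 1054 | 2.9 | 1 | 1 | 1 | 2.90 |
| 1 | 1057 | 5.4 | 1 | 1 | 1 | 5.40 |
| 1 | 1062 | 2.7 | 1 | 1 | 1 | 2.70 |
| 1 | 1064 | 6.4 | 1 | 2 | 2 | 3.20 |
| 1 | 1080 | 1.9 | 1 | 1 | 1 | 1.90 |
| 1 | 1081 | 3.8 | 1 | 1 | 2 | 3.80 |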
