# Can the Scent of Chocolate Get You to Spend More in a Book Store? (COMPLETE)

## Chapter 4.10-4.13 Understanding How Randomness Can Generate Patterns of Data

In [None]:
# This code will load the R packages we will use
library(coursekata)

# set styles
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles_v2.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

# Load the data
# https://r-packages.io/datasets/mcgrath
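# install experimentr only if it is not already available
if (!requireNamespace("experimentr", quietly = TRUE)) install.packages("experimentr")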
library(experimentr)
mcgrath$treatment <- factor(mcgrath$treatment)

<div class="teacher-note">
    <b>Teacher Note:</b> The purpose of this mini-JNB is to practice thinking about randomness as a data generating process (DGP), and to use shuffle() to explore the range of outcomes a random DGP might produce.
</div>

## 1 About the `mcgrath` Data

Researcher Mary McGrath and colleagues conducted an experiment to test the effects of ambient scents on consumer behavior. They went to a bookstore in Canada that had an adjoining cafe, and across a span of 31 experimental days, the researchers randomly assigned each day to be either a treatment or a control day. On treatment days, they would release the scent of chocolate throughout the bookstore. They measured store sales on all of the days. 

Below are the variables in the data frame:

- `treatment`: Treatment indicator (1 = treatment, 0 = control)
- `book`: Sales of books
- `coffee`: Sales of bulk coffee, tea, or spices
- `food`: Sales of food
- `grandtotal`: Total of book, coffee, and food sales


[data source](https://r-packages.io/datasets/mcgrath)

McGrath, Mary C., et al. “Chocolate Scents and Product Sales: A Randomized Controlled Trial in a Canadian Bookstore and Café.” SpringerPlus, vol. 5, no. 1, 2016, https://doi.org/10.1186/s40064-016-2303-5.

### 1.1 Take a look at the `mcgrath` data frame.

In [None]:
# 1.1
# run code here


In [None]:
# 1.1
# sample response

str(mcgrath)

## 2 Thinking about the DGP

We could explore a few ideas with this dataset, but let's start with this question: Will ambient chocolate scent affect book sales?

### 2.1 Write the two possible word equations:

1. Write the researchers' hypothesis as a word equation.

2. Write the word equation that represents the idea that `treatment` does NOT explain variation in the outcome variable.

2.1 Response:

<div class="teacher-note">

<b>Sample Response</b>: 
   
1. book = treatment + other stuff
    
2. book = other stuff
    
</div>

### 2.2 If it turns out that `treatment` does indeed explain variation in book sales, will we be able to say that it also *causes* variation in book sales? Why or why not?

2.2 Response:

<div class="teacher-note">

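<b>Sample Response</b>: Yes, because the researchers randomly assigned each day to the treatment or control condition. With random assignment, other things that affect book sales (such as the day of the week or the weather) should be balanced across the two groups on average, so they are unlikely to account for a difference between the groups.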
</div>
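
The logic behind random assignment can be sketched with a short simulation. The coin-flip assignment below is an assumption for illustration (the exact randomization procedure the researchers used is not described here), and `weekend` is a hypothetical lurking variable, not a column in `mcgrath`.

```r
set.seed(31)  # arbitrary seed, only so the sketch is repeatable

n_days <- 31  # number of experimental days in the study

# Assumption: each day is assigned independently by a fair coin flip
# (0 = control, 1 = treatment)
assignment <- sample(c(0, 1), size = n_days, replace = TRUE)

# Hypothetical lurking variable: treat every 6th and 7th day as a weekend
weekend <- (seq_len(n_days) %% 7) %in% c(6, 0)

# Cross-tabulate to see how weekend days are distributed across conditions
table(treatment = assignment, weekend = weekend)
```

Because the assignment ignores everything about the day, a variable like `weekend` has no systematic way to pile up in one condition. Rerunning with different seeds shows the split varying by chance rather than in one consistent direction.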

In a bit, we will make a jitter plot to look at the distribution of `book` by `treatment`. Before we do, let’s think about what we might expect to see. 

### 2.3 If there is such a thing as an effect of chocolate scent on book sales, would the jitter plot look like the one on the left or the right? Explain why you think so.


<img src="https://i.imgur.com/js9QYw5.png" title="On the left, a jitter plot where the one distribution is generally lower than the other. On the right, a jitter plot where the distributions are roughly similar" width = 600/>

2.3 Response:

<div class="teacher-note">

<b>Sample Response</b>: We would expect it to look more like the plot on the left, because one group tends to have higher values than the other. In the plot on the right, the two groups overlap a lot, and there doesn't seem to be much of a difference between them.
</div>

### 2.4 Which word equation would go with each plot?

2.4 Response:

<div class="teacher-note">

<b>Sample Response</b>: 
   
1. book = treatment + other stuff --> plot on the left
    
2. book = other stuff --> plot on the right
    
</div>

## 3 Visualize the Hypothesis

### 3.1 Create a jitter plot to explore the hypothesis.

In [None]:
# 3.1
# run code here



In [None]:
# 3.1
# sample response
gf_jitter(book~treatment, data = mcgrath, width = .2, color = "chocolate4")

### 3.2 Does it look as if `treatment` explains variation in `book`? Use the features of the distribution to make one argument for "yes" and one argument for "no".

3.2 Response:

<div class="teacher-note">

<b>Sample Response</b>: 
   
1. Yes, it does: The chocolate scent group has some of the highest book sale days, and the no chocolate scent group has some of the lowest book sale days.
    
2. No, it doesn't: The distributions have A LOT of overlap. There is not very much difference between the two groups.
    
</div>

## 4 Could it be Random?

We want to figure out which of these is the better representation of the DGP:

1. book = treatment + other stuff
2. book = other stuff

Let's run a bunch of simulations of the `book = other stuff` DGP (where `treatment` doesn't matter). We can do this by shuffling the values of `book` so that each one is randomly matched to a new row, and therefore to a randomly chosen `treatment` value.
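
To see what shuffling does mechanically, here is a minimal sketch in base R. It assumes `shuffle()` behaves like `sample()` applied to a single column (a random reordering without replacement). The `toy` data frame and its values are hypothetical and are not taken from `mcgrath`.

```r
# Hypothetical toy data (NOT the mcgrath values): 6 days, 3 per condition
toy <- data.frame(
  treatment = factor(c(0, 0, 0, 1, 1, 1)),
  book      = c(120, 95, 140, 180, 110, 160)
)

set.seed(1)  # only so the example is repeatable

# sample() with no size argument returns the same values in a random order
toy$book_shuffled <- sample(toy$book)

# Compare the original and shuffled columns side by side
toy
```

The `treatment` column stays put while the `book` values are re-dealt, so any pattern between `treatment` and the shuffled column can only be due to chance.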

### 4.1 Compare the distribution of our data to the distributions produced by simulations of a randomized DGP (run the simulation at least 10 times). What is similar and what is different across each shuffled plot?

In [None]:
# 4.1
# shuffle `book` in your plot


# copy and paste the plot code multiple times to see many at once



In [None]:
# 4.1
# Sample Code
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")
gf_jitter(shuffle(book)~treatment, data = mcgrath, width = .2, color = "chocolate4")

4.1 Response:

<div class="teacher-note">

<b>Sample Response</b>: 
   
- Similarities: They have a similar range and spread.
    
- Differences: The individual data points land in different treatment groups in each plot. Each time you run the code, the distribution within each group varies slightly.
    
</div>

### 4.2 Find the case with the highest book sales. What happens to it each time you shuffle the plot?

4.2 Response:

<div class="teacher-note">

<b>Sample Response</b>: It randomly switches between the treatment group and the no-treatment group.
    
</div>

### 4.3 Given your analyses, which word equation of the DGP does the data appear to match more closely?

4.3 Response:

<div class="teacher-note">

<b>Sample Response</b>: book = other stuff
    
</div>

### 4.4 What does that suggest about the researchers' hypothesis? Is there an effect of scent, or does it appear to be other stuff affecting book sales?

4.4 Response:

<div class="teacher-note">

<b>Sample Response</b>: There does not appear to be an effect of chocolate scent on book sales.
    
</div>
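
As an optional extension, the visual comparison of shuffled plots can be backed up with a number. The sketch below is one possible approach, not part of the original activity: it shuffles the outcome many times and records the difference in group means each time. The helper `shuffled_diff()` and the toy values are hypothetical, and the code assumes the grouping factor has levels `"0"` and `"1"`.

```r
# Difference in group means (group "1" minus group "0") after shuffling the outcome
shuffled_diff <- function(outcome, group) {
  shuffled <- sample(outcome)             # break any link between outcome and group
  means <- tapply(shuffled, group, mean)  # mean of the shuffled outcome in each group
  unname(means["1"] - means["0"])
}

# Hypothetical toy data (NOT the mcgrath values)
toy_book      <- c(120, 95, 140, 180, 110, 160)
toy_treatment <- factor(c(0, 0, 0, 1, 1, 1))

set.seed(2)  # only so the example is repeatable

# Observed difference in means: group "1" minus group "0"
observed <- unname(diff(tapply(toy_book, toy_treatment, mean)))

# Repeat the shuffle 1000 times and keep each shuffled difference
shuffled_diffs <- replicate(1000, shuffled_diff(toy_book, toy_treatment))

# Proportion of shuffles whose difference is at least as large in size as the observed one
mean(abs(shuffled_diffs) >= abs(observed))
```

In the notebook, the same helper could be called as `shuffled_diff(mcgrath$book, mcgrath$treatment)`. If a large share of shuffled differences are as big as the observed one, the data are consistent with the `book = other stuff` DGP.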