<div class="alert alert-block alert-danger">

# 15A: Burgernomics (COMPLETE)

**Use with textbook version 6.0+**


**Lesson assumes students have read up through page: 15.4**


</div>

<div class="alert alert-block alert-warning">

#### Summary of Notebook:

In this notebook students will examine how the price of a Big Mac has changed over time and across various regions of the world. They will examine trends in visualizations and try to predict whether parallel lines (additive models) or non-parallel lines (interaction models) would be a better fit to the data. They will also try making predictions for the future prices of a Big Mac.
 
#### Includes:

- Fitting and interpreting multivariate interaction models with one categorical and one quantitative predictor
- Comparing interaction models to additive models
- Making predictions with interaction models
- Extrapolating predictions with models

</div>

<div class="alert alert-block alert-success">

## Approximate time to complete Notebook: 45-60 Mins

</div>

In [None]:
# This code will load the R packages we will use
suppressPackageStartupMessages({
    library(coursekata)
})

bigmac <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQM7-5W4fHvTnQD2tyjK1d5RlNsT7E1TRmAPrSZL-3bBMwyVUM6TyR4iM69ly-fqnm7aQkSIxBGVFrf/pub?gid=47818490&single=true&output=csv")

## About the Data

The dataset `bigmac` contains information collected by the publication The Economist. Its variables can be used to show how the price of a Big Mac changes over time and across the world. Each row records the price of a Big Mac sold somewhere in the world at some point between 2000 and 2022.

Data Source: [The Economist](https://www.economist.com/big-mac-index?utm_source=chartr&utm_medium=newsletter&utm_campaign=chartr_20230127)

**Description of Variables:**
	
- `country_code` Three-character ISO 3166-1 country code	
- `country` Country where Big Mac was sold	
- `price` Price of a Big Mac in US dollars	
- `region` Region where Big Mac was sold
- `continent` Continent where the Big Mac was sold
- `year` The year the Big Mac was sold, measured in years since 2000 (for example, 1 means 2001)

<div class="alert alert-block alert-success">

### 1.0 - Approximate Time:  10-15 mins

</div>

## 1.0 - Explore Variation

1.1 - First, take a moment to explore the data frame (look at whatever you are curious about!). 

In [None]:
# Sample Responses

head(bigmac)

# which continents, regions, and countries sell Big Macs?
#tally(~continent, data = bigmac)
#tally(~region, data = bigmac)
#tally(~country, data = bigmac)

# where are Big Macs expensive/cheap?
#head(arrange(bigmac, price))
#head(arrange(bigmac, desc(price)))

#favstats(~price, data = bigmac)

1.2 - Let's explore the price of a Big Mac with a visualization. How much does the price vary?

In [None]:
# Sample Response
gf_histogram(~price, data = bigmac, fill = "green4") %>% 
    gf_boxplot(width = 15, color = "darkgreen", alpha = .1)

<div class="alert alert-block alert-warning">

**Sample Response:**

The price varies from about \$1 to over \$8. The average price is about \$3.

</div>

<div class="alert alert-block alert-success">

### 2.0 - Approximate Time:  35-40 mins

</div>

## 2.0 - Comparing Countries Over Time

Today we'll analyze the way the price has changed over time across different countries of your choosing. 

2.1 - How would you write the hypothesis that both time and country would explain variation in Big Mac prices as a word equation?

<div class="alert alert-block alert-warning">

**Sample Responses:**

price = year + country + error

</div>

2.2 - Modify the following code to create a new data frame called `bigmac3` that only includes 3 specific countries that you are interested in comparing. 

Then create a visualization to explore your word equation (with `bigmac3`).

In [None]:
# Sample Response
bigmac3 <- filter(bigmac, country == "Egypt" | country == "Singapore" | country == "United States")

gf_point(price ~ year, data = bigmac3, color = ~country)

2.3 - Just eyeballing the data, do you think parallel lines (additive model) would be enough? Or do you want to have lines that are not parallel (interaction model)?

<div class="alert alert-block alert-warning">

**Sample Responses:**

In our example (with Egypt, Singapore, and US), an interaction model would be more appropriate because, although the Singapore and US points look like they would follow roughly parallel lines, Egypt has a distinctly different, flatter pattern.

Encourage your students to make other observations about their three countries, for example in this set:
- All countries were more similar in 2005 than in 2020.
- Generally price is increasing.
- At around 2016-17, Egypt takes a bit of a dip. Students might consider googling what happened in a country during these times (e.g., in Egypt there were political protests against president Abdel Fattah Al-Sisi).

</div>
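
For reference, the two candidate models can be sketched in general GLM notation. This is the standard form for one quantitative and one three-level categorical predictor, not output from a fitted model. The symbols are assumptions for illustration: $X_{1i}$ is year, and $X_{2i}$ and $X_{3i}$ are 0/1 indicators for two of the three countries (the remaining country is the reference group).

Additive model (parallel lines):

$Y_i = b_0 + b_1X_{1i} + b_2X_{2i} + b_3X_{3i} + e_i$

Interaction model (lines free to have different slopes):

$Y_i = b_0 + b_1X_{1i} + b_2X_{2i} + b_3X_{3i} + b_4X_{1i}X_{2i} + b_5X_{1i}X_{3i} + e_i$

In the additive model every country shares the slope $b_1$, so the lines can differ only in height. In the interaction model the three slopes are $b_1$, $b_1 + b_4$, and $b_1 + b_5$.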

## 3.0 - Multivariate Models

3.1 - Fit the **additive model** and the **interaction model** for `price = year + country + error` using your new data frame, and put each model into a visualization. What do you notice? Which model seems to explain more of the variation?

In [None]:
# Sample Responses

# Additive Model
add_model <- lm(price ~ year + country, data = bigmac3)
gf_point(price ~ year, data = bigmac3, color = ~country, alpha = .2) %>%
    gf_model(add_model)

# Interaction Model
int_model <- lm(price ~ year * country, data = bigmac3)
gf_point(price ~ year, data = bigmac3, color = ~country, alpha = .2) %>%
    gf_model(int_model)

<div class="alert alert-block alert-warning">

**Sample Responses:**

In this example, the interaction model seems to explain more variation.
</div>
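
Beyond eyeballing the plots, one way to compare the two models numerically is to look at the proportion of variation each explains. The sketch below uses made-up data (hypothetical countries A, B, and C with invented intercepts and slopes), not the real `bigmac` data, and base R only. Because the additive model is nested inside the interaction model, the interaction model's R-squared can never be lower; the question is whether it is enough higher to be worth the extra parameters.

```r
# Synthetic, made-up data for illustration only (NOT the real bigmac data)
set.seed(1)
toy <- expand.grid(year = 0:22,
                   country = c("A", "B", "C"),
                   stringsAsFactors = FALSE)

# Hypothetical intercepts and slopes; country C rises faster than A and B
intercepts <- c(A = 2.0, B = 2.5, C = 3.0)
slopes <- c(A = 0.05, B = 0.05, C = 0.15)
toy$price <- unname(intercepts[toy$country] + slopes[toy$country] * toy$year) +
    rnorm(nrow(toy), sd = 0.2)

# Fit both models to the same data
add_fit <- lm(price ~ year + country, data = toy)
int_fit <- lm(price ~ year * country, data = toy)

# Proportion of variation explained by each model
r2_add <- summary(add_fit)$r.squared
r2_int <- summary(int_fit)$r.squared
c(additive = r2_add, interaction = r2_int)

# Nested-model comparison: do the interaction terms reduce error
# more than we would expect by chance?
anova(add_fit, int_fit)
```

With the notebook's own models, the analogous comparison would use `add_model` and `int_model` in place of the toy fits.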

3.2 - Put what you think is the better model into GLM notation.

In [None]:
int_model

<div class="alert alert-block alert-warning">

**Sample Response:**

Note -- we kept several decimal places for now because we're going to combine some of these numbers and want to preserve precision.

Interaction Model GLM:
$price_i = 1.598 + 0.043year_i - 0.064Sing_i + 0.441US_i + 0.091year_i*Sing_i + 0.101year_i*US_i + e_i$ 

- $Y_i = 1.598 + 0.043X_{1i} - 0.064X_{2i} + 0.441X_{3i} + 0.091X_{1i}X_{2i} + 0.101X_{1i}X_{3i} + e_i$ 
(year is $X_{1i}$, whether the country is Singapore is $X_{2i}$, whether the country is the US is $X_{3i}$)
</div>

3.2b, bonus - This giant equation actually represents 3 separate lines, one for each country. Try writing the simplified equation for each country.

<div class="alert alert-block alert-warning">

**Sample Response:**

EGYPT (the reference group, so both country indicators are 0)

$Y_i = 1.598 + 0.043year_i + e_i$

SINGAPORE 

- Intercept: 1.598 + (-0.064) = 1.534
- Slope: 0.043 + 0.091 = 0.134 (0.13478 using the less-rounded coefficient 0.09178)

$Y_i = 1.534 + 0.134year_i + e_i$


U.S. 

- Intercept: 1.598 + 0.441 = 2.039
- Slope: 0.043 + 0.101 = 0.144

$Y_i = 2.039 + 0.144year_i + e_i$
</div>


In [None]:
# using R as a calculator for combining like terms

# Singapore intercept
1.598 + -0.064

# Singapore slope
0.043 + 0.09178

# U.S. intercept
1.598 + 0.441

# U.S. slope
0.043 + 0.101
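
The same combining of like terms can be sketched as a small table built from the coefficients. The values below are copied (rounded) from the sample interaction model above; the names `b` and `country_lines` are made up for this illustration.

```r
# Coefficients copied from the sample interaction model in this notebook (rounded)
b <- c(intercept = 1.598, year = 0.043,
       singapore = -0.064, us = 0.441,
       year_singapore = 0.091, year_us = 0.101)

# Egypt is the reference group, so its line uses the base intercept and slope.
# Each other country adds its own adjustment to both.
country_lines <- data.frame(
    country   = c("Egypt", "Singapore", "United States"),
    intercept = c(b[["intercept"]],
                  b[["intercept"]] + b[["singapore"]],
                  b[["intercept"]] + b[["us"]]),
    slope     = c(b[["year"]],
                  b[["year"]] + b[["year_singapore"]],
                  b[["year"]] + b[["year_us"]])
)
country_lines
```

Each row is one country's simplified line: predicted price = intercept + slope * year.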

3.3 - Use the model to make a few predictions. Try predicting the price of Big Macs in each of the three countries for the year 2030.

In [None]:
# Sample Responses
# If they did the 3.2 bonus question, they could use the simplified equations for each country.
# Here we showed how to make these predictions using the single GLM model that includes all three countries.
# We need to use "30" instead of "2030" because the variable year is "years since 2000"

# Egypt, 2030
1.598 + 0.043*30 + -0.064 *0 + 0.441*0 + 0.091*30*0 + 0.101*30*0  

# Singapore, 2030
1.598 + 0.043*30 + -0.064 *1 + 0.441*0 + 0.091*30*1 + 0.101*30*0  

# United States, 2030
1.598 + 0.043*30 + -0.064 *0 + 0.441*1 + 0.091*30*0 + 0.101*30*1  

#or, using the simplified equations

# Egypt, 2030
1.598 + 0.043*30
# Singapore, 2030
1.534 + 0.13478*30
# United States, 2030
2.039+ 0.144*30
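
Instead of typing the equation by hand, R's `predict()` function can generate predictions from a fitted model. The sketch below is self-contained, so it does not use the real `bigmac3` data: it builds noise-free synthetic data from the rounded coefficients of the sample model, refits the interaction model to that data (`toy_model`, a made-up name), and predicts for year 30.

```r
# Noise-free synthetic data generated from the rounded sample coefficients;
# this is NOT the real bigmac data
toy <- expand.grid(year = 0:22,
                   country = c("Egypt", "Singapore", "United States"),
                   stringsAsFactors = FALSE)
sing <- as.numeric(toy$country == "Singapore")
us <- as.numeric(toy$country == "United States")
toy$price <- 1.598 + 0.043 * toy$year - 0.064 * sing + 0.441 * us +
    0.091 * toy$year * sing + 0.101 * toy$year * us

# Refit the interaction model to the synthetic data
toy_model <- lm(price ~ year * country, data = toy)

# One row per prediction; year = 30 means 2030 (years since 2000)
new_rows <- data.frame(year = 30,
                       country = c("Egypt", "Singapore", "United States"))
new_rows$predicted_price <- predict(toy_model, newdata = new_rows)
new_rows
```

With the notebook's own model, the analogous call would be `predict(int_model, newdata = new_rows)`. Keep in mind that 2030 lies outside the years in the data, so any such prediction is an extrapolation that assumes each country's trend continues unchanged.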

In [None]:
# for teachers, students can also plot their predictions on the graph
# We need to use "30" instead of "2030" because the variable year is "years since 2000"
gf_point(price ~ year, data = bigmac3, color = ~country, alpha = .2) %>%
    gf_model(int_model) %>%
    gf_point(2.888 ~ 30, color = "mediumturquoise") %>%  # Egypt, 2030
    gf_point(5.554 ~ 30, color = "purple") %>%           # Singapore, 2030
    gf_point(6.359 ~ 30, color = "tan4")                 # United States, 2030

3.4 - Based on your analyses today, what is the overall story of the price of a Big Mac that you are seeing in the data? Describe and conclude.

<div class="alert alert-block alert-warning">

**Sample Response:**

Varies based on students' selected countries. 

For students who would be satisfied with an additive model for their three countries, they might conclude that Big Mac prices have changed at about the same rate in each country (parallel lines that differ only in height).  

For students who need an interaction model for their three countries, they might conclude that Big Mac prices have changed at different rates in different countries of the world.  

</div>