# API-201 ABC REVIEW SESSION #10

**Friday, December 9**

# Statistical Inference and the Minimum Wage

Suppose you are working on a congressional campaign in the state of Michigan. The candidate you are working for is deciding whether to come out in support of a $15 minimum wage. Many factors will influence the decision, but at the forefront of your candidate's mind are JOBS JOBS JOBS. Your candidate would like their policies to increase wages, but to them this is much less important than keeping unemployment low.

You remember from Econ 101 that in the classical model of labor supply and demand, a minimum wage creates surplus labor, AKA unemployment. However, there are other economic models under which a minimum wage increase may actually increase employment! In these models, firms with a lot of market power set wages low, which reduces employment because people don't want to work for those low wages. A minimum wage that forces wages up can then draw more people into work.

Card and Krueger tested how these models fared against real-world data by evaluating how a minimum wage change in New Jersey actually affected employment. They compared the change in employment at fast food restaurants in New Jersey, which increased its state minimum wage from \$4.25 to \$5.05, to the change in Eastern Pennsylvania, which didn't change its minimum wage.

Card and Krueger surveyed fast food restaurants in New Jersey and Pennsylvania before and after the minimum wage change and assessed the **difference in the change in the number of full-time-equivalent workers those restaurants employed.** (For example, if 20 workers each worked 20 hours a week, that is 10 full-time-equivalent workers, taking a full-time week to be 40 hours.) **What does this accomplish?**
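
The full-time-equivalent arithmetic in the example can be checked in a few lines of R. This is only a minimal sketch: it assumes a 40-hour full-time week (which is what the example's numbers imply), and the variable names are made up for illustration.

```r
# Assumption: a full-time week is 40 hours (implied by the example, not stated in the data).
full_time_hours <- 40

# 20 workers, each working 20 hours a week
weekly_hours <- rep(20, times = 20)

# FTE = total hours worked / hours in one full-time week
fte <- sum(weekly_hours) / full_time_hours
fte
```

Changing `weekly_hours` to a mix of part-time and full-time schedules shows how very different staffing patterns can add up to the same FTE count.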

![](distribution_plots.png)

Let's first focus on the change in full-time-equivalent workers (FTE) in New Jersey. Among all fast food restaurants in New Jersey there is an unknown **population distribution** of the change in FTE. _We do not know what the population distribution is._ We don't even know its mean or standard deviation, but we can call them $\mu$ and $\sigma$. By using statistics, we attempt to learn something about the population using a **sample**. 

The sample distribution of the change in FTE among New Jersey fast food restaurants is depicted in the middle plot above. There is wide dispersion in the change across restaurants. We can calculate a sample average $\hat\mu$ and a sample standard deviation $\hat\sigma$ to describe this sample distribution. 

The Central Limit Theorem allows us to learn about the population from just our sample. If the sample is representative and sufficiently large, the CLT tells us that $\hat\mu$ is approximately normally distributed with mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}}$, where $n$ is the sample size. If we approximate $\sigma$ using $\hat\sigma$, we know how much the sample mean varies around the population mean, and therefore we know a range of reasonable values for $\mu$!
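
To see the CLT at work, here is a small simulation sketch. The "population" below is entirely made up (a skewed distribution, not the Card and Krueger data), and the sample size and number of repetitions are arbitrary choices. It repeatedly draws samples, records each sample mean, and compares the spread of those means with $\dfrac{\sigma}{\sqrt{n}}$.

```r
set.seed(201)  # arbitrary seed so the sketch is reproducible

# Hypothetical skewed "population" of FTE changes (NOT the real data)
population <- rexp(100000, rate = 1 / 8) - 8
mu    <- mean(population)
sigma <- sqrt(mean((population - mu)^2))  # population standard deviation

n <- 300  # size of each sample (arbitrary)

# Draw 2,000 samples and record the mean of each one
sample_means <- replicate(2000, mean(sample(population, n)))

# The CLT says these two numbers should be close to each other
c(simulated_sd = sd(sample_means), clt_sd = sigma / sqrt(n))
```

Running `hist(sample_means)` afterwards should show a roughly bell-shaped histogram centered near `mu`, even though the population itself is skewed.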

**0. Download the Card and Krueger data.**

In [0]:
# Download data
download.file("https://github.com/5harad/API201-students/raw/main/review_sessions/review_10/fastfood.dta", 
              "fastfood.dta")

**1. In 1992, Card and Krueger attempted to contact 473 restaurants from the Burger King, KFC, Wendy's, and Roy Rogers chains in New Jersey and Eastern Pennsylvania for initial interviews.**

**a. What are the relevant populations considered in this study?**



**b. Card and Krueger were only able to complete initial interviews with 410 restaurants. If busier stores were less likely to complete interviews, is the sample still representative? If not, how might this bias the estimate?**



Card and Krueger managed to re-interview most of the 410 restaurants after the minimum wage change (the rest were closed, under renovation, or refused to be re-interviewed). When all was said and done, there were 309 restaurants in New Jersey and 75 in Pennsylvania that were interviewed both before and after the minimum wage change. They then calculated the change in full-time-equivalent workers (FTE) at each restaurant. 

**c. Let $\mu_s$ be the average change in FTE at restaurants in state $s$ and $\sigma_s$ be the standard deviation of the change in FTE at restaurants in that state. Then, what is the sampling distribution of $\hat{\mu}_{NJ} - \hat{\mu}_{PA}$?**



**d. Because a pilot study found that McDonald's restaurants had poor response rates, Card and Krueger did not survey any McDonald's. Suppose you uncover McDonald's secret human resources archive — second in secrecy only to their Big Mac sauce recipe. You use it to determine store employment in New Jersey and Pennsylvania prior to and after the minimum wage change. You supplement Card and Krueger's data by including employment in McDonald's restaurants. How does this affect the following:**

- **i. The populations of interest**

- **ii. The population difference in means**

- **iii. Statistical power**


In [0]:
# Load packages
library(tidyverse)
library(haven)

# Read data
fastfood <- read_dta("fastfood.dta")

head(fastfood)

**Data Dictionary**
* `restaurant_id` - a numerical identifier for each restaurant
* `state` - which state the restaurant is located in (PA or NJ)
* `fte_pre` - full-time-equivalent workers prior to the minimum wage increase in NJ
* `fte_post` - full-time-equivalent workers after the minimum wage increase in NJ

**2. Consider the code chunk and output below:**

In [0]:
fastfood %>% 
    filter(!is.na(fte_post) & !is.na(fte_pre)) %>%
    mutate(fte_change = fte_post - fte_pre) %>%
    group_by(state) %>%
    summarize(n = n(),
              mean_fte_change = mean(fte_change),
              sd_fte_change   = sd(fte_change))


**a. Some restaurants are not open either before the minimum wage change or after the minimum wage change. These restaurants have FTE coded as `NA` in the periods they aren't open. Given that is the case, what does the line beginning with `filter` do?**



**b. What do the remaining lines do? Interpret the output.**



**c. Calculate a 95% confidence interval for the difference in pre-post changes in FTE. Note that this is often called the _difference in differences_. Under certain assumptions you'll learn in API-202, it reflects the causal effect of the policy.**
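
As a reminder of the mechanics, here is a minimal sketch of a 95% confidence interval for a difference in two independent sample means, using the normal approximation. The helper name `diff_means_ci` and every input number are made up for illustration; they are not the Card and Krueger results, so substitute the values from your own output.

```r
# Hypothetical helper: CI for (mean1 - mean2), assuming two independent samples
# that are large enough for the normal approximation to hold.
diff_means_ci <- function(mean1, sd1, n1, mean2, sd2, n2, level = 0.95) {
  estimate <- mean1 - mean2
  se       <- sqrt(sd1^2 / n1 + sd2^2 / n2)  # standard error of the difference
  z_star   <- qnorm(1 - (1 - level) / 2)     # about 1.96 when level = 0.95
  c(estimate = estimate,
    lower    = estimate - z_star * se,
    upper    = estimate + z_star * se)
}

# Made-up summary statistics, for illustration only
ci <- diff_means_ci(mean1 = 1.0,  sd1 = 8,  n1 = 200,
                    mean2 = -1.5, sd2 = 10, n2 = 50)
ci
```

Note that the smaller group contributes most of the standard error in this made-up example, which is worth keeping in mind when one state has far fewer restaurants than the other.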



**d. What null hypothesis would allow you to test whether there is a statistically significant difference in average changes in FTE? Calculate the corresponding Z-score and p-value. Do you reject the null hypothesis?**
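
If the mechanics are rusty, the sketch below shows one way to get a Z-score and a two-sided p-value from summary statistics, assuming independent samples and a null hypothesis under which the difference in means is 0. All of the numbers are made up for illustration and are not the study's results.

```r
# Made-up summary statistics, for illustration only (not the study's results)
mean1 <- 1.0;  sd1 <- 8;  n1 <- 200
mean2 <- -1.5; sd2 <- 10; n2 <- 50

# Standard error of the difference in sample means (independent samples)
se <- sqrt(sd1^2 / n1 + sd2^2 / n2)

# Z-score: how many standard errors the observed difference is from 0,
# the value of the difference assumed under the null hypothesis
z <- (mean1 - mean2 - 0) / se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

c(z = z, p_value = p_value)
```

Comparing `p_value` with your chosen significance level (commonly 0.05) tells you whether to reject the null hypothesis.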

In [0]:
# Your answer here!



**e. On average, New Jersey restaurants had 20 full-time-equivalent workers prior to the minimum wage increase. Assess the practical significance of the effect relative to this baseline.**



**f. Next semester, you'll learn how to critically evaluate whether this study is "internally valid," that is, whether it yields an unbiased estimate of the causal effect. For now, assume it does. Based on this analysis alone, would you recommend your candidate come out in favor of a $15 minimum wage? If so, would you recommend they advocate for it by saying that "minimum wages actually increase employment"?**



# Goodbyes

**Thank you all for a great semester!**

Next semester I will be the TF for API-202Z with Professor Tim Layton. If you are interested and want to know whether 202Z may be right for you, please come talk to me.

I'm always happy to discuss research, the health care system, Ph.D. student life, politics, my dog, photography, and cool birds, so say hi if you see me around or reach out by email: benjamin_berger@hks.harvard.edu.

**Good luck on the exam, and I'll see you around in the Spring!**