# API-201 ABC REVIEW SESSION #10

**Friday, December 9**

# Statistical Inference and the Minimum Wage

Suppose you are working on a congressional campaign in the state of Michigan. The candidate you are working for is deciding whether to come out in support of a \$15 minimum wage. Many factors will influence the decision, but at the forefront of your candidate's mind are JOBS JOBS JOBS. Your candidate would like their policies to increase wages, but to them this is much less important than keeping unemployment low. 

You remember from Econ 101 that in the classical model of labor supply and demand, a minimum wage creates surplus labor, AKA unemployment. However, there are other economic models under which a minimum wage increase may actually increase employment! In these models, firms with a lot of market power set wages low, which reduces employment because fewer people are willing to work at those low wages. A minimum wage that pushes wages up can then draw more people into work. 

Card and Krueger tested how these models fared against real-world data by evaluating how a minimum wage change in New Jersey actually affected employment. They compared the change in employment at fast food restaurants in New Jersey, which increased its state minimum wage from \$4.25 to \$5.05, to the change in Eastern Pennsylvania, which didn't change its minimum wage. 

Card and Krueger surveyed fast food restaurants in New Jersey and Pennsylvania before and after the minimum wage change and assessed the **difference in the change in the number of full-time-equivalent workers those restaurants employed.** (For example, if 20 workers each worked 20 hours per week, this is 10 full-time-equivalent workers, assuming a 40-hour full-time week.) **What does this accomplish?**

![](distribution_plots.png)

Let's first focus on the change in full-time-equivalent workers (FTE) in New Jersey. Among all fast food restaurants in New Jersey there is an unknown **population distribution** of the change in FTE. _We do not know what the population distribution is._ We don't even know its mean or standard deviation, but we can call them $\mu$ and $\sigma$. By using statistics, we attempt to learn something about the population using a **sample**. 

The sample distribution of the change in FTE among New Jersey fast food restaurants is depicted in the middle plot above. There is wide dispersion in the change across restaurants. We can calculate a sample average $\hat\mu$ and a sample standard deviation $\hat\sigma$ to describe this sample distribution. 

The Central Limit Theorem allows us to learn about the population from just our sample. If the sample is representative and sufficiently large, the CLT tells us that $\hat\mu$ is approximately normally distributed with mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}}$. If we approximate the standard deviation using $\hat\sigma$, we know how much the sample mean varies around the population mean, and therefore we know a range of reasonable values for $\mu$!
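
A quick simulation can make the CLT concrete. The sketch below is illustrative only: the population (an exponential distribution with mean 1 and standard deviation 1), the sample size, and the number of repetitions are assumptions chosen for the demonstration, not features of the Card and Krueger data.

```r
# Illustrative CLT simulation; the population here is hypothetical.
set.seed(201)

n    <- 100    # observations per sample (assumed)
reps <- 5000   # number of repeated samples (assumed)

# Skewed population: exponential with mean 1 and standard deviation 1
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# The CLT predicts mean = mu = 1 and standard deviation = sigma / sqrt(n) = 0.1
clt_check <- c(mean_of_means = mean(sample_means),
               sd_of_means   = sd(sample_means),
               predicted_sd  = 1 / sqrt(n))
clt_check
```

Plotting `hist(sample_means)` should give a roughly bell-shaped histogram even though the underlying population is strongly skewed.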

**0. Download the Card and Krueger data.**

In [1]:
# Download data (mode = "wb" keeps the binary .dta file intact on Windows)
download.file("https://github.com/5harad/API201-students/raw/main/review_sessions/review_10/fastfood.dta", 
              "fastfood.dta",
              mode = "wb")

**1. In 1992, Card and Krueger attempted to contact 473 Burger King, KFC, Wendy's, and Roy Rogers restaurants in New Jersey and Eastern Pennsylvania for initial interviews.**

**a. What are the relevant populations considered in this study?**

##### START

1. Burger King, KFC, Wendy's, and Roy Rogers restaurants in New Jersey
2. Burger King, KFC, Wendy's, and Roy Rogers restaurants in Eastern Pennsylvania

##### END

**b. Card and Krueger were only able to complete initial interviews with 410 restaurants. If busier stores were less likely to complete interviews, is the sample still representative? If not, how might this bias the estimate?** 

##### START

No, the sample may not be representative. By using only data from restaurants that responded to the interview, Card and Krueger estimate employment changes conditional on completing an interview, even though they are interested in the average across all stores. For example, restaurants whose managers are able to take a phone interview may not have staffing issues, so they may be flexible enough to cut back hours when the minimum wage increases. If so, the sample would overstate employment reductions relative to the full population. 

##### END

Card and Krueger managed to re-interview most of the 410 restaurants after the minimum wage change (the rest were closed, under renovation, or refused to be re-interviewed). When all was said and done, there were 309 restaurants in New Jersey and 75 in Pennsylvania that were interviewed both before and after the minimum wage change. They then calculated the change in full-time-equivalent workers (FTE) at each restaurant. 

**c. Let $\mu_s$ be the average change in FTE at restaurants in state $s$ and $\sigma_s$ be the standard deviation of the change in FTE at restaurants in that state. Then, what is the sampling distribution of $\hat{\mu}_{NJ} - \hat{\mu}_{PA}$?**

##### START
By the Central Limit Theorem, the distribution is approximately Normal with mean $\mu_{NJ} - \mu_{PA}$ and standard deviation $\sqrt{\dfrac{\sigma_{NJ}^2}{309} + \dfrac{\sigma_{PA}^2}{75}}$. In practice, we don't know $\sigma_s$, so we estimate the standard deviation as $\sqrt{\dfrac{\hat{\sigma}_{NJ}^2}{309} + \dfrac{\hat{\sigma}_{PA}^2}{75}}$.
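
The standard deviation of the difference follows from one extra step, sketched here under the assumption that the New Jersey and Pennsylvania samples are independent of each other (an assumption the question does not state explicitly):

$$
\operatorname{Var}\left(\hat{\mu}_{NJ} - \hat{\mu}_{PA}\right) = \operatorname{Var}\left(\hat{\mu}_{NJ}\right) + \operatorname{Var}\left(\hat{\mu}_{PA}\right) = \frac{\sigma_{NJ}^2}{309} + \frac{\sigma_{PA}^2}{75}
$$

Variances of independent estimates add even when the estimates themselves are subtracted, and taking the square root gives the standard deviation of the difference.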

##### END

**d. Because a pilot study found that McDonald's restaurants had poor response rates, Card and Krueger did not survey any McDonald's. Suppose you uncover McDonald's secret human resources archive — second in secrecy only to their Big Mac sauce recipe. You use it to determine store employment in New Jersey and Pennsylvania prior to and after the minimum wage change. You supplement Card and Krueger's data by including employment in McDonald's restaurants. How does this affect the following:**

- **i. The populations of interest**
##### START
The populations are now Burger King, KFC, Wendy's, Roy Rogers, **and McDonald's** stores in New Jersey and Pennsylvania, respectively. 
##### END

- **ii. The population difference in means**
##### START
The true difference in average changes in full-time equivalent workers will be different from before because the population now includes McDonald's restaurants. For example, McDonald's restaurants may respond more negatively to the minimum wage change and tend to decrease workers more than other stores.
##### END

- **iii. Statistical power**
##### START
Statistical power, the probability of rejecting the null hypothesis if it is false, increases with the sample size and with the magnitude of the true difference in means. 
Adding McDonald's restaurants will definitely increase the sample size, as there are more restaurants in the sample. 
The change in the true difference in means will depend on how McDonald's changed employment in Pennsylvania relative to New Jersey. As we will see, other restaurants actually increased employment in New Jersey relative to Pennsylvania. If McDonald's increased employment by even more, including these restaurants will definitely increase power. If they increased employment by less or decreased employment, the change in power is ambiguous: it will depend on whether the increase in sample size outweighs the decrease in effect size.
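
The trade-off between sample size and effect size can be sketched numerically. The function below uses the normal approximation for a two-sided test of a difference in means; the effect sizes, standard deviation, and sample sizes are hypothetical values chosen only for illustration, and `power_two_sided` and `se_diff` are helper names introduced here, not part of any package.

```r
# Approximate power of a two-sided z-test for a difference in means.
# All inputs below are hypothetical and chosen only for illustration.
power_two_sided <- function(effect, se, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  # Probability that the z-statistic lands in either rejection region
  pnorm(-z_crit + effect / se) + pnorm(-z_crit - effect / se)
}

# Standard error of a difference in means with a common standard deviation
se_diff <- function(sd, n1, n2) sqrt(sd^2 / n1 + sd^2 / n2)

true_effect <- 2   # assumed true difference in mean FTE change
common_sd   <- 9   # assumed standard deviation in both states

power_small <- power_two_sided(true_effect, se_diff(common_sd, 100, 100))
power_large <- power_two_sided(true_effect, se_diff(common_sd, 200, 200))
power_weak  <- power_two_sided(true_effect / 2, se_diff(common_sd, 200, 200))

c(small_sample = power_small,
  large_sample = power_large,
  large_sample_half_effect = power_weak)
```

Under these assumed numbers, doubling the sample raises power, but doubling the sample while halving the effect lowers it, which is the ambiguous case described above.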
##### END

In [20]:
# Load packages
library(tidyverse)
library(haven)

# Read data
fastfood <- read_dta("fastfood.dta")

head(fastfood)

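| restaurant_id | state | fte_pre | fte_post |
|---|---|---|---|
| `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` |
| 1 | PA | 40.5 | 24.0 |
| 2 | PA | 13.75 | 11.5 |
| 3 | PA | 8.5 | 10.5 |
| 4 | PA | 34.0 | 20.0 |
| 5 | PA | 24.0 | 35.5 |
| 6 | PA | 20.5 | NA |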


**Data Dictionary**
* `restaurant_id` - a numerical identifier for each restaurant
* `state` - which state the restaurant is located in (PA or NJ)
* `fte_pre` - full-time-equivalent workers prior to the minimum wage increase in NJ
* `fte_post` - full-time-equivalent workers after the minimum wage increase in NJ

**2. Consider the code chunk and output below:**

In [51]:
fastfood %>% 
    filter(!is.na(fte_post) & !is.na(fte_pre)) %>%
    mutate(fte_change = fte_post - fte_pre) %>%
    group_by(state) %>%
    summarize(n = n(),
              mean_fte_change = mean(fte_change),
              sd_fte_change   = sd(fte_change))


| state | n | mean_fte_change | sd_fte_change |
|---|---|---|---|
| `<chr>` | `<int>` | `<dbl>` | `<dbl>` |
| NJ | 309 | 0.4666667 | 8.452195 |
| PA | 75 | -2.2833333 | 10.853628 |


**a. Some restaurants are not open either before the minimum wage change or after the minimum wage change. These restaurants have FTE coded as `NA` in the periods they aren't open. Given that is the case, what does the line beginning with `filter` do?**

##### START
This line keeps only the restaurants that are open prior to the minimum wage change AND are open after the minimum wage change. These are the only restaurants for which we can calculate the change in employment. 
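
To see why the filter is needed, the following sketch hand-types the FTE values of restaurants 1 and 6 from the `head(fastfood)` output, where restaurant 6 has no `fte_post` value. In R, arithmetic with `NA` yields `NA`, and `mean()` returns `NA` if any element is missing unless `na.rm = TRUE` is set.

```r
# Restaurants 1 and 6 from head(fastfood); restaurant 6 has no fte_post
fte_pre  <- c(40.5, 20.5)
fte_post <- c(24.0, NA)

fte_change <- fte_post - fte_pre    # second element is NA
mean(fte_change)                    # NA: one missing value makes the mean NA
mean(fte_change, na.rm = TRUE)      # averages only restaurants with both values
```

Filtering before `mutate()` and `summarize()` avoids this problem and also makes `n()` count only the restaurants that contribute to the mean.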
##### END

**b. What do the remaining lines do? Interpret the output.**

##### START
- `mutate(fte_change = fte_post - fte_pre)`

This line adds a variable to the data that is equal to the change in full-time-equivalent workers after the implementation of the minimum wage change in New Jersey. 

- `group_by(state)`

This line groups observations by state. This means that the following commands will compute statistics separately for each state rather than across all restaurants pooled together.

- `summarize(n = n(), mean_fte_change = mean(fte_change), sd_fte_change = sd(fte_change))`

This line collapses the data by state, calculating three statistics for each state: `n`, the number of restaurants in that state; `mean_fte_change`, the average change in FTE for restaurants in that state; and `sd_fte_change`, the standard deviation of the change in FTE for restaurants in that state.

The output indicates that there were 309 restaurants in New Jersey and 75 in Pennsylvania that were surveyed both before and after the minimum wage change. The average change in FTE was an increase of 0.47 in New Jersey and a decrease of 2.28 in Pennsylvania. The standard deviation of these changes was 8.45 workers in New Jersey and 10.85 in Pennsylvania.

##### END

**c. Calculate a 95% confidence interval for the difference in pre-post changes in FTE. Note that this is often called the _difference in differences_. Under certain assumptions you'll learn in API-202, it reflects the causal effect of the policy.**

##### START

The difference in differences is $0.467 - (-2.283) = 2.75$. 

The standard error is $\sqrt{\frac{8.45^2}{309} + \frac{10.85^2}{75}} = 1.342$.

The 95% confidence interval is $2.75 \pm 2 \times 1.342 = [0.066, 5.434]$.
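
The same calculation can be reproduced in R. This is a minimal sketch that copies the summary statistics from the `summarize()` output by hand; the variable names are introduced here for illustration.

```r
# Summary statistics copied from the summarize() output
n_nj <- 309; mean_nj <- 0.4666667;  sd_nj <- 8.452195
n_pa <- 75;  mean_pa <- -2.2833333; sd_pa <- 10.853628

diff_in_diff <- mean_nj - mean_pa
std_error    <- sqrt(sd_nj^2 / n_nj + sd_pa^2 / n_pa)

# Multiplier of 2 as in the text; qnorm(0.975) (about 1.96) is the exact normal value
ci_95 <- diff_in_diff + c(-1, 1) * 2 * std_error

c(estimate = diff_in_diff, se = std_error, lower = ci_95[1], upper = ci_95[2])
```

Because this uses unrounded inputs, the endpoints can differ from a hand calculation in the third decimal place.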

##### END

**d. What null hypothesis would allow you to test whether there is a statistically significant difference in average changes in FTE? Calculate the corresponding Z-score and p-value. Do you reject the null hypothesis?**

In [12]:
# Your answer here!

# START
z <- (2.75 - 0) / 1.342

p <- 2 * pnorm(-abs(z))

c("Z-score" = z, "p-value" = p)
# END

##### START

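The null hypothesis is written as $H_0: \mu_{NJ} - \mu_{PA} = 0$. Note that it is a statement about the population means, not the sample estimates $\hat\mu_{NJ}$ and $\hat\mu_{PA}$.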

The Z-score is about 2.05 and the p-value is about 0.04. Since the p-value is less than 0.05, we reject the null hypothesis at the 5% level.
##### END

**e. On average, New Jersey restaurants had 20 full-time-equivalent workers prior to the minimum wage increase. Assess the practical significance of the effect relative to this baseline.**

##### START
Card and Krueger estimate an increase of 2.75 full-time-equivalent workers relative to Pennsylvania. This is an increase of 13.75 percent ($2.75 / 20 = 0.1375$), a modest increase, but notable considering classical models predict a decrease in employment. 
##### END

**f. Imagine Card and Krueger surveyed restaurants on far more topics than just employment. Perhaps they asked about revenues, capital expenditures, process innovation, input costs, and more, each measured in multiple ways, and then tested the relative change in New Jersey and Pennsylvania on each measure. They then only reported their findings on full-time-equivalent workers. Would this make you question the validity of their results?** 

##### START
Testing this many hypotheses should make you rightly skeptical of the results. Recall that if the null hypothesis is true, we reject it 5 percent of the time. As the number of hypotheses tested increases, the probability of rejecting at least one increases rapidly. If Card and Krueger tested 20 true null hypotheses, they should expect to reject one. If you notice or suspect that a study tested many hypotheses, you should adjust your expectations accordingly.
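
The arithmetic behind this can be sketched in a few lines. The calculation of the probability of at least one rejection assumes the 20 tests are independent, which is a simplification made here for illustration.

```r
alpha   <- 0.05   # significance level of each test
n_tests <- 20     # number of true null hypotheses tested

# Expected number of false rejections across all tests
expected_false_rejections <- n_tests * alpha

# Probability of at least one false rejection, assuming independent tests
prob_at_least_one <- 1 - (1 - alpha)^n_tests

c(expected = expected_false_rejections, at_least_one = prob_at_least_one)
```

With these inputs the chance of at least one false rejection is roughly 64 percent, far above the 5 percent that applies to any single test.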
##### END

**g. Next semester, you'll learn how to critically evaluate whether this study is "internally valid", that is, whether it yields an unbiased estimate of the causal effect. For now, assume it does. Based on this analysis alone, would you recommend your candidate come out in favor of a \$15 minimum wage? If so, would you recommend they advocate for it by saying that "minimum wages actually increase employment"?**

##### START
Card and Krueger produced some of the first methodologically sound evidence on the minimum wage, playing a large role in the credibility revolution in economics. However, before getting too excited about the results, you may want to keep a few things in mind:

- It is generally unwise to base policy decisions on a single study, especially one that is fairly limited in scope like this one, which evaluated the effect of a single minimum wage change in the fast food industry in New Jersey. To glean more generalizable insights, a survey of the literature would be appropriate.

- The setting evaluated by Card and Krueger is quite different from the setting that concerns your candidate. For instance:
    * i. Card and Krueger studied a minimum wage change 30 years ago.
    * ii. Card and Krueger studied a minimum wage change in New Jersey, not Michigan.
    * iii. Card and Krueger only studied 4 fast food chains. Your candidate is concerned with the effect of a minimum wage not just in the whole restaurant sector but in _all sectors_.
    * iv. Card and Krueger studied the effect of a \$0.80 increase in the minimum wage. The increase in Michigan would be a \$5 increase.

- While we reject the null hypothesis, it is still possible these results are due to sampling fluctuations. The p-value is just below 0.05 — p-values from published studies are known to cluster in this range due to publication bias. 

##### END

# Goodbyes

**Thank you all for a great semester!**

Next semester I will be the TF for API-202Z with Professor Tim Layton. If you are interested and want to know whether 202Z may be right for you, please come talk to me.

I'm always happy to discuss research, the health care system, Ph.D. student life, politics, my dog, photography, and cool birds, so say hi if you see me around or reach out by email: benjamin_berger@hks.harvard.edu.

**Good luck on the exam, and I'll see you around in the Spring!**