# Econometrics 1 

## TD 1 - 26/09/2024

TA: Pedro Vergara Merino ([pedro.vergaramerino@ensae.fr](mailto:pedro.vergaramerino@ensae.fr), office 4081) 

We aim to explain log hourly wages with age and education. Henceforth, `lnw` (or `lnwage`) denotes the logarithm of the hourly wage, `eduy` the number of years of education (counted from age 6), and `age` the age in years.

![Regression_Short](figures/reg_lnw_eduy.png)

![Regression_Long](figures/reg_lnw_eduy_age.png)

![Summary](figures/sum_lnw_eduy_age.png)

#### 1. Given the outcome in the previous tables, what can we learn about the correlation between `age` and `eduy`? Explain.

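To learn something about the correlation between `age` and `eduy`, we use the **omitted variable bias (OVB)** formula (see Proposition 4 of Chapter 1). Denote by $\hat\beta^{s}_{eduy}$ the coefficient of `eduy` in the short regression (without `age`), by $\hat\beta^{l}_{eduy}$ and $\hat\beta^{l}_{age}$ the coefficients in the long regression, and by $\hat\delta$ the slope of the regression of `age` on `eduy`. Then

$$\hat\beta^{s}_{eduy} = \hat\beta^{l}_{eduy} + \hat\beta^{l}_{age}\,\hat\delta, \qquad \hat\delta = \frac{\widehat{Cov}(eduy, age)}{\widehat{Var}(eduy)}.$$

Since $\widehat{Var}(eduy) > 0$, the sign of the correlation between `age` and `eduy` is the sign of $(\hat\beta^{s}_{eduy} - \hat\beta^{l}_{eduy})/\hat\beta^{l}_{age}$, which can be read off the two regression tables. If the summary table reports standard deviations for the same sample, the magnitude follows as well: $\widehat{Corr}(eduy, age) = \hat\delta\,\hat\sigma_{eduy}/\hat\sigma_{age}$.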
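The OVB decomposition is an exact algebraic identity for OLS estimates, not just an approximation. A minimal sketch on simulated data (the data-generating process below is invented for illustration and is not emp2007) checks that the short-regression slope equals the long-regression slope plus the `age` coefficient times the slope of `age` on `eduy`:

```r
set.seed(1)
n    <- 500
eduy <- rnorm(n, mean = 12, sd = 3)          # simulated years of education
age  <- 45 - 0.8 * eduy + rnorm(n, sd = 8)   # simulated age, negatively related to eduy
lnw  <- 1 + 0.08 * eduy + 0.01 * age + rnorm(n, sd = 0.5)

m_short <- lm(lnw ~ eduy)         # short regression: age omitted
m_long  <- lm(lnw ~ eduy + age)   # long regression
m_aux   <- lm(age ~ eduy)         # omitted variable regressed on the included one

delta   <- unname(coef(m_aux)["eduy"])   # equals cov(eduy, age) / var(eduy)
b_short <- unname(coef(m_short)["eduy"])
b_long  <- unname(coef(m_long)["eduy"])
b_age   <- unname(coef(m_long)["age"])

# Both entries coincide up to floating-point error
print(c(short = b_short, long_plus_bias = b_long + b_age * delta))

# The bias divided by the age coefficient is delta, whose sign is that of cor(eduy, age)
print(c(ratio = (b_short - b_long) / b_age, correlation = cor(eduy, age)))
```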

#### 2. Recompute the second regression using the data emp2007.dta. Interpret the coefficient on `age` in this new regression.

We start by loading the required packages and the data.

```r
# Load packages
library('haven')
library('dplyr')
library('margins')
```

We can then load data in .dta format using the command `read_dta` from the package `haven`.

```r
# Load data
data <- haven::read_dta("../emp2007.dta")
```

The command `summary` gives descriptive statistics for each variable, and `knitr::kable(head(data), "simple")` displays the first six rows of the dataset as a table.

```r
summary(data)
```

```r
knitr::kable(head(data), "simple")
```

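The dataset contains the monthly wage (`salred`) and the weekly hours worked (`hhc`), not the hourly wage. We therefore first compute the hourly wage, $w = 12\,salred/(52\,hhc)$, and its logarithm, using the command `mutate` from the package `dplyr`.

The pipe operator `%>%` passes the dataset on its left as the first argument of the function on its right, so `data %>% mutate(...)` is equivalent to `mutate(data, ...)`.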

```r
data <- data %>% mutate(w = (salred * 12) / (hhc * 52)) # Hourly wage: annual wage / annual hours
data <- data %>% mutate(lnw = log(w))                   # Natural logarithm of the hourly wage
```
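The conversion assumes 12 months and 52 weeks per year: 12 times the monthly wage is the annual wage, and 52 times the weekly hours is the annual number of hours. A small worked example with made-up values (not taken from emp2007), written in base R with `within()` so that it runs without `dplyr`:

```r
# Hypothetical monthly wages (salred) and weekly hours (hhc) for three workers
toy <- data.frame(salred = c(2000, 1500, 3000), hhc = c(35, 20, 40))

toy <- within(toy, {
  w   <- (salred * 12) / (hhc * 52)  # annual wage / annual hours = hourly wage
  lnw <- log(w)                      # log() is the natural logarithm in R
})
print(toy)
```

For the first worker, $w = 24000/1820 \approx 13.19$ per hour.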

```r
knitr::kable(head(data), "simple") # See new dataframe
```

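We now regress the logarithm of hourly wages on education and age with the command `lm`. Since the dependent variable is in logs, $100\,\hat\beta_{age}$ approximates the percentage change in the hourly wage associated with one more year of age, holding education constant.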

```r
model1 <- lm(lnw ~ eduy + age, data = data) # Create the model
summary(model1)                             # Regression coefficients and standard errors
```
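To see what `lm` computes, here is a minimal sketch of OLS through the normal equations $\hat\beta = (X'X)^{-1}X'y$ on simulated data (hypothetical values, not emp2007). The result matches `coef()` of the `lm` fit up to floating-point error; `lm` itself relies on a QR decomposition, which is numerically more stable than solving with $X'X$.

```r
set.seed(2)
n    <- 200
eduy <- sample(9:20, n, replace = TRUE)    # simulated years of education
age  <- sample(18:65, n, replace = TRUE)   # simulated age
lnw  <- 1.5 + 0.07 * eduy + 0.01 * age + rnorm(n, sd = 0.4)

X <- cbind(1, eduy, age)                            # design matrix with an intercept column
beta_hat <- solve(crossprod(X), crossprod(X, lnw))  # solves (X'X) b = X'y
fit <- lm(lnw ~ eduy + age)

print(cbind(normal_equations = drop(beta_hat), lm = coef(fit)))
```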

#### 3. Compute the coefficients of the regression of `lnw` on `eduy`, `age`, and `age`$^2$. What is the marginal effect of `age`? What is the magnitude of such a marginal effect for a person who is 20 years old? And for someone who is 50 years old? Compute the average effect in the sample, first "by hand" and then with the command `margins`.

We start with the manual computation of the marginal effects, after creating the squared age variable `age2`.


```r
data <- data %>% mutate(age2 = age^2) # Create age^2
knitr::kable(head(data), "simple")    # See new dataframe
```

We now run the new regression.


```r
model2 <- lm(lnw ~ eduy + age + age2, data = data)
summary(model2)
```

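The model is

$$lnw = \beta_0 + \beta_1\, eduy + \beta_2\, age + \beta_3\, age^2 + \varepsilon,$$

so the marginal effect of `age` is the derivative $\partial\, lnw/\partial\, age = \beta_2 + 2\beta_3\, age$, estimated by $\widehat{\beta}_2 + 2\widehat{\beta}_3\, age$. Because it depends on age, we compute it at ages 20 and 50 and at the mean age. Multiplied by 100, it approximates the percentage change in the hourly wage from one more year of age, holding education constant.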
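The formula is simply the derivative of the fitted quadratic with respect to `age`. A sketch on simulated data (values invented for illustration, not emp2007) compares it with a central finite-difference derivative of `predict()`; for a quadratic, the central difference is exact up to rounding.

```r
set.seed(3)
n    <- 300
eduy <- sample(9:20, n, replace = TRUE)
age  <- sample(18:65, n, replace = TRUE)
lnw  <- 0.5 + 0.07 * eduy + 0.06 * age - 0.0006 * age^2 + rnorm(n, sd = 0.4)
sim  <- data.frame(lnw, eduy, age, age2 = age^2)

fit <- lm(lnw ~ eduy + age + age2, data = sim)
b   <- coef(fit)

# Analytical marginal effect: beta_2 + 2 * beta_3 * age
me_analytic <- function(a) unname(b["age"] + 2 * b["age2"] * a)

# Numerical derivative of the fitted value in age (eduy held fixed at 12)
me_numeric <- function(a, h = 1e-4) {
  up   <- data.frame(eduy = 12, age = a + h, age2 = (a + h)^2)
  down <- data.frame(eduy = 12, age = a - h, age2 = (a - h)^2)
  unname((predict(fit, up) - predict(fit, down)) / (2 * h))
}

print(c(analytic_20 = me_analytic(20), numeric_20 = me_numeric(20),
        analytic_50 = me_analytic(50), numeric_50 = me_numeric(50)))
```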

```r
b <- coef(model2)          # estimated coefficients of model2
mean_age <- mean(data$age)
# Marginal effect of age, beta_2 + 2 * beta_3 * age, at ages 20, 50 and the mean age
effect_20   <- round(unname(b["age"] + 2 * b["age2"] * 20), digits = 3)
effect_50   <- round(unname(b["age"] + 2 * b["age2"] * 50), digits = 3)
effect_mean <- round(unname(b["age"] + 2 * b["age2"] * mean_age), digits = 3)
mean_age    <- round(mean_age, digits = 3)
```

```r
cat("The marginal effect of age on the log hourly wage at age 20 is", effect_20, "\n")
cat("The marginal effect of age on the log hourly wage at age 50 is", effect_50, "\n")
cat("The marginal effect of age on the log hourly wage at the mean age", mean_age, "is", effect_mean, "\n")
```

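Now we compute the marginal effects with the command `margins`. So that `margins` knows that `age`$^2$ is a function of `age` (and differentiates through it), we write `I(age^2)` inside the regression formula instead of using the precomputed variable `age2`. With `age2`, `margins` would treat the squared term as a separate variable and report only $\widehat{\beta}_2$ as the marginal effect of `age`.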


```r
model2_margins <- lm(lnw ~ eduy + age + I(age^2), data = data)
summary(model2_margins)
```
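A related pitfall: in an R formula, `^` is the crossing operator, not a power, so `age^2` expands to `age` crossed with itself, which is just `age`. Wrapping the term in `I()` makes R evaluate it arithmetically. A minimal check on simulated data (invented values, not emp2007) shows that `age + age^2` yields a single slope, while `age + I(age^2)` reproduces the regression with a precomputed `age2` column:

```r
set.seed(4)
n   <- 300
age <- sample(18:65, n, replace = TRUE)
lnw <- 0.5 + 0.06 * age - 0.0006 * age^2 + rnorm(n, sd = 0.4)
sim <- data.frame(lnw, age, age2 = age^2)

m_crossing <- lm(lnw ~ age + age^2, data = sim)     # age^2 collapses to age
m_power    <- lm(lnw ~ age + I(age^2), data = sim)  # genuine quadratic term
m_column   <- lm(lnw ~ age + age2, data = sim)      # precomputed square

print(coef(m_crossing))   # intercept and age only
print(cbind(I_age2 = coef(m_power), age2_column = coef(m_column)))
```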

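```r
# Marginal effects evaluated at age 20, at age 50 and at the mean age
# (margins has no `atmeans` option; the mean is passed through `at`)
margins_20   <- margins(model2_margins, at = list(age = 20))
margins_50   <- margins(model2_margins, at = list(age = 50))
margins_mean <- margins(model2_margins, at = list(age = mean(data$age)))
```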

```r
print(margins_20)
print(margins_50)
print(margins_mean)
```

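Without the `at` argument, `margins` computes the **average marginal effects (AME)**, i.e. the marginal effects averaged over all observations in the sample. In general, the AME differs from the marginal effect at the mean (MEM). Here, however, the marginal effect of `age`, $\widehat{\beta}_2 + 2\widehat{\beta}_3\, age$, is linear in `age`, so its sample average equals its value at the sample mean of `age`: for `age`, the AME and the MEM coincide.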

```r
margins_AME <- margins(model2_margins)
summary(margins_AME)
```
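A base-R sketch on simulated data (invented values; it does not call `margins`) contrasts the quadratic specification, where AME and MEM coincide, with a hypothetical $\log(age)$ specification. There the marginal effect $\hat\beta/age$ is nonlinear in age, so the two differ: by Jensen's inequality, $\overline{1/age} > 1/\overline{age}$.

```r
set.seed(5)
n   <- 1000
age <- sample(18:65, n, replace = TRUE)

# Quadratic specification: ME(age) = b1 + 2 * b2 * age, linear in age
y_quad   <- 0.5 + 0.06 * age - 0.0006 * age^2 + rnorm(n, sd = 0.4)
b_q      <- coef(lm(y_quad ~ age + I(age^2)))
ame_quad <- mean(b_q[2] + 2 * b_q[3] * age)    # average of the individual MEs
mem_quad <- b_q[2] + 2 * b_q[3] * mean(age)    # ME at the mean age

# Hypothetical log specification: ME(age) = b1 / age, nonlinear in age
y_log   <- 1 + 0.5 * log(age) + rnorm(n, sd = 0.4)
b_l     <- coef(lm(y_log ~ log(age)))
ame_log <- mean(b_l[2] / age)
mem_log <- b_l[2] / mean(age)

print(c(ame_quad = unname(ame_quad), mem_quad = unname(mem_quad),
        ame_log = unname(ame_log), mem_log = unname(mem_log)))
```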

#### 4. Recover the coefficient of education in the previous regression using the Frisch-Waugh Theorem.

We start by regressing `eduy` on the other covariates, `age` and `age`$^2$.


```r
# Auxiliary regression of eduy on the other covariates (age2 = age^2 was created above)
model3 <- lm(eduy ~ age + age2, data = data)
summary(model3)
```

We then recover the residuals from the regression.

```r
data$u <- residuals(model3) # eduy purged of age and age^2
```

Finally, we regress `lnw` on the residuals `u`. By the Frisch-Waugh theorem, the coefficient on `u` equals the coefficient of `eduy` in `model2`.

```r
model4 <- lm(lnw ~ u, data = data)
summary(model4)
```
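A sketch of the theorem on simulated data (hypothetical values, not emp2007): the slope from regressing the outcome on the residualised regressor equals the multiple-regression coefficient. Partialling `age` and `age2` out of the outcome as well leaves that slope unchanged.

```r
set.seed(6)
n    <- 400
age  <- sample(18:65, n, replace = TRUE)
eduy <- 20 - 0.1 * age + rnorm(n, sd = 2)   # education correlated with age
lnw  <- 0.5 + 0.07 * eduy + 0.06 * age - 0.0006 * age^2 + rnorm(n, sd = 0.4)
sim  <- data.frame(lnw, eduy, age, age2 = age^2)

full <- lm(lnw ~ eduy + age + age2, data = sim)      # multiple regression

sim$u <- residuals(lm(eduy ~ age + age2, data = sim)) # eduy purged of age, age^2
fw    <- lm(lnw ~ u, data = sim)                      # outcome on residualised eduy

sim$v <- residuals(lm(lnw ~ age + age2, data = sim))  # outcome purged as well
fw2   <- lm(v ~ u, data = sim)

print(c(full = unname(coef(full)["eduy"]),
        fwl = unname(coef(fw)["u"]),
        fwl_both = unname(coef(fw2)["u"])))
```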

#### 5. Build the variable `pexp = age - eduy - 6`. Explain why it is a proxy of the professional experience. Compute the regression of `lnw` on `eduy`, `age` and `pexp`. Explain the results.

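We first create the new variable. Since schooling starts at age 6 and lasts `eduy` years, someone who started working right after leaving school has been working for `age - eduy - 6` years. `pexp` is therefore the *potential* experience: a proxy for actual professional experience that ignores career interruptions such as unemployment or inactivity spells.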


```r
data <- data %>% mutate(pexp = age - eduy - 6) # Potential experience
```


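We then run the new regression. Because `pexp` is an exact linear combination of the constant, `age`, and `eduy`, the regressors are perfectly collinear and the coefficients are not identified: `lm` drops the last collinear regressor and reports `NA` for `pexp`. The coefficient of `eduy` then measures the effect of one more year of education at fixed age, which also means one less year of potential experience.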

```r
model5 <- lm(lnw ~ eduy + age + pexp, data = data)
summary(model5)
```
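A sketch of what to expect, on simulated data (invented values, not emp2007): `lm` reports `NA` for the aliased regressor, and `alias()` from the `stats` package displays the exact linear dependency. Dropping `age` instead gives the same fitted values, and the `eduy` coefficient of the collinear model equals the difference between the `eduy` and `pexp` coefficients of the model without `age`.

```r
set.seed(7)
n    <- 200
eduy <- sample(9:20, n, replace = TRUE)    # simulated years of education
age  <- sample(27:65, n, replace = TRUE)   # simulated age (so that pexp >= 1)
sim  <- data.frame(eduy, age, pexp = age - eduy - 6)
sim$lnw <- 1 + 0.07 * sim$eduy + 0.02 * sim$pexp + rnorm(n, sd = 0.4)

m <- lm(lnw ~ eduy + age + pexp, data = sim)
print(coef(m))             # pexp is NA: collinear with the constant, eduy and age
print(alias(m)$Complete)   # the dependency pexp = -6 - eduy + age

# Same column space without age: eduy coefficient of m = b_eduy - b_pexp of m2
m2 <- lm(lnw ~ eduy + pexp, data = sim)
print(coef(m2))
```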