In [1]:
install.packages("randomForest")


# A Case Study: The Effect of Gun Ownership on Gun-Homicide Rates

We consider the problem of estimating the effect of gun
ownership on the homicide rate. For this purpose, we estimate the following partially
linear model:

$$
 Y_{j,t} = \beta D_{j,(t-1)} + g(Z_{j,t}) + \epsilon_{j,t}.
$$

## Data

$Y_{j,t}$ is the log homicide rate in county $j$ at time $t$; $D_{j,t-1}$ is the log fraction of suicides committed with a firearm in county $j$ at time $t-1$, which we use as a proxy for gun ownership; and $Z_{j,t}$ is a set of demographic and economic characteristics of county $j$ at time $t$. The parameter $\beta$ is the effect of gun ownership on homicide rates, controlling for county-level demographic and economic characteristics.

The sample covers 195 large United States counties from 1980 through 1999, giving us 3,900 county-year observations.

In [2]:
data <- read.csv("/content/gun_clean.csv")
dim(data)

**Note:** In this notebook, we do not account for fixed effects in the following estimations. We treat the panel data as cross-sectional.

**Exercise 1:** Estimate a simple linear regression of $Y_{j,t}$ (*logghomr*) on $D_{j,t-1}$ (*logfssl*) without any controls. Determine the regression coefficient $\beta$, its 95% confidence interval and the standard error.
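
The mechanics of reading off a coefficient, its standard error and its confidence interval can be sketched on simulated data. This is only an illustration: the data frame `toy`, the sample size and the true coefficient of 0.5 are assumptions made up for the example and are not taken from `gun_clean.csv`.

```r
# Simulated stand-in data (hypothetical; not the gun data set)
set.seed(42)
n <- 500
toy <- data.frame(logfssl = rnorm(n))
toy$logghomr <- 0.5 * toy$logfssl + rnorm(n)  # assumed true slope: 0.5

# Simple linear regression without controls
fit <- lm(logghomr ~ logfssl, data = toy)

# Point estimate, standard error and 95% confidence interval of the slope
beta_hat <- coef(fit)["logfssl"]
se_hat   <- summary(fit)$coefficients["logfssl", "Std. Error"]
ci_95    <- confint(fit, "logfssl", level = 0.95)

print(c(beta = unname(beta_hat), se = se_hat))
print(ci_95)
```

The same three calls (`coef()`, `summary()$coefficients` and `confint()`) should carry over to a model fitted on the real data.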

**Exercise 2:** Repeat the linear regression from Exercise 1, but now include the full set of county-level control variables $Z_{j,t}$. Again, determine the regression coefficient $\beta$ of the target regressor *logfssl*, its 95% confidence interval and the standard error.

In [None]:
Z <- as.matrix(data)[, c('logrobr','logburg','burg_missing','robrate_missing','newblack','newfhh','newmove','newdens','newmal',
                                'AGE010D','AGE050D','AGE110D','AGE170D','AGE180D','AGE270D','AGE310D','AGE320D','AGE350D','AGE380D','AGE410D','AGE470D','AGE570D',
                                'AGE640D','AGE670D','AGE760D','BNK010D','BNK050D','BPS030D','BPS130D','BPS230D','BPS020D','BPS120D','BPS220D','BPS820D','BZA010D',
                                'BZA110D','BZA210D','EDU100D','EDU200D','EDU600D','EDU610D','EDU620D','EDU630D','EDU635D','EDU640D','EDU650D','EDU680D','EDU685D',
                                'ELE010D','ELE020D','ELE025D','ELE030D','ELE035D','ELE060D','ELE065D','ELE210D','ELE220D','HIS010D','HIS020D','HIS030D','HIS040D',
                                'HIS110D','HIS120D','HIS130D','HIS140D','HIS200D','HIS300D','HIS500D','HIS700D','HSD010D','HSD020D','HSD030D','HSD110D','HSD120D',
                                'HSD130D','HSD140D','HSD150D','HSD170D','HSD200D','HSD210D','HSD230D','HSD300D','HSD310D','HSG030D','HSG195D','HSG200D','HSG220D',
                                'HSG440D','HSG445D','HSG460D','HSG680D','HSG700D','HSD410D','HSD500D','HSD510D','HSD520D','HSD530D','HSD540D','HSD550D','HSD560D',
                                'HSD570D','HSD580D','HSD590D','HSD610D','HSD620D','HSD710D','HSD720D','HSD730D','HSD740D','HSD750D','HSD760D','HSD770D','HSD780D',
                                'HSG040D','HSG045D','HSG050D','HSG182D','HSG210D','HSG230D','HSG240D','HSG250D','HSG310D','HSG315D','HSG320D','HSG325D','HSG335D',
                                'HSG350D','HSG370D','HSG375D','HSG380D','HSG450D','HSG490D','HSG500D','HSG510D','HSG520D','HSG530D','HSG540D','HSG550D','HSG560D',
                                'HSG570D','HSG650D','HSG690D','HSG710D','HSG730D','INC110D','INC650D','INC670D','INC680D','INC690D','INC700D','INC710D','INC720D',
                                'INC730D','INC760D','INC790D','LFE020D','LFE023D','LFE030D','LFE080D','LFE090D','LFE210D','LFE220D','LND110D','PIN020D','POP110D',
                                'POP210D','POP240D','POP440D','POP450D','POP470D','POP480D','POP540D','POP550D','POP570D','POP580D','POP700D','POP710D','POP720D',
                                'POP740D','PPQ010D','PPQ100D','PPQ110D','PPQ120D','PVY020D','PVY120D','PVY210D','PVY310D','PVY420D','PVY520D','SPR030D','SPR130D',
                                'SPR230D','SPR330D','SPR440D','VST020D')]

The control variables $Z_{j,t}$ are from the U.S. Census Bureau and include 195 county-level features such as demographic statistics, crime rates, income, education and housing indicators. The following code might help you.

In [None]:
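# Combine the outcome, the gun-ownership proxy and the controls into one data frame
data <- data.frame(logghomr = data$logghomr, logfssl = data$logfssl, Z)
# Build the formula: logghomr ~ logfssl + <all control columns of Z>
lm_formula <- as.formula(paste("logghomr", "~", paste("logfssl", paste(colnames(Z), collapse = "+"), sep = "+")))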
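
To see how a formula assembled from column names is used, here is a small sketch on simulated data. The control names `z1`, `z2`, `z3`, the objects `Z_toy`, `toy` and `toy_formula`, and the true coefficient of 0.3 are hypothetical and chosen only for illustration.

```r
# Simulated stand-in data with three hypothetical controls
set.seed(1)
n <- 300
Z_toy <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("z1", "z2", "z3")))
d <- 0.5 * Z_toy[, "z1"] + rnorm(n)                               # regressor correlated with z1
y <- as.numeric(0.3 * d + Z_toy %*% c(1, -1, 0.5) + rnorm(n))     # assumed true effect: 0.3
toy <- data.frame(logghomr = y, logfssl = d, Z_toy)

# Build "logghomr ~ logfssl + z1 + z2 + z3" from the column names
toy_formula <- as.formula(
  paste("logghomr ~ logfssl +", paste(colnames(Z_toy), collapse = " + "))
)

# OLS with controls; extract the quantities for the target regressor only
fit      <- lm(toy_formula, data = toy)
beta_hat <- coef(fit)["logfssl"]
se_hat   <- summary(fit)$coefficients["logfssl", "Std. Error"]
ci_95    <- confint(fit, "logfssl", level = 0.95)

print(toy_formula)
print(c(beta = unname(beta_hat), se = se_hat))
print(ci_95)
```

Building the formula as a string avoids typing out every control by hand, which matters once the control set has 195 columns.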

**Exercise 3:** So far, linear models have been estimated with OLS. Now, consider a partially linear model that controls for county-level features in a non-linear way. Estimate the effect of gun ownership using a **naive partialling-out** approach based on **random forest** (cf. slide 20 in L8). First, estimate the conditional expectation $\ell(Z):= E[Y|Z]$ with a random forest on the full sample. Then compute the residuals $\tilde Y = Y  - \hat \ell (Z)$ and regress them linearly on *logfssl*. Do not use sample splitting or cross-fitting here. Determine the regression coefficient $\beta$ of the target regressor *logfssl* and its standard error.

Hint: The functions *randomForest()* and *predict()* from the package *randomForest* may be helpful.
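
The three steps of the naive approach can be sketched as follows. This is an illustration on simulated data, not a solution for the gun data: the data-generating process, the names `Z_toy`, `d`, `y`, and the choice `ntree = 200` are assumptions made for the example.

```r
library(randomForest)

# Simulated stand-in data with a non-linear dependence on the controls
set.seed(123)
n <- 400
Z_toy <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("z1", "z2", "z3")))
d <- sin(Z_toy[, "z1"]) + rnorm(n)
y <- 0.5 * d + Z_toy[, "z2"]^2 + rnorm(n)   # assumed true effect: 0.5

# Step 1: estimate ell(Z) = E[Y | Z] with a random forest on the full sample
rf_y <- randomForest(x = Z_toy, y = y, ntree = 200)

# Step 2: residualize Y with in-sample predictions (no sample splitting).
# Passing newdata gives full-forest predictions; predict(rf_y) alone would
# return out-of-bag predictions instead.
y_tilde <- y - predict(rf_y, newdata = Z_toy)

# Step 3: regress the residuals linearly on the (unresidualized) regressor
fit        <- lm(y_tilde ~ d)
beta_naive <- coef(fit)["d"]
se_naive   <- summary(fit)$coefficients["d", "Std. Error"]

print(c(beta = unname(beta_naive), se = se_naive))
```

Because the forest is fitted and evaluated on the same observations and the regressor is not residualized, the estimate from this sketch should not be expected to recover the assumed effect.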

**Exercise 4:** Now, use the standard **partialling-out** approach based on **random forest**, where you also residualize the regressor *logfssl*. Determine the regression coefficient $\beta$ of the target regressor *logfssl* and its standard error.

To simplify this procedure, do not apply cross-fitting. Instead, use a **naive sample split**: Use the first half of the data to estimate the nuisance functions and the second half to estimate the target parameter $\beta$.
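
The reason for residualizing both variables can be seen from a short derivation. It relies on the assumption, not stated explicitly in this notebook, that $E[\epsilon_{j,t} \mid Z_{j,t}] = 0$, and it introduces the notation $m(Z) := E[D \mid Z]$ for illustration. Dropping indices and taking conditional expectations of the partially linear model given $Z$:

$$
\begin{aligned}
\ell(Z) = E[Y \mid Z] &= \beta\, m(Z) + g(Z), \\
Y - \ell(Z) &= \beta\,\bigl(D - m(Z)\bigr) + \epsilon .
\end{aligned}
$$

Under this assumption, $\beta$ is the coefficient in a linear regression of the residual $\tilde Y = Y - \ell(Z)$ on the residual $\tilde D = D - m(Z)$.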

In [None]:
# split data
set.seed(1)
n <- nrow(data)
index <- sample(seq_len(n), size = floor(0.5 * n), replace = FALSE)  # random half of the row indices

first_sample <- data[index, ]
second_sample  <- data[-index, ]
Z_firstsample <- as.matrix(first_sample)[, -c(1, 2)]  # exclude logghomr, logfssl
Z_secondsample <- as.matrix(second_sample)[, -c(1, 2)]  # exclude logghomr, logfssl
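
How the two halves are used afterwards can be sketched on simulated data. Everything in this example is an assumption made for illustration (the data-generating process, the names `Z_toy`, `idx`, `rf_y`, `rf_d`, and `ntree = 200`); it is not the result for the gun data.

```r
library(randomForest)

# Simulated stand-in data with a non-linear dependence on the controls
set.seed(1)
n <- 600
Z_toy <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("z1", "z2", "z3")))
d <- sin(Z_toy[, "z1"]) + rnorm(n)
y <- 0.5 * d + Z_toy[, "z2"]^2 + rnorm(n)   # assumed true effect: 0.5

# Naive sample split: one random half for the nuisance functions
idx <- sample(seq_len(n), size = floor(0.5 * n), replace = FALSE)

# Nuisance functions E[Y | Z] and E[D | Z], fitted on the first half only
rf_y <- randomForest(x = Z_toy[idx, ], y = y[idx], ntree = 200)
rf_d <- randomForest(x = Z_toy[idx, ], y = d[idx], ntree = 200)

# Residualize outcome and regressor on the held-out second half
y_res <- y[-idx] - predict(rf_y, newdata = Z_toy[-idx, ])
d_res <- d[-idx] - predict(rf_d, newdata = Z_toy[-idx, ])

# Regress residual on residual to estimate beta on the second half
fit     <- lm(y_res ~ d_res)
beta_po <- coef(fit)["d_res"]
se_po   <- summary(fit)$coefficients["d_res", "Std. Error"]

print(c(beta = unname(beta_po), se = se_po))
```

Only the second half enters the final regression, so the standard error reflects half of the observations; cross-fitting would use both halves but is deliberately left out here.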