# A Case Study: The Effect of Gun Ownership on Gun-Homicide Rates

We estimate the effect of gun ownership on the homicide rate.
For this purpose, we use the following partially linear model:

$$
 Y_{j,t} = \beta D_{j,t-1} + g(Z_{j,t}) + \epsilon_{j,t}.
$$

## Data

Here $Y_{j,t}$ is the log homicide rate in county $j$ at time $t$; $D_{j,t-1}$ is the log fraction of suicides committed with a firearm in county $j$ at time $t-1$, which we use as a proxy for gun ownership; and $Z_{j,t}$ is a set of demographic and economic characteristics of county $j$ at time $t$. The parameter $\beta$ is the effect of gun ownership on the homicide rate, controlling for county-level demographic and economic characteristics.
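
One way to read "controlling for" in this model can be sketched as follows, under the additional assumption (not stated above) that $E[\epsilon_{j,t} \mid D_{j,t-1}, Z_{j,t}] = 0$. Taking expectations of the model conditional on $Z_{j,t}$ gives

$$
E[Y_{j,t} \mid Z_{j,t}] = \beta \, E[D_{j,t-1} \mid Z_{j,t}] + g(Z_{j,t}),
$$

and subtracting this from the model removes $g$:

$$
Y_{j,t} - E[Y_{j,t} \mid Z_{j,t}] = \beta \left( D_{j,t-1} - E[D_{j,t-1} \mid Z_{j,t}] \right) + \epsilon_{j,t}.
$$

Under that assumption, $\beta$ is the slope from regressing the part of $Y_{j,t}$ not explained by $Z_{j,t}$ on the part of $D_{j,t-1}$ not explained by $Z_{j,t}$.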

The sample covers 195 large United States counties from 1980 through 1999, giving 195 × 20 = 3900 county-year observations.

In [145]:
using Pkg
# Install the required packages (a no-op if they are already installed), then load them.
Pkg.add(["CSV", "DataFrames", "StatsModels", "GLM"])
using CSV, DataFrames, StatsModels, GLM

In [2]:
data = CSV.File("../data/gun_clean.csv") |> DataFrame;
println("Number of rows: ",size(data,1))
println("Number of columns: ",size(data,2))

Number of rows: 3900
Number of columns: 415


### Preprocessing

To account for heterogeneity across counties and for time trends, we remove county-specific and time-specific effects from all variables in the following preprocessing step.
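
As a minimal sketch of what this step does, the toy example below regresses a variable on county and period dummies and keeps the residuals. The data are hypothetical (three counties, four periods) and do not come from `gun_clean.csv`. It uses only the standard library. For a balanced panel such as this toy one, the residuals coincide with two-way demeaning: subtract the county mean and the period mean, then add back the overall mean.

```julia
using Statistics, LinearAlgebra

# Hypothetical balanced panel: J counties observed over T periods.
J, T = 3, 4
county = repeat(1:J, inner = T)   # 1,1,1,1,2,2,2,2,3,3,3,3
period = repeat(1:T, outer = J)   # 1,2,3,4,1,2,3,4,1,2,3,4
y = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 9.0, 2.0, 2.5, 5.0, 4.0]

# Design matrix: intercept, county dummies and period dummies
# (the first level of each is dropped to avoid collinearity).
X = hcat(ones(J * T),
         [Float64(county[i] == j) for i in 1:J*T, j in 2:J],
         [Float64(period[i] == t) for i in 1:J*T, t in 2:T])

# Least-squares residuals: the part of y not explained by county and period effects.
res_ols = y - X * (X \ y)

# Two-way demeaning of the same variable.
county_mean = [mean(y[county .== j]) for j in 1:J]
period_mean = [mean(y[period .== t]) for t in 1:T]
res_demean = y .- county_mean[county] .- period_mean[period] .+ mean(y)

# Largest discrepancy between the two; zero up to floating-point error.
gap = maximum(abs.(res_ols .- res_demean))
```

The equivalence relies on the panel being balanced; with missing county-year cells the dummy regression is still well defined, but the simple demeaning formula no longer matches it.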

In [50]:
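# Return the names of the columns of `df` whose element type is one of the kinds in
# `type_dataframe`, whose name contains `pattern`, and which are not in `exclude`.
# `df` defaults to the global `data`, which is what the original body operated on.
function varlist(df = data, type_dataframe = ["numeric", "categorical", "string"], pattern = "", exclude = String[])
    varrs = String[]
    for name in names(df)
        T = nonmissingtype(eltype(df[!, name]))
        # Categorical columns are detected by type name so that CategoricalArrays
        # does not have to be loaded here.
        if ("numeric" in type_dataframe && T <: Number) ||
           ("categorical" in type_dataframe && occursin("CategoricalValue", string(T))) ||
           ("string" in type_dataframe && T <: AbstractString)
            push!(varrs, name)
        end
    end
    # Keep only names that match the pattern and are not excluded.
    filter(v -> contains(v, pattern) && !(v in exclude), varrs)
end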

varlist (generic function with 5 methods)

In [54]:
fixed = filter(x->contains(x,"X_Jfips"),names(data));
year = filter(x->contains(x,"X_Tyear"),names(data));

In [56]:
size(year)

(21,)

In [66]:
# Collect every column whose name contains one of the census variable codes.
census = []
census_var = ["AGE", "BN", "BP", "BZ", "ED", "EL","HI", "HS", "INC", "LF", "LN", "PI", "PO", "PP", "PV", "SPR", "VS"]
for code in census_var
    append!(census, filter(x -> contains(x, code), names(data)))
end

In [90]:
d = ["logfssl"]
y = ["logghomr"]
X1 = ["logrobr", "logburg", "burg_missing", "robrate_missing"]
X2 = ["newblack", "newfhh", "newmove", "newdens", "newmal"]

variable = [y, d,X1, X2, census]
varlis = []
for i in variable
    append!(varlis,i)
end


Any[]

In [98]:
example = DataFrame(CountyCode = data[:,"CountyCode"]);

In [156]:
rdata = DataFrame(CountyCode = data[:,"CountyCode"]);

In [158]:
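# Partial out year and county effects: regress each variable on the year and
# county dummies and store the residuals in `rdata`.
controls = sum(term.(Symbol.(year))) + sum(term.(Symbol.(fixed)))
for v in varlis
    rdata[!, v] = residuals(lm(term(Symbol(v)) ~ controls, data))
end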
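
The loop above builds each regression formula from column names held in string vectors rather than writing it with `@formula`. A minimal sketch of that pattern on a hypothetical toy table (the names `toy`, `y`, `a` and `b` are made up for illustration):

```julia
using DataFrames, GLM, StatsModels

# Hypothetical data: one outcome and two controls.
toy = DataFrame(y = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0],
                a = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
                b = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0])

# Column names stored as strings, as `year` and `fixed` are above.
controls = ["a", "b"]

# term.(...) turns each name into a Term; sum(...) combines them with `+`,
# giving the same right-hand side as @formula(y ~ a + b).
f = term(:y) ~ sum(term.(Symbol.(controls)))

# Fit by least squares (lm adds an intercept) and keep the residuals.
r = residuals(lm(f, toy))
```

Because the fit includes an intercept, the residuals `r` sum to zero and are orthogonal to each control, up to floating-point error.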


In [159]:
rdata

| Row | CountyCode | logghomr | logfssl | logrobr | logburg | burg_missing | robrate_missing |
|---|---|---|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 1073 | -0.134778 | 0.0961271 | 0.150893 | -0.124395 | 0.0104613 | -0.021229 |
| 2 | 1073 | -0.239622 | 0.0808094 | 0.0401683 | -0.134781 | 0.0104613 | -0.0194181 |
| 3 | 1073 | -0.0786772 | 0.0573399 | -0.017679 | -0.167909 | 0.0104613 | -0.0220374 |
| 4 | 1073 | -0.331465 | 0.0816945 | -0.00963344 | -0.22925 | 0.0104613 | -0.0194181 |
| 5 | 1073 | -0.31664 | 0.0253655 | -0.0267151 | -0.176635 | 0.00324793 | -0.0208037 |
| 6 | 1073 | 0.105132 | -0.00677726 | -0.151487 | -0.189069 | 0.0104613 | 0.016953 |
| 7 | 1073 | -0.0373401 | 0.0773061 | -0.166729 | -0.117739 | 0.0104613 | 0.0245505 |
| 8 | 1073 | -0.0520609 | -0.108433 | -0.0996453 | -0.0833094 | 0.00448964 | 0.021457 |
| 9 | 1073 | 0.0547007 | -0.0340988 | 0.151557 | 0.319282 | -0.0448348 | -0.0366629 |
| 10 | 1073 | 0.122094 | -0.0824292 | 0.0476034 | -0.0144728 | -0.00233214 | 0.00765442 |


In [152]:
residuals(lm(term(Symbol(varlis[1])) ~ sum(term.(Symbol.(year))) + sum(term.(Symbol.(fixed))), data))

3900-element Vector{Float64}:
 -0.1347775242285678
 -0.23962152489688604
 -0.0786771589928561
 -0.3314654607497185
 -0.3166398021047705
  0.10513189826268476
 -0.03734007187753763
 -0.052060902249581265
  0.05470066749138924
  0.12209372827735177
  0.11031407805814997
  0.2562675062474984
  0.07905702835656303
  ⋮
  0.21493612083267194
 -0.4903412114754312
 -0.5494199456711729
 -0.6604078122019494
 -0.6765248029790694
 -0.6975246915205633
 -0.03840950930937048
  0.028096226715491168
  0.08014469325480622
  0.4979872093044617
  0.5685415516403597
  0.6090223022158114