## 10.2 Panel Data with Two Time Periods: "Before and After" Comparisons

- Suppose there are only T = 2 time periods, t = 1982 and t = 1988.
- This allows us to analyze changes in the fatality rate between 1982 and 1988.
- We start by considering the population regression model, where the $Z_i$ are state-specific characteristics that differ between states but are constant over time:
\begin{equation}
FatalityRate_{it} = \beta_0 + \beta_1 BeerTax_{it} + \beta_2 Z_{i} + u_{it}
\end{equation}

- For t = 1982 and t = 1988 we have

\begin{align*}
  FatalityRate_{i1982} =&\, \beta_0 + \beta_1 BeerTax_{i1982} + \beta_2 Z_i + u_{i1982}, \\
  FatalityRate_{i1988} =&\, \beta_0 + \beta_1 BeerTax_{i1988} + \beta_2 Z_i + u_{i1988}.
\end{align*}

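- Because $\beta_0$ and $\beta_2 Z_i$ appear identically in both equations, subtracting the 1982 equation from the 1988 equation eliminates them. We can therefore remove the $Z_i$ by regressing the difference in the fatality rate between 1988 and 1982 on the difference in the beer tax between those years: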

\begin{equation}
FatalityRate_{i1988} - FatalityRate_{i1982} = \beta_1 (BeerTax_{i1988} - BeerTax_{i1982}) + u_{i1988} - u_{i1982}
\end{equation}
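
- To illustrate why differencing helps, here is a minimal simulation sketch. All numbers are hypothetical (1,000 simulated "states", a true $\beta_1 = -1$, and a $Z_i$ that is correlated with the regressor); none of it comes from the fatalities data. It compares the OLS slope from a single cross-section, which omits $Z_i$, with the slope from the differenced data.

```julia
using Random, Statistics

Random.seed!(42)

n = 1_000        # hypothetical number of "states"
beta1 = -1.0     # true effect, chosen only for illustration
z = randn(n)     # unobserved, time-constant characteristic Z_i

# The regressor is correlated with Z_i in both periods.
x1 = 0.8 .* z .+ randn(n)
x2 = 0.8 .* z .+ randn(n)

# Outcomes follow the population model with beta0 = 1 and beta2 = 2.
y1 = 1.0 .+ beta1 .* x1 .+ 2.0 .* z .+ randn(n)
y2 = 1.0 .+ beta1 .* x2 .+ 2.0 .* z .+ randn(n)

# OLS slope of a simple regression with intercept: cov(x, y) / var(x)
ols_slope(x, y) = cov(x, y) / var(x)

slope_levels = ols_slope(x2, y2)             # omits Z_i
slope_diff = ols_slope(x2 .- x1, y2 .- y1)   # Z_i cancels in the differences

println("true beta1:          ", beta1)
println("levels (one period): ", round(slope_levels, digits = 3))
println("first differences:   ", round(slope_diff, digits = 3))
```

- With this setup the single-period slope should land far from the true value, while the differenced slope should be close to it, because $\beta_2 Z_i$ drops out of the differences.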

- This regression model yields an estimate of $\beta_1$ that is robust to omitted-variable bias from the unobserved, time-constant state characteristics $Z_i$.
- Next, we use Julia to estimate a regression on the differenced data and plot the estimated regression function.

In [1]:
using CSV
using DataFrames
using Query
using Plots
using FixedEffects 
using FixedEffectModels
using Statistics
using LinearAlgebra

fatalities = CSV.read("C:\\Users\\jpche\\AppData\\Local\\JuliaPro-1.2.0-1\\fatalities.csv")

fatalities.fatality_rate = fatalities.fatal ./ fatalities.pop * 10000

fatalities1982 = @from i in fatalities begin
    @where i.year == 1982
    @select i
    @collect DataFrame
end

fatalities1988 = @from i in fatalities begin
    @where i.year == 1988
    @select i
    @collect DataFrame
end

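# Change from 1982 to 1988 for each state.
# This assumes both yearly data frames list the states in the same row order.
diff_fatalities = DataFrame(
                    fatality_rate = fatalities1988.fatality_rate - fatalities1982.fatality_rate,
                    beertax = fatalities1988.beertax - fatalities1982.beertax
)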

diff_fatalities_model = reg(
                        diff_fatalities,
                        @model(fatality_rate ~ beertax)                 
)

println(diff_fatalities_model)

x = diff_fatalities.beertax
y = diff_fatalities.fatality_rate

p1 = plot( #assign a plot object to the variable p1 using the following attributes
    x, #x series
    y, #y series
    st = :scatter, #series type
    title = "Changes in Traffic Fatality Rates and Beer Taxes in 1982-1988", #plot title
    label = "Observation", #legend labels
    xlabel = "Change in beer tax (in 1988 dollars)", #x axis label
    ylabel = "Change in fatality rate (fatalities per 10000)", #y axis label
    ylims = (-1.5, 1), #y axis limits
    yticks = -1.5:0.5:1, #y axis tick range
    ms = 4, #marker size
    mc = :blue #marker color
)

# Fitted value at x: the coefficients are ordered as [intercept, slope]
y_prediction_diff(x) = dot(coef(diff_fatalities_model), [1, x])

x = [minimum(diff_fatalities.beertax), maximum(diff_fatalities.beertax)]
y = y_prediction_diff.(x)

plot!(
    p1,
    x,
    y,
    st = :line,
    label = "OLS Difference"
)

display(p1)

println("Mean fatality rate over all states for all time : " * string(mean(fatalities.fatality_rate)))
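
- The differencing step above subtracts the 1982 columns from the 1988 columns row by row, which is only valid if both yearly data frames list the states in the same order. A hedged alternative is to match the two years on a state identifier first. The sketch below uses a tiny made-up data frame; the column name `state` and all values are assumptions for illustration, and `innerjoin` with `renamecols` assumes a recent DataFrames.jl (1.x).

```julia
using DataFrames

# Toy long-format panel: hypothetical values, NOT the fatalities data.
# The 1988 rows are deliberately listed in a different state order.
toy = DataFrame(
    state = ["A", "B", "C", "C", "A", "B"],
    year = [1982, 1982, 1982, 1988, 1988, 1988],
    fatality_rate = [2.0, 1.5, 3.0, 2.5, 2.2, 1.0],
    beertax = [0.50, 0.20, 1.00, 1.25, 0.50, 0.40],
)

toy1982 = filter(row -> row.year == 1982, toy)
toy1988 = filter(row -> row.year == 1988, toy)

# Match the two years on state; non-key columns receive a year suffix.
wide = innerjoin(toy1982, toy1988, on = :state, renamecols = "_1982" => "_1988")

# Differences are now computed within each matched state.
diff_toy = DataFrame(
    state = wide.state,
    fatality_rate = wide.fatality_rate_1988 .- wide.fatality_rate_1982,
    beertax = wide.beertax_1988 .- wide.beertax_1982,
)

println(diff_toy)
```

- Joining on the identifier makes the pairing explicit, so the differences stay correct even if the input rows are reordered or a state is missing in one year (an inner join simply drops it).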


- We obtain the OLS estimated regression function

\begin{equation}
\widehat{FatalityRate_{i1988} - FatalityRate_{i1982}} = -\underset{(0.065)}{0.072} -\underset{(0.36)}{1.04} \times (BeerTax_{i1988}-BeerTax_{i1982}).
\end{equation}
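
- As a quick check on the reported numbers, the sketch below computes the t-statistic and an approximate 95% confidence interval for the coefficient on the change in the beer tax. It uses only the estimate and standard error shown in the equation above; the normal critical value 1.96 is an assumption (a large-sample approximation), and the variable names are illustrative.

```julia
beta1_hat = -1.04   # estimated coefficient on the change in the beer tax
se_beta1 = 0.36     # its standard error (the value in parentheses)

# t-statistic for the null hypothesis beta1 = 0
t_stat = beta1_hat / se_beta1

# Approximate 95% confidence interval using the normal critical value
z_crit = 1.96
ci = (beta1_hat - z_crit * se_beta1, beta1_hat + z_crit * se_beta1)

println("t-statistic: ", round(t_stat, digits = 2))
println("95% CI:      ", round.(ci, digits = 2))
```

- The t-statistic is about $-2.9$, larger in absolute value than 1.96, and the interval excludes zero, so under these assumptions the coefficient is statistically different from zero at the 5% level.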