## BEEM011 Exercise 9: Week 10

### Panel Data Methods

Amy Binner and Eva Poen

Answers

## Useful R code

install.packages(“plm”)

library(plm)

<u> PANEL REGRESSION WITH ENTITY FIXED EFFECTS </u>

reg <- plm(y~x,data =mydata, index = c("entity", "year"), model = "within")

<u> PANEL REGRESSION WITH ENTITY AND TIME FIXED EFFECTS </u>

reg <- plm(y~x,data =mydata, index = c("entity", "year"), model = "twoways")

<u> CLUSTERED ROBUST STANDARD ERRORS </u>

coeftest(reg, vcov. = vcovHC, type = "HC1", cluster="group")

<u> F-test with Panel Regression Models </u>

pFtest(reg, model_null, vcov = vcovHC, type = "HC1", cluster="group")

# Question 1

A researcher investigating the determinants of crime in the United Kingdom has data for 42 police regions over 22 years. She estimates by OLS the following regression

$$ ln(crime_{it}) = α_i + \lambda_t + β_1 unemp_{it} + β_2 youths_{it} + β_3 ln(punish)_{it} + u_{it}$$

$$i = 1,..., 42, t = 1,..., 22 $$

- *crime* is the crime rate per head of population, 
- *unemp* is the unemployment rate of males, 
- *youths* is the proportion of youths, 
- *punish* is the probability of punishment measured as (number of convictions)/(number of crimes reported). 
- $α_i$ and $\lambda_t$ are police region and year fixed effects, where $α_i$equals one for area $i$ and is zero otherwise for all $i$, and $\lambda_t$ is one in year $t$ and zero for all other years for $t = 2, …, 22$. 

$\beta_0$ is not included.

Note: $ln()$ is the natural logarithm.


## 1 a)

Explain the purpose of excluding $\beta_0$? Why do we include $\alpha_1$?


**Including a constant in addition to the entity and time fixed effects would result in perfect multicollinearity.**

**The sum of all 42 entity fixed effects is equal to one in each observation. The sum of all time fixed effects is also equal to one in each observation. Therefore, we need to drop one of the fixed effects (this could be one of the time FEs or one of the entity FEs) to avoid perfect collinearity.**

**As we do not have an intercept term, $\alpha_1$ is the entity fixed effect for police region 1 in $t = 1$ - this becomes the base case**

**If we wanted to have an overall intercept, we would have to drop another fixed effects (so that one time FE and one entity FE are dropped from the model). The overall number of parameters estimated would remain the same.**

## 1 b)	

What are the terms $α_i$ and $\lambda_t$ likely to pick up? Discuss the advantages of using panel data for this type of investigation.

**•	$α_i$ picks up omitted variables that are specific to police region $i$ and do not vary over time.**

Attitudes toward crime may vary between rural regions and metropolitan areas. These would be hard to capture through measurable variables.

**•	$\lambda_t$ picks up effects that are common to all police regions in a given year.** 

Common macroeconomic shocks that affect all regions equally in a given time period will be captured by the time fixed effects. 

Although some of these variables could be explicitly introduced, the list of possible variables is long. By introducing time fixed effects, the effect is captured all in one variable.

## 1 c)

Estimation by OLS using heteroskedasticity and autocorrelation-consistent standard errors results in the following output, where the coefficients of the entity and time fixed effects are not reported:

$ \widehat{ln(crime)_{it}} = 0.063 × unemp_{it} + 3.739 × youths_{it} – 0.588 × ln(punish)_{it} $
                     
&emsp; &emsp;&emsp;&emsp;&emsp;$(0.109)$&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;          $(0.179)$&emsp;&emsp;&emsp;&emsp;&emsp;$(0.024)$

$R^2 = 0.904$

Comment on the results. In particular, what is the effect of a 1 percent increase in the probability of punishment? 


**A higher male unemployment rate and a higher proportion of youths increase the crime rate, while a higher probability of punishment decreases the crime rate. The coefficients on the probability of punishment and the proportion of youths is statistically significant, while the male unemployment rate is not. The regression explains roughly 90 percent of the variation in crime rates in the sample.**

**A 1 percent increase in the number of convictions over the number of crimes reported decreases the crime rate by roughly 0.6 percent.  An individual t-test on this coefficient would reject the null hypothesis that it is equal to zero at the 5% significance level.**

## 1 d)

To test for the relevance of the police region fixed effects, your restrict the regression by dropping all entity fixed effects and a single intercept coefficient is added. The relevant F-statistic is 135.28. 

- What are the degrees of freedom? 

- What is the critical value from the F distribution using a 1% significance level?

In [58]:
n = 42*22
k = 42 + 21 + 3
# here we have dropped one time fixed effect (see the answer to 1a) above)

q = 42 - 1
# the number of restrictions is one less than the number of entity FEs, 
# because we gain an overall intercept 
cat("n-k:", n - k)
qf(0.99, df1 = q , df2 = n-k , lower.tail = TRUE)

n-k: 858

**The coefficients of the three regressors other than the entity coefficients would have been unaffected, had there been a constant in the regression and (n-1) police region specific entity variables. In this case, the entity coefficients on the police regions would have indicated deviations from the constant for the first police region. Hence there are 41 restrictions imposed by eliminating the 42 entity fixed effects and adding a constant.**

**Since there are over 42 police regions over 22 years (df2 $= n-k = 858$ degrees of freedom), the critical value for F41,∞ ≈ 1.60 at the 1% level. Hence the restrictions are rejected.**

## 1 e)

Although the test rejects the hypothesis of eliminating the fixed effects from the regression, you want to analyze what happens to the coefficients and their standard errors when the equation is re-estimated without fixed effects. 

$ \widehat{ln(crime)_{it}} = 1.340 × unemp_{it} + 3.743 × youths_{it} – 0.601 × ln(punish)_{it} $
                     
&emsp; &emsp;&emsp;&emsp;&emsp;$(0.234)$&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;          $(0.356)$&emsp;&emsp;&emsp;&emsp;&emsp;$(0.051)$

In the resulting regression most of the coefficients do not change by much, although their standard errors roughly double. However, $\hat{\beta}_1$ is now 1.340 with a standard error of 0.234. 

Explain why you think this has occurred?

**This result would make the male unemployment rate coefficient significant. It suggests that male unemployment rates change slowly over the years in a given police district and that this effect is picked up by the entity fixed effects. Of course, there are other slowly changing variables that have been omitted, such as attitudes towards crime, that are captured by these fixed effects and that may cause omitted variable bias when the entity fixed effects are excluded.**

## Question 2

In this question you will work with “Guns” -  a balanced panel containing observations on criminal and demographic variables for all US states and the years 1977-1999. The dataset comes with the package AER.

## 2 a)

Load the AER and plm packages and the Guns dataset. 

Explore the dataset using the summary() function. Use the command “?Guns” for detailed information on the variables. 

Verify that Guns is a balanced panel: extract the number of years and states from the dataset and assign them to the predefined variables years and states, respectively. Afterwards use these variables for a logical comparison: check that the panel is balanced.

In [13]:
#install.packages("plm")
library(plm)
# You may get an error when trying to install the plm package,
# you can try:
# install.packages("plm", .libPaths(), repos='http://cran.us.r-project.org')
# if there is still a problem you need to open the Anaconda powershell and install the package via
# conda forge using the following command:
# conda install -c conda-forge r-plm

In [14]:
## Edit the code below 
# Header: Exercise 9
# Author: 
# Date:
# Candidate number:

# Load the data
#install.packages("AER")
library(AER)


data("Guns")

# Obtain an overview of the dataset
summary(Guns)
?Guns

      year        violent           murder          robbery      
 1977   : 51   Min.   :  47.0   Min.   : 0.200   Min.   :   6.4  
 1978   : 51   1st Qu.: 283.1   1st Qu.: 3.700   1st Qu.:  71.1  
 1979   : 51   Median : 443.0   Median : 6.400   Median : 124.1  
 1980   : 51   Mean   : 503.1   Mean   : 7.665   Mean   : 161.8  
 1981   : 51   3rd Qu.: 650.9   3rd Qu.: 9.800   3rd Qu.: 192.7  
 1982   : 51   Max.   :2921.8   Max.   :80.600   Max.   :1635.1  
 (Other):867                                                     
   prisoners           afam              cauc            male      
 Min.   :  19.0   Min.   : 0.2482   Min.   :21.78   Min.   :12.21  
 1st Qu.: 114.0   1st Qu.: 2.2022   1st Qu.:59.94   1st Qu.:14.65  
 Median : 187.0   Median : 4.0262   Median :65.06   Median :15.90  
 Mean   : 226.6   Mean   : 5.3362   Mean   :62.95   Mean   :16.08  
 3rd Qu.: 291.0   3rd Qu.: 6.8507   3rd Qu.:69.20   3rd Qu.:17.53  
 Max.   :1913.0   Max.   :26.9796   Max.   :76.53   Max.   :22.3

In [15]:
#Verify that the dataset is balanced
cat("Is pbalanced?",
is.pbalanced(Guns, index = c('state', 'year')), "\n")

# Note if you exclude the index you will get an incorrect answer
is.pbalanced(Guns)

# Alternative method (not a great method)

years  <- length(levels(Guns$year))
states <- length(levels(Guns$state))

# Logical test of whether the panel is balanced
cat("Is balanced (logical)?",
years*states == nrow(Guns))

Is pbalanced? TRUE 


"duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")"

Is balanced (logical)? TRUE

**Answer**

The Guns dataset is balanced.

## 2 b)

There is a controversial debate whether and if to what extent the right to carry a gun influences crime. Proponents of so-called “Carrying a Concealed Weapon” (CCW) laws argue that the deterrent effect of guns prevents crime, whereas opponents argue that the public availability of guns increases their usage and thus makes it easier to commit crimes. 

In the following exercises you will investigate this topic empirically. To begin with consider the following estimated model. 

$$\widehat{ln⁡(violent_{it})}= 6.135-0.443law_{it}$$

with  $i=1,..., 51$ 
- *violent* is the violent crime rate (incidents per 100000 residents) and 
- *law* is a binary variable indicating the implementation of a CCW law (1 = yes, 0 = no), respectively. 

Extend and estimate the model by including state fixed effects using the function plm() and assign the model object to the predefined variable model_se. 

Can you think of an unobserved variable that is captured by the state fixed effects? 

Print a summary of the model which reports cluster robust standard errors. Test whether the fixed state effects are jointly significant from zero. To do so use the function pFtest(). Use ?pFtest for additional information. (Note: you may need to install the package “plm” at this point if you have not done so). 

In [59]:
# estimate a model with state (or entity) fixed effects using plm()
model_se <- plm(log(violent) ~ as.numeric(law), data = Guns, 
                index = c("state", "year"), 
                model = "within",
                effect = "individual")

# effects = "oneway" is the default option so we do not need to specify it in the plm() function

# If you want to see the (de-meaned) entity fixed effects, use the command below:
# fixef(model_se, type = "dmean", effect = "individual")

# Comparing this to the number of states 
# length(unique(Guns$state))

# print a summary using clustered standard errors
coeftest(model_se, vcov = vcovHC, type = "HC1", cluster="group")
# type = "HC1" is the default, you could exclude this from the function

# This is the same as including dummy variables for the year using the lm() function 
# model_se2 <- lm(log(violent)~as.numeric(law)+factor(state)+0, data = Guns)
# model_se2$coefficients


t test of coefficients:

                Estimate Std. Error t value Pr(>|t|)   
as.numeric(law) 0.113663   0.035689  3.1848 0.001488 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


In [57]:
# F - test 
# Specify the restricted model 
# Linear model (without fixed effects)
model <- lm(log(violent) ~ as.numeric(law), data = Guns)

# F test 
# test whether the state fixed effects are jointly significant from zero
pFtest(model_se, model, vcov = vcovHC(model_se, cluster="group"))


	F test for individual effects

data:  log(violent) ~ as.numeric(law)
F = 260.5, df1 = 50, df2 = 1121, p-value < 2.2e-16
alternative hypothesis: significant effects


**Incorporating state fixed effects control for state specific unobservable omitted variables which do not vary over time (e.g. The residents attitude towards guns). Note that Including fixed effects changes both the sign and the magnitude of the estimated coefficient. The F-test reveals that the state fixed effects are jointly significantly different from zero. There is the possibility of an omitted variable bias, e.g., due to omitted variables that change over time.**

## 2 c)

It may also be reasonable to  include time effects which is why we now consider the model

$ln⁡(violent_it )= β_1 law_{it}+α_i+ λ_t+u_{it}$

with  $i=1,..., 51$ and $t=1977,…,1999$.  

Estimate the model above and assign it to the variable model_sete using plm(). 

Print a summary of the model which reports robust standard errors. 

Test whether both state and time fixed effects are jointly significant.

In [60]:
# estimate a model with state and time fixed effects using plm()  I use as.numeric around law because some versions of fixef do
# not like categorical/factor regressors and will throw an error.
model_sete <- plm(log(violent) ~ as.numeric(law), 
                  data = Guns, 
                  index =  c("state", "year"), 
                  model = "within", 
                  effect = "twoways")

# The binary dummy variable equivalent is
# lm(log(violent)~law+factor(state)+factor(year)+0, data = Guns)

# view the combined state-year effects: 
# fixef(model_sete, type = "level", effect = "twoways")

# print a summary using clustered standard errors
coeftest(model_sete, vcov. = vcovHC, type = "HC1", cluster="group")

# test whether state and time fixed effects are jointly significant from zero
pFtest(model_sete, model,vcov = vcovHC, type = "HC1", cluster="group")


t test of coefficients:

                Estimate Std. Error t value Pr(>|t|)
as.numeric(law) 0.001885   0.039504  0.0477    0.962



	F test for twoways effects

data:  log(violent) ~ as.numeric(law)
F = 284.97, df1 = 72, df2 = 1099, p-value < 2.2e-16
alternative hypothesis: significant effects


**Including state and time effects results in a very small coefficient estimate on law which is not significantly different from zero at any common significance level.** 

**The F-test reveals that state and time fixed effects are jointly significantly different from zero since p-value < 0.05**

In [8]:
# Testing the time fixed effects only

# test whether state and time fixed effects are jointly significant from zero
pFtest(model_sete, model_se, vcov. = vcovHC, type = "HC1", cluster="group")


	F test for twoways effects

data:  log(violent) ~ law
F = 27.911, df1 = 22, df2 = 1099, p-value < 2.2e-16
alternative hypothesis: significant effects


**The F-test reveals that time fixed effects are jointly significantly different from zero since p-value < 0.05**

**Note the change in df1**

## 2 d)

Despite the evidence for state as well as time effects, there still might be a bias due to omitted variables such as sociodemographic characteristics. The following model accounts for the latter (See ?Guns for detailed information on the additional variables):
	
$$ ln⁡(violent_{it} )= β_1 law_{it} +β_2  density_{it}+ β_3  income_{it} + β_4  population_{it}
+ β_5 afam_{it} + β_6  cauc_{it}+ α_i+ λ_t+u_{it} $$

- density: population per square mile of land area, divided by 1,000.
- income: real per capita personal income in the state (US dollars).
- population: state population, in millions of people.
- afam: percent of state population that is African-American, ages 10 to 64.
- cauc: percent of state population that is Caucasian, ages 10 to 64.

Estimate the extended model and assign it to the predefined variable model_sete_ext. Print a robust summary of the estimated model. 

Use your results to interpret the effect of a CCW law.


In [61]:
# estimate the extended model
model_sete_ext <- plm(log(violent) ~ as.numeric(law) + prisoners + density 
                  + income + population + afam + cauc, 
                  data = Guns, index = c("state", "year"), 
                  model = "within", effect = "twoways")

# print a summary using clustered standard errors
coeftest(model_sete, vcov. = vcovHC, type = "HC1", cluster="group")

# view the fixed effects:
#fixef(model_sete_ext, effect = "twoways")
#fixef(model_sete_ext, effect="time", type="level")


t test of coefficients:

                Estimate Std. Error t value Pr(>|t|)
as.numeric(law) 0.001885   0.039504  0.0477    0.962


**Once control variables are included in the regression alongside the fixed effects there is no longer a statistically significant coefficient on the CCW law variable**

In [None]:
## DO NOT EDIT
# This section formats your answers for marking
answers = c(reg1a$coefficients, reg1b$coefficients, reg1c$coefficients, lh1, lh2, table1)
#answers