
# ASSIGNMENT 6
# *YOUR NAME*

## ECO 50213: Advanced Microeconometrics
## May 2, 2025

Please complete your code and text answers in this notebook, then export the notebook to PDF format. Use the online submission feature of Canvas to upload your PDF before 6:00 p.m. on Friday, May 9.

You are encouraged to work together on the assignments. AI tools may be used to assist in coding, debugging, or understanding concepts. However, the final submission must be your own work, written in your own words, and reflect your own understanding.

In [7]:
!pip install pyfixest
import numpy as np
import pandas as pd
import pyfixest as pf
from pyfixest import feols



## Question 1: Concealed weapons and crime

This exercise expands on the analysis of gun laws in the United States by the economist John Lott in his influential and controversial book, *More Guns, Less Crime*. Lott claims that rates of violent crime go down when states pass "shall issue" concealed carry laws. These laws instruct local authorities to issue a concealed weapons permit to all applicants, with limited restrictions. Lott argues that these laws deter violent crime because criminals are less willing to attack someone who might be carrying a concealed weapon.

The following code imports the dataset `guns`. Each observation in the dataset is a state ($stateid$) and year ($year$). The main regressor of interest is the dummy variable $shall$, which is equal to 1 if the state has a concealed carry law in a particular year. The dependent variable is $\log(vio)$, the log of the violent crime rate in the state and year.

In [8]:
# Load data
url = "https://raw.githubusercontent.com/sdmcrae/econometrics/master/datasets/guns.csv"

df = pd.read_csv(url)
print(df.head())

   year    vio   mur    rob  incarc_rate    pb1064    pw1064    pm1029  \
0    77  414.4  14.2   96.8           83  8.384873  55.12291  18.17441   
1    78  419.1  13.3   99.1           94  8.352101  55.14367  17.99408   
2    79  413.3  13.2  109.5          144  8.329575  55.13586  17.83934   
3    80  448.5  13.2  132.1          141  8.408386  54.91259  17.73420   
4    81  470.5  11.9  126.5          149  8.483435  54.92513  17.67372   

        pop    avginc   density  stateid  shall  
0  3.780403  9.563148  0.074552        1      0  
1  3.831838  9.932000  0.075567        1      0  
2  3.866248  9.877028  0.076245        1      0  
3  3.900368  9.541428  0.076829        1      0  
4  3.918531  9.548351  0.077187        1      0  


In [9]:
df['logvio']=np.log(df['vio'])

The variables in this dataset are:
- $year$: Year (1977-1999)
- $vio$: violent crime rate (incidents per 100,000 members of the population)
- $mur$: murder rate (incidents per 100,000)
- $rob$: robbery rate (incidents per 100,000)
- $incarc\_rate$: incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents)
- $pm1029$: percent of state population that is male, ages 10 to 29
- $pw1064$: percent of state population that is white, ages 10 to 64
- $pb1064$: percent of state population that is black, ages 10 to 64
- $pop$: state population, in millions of people
- $avginc$: real per capita personal income in the state, in thousands of dollars
- $density$: population per square mile of land area, divided by 1000
- $stateid$: ID number of states (Alabama = 1, Alaska = 2, etc.)
- $shall$: 1 if the state has a shall-issue law in effect in that year, otherwise 0

You will estimate six different models for the effect of concealed carry laws on $\log(vio)$. Use `pf.feols` to run these regressions, then `etable` to report all of these results in a single regression table, with one column for each model. Use heteroskedasticity-consistent standard errors for the first five regressions. In the final regression, you will see the effect of clustering the standard errors by state.

1. **Pooled estimator**

   - Regress $\log(vio)$ on $shall$:
   
     \begin{align*}
     \log(vio_{it}) &= \beta_0 + \beta_1 shall_{it} + u_{it}
     \end{align*}

   - For panel data, this regression is known as the **pooled estimator**, because it estimates the standard OLS model with no fixed effects. Report your results in Column 1 of your regression table.
   - What is your interpretation of $\hat{\beta}_1$ for this regression?

In [10]:
# Pooled OLS: regress the log violent crime rate on the shall-issue dummy
model_ols1 = feols('logvio ~ shall', data=df)
print(model_ols1.summary())


###

Estimation:  OLS
Dep. var.: logvio, Fixed effects: 0
Inference:  iid
Observations:  1173

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      6.135 |        0.021 |   296.130 |      0.000 |  6.094 |   6.176 |
| shall         |     -0.443 |        0.042 |   -10.539 |      0.000 | -0.525 |  -0.361 |
---
RMSE: 0.617 R2: 0.087 
None
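
Because the dependent variable is in logs and $shall$ is a dummy, the coefficient is a difference in log points, and the exact implied percentage change is $100\,(e^{\hat{\beta}_1} - 1)$. A minimal sketch of that arithmetic, using the pooled estimate on `shall` reported in the output above (the helper name `pct_effect` is hypothetical, introduced only for illustration):

```python
import math

def pct_effect(beta_hat: float) -> float:
    """Exact percentage change implied by a dummy coefficient in a log-linear model."""
    return 100 * (math.exp(beta_hat) - 1)

beta_hat = -0.443  # pooled estimate on `shall` from the output above
print(f"log-point approximation: {100 * beta_hat:.1f}%")
print(f"exact percentage change: {pct_effect(beta_hat):.1f}%")
```

The two numbers diverge as the coefficient moves away from zero, which is why the exact conversion is worth reporting for an estimate of this size.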


2. **Pooled estimator with control variables**

   - Repeat your regression from (1), with additional control variables: $incarc\_rate$ (incarceration rate), $density$ (population density), and $avginc$ (average income):

     \begin{align*}
     \log(vio_{it}) &= \beta_0 + \beta_1 shall_{it} + \beta_2 incarc\_rate_{it} + \beta_3 density_{it} + \beta_4 avginc_{it} + u_{it}
     \end{align*}

   - Report your results in Column 2 of your regression table.
   - Does adding the control variables change the estimated effect of a shall-issue law from regression (1)?
   - Suggest a variable that varies across states but plausibly varies little over time and that could cause omitted variable bias in regression (2).

In [12]:
# Pooled OLS with the controls incarc_rate and avginc
# (density, listed in the specification above, is not included in this run)
model_ols_ctrl = feols('logvio ~ shall + incarc_rate + avginc', data=df)
print(model_ols_ctrl.summary())


###

Estimation:  OLS
Dep. var.: logvio, Fixed effects: 0
Inference:  iid
Observations:  1173

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |      5.296 |        0.080 |    65.886 |      0.000 |  5.139 |   5.454 |
| shall         |     -0.477 |        0.033 |   -14.401 |      0.000 | -0.541 |  -0.412 |
| incarc_rate   |      0.002 |        0.000 |    21.248 |      0.000 |  0.002 |   0.002 |
| avginc        |      0.030 |        0.006 |     4.842 |      0.000 |  0.018 |   0.043 |
---
RMSE: 0.485 R2: 0.436 
None


3. **State fixed effects**

   - Repeat your regression from (2), this time with the addition of state fixed effects:

     \begin{align*}
     \log(vio_{it}) &= \beta_0 + \beta_1 shall_{it} + \beta_2 incarc\_rate_{it} + \beta_3 density_{it} + \beta_4 avginc_{it} + \alpha_i + u_{it}
     \end{align*}

   - Report your results in Column 3 of your regression table.
   - Do you find a difference in your estimate of $\beta_1$ compared to the pooled regressions (1) and (2)? What does this result imply about whether or not the $\alpha_i$ terms are distributed independently of the other regressors?

In [13]:
# State fixed effects, absorbed with `| stateid`
model_fe_state = feols('logvio ~ shall + incarc_rate + density + avginc | stateid', data=df)
print(model_fe_state.summary())


###

Estimation:  OLS
Dep. var.: logvio, Fixed effects: stateid
Inference:  CRV1
Observations:  1173

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| shall         |      0.021 |        0.041 |     0.525 |      0.602 | -0.061 |   0.103 |
| incarc_rate   |      0.000 |        0.000 |     1.934 |      0.059 | -0.000 |   0.001 |
| density       |      0.161 |        0.091 |     1.760 |      0.084 | -0.023 |   0.345 |
| avginc        |      0.020 |        0.013 |     1.487 |      0.143 | -0.007 |   0.047 |
---
RMSE: 0.165 R2: 0.935 R2 Within: 0.133 
None
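
The comparison between the pooled and fixed-effects estimates turns on whether $\alpha_i$ is correlated with the regressors. The sketch below illustrates the mechanism on synthetic data (an assumption for illustration, not the `guns` dataset): the group effect is built to be correlated with `x`, so pooled OLS is biased, while the within transformation (subtracting group means) removes $\alpha_g$. All names here (`df_sim`, `slope`) are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
G, T, beta = 50, 20, 1.0

# Synthetic panel (assumption): alpha_g is correlated with x by construction.
g = np.repeat(np.arange(G), T)
alpha = rng.normal(size=G)
x = 0.8 * alpha[g] + rng.normal(size=G * T)
y = beta * x + alpha[g] + rng.normal(size=G * T)
df_sim = pd.DataFrame({"g": g, "x": x, "y": y})

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

# Pooled OLS ignores alpha_g, so the slope absorbs part of it.
b_pooled = slope(df_sim["x"].to_numpy(), df_sim["y"].to_numpy())

# Within transformation: subtract group means, which removes alpha_g.
x_w = (df_sim["x"] - df_sim.groupby("g")["x"].transform("mean")).to_numpy()
y_w = (df_sim["y"] - df_sim.groupby("g")["y"].transform("mean")).to_numpy()
b_within = slope(x_w, y_w)

print(f"pooled: {b_pooled:.3f}, within (fixed effects): {b_within:.3f}, true: {beta}")
```

A large gap between the two slopes is the signature of group effects that are not independent of the regressors.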


4. **Time fixed effects**

   - Repeat your regression from (2), this time with the addition of time fixed effects instead of state fixed effects:

     \begin{align*}
     \log(vio_{it}) &= \beta_0 + \beta_1 shall_{it} + \beta_2 incarc\_rate_{it} + \beta_3 density_{it} + \beta_4 avginc_{it} + \lambda_t + u_{it}
     \end{align*}

   - Report your results in Column 4 of your regression table.
   - Do you find a difference in your estimate of $\beta_1$ compared to the pooled regressions (1) and (2)? What does this result imply about whether or not the $\lambda_t$ terms are distributed independently of the other regressors?

In [16]:
# Year fixed effects, absorbed with `| year`
model_fe_time = feols('logvio ~ shall + incarc_rate + density + avginc | year', data=df)
print(model_fe_time.summary())


###

Estimation:  OLS
Dep. var.: logvio, Fixed effects: year
Inference:  CRV1
Observations:  1173

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| shall         |     -0.343 |        0.027 |   -12.782 |      0.000 | -0.399 |  -0.287 |
| incarc_rate   |      0.002 |        0.000 |    13.075 |      0.000 |  0.002 |   0.003 |
| density       |     -0.048 |        0.021 |    -2.237 |      0.036 | -0.092 |  -0.003 |
| avginc        |      0.060 |        0.007 |     8.728 |      0.000 |  0.046 |   0.075 |
---
RMSE: 0.466 R2: 0.479 R2 Within: 0.463 
None


5. **Two-way fixed effects**

   - Repeat your regression from (2), this time with the addition of state and time fixed effects:

     \begin{align*}
     \log(vio_{it}) &= \beta_0 + \beta_1 shall_{it} + \beta_2 incarc\_rate_{it} + \beta_3 density_{it} + \beta_4 avginc_{it} + \alpha_i + \lambda_t + u_{it}
     \end{align*}

   - Report your results in Column 5 of your regression table.
   - Do you find a difference in your estimate of $\beta_1$ compared to the pooled regressions (1) and (2)?

In [20]:
# Two-way fixed effects: state and year, absorbed with `| stateid + year`
model_twfe = feols('logvio ~ shall + incarc_rate + density + avginc | stateid + year', data=df)
print(model_twfe.summary())


###

Estimation:  OLS
Dep. var.: logvio, Fixed effects: stateid+year
Inference:  CRV1
Observations:  1173

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| shall         |      0.006 |        0.040 |     0.147 |      0.884 | -0.075 |   0.087 |
| incarc_rate   |     -0.000 |        0.000 |    -0.039 |      0.969 | -0.001 |   0.000 |
| density       |     -0.057 |        0.134 |    -0.425 |      0.673 | -0.326 |   0.212 |
| avginc        |      0.005 |        0.015 |     0.349 |      0.729 | -0.024 |   0.035 |
---
RMSE: 0.139 R2: 0.954 R2 Within: 0.002 
None


6. **Two-way fixed effects with clustered standard errors**

   - Repeat your regression from (5), but cluster your standard errors by state.
   - Report your results in Column 6 of your regression table.
   - What happens to your standard errors when you cluster by state?

In [22]:
# Two-way fixed effects with standard errors clustered by state (CRV1)
model_twfe_cluster = feols('logvio ~ shall + incarc_rate + density + avginc | stateid + year', data=df, vcov={"CRV1": "stateid"})
print(model_twfe_cluster.summary())


###

Estimation:  OLS
Dep. var.: logvio, Fixed effects: stateid+year
Inference:  CRV1
Observations:  1173

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| shall         |      0.006 |        0.040 |     0.147 |      0.884 | -0.075 |   0.087 |
| incarc_rate   |     -0.000 |        0.000 |    -0.039 |      0.969 | -0.001 |   0.000 |
| density       |     -0.057 |        0.134 |    -0.425 |      0.673 | -0.326 |   0.212 |
| avginc        |      0.005 |        0.015 |     0.349 |      0.729 | -0.024 |   0.035 |
---
RMSE: 0.139 R2: 0.954 R2 Within: 0.002 
None
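
For intuition on what clustering changes, the cluster-robust "sandwich" variance sums the scores $x_{it}\hat{u}_{it}$ within each cluster before squaring them, so correlation within a cluster is not assumed away. A minimal sketch on synthetic data (an assumption, not the `guns` dataset) for a single regressor; the finite-sample factors used here are common choices and are assumptions, not necessarily what pyfixest applies:

```python
import numpy as np

rng = np.random.default_rng(1)
G, T = 40, 25
g = np.repeat(np.arange(G), T)

# Synthetic data (assumption): x and the error each contain a cluster-level
# component, so the scores x * u are correlated within a cluster.
x = rng.normal(size=G)[g] + rng.normal(size=G * T)
u = rng.normal(size=G)[g] + rng.normal(size=G * T)
y = 0.5 * x + u

# OLS of y on x with an intercept, written via demeaned variables.
xc, yc = x - x.mean(), y - y.mean()
b = xc @ yc / (xc @ xc)
resid = yc - b * xc
N, K = x.size, 2

# Heteroskedasticity-robust (HC1): squares each observation's score separately.
se_hc1 = np.sqrt(N / (N - K) * np.sum((xc * resid) ** 2)) / (xc @ xc)

# Cluster-robust: sum the scores within each cluster first, then square.
scores = np.bincount(g, weights=xc * resid, minlength=G)
adj = G / (G - 1) * (N - 1) / (N - K)  # one common small-sample factor (assumption)
se_cl = np.sqrt(adj * np.sum(scores ** 2)) / (xc @ xc)

print(f"slope {b:.3f}  HC1 se {se_hc1:.4f}  clustered se {se_cl:.4f}")
```

When regressors and errors are both persistent within a cluster, as in this synthetic design, the clustered standard error is typically larger than the heteroskedasticity-robust one.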


7. **Conclusion**

   - Based on your analysis, what conclusions would you draw about the effects of concealed weapons laws on crime rates? Do the data support Lott's argument that more guns lead to less crime?

## Question 2: Monte Carlo study of clustered standard errors in panel data

We mentioned in the lectures that **cluster‑robust standard errors** can be unreliable when the number of clusters is small. The goal of this Monte Carlo exercise is to measure the empirical size of the 5% two‑sided test of $H_0: \beta = 0$ in a simple panel‑data model, varying only:
- the number of clusters, $G$, and  
- the number of observations per cluster, $T$.

1. **Constructing the simulated data**

   - Simulate data from the following model:
   \begin{align*}
x_{gt} &\sim \mathcal N(0,1) \\
\alpha_g &\sim \mathcal N(0,1) \\
u_{gt} &\sim \mathcal N(0,1) \\
y_{gt} &= \beta x_{gt} + \alpha_g + u_{gt}
\end{align*}
where $g = 1,\dots,G,\; t = 1,\dots,T$. Assume the true slope $\beta = 0$.
   - Note that $\alpha_g$ is constant for all $T$ observations within a group $g$.
   - Write your code so that you can vary $G$ and $T$.
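
The data-generating process above can be sketched as a reusable function that returns a long-format panel, one row per $(g, t)$ pair. The function name `simulate_panel` and the column names are hypothetical choices for illustration:

```python
import numpy as np
import pandas as pd

def simulate_panel(G: int, T: int, beta: float = 0.0, rng=None) -> pd.DataFrame:
    """Draw one long-format panel from y_gt = beta * x_gt + alpha_g + u_gt."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.repeat(np.arange(G), T)   # cluster id for each row
    t = np.tile(np.arange(T), G)     # within-cluster index
    x = rng.normal(size=G * T)
    alpha = rng.normal(size=G)[g]    # one draw per cluster, repeated T times
    u = rng.normal(size=G * T)
    return pd.DataFrame({"g": g, "t": t, "x": x, "y": beta * x + alpha + u})

panel = simulate_panel(G=10, T=5, rng=np.random.default_rng(1234))
print(panel.head())
print(panel.groupby("g").size().unique())  # every cluster has T rows
```

Passing the random generator in as an argument keeps each replication reproducible without resetting a global seed.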

In [1]:
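import numpy as np

np.random.seed(1234)  # seed NumPy's global generator for reproducibility
g = 10     # number of clusters, G
t = 1000   # observations per cluster, T
beta = 0   # true slope under H0
x = np.random.normal(0, 1, size=(g, t))
alpha = np.random.normal(0, 1, size=(g, 1))  # one draw per cluster, broadcast across t
u = np.random.normal(0, 1, size=(g, t))
y = beta * x + alpha + u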

2. **Estimation and hypothesis testing**
   
   - Estimate the model $y_{gt} = \beta x_{gt} + \alpha_g + u_{gt}$, where $\alpha_g$ is a group fixed effect. For example, you could use the `pf.feols` function with group fixed effects.
   - Use standard errors clustered on the group variable $g$.
   - Using a 5% significance level, record whether you reject $H_0: \beta = 0$.
   - Repeat this process 2000 times and return the average rejection rate.
   - Given that $H_0$ is true ($\beta$ really does equal 0), the rejection rate for a 5% test should be close to 5%.
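
The steps above can be sketched without a regression library, which keeps 2000 replications fast. This is a hedged NumPy version, not the `pf.feols` route the assignment suggests: it applies the within transformation, computes a cluster-robust variance with a $G/(G-1)$ factor, and compares the t-statistic with a $t(G-1)$ critical value. The correction factor, the reference distribution, and the function names are assumptions, so the numbers may differ slightly from what a given package reports.

```python
import numpy as np
from scipy import stats

def reject_once(G, T, rng, level=0.05):
    """One replication: simulate under beta = 0, fit FE, test H0 with clustered SEs."""
    x = rng.normal(size=(G, T))
    y = rng.normal(size=(G, 1)) + rng.normal(size=(G, T))  # alpha_g + u_gt, beta = 0
    # Within transformation removes the group fixed effect.
    xw = x - x.mean(axis=1, keepdims=True)
    yw = y - y.mean(axis=1, keepdims=True)
    sxx = np.sum(xw ** 2)
    b = np.sum(xw * yw) / sxx
    resid = yw - b * xw
    # Cluster-robust variance: sum scores within each cluster, then square.
    scores = np.sum(xw * resid, axis=1)
    var_b = G / (G - 1) * np.sum(scores ** 2) / sxx ** 2   # G/(G-1) factor (assumption)
    t_stat = b / np.sqrt(var_b)
    crit = stats.t.ppf(1 - level / 2, df=G - 1)            # t(G-1) reference (assumption)
    return abs(t_stat) > crit

def rejection_rate(G, T, reps=2000, seed=0):
    """Share of replications in which the 5% test rejects the true H0."""
    rng = np.random.default_rng(seed)
    return float(np.mean([reject_once(G, T, rng) for _ in range(reps)]))

print(rejection_rate(G=50, T=10, reps=500))
```

With 500 replications the Monte Carlo standard error of a 5% rejection rate is about one percentage point, so small departures from 0.05 are not informative on their own.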

3. **Varying $G$ and $T$**
   
   - Create a grid of values of $G$ and $T$. For example, $G \in \{5,10,20,30,50,100,250\}$ and $T \in \{5, 10, 20, 50, 100\}$.
   - For every $(G,T)$ pair, run the 2000 replications in (2) and store the rejection rate that is returned.
   - Present your results in a table where $G$ varies across the rows and $T$ varies across the columns.
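
One way to organize the grid is to compute a rejection rate for each $(G, T)$ pair and place the results in a `pandas` table with $G$ on the rows and $T$ on the columns. The sketch below is self-contained and uses a vectorized NumPy simulation that draws all replications at once. The shortened grid, the reduced number of replications, the $G/(G-1)$ factor, and the $t(G-1)$ critical value are all assumptions chosen to keep the example fast.

```python
import numpy as np
import pandas as pd
from scipy import stats

def rejection_rate(G, T, reps=500, level=0.05, seed=0):
    """Vectorized Monte Carlo: all replications drawn at once, shape (reps, G, T)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(reps, G, T))
    y = rng.normal(size=(reps, G, 1)) + rng.normal(size=(reps, G, T))  # beta = 0
    xw = x - x.mean(axis=2, keepdims=True)   # within transformation
    yw = y - y.mean(axis=2, keepdims=True)
    sxx = (xw ** 2).sum(axis=(1, 2))
    b = (xw * yw).sum(axis=(1, 2)) / sxx
    resid = yw - b[:, None, None] * xw
    scores = (xw * resid).sum(axis=2)        # one score per cluster and replication
    se = np.sqrt(G / (G - 1) * (scores ** 2).sum(axis=1)) / sxx
    crit = stats.t.ppf(1 - level / 2, df=G - 1)
    return float(np.mean(np.abs(b / se) > crit))

G_grid = [5, 10, 20, 50]   # shortened grid and fewer reps to keep the sketch fast
T_grid = [5, 10, 20]
table = pd.DataFrame(
    [[rejection_rate(G, T) for T in T_grid] for G in G_grid],
    index=pd.Index(G_grid, name="G"),
    columns=pd.Index(T_grid, name="T"),
)
print(table.round(3))
```

For the full exercise, the grids and `reps` would be set to the values listed above; the table layout stays the same.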

4. **Interpretation**

   - What happens to the empirical size as $T$ rises while $G$ stays fixed?
   - What happens to the empirical size as $G$ rises while $T$ stays fixed?
   - Discuss the implications of your results for empirical researchers using clustered standard errors.