# Causal Inference: Gun Reform and Lethal Violence - a Difference-in-Differences Analysis
An updated analysis of Cheng and Hoekstra's [2013] review of changes to Castle Doctrine laws and the resulting changes in lethal violence.

# Data Wrangling

## Import Libraries

In [None]:
# Import pandas, numpy, matplotlib.pyplot, and seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Load Dataset

In [None]:
# Accessing Google Drive by mounting it locally
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd /content/drive/MyDrive/GitHub/causal_inference_gun_violence/

In [None]:
# Location on Google Drive
loc_data = '/content/drive/MyDrive/colab_notebooks/GitHub/causal_inference_gun_violence/'

In [None]:
# Load Stata datafile into pandas
castle = pd.read_stata(loc_data + 'castle.dta')

In [None]:
# Unmount Google Drive.
# drive.flush_and_unmount()

## Initial look at dataset

In [None]:
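# Set pandas display options so wide tables are shown in full.
# The 'display.' prefix avoids ambiguous option names in newer pandas versions.
pd.set_option('display.max_rows', 99999)
pd.set_option('display.max_colwidth', 400)
pd.set_option('display.max_columns', 9999)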

In [None]:
castle.head()

Unnamed: 0,state,year,sid,cdl,pre2_cdl,caselaw,anywhere,assumption,civil,homicide_c,robbery_gun_r,jhcitizen_c,jhpolice_c,homicide,robbery,assault,burglary,larceny,motor,murder,hc_felonywsus,jhcitizen,jhpolice,population,police,unemployrt,income,blackm_15_24,whitem_15_24,blackm_25_44,whitem_25_44,prisoner,lagprisoner,poverty,exp_subsidy,exp_pubwelfare,northeast,midwest,south,west,effyear,r20001,r20002,r20003,r20004,r20011,r20012,r20013,r20014,r20021,r20022,r20023,r20024,r20031,r20032,r20033,r20034,r20041,r20042,r20043,r20044,r20051,r20052,r20053,r20054,r20061,r20062,r20063,r20064,r20071,r20072,r20073,r20074,r20081,r20082,r20083,r20084,r20091,r20092,r20093,r20094,r20101,r20102,r20103,r20104,trend_1,trend_2,trend_3,trend_4,trend_5,trend_6,trend_7,trend_8,trend_9,trend_10,trend_11,trend_12,trend_13,trend_14,trend_15,trend_16,trend_17,trend_18,trend_19,trend_20,trend_21,trend_22,trend_23,trend_24,trend_25,trend_26,trend_27,trend_28,trend_29,trend_30,trend_31,trend_32,trend_33,trend_34,trend_35,trend_36,trend_37,trend_38,trend_39,trend_40,trend_41,trend_42,trend_43,trend_44,trend_45,trend_46,trend_47,trend_48,trend_49,trend_50,trend_51,l_murder,l_homicide,l_homicide_c,l_robbery,l_assault,l_burglary,l_larceny,l_motor,l_robbery_gun,l_jhcitizen,l_jhpolice,l_hc_felonywsus,l_robbery_gun_r,l_pop,l_police,l_income,l_prisoner,l_lagprisoner,l_exp_subsidy,l_exp_pubwelfare,treatment_date,post,time_til,lead1,lead2,lead3,lead4,lead5,lead6,lead7,lead8,lead9,lag0,lag1,lag2,lag3,lag4,lag5,_Iyear_2001,_Iyear_2002,_Iyear_2003,_Iyear_2004,_Iyear_2005,_Iyear_2006,_Iyear_2007,_Iyear_2008,_Iyear_2009,_Iyear_2010,popwt
0,Alabama,2000,1,0.0,0.0,0.0,0,0,0,329,0.210803,1,0,7.593978,131.613571,325.617798,930.920166,2940.623779,295.657349,6.532207,0.484722,0.046164,0.023082,4332380,348.838283,4.1,44851,2.222243,4.69481,3.471694,10.414437,605.3255,564.274109,14.705981,212.261169,1018.497009,0,0,1,0,2006.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.876745,2.027356,5.796058,4.87987,5.785724,6.836174,7.986377,5.689201,-1.55683,-3.075555,-3.768702,-0.72418,-1.55683,15.281628,5.854609,10.711102,6.405766,6.33554,5.357818,6.926083,2006.0,0.0,-6.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,4499293.0
1,Alabama,2001,1,0.0,0.0,0.0,0,0,0,379,0.214362,2,0,8.713443,128.379593,281.63504,934.384583,2758.689941,290.118561,7.49494,0.666728,0.068972,0.022991,4349601,351.825374,4.7,43301,2.249954,4.700201,3.440247,10.186382,614.792053,605.3255,15.750203,221.890289,1068.928589,0,0,1,0,2006.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.014228,2.164867,5.937536,4.854991,5.640612,6.839888,7.922511,5.67029,-1.540087,-2.674057,-3.772669,-0.405373,-1.540087,15.285595,5.863135,10.675931,6.421284,6.405766,5.402183,6.974412,2006.0,0.0,-5.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0,0,0,0,0,0,0,0,0,4499293.0
2,Alabama,2002,1,0.0,0.0,0.0,0,0,0,303,0.424019,3,2,6.933288,136.423309,274.631409,974.275696,2835.829102,317.832886,6.406999,0.594936,0.091529,0.068646,4370221,311.494545,5.4,45573,2.258261,4.685461,3.403993,9.957734,640.104919,614.792053,15.556559,257.726379,1139.962646,0,0,1,0,2006.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.857391,1.936334,5.713733,4.915762,5.61543,6.881694,7.950089,5.761526,-0.857978,-2.391104,-2.678786,-0.519302,-0.857978,15.290324,5.741382,10.727071,6.461632,6.421284,5.551898,7.038751,2006.0,0.0,-4.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0,0,0,0,0,0,0,0,4499293.0
3,Alabama,2003,1,0.0,0.0,0.0,0,0,0,299,0.245446,2,1,6.818007,137.682693,258.536987,986.102661,2828.423828,341.059937,6.567177,0.319238,0.068408,0.045605,4385446,348.562039,5.5,44165,2.292948,4.688166,3.383214,9.79141,636.491699,640.104919,15.447802,270.940002,1224.665894,0,0,1,0,2006.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.882084,1.919567,5.700444,4.924952,5.555039,6.893761,7.947475,5.832058,-1.40468,-2.682264,-3.087729,-1.141819,-1.40468,15.293802,5.853816,10.695688,6.455971,6.461632,5.601897,7.110424,2006.0,0.0,-3.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,1,0,0,0,0,0,0,0,4499293.0
4,Alabama,2004,1,0.0,1.0,0.0,0,0,0,254,0.261006,3,0,5.753689,136.865311,255.654068,1011.788513,2800.959229,317.676117,5.595123,0.226523,0.090609,0.022652,4414559,327.914974,5.1,42280,2.291327,4.661734,3.358841,9.599033,586.400574,636.491699,16.252654,264.961548,1194.543579,0,0,1,0,2006.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.721895,1.749841,5.537334,4.918997,5.543825,6.919475,7.937717,5.761033,-1.343211,-2.401199,-3.787493,-1.484908,-1.343211,15.300419,5.792754,10.652069,6.374003,6.455971,5.579585,7.085519,2006.0,0.0,-2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,1,0,0,0,0,0,0,4499293.0


In [None]:
castle.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 550 entries, 0 to 549
Data columns (total 185 columns):
 #    Column            Non-Null Count  Dtype  
---   ------            --------------  -----  
 0    state             550 non-null    object 
 1    year              550 non-null    int16  
 2    sid               550 non-null    int8   
 3    cdl               550 non-null    float32
 4    pre2_cdl          550 non-null    float32
 5    caselaw           550 non-null    float32
 6    anywhere          550 non-null    int8   
 7    assumption        550 non-null    int8   
 8    civil             550 non-null    int8   
 9    homicide_c        550 non-null    int16  
 10   robbery_gun_r     544 non-null    float32
 11   jhcitizen_c       550 non-null    int8   
 12   jhpolice_c        550 non-null    int16  
 13   homicide          550 non-null    float32
 14   robbery           550 non-null    float32
 15   assault           550 non-null    float32
 16   burglary          550 no

In [None]:
castle.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
year,550.0,2005.0,3.165156,2000.0,2002.0,2005.0,2008.0,2010.0
sid,550.0,26.34,14.67979,1.0,14.0,26.5,39.0,51.0
cdl,550.0,0.1522391,0.3476353,0.0,0.0,0.0,0.0,1.0
pre2_cdl,550.0,0.07636364,0.2658208,0.0,0.0,0.0,0.0,1.0
caselaw,550.0,0.2,0.4003641,0.0,0.0,0.0,0.0,1.0
anywhere,550.0,0.1436364,0.3510399,0.0,0.0,0.0,0.0,1.0
assumption,550.0,0.1109091,0.3143054,0.0,0.0,0.0,0.0,1.0
civil,550.0,0.16,0.3669398,0.0,0.0,0.0,0.0,1.0
homicide_c,550.0,320.26,403.634,4.0,43.25,181.5,430.75,2503.0
robbery_gun_r,544.0,0.3517144,0.1309903,0.008788,0.2508814,0.3550301,0.4600999,0.5973725


The 'year' column shows that the dataset covers the years 2000 through 2010.

In [None]:
# Convert the year column from int16 to datetime64.
# Casting to str first lets the '%Y' format parse each value as a four-digit year.
castle['year'] = pd.to_datetime(castle['year'].astype(str), format='%Y')

In [None]:
castle.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 550 entries, 0 to 549
Data columns (total 185 columns):
 #    Column            Non-Null Count  Dtype         
---   ------            --------------  -----         
 0    state             550 non-null    object        
 1    year              550 non-null    datetime64[ns]
 2    sid               550 non-null    int8          
 3    cdl               550 non-null    float32       
 4    pre2_cdl          550 non-null    float32       
 5    caselaw           550 non-null    float32       
 6    anywhere          550 non-null    int8          
 7    assumption        550 non-null    int8          
 8    civil             550 non-null    int8          
 9    homicide_c        550 non-null    int16         
 10   robbery_gun_r     544 non-null    float32       
 11   jhcitizen_c       550 non-null    int8          
 12   jhpolice_c        550 non-null    int16         
 13   homicide          550 non-null    float32       
 14   robbery 

## Characterizing Null Values within the Dataset

In [None]:
# Dataset Null Value exploration
missing_data = pd.concat([castle.isnull().sum(), 100 * castle.isnull().mean()], axis=1)
missing_data.columns=['count', '%']
missing_data.sort_values(by='count', ascending=False).head(10)

Unnamed: 0,count,%
time_til,319,58.0
treatment_date,319,58.0
effyear,319,58.0
hc_felonywsus,11,2.0
l_hc_felonywsus,11,2.0
l_robbery_gun_r,6,1.090909
l_robbery_gun,6,1.090909
robbery_gun_r,6,1.090909
trend_43,0,0.0
trend_37,0,0.0


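The null values in 'time_til', 'treatment_date', and 'effyear' come from states that did not change their Castle Doctrine laws. The remaining nulls ('hc_felonywsus', 'robbery_gun_r', and their log versions) affect at most 2% of rows.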
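One way to check this claim is to confirm that missingness in 'treatment_date' is all-or-nothing within each state. The sketch below uses a small hypothetical panel (the `toy` frame and its values are illustrative, not taken from the dataset) and assumes the same column names as `castle`.

```python
import numpy as np
import pandas as pd

# Toy panel (hypothetical values): state A is treated in 2006, state B never is.
toy = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "B"],
    "year": [2005, 2006, 2007, 2005, 2006, 2007],
    "treatment_date": [2006.0, 2006.0, 2006.0, np.nan, np.nan, np.nan],
})

# Share of rows per state with a missing treatment_date.
null_share = toy.groupby("state")["treatment_date"].apply(lambda s: s.isnull().mean())

# If nulls only mark never-treated states, every share is exactly 0 or 1.
all_or_nothing = bool(null_share.isin([0.0, 1.0]).all())
never_treated = sorted(null_share[null_share == 1.0].index)
print(all_or_nothing, never_treated)
```

Running the same `groupby` on `castle` should give the same all-or-nothing pattern if the explanation holds. The 319 null rows over 11 years per state would correspond to 29 never-treated states.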

In [None]:
castle


In [None]:
castle.year.min()

Timestamp('2000-01-01 00:00:00')

In [None]:
castle.year.max()

Timestamp('2010-01-01 00:00:00')

In [None]:
castle.state.unique()

array(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
       'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia',
       'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas',
       'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts',
       'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana',
       'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico',
       'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma',
       'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina',
       'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont',
       'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'],
      dtype=object)

In [None]:
castle.state.nunique()

50

In [None]:
castle[['state', 'year', 'time_til', 'treatment_date', 'effyear']]

Unnamed: 0,state,year,time_til,treatment_date,effyear
0,Alabama,2000-01-01,-6.0,2006.0,2006.0
1,Alabama,2001-01-01,-5.0,2006.0,2006.0
2,Alabama,2002-01-01,-4.0,2006.0,2006.0
3,Alabama,2003-01-01,-3.0,2006.0,2006.0
4,Alabama,2004-01-01,-2.0,2006.0,2006.0
5,Alabama,2005-01-01,-1.0,2006.0,2006.0
6,Alabama,2006-01-01,0.0,2006.0,2006.0
7,Alabama,2007-01-01,1.0,2006.0,2006.0
8,Alabama,2008-01-01,2.0,2006.0,2006.0
9,Alabama,2009-01-01,3.0,2006.0,2006.0


## Data Wrangling Conclusions

From 2000 to 2010, 21 states adopted laws that changed Castle Doctrine from Duty to Retreat to Stand Your Ground. This change in the law is the treatment, in the econometric sense of the term. The treatment_date and effyear variables record the year in which the treatment took effect.
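To make the treatment timing concrete, here is a minimal sketch of how an event-time variable like time_til can be derived from effyear. The `toy` frame is hypothetical, and the `post` definition used here (1 from the effective year onward) is an assumption for illustration; the dataset's own post and cdl columns may be coded differently.

```python
import numpy as np
import pandas as pd

# Toy panel (hypothetical): state A's law takes effect in 2006, state B never changes.
toy = pd.DataFrame({
    "state": ["A"] * 3 + ["B"] * 3,
    "year": [2005, 2006, 2007] * 2,
    "effyear": [2006.0] * 3 + [np.nan] * 3,
})

# Event time: years relative to the effective year (NaN for never-treated states).
toy["time_til"] = toy["year"] - toy["effyear"]

# Assumed post-treatment indicator: 1 from the effective year onward.
# Comparisons with NaN evaluate to False, so never-treated states get 0.
toy["post"] = (toy["time_til"] >= 0).astype(int)
print(toy)
```

In the notebook itself, year has already been converted to datetime64, so the subtraction would need `castle.year.dt.year` in place of the raw column.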

This dataset is derived from the Uniform Crime Reporting Program (https://www.fbi.gov/how-we-can-help-you/more-fbi-services-and-information/ucr), which tracks eight offenses across all 50 states: murder and non-negligent manslaughter, forcible rape, robbery, aggravated assault, burglary, larceny, motor vehicle theft, and arson. These statistics are converted into rates, expressed as incidents per 100,000 population.
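The rate conversion can be checked against the first row of `castle.head()` (Alabama, 2000), which reports homicide_c = 329 and population = 4332380. The helper name `rate_per_100k` is hypothetical and introduced only for this sketch.

```python
def rate_per_100k(count, population):
    """Convert a raw incident count into incidents per 100,000 residents."""
    return count / population * 100_000

# Alabama, 2000: values taken from the first row of castle.head().
homicide_rate = rate_per_100k(329, 4_332_380)
print(round(homicide_rate, 3))  # → 7.594
```

The result agrees with the homicide value of 7.593978 shown for that row, which suggests the rate columns were built this way.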

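These rates are further converted to natural logs (l_homicide, for example) for use in regression. northeast, midwest, south, and west are dummy variables for census regions. blackm_25_44 and similar variables describe the demographic makeup of the population. r20001 and similar variables are region-by-year dummies for the regressions (r20003, for example, equals 1 for Alabama, a southern state, in 2000).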
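As a quick sanity check on the log transformation, the sketch below recomputes two log columns from the first row of `castle.head()` (Alabama, 2000). It assumes the l_ columns are natural logs, which is consistent with the values shown for that row.

```python
import math

# Alabama, 2000: values taken from the first row of castle.head().
homicide = 7.593978       # homicides per 100,000 population
population = 4_332_380    # state population

# Natural logs, to compare against the l_homicide and l_pop columns.
l_homicide = math.log(homicide)
l_pop = math.log(population)
print(round(l_homicide, 4), round(l_pop, 4))
```

Both results match the l_homicide (2.027356) and l_pop (15.281628) values in that row to within rounding.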

The null values in time_til, treatment_date, and effyear belong to states that never changed their gun laws and therefore act as controls. These are valid null values and do not need to be imputed.