# Gun-Related Deaths in the US and outside the US

## Abstract
Every year, thousands of people die in gun-related incidents in the United States. Within the US, individual states have different levels of gun legislation in place. The Law Center to Prevent Gun Violence, in conjunction with the Brady Center, evaluates the strength of these gun laws and ranks the states accordingly. Studies published in 2013 and 2016 found that states with more stringent gun legislation have lower rates of gun-related homicides and suicides. Data about gun-related deaths in the US was retrieved from the Gun Violence Archive, and fatality rates were calculated from state-by-state population data from the US Census Bureau. Statistical tests confirmed that states with the strongest gun laws had lower fatality rates. Outside of the US, many countries enforce very stringent gun control laws; in Australia, for example, handgun ownership is tightly restricted. Gun-related deaths in the US were compared with those in two of these countries with stricter gun laws, the United Kingdom and Australia. The US had a significantly higher rate of gun-related fatalities than the UK and Australia. In the US, more research and action are needed to prevent gun-related deaths, and this will most likely require a substantial change in public opinion.

## Introduction
Research related to gun violence has been seriously underfunded over the past few decades. This stems from the Dickey Amendment, enacted in 1996, which prohibits CDC injury-prevention funds from being used to advocate or promote gun control and has in practice discouraged gun-related research at the CDC.<sup>1</sup> Despite this, in recent years there has been more interest in understanding the efficacy of gun legislation. A 2013 study found that states with more stringent gun laws had significantly lower gun-related fatality rates.<sup>2</sup> A 2016 review further found that stronger laws requiring background checks and permits to purchase were associated with decreased firearm homicide rates.<sup>3</sup> This analysis tests the hypothesis that stronger gun laws are associated with lower gun-related fatality rates. A second hypothesis was also tested: that countries with even more stringent gun laws, namely the United Kingdom and Australia, have significantly fewer gun-related deaths than the US.

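```r
library(dplyr)      # data manipulation
library(lubridate)  # year() for extracting the year from a date
library(ggplot2)    # plotting
#install.packages('FSA', dependencies = TRUE)
library(FSA)        # dunnTest()
library(cluster)
#install.packages('factoextra', dependencies = TRUE)
library(factoextra) # get_dist(), fviz_dist()
```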



## Methods
### Gun Legislation and Gun-Related Fatalities Within the US
Data regarding gun violence within the US was retrieved from the Gun Violence Archive.<sup>4</sup> The incident-level records were aggregated into annual totals of people killed and injured in each state. The original dataset contained records from 2013 to 2018; however, the records from 2013 and 2018 were incomplete, so these years were excluded from the analysis. The aggregated fatalities were then merged with annual population estimates for each state obtained from the United States Census Bureau.<sup>5</sup> Rankings of the strength of each state's gun laws were retrieved from the Law Center to Prevent Gun Violence.<sup>6</sup> These rankings did not include the District of Columbia, so it was excluded from the analysis. The states were split into four roughly equal groups (quartiles) by rank: high (ranks 1 to 12), medium high (13 to 25), medium low (26 to 38), and low (39 to 50). For each state and year, the gun-related fatality rate was calculated as gun-related deaths per 100,000 population.
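
The rate definition and the rank cut-points can be checked on a small example. The sketch below computes deaths per 100,000 population for two hypothetical state-years and maps ranks onto the four groups with base R `cut()`, used here as a compact alternative to the `case_when()` lookup in the notebook. The state names, death counts, and populations are made up for illustration.

```r
# Hypothetical state-year records (illustrative values only)
toy <- data.frame(
  state = c("State A", "State B"),
  deaths = c(150, 40),
  population = c(3000000, 2000000)
)

# Fatality rate per 100,000 population
toy$RatePer100000 <- toy$deaths / toy$population * 100000

# Map gun law strength ranks (1 = strongest) onto the four groups,
# using the same cut-points as the notebook: 1-12, 13-25, 26-38, 39-50.
# cut() builds right-closed intervals (0,12], (12,25], (25,38], (38,50].
ranks <- c(1, 12, 13, 25, 26, 38, 39, 50)
category <- cut(ranks,
                breaks = c(0, 12, 25, 38, 50),
                labels = c("high", "medium high", "medium low", "low"))

print(toy)
print(data.frame(rank = ranks, category = category))
```

Here 150 deaths in a population of 3,000,000 gives 5 deaths per 100,000, and 40 deaths in 2,000,000 gives 2 per 100,000.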

```r
# Read incident-level gun violence data, state population estimates,
# and gun law strength rankings
US.gun.violence <- read.csv('Data/gun-violence-data-updated.csv', header = TRUE, stringsAsFactors = FALSE)
state.pop <- read.csv('Data/state_populations.csv', header = TRUE, stringsAsFactors = FALSE)
gun.law.rank <- read.csv('Data/GunLawRank.csv', header = TRUE, stringsAsFactors = FALSE)

# Aggregate incidents to annual totals per state; drop the incomplete
# years (2013, 2018) and the District of Columbia (not ranked)
state.gun.violence <- US.gun.violence %>%
  mutate(date = as.Date(date),
         Year = year(date)) %>%
  group_by(state, Year) %>%
  summarise(TotalInjured = sum(n_injured),
            TotalKilled = sum(n_killed),
            .groups = 'drop') %>%
  filter(!Year %in% c(2013, 2018),
         state != 'District of Columbia') %>%
  as.data.frame()

# Split states into four groups by rank (1 = strongest gun laws)
gun.law.rank <- gun.law.rank %>%
  arrange(GunLawStrengthRank)
high.gun.law    <- gun.law.rank[1:12, 'State']
medhigh.gun.law <- gun.law.rank[13:25, 'State']
medlow.gun.law  <- gun.law.rank[26:38, 'State']
low.gun.law     <- gun.law.rank[39:50, 'State']

# Join population estimates and compute deaths per 100,000 per state-year
state.yearbyyear.rate <- state.gun.violence %>%
  left_join(state.pop, by = c('state' = 'State', 'Year' = 'Year')) %>%
  mutate(Rate = TotalKilled / Population,
         RatePer100000 = Rate * 100000,
         Year = as.factor(Year),
         GunLawCategory = case_when(
           state %in% high.gun.law ~ 'high',
           state %in% medhigh.gun.law ~ 'medium high',
           state %in% medlow.gun.law ~ 'medium low',
           state %in% low.gun.law ~ 'low'
         )) %>%
  as.data.frame()
```

A histogram of the state-year rates showed that the data are right-skewed. Boxplots for the states with the strongest (high category) and weakest (low category) gun laws illustrate a visible difference between the two extremes.

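```r
# Histogram of state-year rates (right-skewed)
hist(state.yearbyyear.rate$RatePer100000,
     main = 'Gun-Related Deaths per 100,000 Population, 2014-2017',
     xlab = 'Deaths per 100,000 population')

# Boxplots for the states with the strongest gun laws.
# coord_cartesian() zooms the y-axis without dropping observations;
# scale_y_continuous(limits = ...) would remove points outside the
# range before the box statistics are computed.
ggplot(subset(state.yearbyyear.rate, GunLawCategory == 'high'),
       aes(x = state, y = RatePer100000)) +
  geom_boxplot() +
  labs(
    title = 'Distribution of Rates from 2014-2017',
    subtitle = 'States with the Highest Gun Law Strength Ranks',
    x = 'State',
    y = 'Deaths Per 100,000 Population'
  ) +
  coord_cartesian(ylim = c(0, 12)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Same plot for the states with the weakest gun laws
ggplot(subset(state.yearbyyear.rate, GunLawCategory == 'low'),
       aes(x = state, y = RatePer100000)) +
  geom_boxplot() +
  labs(
    title = 'Distribution of Rates from 2014-2017',
    subtitle = 'States with the Lowest Gun Law Strength Ranks',
    x = 'State',
    y = 'Deaths Per 100,000 Population'
  ) +
  coord_cartesian(ylim = c(0, 12)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```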
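
The right skew seen in the histogram can also be summarized with a single number. The sketch below computes the moment-based sample skewness, g1 = m3 / m2^(3/2), in base R; a positive value indicates a longer right tail. The helper `sample_skewness` is hypothetical, and the simulated log-normal rates stand in for `state.yearbyyear.rate$RatePer100000`; neither is part of the original analysis.

```r
# Moment-based sample skewness: g1 = m3 / m2^(3/2)
# (hypothetical helper, not part of the original notebook)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)   # second central moment
  m3 <- mean(d^3)   # third central moment
  m3 / m2^(3 / 2)
}

# Simulated right-skewed rates standing in for the real state-year rates
set.seed(1)
simulated.rates <- rlnorm(200, meanlog = 1, sdlog = 0.6)

skew <- sample_skewness(simulated.rates)
print(skew)  # positive for right-skewed data

# A symmetric vector has zero skewness
print(sample_skewness(c(1, 2, 3, 4, 5)))
```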

A one-way ANOVA was performed first; however, its residual plots showed that the data violated the ANOVA assumptions. Non-parametric tests were therefore used to determine whether fatality rates differed significantly between the gun law categories.
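
A minimal sketch of that check follows, using simulated skewed data in place of the real rates. It fits a one-way ANOVA with `aov()`, draws the residuals-vs-fitted and normal Q-Q plots with `plot()`, and backs the visual check with a Shapiro-Wilk test of the residuals and Bartlett's test of equal variances. The simulated group sizes and distributions are assumptions, and the two formal tests are a common complement to residual plots rather than the notebook's documented procedure.

```r
# Simulated, right-skewed rates for four hypothetical gun law categories
set.seed(42)
sim <- data.frame(
  GunLawCategory = factor(rep(c("high", "medium high", "medium low", "low"),
                              each = 48)),
  RatePer100000 = c(rlnorm(48, 0.5, 0.8), rlnorm(48, 1.0, 0.8),
                    rlnorm(48, 1.3, 0.8), rlnorm(48, 1.6, 0.8))
)

# One-way ANOVA
fit <- aov(RatePer100000 ~ GunLawCategory, data = sim)
print(summary(fit))

# Residuals vs fitted values and normal Q-Q plot of the residuals
par(mfrow = c(1, 2))
plot(fit, which = 1:2)
par(mfrow = c(1, 1))

# Formal checks: normality of residuals and homogeneity of variances
shapiro <- shapiro.test(residuals(fit))
bartlett <- bartlett.test(RatePer100000 ~ GunLawCategory, data = sim)
print(shapiro)
print(bartlett)
```

A small Shapiro-Wilk p-value, together with a curved Q-Q plot, is the kind of evidence that motivates switching to rank-based tests.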

The first test was the Kruskal-Wallis rank sum test, which compared all four categories of gun law strength. Dunn's test with a Bonferroni correction was then performed to identify which pairwise comparisons were significant. Finally, a Mann-Whitney-Wilcoxon rank sum test was used to confirm the difference between the highest and lowest categories.

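```r
# Non-parametric comparison of the four gun law categories
state.yearbyyear.rate$GunLawCategory <- as.factor(state.yearbyyear.rate$GunLawCategory)

# Kruskal-Wallis rank sum test across all four categories
state.kw <- kruskal.test(RatePer100000 ~ GunLawCategory, data = state.yearbyyear.rate)
state.kw

# Dunn's post hoc pairwise comparisons (FSA), Bonferroni-adjusted
state.dunn <- dunnTest(RatePer100000 ~ GunLawCategory, data = state.yearbyyear.rate, method = 'bonferroni')
state.dunn

# Mann-Whitney-Wilcoxon rank sum test: high vs low categories only
state.mww.df <- droplevels(subset(state.yearbyyear.rate, GunLawCategory %in% c('high', 'low')))
state.mww <- wilcox.test(RatePer100000 ~ GunLawCategory, data = state.mww.df)
state.mww
```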
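
To show what the Dunn step computes, the sketch below implements the standard formulation of Dunn's test in base R. All observations are ranked jointly; for each pair of groups, z is the difference in mean ranks divided by SE = sqrt((N(N+1)/12 - sum(t^3 - t)/(12(N - 1))) * (1/n_i + 1/n_j)), where t are the sizes of tied rank groups; two-sided p-values are then Bonferroni-adjusted with `p.adjust()`. The function `dunn_sketch` and the toy data are hypothetical, and FSA's `dunnTest()` may differ in details such as output layout.

```r
# Dunn's test for pairwise comparisons after Kruskal-Wallis
# (hypothetical helper illustrating the calculation)
dunn_sketch <- function(x, g, p.adjust.method = "bonferroni") {
  g <- droplevels(as.factor(g))
  N <- length(x)
  r <- rank(x)                          # joint ranks; ties get average ranks
  mean.rank <- tapply(r, g, mean)
  n <- tapply(r, g, length)
  ties <- table(r)                      # sizes of tied rank groups
  tie.term <- sum(ties^3 - ties) / (12 * (N - 1))
  pairs <- combn(levels(g), 2)          # all pairs of group labels
  z <- apply(pairs, 2, function(p) {
    se <- sqrt((N * (N + 1) / 12 - tie.term) * (1 / n[p[1]] + 1 / n[p[2]]))
    (mean.rank[p[1]] - mean.rank[p[2]]) / se
  })
  p.unadj <- 2 * pnorm(-abs(z))         # two-sided p-values
  data.frame(
    Comparison = paste(pairs[1, ], "-", pairs[2, ]),
    Z = unname(z),
    P.unadj = unname(p.unadj),
    P.adj = p.adjust(unname(p.unadj), method = p.adjust.method)
  )
}

# Toy example with three clearly separated groups
set.seed(7)
toy.rates <- c(rnorm(12, 3, 1), rnorm(12, 5, 1), rnorm(12, 9, 1))
toy.group <- rep(c("high", "medium", "low"), each = 12)
print(kruskal.test(toy.rates, factor(toy.group)))
res <- dunn_sketch(toy.rates, toy.group)
print(res)
```

The Bonferroni adjustment multiplies each p-value by the number of comparisons (capped at 1), which is why `P.adj` is never smaller than `P.unadj`.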

The overall gun-related fatality rate for each state from 2014 to 2017 was then calculated as total deaths divided by the sum of the annual populations. A k-means cluster analysis with four clusters was run on these rates to see whether the resulting clusters would match the gun law categories.

```r
# Overall 2014-2017 rate per state: total deaths / total person-years
state.overall.rate <- state.gun.violence %>%
  left_join(state.pop, by = c('state' = 'State', 'Year' = 'Year')) %>%
  group_by(state) %>%
  summarise(TotalKilled = sum(TotalKilled),
            TotalPop = sum(Population)) %>%
  mutate(Rate = TotalKilled / TotalPop,
         RatePer100000 = Rate * 100000) %>%
  as.data.frame()

# One-column data frame of rates, with states as row names
state.cluster.df <- subset(state.overall.rate, select = RatePer100000)
rownames(state.cluster.df) <- state.overall.rate$state

# Visualize pairwise (Euclidean) distances between states
distance <- get_dist(state.cluster.df)
fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))

# K-means with 4 clusters; the seed is arbitrary and only makes the run reproducible
set.seed(123)
state.kmean <- kmeans(state.cluster.df, centers = 4, nstart = 50)

# Attach cluster assignments and gun law categories, then cross-tabulate
state.cluster.df <- state.cluster.df %>%
  mutate(state = state.overall.rate$state,
         cluster = as.factor(state.kmean$cluster),
         GunLawCategory = case_when(
           state %in% high.gun.law ~ 'high',
           state %in% medhigh.gun.law ~ 'medium high',
           state %in% medlow.gun.law ~ 'medium low',
           state %in% low.gun.law ~ 'low'
         ))
table(state.cluster.df$cluster, state.cluster.df$GunLawCategory)
```
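
Because k-means assigns arbitrary cluster numbers, comparing clusters with the gun law categories is easier after relabelling the clusters in order of their centers (lowest mean rate first). The sketch below does this on simulated one-dimensional rates for 50 hypothetical states and cross-tabulates the result with `table()`. The simulated values, group sizes, and the relabelling step are assumptions for illustration, not part of the original notebook.

```r
set.seed(123)
# Simulated overall rates for 50 hypothetical states in four groups
category <- factor(rep(c("high", "medium high", "medium low", "low"),
                       times = c(12, 13, 13, 12)),
                   levels = c("high", "medium high", "medium low", "low"))
centers.true <- c(high = 2, "medium high" = 5, "medium low" = 8, low = 11)
rate <- rnorm(50, mean = centers.true[as.character(category)], sd = 0.3)

# K-means on the one-dimensional rates
km <- kmeans(matrix(rate, ncol = 1), centers = 4, nstart = 50)

# Relabel clusters so that cluster 1 has the lowest center and 4 the highest
ord <- order(km$centers[, 1])
cluster.ordered <- match(km$cluster, ord)

# Rows: relabelled clusters; columns: gun law categories
tab <- table(cluster = cluster.ordered, GunLawCategory = category)
print(tab)
```

If clusters and categories agree perfectly, all counts fall on the diagonal; off-diagonal counts show states whose rate places them closer to another category's center.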

### Gun Legislation and Gun-Related Fatalities In and Outside the US


## Results

## What's Next
- Further analysis accounting for multiple covariates, including socioeconomic status.
- Check that lower gun-related fatality rates are not offset by higher rates of other (non-gun) violence.

## References
1. Gerstein.
2. Fleegler.
3. Lee.
4. Gun Violence Archive.
5. US Census Bureau.
6. Law Center to Prevent Gun Violence.