# Gun deaths/ownership per capita

I recently saw this image on a left-leaning political YouTube channel and thought it would be fun to verify. It plays into the anti-firearm narrative of the left by showing an apparent correlation between gun ownership in general and gun deaths from all causes.

![chart](inspo.jpg)

I have pulled population/demographic data, gun ownership data, and gun fatality data to see if my results look similar. I will also color each dot by which candidate won that state in the 2020 presidential election.

Sources are as follows:
- [Gun fatalities (CDC)](https://www.cdc.gov/nchs/pressroom/sosmap/firearm_mortality/firearm.htm)
- [Population per state 2020 (US Census)](https://data.census.gov/cedsci/table?q=population%20by%20state&tid=DECENNIALPL2020.P1)
- [Gun ownership rates (World Population Review)](https://worldpopulationreview.com/state-rankings/gun-ownership-by-state)
- [2020 Presidential Election results](https://www.cookpolitical.com/2020-national-popular-vote-tracker)

So we are going to do some light analysis to see if the above chart is roughly accurate.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
sns.set()

## Election Result Data

In [None]:
election_results = pd.read_csv("election-results.csv")
election_results.head()

We only need which party won, so while this data is interesting, we will drop the vast majority of it.

In [None]:
cols = ["state", "stateid", "called"]
election_results = election_results[cols]

# rename stateid column for ease of joining
election_results.rename(columns = {"stateid": "state_id"}, inplace=True)
election_results.head()

Maine and Nebraska split their electors, so we will color them gray in the final plot. With that in mind, we will remove their split-elector rows for ease of use with the rest of the data, which is totalled by state. Additionally, since we don't have DC in our gun ownership data, we will have to drop it from here as well.

In [None]:
is_split = election_results["state_id"].str.len() > 2
is_dc = election_results["state_id"] == "DC"
election_results = election_results[~is_split & ~is_dc]
election_results.shape
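
The filter above relies on split-elector rows carrying ids longer than a two-letter state code. Here is a minimal sketch of the same masks on made-up rows. The `toy` frame and the `ME-1` and `NE-2` ids are assumptions for illustration, not values taken from the CSV:

```python
import pandas as pd

# Hypothetical rows; the real file's district ids may be spelled differently.
toy = pd.DataFrame({
    "state": ["Maine", "Maine 1st District", "Nebraska 2nd District",
              "District of Columbia", "Ohio"],
    "state_id": ["ME", "ME-1", "NE-2", "DC", "OH"],
    "called": ["D", "D", "D", "D", "R"],
})

is_split = toy["state_id"].str.len() > 2   # district rows have ids longer than two characters
is_dc = toy["state_id"] == "DC"            # DC is missing from the ownership data
kept = toy[~is_split & ~is_dc]             # keep rows that are neither

print(kept["state_id"].tolist())
```

Only the statewide rows survive, which is why the statewide `ME` and `NE` rows are still available to be marked as split later on.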

## Simplifying population data

In [None]:
population = pd.read_csv("state-population.csv")
population.head()

In [None]:
population = population.iloc[0].reset_index()
population.columns = ["state", "population"]
population = population[1:]
population.sample(5)

The population column contains commas, which means it's currently a column of strings. If we are going to plot it, it needs to be integers, so we can fix that.

In [None]:
population["population"] = population["population"].str.replace(",", "").astype(np.int64)
population.info()
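
The two cells above turn a one-row, wide table into one row per state and then parse the comma-separated strings. A small sketch of both steps on a made-up frame follows. The column name `Label (Grouping)` and the population figures are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical wide table: a label column followed by one column per state.
wide = pd.DataFrame({
    "Label (Grouping)": ["Total"],
    "Alabama": ["5,000,000"],
    "Alaska": ["700,000"],
})

long = wide.iloc[0].reset_index()     # the first row becomes one record per original column
long.columns = ["state", "population"]
long = long[1:].copy()                # drop the label record; copy so later assignment is safe

# Remove the thousands separators, then convert the strings to integers.
long["population"] = long["population"].str.replace(",", "").astype("int64")
print(long)
```

Taking a `.copy()` after slicing is a small design choice that keeps pandas from warning about assigning into a view.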

In [None]:
population.shape

Too many rows means DC and Puerto Rico are still here. Since they aren't in our ownership data, we will have to drop them.

In [None]:
to_drop = population["state"].isin(["District of Columbia", "Puerto Rico"])
population = population[~to_drop]
population.shape

## Gun Ownership Data

In [None]:
gun_ownership = pd.read_csv("gun-ownership.csv")

# rename columns for ease of joining
gun_ownership.rename(columns = {"gunOwnership": "gun_own_rate"}, inplace=True)
gun_ownership.columns = gun_ownership.columns.str.lower()
gun_ownership.head()

We won't need the total number of guns registered per state for this analysis, so we can drop that. Note that this is the dataset that does not contain DC, hence the shape starting at 50 rows.

In [None]:
gun_ownership.drop("totalguns", axis=1, inplace=True)
gun_ownership.shape

## Gun Fatalities Data

In [None]:
gun_death = pd.read_csv("gun-deaths.csv")
gun_death.columns = gun_death.columns.str.lower()

# rename state column for ease of joining
gun_death.rename(columns = {"state": "state_id"}, inplace = True)
gun_death.head()

In [None]:
gun_death.shape

Clearly there are far too many rows here. I assume the file covers multiple years, so we can check that and clean it up. Also, we won't need the URL, RATE, or YEAR columns once we figure out why our data is so large.

In [None]:
gun_death["year"].unique()

As expected, too many years. We are working with 2020 population data, so we will keep 2020 here as well, and then drop those excess columns.

In [None]:
is_2020 = gun_death["year"] == 2020
# filter and drop in one step so we get a new frame instead of modifying a slice in place
gun_death = gun_death[is_2020].drop(columns=["url", "rate", "year"])
gun_death.shape

In [None]:
gun_death.columns = ["state_id", "gun_deaths"]
gun_death.head()

## Join all data into one dataframe

In [None]:
# merge population, gun ownership, and election result data using "state" header
combined = election_results.merge(population, how="inner", on="state")
combined = combined.merge(gun_ownership, how="inner", on="state")
combined.sample(5)

In [None]:
# annotate states with split electors
# ("called" holds the winning party, so the states are matched on "state_id")
combined.loc[combined["state_id"].isin(["ME","NE"]), "called"] = "S"

# merge gun death data with the rest using "state_id" header
combined = combined.merge(gun_death, how="inner", on="state_id")

# rename "called" column to be more descriptive
combined.rename(columns = {"called": "elec_winner"}, inplace=True)
combined
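
Inner joins silently drop any state whose name or id is spelled differently in one of the files. One way to check for that is sketched below with toy frames (all values are made up), using the `indicator` and `validate` options of `DataFrame.merge`:

```python
import pandas as pd

# Toy stand-ins for two of the tables; every value here is made up.
left = pd.DataFrame({"state": ["Ohio", "Utah", "Iowa"], "called": ["R", "R", "R"]})
right = pd.DataFrame({"state": ["Ohio", "Utah", "Idaho"], "gun_own_rate": [40.0, 46.0, 60.0]})

# An outer join with indicator=True records which side each row came from;
# validate="one_to_one" raises an error if a key repeats on either side.
check = left.merge(right, how="outer", on="state", indicator=True, validate="one_to_one")
unmatched = check.loc[check["_merge"] != "both", "state"].tolist()

print(sorted(unmatched))
```

If the list of unmatched states is empty for the real tables, the inner joins above lost nothing.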

Calculating gun fatalities per 100,000 residents per the chart in question:

In [None]:
combined["deaths_per_capita"] = combined["gun_deaths"] / (combined["population"] / 100000)
combined.sample(5)
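
As a quick sanity check of the formula, here is the same arithmetic on made-up numbers (the names `toy_deaths` and `toy_population` are hypothetical and not part of the notebook's data):

```python
# Made-up numbers, chosen only to make the arithmetic easy to follow.
toy_deaths = 1_000
toy_population = 5_000_000

# Dividing the population by 100,000 gives the number of 100,000-person groups (50 here),
# so the result is deaths per 100,000 residents.
toy_rate = toy_deaths / (toy_population / 100_000)
print(toy_rate)  # 20.0
```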

## Plot ownership/fatality correlation

In [None]:
REP = "#e94949"
DEM = "#2b90e7"
SPLIT = "#d4d4d4"
# set the dark theme before creating the figure so the figure picks up the facecolor
sns.set(rc={
    "axes.facecolor": "#1c1c20",
    "figure.facecolor": "#1c1c20"
})

fig = plt.figure(figsize=(15,9))

# calculate line of best fit
x = combined["gun_own_rate"]
y = combined["deaths_per_capita"]
m,b = np.polyfit(x,y,1)

# plot line of best fit
y = (m * x) + b
line = sns.lineplot(x=x ,y=y, linewidth=3, color="white", alpha=0.5)

# plot scatter
ax_rep = sns.scatterplot(
    x = "gun_own_rate",
    y = "deaths_per_capita",
    data = combined[combined["elec_winner"] == "R"],
    s = 300,
    color = REP,
    alpha = 1,
    label = "Donald Trump"
)

ax_dem = sns.scatterplot(
    x = "gun_own_rate",
    y = "deaths_per_capita",
    data = combined[combined["elec_winner"] == "D"],
    s = 300,
    color = DEM,
    alpha = 1,
    label = "Joe Biden"
)

ax_split = sns.scatterplot(
    x = "gun_own_rate",
    y = "deaths_per_capita",
    data = combined[combined["elec_winner"] == "S"],
    s = 300,
    color = SPLIT,
    alpha = 1,
    label = "Split Electors"
)

plt.title(
    "Gun Ownership vs. Gun Deaths (2020)", 
    fontsize = 35, 
    fontweight = "bold", 
    pad = 20, 
    color = "white",
    fontname = "AppleMyungjo"
)

# customize legend
legend = plt.legend(
    title = "2020 Presidential Victors", 
    fontsize=13,
    labelcolor = "white",
    frameon = False
)

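# change size of marker in the legend
# (newer Matplotlib versions rename `legendHandles` to `legend_handles`, so support both)
handles = legend.legend_handles if hasattr(legend, "legend_handles") else legend.legendHandles
for handle in handles:
    handle.set_sizes([90])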
    
plt.setp(legend.get_title(), color="white", fontsize=15)

# customize axes
ax_rep.grid(False)
ax_rep.spines[["top","right"]].set_visible(False)
xticks = ax_rep.get_xticks().tolist()
ax_rep.xaxis.set_major_locator(mtick.FixedLocator(xticks))
ax_rep.set_xticklabels(["", "20%", "30%", "40%", "50%", "60%", ""])

plt.xticks(color="white", fontsize=16)
plt.yticks(color="white", fontsize=16)
plt.ylabel("Gun Deaths per 100,000 Residents", color="white", fontsize=16)
plt.xlabel("Percent of Population Owning Guns", color="white", fontsize=16)

plt.savefig("final.png", bbox_inches="tight")

plt.show()

## Conclusion

While our chart looks different from the original, there still seems to be a positive correlation between gun ownership rates and gun fatality rates. Correlation does not prove causation, and many on the right argue that we should count only gun murders rather than all gun fatalities. Even so, the data presented in the video in question was likely sound.
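
To put a number on "positive correlation", one option is the Pearson correlation coefficient alongside the fitted slope. The sketch below uses made-up points, not the notebook's data, so its values say nothing about the real chart:

```python
import numpy as np

# Made-up points standing in for (ownership rate, deaths per 100,000).
ownership = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
deaths = np.array([5.0, 9.0, 12.0, 18.0, 21.0])

r = np.corrcoef(ownership, deaths)[0, 1]             # Pearson correlation coefficient
slope, intercept = np.polyfit(ownership, deaths, 1)  # same degree-1 fit as the trend line

print(f"r = {r:.3f}, r^2 = {r**2:.3f}, slope = {slope:.3f}")
```

In the notebook, the same call could presumably be made on `combined["gun_own_rate"]` and `combined["deaths_per_capita"]`.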

The split by party is interesting as well, with Republican-won states clearly among the top in both gun ownership and fatality rates. The opposite extreme is held exclusively by Democratic-won states, while many states from both parties lie towards the middle.

Overall, this was an interesting and fun project and I think the insights are valuable. My choice to neglect individual state labels came down to a matter of time, but if anyone wants to clone this notebook and label them, I would happily pull it into [the GitHub repository where this project is kept](https://github.com/MitchellHarrison/data-viz-for-social-media).

Thanks for reading! Feel free to stop by [my Twitch stream](https://twitch.tv/mitchsworkshop), where we will build more policy-focused data visualizations soon.