# PS 88, Fall 2021
# Lab 1: Cross-Border Spillover: U.S. Gun Laws and Violence in Mexico

The goal of this lab is to give you a taste of how we can use Jupyter notebooks to replicate some of the graphs from this week's lecture. Doing so requires using some techniques that you will learn later in the semester, so **do not worry if a lot of the code doesn't make sense yet!** We are going to "toss you in the deep end" here, but with a good flotation device.

More specifically, we are going to write almost all of the code for you in this lab, but at a few points you will have to fill in some gaps. As the class progresses you will gain more knowledge to write code from scratch. 

In particular, we'll reproduce Figure 4 from [this paper](https://omargarciaponce.com/wp-content/uploads/2013/07/cross_border_spillover.pdf). The goal of this figure is to see how the 2004 expiration of an assault weapons ban in three states that border Mexico (Texas, Arizona, and New Mexico), but **not** in California, affected gun-related crime in nearby Mexican Municipios.

To begin, we need to load in the data. In the cell below, we load in our data, a table containing 6 columns:  
1) `NCAseg18`: a variable equal to 1 for Municipios adjacent to Texas, Arizona, or New Mexico, and 0 for those adjacent to California.

2) `year`: the year that the crimes occurred  

3) `homicide`: the number of total homicides in the Municipios for that `year` and `NCAseg18` value  

4) `homdguns`: the number of gun-related homicides in the Municipios for that `year` and `NCAseg18` value  

5) `nongunhom`: the number of homicides unrelated to guns in the Municipios for that `year` and `NCAseg18` value  

6) `suicdguns`: the number of suicides by guns in the Municipios for that `year` and `NCAseg18` value  

Don't worry about the details here, though we do provide some information about what is going on in the comments.

In [None]:
# Importing libraries we will use
# Lots of labs will start with cells that look like this
# Make sure to run them first!
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Reading in the table and storing it as "df"
# The data are stored in a file on datahub, and the pd.read_csv function
# imports it
df = pd.read_csv('data/nca_deaths.csv')
# Displaying the table
df

For example, this table tells us that in the Municipios adjacent to California in 2005, there were 385 total homicides, 246 of which involved guns.
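
If you would like to see how a row like this can be looked up with code rather than by eye, here is a minimal sketch. It uses a small stand-in table called `toy` (a hypothetical name, not part of the lab) rather than the real `df`. Only the California 2005 row uses the numbers quoted above; the other row is made up, and the years are written as plain integers, which may not match how the real file stores them.

```python
import pandas as pd

# Toy stand-in for df. Only the (NCAseg18 == 0, year == 2005) row uses
# numbers quoted in the text; the other row is made up for illustration.
toy = pd.DataFrame({
    'NCAseg18': [0, 1],
    'year': [2005, 2005],
    'homicide': [385, 500],
    'homdguns': [246, 300],
})

# Keep only the rows where both conditions are true
row = toy[(toy['NCAseg18'] == 0) & (toy['year'] == 2005)]

# Pull the single values out of the one matching row
total = int(row['homicide'].iloc[0])
gun = int(row['homdguns'].iloc[0])
print(total, gun)
```

The same `table[condition]` pattern shows up in the plotting cells below, where `df[df['NCAseg18']==0]` keeps only the rows for Municipios adjacent to California.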

In the following cell, we create a *line plot* of the total homicide count for Municipios near either California (in which case `NCAseg18` is 0) or Texas, Arizona, or New Mexico (in which case `NCAseg18` is 1). Again, we will discuss graphs like this later in the semester, but the short version is that this will plot the trends in homicides for these two groups of Municipios separately so we can compare the trajectory before and after the ban expired.

We also run code to create a plot title, a legend, and a dashed vertical line at 2004, the year the ban expired. What happened to the total number of homicides in Municipios adjacent to California compared to those near the 3 other states?

In [None]:
# Plotting homicides per year for municipios adjacent to CA
# (x and y are passed by name, which recent versions of seaborn require)
sns.lineplot(x='year', y='homicide', data=df[df['NCAseg18']==0], label='Adjacent to CA')
# Plotting homicides per year for municipios not adjacent to CA
sns.lineplot(x='year', y='homicide', data=df[df['NCAseg18']==1], label='Adjacent to TX, AZ, or NM')
# Setting the y axis to go from 0 to 700
plt.ylim(0,700)
# Adding a title to the plot
plt.title("Homicides")
# Adding a legend to the plot
plt.legend()
# Making a vertical line when the ban expired
plt.axvline('2004-01-01', color='black', linestyle='--')

We are going to create some more graphs like this. To reduce the amount of code, we can create a function that sets the y-axis limits and adds a plot title, a legend, and a dashed line at the policy change.

In [None]:
# Create a function which formats the graph as above. 
# As we will cover later in the class, we are defining a function
# with one "argument", which is the title to put on top of the graph
def formatting(title):
    plt.ylim(0,700)
    plt.title(title)
    plt.legend()
    plt.axvline('2004-01-01', color='black', linestyle='--')

Let's reproduce the previous graph with this shortcut.

In [None]:
# Plotting homicides per year for municipios adjacent to CA
sns.lineplot(x='year', y='homicide', data=df[df['NCAseg18']==0], label='Adjacent to CA')
# Plotting homicides per year for municipios not adjacent to CA
sns.lineplot(x='year', y='homicide', data=df[df['NCAseg18']==1], label='Adjacent to TX, AZ, or NM')
# All the formatting in one fell swoop
formatting("Homicides")

A first thing to notice here is that there are always more homicides in the Municipios which border TX, AZ, and NM. This could reflect a higher population, or that these areas are more violent in general. More important for our purposes, this gap seems to have been shrinking a bit leading up to the expiration of the ban in 2004, after which it started increasing quite a bit. This provides some initial evidence that making it easier to access assault weapons increased the amount of violence in the Municipios adjacent to TX/AZ/NM.  
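
To make the idea of a "gap" between the two groups concrete, here is a rough sketch of one way it could be computed as a number for each year. The table `toy` and every value in it are made up for illustration and are not the lab data. The sketch assumes there is exactly one row per year and group, and it writes years as plain integers.

```python
import pandas as pd

# Made-up numbers, only to illustrate the calculation
toy = pd.DataFrame({
    'NCAseg18': [0, 0, 0, 1, 1, 1],
    'year': [2003, 2004, 2005, 2003, 2004, 2005],
    'homicide': [300, 310, 320, 400, 405, 460],
})

# Reshape so each year is a row and each group (0 or 1) is a column
wide = toy.pivot(index='year', columns='NCAseg18', values='homicide')

# Gap = homicides near TX/AZ/NM minus homicides near CA, year by year
gap = wide[1] - wide[0]
print(gap)
```

In this toy example the gap shrinks slightly and then grows, which is the kind of pattern the paragraph above describes in the real graph.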

There are lots of ways that we can modify this code to produce different graphs. Let's suppose we want to restrict to gun-related homicides (as the authors do in the paper). Even without knowing exactly how the `sns.lineplot` function works, remember that `'homicide'` refers to the column in our table which has the information about the number of homicides, and `'homdguns'` refers to the number of gun-related homicides. So, let's see what happens if we run the same code as above but replace both instances of `'homicide'` with `'homdguns'`. Note that we want to put single quotation marks around the variable names here; we'll learn more about why later in the semester.

We have to do this twice because we want to plot gun-related homicides for both groups of Municipios. We also change the title to "Gun-related Homicides" to reflect this.

In [None]:
sns.lineplot(x='year', y='homdguns', data=df[df['NCAseg18']==0], label='Adjacent to CA')
sns.lineplot(x='year', y='homdguns', data=df[df['NCAseg18']==1], label='Adjacent to TX, AZ, or NM')
formatting('Gun-related Homicides')

The modified code produces a graph which plots gun-related homicides in these two groups of Municipios.

This general idea of "pattern matching" -- or, taking some code which works to do one thing and modifying it to do something else -- will be a big part of what we ask you to do in labs. It is also a very common strategy to accomplish real data science tasks outside of class (though typically you will have much less guidance!)
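
One way to see why pattern matching works is to notice that only the column name changed between the two versions of the code. The sketch below turns that changing piece into a function argument. The helper `yearly_counts` and the table `toy` are hypothetical names, not part of the lab, and all of the numbers are made up.

```python
import pandas as pd

# Toy stand-in for df; every number here is made up for illustration
toy = pd.DataFrame({
    'NCAseg18': [0, 0, 1, 1],
    'year': [2003, 2004, 2003, 2004],
    'homicide': [300, 310, 400, 405],
    'homdguns': [180, 190, 250, 260],
})

def yearly_counts(table, column, group):
    """Hypothetical helper: values of `column` by year for one NCAseg18 group."""
    subset = table[table['NCAseg18'] == group]
    return subset.set_index('year')[column]

# Same code, different column name: this is the "pattern" being matched
all_ca = yearly_counts(toy, 'homicide', 0)
gun_ca = yearly_counts(toy, 'homdguns', 0)
print(gun_ca)
```

You do not need to write functions like this for the lab; swapping the column name by hand accomplishes the same thing.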

Back to the political question here, let's interpret the difference between these graphs. Here and in later labs, we will ask questions you need to answer for credit in **bold text**, followed by a markdown cell where you put your answer.

**Question 1. How does this graph compare to the previous one (which included all homicides)?**

*Your answer to Question 1 here*

Now let's see if you can do a different code modification to produce a new graph.

For questions where we ask you to write code, we will always provide a code cell with a comment indicating it is where the answer should go. Sometimes, as in the following question, we will write some of the code for you, and often use "..." to indicate where you need to change things to provide an answer.

**Question 2. Use the same idea of pattern-matching to plot homicides that aren't gun related. [Hint, the column that has this information is titled `'nongunhom'`]**

In [None]:
# For question 2, modify the code here to plot non-gun related homicides
sns.lineplot(x='year', y=..., data=df[df['NCAseg18']==0], label='Adjacent to CA')
sns.lineplot(x='year', y=..., data=df[df['NCAseg18']==1], label='Adjacent to TX, AZ, or NM')
formatting('Non-gun Homicides')

Nice job!

**Question 3. What does this graph tell us about how non-gun homicides changed in these two groups of Municipios after the ban expired? What might this tell us about the effect of the ban expiring?**

*Answer to Question 3 here*

There are many other things we could do to produce different graphs even with this small data set. Here are some possibilities (some are more challenging than others!):
- Change the label of the orange line to "Not adjacent to CA"
- Plot the number of suicides by gun in these two groups of Municipios
- Plot the number of non-gun homicides and gun homicides in the Municipios adjacent to CA on the same graph (without the other Municipios)
- Move the vertical line from 2004 to 2005
- Plot the *difference* in gun-related homicides between the Municipios adjacent to CA and those not adjacent to CA across these years.

**Question 4 (OPTIONAL). See if you can do one or more of these in the code cells below**

In [None]:
# Code for Question 4 here

*Words for Question 4 here*

That's all! To submit, go to File -> Download As -> PDF via LaTeX.

After that, go to bCourses -> PS 88 and scroll down until you find the "Gradescope" link on the left.

This will take you to Gradescope, where you will see a column for Assignments. Clicking on that will show you the Lab 1 assignment, which has already been created. Click on it and use the tab at the bottom of the page to "Upload Submission" to submit your PDF file. When you submit your assignment, Gradescope will ask you to mark where each answer is, according to an outline of the problem set. **Remember to assign pages to questions accurately before submitting.**