Kate Morics
# Project: Investigating the Relationship Between State Demographics and Criminal Background Checks to Determine Eligibility for Firearm Purchases


## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

The data used in this analysis comes from two datasets. The first, from the FBI's National Instant Criminal Background Check System (NICS), contains data representing the background checks performed on gun purchasers broken down by state, date, type of gun, and other variables. The dataset used in this analysis includes checks performed from November 1998 to November 2017. 

The second dataset is US Census data from 2016, which provides state-level population and demographic data. In this project, I use these two datasets to investigate the relationship between a state's demographics and the number of criminal background checks run in that state. I use the Python package pandas to gather, clean, and analyze the data, and Matplotlib to create visualizations that support the analysis. I then investigate two specific research questions to focus my analysis of these large datasets.

In [63]:
import pandas as pd
%matplotlib inline

<a id='wrangling'></a>
## Data Wrangling

In the following section, I load both datasets, which are stored as .csv files. I load the NICS data as `df_gun` and the US Census data as `df_census`. I then use the `.head()` method to display the first few rows of each dataset for a quick snapshot of the data, and the `.info()` method to get a clearer picture of its structure. This lets us identify any irregularities that need fixing before we move on to analyzing the data.

### General Properties

In [64]:
#view structure of gun data
df_gun = pd.read_csv('gun_data.csv')

df_gun.head()

Unnamed: 0,month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,...,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals
0,2017-09,Alabama,16717.0,0.0,5734.0,6320.0,221.0,317,0.0,15.0,...,0.0,0.0,0.0,9.0,16.0,3.0,0.0,0.0,3.0,32019
1,2017-09,Alaska,209.0,2.0,2320.0,2930.0,219.0,160,0.0,5.0,...,0.0,0.0,0.0,17.0,24.0,1.0,0.0,0.0,0.0,6303
2,2017-09,Arizona,5069.0,382.0,11063.0,7946.0,920.0,631,0.0,13.0,...,0.0,0.0,0.0,38.0,12.0,2.0,0.0,0.0,0.0,28394
3,2017-09,Arkansas,2935.0,632.0,4347.0,6063.0,165.0,366,51.0,12.0,...,0.0,0.0,0.0,13.0,23.0,0.0,0.0,2.0,1.0,17747
4,2017-09,California,57839.0,0.0,37165.0,24581.0,2984.0,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,123506


In [65]:
#view info to check for missing data in gun data
df_gun.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12485 entries, 0 to 12484
Data columns (total 27 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   month                      12485 non-null  object 
 1   state                      12485 non-null  object 
 2   permit                     12461 non-null  float64
 3   permit_recheck             1100 non-null   float64
 4   handgun                    12465 non-null  float64
 5   long_gun                   12466 non-null  float64
 6   other                      5500 non-null   float64
 7   multiple                   12485 non-null  int64  
 8   admin                      12462 non-null  float64
 9   prepawn_handgun            10542 non-null  float64
 10  prepawn_long_gun           10540 non-null  float64
 11  prepawn_other              5115 non-null   float64
 12  redemption_handgun         10545 non-null  float64
 13  redemption_long_gun        10544 non-null  flo

In [66]:
#view datatypes of gun data
df_gun.dtypes

month                         object
state                         object
permit                       float64
permit_recheck               float64
handgun                      float64
long_gun                     float64
other                        float64
multiple                       int64
admin                        float64
prepawn_handgun              float64
prepawn_long_gun             float64
prepawn_other                float64
redemption_handgun           float64
redemption_long_gun          float64
redemption_other             float64
returned_handgun             float64
returned_long_gun            float64
returned_other               float64
rentals_handgun              float64
rentals_long_gun             float64
private_sale_handgun         float64
private_sale_long_gun        float64
private_sale_other           float64
return_to_seller_handgun     float64
return_to_seller_long_gun    float64
return_to_seller_other       float64
totals                         int64
dtype: object

In [67]:
#Check for duplicate rows
df_gun.duplicated().sum()

0

In [68]:
#view structure of census data
df_census = pd.read_csv('census_data.csv')
df_census.head()

Unnamed: 0,Fact,Fact Note,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,...,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
0,"Population estimates, July 1, 2016, (V2016)",,4863300,741894,6931071,2988248,39250017,5540545,3576452,952065,...,865454.0,6651194.0,27862596,3051217,624594,8411808,7288000,1831102,5778708,585501
1,"Population estimates base, April 1, 2010, (V2...",,4780131,710249,6392301,2916025,37254522,5029324,3574114,897936,...,814195.0,6346298.0,25146100,2763888,625741,8001041,6724545,1853011,5687289,563767
2,"Population, percent change - April 1, 2010 (es...",,1.70%,4.50%,8.40%,2.50%,5.40%,10.20%,0.10%,6.00%,...,0.063,0.048,10.80%,10.40%,-0.20%,5.10%,8.40%,-1.20%,1.60%,3.90%
3,"Population, Census, April 1, 2010",,4779736,710231,6392017,2915918,37253956,5029196,3574097,897934,...,814180.0,6346105.0,25145561,2763885,625741,8001024,6724540,1852994,5686986,563626
4,"Persons under 5 years, percent, July 1, 2016, ...",,6.00%,7.30%,6.30%,6.40%,6.30%,6.10%,5.20%,5.80%,...,0.071,0.061,7.20%,8.30%,4.90%,6.10%,6.20%,5.50%,5.80%,6.50%


In [69]:
#transpose census data so the states data is organized in rows as in the gun data.
df_census = df_census.transpose()
df_census.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,55,56,57,58,59,60,61,62,63,64
Fact,"Population estimates, July 1, 2016, (V2016)","Population estimates base, April 1, 2010, (V2...","Population, percent change - April 1, 2010 (es...","Population, Census, April 1, 2010","Persons under 5 years, percent, July 1, 2016, ...","Persons under 5 years, percent, April 1, 2010","Persons under 18 years, percent, July 1, 2016,...","Persons under 18 years, percent, April 1, 2010","Persons 65 years and over, percent, July 1, 2...","Persons 65 years and over, percent, April 1, 2010",...,"All firms, 2012","Men-owned firms, 2012","Women-owned firms, 2012","Minority-owned firms, 2012","Nonminority-owned firms, 2012","Veteran-owned firms, 2012","Nonveteran-owned firms, 2012","Population per square mile, 2010","Land area in square miles, 2010",FIPS Code
Fact Note,,,,,,,,,,,...,,,,,,,,,,
Alabama,4863300,4780131,1.70%,4779736,6.00%,6.40%,22.60%,23.70%,16.10%,13.80%,...,374153,203604,137630,92219,272651,41943,316984,94.4,50645.33,"""01"""
Alaska,741894,710249,4.50%,710231,7.30%,7.60%,25.20%,26.40%,10.40%,7.70%,...,68032,35402,22141,13688,51147,7953,56091,1.2,570640.95,"""02"""
Arizona,6931071,6392301,8.40%,6392017,6.30%,7.10%,23.50%,25.50%,16.90%,13.80%,...,499926,245243,182425,135313,344981,46780,427582,56.3,113594.08,"""04"""


In [70]:
#find datatypes in newly transposed census dataframe
df_census.dtypes

0     object
1     object
2     object
3     object
4     object
       ...  
60    object
61    object
62    object
63    object
64    object
Length: 65, dtype: object

#### Gun Data Summary
The gun data seems to be in fairly good shape. The column names follow Python convention, using all lowercase letters and no spaces, and there are no duplicate rows. However, there are quite a few null values in the dataframe. I will convert these to zeroes, since missing values are not unexpected in this type of data.

#### Census Data Summary
The census data contains a lot of information, not all of which is necessary for our analysis. We first need to trim the data so that only the columns containing relevant information remain. We then need to rename the columns: the "Fact" row would work better as a header row, and the names should conform to Python convention, using all lowercase letters and underscores instead of spaces.

Finally, because we will be comparing numerical values to perform our analysis, we will need to convert the `str` values to datatype `float`.

### Cleaning the Gun Data 

#### Replacing Null Values with Zeroes

In [84]:
#Use fillna function to replace null values with zeroes
df_gun = df_gun.fillna(0)

In [85]:
#Confirm changes
df_gun.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 24970 entries, 0 to 12484
Data columns (total 25 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   permit                     24970 non-null  float64
 1   permit_recheck             24970 non-null  float64
 2   handgun                    24970 non-null  float64
 3   long_gun                   24970 non-null  float64
 4   other                      24970 non-null  float64
 5   multiple                   24970 non-null  int64  
 6   admin                      24970 non-null  float64
 7   prepawn_handgun            24970 non-null  float64
 8   prepawn_long_gun           24970 non-null  float64
 9   prepawn_other              24970 non-null  float64
 10  redemption_handgun         24970 non-null  float64
 11  redemption_long_gun        24970 non-null  float64
 12  redemption_other           24970 non-null  float64
 13  returned_handgun           24970 non-null  flo

#### Fixing Datatypes

In [93]:
#Create new dataframe to get numerical columns

KeyError: "['month' 'state'] not found in axis"

In [87]:
#Convert numerical dataframe to int

In [88]:
#Drop old float values from original gun dataset


KeyError: "['month' 'state'] not found in axis"

In [95]:
#Add new int columns to gun dataset


In [94]:
#Check to confirm datatype changes
df_gun.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 37455 entries, 0 to 12484
Data columns (total 25 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   permit                     37455 non-null  float64
 1   permit_recheck             37455 non-null  float64
 2   handgun                    37455 non-null  float64
 3   long_gun                   37455 non-null  float64
 4   other                      37455 non-null  float64
 5   multiple                   37455 non-null  int64  
 6   admin                      37455 non-null  float64
 7   prepawn_handgun            37455 non-null  float64
 8   prepawn_long_gun           37455 non-null  float64
 9   prepawn_other              37455 non-null  float64
 10  redemption_handgun         37455 non-null  float64
 11  redemption_long_gun        37455 non-null  float64
 12  redemption_other           37455 non-null  float64
 13  returned_handgun           37455 non-null  flo
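
The code for the datatype cells above is not shown, so the following is only a minimal sketch of one way to cast the float count columns to integers, not a reconstruction of the original cells. It uses a small stand-in frame, `df_gun_demo` (a hypothetical name), in place of `df_gun`. Because it converts the columns in place with `astype`, it never drops `month` or `state`; the `KeyError` above looks like what a drop call raises when those columns are already gone, for example when a cell is run twice, though that is a guess.

```python
import pandas as pd

# Small stand-in for df_gun; the real frame is loaded from gun_data.csv.
df_gun_demo = pd.DataFrame({
    "month": ["2017-09", "2017-09"],
    "state": ["Alabama", "Alaska"],
    "permit": [16717.0, None],
    "handgun": [5734.0, 2320.0],
    "multiple": [317, 160],
})

# Fill missing counts with zero first: NaN cannot be cast to an integer.
df_gun_demo = df_gun_demo.fillna(0)

# Pick out only the float columns, leaving 'month' and 'state' untouched.
float_cols = df_gun_demo.select_dtypes(include="float64").columns

# Cast those columns to int64 without dropping or re-adding anything.
df_gun_demo = df_gun_demo.astype({col: "int64" for col in float_cols})

print(df_gun_demo.dtypes)
```

Since no rows or columns are removed, this version can be re-run safely and the row count stays the same.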

### Cleaning the Census Data

#### Trimming Irrelevant Columns

In [78]:
#Find columns that contain the % symbol


#### Fixing Column Headers
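
A minimal sketch of the header fix described in the Census Data Summary, assuming the transposed layout shown earlier (integer column labels, with the descriptions in the "Fact" row). The helper `to_snake_case` and the frame `df_census_demo` are hypothetical names introduced for illustration.

```python
import re
import pandas as pd

# Stand-in for the transposed census frame: integer column labels,
# with the fact descriptions held in the 'Fact' row.
df_census_demo = pd.DataFrame(
    [
        ["Population, Census, April 1, 2010", "Land area in square miles, 2010"],
        [None, None],
        ["4779736", "50645.33"],
        ["710231", "570640.95"],
    ],
    index=["Fact", "Fact Note", "Alabama", "Alaska"],
)


def to_snake_case(label):
    """Lowercase a label and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", label.lower()).strip("_")


# Promote the 'Fact' row to column names, then drop the two non-state rows.
df_census_demo.columns = [to_snake_case(s) for s in df_census_demo.loc["Fact"]]
df_census_demo = df_census_demo.drop(index=["Fact", "Fact Note"])

print(df_census_demo.columns.tolist())
```

The generated names are long; shortening them by hand with `rename` afterwards may be more readable for the handful of columns kept in the analysis.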

#### Fixing Datatypes
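
A hedged sketch of the `str` to `float` conversion mentioned in the Census Data Summary. The preview above shows percentages stored in two forms (`1.70%` for some states, `0.063` for others), so this version divides `%` strings by 100 to put both on the same scale; that reading of the data is an assumption. The helper `to_float`, the frame `df_census_demo`, and its column names are hypothetical.

```python
import pandas as pd


def to_float(value):
    """Convert a string such as '4,863,300', '1.70%' or '0.063' to float.

    Percent strings are divided by 100 so they share a scale with values
    already stored as fractions (an assumption about the raw data).
    """
    text = str(value).strip().replace(",", "")
    if text.endswith("%"):
        return float(text[:-1]) / 100
    return float(text)


# Hypothetical stand-in with cleaned column names, one row per state.
df_census_demo = pd.DataFrame(
    {
        "population_2016": ["4,863,300", "865454.0"],
        "pct_change_2010_2016": ["1.70%", "0.063"],
    },
    index=["Alabama", "South Dakota"],
)

# Apply the converter to every cell, column by column.
df_census_demo = df_census_demo.apply(lambda col: col.map(to_float))

print(df_census_demo.dtypes)
```

Any entry that is not numeric after cleaning raises a `ValueError` here, which makes unexpected values visible rather than silently dropping them.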

<a id='eda'></a>
## Exploratory Data Analysis


### Research Question 1 (Replace this header name!)

In [79]:
# Use this, and more code cells, to explore your data. Don't forget to add
#   Markdown cells to document your observations and findings.


### Research Question 2  (Replace this header name!)

In [80]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


<a id='conclusions'></a>
## Conclusions

