# Project: Investigate the FBI Gun Data

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
    <ul>
    <li><a href="#gun_explore">Gun Data Exploration</a></li>
    <li><a href="#census_explore">Census Data Exploration</a></li>
    </ul>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

This project considers two datasets: the _FBI's National Instant Criminal Background Check System_ and the _U.S. Census Data_.

#### The FBI's National Instant Criminal Background Check System

This database ([here](https://www.google.com/url?q=https://d17h27t6h515a5.cloudfront.net/topher/2017/November/5a0a4db8_gun-data/gun-data.xlsx&sa=D&ust=1532469042127000)) was provided as an .xlsx file and has been converted from Excel to .csv. The starting data is therefore found in the file **gun_data.csv**.
It covers November 1998 to September 2017, collected monthly (data up to 2020 are available, but the census data used here would not cover that period).

Rows: one per US State per month

Columns:
 - Every time a gun shop is asked to carry out a transaction, it needs to fill in a form ([this](https://www.fbi.gov/services/cjis/nics)). This may result in
   - a permit check or recheck (if the buyer already owns a permit)
   - or in a background check. The background check is subdivided by
     - the type of arm: handgun, long gun, other, multiple or admin
     - the purpose: new buy (not written in the column title), pre-pawn, redemption, returned/disposition, rentals, private sale, return to seller 
 - The first two columns are the month and the State the data refer to
 - The last column is the total number of transactions/checks initiated. __They have not necessarily been concluded__.

NB: many transactions are concluded privately or illegally, so there is no record of them.
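
The column naming described above (type of arm, optionally prefixed by the purpose) can be used to group columns by purpose. The following is only a minimal sketch: the two-row `sample` frame contains a handful of illustrative columns, and the helper `purpose_of` and the label `new_buy` are names made up for this example, not part of the dataset.

```python
import pandas as pd

# Illustrative two-row sample with only a few of the dataset's columns
sample = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "handgun": [5734, 2320],
    "long_gun": [6320, 2930],
    "prepawn_handgun": [15, 5],
    "private_sale_handgun": [9, 17],
    "private_sale_long_gun": [16, 24],
})

# Purpose prefixes as they appear in the column titles
PURPOSES = ("prepawn", "redemption", "returned", "rentals",
            "private_sale", "return_to_seller")
# Columns that are not background checks split by type of arm
OTHER_COLUMNS = {"month", "state", "permit", "permit_recheck", "totals"}


def purpose_of(column):
    """Return the purpose encoded in a column name ('new_buy' if there is no prefix)."""
    for purpose in PURPOSES:
        if column.startswith(purpose + "_"):
            return purpose
    return "new_buy"  # hypothetical label: new buys have no prefix in the title


# Collect the column names belonging to each purpose
groups = {}
for column in sample.columns:
    if column not in OTHER_COLUMNS:
        groups.setdefault(purpose_of(column), []).append(column)

# Sum the checks of each purpose, row by row
by_purpose = pd.DataFrame(
    {purpose: sample[columns].sum(axis=1) for purpose, columns in groups.items()}
)
by_purpose.insert(0, "state", sample["state"])
print(by_purpose)
```

The same loop should work on the full dataframe once it is loaded, as long as the column titles follow the prefixes listed in `PURPOSES`.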
 

#### The U.S. Census Data

This database ([here](https://www.google.com/url?q=https://d17h27t6h515a5.cloudfront.net/topher/2017/November/5a0a554c_u.s.-census-data/u.s.-census-data.csv&sa=D&ust=1532469042128000)), which was already in .csv format, was renamed to **us_census_data.csv** to be more Python friendly.
 - This dataset contains information from the 2010 census and estimates of its changes up to 2016 (some explanation [here](https://www.census.gov/quickfacts/fact/note/US/PST120218))
 - Each column represents a US State
 - Each row is a "fact" (e.g.: "Population, Census, April 1, 2010", "Population, percent change - April 1, 2010 (estimates base) to July 1, 2016")
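
Since the gun data has States in rows while the census data has them in columns, one possible way to bring the two into the same orientation is to transpose the census table. This is a sketch on a tiny hand-made frame, not the actual cleaning step of this project; the values are kept as strings on the assumption that the real file mixes counts and percentages in the same column.

```python
import pandas as pd

# Tiny hand-made excerpt with the same layout as the census file:
# one row per "fact", one column per State
census_sample = pd.DataFrame({
    "Fact": ["Population estimates, July 1, 2016, (V2016)",
             "Population, Census, April 1, 2010"],
    "Fact Note": [None, None],
    "Alabama": ["4863300", "4779736"],
    "Alaska": ["741894", "710231"],
})

# Drop the note column, use the fact names as index, then transpose
# so that each State becomes a row and each fact a column
by_state = census_sample.drop(columns="Fact Note").set_index("Fact").T
by_state.index.name = "state"

print(by_state.loc["Alaska", "Population, Census, April 1, 2010"])
```

After the transposition, a State name can be used as a key on both tables.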

In [28]:
# import statements

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


<a id='wrangling'></a>
## Data Wrangling


### General Properties

<a id='gun_explore'></a>
#### Gun data exploration

In [29]:
# Load data and print out a few lines

df_gun = pd.read_csv('gun_data.csv')
df_gun.head()

Unnamed: 0,month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,...,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals
0,2017-09,Alabama,16717.0,0.0,5734.0,6320.0,221.0,317,0.0,15.0,...,0.0,0.0,0.0,9.0,16.0,3.0,0.0,0.0,3.0,32019
1,2017-09,Alaska,209.0,2.0,2320.0,2930.0,219.0,160,0.0,5.0,...,0.0,0.0,0.0,17.0,24.0,1.0,0.0,0.0,0.0,6303
2,2017-09,Arizona,5069.0,382.0,11063.0,7946.0,920.0,631,0.0,13.0,...,0.0,0.0,0.0,38.0,12.0,2.0,0.0,0.0,0.0,28394
3,2017-09,Arkansas,2935.0,632.0,4347.0,6063.0,165.0,366,51.0,12.0,...,0.0,0.0,0.0,13.0,23.0,0.0,0.0,2.0,1.0,17747
4,2017-09,California,57839.0,0.0,37165.0,24581.0,2984.0,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,123506


In [30]:
# a little general exploration
df_gun.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12485 entries, 0 to 12484
Data columns (total 27 columns):
month                        12485 non-null object
state                        12485 non-null object
permit                       12461 non-null float64
permit_recheck               1100 non-null float64
handgun                      12465 non-null float64
long_gun                     12466 non-null float64
other                        5500 non-null float64
multiple                     12485 non-null int64
admin                        12462 non-null float64
prepawn_handgun              10542 non-null float64
prepawn_long_gun             10540 non-null float64
prepawn_other                5115 non-null float64
redemption_handgun           10545 non-null float64
redemption_long_gun          10544 non-null float64
redemption_other             5115 non-null float64
returned_handgun             2200 non-null float64
returned_long_gun            2145 non-null float64
returned_other   

##### General
There are 12485 entries and 27 columns.
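
As a quick plausibility check, the number of entries can be compared with the period stated in the Introduction (November 1998 to September 2017, monthly). The sketch below assumes that every State appears exactly once in every month, which is not verified here.

```python
# Period covered by the gun data, as stated in the Introduction
first_year, first_month = 1998, 11   # November 1998
last_year, last_month = 2017, 9      # September 2017

# Number of months in the period, both ends included
n_months = (last_year - first_year) * 12 + (last_month - first_month) + 1

# Number of entries reported by df_gun.info()
n_entries = 12485

rows_per_month, remainder = divmod(n_entries, n_months)
print(n_months, rows_per_month, remainder)  # prints: 227 55 0
```

The division leaves no remainder, which is consistent with 55 distinct values in _state_ reported for each of the 227 months; `df_gun['state'].nunique()` could be used to confirm this on the loaded data.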

Zeros (0.0) are regular values and are not counted as nulls; only missing entries are.

4 columns don't contain nulls: _month_, _state_, _multiple_ and _totals_
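
A possible way to list the columns without nulls directly, instead of reading them off the `info()` output, is to count the nulls per column. The frame below is a made-up three-row sample (the third row in particular is invented), used only to keep the sketch self-contained.

```python
import numpy as np
import pandas as pd

# Made-up sample: one missing value in permit_recheck, none elsewhere
sample = pd.DataFrame({
    "month": ["2017-09", "2017-09", "1998-11"],
    "state": ["Alabama", "Alaska", "Alabama"],
    "permit_recheck": [0.0, 2.0, np.nan],
    "multiple": [317, 160, 8],
    "totals": [32019, 6303, 1062],
})

# Count missing values per column (a 0.0 is a value, NaN is a null)
null_counts = sample.isnull().sum()

# Keep only the names of the columns with no missing values
complete_columns = null_counts[null_counts == 0].index.tolist()
print(complete_columns)
```

Applied to `df_gun`, the same two lines would give the list of complete columns for the real data.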

##### Datatypes
_Month_ and _state_ are strings.
All other columns, except _multiple_ and _totals_, are floats.
_Multiple_ and _totals_ are integers.

- _State_ is OK as a string,
- _month_ could be converted to datetime,
- all others could be integers.

<a id='census_explore'></a>
#### Census Data exploration

In [31]:
# import census data
df_census = pd.read_csv('us_census_data.csv')
df_census.head()

Unnamed: 0,Fact,Fact Note,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,...,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
0,"Population estimates, July 1, 2016, (V2016)",,4863300,741894,6931071,2988248,39250017,5540545,3576452,952065,...,865454.0,6651194.0,27862596,3051217,624594,8411808,7288000,1831102,5778708,585501
1,"Population estimates base, April 1, 2010, (V2...",,4780131,710249,6392301,2916025,37254522,5029324,3574114,897936,...,814195.0,6346298.0,25146100,2763888,625741,8001041,6724545,1853011,5687289,563767
2,"Population, percent change - April 1, 2010 (es...",,1.70%,4.50%,8.40%,2.50%,5.40%,10.20%,0.10%,6.00%,...,0.063,0.048,10.80%,10.40%,-0.20%,5.10%,8.40%,-1.20%,1.60%,3.90%
3,"Population, Census, April 1, 2010",,4779736,710231,6392017,2915918,37253956,5029196,3574097,897934,...,814180.0,6346105.0,25145561,2763885,625741,8001024,6724540,1852994,5686986,563626
4,"Persons under 5 years, percent, July 1, 2016, ...",,6.00%,7.30%,6.30%,6.40%,6.30%,6.10%,5.20%,5.80%,...,0.071,0.061,7.20%,8.30%,4.90%,6.10%,6.20%,5.50%,5.80%,6.50%
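
In the output above, the percentage facts are written in two ways: as strings with a percent sign (e.g. `1.70%` for Alabama) and as plain decimals (e.g. `0.063` for South Dakota). A small helper could bring both to the same scale. This is a sketch that assumes the values without a percent sign are already fractions; the name `to_fraction` is made up for this example.

```python
def to_fraction(value):
    """Convert '1.70%' or '0.063' to a float fraction (0.017 or 0.063).

    Assumption: a value without a percent sign is already a fraction.
    """
    text = str(value).strip()
    if text.endswith("%"):
        # Strip the percent sign and rescale from percent to fraction
        return float(text[:-1]) / 100
    return float(text)


for raw in ["1.70%", "0.063", "-0.20%"]:
    print(raw, "->", to_fraction(raw))
```

The helper only makes sense for the percentage rows; count rows such as the population estimates would need a separate conversion.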


In [53]:
# show one "Fact" label in full (head() truncates the longer ones)
facts = df_census['Fact']
facts[3]

'Population, Census, April 1, 2010'

In [25]:
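# change month from string to datetime (this adds a day, which is not a problem)
df_gun['month'] = pd.to_datetime(df_gun['month'])

# change the other float columns to int
count_cols = df_gun.columns[2:26]  # permit ... return_to_seller_other
# replace nulls first, otherwise astype won't work; assigning by column label,
# because assigning through iloc can leave the columns as float in recent pandas
df_gun[count_cols] = df_gun[count_cols].fillna(0).astype('int')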


In [27]:
df_gun.head()

Unnamed: 0,month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,...,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals
0,2017-09-01,Alabama,16717,0,5734,6320,221,317,0,15,...,0,0,0,9,16,3,0,0,3,32019
1,2017-09-01,Alaska,209,2,2320,2930,219,160,0,5,...,0,0,0,17,24,1,0,0,0,6303
2,2017-09-01,Arizona,5069,382,11063,7946,920,631,0,13,...,0,0,0,38,12,2,0,0,0,28394
3,2017-09-01,Arkansas,2935,632,4347,6063,165,366,51,12,...,0,0,0,13,23,0,0,2,1,17747
4,2017-09-01,California,57839,0,37165,24581,2984,0,0,0,...,0,0,0,0,0,0,0,0,0,123506


### Data Cleaning (Replace this with more specific notes!)

In [None]:
# After discussing the structure of the data and any problems that need to be
#   cleaned, perform those cleaning steps in the second part of this section.


<a id='eda'></a>
## Exploratory Data Analysis


### Research Question 1 (Replace this header name!)

In [None]:
# Use this, and more code cells, to explore your data. Don't forget to add
#   Markdown cells to document your observations and findings.


### Research Question 2  (Replace this header name!)

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


<a id='conclusions'></a>
## Conclusions
