# Project: Investigating FBI Gun Data and Census Data

### By: Katia Lopes-Gilbert

_July 2023_

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

For this project, I have decided to analyze and compare FBI background check data for gun purchases with Census data. I will review demographic trends in individual states, examine general trends over time in FBI background checks, and compare the datasets to see whether certain demographic characteristics are associated with higher or lower numbers of background checks.

### High Level Overview of the Datasets
**FBI Data** 
The FBI's National Instant Criminal Background Check System (NICS) is used to determine whether a prospective buyer is eligible to buy firearms.

> When a person tries to buy a firearm, the seller, known as a Federal Firearms Licensee (FFL), contacts NICS electronically or by phone. The prospective buyer fills out the ATF form, and the FFL relays that information to the NICS. The NICS staff performs a background check on the buyer. That background check verifies the buyer does not have a criminal record or isn't otherwise ineligible to purchase or own a firearm. Since launching in 1998, more than 300 million checks have been done, leading to more than 1.5 million denials.
>

You can read more about NICS firearm checks [here](https://www.fbi.gov/how-we-can-help-you/more-fbi-services-and-information/nics).

**Census Data**
The data has been supplemented with state level data from the Census American Community Survey (ACS). ACS data generally contains population statistics such as demographic and socioeconomic data.  

> The ACS helps local officials, community leaders, and businesses understand the changes taking place in their communities. It is the premier source for detailed population and housing information about our nation.
>

You can read more about Census ACS data [here](https://www.census.gov/programs-surveys/acs). 

### Notes about the NICS Data
The original PDF of the NICS firearm checks data contains important notes and caveats. It's a good idea to read those before diving into the data. Among the caveats is this important one:

> These statistics represent the number of firearm background checks initiated through the NICS. They do not represent the number of firearms sold. Based on varying state laws and purchase scenarios, a one-to-one correlation cannot be made between a firearm background check and a firearm sale.
>

### Downloading the Data

FBI NICS Background Check Dataset: [**click here to download the FBI data.**](https://www.google.com/url?q=https://d17h27t6h515a5.cloudfront.net/topher/2017/November/5a0a4db8_gun-data/gun-data.xlsx&sa=D&source=editors&ust=1690055166882138&usg=AOvVaw2NXkKn2SPaQBva5xby3KAN)

Census ACS Dataset: [**click here to download the Census data.**](https://www.google.com/url?q=https://d17h27t6h515a5.cloudfront.net/topher/2017/November/5a0a554c_u.s.-census-data/u.s.-census-data.csv&sa=D&source=editors&ust=1690055166882803&usg=AOvVaw3oW2CN2wlOxBkVvLM8L5M5)

✅ Great, now that you have downloaded the datasets, let's get started! 🚀

## Questions We Will Explore 🔎

I have chosen to explore each dataset individually and to conduct some comparative analysis to look for interesting trends and correlations.

**📓 Census Data Analysis Questions:**
1. Which states had the most population growth from 2010 to 2016?
2. What is the distribution of race or ethnicity for each state?
3. Are poverty levels correlated with specific demographic factors like race, ethnicity, foreign-born status, and educational level?

**📓 FBI Background Check Data Analysis Questions:**
1. What are the overall trends over time in firearm background checks for handguns and long guns?
2. What time of the year is the most common for background checks?
3. Do certain states have higher numbers of handgun checks? What about long gun checks?

**📓 Comparing the Datasets:**
1. Which states have the highest firearm checks per capita?
2. Are certain demographic characteristics correlated with more background checks?

I will answer these questions by providing visualizations and comments. 📊

<a id='wrangling'></a>
## Data Wrangling

I conducted my initial analysis in a separate Jupyter Notebook file called [Initial Analysis](https://github.com/dezertdweller/fbi-gun-background-checks/blob/main/initial-analysis.ipynb). Please refer to this document for a more in-depth review of the operations I performed to understand each dataset and what errors would need to be corrected before further analysis.

I decided to write Python scripts for the cleaning and transformation of both datasets. I will provide summaries of the operations I performed below. However, please refer to the respective documents for each dataset to see how I cleaned and transformed both. 

----

### 📋 FBI Data Overview
I was most interested in how to use the FBI Firearm Background Check data to estimate the number of firearms purchased in each state. While analyzing the data, I read several papers from NICS and independent researchers to better understand what each column signifies in this dataset and which columns would be best for further analysis.

The FBI Firearm Background Check data contains 12,485 records and 27 columns. Below is a description of what the first 7 columns represent:
1. _Permit_: any permit check
2. _Permit Recheck_: any permit recheck
3. _Handgun_: permit check for (a) any firearm which has a short stock and is designed to be held and fired by the use of a single hand; and (b) any combination of parts from which a firearm described in paragraph (a) can be assembled.
4. _Long Gun_: permit check for a weapon designed or redesigned, made or remade, and intended to be fired from the shoulder, and designed or redesigned and made or remade to use the energy of the explosive in (a) a fixed metallic cartridge to fire a single projectile through a rifled bore for each single pull of the trigger; or (b) a fixed shotgun shell to fire through a smooth bore either a number of ball shot or a single projectile for each single pull of the trigger.
5. _Other_: permit check for frames, receivers, and other firearms that are neither handguns nor long guns (rifles or shotguns), such as firearms having a pistol grip that expel a shotgun shell, or National Firearms Act firearms, including silencers.
6. _Multiple_: permit check for when an individual is interested in more than one type of firearm (e.g., a handgun and a long gun)
7. _Admin_: the administrative checks that are for other authorized uses of the NICS

Additional columns include handgun permits, long gun permits, and other permits for six specific categories: pre-pawn, redemption, returned/disposition, rentals, private sale, and return to seller-private sale. Each category is defined below:
1. _Pre-Pawn_: background checks requested by an officially-licensed FFL on prospective firearm transferees seeking to pledge or pawn a firearm as security for the payment or repayment of money, prior to actually pledging or pawning the firearm.
2. _Redemption_: background checks requested by an officially-licensed FFL on prospective firearm transferees attempting to regain possession of a firearm after pledging or pawning a firearm as security at a pawn shop.
3. _Returned/Disposition_: background checks requested by criminal justice/law enforcement agencies prior to returning a firearm in its possession to the respective transferee, to ensure the individual is not prohibited.
4. _Rentals_: background checks requested by an officially-licensed FFL on prospective firearm transferees attempting to possess a firearm when the firearm is loaned or rented for use off the premises of the business.
5. _Private Sale_: background checks requested by an officially-licensed FFL on prospective firearm transferees attempting to possess a firearm from a private party seller who is not an officially-licensed FFL.
6. _Return to Seller-Private Sale_: background checks requested by an officially-licensed FFL on prospective firearm transferees attempting to possess a firearm from a private party seller who is not an officially-licensed FFL.

After reading more about the data collected from NICS, I decided to eliminate the permit and permit recheck columns. Although these columns show how many firearm background checks NICS runs overall, they are not useful for estimating the number of firearms purchased in a given state. A 2013 research paper by Jurgen Brauer, _The US Firearms Industry: Production and Supply_, explains why with several examples:

> _For example, from November 1998 to February 2012 NICS recorded ten million so-called ‘permit’ checks for the state of Kentucky. For the same state it also recorded more than one million additional ‘handgun’ checks and 1.6 million ‘long gun’ checks. A ‘permit’ refers to a firearms-carrying licence issued by the state of Kentucky. The state checks monthly whether any of its permit holders may no longer be eligible for gun ownership, e.g. as a result of having committed a felony. Thus, Kentucky’s permit checks amount to continued eligibility checks that are wholly unrelated to a prospective customer’s intent to purchase a firearm from a licensed dealer. Similarly, Utah’s permits are checked every 90 days against FBI records. Each state maintains its own rules regarding the frequency, if any, with which its issued permits are checked against FBI records._
>

I have also decided to remove the following columns because they are unrelated to a prospective gun purchase: admin, prepawn_handgun, prepawn_long_gun, prepawn_other, redemption_handgun, redemption_long_gun, redemption_other, returned_handgun, returned_long_gun, returned_other, rentals_handgun, and rentals_long_gun.

Finally, I removed the 'totals' column because it no longer represented the total of the columns I was keeping in the dataset.
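
The column removal described above could be sketched in pandas roughly as follows. This is only an illustration, not the project's cleaning script: the small `fbi` DataFrame and its values are invented, and the names `permit`, `permit_recheck`, and `totals` are assumed to follow the same snake_case pattern as the column names listed above.

```python
import pandas as pd

# Hypothetical two-row sample; the values are invented for illustration only.
fbi = pd.DataFrame({
    "month": ["2017-09", "2017-09"],
    "state": ["Alabama", "Alaska"],
    "permit": [100, 20],
    "permit_recheck": [0, 0],
    "handgun": [50, 10],
    "long_gun": [60, 15],
    "admin": [0, 0],
    "prepawn_handgun": [1, 0],
    "totals": [211, 45],
})

# Columns treated as unrelated to a prospective purchase (assumed names).
unrelated = [
    "permit", "permit_recheck", "admin",
    "prepawn_handgun", "prepawn_long_gun", "prepawn_other",
    "redemption_handgun", "redemption_long_gun", "redemption_other",
    "returned_handgun", "returned_long_gun", "returned_other",
    "rentals_handgun", "rentals_long_gun",
    "totals",
]

# errors="ignore" skips names that are absent from this small sample.
fbi_kept = fbi.drop(columns=unrelated, errors="ignore")
print(list(fbi_kept.columns))
```

Passing the full list with `errors="ignore"` keeps the sketch short; against the real file it may be safer to omit that argument so a misspelled column name raises an error instead of being silently skipped.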

#### Summary of FBI Data Cleaning 🫧🧽
Overall, this dataset was very clean and didn't have errors. However, to conduct further analysis of the FBI data, I would need to perform the following steps to clean the data:
* Convert the excel file to a csv file.
* Convert the month column into a datetime column so this can be used for further analysis.
* Create a new totals column based on the fields I am keeping.
* Fill the null values with '0' so that these can be analyzed.

I also wanted to simplify my comparative analysis to the Census dataset by removing geographies from the FBI data that were not present in the Census data. I removed records for the following geographic areas: District of Columbia, Guam, Mariana Islands, Puerto Rico, and Virgin Islands. 
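
The cleaning steps above might look something like this in pandas. It is a minimal sketch under stated assumptions rather than the linked cleaning script: the sample rows are invented, the `month` values are assumed to be `YYYY-MM` strings, and the column names `state`, `handgun`, and `long_gun` are assumed. The Excel-to-CSV conversion is left out so the example runs without any files.

```python
import pandas as pd

# Hypothetical sample; the values are invented for illustration only.
fbi = pd.DataFrame({
    "month": ["2017-09", "2017-09", "2017-08"],
    "state": ["Alabama", "Guam", "Alaska"],
    "handgun": [50.0, 3.0, None],
    "long_gun": [60.0, 2.0, 15.0],
})

# Parse "YYYY-MM" strings into datetimes (the first day of each month).
fbi["month"] = pd.to_datetime(fbi["month"], format="%Y-%m")

# Treat missing counts as zero so they can be summed.
count_cols = ["handgun", "long_gun"]
fbi[count_cols] = fbi[count_cols].fillna(0)

# Rebuild the totals column from the retained count columns only.
fbi["totals"] = fbi[count_cols].sum(axis=1)

# Drop geographies that are not present in the Census data.
excluded = [
    "District of Columbia", "Guam", "Mariana Islands",
    "Puerto Rico", "Virgin Islands",
]
fbi = fbi[~fbi["state"].isin(excluded)].reset_index(drop=True)
print(fbi)
```

Filling nulls before computing the totals keeps the row sums well defined; whether a missing count really means zero checks is a judgment call that depends on how the source file records unreported categories.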

📁 [FBI Cleaning Script](https://github.com/dezertdweller/fbi-gun-background-checks/blob/main/fbi-cleaning.py)

---

### 📋 Census Data Overview
The Census ACS data contains 85 records and 52 columns. In the raw file, each row is a specific fact and most columns represent individual states. I knew I would want to transpose the data into long format to make it easier to analyze later. The Census data would also require much more cleaning before this transformation.

#### Summary of Census Data Cleaning 🫧🧽
The Census data had several irregularities that would need to be addressed. Below I have summarized what I would need to do to clean this dataset:
1. Remove symbols from the data including commas, percentages, and dollar signs.
2. Convert the columns with percentages into their decimal values.
3. Delete rows 64 through 80 because they mostly contain null values that were irrelevant for my analysis:
    64-FIPS Code, 
    65-NaN, 
    66-NOTE: FIPS Code values are enclosed in quotes ...,
    67-Value Notes,
    68-1,
    69-Fact Notes,
    70-(a),
    71-(b),
    72-(c),
    73-Value Flags,
    74--,
    75-D,
    76-F,
    77-FN,
    78-S,
    79-X,
    80-Z

4. Replace all null values with 0.
5. Convert all of the data for each state into numerical values for analysis. This would not apply to the information in the 'Fact' column.
6. Transpose the data so that each Fact becomes a column and State becomes a new column holding each state's name as a string value.
7. Shorten the names for the columns to make the dataset easier to read and digest.
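
As a rough illustration of steps 1, 2, 4, 5, and 6, the sketch below cleans and transposes a tiny, invented sample. The fact names, the values, and the helper `to_number` are hypothetical and are not taken from the linked scripts; only the `Fact` column name comes from the description above.

```python
import pandas as pd

# Hypothetical wide-format sample: one row per fact, one column per state.
census = pd.DataFrame({
    "Fact": ["Population", "Persons in poverty, percent", "Median household income"],
    "Alabama": ["4,863,300", "17.10%", "$43,623"],
    "Alaska": ["741,894", "9.90%", None],
})

def to_number(value):
    """Convert strings such as '4,863,300', '17.10%', or '$43,623' to floats."""
    if pd.isna(value):
        return 0.0  # nulls are replaced with 0, as in step 4
    text = str(value).replace(",", "").replace("$", "").strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100  # percentage to decimal value
    return float(text)

# Convert every state column to numbers; the 'Fact' column is left as text.
state_cols = [c for c in census.columns if c != "Fact"]
for col in state_cols:
    census[col] = census[col].map(to_number)

# Transpose: each fact becomes a column and each state becomes a row.
by_state = census.set_index("Fact").T.rename_axis("State").reset_index()
by_state.columns.name = None
print(by_state)
```

Converting the values before transposing keeps each state column a single numeric type, so the transposed frame comes out numeric as well.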

After cleaning and transforming the Census data, I realized that there were several columns I was not interested in for my analysis. Since most of the demographic data was for 2010 or 2016, I removed data that was for other years (ex: Total retail sales, 2012). I left most of the data that was for 5-year estimates, as this is pretty standard for how ACS collects data over time. Additionally, I was not particularly interested in the data about housing units, mortgages, or firms, so I removed these as well. 

Since the Census data required significantly more work to clean and transform, I saved the cleaning and transformation as two separate documents:

📁 [Census Cleaning Script](https://github.com/dezertdweller/fbi-gun-background-checks/blob/main/census-cleaning.py)

📁 [Census Transformation Script](https://github.com/dezertdweller/fbi-gun-background-checks/blob/main/census-transformation.py)


<a id='eda'></a>
## Exploratory Data Analysis


<a id='conclusions'></a>
## Conclusions
