# Arrests

## Introduction
For this project, I looked for the best neighborhood in Pittsburgh by analyzing police arrest data. I chose this data set because arrests and the safety of a neighborhood go hand in hand, with the neighborhood that has the fewest arrests being the safest.

## The Metric
My metric was the number of arrests in a neighborhood. I wanted to find the neighborhood with the fewest arrests. I used data from https://data.wprdc.org/dataset/arrest-data/resource/e03a89dd-134a-4ee8-a2bd-62c40aeebc6f, which has information on each arrest made in Pittsburgh.

## The Best Neighborhood
To find the best neighborhood based on arrest data, I first had to read the information in the dataset.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

#store the dataset into a variable
arrest_data = pd.read_csv("arrests.csv")
#view the data set
arrest_data

My prediction is that Mt. Oliver Neighborhood will turn out to be the best neighborhood, with the fewest arrests.
Evidently, we do not need all of the columns to properly interpret our dataset. To remove the ones we don't need, I wrote the following code:

In [None]:
#only include the column with details on the neighborhood where the incident occurred
arrest_data = arrest_data.filter(items=["INCIDENTNEIGHBORHOOD"])

#display the resulting DataFrame
arrest_data
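
Before counting, it may be worth checking how many rows have no neighborhood recorded, since `value_counts` skips missing values by default. The following is a minimal sketch on a small made-up frame; the rows and the neighborhood names `A`, `B`, `C` are placeholders, not values from the real data set.

```python
import pandas as pd

# Hypothetical stand-in for arrest_data; the real frame is read from arrests.csv
sample = pd.DataFrame(
    {"INCIDENTNEIGHBORHOOD": ["A", "B", None, "A", None, "C"]}
)

# Count the rows where the neighborhood is missing
missing = sample["INCIDENTNEIGHBORHOOD"].isna().sum()

# Drop those rows so every remaining arrest can be attributed to a neighborhood
cleaned = sample.dropna(subset=["INCIDENTNEIGHBORHOOD"])

print(missing, len(cleaned))
```

If the real data set has many rows without a neighborhood, the counts for every neighborhood understate the true number of arrests by an unknown amount.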

Each row is an individual arrest, but we still need to group the arrests by neighborhood so we can compare the neighborhoods.

In [None]:
#count how many times each neighborhood appears in the data set
#sort in ascending order so the neighborhood with the fewest arrests is at the top
arrests_by_neighborhood = arrest_data["INCIDENTNEIGHBORHOOD"].value_counts(ascending=True)
#turn the counts into a DataFrame whose single column is named "arrests"
arrests_by_neighborhood = arrests_by_neighborhood.to_frame(name="arrests")
#convert to a {neighborhood: arrests} dictionary for future calculations
arrests_dict = arrests_by_neighborhood["arrests"].to_dict()
#view the newly labeled data set (last expression in the cell, so it is displayed)
arrests_by_neighborhood
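
Rather than reading the winner off the top of the table, the neighborhood with the fewest arrests can also be pulled out directly. The sketch below uses a small made-up frame (the names `A` to `D` are placeholders, not real neighborhoods) and checks for ties, which a single call to `idxmin` would hide.

```python
import pandas as pd

# Hypothetical stand-in for arrest_data, one row per arrest
sample = pd.DataFrame(
    {"INCIDENTNEIGHBORHOOD": ["A", "B", "A", "C", "A", "B", "D"]}
)

# Number of arrests per neighborhood
counts = sample["INCIDENTNEIGHBORHOOD"].value_counts()

# Smallest count, and every neighborhood that shares it
fewest = counts.min()
tied = sorted(counts[counts == fewest].index)

# idxmin returns only one label, even when several neighborhoods tie
one_label = counts.idxmin()

print(fewest, tied)
```

When `tied` holds more than one name, the data alone does not single out one best neighborhood.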

We can see that certain neighborhoods have far fewer arrests than others. Visualizing these findings makes it clear how significant the difference is.

In [None]:
#use a bar plot to visualize arrests by each neighborhood
arrests_by_neighborhood.plot.bar()

That's a pretty crowded x-axis. Filtering out the neighborhoods with more arrests would make this more readable.

In [None]:
#only include the values below 1500
arrests_by_neighborhood_mask = arrests_by_neighborhood["arrests"] < 1500
arrests_by_neighborhood[arrests_by_neighborhood_mask].plot.bar()

That's still too crowded.

In [None]:
#only include the values below 200
arrests_by_neighborhood_mask = arrests_by_neighborhood["arrests"] < 200
arrests_by_neighborhood[arrests_by_neighborhood_mask].plot.bar()
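
Instead of tuning the cutoff by hand, one option is to keep a fixed number of the lowest-count neighborhoods with `nsmallest`. This is a sketch on made-up counts; the names `A` to `E` and their values are placeholders, and the choice of three rows is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in for arrests_by_neighborhood
counts = pd.DataFrame(
    {"arrests": [5, 120, 40, 900, 15]},
    index=["A", "B", "C", "D", "E"],
)

# Keep the three neighborhoods with the fewest arrests, smallest first
lowest = counts.nsmallest(3, "arrests")

print(lowest)
```

Calling `lowest.plot.barh()` on the result would then draw one horizontal bar per neighborhood, which tends to keep long neighborhood names legible.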

We can clearly see that the number of incidents per neighborhood varies widely.

Based on this data, it's clear that Mt. Oliver Neighborhood is the best neighborhood in Pittsburgh because it has the lowest number of arrests.

## Conclusion
Looking at these results, it appears that Mt. Oliver Neighborhood is the best neighborhood in Pittsburgh, as I predicted earlier. I believe that the safety of a neighborhood is very important in determining whether it is the best, and the number of arrests is a reasonable way of quantifying this. However, not all people arrested are guilty, and some arrests are unwarranted, meaning that this data set could be biased.