# Introduction

Ever since I was a little boy, I have been interested in the factors families weigh when selecting a house. That interest has only grown as I have gotten older, especially as I think about life after college. When deciding where to live, families usually consider a few factors: schools, safety, and overall location. I decided to focus on one of the main factors: how safe the neighborhood is. To measure it, I chose a dataset that records the number of arrests per neighborhood over a period of seven years.

In [None]:

import pandas as pd
import geopandas 
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

# My Metric Explanation:

## Number of Arrests:

As a group, we shared one established metric: to evaluate the "safeness" of each neighborhood and then determine the safest one. Each of us measured a different dataset, from the number of firearms seized to the number of crime incidents. I measured the number of arrests per neighborhood using the dataset named Pittsburgh Police Arrest Data, and I used a bar graph to illustrate the results.






# Code Explanation: Comments Found Above Each Code Segment 

In [3]:
# First read the CSV file into a DataFrame named 'arrests' (pandas was already imported above as pd).
arrests = pd.read_csv("NeighborhoodArrest.csv")

# Then set the display option to show all rows when printing the DataFrame.

pd.set_option('display.max_rows', None)

# Next group the DataFrame 'arrests' by the column 'INCIDENTNEIGHBORHOOD' and get the count of occurrences in each group.

neighborhood_arrests = arrests.groupby('INCIDENTNEIGHBORHOOD').size()

# Next create a new DataFrame 'd2' with the count of arrests in each neighborhood.

d2 = pd.DataFrame( { "number of arrests made" : neighborhood_arrests } )

# Then reset the index of DataFrame 'd2'.

d2.reset_index(inplace=True)

# Next rename the columns of DataFrame 'd2' to more meaningful names.

d2.columns = [ "Neighborhoods", "Number_of_Arrests_Made"]

# Then sort the DataFrame 'd2' based on the 'Number_of_Arrests_Made' column in ascending order.

d2 = d2.sort_values(by = 'Number_of_Arrests_Made')

# Finally print the sorted DataFrame 'd2'.

print(d2)


                  Neighborhoods  Number_of_Arrests_Made
57      Mt. Oliver Neighborhood                       2
91       Troy Hill-Herrs Island                       6
56              Mt. Oliver Boro                      18
19            Central Northside                      23
72                Regent Square                      36
73                    Ridgemont                      37
58                New Homestead                      39
88               Swisshelm Park                      42
21               Chartiers City                      44
27                East Carnegie                      48
84                    St. Clair                      53
64               Outside County                      55
65                Outside State                      64
87                  Summer Hill                      74
62                      Oakwood                      80
37  Golden Triangle/Civic Arena                      83
39                         Hays                 

# Dataset Explanation
As you can see, this is the raw arrest data after it was loaded, counted per neighborhood, and sorted from the fewest to the most arrests. As you may have noticed, the full printout is quite hard to read, which is not what we want for the intended audience. That gives us a good reason to convert the sorted counts into a bar graph.
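
The count-and-sort steps described here can also be written more compactly. The following is only a sketch on a made-up toy table (the neighborhood labels and rows are placeholders, not the real arrest data); it assumes one row per arrest and the same `INCIDENTNEIGHBORHOOD` column name. It also shows that rows with a missing neighborhood are left out of the counts, so it can be worth checking how many there are.

```python
import pandas as pd

# Toy stand-in for the arrests table: one row per arrest.
# These rows are made up for illustration and are NOT the real data.
toy_arrests = pd.DataFrame({
    "INCIDENTNEIGHBORHOOD": [
        "Neighborhood A", "Neighborhood B", "Neighborhood A",
        None, "Neighborhood B", "Neighborhood A",
    ]
})

# value_counts() counts the rows per neighborhood and sorts them in one step.
# Rows with a missing neighborhood are skipped by default.
counts = (
    toy_arrests["INCIDENTNEIGHBORHOOD"]
    .value_counts(ascending=True)
    .rename_axis("Neighborhoods")
    .reset_index(name="Number_of_Arrests_Made")
)

# Count the rows that were skipped because the neighborhood is missing.
missing = int(toy_arrests["INCIDENTNEIGHBORHOOD"].isna().sum())

print(counts)
print("Arrests with no neighborhood recorded:", missing)
```

Whether this is preferable to the `groupby` version is a matter of taste; both should produce the same counts.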


# Result Evaluation

By examining our ugly yet functional sorted table, we can conclude that Mt. Oliver (listed as "Mt. Oliver Neighborhood") has the fewest arrests over the seven-year period, with only 2, which makes it, on paper, the safest neighborhood. In contrast, the Central Business District (also known as Downtown) is the least safe neighborhood by this metric, with 4250 arrests. This ranking should be highly useful for our intended audience.
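
The two extremes can also be pulled out directly instead of scanning the whole printout. This is a minimal sketch that uses only a handful of counts copied from the printed table and the text above, not the full dataset; the names `sample`, `fewest`, and `most` are just illustrative.

```python
import pandas as pd

# A few rows copied from the printed table and the text above (not the full dataset).
sample = pd.DataFrame({
    "Neighborhoods": [
        "Regent Square", "Central Business District", "Mt. Oliver Neighborhood",
        "Central Northside", "Troy Hill-Herrs Island", "Mt. Oliver Boro",
    ],
    "Number_of_Arrests_Made": [36, 4250, 2, 23, 6, 18],
})

# nsmallest / nlargest return the rows with the lowest / highest counts,
# already ordered, without sorting the whole table by hand.
fewest = sample.nsmallest(3, "Number_of_Arrests_Made")
most = sample.nlargest(1, "Number_of_Arrests_Made")

print(fewest.to_string(index=False))
print(most.to_string(index=False))
```

On the real table, the same two calls could be applied to `d2` in place of `sample`.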

# Code Explanation: Comments Found Above Each Code Segment 


In [None]:
# First convert the 'Number_of_Arrests_Made' column in DataFrame 'd2' to numeric values

d2['Number_of_Arrests_Made'] = pd.to_numeric(d2['Number_of_Arrests_Made'], errors='coerce')

# Then sort DataFrame 'd2' based on the 'Number_of_Arrests_Made' column in descending order

d2_sorted = d2.sort_values(by= 'Number_of_Arrests_Made', ascending=False)

# Next set the figure size for the horizontal bar plot based on the number of neighborhoods in the sorted DataFrame

plt.figure(figsize=(20, d2_sorted.shape[0] * 0.3))

# Next create a horizontal bar plot using the sorted DataFrame 'd2_sorted'

plt.barh(d2_sorted['Neighborhoods'], d2_sorted['Number_of_Arrests_Made'])
plt.xlabel("Total Arrests Made")

# Now finally output the bar graph 
plt.show()



# Code Explanation: Comments Found Above Each Code Segment 

In [None]:
# First make sure the 'Number_of_Arrests_Made' column is numeric
d2['Number_of_Arrests_Made'] = pd.to_numeric(d2['Number_of_Arrests_Made'], errors='coerce')

# Then sort the DataFrame by 'Number_of_Arrests_Made' in descending order
d2_sorted = d2.sort_values(by='Number_of_Arrests_Made', ascending=False)

# Now select the top 5 and bottom 5 neighborhoods
top_and_bottom_5 = pd.concat([d2_sorted.head(5), d2_sorted.tail(5)])

# Then create a bar graph with different colors for top and bottom 5
colors = ['red'] * 5 + ['blue'] * 5
plt.figure(figsize=(20, 10))
plt.barh(top_and_bottom_5['Neighborhoods'], top_and_bottom_5['Number_of_Arrests_Made'], color=colors)
plt.xlabel("Total Arrests Made")
plt.title("Top 5 (Red) and Bottom 5 (Blue) Neighborhoods by Total Arrests Made")

# Next add a legend that identifies the two color groups
top_legend = plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='red', markersize=10, label='Top 5')
bottom_legend = plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='blue', markersize=10, label='Bottom 5')
plt.legend(handles=[top_legend, bottom_legend])

# Finally output the graph 
plt.show()


# Dataset Explanation
As you may have noticed, the first bar graph plotted the sorted counts for every neighborhood, with total arrests made on the x-axis and neighborhoods on the y-axis. It leads to the same conclusion: Mt. Oliver and Downtown are the safest and least safe neighborhoods, respectively, in the city most of us call home. Although that graph showed what we were looking for, it was not easy to read, so I decided to sort the data and plot only the five safest and five least safe neighborhoods.

# Result Evaluation

By analyzing our updated graph, we can see much more quickly that the five safest neighborhoods on paper over the seven-year period are Mt. Oliver, Troy Hill-Herrs Island, Mt. Oliver Boro, Central Northside, and Regent Square. In contrast, Central Business District, South Side Flats, Carrick, East Allegheny, and Homewood South are the least safe. Because the chart now shows only two small groups of neighborhoods, it is much easier on the eye. This updated graph should be far more helpful to our intended audience.

# Conclusion

In conclusion, Mt. Oliver would be the prime candidate for the safest neighborhood in Pittsburgh if the ranking were based only on the number of arrests. By that same measure, my favorite neighborhood, South Side, would be known as a "bad neighborhood." However, the two neighborhoods have very different activity levels, because South Side has many more restaurants, stores, and businesses than Mt. Oliver. In my opinion, you can't pick the best neighborhood based only on how safe it is; other factors matter too.
