## **The Metric: Neighborhood Safety**

I used the Pittsburgh Police Arrests (2024–2025) dataset from the Western Pennsylvania Regional Data Center (WPRDC) to evaluate neighborhood safety as one aspect of “bestness.”

--------------------------------------------------------------------------------------------------------------------------------------------------------

### **The Metrics**
**Total Arrests**: Neighborhoods with fewer arrests are treated as safer, on the assumption that fewer arrests reflect less crime and therefore a higher quality of life.

**Charge Severity**: The type of charges (Felony, Misdemeanor, Infraction) reflects how serious crimes are in each neighborhood. Even if two areas have the same number of arrests, one may be safer if those arrests are mostly minor offenses.

Together, these metrics show not only how often arrests happen but also how severe those incidents tend to be, offering a balanced view of neighborhood safety.
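To make the severity point concrete, here is a minimal sketch using two made-up areas with the same arrest total but a different mix of charges. The area names and all counts are hypothetical and are not taken from the dataset; `Felony_Share` is a column name introduced only for this illustration.

```python
import pandas as pd

# Hypothetical counts for two made-up areas with the same arrest total.
example = pd.DataFrame(
    {
        "Felony": [10, 40],
        "Misdemeanor": [60, 50],
        "Infraction": [30, 10],
    },
    index=["Area A", "Area B"],
)

# Total arrests per area, then the fraction of those arrests that are felonies.
example["Total"] = example[["Felony", "Misdemeanor", "Infraction"]].sum(axis=1)
example["Felony_Share"] = example["Felony"] / example["Total"]

print(example)
```

Both areas have 100 arrests, but a much larger share of Area B's arrests are felonies, so the two would rank differently once severity is considered.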

In [1]:
import pandas as pd

# Load the dataset
df = pd.read_csv("arrests_2024_to_sept_2025.csv")

# Create an empty dictionary to store counts
arrest_counts = {}

# Iterate through each row
for index, row in df.iterrows():
    neighborhood = row['Neighborhood']            
    if neighborhood not in arrest_counts:         #using the structure of https://www.py4e.com/html3/09-dictionaries
        arrest_counts[neighborhood] = 1
    else:
        arrest_counts[neighborhood] = arrest_counts[neighborhood] + 1

# Display the result
# print (arrest_counts)

arrest_df = pd.DataFrame(list(arrest_counts.items()), columns=["Neighborhood", "Arrest_Count"])
# Sort by number of arrests (ascending order)
arrest_df = arrest_df.sort_values(by="Arrest_Count", ascending=True)

# Reset the index for clarity
arrest_df = arrest_df.reset_index(drop=True)

# Display the "best" neighborhoods with this metric
print(arrest_df.head(10))




     Neighborhood  Arrest_Count
0  Swisshelm Park             7
1   Regent Square             8
2       Ridgemont             9
3     Saint Clair            11
4       Chartiers            11
5     Summer Hill            12
6   New Homestead            16
7    Mount Oliver            17
8   East Carnegie            21
9      Glen Hazel            22
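The dictionary loop above can probably be expressed more compactly with `value_counts`. The sketch below runs on a tiny made-up table rather than the real CSV, so the names `sample` and `counts` are illustrative only; `dropna=False` is used so that rows with a missing neighborhood are still counted, as they are in the loop.

```python
import pandas as pd

# Tiny stand-in for the arrests table; the real data comes from the CSV.
sample = pd.DataFrame(
    {"Neighborhood": ["Carrick", "Knoxville", "Carrick", None, "Carrick"]}
)

counts = (
    sample["Neighborhood"]
    .value_counts(dropna=False)          # keep rows with no neighborhood
    .rename_axis("Neighborhood")         # name the index before resetting it
    .reset_index(name="Arrest_Count")    # turn the counts into a column
    .sort_values("Arrest_Count", ignore_index=True)
)

print(counts)
```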


In [2]:
# Display the "worst" neighborhoods with this metric
# (the NaN row counts arrests that have no neighborhood recorded)
print(arrest_df.tail(10))


                 Neighborhood  Arrest_Count
82         Marshall-Shadeland           339
83           Crawford-Roberts           373
84                  Knoxville           413
85               East Liberty           491
86                    Carrick           511
87             Homewood South           512
88             East Allegheny           799
89           South Side Flats          1557
90  Central Business District          2536
91                        NaN          2949
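The last row is not a neighborhood: it collects the arrests whose `Neighborhood` field is missing. One possible way to handle this, sketched on a three-row stand-in for `arrest_df` (the names `ranked`, `named_only`, and `unassigned_arrests` are introduced here for illustration), is to report that count separately and rank only the named neighborhoods.

```python
import pandas as pd

# Stand-in for arrest_df, using three rows copied from the outputs above.
ranked = pd.DataFrame(
    {
        "Neighborhood": ["Regent Square", "Carrick", None],
        "Arrest_Count": [8, 511, 2949],
    }
)

# Separate the arrests with no neighborhood from the named neighborhoods.
missing = ranked["Neighborhood"].isna()
unassigned_arrests = int(ranked.loc[missing, "Arrest_Count"].sum())
named_only = ranked[~missing].reset_index(drop=True)

print(f"Arrests with no neighborhood recorded: {unassigned_arrests}")
print(named_only.tail(10))
```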


In [6]:
# Count Felony
mask_felony = df["ArrestCharge_Felony_Misdemeanor_Description"].str.contains("felony", case=False, na=False)
df_felony = df[mask_felony]

# Count Misdemeanor
mask_misd = df["ArrestCharge_Felony_Misdemeanor_Description"].str.contains("misdemeanor", case=False, na=False)
df_misd = df[mask_misd]

# Count Infraction
mask_inf = df["ArrestCharge_Felony_Misdemeanor_Description"].str.contains("infraction", case=False, na=False)
df_inf = df[mask_inf]

# Count unlabeled (empty, missing, or not matching any of the three types)
mask_unlabeled = ~df["ArrestCharge_Felony_Misdemeanor_Description"].str.contains("felony|misdemeanor|infraction", case=False, na=False)
df_unlabeled = df[mask_unlabeled]

# Group and count by neighborhood
felony_counts = df_felony["Neighborhood"].value_counts()
misd_counts = df_misd["Neighborhood"].value_counts()
inf_counts = df_inf["Neighborhood"].value_counts()
unlabeled_counts = df_unlabeled["Neighborhood"].value_counts()

# Combine the counts into one DataFrame; neighborhoods absent from a category get 0
charge_summary = pd.DataFrame(
    {
        "Felony": felony_counts,
        "Misdemeanor": misd_counts,
        "Infraction": inf_counts,
        "Unlabeled": unlabeled_counts,
    }
).fillna(0).astype(int)


# Sort by felony count first; break ties by misdemeanor count, then by infraction count
charge_summary = charge_summary.sort_values(
    by=["Felony", "Misdemeanor", "Infraction"],
    ascending=[True, True, True]
).reset_index()
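An alternative sketch of the same per-neighborhood summary assigns each arrest a single charge type and then uses `pd.crosstab`. This is not the method used above: here the first matching type wins, whereas the separate masks would count a description that mentions two types in both columns. The sample rows and the `Charge_Type` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the description strings are invented for this sketch.
sample = pd.DataFrame(
    {
        "Neighborhood": ["Carrick", "Carrick", "Knoxville", "Knoxville", "Carrick"],
        "ArrestCharge_Felony_Misdemeanor_Description": [
            "Felony 2", "Misdemeanor 1", "Summary Infraction", None, "felony 3",
        ],
    }
)

desc = sample["ArrestCharge_Felony_Misdemeanor_Description"]
conditions = [
    desc.str.contains("felony", case=False, na=False),
    desc.str.contains("misdemeanor", case=False, na=False),
    desc.str.contains("infraction", case=False, na=False),
]

# First matching condition wins; anything else is treated as unlabeled.
sample["Charge_Type"] = np.select(
    conditions, ["Felony", "Misdemeanor", "Infraction"], default="Unlabeled"
)

# One row per neighborhood, one column per charge type.
summary = pd.crosstab(sample["Neighborhood"], sample["Charge_Type"])
print(summary)
```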




In [4]:
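# The "best" neighborhoods
# charge_summary is already sorted by Felony, then Misdemeanor, then Infraction,
# so re-sorting by Felony alone is unnecessary and could reorder ties.
charge_summary.head(10)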


     Neighborhood  Felony  Misdemeanor  Infraction  Unlabeled
0  Swisshelm Park       0            0           0          7
1   Regent Square       0            4           3          1
2     Summer Hill       0            4           3          5
3   New Homestead       0            9           3          4
4       Chartiers       1            7           1          2
5          Esplen       1           12           5          4
6     Saint Clair       2            8           1          0
7      Glen Hazel       2           10           6          4
8   Spring Garden       2           12           5          8
9  Allegheny West       2           12          24          2


In [5]:
# The "worst" neighborhoods
# charge_summary is already sorted, so the last rows have the most felonies.
charge_summary.tail(10)

                 Neighborhood  Felony  Misdemeanor  Infraction  Unlabeled
81             Homewood North      73          173          54         37
82         Marshall-Shadeland      79          168          62         30
83                 East Hills      84          145          41         27
84                    Carrick      99          279          78         55
85             Homewood South     102          267         101         42
86                  Knoxville     115          190          68         40
87               East Liberty     119          245          84         43
88             East Allegheny     163          440         109         87
89           South Side Flats     305          766         362        124
90  Central Business District     371         1449         371        345


## **Neighborhood Safety Summary**

Based on the arrest data from 2024–2025:

**Best** (Safest) Neighborhood: Swisshelm Park. It recorded zero felonies, zero misdemeanors, zero infractions, and seven unlabeled arrests, showing minimal reported criminal activity. Because all seven arrests are unlabeled, their severity is unknown.

**Worst** (Least Safe) Neighborhood: Central Business District, with 371 felonies, 1,449 misdemeanors, 371 infractions, and 345 unlabeled arrests (2,536 arrests in total).

Overall, neighborhoods with fewer and less severe arrests (like Swisshelm Park or Regent Square) appear safer and could be considered “better” under this metric.


## **The Flaws**
This finding has several flaws. It doesn’t account for population size or land area, so larger or busier neighborhoods naturally show higher arrest counts. Arrests are also not the same as crimes: differences in law enforcement, reporting practices, and other factors further limit how accurately the data reflects real crime patterns. Finally, 2,949 arrests have no neighborhood recorded and cannot be attributed to any area.
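One way to address the population flaw would be to rank by arrests per 1,000 residents instead of raw counts. The sketch below takes the arrest counts from the tables above, but the population figures are HYPOTHETICAL placeholders invented to make the example runnable; real figures would have to come from a separate source such as census data.

```python
import pandas as pd

# Arrest counts taken from the tables above.
arrests = pd.Series(
    {"Swisshelm Park": 7, "Central Business District": 2536},
    name="Arrest_Count",
)

# HYPOTHETICAL populations, invented only for this illustration.
population = pd.Series(
    {"Swisshelm Park": 1000, "Central Business District": 5000},
    name="Population",
)

# Align the two series by neighborhood name and compute a per-capita rate.
rates = pd.concat([arrests, population], axis=1)
rates["Arrests_per_1000"] = rates["Arrest_Count"] / rates["Population"] * 1000

print(rates.sort_values("Arrests_per_1000"))
```

With real population data, the ranking could change noticeably for neighborhoods that have few residents but many visitors.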