# Best Neighborhood in Pittsburgh


## Description

This notebook describes how we selected the best neighborhoods in Pittsburgh. We used three metrics: the number of reported incidents, the number of parks, and total K-12 enrollment. We treated safety as the most important factor and gave it a weight of 40%, with 30% for each of the other two. For each metric, we scaled every neighborhood's count to a score between 0 and 100 with min-max normalization, inverting the incident count so that fewer incidents yields a higher safety score. We then multiplied each score by its weight and summed the results to get the neighborhood's total score. Finally, we ranked the neighborhoods by total score to identify the top five.
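To make the weighting concrete, here is a minimal sketch of the scoring on toy data. The three neighborhoods `A`, `B`, `C`, their counts, and the helper `min_max` are hypothetical and exist only for illustration; only the 40/30/30 weights come from the description above.

```python
import pandas as pd

# Hypothetical counts for three made-up neighborhoods (not real Pittsburgh data).
counts = pd.DataFrame(
    {"incidents": [10, 50, 30], "k12": [200, 100, 400], "parks": [1, 5, 3]},
    index=["A", "B", "C"],
)


def min_max(series: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Scale a series to 0-100; invert it when lower raw values are better."""
    scaled = 100 * (series - series.min()) / (series.max() - series.min())
    return scaled if higher_is_better else 100 - scaled


# Weights from the description: safety 40%, K-12 30%, parks 30%.
weights = {"safety": 0.4, "k12": 0.3, "parks": 0.3}

scores = pd.DataFrame({
    "safety": min_max(counts["incidents"], higher_is_better=False),
    "k12": min_max(counts["k12"]),
    "parks": min_max(counts["parks"]),
})

# Weighted sum of the three normalized scores.
scores["total"] = sum(scores[name] * w for name, w in weights.items())
print(scores.sort_values("total", ascending=False))
```

In this toy example `C` ranks first: it is never the worst on any metric, whereas `A` has the fewest incidents but also the fewest parks.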

In [4]:
import pandas as pd
import numpy as np

education = pd.read_csv("neighborhood_iep.csv")
incident_data = pd.read_csv("Incident.csv")
parks = pd.read_csv("parks.csv")

neighborhoods_k12 = pd.DataFrame(columns=['Neighborhood', 'K12 Enrollment'])

for i in range(len(education)):
    neighborhood_str = education.iloc[i]['neighborhoods']
    k12 = education.iloc[i]['total_enrollment_k_12']
    # A row may list several comma-separated neighborhoods; credit each one with the row's enrollment.
    for neighborhood in neighborhood_str.split(','):
        neighborhoods_k12.loc[len(neighborhoods_k12)] = [neighborhood.strip(), k12]

safety_weight = 0.4
k12_weight = 0.3
park_weight = 0.3

safety_ranking = incident_data["INCIDENTNEIGHBORHOOD"].value_counts().sort_values(ascending=True)
k12_ranking = neighborhoods_k12.groupby('Neighborhood')['K12 Enrollment'].sum().sort_values(ascending=False)
park_ranking = parks["neighborhood"].value_counts().sort_values(ascending=False)

# Each score is a min-max normalized count (0-100), which makes the three metrics comparable across neighborhoods.
# Safety is inverted: fewer incidents gives a higher score.
# Neighborhoods absent from the K-12 or parks data get NaN for that score, so their Total_score is NaN and they sort last.
neighborhood_scores = pd.DataFrame(index=safety_ranking.index)
neighborhood_scores["Safety_score"] = 100 * (safety_ranking.max() - safety_ranking) / (safety_ranking.max() - safety_ranking.min())
neighborhood_scores["K12_score"] = 100 * (k12_ranking - k12_ranking.min()) / (k12_ranking.max() - k12_ranking.min())
neighborhood_scores["Park_score"] = 100 * (park_ranking - park_ranking.min()) / (park_ranking.max() - park_ranking.min())
neighborhood_scores["Total_score"] = (neighborhood_scores["Safety_score"] * safety_weight
                                       + neighborhood_scores["K12_score"] * k12_weight
                                       + neighborhood_scores["Park_score"] * park_weight)

best_neighborhoods = neighborhood_scores.sort_values("Total_score", ascending=False)
top5 = best_neighborhoods.iloc[:5]
print(top5)

# Export csv file
best_neighborhoods = best_neighborhoods.reset_index()
best_neighborhoods.columns = ["Neighborhood", "Safety_score", "K12_score", "Park_score", "Total_score"]
best_neighborhoods.to_csv("best_neighborhoods.csv", index=False)

FileNotFoundError: [Errno 2] No such file or directory: 'neighborhood_iep.csv'
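
The traceback above means `neighborhood_iep.csv` was not in the working directory when the cell ran, so the cell produced no results. One way to catch this earlier is to check for all three input files before loading them. The sketch below assumes the files live in the current directory; the helper `missing_files` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

# File names taken from the read_csv calls above; their location is an assumption.
REQUIRED_FILES = ["neighborhood_iep.csv", "Incident.csv", "parks.csv"]


def missing_files(data_dir=".", names=REQUIRED_FILES):
    """Return the required file names that do not exist in data_dir."""
    base = Path(data_dir)
    return [name for name in names if not (base / name).is_file()]


missing = missing_files()
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```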