# Final Ranking

This notebook combines three metrics from team members:
- **Ivan**: Food score (number of restaurants)
- **Ruoyu**: Safety score (crimes per 100k; higher means more crime)
- **Yang**: Education-Income index



In [86]:
import pandas as pd

# Load the three CSV files from Final_Notebook_Data folder
food_df = pd.read_csv('Final_Notebook_Data/ivan_food_score.csv')
safety_df = pd.read_csv('Final_Notebook_Data/ruoyu_safety_score.csv')
education_df = pd.read_csv('Final_Notebook_Data/yang_education_score.csv')

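The rename in the next cell assigns column names by position. It therefore silently assumes two things about each CSV: it has exactly two columns (neighborhood first, score second), and each neighborhood appears only once, because a repeated key would multiply rows in the merge. Below is a minimal plain-Python sketch of that check using the standard `csv` module. The helper `check_two_column_csv` and the sample text are hypothetical and not part of the project files.

```python
import csv
import io


def check_two_column_csv(text):
    """Parse CSV text and return (header, rows).

    Raises ValueError if the header does not have exactly two columns or if a
    value in the first (neighborhood) column appears more than once.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if len(header) != 2:
        raise ValueError(f"expected 2 columns, got {len(header)}: {header}")
    rows = list(reader)
    keys = [row[0] for row in rows]
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        raise ValueError(f"duplicate neighborhoods: {duplicates}")
    return header, rows


# Hypothetical sample text; the real inputs are the CSV files read above.
sample = "Neighborhood,Score\nA,10\nB,20\n"
header, rows = check_two_column_csv(sample)
print(header, len(rows))  # ['Neighborhood', 'Score'] 2
```

On the loaded DataFrames, a rough pandas equivalent would be checking `df.shape[1] == 2` and `df.iloc[:, 0].is_unique`.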

## 1. Clean Data

In [87]:
# Rename columns by position: each file is assumed to hold (neighborhood, score)
food_df.columns = ['neighborhood', 'food_score']
safety_df.columns = ['neighborhood', 'safety_score']
education_df.columns = ['neighborhood', 'education_score']

# keep only neighborhoods present in all three
merged = food_df.merge(safety_df, on='neighborhood', how='inner')
merged = merged.merge(education_df, on='neighborhood', how='inner')

# Remove rows with missing data
merged = merged.dropna()

print(f"Neighborhoods appearing in all 3 datasets: {len(merged)}")
print(merged.head())

Neighborhoods appearing in all 3 datasets: 67
                neighborhood  food_score  safety_score  education_score
0  Central Business District         175  4.912069e+05         0.494142
1           South Side Flats         102  3.035283e+05         0.560153
2                North Shore          83  1.818537e+06         0.687365
3              North Oakland          76  4.959383e+04         0.599985
4            Central Oakland          68  8.981450e+04         0.321219

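Both merges use `how='inner'`, so a neighborhood missing from any one file is dropped without warning. The following minimal plain-Python sketch, built on hypothetical neighborhood keys, shows which keys an inner join keeps and which ones each source loses. In pandas, `merge(..., how='outer', indicator=True)` produces a similar report through its `_merge` column.

```python
def inner_join_report(*sources):
    """Given dicts mapping neighborhood -> score, return the keys an inner
    join keeps and, for each source, the keys that source would lose."""
    key_sets = [set(source) for source in sources]
    kept = set.intersection(*key_sets)
    dropped = [sorted(keys - kept) for keys in key_sets]
    return sorted(kept), dropped


# Hypothetical toy inputs standing in for the food, safety and education tables.
food = {"A": 10, "B": 20, "C": 5}
safety = {"A": 1200.0, "B": 900.0}
education = {"A": 0.4, "B": 0.6, "D": 0.5}

kept, dropped = inner_join_report(food, safety, education)
print(kept)     # ['A', 'B']
print(dropped)  # [['C'], [], ['D']]
```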

In [88]:
# Cap (winsorize) extreme safety values at the 96th percentile instead of dropping rows
percentile_96 = merged['safety_score'].quantile(0.96)
print(f"96th percentile of safety score: {percentile_96:.2f}")
print(f"Neighborhoods above 96th percentile: {(merged['safety_score'] > percentile_96).sum()}")
# Values above the cap are set to the cap; the original column is kept unchanged
merged['safety_score_capped'] = merged['safety_score'].clip(upper=percentile_96)

96th percentile of safety score: 511989.44
Neighborhoods above 96th percentile: 3

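By default, `Series.quantile` uses `interpolation='linear'`. It sorts the values, goes to position q·(n − 1), and interpolates between the two neighboring values. `clip(upper=...)` then replaces anything above the cap with the cap itself, so no neighborhood is removed. Here is a plain-Python sketch of both steps. The numbers are hypothetical, and the helper names are illustrative.

```python
import math


def linear_quantile(values, q):
    """Quantile with linear interpolation between the closest ranks
    (the default method of pandas Series.quantile and numpy.quantile)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def cap_upper(values, cap):
    """Replace anything above `cap` with `cap`, like Series.clip(upper=cap)."""
    return [min(v, cap) for v in values]


# Hypothetical crime rates: one extreme value would dominate a min-max scale.
rates = [100.0, 200.0, 300.0, 400.0, 5000.0]
cap = linear_quantile(rates, 0.75)
print(cap)                   # 400.0
print(cap_upper(rates, cap)) # [100.0, 200.0, 300.0, 400.0, 400.0]
```

Capping matters here because the next step divides by (max − min). Without the cap, a single extreme rate would squeeze every other neighborhood's safety value toward 0.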

## 2. Normalize Scores to [0, 100]

In [None]:
# Min-max scale a column to [0, 100]: the minimum maps to 0, the maximum to 100
def normalize(series):
    return ((series - series.min()) / (series.max() - series.min())) * 100

merged['food_norm'] = normalize(merged['food_score'])
merged['safety_norm'] = normalize(merged['safety_score_capped'])  # use the capped values
# The education index is only rescaled, assuming it already lies in [0, 1]
merged['education_norm'] = merged['education_score'] * 100

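`normalize` is min-max scaling, x' = 100 · (x − min) / (max − min). Because it divides by the range, a constant column would produce NaN in pandas (0 / 0). The sketch below is a plain-Python version with a guard for that case. Returning all zeros for constant input is an arbitrary convention chosen here, and the function name is hypothetical.

```python
def min_max_0_100(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 100."""
    low, high = min(values), max(values)
    if high == low:
        # Constant input: the range is zero, so return 0 for every value by convention.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) * 100 for v in values]


print(min_max_0_100([1, 51, 101]))  # [0.0, 50.0, 100.0]
print(min_max_0_100([7, 7]))        # [0.0, 0.0]
```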

## 3. Calculate Final Score 

**Weights:**
- Education: **1**
- Food: **1**  
- Safety: **2**

**Formula:**

$$\text{Final Score} = \text{Education} + \text{Food} - 2 \times \text{Safety}$$

*Note: A higher safety score means a higher crime rate, so it is subtracted.*

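The formula can be checked by hand against the ranking table printed below. This sketch plugs in the normalized values reported there for Squirrel Hill South. The `final_score` helper and its keyword weights are illustrative and do not appear in the notebook.

```python
import math


def final_score(education, food, safety, w_edu=1.0, w_food=1.0, w_safety=2.0):
    """Weighted score: education and food add, safety (crime) subtracts."""
    return w_edu * education + w_food * food - w_safety * safety


# Normalized values for Squirrel Hill South, taken from the ranking output.
score = final_score(education=71.155612, food=35.057471, safety=7.513776)
print(round(score, 6))  # 91.185531
```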
In [None]:

# Final score with weights Education=1, Food=1, Safety=2 (safety is subtracted because it measures crime)
merged['final_score'] = merged['education_norm'] + merged['food_norm'] - 2 * merged['safety_norm']

# Sort by final score, highest first, and assign ranks 1..n
result = merged.sort_values('final_score', ascending=False).reset_index(drop=True)
result['rank'] = range(1, len(result) + 1)

print("TOP 10 BEST NEIGHBORHOODS IN PITTSBURGH")
print("="*90)
print(result[['rank', 'neighborhood', 'final_score', 'safety_norm', 'education_norm', 'food_norm']].head(10).to_string(index=False))

TOP 10 BEST NEIGHBORHOODS IN PITTSBURGH
 rank        neighborhood  final_score  safety_norm  education_norm  food_norm
    1 Squirrel Hill South    91.185531     7.513776       71.155612  35.057471
    2 Squirrel Hill North    90.148110     4.995295       82.322607  17.816092
    3       North Oakland    87.708940     7.696502       59.998495  43.103448
    4           Shadyside    84.480961    10.709622       67.969169  37.931034
    5       Regent Square    65.801489     8.714810       82.656396   0.574713
    6        Point Breeze    65.658313     9.127063       80.464163   3.448276
    7          Bloomfield    54.155520    17.839184       52.477566  37.356322
    8       Highland Park    53.421009    11.169072       71.736164   4.022989
    9          Greenfield    53.143547     7.292158       59.681886   8.045977
   10           Troy Hill    47.759749     0.000000       43.162047   4.597701

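Only about one point separates the top two rows, so the winner depends on the safety weight. With education and food weighted 1, Squirrel Hill South stays ahead of Squirrel Hill North only while the safety weight w satisfies (71.155612 + 35.057471) − 7.513776·w > (82.322607 + 17.816092) − 4.995295·w, which gives w < 6.074384 / 2.518481. This sketch computes that break-even weight from the table values. The helper name is hypothetical.

```python
def break_even_safety_weight(a, b):
    """Safety weight at which rows a and b tie, with education and food weights of 1.

    Each row is (education_norm, food_norm, safety_norm). Returns None when the
    safety values are equal, because then the weight cannot change the order.
    """
    gain = (a[0] + a[1]) - (b[0] + b[1])  # a's lead before the safety penalty
    penalty_gap = a[2] - b[2]             # extra safety (crime) points a carries
    if penalty_gap == 0:
        return None
    return gain / penalty_gap


squirrel_hill_south = (71.155612, 35.057471, 7.513776)
squirrel_hill_north = (82.322607, 17.816092, 4.995295)

w = break_even_safety_weight(squirrel_hill_south, squirrel_hill_north)
print(f"{w:.2f}")  # 2.41
```

With the notebook's weight of 2, Squirrel Hill South keeps first place. A safety weight of 3 would move Squirrel Hill North to the top.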

## 4. Conclusion

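**Based on the ranking above, we picked Squirrel Hill South as the best neighborhood in Pittsburgh.** Under the 1/1/2 weighting, it edges out Squirrel Hill North by about one point (91.19 vs. 90.15).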