# Education & Income Submetric Analysis
This notebook analyzes Pittsburgh neighborhoods based on **Education and Income (2010)** data from WPRDC.

It constructs a composite **Education–Income Index**, combining:
- Higher Education Rate (Bachelor's + Postgraduate)
- Median Income (2009, adjusted to 2013 dollars)
- Poverty Rate (inverted, lower is better)

### Metric definition
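\[ \text{Education–Income Index} = \frac{\text{Edu}_{\text{norm}} + \text{Income}_{\text{norm}} + \text{Poverty}_{\text{norm}}}{3} \]

Here `Poverty_norm` is the min-max normalization of `1 - Poverty_Rate`, so it is already inverted: a lower poverty rate gives a higher score.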

All indicators are min-max normalized to ensure comparability across neighborhoods.
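The notebook's code stores the inverted poverty score by normalizing `1 - Poverty_Rate`. The following is a minimal sketch, using made-up rates for three placeholder neighborhoods (not WPRDC data), which suggests that inverting before normalizing gives the same values as normalizing first and subtracting from 1. The helper mirrors the `minmax_norm` function defined in the notebook.

```python
import pandas as pd

def minmax_norm(series: pd.Series) -> pd.Series:
    """Rescale a numeric Series linearly so its minimum maps to 0 and its maximum to 1."""
    return (series - series.min()) / (series.max() - series.min())

# Made-up poverty rates for three placeholder neighborhoods (illustration only).
poverty = pd.Series([0.10, 0.25, 0.40], index=["A", "B", "C"])

inverted_first = minmax_norm(1 - poverty)      # invert, then normalize
normalized_first = 1 - minmax_norm(poverty)    # normalize, then invert

# Both orderings should agree up to floating-point rounding.
print(inverted_first.round(3).tolist())
print(normalized_first.round(3).tolist())
```

Note that the helper divides by `max - min`, so it assumes the column is not constant; a constant column would produce a division by zero.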

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load dataset
file_path = 'education-income.xls'
df = pd.read_excel(file_path)

# Select relevant columns
df_main = df[['Neighborhood',
              "Edu. Attainment: Bachelor's Degree (2010)",
              "Edu. Attainment: Postgraduate Degree (2010)",
              "2009 Med. Income ('13 Dollars)",
              'Est. Percent Under Poverty (2010)']].copy()

# Compute higher education rate
df_main['Higher_Edu_Rate'] = df_main["Edu. Attainment: Bachelor's Degree (2010)"] + df_main["Edu. Attainment: Postgraduate Degree (2010)"]

# Rename for simplicity
df_main.rename(columns={
    "2009 Med. Income ('13 Dollars)": 'Median_Income',
    'Est. Percent Under Poverty (2010)': 'Poverty_Rate'
}, inplace=True)

# Define min-max normalization function
def minmax_norm(series):
    return (series - series.min()) / (series.max() - series.min())

# Normalize
df_main['Edu_norm'] = minmax_norm(df_main['Higher_Edu_Rate'])
df_main['Income_norm'] = minmax_norm(df_main['Median_Income'])
df_main['Poverty_norm'] = minmax_norm(1 - df_main['Poverty_Rate'])  # lower poverty = better

# Compute composite index
df_main['Edu_Income_Index'] = (df_main['Edu_norm'] + df_main['Income_norm'] + df_main['Poverty_norm']) / 3

# Rank neighborhoods
ranked = df_main.sort_values('Edu_Income_Index', ascending=False).reset_index(drop=True)
top10 = ranked.head(10)
bottom10 = ranked.tail(10)
ranked.to_csv('education_income_index.csv', index=False)

ranked.head()

| | Neighborhood | Edu. Attainment: Bachelor's Degree (2010) | Edu. Attainment: Postgraduate Degree (2010) | Median_Income | Poverty_Rate | Higher_Edu_Rate | Edu_norm | Income_norm | Poverty_norm | Edu_Income_Index |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | South Shore | 0.0 | 0.47619 | 163772.5 | 0.0 | 0.47619 | 0.713252 | 1.0 | 1.0 | 0.904417 |
| 1 | Regent Square | 0.210121 | 0.440044 | 84635.23 | 0.010929 | 0.650165 | 0.973835 | 0.516785 | 0.989071 | 0.826564 |
| 2 | Squirrel Hill North | 0.266796 | 0.400838 | 91408.853333 | 0.088467 | 0.667633 | 1.0 | 0.558145 | 0.911533 | 0.823226 |
| 3 | Point Breeze | 0.259139 | 0.325081 | 95704.18 | 0.045509 | 0.58422 | 0.875062 | 0.584373 | 0.954491 | 0.804642 |
| 4 | Strip District | 0.334086 | 0.185102 | 70706.12 | 0.021322 | 0.519187 | 0.777653 | 0.431734 | 0.978678 | 0.729355 |
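As a quick sanity check, the index can be recomputed by hand from the normalized columns. The sketch below copies the `Edu_norm`, `Income_norm`, and `Poverty_norm` values for the first two rows of the output above and averages them; the function name `edu_income_index` is introduced here for illustration and is not part of the notebook.

```python
# Normalized components copied from the first two rows of the output above:
# (Edu_norm, Income_norm, Poverty_norm)
rows = {
    "South Shore": (0.713252, 1.0, 1.0),
    "Regent Square": (0.973835, 0.516785, 0.989071),
}

def edu_income_index(edu_norm, income_norm, poverty_norm):
    """Equal-weight mean of the three normalized components."""
    return (edu_norm + income_norm + poverty_norm) / 3

for name, parts in rows.items():
    print(f"{name}: {edu_income_index(*parts):.6f}")
# South Shore: 0.904417
# Regent Square: 0.826564
```

The results match the `Edu_Income_Index` column, which confirms that the stored `Poverty_norm` is added directly rather than subtracted from 1 a second time.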


## Top 10 and Bottom 10 Neighborhoods

In [None]:
# Display Top 10 and Bottom 10 neighborhoods
print('Top 10 Neighborhoods by Education–Income Index:\n')
display(top10[['Neighborhood', 'Edu_Income_Index']])
print('\nBottom 10 Neighborhoods:\n')
display(bottom10[['Neighborhood', 'Edu_Income_Index']])
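One caveat when reading the bottom list: `sort_values` places missing values last by default, so with `ascending=False` any neighborhood whose index is `NaN` would end up in `ranked.tail(10)`. Whether the WPRDC file actually contains such gaps is not established here. The sketch below, on made-up data, shows one possible way to drop unscored rows before ranking.

```python
import numpy as np
import pandas as pd

# Made-up scores; placeholder neighborhood "D" has a missing index value.
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Edu_Income_Index": [0.82, 0.35, 0.57, np.nan],
})

# Drop rows without a score so they cannot appear in the bottom list.
scored = toy.dropna(subset=["Edu_Income_Index"])
ranked_toy = (
    scored.sort_values("Edu_Income_Index", ascending=False)
    .reset_index(drop=True)
)

top2 = ranked_toy.head(2)
bottom2 = ranked_toy.tail(2)
print(bottom2["Neighborhood"].tolist())
```

An alternative is to keep the unscored rows and report them separately, which avoids silently shrinking the list of neighborhoods.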

## Visualization

In [None]:
plt.figure(figsize=(10,6))
plt.barh(top10['Neighborhood'], top10['Edu_Income_Index'], color='mediumseagreen')
plt.gca().invert_yaxis()
plt.title('Top 10 Neighborhoods by Education–Income Index (2010)')
plt.xlabel('Education–Income Index')
plt.tight_layout()
plt.show()

## Interpretation
Neighborhoods such as **South Shore**, **Regent Square**, and **Squirrel Hill North** have the highest composite Education–Income Index, meaning they combine:
- Higher proportion of residents with bachelor's or postgraduate degrees
- Higher median income levels
- Lower poverty rates

By contrast, neighborhoods such as **Glen Hazel**, **St. Clair**, and **Homewood North** rank lower, reflecting the economic and educational challenges recorded in the 2010 data.

Note: This dataset is from 2010; current conditions may differ substantially.

## Conclusion
This submetric summarizes educational attainment, median income, and poverty in a single score that serves as a proxy for the social and economic well-being of Pittsburgh neighborhoods.

In the final group project, this metric can be combined with others, such as safety, greenery, or accessibility, to determine the overall 'Best Neighborhood in Pittsburgh'.
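One possible way to combine submetrics is sketched below. It assumes each submetric is already scaled to [0, 1] and keyed by neighborhood name, and it weights them equally. The `Safety_Index` column, the placeholder neighborhoods, and all values are hypothetical; the actual group project may use different metrics and weights.

```python
import pandas as pd

# Hypothetical submetric tables; names and values are made up for illustration.
edu_income = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Edu_Income_Index": [0.9, 0.4, 0.6],
})
safety = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Safety_Index": [0.5, 0.8, 0.7],
})

# Keep only neighborhoods present in both tables, then average the scores.
combined = edu_income.merge(safety, on="Neighborhood", how="inner")
combined["Overall_Score"] = combined[["Edu_Income_Index", "Safety_Index"]].mean(axis=1)

best = combined.sort_values("Overall_Score", ascending=False).iloc[0]["Neighborhood"]
print(best)
```

An inner join drops any neighborhood missing from one of the tables, so name spellings should be checked across datasets before merging.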