
# Best Neighborhood in Pittsburgh

### Introduction

Our search for the best neighborhood in Pittsburgh is based fundamentally on standard of living. This concept is wide-ranging and can depend on many things, such as the economy, health care, and education. Since we are limited to datasets provided on the WPRDC website, we looked for datasets that reflect some aspect of standard of living and are also broken down by neighborhood.

### Datasets and metrics 
Our datasets include educational attainment, household income, and housing unit value. These datasets were all taken from the same survey, so their structure is similar, and they contain all of the same neighborhoods. 

1. Educational Attainment can be a good indicator of standard of living. Access to higher education is seen as a pillar of development in a developed country. It also usually correlates with other factors that lead to a higher standard of living, such as higher income, which in turn leads to better access to healthcare and nutrition.

2. Household income over a 12-month period is a straightforward indicator of standard of living. We may not like it, but having access to money provides access to many other important things in life, such as healthcare, education, food, housing, and recreational activities that are important to people's mental health.

3. Housing Unit Values per neighborhood give us a metric for housing prices in each neighborhood. In most cases, a neighborhood with higher housing prices is a safer, wealthier neighborhood. Housing prices are usually closely related to crime rates, public services, and similar factors. By using Housing Unit Values, we capture many of the factors that make up a neighborhood in a single measure.
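
The statement that the three tables contain the same neighborhoods can be checked before comparing them. The following is a minimal sketch, assuming each table has a `Neighborhood` column as in the cells below; the helper name `neighborhood_mismatches` and the toy frames are hypothetical and only stand in for the real CSV files.

```python
import pandas as pd

def neighborhood_mismatches(frames, key="Neighborhood"):
    """For each named table, list the neighborhoods that appear in some
    other table but are missing from this one."""
    all_names = set().union(*(set(df[key]) for df in frames.values()))
    return {name: sorted(all_names - set(df[key])) for name, df in frames.items()}

# Toy stand-ins for the three CSV files (hypothetical contents).
toy = {
    "education": pd.DataFrame({"Neighborhood": ["Shadyside", "Friendship", "Chateau"]}),
    "income": pd.DataFrame({"Neighborhood": ["Shadyside", "Friendship", "Chateau"]}),
    "housing": pd.DataFrame({"Neighborhood": ["Shadyside", "Friendship"]}),
}

missing = neighborhood_mismatches(toy)
print(missing)
```

An empty list for every table would mean the neighborhood sets agree; in the toy data, only the housing table is missing a neighborhood.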


### Educational Attainment 

This dataset shows the level of education for each neighborhood in Pittsburgh. The categories are broken down by grade level and then by degree, according to the highest level each person attained. Our metric is the proportion of the neighborhood's population that has a Bachelor's degree or higher. This is an arbitrary but understandable milestone for judging the level of education. We are assuming that a population with higher levels of education also has a higher standard of living, because its members have more opportunities to get higher-paying jobs.

In [1]:
import pandas as pd

education = pd.read_csv("education.csv")

college = [
    "Estimate; Total: - Bachelor's degree",
    "Estimate; Total: - Master's degree",
    "Estimate; Total: - Professional school degree",
    "Estimate; Total: - Doctorate degree",
]

education["College Degrees"] = education[college].sum(axis=1)

education["Total Population"] = education['Estimate; Total:']
education["Ratio of College"] = education["College Degrees"] / education["Total Population"]

# Display the ratio for each neighborhood
result = education[['Neighborhood', 'Total Population', 'College Degrees', 'Ratio of College']]
result = result.sort_values(by='Ratio of College', ascending=False)
result.head(30)

|  | Neighborhood | Total Population | College Degrees | Ratio of College |
|---|---|---|---|---|
| 66 | Regent Square | 798.0 | 664.0 | 0.83208 |
| 76 | Squirrel Hill North | 5321.0 | 4404.0 | 0.827664 |
| 80 | Strip District | 684.0 | 560.0 | 0.818713 |
| 68 | Shadyside | 9561.0 | 7638.0 | 0.79887 |
| 63 | Point Breeze | 4117.0 | 3255.0 | 0.790624 |
| 57 | North Shore | 171.0 | 132.0 | 0.77193 |
| 56 | North Oakland | 2793.0 | 2080.0 | 0.744719 |
| 33 | Friendship | 1483.0 | 1058.0 | 0.713419 |
| 77 | Squirrel Hill South | 11601.0 | 8189.0 | 0.705887 |
| 64 | Point Breeze North | 1451.0 | 933.0 | 0.643005 |
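
All three metrics in this notebook follow the same pattern: sum a group of bracket columns and divide by the `Estimate; Total:` column. As a sketch only, that pattern could be factored into one helper. The function name `add_ratio`, its parameters, and the toy rows ("Toy A", "Toy B") are hypothetical and not part of the original analysis.

```python
import pandas as pd

def add_ratio(df, columns, total_col="Estimate; Total:", name="Ratio"):
    """Return a sorted copy of df with a new column holding the share of
    `total_col` that falls into the given bracket `columns`."""
    out = df.copy()
    out[name] = out[columns].sum(axis=1) / out[total_col]
    return out.sort_values(by=name, ascending=False)

# Made-up rows, for illustration only.
toy = pd.DataFrame({
    "Neighborhood": ["Toy A", "Toy B"],
    "Estimate; Total:": [100.0, 200.0],
    "Estimate; Total: - Bachelor's degree": [30.0, 100.0],
    "Estimate; Total: - Master's degree": [10.0, 50.0],
})

degree_cols = [
    "Estimate; Total: - Bachelor's degree",
    "Estimate; Total: - Master's degree",
]
ranked = add_ratio(toy, degree_cols)
print(ranked[["Neighborhood", "Ratio"]])
```

The same call could then be reused for the income and housing tables by passing a different list of bracket columns.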


### Household Income 

This dataset lists household incomes by income bracket for each neighborhood. It has the total number of households in the neighborhood and a column for each income bracket. We measure the affluence of a neighborhood by counting the households that make $100,000 or more in income. This is an arbitrary standard, but one that many Americans seem to see as a significant milestone. We then take the ratio of households at or above $100,000 to all households in the neighborhood and compare these ratios, rather than raw counts, so that population differences between neighborhoods do not skew the comparison.


In [4]:
incomes = pd.read_csv("household-income.csv")

above100k = [
    "Estimate; Total: - $100,000 to $124,999",
    "Estimate; Total: - $125,000 to $149,999",
    "Estimate; Total: - $150,000 to $199,999",
    "Estimate; Total: - $200,000 or more",
]

incomes["Income Above 100k"] = incomes[above100k].sum(axis=1)

incomes["Total Households"] = incomes['Estimate; Total:']
incomes["Ratio Above 100k"] = incomes["Income Above 100k"] / incomes["Total Households"]

# Display the ratio for each neighborhood
result = incomes[['Neighborhood', 'Total Households', 'Income Above 100k', 'Ratio Above 100k']]
result = result.sort_values(by='Ratio Above 100k', ascending=False)
result.head(30)

|  | Neighborhood | Total Households | Income Above 100k | Ratio Above 100k |
|---|---|---|---|---|
| 76 | Squirrel Hill North | 3370.0 | 1720.0 | 0.510386 |
| 80 | Strip District | 520.0 | 253.0 | 0.486538 |
| 63 | Point Breeze | 2342.0 | 1115.0 | 0.476089 |
| 66 | Regent Square | 476.0 | 197.0 | 0.413866 |
| 1 | Allegheny West | 146.0 | 50.0 | 0.342466 |
| 16 | Central Business District | 1968.0 | 640.0 | 0.325203 |
| 77 | Squirrel Hill South | 7211.0 | 2163.0 | 0.299958 |
| 57 | North Shore | 154.0 | 43.0 | 0.279221 |
| 72 | South Side Flats | 3311.0 | 888.0 | 0.268197 |
| 39 | Highland Park | 2977.0 | 733.0 | 0.246221 |


### Housing Unit Values

This dataset consists of a set of price ranges for housing units in each neighborhood. We chose a value of \$250,000 as the cut-off between a good and a bad house. We count how many houses are worth \$250,000 or more in each neighborhood and divide by the total number of houses in the dataset for that neighborhood. This gives us a ratio; the higher the ratio, the better.

In [6]:
houses = pd.read_csv("Housing_Unit.csv")

# Select relevant columns for house values above $250,000
above250k = [
    "Estimate; Total: - $250,000 to $299,999",
    "Estimate; Total: - $300,000 to $399,999",
    "Estimate; Total: - $400,000 to $499,999",
    "Estimate; Total: - $500,000 to $749,999",
    "Estimate; Total: - $750,000 to $999,999",
    "Estimate; Total: - $1,000,000 to $1,499,999",
    "Estimate; Total: - $1,500,000 to $1,999,999",
    "Estimate; Total: - $2,000,000 or more"
]


# Sum the values above $250,000 for each neighborhood
houses['Houses_Above_250k'] = houses[above250k].sum(axis=1)

# Calculate the ratio of houses worth over $250,000 to total houses
houses['Total_Houses'] = houses['Estimate; Total:']
houses['Ratio_Above_250k'] = houses['Houses_Above_250k'] / houses['Total_Houses']

# Display the ratio for each neighborhood
result = houses[['Neighborhood', 'Total_Houses', 'Houses_Above_250k', 'Ratio_Above_250k']]
result = result.sort_values(by='Ratio_Above_250k', ascending=False)
result.head(30)

|  | Neighborhood | Total_Houses | Houses_Above_250k | Ratio_Above_250k |
|---|---|---|---|---|
| 21 | Chateau | 3.0 | 3.0 | 1.0 |
| 76 | Squirrel Hill North | 1967.0 | 1508.0 | 0.76665 |
| 1 | Allegheny West | 69.0 | 47.0 | 0.681159 |
| 63 | Point Breeze | 1623.0 | 1065.0 | 0.656192 |
| 68 | Shadyside | 1936.0 | 1170.0 | 0.604339 |
| 66 | Regent Square | 343.0 | 193.0 | 0.562682 |
| 77 | Squirrel Hill South | 3152.0 | 1743.0 | 0.552982 |
| 39 | Highland Park | 1488.0 | 698.0 | 0.469086 |
| 16 | Central Business District | 397.0 | 175.0 | 0.440806 |
| 80 | Strip District | 72.0 | 29.0 | 0.402778 |
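
Chateau tops this list with a ratio of 1.0, but that ratio is based on only 3 housing units, so it says little about the neighborhood as a whole. One possible way to handle this, sketched below, is to require a minimum number of units before ranking. The cutoff `MIN_UNITS = 50` is a hypothetical value chosen for illustration, not something the original analysis used; the three rows are copied from the output above.

```python
import pandas as pd

MIN_UNITS = 50  # hypothetical cutoff; tune to taste

# Three rows taken from the housing output above.
rows = pd.DataFrame({
    "Neighborhood": ["Chateau", "Squirrel Hill North", "Allegheny West"],
    "Total_Houses": [3.0, 1967.0, 69.0],
    "Houses_Above_250k": [3.0, 1508.0, 47.0],
})
rows["Ratio_Above_250k"] = rows["Houses_Above_250k"] / rows["Total_Houses"]

# Keep only neighborhoods with enough units for the ratio to be meaningful.
reliable = rows[rows["Total_Houses"] >= MIN_UNITS].sort_values(
    by="Ratio_Above_250k", ascending=False
)
print(reliable)
```

With this filter, Chateau drops out and Squirrel Hill North leads the list.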


### Conclusion

In conclusion, using these datasets and metrics, we can see that a few neighborhoods are at or near the top of all three charts, such as Squirrel Hill, Regent Square, and Point Breeze. This aligns with our own experience living in Pittsburgh. These neighborhoods have nice, expensive houses, the people who live there are educated and wealthier, and, in our experience, there is less crime and the streets are generally kept clean.
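
The three rankings can also be combined into a single score. The sketch below is one possible approach, not the method used in the analysis above: it ranks each metric separately and averages the ranks. It uses only the five neighborhoods that appear in all three output tables, with ratios copied from those tables, so the ranks are relative to this subset rather than to all of Pittsburgh. The column name `Mean Rank` is hypothetical.

```python
import pandas as pd

# Ratios copied from the three output tables above.
ratios = pd.DataFrame(
    {
        "Ratio of College": [0.827664, 0.83208, 0.790624, 0.818713, 0.705887],
        "Ratio Above 100k": [0.510386, 0.413866, 0.476089, 0.486538, 0.299958],
        "Ratio_Above_250k": [0.76665, 0.562682, 0.656192, 0.402778, 0.552982],
    },
    index=[
        "Squirrel Hill North",
        "Regent Square",
        "Point Breeze",
        "Strip District",
        "Squirrel Hill South",
    ],
)

# Rank 1 is the highest ratio within each metric.
ranks = ratios.rank(ascending=False)

# Average the three ranks; a lower mean rank is better.
ratios["Mean Rank"] = ranks.mean(axis=1)
combined = ratios.sort_values(by="Mean Rank")
print(combined["Mean Rank"])
```

Averaging ranks weights the three metrics equally; a different weighting would be just as defensible and could change the order.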

One potential downside of our analysis is that the datasets all came from the same large survey in 2015, almost 10 years ago. We thought it was still appropriate to use this data because, even though the economy has changed since then, the relative relationships between neighborhoods likely have not. For example, housing prices have gone up since then, but we expect that houses in Squirrel Hill have appreciated at approximately the same rate as houses in Oakland, so both may have higher numbers while their relative values stay about the same.