
# Best Neighborhood in Pittsburgh

### Introduction

Our search for the best neighborhood in Pittsburgh is based on standard of living. The concept is broad and depends on many things, such as the economy, health care, and education. Because we are limited to datasets available on the WPRDC website, we looked for datasets that reflect some aspect of standard of living and that are reported by neighborhood.

### Datasets and metrics 
Our datasets include arrest data, housing unit values, and median age of death, each reported by neighborhood. The analysis below also uses household income by neighborhood.

1. Arrest incidents record where each arrest was made, which can reflect the level of crime in a neighborhood. This is not a perfect indicator, because some crimes go unreported and some offenders are never arrested. Even so, a significantly higher number of arrests in one neighborhood than in another probably says something about public safety in the area, and may be a symptom of wider socioeconomic issues in that neighborhood.

2. Housing unit values per neighborhood give us a measure of housing prices in the neighborhood. In most cases, a neighborhood with higher housing prices is a safer, wealthier neighborhood. Housing prices are usually closely related to crime, public services, and similar factors, so using housing unit values lets us capture many of the factors that make up a neighborhood in a single metric.


3. The median age of death in a neighborhood can also reflect standard of living. Life expectancy is an important factor when comparing countries' levels of development. It generally depends on the health care, nutrition, and lifestyle in a society, and is commonly lowered by instability and conflict. We can apply the same idea on a smaller scale when comparing neighborhoods within the city: a significantly lower median age of death can reflect a generally lower standard of living due to socioeconomic issues.




### Arrest data 

The arrest data lists an incident location and an incident neighborhood for each arrest. For this metric, we simply counted the incidents per neighborhood. The higher the count, the worse the neighborhood ranks. The area or population of a neighborhood could also be factored in to make the comparison proportional.

In [2]:
import pandas as pd

arrests = pd.read_csv("Arrest_data.csv")

# Count arrest records per neighborhood (value_counts skips missing neighborhoods)
neighborhood_counts = arrests['INCIDENTNEIGHBORHOOD'].value_counts().reset_index()

neighborhood_counts.columns = ['Neighborhood', 'Arrest Count']

# Fewest arrests first, since a lower count is treated as better
neighborhood_counts = neighborhood_counts.sort_values(by='Arrest Count', ascending=True)

neighborhood_counts.head(20)


| Index | Neighborhood | Arrest Count |
|---|---|---|
| 97 | Mt. Oliver Neighborhood | 2 |
| 96 | Troy Hill-Herrs Island | 6 |
| 95 | Mt. Oliver Boro | 18 |
| 94 | Central Northside | 23 |
| 93 | Regent Square | 37 |
| 92 | Ridgemont | 37 |
| 91 | New Homestead | 39 |
| 90 | Swisshelm Park | 43 |
| 89 | Chartiers City | 46 |
| 88 | East Carnegie | 48 |
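
The note above that population could be factored in can be sketched as a per-capita rate. This is a minimal sketch under stated assumptions, not part of the original analysis: the `population` table, its `Population` column, and all of the numbers are made up for illustration and do not come from any WPRDC dataset.

```python
import pandas as pd

# Made-up illustrative numbers, NOT taken from the WPRDC datasets
arrest_counts = pd.DataFrame({
    "Neighborhood": ["Example A", "Example B", "Example C"],
    "Arrest Count": [40, 400, 90],
})

# Hypothetical population table; a real one would have to come from another source
population = pd.DataFrame({
    "Neighborhood": ["Example A", "Example B", "Example C"],
    "Population": [500, 20000, 3000],
})

# Join on neighborhood name, keeping only neighborhoods present in both tables
merged = arrest_counts.merge(population, on="Neighborhood", how="inner")

# Arrests per 1,000 residents
merged["Arrests per 1,000"] = merged["Arrest Count"] / merged["Population"] * 1000

# Lowest rate first, mirroring the ascending sort used for raw counts
ranked = merged.sort_values(by="Arrests per 1,000", ascending=True)
print(ranked)
```

In this toy data, "Example A" has the fewest arrests but the highest rate, so the raw-count ranking and the per-capita ranking can disagree.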


### Household Income 

This dataset lists household incomes by income bracket for each neighborhood: the total number of households, followed by a column for each bracket. We measure a neighborhood's affluence by the number of households with an income of at least $100,000. This is an arbitrary threshold, but one that many Americans seem to see as a significant milestone. We then divide the number of households at or above 100K by the total number of households in the neighborhood, and compare these ratios so that population differences between neighborhoods do not distort the comparison.


In [4]:
incomes = pd.read_csv("household-income.csv")

# Income brackets at or above $100,000
above100k = [
    "Estimate; Total: - $100,000 to $124,999",
    "Estimate; Total: - $125,000 to $149,999",
    "Estimate; Total: - $150,000 to $199,999",
    "Estimate; Total: - $200,000 or more",
]

# Households in each neighborhood that fall in those brackets
incomes["Income Above 100k"] = incomes[above100k].sum(axis=1)

# Share of all households in the neighborhood that fall in those brackets
incomes["Total Households"] = incomes['Estimate; Total:']
incomes["Ratio Above 100k"] = incomes["Income Above 100k"] / incomes["Total Households"]

# Display the ratio for each neighborhood
result = incomes[['Neighborhood', 'Total Households', 'Income Above 100k', 'Ratio Above 100k']]
result = result.sort_values(by='Ratio Above 100k', ascending=False)
result.head(30)

| Index | Neighborhood | Total Households | Income Above 100k | Ratio Above 100k |
|---|---|---|---|---|
| 76 | Squirrel Hill North | 3370.0 | 1720.0 | 0.510386 |
| 80 | Strip District | 520.0 | 253.0 | 0.486538 |
| 63 | Point Breeze | 2342.0 | 1115.0 | 0.476089 |
| 66 | Regent Square | 476.0 | 197.0 | 0.413866 |
| 1 | Allegheny West | 146.0 | 50.0 | 0.342466 |
| 16 | Central Business District | 1968.0 | 640.0 | 0.325203 |
| 77 | Squirrel Hill South | 7211.0 | 2163.0 | 0.299958 |
| 57 | North Shore | 154.0 | 43.0 | 0.279221 |
| 72 | South Side Flats | 3311.0 | 888.0 | 0.268197 |
| 39 | Highland Park | 2977.0 | 733.0 | 0.246221 |
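
To see why the ratio is used rather than the raw count, the short check below compares two rows copied from the output above. It is only an illustration of the reasoning; the `sample` frame and the `by_count` and `by_ratio` names are introduced here and are not part of the original notebook.

```python
import pandas as pd

# Two rows copied from the output table above
sample = pd.DataFrame({
    "Neighborhood": ["Squirrel Hill North", "Squirrel Hill South"],
    "Total Households": [3370, 7211],
    "Income Above 100k": [1720, 2163],
})

# Same ratio as in the cell above: bracket households / all households
sample["Ratio Above 100k"] = sample["Income Above 100k"] / sample["Total Households"]

# Top neighborhood by raw count versus by ratio
by_count = sample.sort_values(by="Income Above 100k", ascending=False)["Neighborhood"].iloc[0]
by_ratio = sample.sort_values(by="Ratio Above 100k", ascending=False)["Neighborhood"].iloc[0]

print("Most households in the top brackets:", by_count)
print("Highest share of households in the top brackets:", by_ratio)
```

Squirrel Hill South has more households in the top brackets simply because it has more households overall, while Squirrel Hill North has the higher share.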


### Housing Unit Values

This dataset gives the number of housing units in each value bracket for each neighborhood. We chose \$250,000 as the cutoff between higher-value and lower-value houses. For each neighborhood, we count the houses worth \$250,000 or more and divide by the total number of houses the dataset lists for that neighborhood. The higher the resulting ratio, the better the neighborhood scores on this metric.

In [6]:
houses = pd.read_csv("Housing_Unit.csv")

# Select relevant columns for house values above $250,000
above250k = [
    "Estimate; Total: - $250,000 to $299,999",
    "Estimate; Total: - $300,000 to $399,999",
    "Estimate; Total: - $400,000 to $499,999",
    "Estimate; Total: - $500,000 to $749,999",
    "Estimate; Total: - $750,000 to $999,999",
    "Estimate; Total: - $1,000,000 to $1,499,999",
    "Estimate; Total: - $1,500,000 to $1,999,999",
    "Estimate; Total: - $2,000,000 or more"
]


# Sum the values above $250,000 for each neighborhood
houses['Houses_Above_250k'] = houses[above250k].sum(axis=1)

# Calculate the ratio of houses worth over $250,000 to total houses
houses['Total_Houses'] = houses['Estimate; Total:']
houses['Ratio_Above_250k'] = houses['Houses_Above_250k'] / houses['Total_Houses']

# Display the ratio for each neighborhood
result = houses[['Neighborhood', 'Total_Houses', 'Houses_Above_250k', 'Ratio_Above_250k']]
result = result.sort_values(by='Ratio_Above_250k', ascending=False)
result.head(30)

| Index | Neighborhood | Total_Houses | Houses_Above_250k | Ratio_Above_250k |
|---|---|---|---|---|
| 21 | Chateau | 3.0 | 3.0 | 1.0 |
| 76 | Squirrel Hill North | 1967.0 | 1508.0 | 0.76665 |
| 1 | Allegheny West | 69.0 | 47.0 | 0.681159 |
| 63 | Point Breeze | 1623.0 | 1065.0 | 0.656192 |
| 68 | Shadyside | 1936.0 | 1170.0 | 0.604339 |
| 66 | Regent Square | 343.0 | 193.0 | 0.562682 |
| 77 | Squirrel Hill South | 3152.0 | 1743.0 | 0.552982 |
| 39 | Highland Park | 1488.0 | 698.0 | 0.469086 |
| 16 | Central Business District | 397.0 | 175.0 | 0.440806 |
| 80 | Strip District | 72.0 | 29.0 | 0.402778 |
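
Chateau tops this list with a ratio of 1.0, but that ratio rests on only 3 houses. One possible way to keep very small neighborhoods from dominating the ranking is to require a minimum number of houses before a ratio is compared. The sketch below uses three rows copied from the output above; the `MIN_HOUSES` cutoff of 50 is a hypothetical value chosen for illustration, not something the original analysis applies.

```python
import pandas as pd

# Three rows copied from the output table above
sample = pd.DataFrame({
    "Neighborhood": ["Chateau", "Squirrel Hill North", "Allegheny West"],
    "Total_Houses": [3, 1967, 69],
    "Houses_Above_250k": [3, 1508, 47],
})
sample["Ratio_Above_250k"] = sample["Houses_Above_250k"] / sample["Total_Houses"]

# Hypothetical cutoff: ignore neighborhoods with fewer houses than this
MIN_HOUSES = 50

# Keep only neighborhoods with enough houses for the ratio to be meaningful
reliable = sample[sample["Total_Houses"] >= MIN_HOUSES]

# Rank the remaining neighborhoods, highest ratio first
top = reliable.sort_values(by="Ratio_Above_250k", ascending=False)
print(top[["Neighborhood", "Total_Houses", "Ratio_Above_250k"]])
```

With this cutoff Chateau drops out and Squirrel Hill North ranks first. The choice of cutoff is a judgment call and changes which neighborhoods are compared.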
