
# Best Neighborhood in Pittsburgh

### Introduction

Our search for the best neighborhood in Pittsburgh is based fundamentally on standard of living. This concept is wide-ranging and can depend on many things, such as the economy, health care, and education. Since we are limited to datasets available on the WPRDC website, we chose datasets that each reflect some aspect of standard of living and are reported by neighborhood.

### Datasets and metrics 
Our datasets include arrest data by neighborhood, housing unit values by neighborhood, and median age of death by neighborhood.

1. The arrest data records where each arrest was made, which can reflect the level of crime in a neighborhood. It is not a perfect indicator, since some crimes go unreported and some offenders are never arrested. Even so, a significantly higher number of arrests in one neighborhood than in another probably indicates a lower level of public safety there, and may be a symptom of other socioeconomic issues that are pervasive in that neighborhood.

2. Housing unit values per neighborhood give us a metric for housing prices in the neighborhood. In most cases, a neighborhood with higher housing prices is safer and wealthier. Housing prices are usually closely related to crime, public services, and similar factors, so using housing unit values lets us capture many of the factors that make up a neighborhood in a single measure.


3. The median age of death in a neighborhood can also reflect standard of living. When comparing countries' levels of development, life expectancy is an important factor. It generally depends on the healthcare, nutrition, and lifestyle in a society, and is commonly lowered by instability and conflict. We can apply the same idea on a smaller scale when comparing neighborhoods in the city. If a neighborhood has a significantly lower median age of death, this can reflect a generally lower standard of living due to socioeconomic issues.




### Arrest data 

The arrest data lists an incident location and an incident neighborhood for each arrest. For this metric, we simply count the incidents per neighborhood: the higher the count, the worse the neighborhood scores. The area or population of a neighborhood could also be factored in to make the counts proportional.

In [2]:
import pandas as pd

arrests = pd.read_csv("Arrest_data.csv")

# count the number of arrests recorded for each neighborhood
neighborhood_counts = arrests['INCIDENTNEIGHBORHOOD'].value_counts().reset_index()

neighborhood_counts.columns = ['Neighborhood', 'Arrest Count']

neighborhood_counts = neighborhood_counts.sort_values(by='Arrest Count', ascending=True)

neighborhood_counts.head(20)


| | Neighborhood | Arrest Count |
|---|---|---|
| 97 | Mt. Oliver Neighborhood | 2 |
| 96 | Troy Hill-Herrs Island | 6 |
| 95 | Mt. Oliver Boro | 18 |
| 94 | Central Northside | 23 |
| 92 | Regent Square | 37 |
| 93 | Ridgemont | 37 |
| 91 | New Homestead | 39 |
| 90 | Swisshelm Park | 43 |
| 89 | Chartiers City | 46 |
| 88 | East Carnegie | 48 |
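
The arrest section notes that population could be factored in for proportionality. A minimal sketch of that idea follows, assuming a separate population table exists. The sample values and the `Population` and `Arrests per 1,000` column names are hypothetical and are not taken from any WPRDC dataset.

```python
import pandas as pd

# Hypothetical arrest counts, shaped like neighborhood_counts above
arrest_counts = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Arrest Count": [50, 200, 10],
})

# Hypothetical population table (assumed, not part of the arrest data)
population = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Population": [1000, 10000, 0],
})

# Attach a population figure to every neighborhood that has arrests
merged = arrest_counts.merge(population, on="Neighborhood", how="left")

# Treat a population of zero as missing so the rate is NaN instead of infinite
pop = merged["Population"].where(merged["Population"] > 0)

# Arrests per 1,000 residents; lower is better
merged["Arrests per 1,000"] = merged["Arrest Count"] * 1000 / pop

# Rows with a missing rate are placed last by sort_values
merged = merged.sort_values(by="Arrests per 1,000", ascending=True)
print(merged)
```

With a rate like this, a large neighborhood with many arrests can rank ahead of a small one with fewer arrests, which the raw counts would not show.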


### Median death age 

This dataset lists the median age at death for white and black residents in each neighborhood of the city, so it is conveniently organized by neighborhood already. A higher median age at death is better and can be factored into a metric. We average the white and black values into one parameter. If one of the two fields has no data, we default to the one that does. If neither has data, we can assign that neighborhood the average of the whole set.
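
The averaging and fallback rules described above could be sketched as follows. This is only an illustration on made-up rows: the column names `white_median_age` and `black_median_age` are assumptions and may not match the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real column names may differ
deaths = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "white_median_age": [78.0, np.nan, 70.0, np.nan],
    "black_median_age": [72.0, 65.0, np.nan, np.nan],
})

# The row-wise mean skips NaN by default, so a neighborhood with only one
# value keeps that value, and a row with no data at all stays NaN
age_cols = ["white_median_age", "black_median_age"]
deaths["median_death_age"] = deaths[age_cols].mean(axis=1)

# Neighborhoods with no data get the average over all neighborhoods that have data
overall_mean = deaths["median_death_age"].mean()
deaths["median_death_age"] = deaths["median_death_age"].fillna(overall_mean)

print(deaths[["Neighborhood", "median_death_age"]])
```

Relying on the default `skipna=True` behavior of `mean` covers both the "average the two" and the "default to the one that has data" cases in a single step.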


In [3]:
air_quality = pd.read_csv("Air_Qualitydata.csv")
air_quality.head() 

FileNotFoundError: [Errno 2] No such file or directory: 'Air_Qualitydata.csv'

### Housing unit values

This dataset consists of counts of housing units in a set of value ranges for each neighborhood. We chose \$250,000 as the cut-off between a good and a bad house. For each neighborhood, we count the houses worth \$250,000 or more and divide by the total number of houses in the dataset for that neighborhood. This gives us a ratio; the higher the ratio, the better.

In [13]:
# Use a separate name so the arrest DataFrame is not overwritten
housing = pd.read_csv("Housing_Unit.csv")

# Columns for house values of $250,000 or more
above250k = [
    "Estimate; Total: - $250,000 to $299,999",
    "Estimate; Total: - $300,000 to $399,999",
    "Estimate; Total: - $400,000 to $499,999",
    "Estimate; Total: - $500,000 to $749,999",
    "Estimate; Total: - $750,000 to $999,999",
    "Estimate; Total: - $1,000,000 to $1,499,999",
    "Estimate; Total: - $1,500,000 to $1,999,999",
    "Estimate; Total: - $2,000,000 or more"
]


# Sum the counts in the $250,000-and-up ranges for each neighborhood
housing['Houses_Above_250k'] = housing[above250k].sum(axis=1)

# Calculate the ratio of houses worth $250,000 or more to total houses
# (neighborhoods with zero houses get NaN)
housing['Total_Houses'] = housing['Estimate; Total:']
housing['Ratio_Above_250k'] = housing['Houses_Above_250k'] / housing['Total_Houses']

# Display the ratio for each neighborhood, highest first
result = housing[['Neighborhood', 'Total_Houses', 'Houses_Above_250k', 'Ratio_Above_250k']]
result = result.sort_values(by='Ratio_Above_250k', ascending=False)
result

| | Neighborhood | Total_Houses | Houses_Above_250k | Ratio_Above_250k |
|---|---|---|---|---|
| 21 | Chateau | 3.0 | 3.0 | 1.000000 |
| 76 | Squirrel Hill North | 1967.0 | 1508.0 | 0.766650 |
| 1 | Allegheny West | 69.0 | 47.0 | 0.681159 |
| 63 | Point Breeze | 1623.0 | 1065.0 | 0.656192 |
| 68 | Shadyside | 1936.0 | 1170.0 | 0.604339 |
| ... | ... | ... | ... | ... |
| 90 | Windgap | 537.0 | 0.0 | 0.000000 |
| 4 | Arlington Heights | 0.0 | 0.0 | |
| 35 | Glen Hazel | 0.0 | 0.0 | |
| 57 | North Shore | 0.0 | 0.0 | |
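
In the output above, neighborhoods with zero houses have no ratio, and a neighborhood with only three houses tops the ranking. One possible way to handle both cases is to rank only neighborhoods with enough houses, as in the sketch below. The `MIN_HOUSES` threshold of 30 is a hypothetical choice for illustration, not something the dataset or our method prescribes.

```python
import pandas as pd

# Hypothetical rows in the same shape as the result table above
housing_sample = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Total_Houses": [3.0, 1967.0, 0.0],
    "Houses_Above_250k": [3.0, 1508.0, 0.0],
})

# Assumed minimum sample size; tune or drop as needed
MIN_HOUSES = 30

# Keep the ratio only where there are enough houses to make it meaningful
enough = housing_sample["Total_Houses"] >= MIN_HOUSES
ratio = housing_sample["Houses_Above_250k"] / housing_sample["Total_Houses"]
housing_sample["Ratio_Above_250k"] = ratio.where(enough)

# Rank only the neighborhoods that have a usable ratio
ranked = (
    housing_sample.dropna(subset=["Ratio_Above_250k"])
    .sort_values(by="Ratio_Above_250k", ascending=False)
)
print(ranked)
```

Whether to exclude small neighborhoods or keep them with a caveat is a judgment call; the sketch only shows one way to make that choice explicit.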
