# Finding the Best Neighborhood in Pittsburgh
#### Team Pythons: Ethan Rubenstein, Gray Cleric, Teresa Davison

## In this project, we will consider three factors to determine which neighborhood in Pittsburgh is the best: number of arrests, property values, and number of restaurants. To standardize these factors, one overarching metric will be used. Each zip code will be assigned one to five points per factor, based on which of five percentile brackets (quintiles) its data falls into. For example, a zip code in the 99th percentile of property values will earn five points, while a zip code in the 19th percentile of property values will earn one point. Arrests are scored in reverse, so the zip codes with the fewest arrests earn the most points.
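
The point tables loaded below already contain a `Points` column, and the code that produced it is not shown. As a rough sketch of how such quintile scoring could be done, assuming a hypothetical raw column named `MedianValue` and made-up values, `pd.qcut` can split ranked values into five equal brackets. The helper name `quintile_points` is our own and is not part of pandas.

```python
import pandas as pd

# Hypothetical raw values for illustration only; not taken from the real datasets.
raw = pd.DataFrame({
    "Zip": [15001, 15002, 15003, 15004, 15005, 15006, 15007, 15008, 15009, 15010],
    "MedianValue": [50, 60, 70, 80, 90, 100, 110, 120, 130, 140],
})

def quintile_points(values, higher_is_better=True):
    """Assign 1-5 points by quintile (hypothetical helper)."""
    # Rank first so tied values cannot produce duplicate bin edges in qcut.
    # With ascending=False, the largest value gets rank 1 and so the fewest points.
    ranks = values.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, q=5, labels=[1, 2, 3, 4, 5]).astype(int)

# Property values and restaurants: higher is better.
value_points = quintile_points(raw["MedianValue"])

# Arrests: lower is better, so the scale is reversed.
reversed_points = quintile_points(raw["MedianValue"], higher_is_better=False)

print(raw.assign(Points=value_points))
```

Ranking before binning is a design choice in this sketch: it guarantees five equally sized brackets even when many zip codes share the same value, at the cost of splitting ties arbitrarily.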

In [60]:
import pandas as pd

In [61]:
crime_data = pd.read_csv('crime_data.csv')
property_data = pd.read_csv('property_data.csv')
restaurant_data = pd.read_csv('restaurant_data.csv')

## We hypothesize that at least one zip code will earn the maximum of 15 points. As a result, we will first look only at zip codes that earned five points in a category. We will then combine them into one data frame and count how many times each zip code appears, to see whether any earned five points in every category.

In [62]:
# Finding all the zip codes where the total number of arrests is the lowest (5 points)
crime_data = crime_data.astype(int)
crime_data_5s = crime_data.loc[crime_data['Points'] == 5]
crime_data_5s.head(5)

Unnamed: 0,Zip,Points
78,14219,5
79,15056,5
80,15071,5
81,14614,5
82,16503,5


In [63]:
# Finding all the zip codes where property values are the highest (5 points)
property_data = property_data.astype(int)
property_data_5s = property_data.loc[property_data['Points'] == 5]
property_data_5s.head(5)

Unnamed: 0,Zip,Points
0,15275,5
1,15276,5
2,15222,5
3,15086,5
4,15142,5


In [64]:
# Finding the zip codes with the most restaurants (5 points)
restaurant_data = restaurant_data.astype(int)
restaurant_data_5s = restaurant_data.loc[restaurant_data['Points'] == 5]
restaurant_data_5s.head(5)

Unnamed: 0,Zip,Points
111,15025,5
112,15220,5
113,15227,5
114,15102,5
115,15201,5


In [65]:
# Stacking the five-point rows from the property and crime data into one dataframe
crime_property_data_5s = pd.concat([property_data_5s,crime_data_5s], axis=0,ignore_index=True)
crime_property_data_5s

Unnamed: 0,Zip,Points
0,15275,5
1,15276,5
2,15222,5
3,15086,5
4,15142,5
...,...,...
59,15126,5
60,16693,5
61,16239,5
62,16830,5


In [66]:
# Making the final dataframe that includes all cases of zip codes that earned five points in any metric
final_data1 = pd.concat([crime_property_data_5s,restaurant_data_5s], axis=0,ignore_index=True)
final_data1.head(5)

Unnamed: 0,Zip,Points
0,15275,5
1,15276,5
2,15222,5
3,15086,5
4,15142,5


In [67]:
# Counting the zip codes to see which one(s) earned the most five-pointers
final_data1['Zip'].value_counts().head(10)


15219    2
15044    2
15212    2
15213    2
15222    2
15017    2
15217    2
15015    1
15201    1
15071    1
Name: Zip, dtype: int64

## This isn't what we expected. Seven zip codes tied for first, each earning five points in two of the three categories. Now, we will narrow the search to those seven by looking up their rows in each dataset and adding their points together.


| Zipcode | Neighborhood |
|---|---|
| 15219 | Uptown, Herron Hill, & Schenley Heights |
| ~~15017~~ | ~~Bridgeville~~ |
| ~~15044~~ | ~~Gibsonia~~ |
| 15212 | Northside |
| 15213 | Oakland & Bellefield |
| 15222 | Downtown |
| 15217 | Squirrel Hill, Greenfield, & Browns Hill |

## As you may notice, Gibsonia and Bridgeville are struck out. That is because, while they are located in Allegheny County, those zip codes are not located within Pittsburgh itself, so they will be removed from the search.
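
The same exclusion could be done in code rather than by hand. This is a minimal sketch, assuming the seven tied zip codes are typed in directly and that the list of out-of-city zip codes (the hypothetical name `outside_city`) is maintained manually.

```python
import pandas as pd

# The seven zip codes that tied in the count above
tied = pd.Series([15219, 15044, 15212, 15213, 15222, 15017, 15217], name="Zip")

# Zip codes judged to lie outside Pittsburgh (Bridgeville and Gibsonia)
outside_city = [15017, 15044]

# Keep only the zip codes that are not in the exclusion list
finalists = tied[~tied.isin(outside_city)].tolist()
print(finalists)
```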

## Now, we'll look to see if one zip code has more points than any of the others.

In [68]:
crime_data.query('Zip in (15219, 15212, 15213, 15222, 15217)')

Unnamed: 0,Zip,Points
3,15217,1
12,15213,1
16,15222,1
21,15212,1
22,15219,1


In [69]:
property_data.query('Zip in (15219, 15212, 15213, 15222, 15217)')

Unnamed: 0,Zip,Points
2,15222,5
5,15219,5
13,15213,5
14,15217,5
22,15212,5


In [70]:
restaurant_data.query('Zip in (15219, 15212, 15213, 15222, 15217)')

Unnamed: 0,Zip,Points
119,15217,5
135,15219,5
136,15213,5
137,15212,5
138,15222,5


## Wow, it looks like they all end up with the same number of points: 11 each (1 for arrests, 5 for property values, and 5 for restaurants). Let's check whether any zip code beat that with 12 points by earning 4 points in each metric.

In [71]:
restaurant_data_4s = restaurant_data.loc[restaurant_data['Points'] == 4]
property_data_4s = property_data.loc[property_data['Points'] == 4]
crime_data_4s = crime_data.loc[crime_data['Points'] == 4]

crime_property_data_4s = pd.concat([property_data_4s,crime_data_4s], axis=0,ignore_index=True)

final_data2 = pd.concat([crime_property_data_4s,restaurant_data_4s], axis=0,ignore_index=True)
final_data2['Zip'].value_counts().head(10)


15102    2
15116    2
15101    2
15215    2
15084    2
15241    2
15668    2
15143    2
15231    2
15243    1
Name: Zip, dtype: int64

## No zip code appears three times, so none earned 12 points this way. The five zip codes that we saw previously have an equal claim to being crowned the best neighborhood.

## We could continue to check that no other combination of points (like 3 in arrests, 4 in property value, and 5 in restaurants) adds up to more than 11, but at this point, something strange has come up. All five zip codes with 5 points in restaurants and property values received a score of 1 in crime. Let's look closer at how the crime metric works.
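
For reference, every combination could be checked at once by joining the three score tables on `Zip` and summing the points, instead of testing one point value at a time. The sketch below uses small stand-in tables: the scores for 15217 and 15222 match the outputs above, while the property score for 15056 is invented for illustration. The column names `CrimePoints`, `PropertyPoints`, and `RestaurantPoints` are our own.

```python
import pandas as pd

# Small stand-ins for crime_data, property_data, and restaurant_data
crime = pd.DataFrame({"Zip": [15217, 15222, 15056], "Points": [1, 1, 5]})
prop = pd.DataFrame({"Zip": [15217, 15222, 15056], "Points": [5, 5, 2]})  # 15056 value is made up
rest = pd.DataFrame({"Zip": [15217, 15222], "Points": [5, 5]})

# An inner join keeps only zip codes that appear in all three tables
totals = (
    crime.rename(columns={"Points": "CrimePoints"})
    .merge(prop.rename(columns={"Points": "PropertyPoints"}), on="Zip", how="inner")
    .merge(rest.rename(columns={"Points": "RestaurantPoints"}), on="Zip", how="inner")
)

# Sum the three scores and rank zip codes from best to worst
score_cols = ["CrimePoints", "PropertyPoints", "RestaurantPoints"]
totals["Total"] = totals[score_cols].sum(axis=1)
totals = totals.sort_values("Total", ascending=False).reset_index(drop=True)
print(totals)
```

With the real tables in place of the stand-ins, the first row of `totals` would show the highest total directly. Note that the inner join drops any zip code missing from one of the datasets, which may or may not be the desired behavior.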

In [72]:
crime_data_5s.head(5)

Unnamed: 0,Zip,Points
78,14219,5
79,15056,5
80,15071,5
81,14614,5
82,16503,5


## The zip codes that earned 5 points in arrests tend to be rural, smaller towns. This works in their favor, because the points system for arrests counts the total number of arrests and then breaks those totals into five brackets. Smaller towns are likely to have fewer recorded arrests, thus earning more points than more populated neighborhoods. To combat this bias, we can control for population and score based on arrest rates, not total recorded arrests.

## For brevity, instead of making a new dataframe and going through the process of sorting and scoring the data again, we will determine which of the five zip codes has the lowest arrest rate.

| Zip | Population | Arrests since 2016 | Arrests/Resident |
|---|---|---|---|
| 15217 | 26,190 | 601 | .023 |
| 15219 | 16,514 | 14,814 | .897 |
| 15213 | 28,265 | 924 | .033 |
| 15212 | 26,634 | 5,708 | .214 |
| 15222 | 4,700 | 1,822 | .388 |

## Using the table above, we determine that Squirrel Hill, Greenfield, & Browns Hill (15217) are the best neighborhoods, with Oakland & Bellefield (15213) coming in a close second place.
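
The rates in the table can be reproduced with a short calculation. This sketch types the population and arrest figures in from the table; the column names are our own.

```python
import pandas as pd

# Figures copied from the table above
rates = pd.DataFrame({
    "Zip": [15217, 15219, 15213, 15212, 15222],
    "Population": [26190, 16514, 28265, 26634, 4700],
    "Arrests": [601, 14814, 924, 5708, 1822],
})

# Arrests per resident, rounded to three decimal places as in the table
rates["ArrestsPerResident"] = (rates["Arrests"] / rates["Population"]).round(3)

# Lowest arrest rate first
ranked = rates.sort_values("ArrestsPerResident").reset_index(drop=True)
print(ranked)
```

Sorting by the rate rather than the raw arrest count is what separates the five finalists, which the total-arrest brackets could not do.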

# Comparing the Overall Conclusion to Individual Conclusions

## Factor: Arrests
### text

## Factor: Property Values
### text

## Factor: Restaurants
### text