# The Best Neighborhood in Pittsburgh

Our task was to figure out which neighborhood in Pittsburgh, PA is the best out of them all. 

## Metric

We decided that the best way to quantify the 'best' neighborhood is to find the happiest one! To measure happiness, we looked at a few key factors that contribute to a community's happiness. Our goal is to boil these down to a single number that says how happy any given neighborhood is.

Let's set up the Jupyter notebook using code, algorithms, and magic:

In [None]:
import sys
import pandas as pd
!{sys.executable} -m pip install geopy --user
from geopy.geocoders import Nominatim
import geopy
%matplotlib inline
import matplotlib.pyplot as plt

## Key Happiness Factor

When it comes to happiness, leisure is a top factor. Within leisure, we found multiple amenities that contributed towards the possible leisure of a neighborhood.

## Leisure

One amenity is recreation, such as parks and playgrounds. We all know that `more slides = more happy`. So, we looked at the number of documented slides in every neighborhood. We included two types of slides.

#### Slide Type 1: Playground Slides

Who didn't love going down the slides at a playground as a kid?

Let's grab the documented playground slides from the WPRDC:

In [None]:
playgrounds = pd.read_csv("https://data.wprdc.org/dataset/640add54-b0e1-4abb-a232-f5092b243ee0/resource/40097711-aa25-47d9-b0fb-920cace3afa0/download/opendata-pubworks-play-area-listing-2015.csv")
playgrounds.head(3)

That's not very useful on its own, so let's use pandas to group the play areas by neighborhood:

In [None]:
playgrounds.groupby('Neighborhood').count().head(3)

Better! But not quite. `count()` reports the number of non-missing values in every column, so the numbers differ from column to column wherever data is missing. However, each play area is named exactly once, so counting the play area names gives the number of play areas in a given neighborhood.

In [None]:
playgrounds.groupby("Neighborhood").count().loc[:,"Play area name"][:7]
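
To see why the per-column counts can disagree, here is a minimal sketch on a made-up table (the names and values are hypothetical, not taken from the WPRDC data). `count()` tallies non-missing values, so a column with a gap comes out lower than the name column:

```python
import pandas as pd

# Hypothetical toy data: three play areas, one of them missing its street.
toy = pd.DataFrame({
    "Play area name": ["Alpha Park", "Beta Lot", "Gamma Field"],
    "Neighborhood": ["Shadyside", "Shadyside", "Oakland"],
    "Street": ["Main St", None, "Forbes Ave"],
})

# Counting every column: "Street" undercounts Shadyside because of the gap.
per_column = toy.groupby("Neighborhood").count()

# Counting only the name column gives the number of play areas.
areas = toy.groupby("Neighborhood")["Play area name"].count()

print(per_column)
print(areas)
```

Selecting the name column before counting gives the same answer as the cell above and skips counting the columns we don't need.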

And we can sort it:

In [None]:
playground_series = playgrounds.groupby("Neighborhood").count().loc[:,"Play area name"].sort_values(ascending=False)
playground_series[:7]

Let's scale it to a value between 0 and 1 by dividing by the largest count, so we can combine it with the other factors later.

In [None]:
# The series is sorted in descending order, so the first entry is the maximum.
max_val = playground_series.iloc[0]
playground_values = playground_series.divide(max_val)
playground_values[:7]
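
The scaling step is just a division by the largest value. A minimal sketch, using a hypothetical helper `scale_by_max` and made-up counts, shows the idea:

```python
import pandas as pd

def scale_by_max(counts: pd.Series) -> pd.Series:
    """Divide every value by the largest one, so the top entry scores 1.0."""
    return counts / counts.max()

# Made-up counts for three placeholder neighborhoods.
toy_counts = pd.Series({"A": 4, "B": 2, "C": 1})
scaled = scale_by_max(toy_counts)
print(scaled)
```

Using `.max()` means the result does not depend on the series being sorted first.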



---

#### Slide Type 2: Landslides

Slides are more fun when they're on the ground. So, we got the dataset for _land_slides. Let's take a look.

In [None]:
landslides = pd.read_csv("https://data.wprdc.org/dataset/7db7daf4-1fcc-4ad6-ad5e-6ed21a45b154/resource/dde1f413-c849-413c-b791-0f861bf219ce/download/globallandslides.csv")
# Keep only the rows whose gaz_point is Pittsburgh.
pgh_landslides = landslides[landslides['gaz_point'] == 'Pittsburgh']
pgh_landslides.head(3)

Cool! But what we really need from this dataset are the latitude and longitude. We can then use the geopy package that we installed earlier to get the neighborhood names from the coordinates.

In [None]:
geolocator = Nominatim(user_agent="cmpinf0999-bdr")

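# Pair up the coordinates as (latitude, longitude) tuples, which reverse() accepts directly.
locations = list(zip(pgh_landslides['latitude'], pgh_landslides['longitude']))

neighborhoods = []
for loc in locations:
    # The address comes back as one comma-separated string.
    response = str(geolocator.reverse(loc))
    #print(response)
    info = response.split(', ')
    if len(info[0]) == 4 or len(info[0]) == 3:
        # A 3- or 4-character first field is most likely a house number,
        # which puts the neighborhood in the third field.
        neighborhoods.append(info[2])
    else:
        # Otherwise take the field two places before the county (never the first field).
        neighborhoods.append(info[max(info.index("Allegheny County") - 2,1)])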
neighborhoods[:10]
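
Splitting the address string by position is fragile, because the fields shift depending on what the geocoder knows about a spot. As a hedged alternative, here is a minimal sketch that reads a structured address dict instead. The helper `pick_neighborhood`, the key order, and the sample dict are all assumptions made for illustration; which keys actually appear varies by location.

```python
from typing import Optional

# Keys tried in order. Which ones the geocoder fills in varies by location,
# so this ordering is an assumption, not a documented guarantee.
CANDIDATE_KEYS = ("neighbourhood", "suburb", "city_district")

def pick_neighborhood(address: dict) -> Optional[str]:
    """Return the first neighborhood-like field present in an address dict."""
    for key in CANDIDATE_KEYS:
        if key in address:
            return address[key]
    return None

# Hypothetical address dict, shaped like the structured part of a
# reverse-geocoding result.
sample_address = {
    "road": "Forbes Avenue",
    "neighbourhood": "Squirrel Hill North",
    "city": "Pittsburgh",
    "county": "Allegheny County",
}
print(pick_neighborhood(sample_address))
```

With geopy, a dict like this should be available as `geolocator.reverse(loc).raw["address"]`, assuming the service returns address details for the point. Returning `None` for unmatched points makes the misses easy to spot and drop before counting.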

Now, we can count the number of landslides that happened in each neighborhood.

In [None]:
landslide_counts = pd.Series(neighborhoods).value_counts()
landslide_counts[:7]

Let's scale it down again.

In [None]:
# value_counts() sorts in descending order, so the first entry is the maximum.
max_val = landslide_counts.iloc[0]
landslide_values = landslide_counts.divide(max_val)
landslide_values[:7]

## Steps

Another amenity we looked at was the number of steps in every neighborhood. Since the average human being hates steps, we concluded that `fewer steps = more happy`. One might argue that some people _do_ like steps, such as Olympians, because exercise! That person would be right, **but** we also concluded that the number of Olympians living in Pittsburgh's neighborhoods is negligible.

There are a total of 33 Olympic athletes who were born in Pittsburgh (https://www.sports-reference.com/olympics/friv/birthplaces.cgi?id=7645). With Pittsburgh's current population of 302,000, Olympians account for at most about 0.01% of the population (http://worldpopulationreview.com/us-cities/pittsburgh-population/). Therefore, we can count steps as a measure of unhappiness.

In [None]:
providers = pd.read_csv("https://data.wprdc.org/dataset/ae1f7cda-5e15-4a8a-a5b2-2e4803f1500a/resource/c2df1e6f-5563-4e53-9de8-b0e4c7d2cb93/download/pittsburghispsbyblock.csv",index_col='Provider_Id')
steps = pd.read_csv("https://data.wprdc.org/datastore/dump/43f40ca4-2211-4a12-8b4f-4d052662bb64",index_col='id')

In [None]:
# Select the numeric column before summing so text columns are never summed.
steps.groupby("neighborhood")['number_of_steps'].sum().sort_values(ascending=False)

In [None]:

# Total the steps per neighborhood; the result is indexed by neighborhood name.
stepsDF = steps.groupby('neighborhood')[['number_of_steps']].sum()
stepsDF = stepsDF.sort_values('number_of_steps', ascending=False)
stepsDF

In [None]:
# Scale by a hard-coded constant (apparently the largest neighborhood total) to get a 0-1 metric.
stepsDF['metric'] = stepsDF['number_of_steps']/3666.0
stepsDF

In [None]:
# The 'seaborn' style was renamed 'seaborn-v0_8' in matplotlib 3.6.
plt.style.use('seaborn-v0_8')
stepsDF["number_of_steps"].plot.bar(figsize = (30, 4))

## Internet

Up next, an amenity that we can all relate to is _internet speed_. Faster internet makes people happier, so we added up the advertised download and upload speeds (`MaxAdDown` and `MaxAdUp`) across each neighborhood's listings.

In [None]:
providers.head()

In [None]:
providersDF = providers.filter(items = ['Neighborhood', 'MaxAdDown', 'MaxAdUp'])
# Sum the speeds per neighborhood and rank by download speed.
providersDF = providersDF.groupby("Neighborhood").sum().sort_values(['MaxAdDown'], ascending = False)
# Scale by hard-coded constants (apparently the largest totals) to get 0-1 metrics.
providersDF['metricDown'] = providersDF['MaxAdDown']/80796.768
providersDF['metricUp'] = providersDF['MaxAdUp']/32158.964

providersDF
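
Note that a sum grows with the number of listings in a neighborhood, while a mean does not. A minimal sketch on made-up numbers (the neighborhoods "A" and "B" and their speeds are hypothetical) shows how the two can rank neighborhoods differently:

```python
import pandas as pd

# Hypothetical listings: "A" has three slow listings, "B" has one fast one.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "MaxAdDown": [10.0, 10.0, 10.0, 25.0],
})

by_sum = toy.groupby("Neighborhood")["MaxAdDown"].sum()
by_mean = toy.groupby("Neighborhood")["MaxAdDown"].mean()

print(by_sum.idxmax())   # neighborhood with the largest total
print(by_mean.idxmax())  # neighborhood with the largest average
```

Which aggregate is the better happiness signal is a judgment call; the sum rewards coverage, the mean rewards typical speed.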

In [None]:
providersDF["MaxAdUp"].plot.bar(figsize = (25, 4))

# Conclusions

Add all them numbers up! Slides and internet speed add to a neighborhood's happiness, and steps subtract from it.

In [None]:
# add slides together
agg_slide_vals = playground_values.add(landslide_values, fill_value=0)
agg_slide_vals.head()

In [None]:
# sort so we see who's on top
agg_slide_vals = agg_slide_vals.sort_values(ascending=False)
agg_slide_vals.head(10)

In [None]:
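# Internet speed adds to happiness; steps subtract from it.
total = agg_slide_vals.add(providersDF.loc[:,'metricDown'], fill_value=0)
# fill_value=0 keeps neighborhoods with no step data from turning into NaN.
total = total.subtract(stepsDF.loc[:,'metric'], fill_value=0)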
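
One thing to watch when combining Series whose indexes don't match: any label missing from one side becomes `NaN` unless a `fill_value` is given. A minimal sketch with made-up scores (the labels and numbers are hypothetical):

```python
import pandas as pd

# Hypothetical scores: "B" has no step data at all.
positives = pd.Series({"A": 1.5, "B": 0.8})
step_penalty = pd.Series({"A": 0.5})

# Without fill_value, the unmatched label "B" becomes NaN.
plain = positives.subtract(step_penalty)

# With fill_value=0, a missing penalty is treated as zero.
filled = positives.subtract(step_penalty, fill_value=0)

print(plain)
print(filled)
```

A `NaN` score sorts to the bottom of the ranking, so a neighborhood with no recorded steps would quietly drop out of the top ten.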

In [None]:
# The 'seaborn' style was renamed 'seaborn-v0_8' in matplotlib 3.6.
plt.style.use('seaborn-v0_8')
total.plot.bar(figsize = (30, 4))

In [None]:
total = total.sort_values(ascending=False)
total[:10].plot.bar()

In [None]:
total[:10]