<img align=center src = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a8/PittSkyline082904.jpg/800px-PittSkyline082904.jpg" width = 400>

<h1 align=center><font size = 5>An Analysis of Pittsburgh Neighborhoods: A Guide for Incoming Homeowners</font></h1>

### Introduction

Once a city of dreary, smog-covered skies and droning steel mills, Pittsburgh has transformed over the last couple of decades into a booming city of healthcare, technology, and innovation. Attracting new offices from tech companies such as Facebook, Uber, and Google, it is becoming a hotspot for millennials seeking jobs in growing industries. Pittsburgh was already a powerhouse in healthcare and engineering, with US Steel, the UPMC health network, and various pharmaceutical companies, so it holds its own with a diverse portfolio of desirable jobs.

Not only does it have the jobs, it also has the attractions to keep the populace entertained. Long known for its music, the Carnegie Theater still boasts some of the greatest performances in the nation. The sports arenas are some of the most beautiful around. Sitting on the edge of the three rivers the city is known for, they have been home to many championships, rightly earning the city its nickname, the "City of Champions". It has even started to attract more large-budget film productions, such as The Dark Knight Rises, Jack Reacher, and The Perks of Being a Wallflower.

With all these enticing options, it is no wonder that people are flocking to Pittsburgh and its surrounding areas. I grew up there before life took me across the country, and now that I am planning to head back, I have been wondering how the various neighborhoods stack up. With this project I aim to compare the neighborhoods, looking at their local attractions and housing costs to see how they rank against each other. Hopefully this analysis will help others who are considering the move as well!

### Data Sets Background

In order to analyze the city, I first needed to collect the neighborhoods and their coordinates. Using the Wikipedia page for the <a href='https://en.wikipedia.org/wiki/List_of_Pittsburgh_neighborhoods'>Neighborhoods of Pittsburgh</a>, I was able to find the neighborhoods and their information. I organized them into a CSV file attached here.

First I used my CSV file and the Google Maps Geocoding API to get the coordinates for each neighborhood.

In [2]:
#import required libraries
import numpy as np
import pandas as pd

#read neighborhood data
neighborhoods_df=pd.read_csv(r"PittsburghNeighborhoods.csv")
neighborhoods_df["Latitude"]='Not Assigned'
neighborhoods_df["Longitude"]='Not Assigned'
neighborhoods_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Allegheny Center,Not Assigned,Not Assigned
1,Allegheny West,Not Assigned,Not Assigned
2,Allentown,Not Assigned,Not Assigned
3,Arlington,Not Assigned,Not Assigned
4,Arlington Heights,Not Assigned,Not Assigned


In [3]:
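#Define the function to get the coordinates for each neighborhood
import os
import requests

def get_lat_lng(neighborhood):
    #The neighborhood name is expected with spaces already replaced by "+"
    address=str(neighborhood)+',+Pittsburgh,+PA'
    #Read the key from the environment rather than hard-coding it in the notebook
    #(the variable name GOOGLE_MAPS_API_KEY is an assumption; use whatever name you export)
    API_KEY=os.environ['GOOGLE_MAPS_API_KEY']
    URL='https://maps.googleapis.com/maps/api/geocode/json?address='+address+'&key='+API_KEY
    response=requests.get(URL)
    response=response.json()
    #Take the first (best) match returned by the geocoder
    lat=response["results"][0]["geometry"]["location"]["lat"]
    lng=response["results"][0]["geometry"]["location"]["lng"]
    return(lat, lng)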

#Test our function
lat, lng=get_lat_lng('South+Oakland')
print(lat)
print(lng)

40.4319875
-79.9598383
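
The function above assumes the geocoder always returns at least one match. As a minimal sketch of a more defensive variant, the helper below (`extract_lat_lng` is a hypothetical name, not part of the original notebook) checks the `status` field and the `results` list before indexing. The two payloads are hand-written stand-ins that mimic only the fields `get_lat_lng` reads, so the example runs without a network call.

```python
def extract_lat_lng(payload):
    """Return (lat, lng) from a Geocoding-style JSON payload, or None if absent."""
    results = payload.get("results") or []
    if payload.get("status") != "OK" or not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written payloads (assumptions) that mimic the fields used by get_lat_lng
found = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 40.4319875, "lng": -79.9598383}}}],
}
missing = {"status": "ZERO_RESULTS", "results": []}

print(extract_lat_lng(found))    # coordinates of the first match
print(extract_lat_lng(missing))  # None when nothing was found
```

Returning `None` lets the calling loop decide whether to skip, retry, or flag a neighborhood instead of stopping on an `IndexError`.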


In [4]:
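#Get the coordinates for each neighborhood
for i, row in neighborhoods_df.iterrows():
    address=row["Neighborhood"]
    address=address.replace(" ", "+")
    lat, lng=get_lat_lng(address)
    #Write back through the dataframe: the row from iterrows() may be a copy
    neighborhoods_df.at[i, "Latitude"]=lat
    neighborhoods_df.at[i, "Longitude"]=lng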

In [5]:
#Check out our new dataframe

neighborhoods_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Allegheny Center,40.4538,-80.0074
1,Allegheny West,40.452,-80.0158
2,Allentown,40.4223,-79.9934
3,Arlington,40.4153,-79.971
4,Arlington Heights,40.4169,-79.9598


In [115]:
#Now we will use our neighborhoods data to get a map of the neighborhoods in Pittsburgh

from geopy.geocoders import Nominatim
import folium

#Create the map
address="Pittsburgh, PA"
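geolocator=Nominatim(user_agent="pittsburgh_explorer") #recent geopy versions require a user_agent; this name is arbitrary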
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude

map_pittsburgh=folium.Map(location=[latitude, longitude], zoom_start=12)

#Add the neighborhoods to the map
for lat, lng, neighborhood in zip(neighborhoods_df['Latitude'], neighborhoods_df['Longitude'], neighborhoods_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_pittsburgh)
    
map_pittsburgh

#### Methodology

Using the dataframe of neighborhoods and their coordinates from above, we can now use the Foursquare API to explore each neighborhood of Pittsburgh and find out what sort of establishments each has in abundance. Through this, we can see what sort of neighborhood each is. Is it richer in coffeehouses and restaurants, or is it more about bars and storefronts? This will give us a good idea of what to expect in each. We can then cluster the neighborhoods with k-means and plot the clusters with the Folium library to see if there is a better way to categorize each part of the city.

In [7]:
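#Now to use Foursquare to explore the neighbourhoods
import os
#Read the credentials from the environment rather than hard-coding them
#(the variable names below are assumptions; use whatever names you export)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare secret
VERSION = '20180605' # Foursquare API version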

In [8]:
#To make it easy, we will refer to the neighborhoods dataframe as pittsburgh_data from here on
pittsburgh_data=neighborhoods_df

In [86]:
#As an example, let's start out by looking at South Oakland
neighborhood=neighborhoods_df.loc[71, "Neighborhood"]
neighborhood_latitude=neighborhoods_df.loc[71, "Latitude"]
neighborhood_longitude=neighborhoods_df.loc[71, "Longitude"]

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 550 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
#Sends the get request for the venue info
results = requests.get(url).json()
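
The request URL above is assembled by hand with `str.format`. As a sketch of an alternative, the same query string can be built with the standard library's `urlencode`, which percent-encodes each value. The function name `build_explore_url` and the placeholder credentials `"ID"` and `"SECRET"` are assumptions for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a venues/explore URL with percent-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues returned
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; real ones should come from the environment
example_url = build_explore_url("ID", "SECRET", "20180605", 40.4319875, -79.9598383, 550, 100)
print(example_url)
```

With `requests`, passing the same dictionary as `requests.get(base_url, params=params)` achieves the same encoding without building the string yourself.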

In [87]:
#Let's define a function that extracts the category of the venue from the Foursquare API
def get_category_type(row):
    #The categories list sits under a different key depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    #A venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [88]:
#Let's look at what South Oakland has around it
venues = results['response']['groups'][0]['items']
# flatten the JSON into a pandas dataframe (pd.json_normalize replaces the deprecated pandas.io.json import in pandas 1.0+)
nearby_venues = pd.json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Hofbräuhaus Pittsburgh,German Restaurant,40.428756,-79.964508
1,Papa da Vinci,Pizza Place,40.434613,-79.955477
2,Hyatt House Pittsburgh -South Side,Hotel,40.428371,-79.963934
3,Pittsburgh Playhouse,Theater,40.436264,-79.962567
4,Panera Bread,Bakery,40.436272,-79.958116
5,Hampton Inn Pittsburgh Univercity Center,Hotel,40.435989,-79.963085
6,Bridgeside Point,Parking,40.428868,-79.957691
7,Local,American Restaurant,40.42882,-79.955634
8,Southside Works Ampitheater & Landing,Park,40.429484,-79.964812
9,Trail,Trail,40.427798,-79.962988


In [89]:
#Let's define a function to get the nearby venues for each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=550):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [91]:
#Now to run the analysis on all the neighbourhoods to get a dataframe with all venues

pittsburgh_venues = getNearbyVenues(names=pittsburgh_data['Neighborhood'],
                                   latitudes=pittsburgh_data['Latitude'],
                                   longitudes=pittsburgh_data['Longitude']
                                  )

Allegheny Center
Allegheny West
Allentown
Arlington
Arlington Heights
Banksville
Bedford Dwellings
Beechview
Beltzhoover
Bloomfield
Bluff/Uptown/Soho
Bon Air
Brighton Heights
Brookline
California-Kirkbride
Carrick
Central Business District/Downtown/"The Golden Triangle"
Central Business District
Central Lawrenceville
Central Northside
Central Oakland
Chartiers
Chateau
Crafton Heights
Crawford-Roberts
Duquesne Heights
East Allegheny
East Carnegie
East Hills
East Liberty
Elliot
Esplen
Fairywood
Fineview
Friendship
Garfield
Glen Hazel
Greenfield
Hays
Hazelwood
Highland Park
Homewood North
Homewood South
Homewood West
Knowville
Larimer
Lincoln-Lemington-Belmar
Lincoln Place
Lower Lawrenceville
Manchester
Marshall-Shadeland/Brightwood/Woods Run
Middle Hill
Morningside
Mount Oliver
Mount Washington
New Homestead
North Oakland
North Point Breeze
North Shore
Northview Heights
Oakwood
Overbrook
Perry North/Observatory Hill
Perry South/Perry Hilltop
Point Breeze
Polish Hill
Regent Square
Ridgemo

In [92]:
#Let's look at how many venues we have
print(pittsburgh_venues.shape)
pittsburgh_venues.head()

(1385, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Allegheny Center,40.453786,-80.007377,Children's Museum of Pittsburgh,40.452793,-80.006569,Museum
1,Allegheny Center,40.453786,-80.007377,National Aviary,40.453154,-80.010049,Zoo
2,Allegheny Center,40.453786,-80.007377,El Burro,40.45586,-80.006689,Mexican Restaurant
3,Allegheny Center,40.453786,-80.007377,Federal Galley,40.451605,-80.006045,Comfort Food Restaurant
4,Allegheny Center,40.453786,-80.007377,Brugge On North,40.4554,-80.007478,Belgian Restaurant


In [93]:
#How many unique venue categories do we have here
print('There are {} unique categories.'.format(len(pittsburgh_venues['Venue Category'].unique())))

There are 228 unique categories.


### Analyzing Pittsburgh

Now we will use one-hot encoding to analyze Pittsburgh neighborhoods and group them into clusters with k-means, based on the similarity of their attractions. I chose to split them into 10 clusters so that the large number of neighborhoods (91) would not be forced into too few groups.
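
To make the idea concrete before running it on the real data: taking the mean of one-hot columns per neighborhood gives the share of that neighborhood's venues falling in each category. The sketch below computes the same shares with the standard library only, on made-up data (the neighborhoods `"A"` and `"B"` and their venues are hypothetical).

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs
venues = [
    ("A", "Bar"), ("A", "Bar"), ("A", "Park"), ("A", "Café"),
    ("B", "Park"), ("B", "Zoo"),
]

# Count how often each category appears in each neighborhood
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Divide by the neighborhood's venue total to get frequencies;
# categories that never appear in a neighborhood implicitly have frequency 0
frequencies = {
    hood: {cat: n / sum(c.values()) for cat, n in c.items()}
    for hood, c in counts.items()
}
print(frequencies)
```

Because every row is normalized by its own venue count, a neighborhood with 4 venues and one with 100 are compared on the mix of venues rather than on how many there are.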

In [94]:
#To analyze the venues returned in Pittsburgh, we will be using one-hot encoding
pittsburgh_onehot = pd.get_dummies(pittsburgh_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
pittsburgh_onehot['Neighborhood'] = pittsburgh_venues['Neighborhood'] 

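# move neighborhood column to the first column
# select it by name rather than by position: it is not guaranteed to be last
# (for example, if a venue category is itself called 'Neighborhood', the assignment above overwrites that column in place)
fixed_columns = ['Neighborhood'] + [col for col in pittsburgh_onehot.columns if col != 'Neighborhood']
pittsburgh_onehot = pittsburgh_onehot[fixed_columns]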

pittsburgh_onehot.head()

Unnamed: 0,Zoo Exhibit,ATM,Adult Boutique,Airport,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,...,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [95]:
pittsburgh_onehot.shape

(1385, 228)

In [96]:
#Now we will group the data by neighborhood and look at the frequency of each venue type
pittsburgh_grouped=pittsburgh_onehot.groupby("Neighborhood").mean().reset_index()
pittsburgh_grouped.head()

Unnamed: 0,Neighborhood,Zoo Exhibit,ATM,Adult Boutique,Airport,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,...,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Allegheny Center,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641
1,Allegheny West,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478
2,Allentown,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Arlington,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Arlington Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [97]:
#We will now make a new dataframe with the venue frequencies
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [98]:
#We will put the top 10 venues found in each neighborhood
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd use the suffixes listed in indicators
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # everything from 4 onward takes 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = pittsburgh_grouped['Neighborhood']

for ind in np.arange(pittsburgh_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(pittsburgh_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allegheny Center,Park,Deli / Bodega,Gym,Liquor Store,Coffee Shop,Exhibit,Pizza Place,Burger Joint,Café,Sculpture Garden
1,Allegheny West,American Restaurant,Sandwich Place,Fast Food Restaurant,Zoo,Thai Restaurant,Pub,Pharmacy,Diner,Discount Store,Restaurant
2,Allentown,Italian Restaurant,Trail,Deli / Bodega,Vegetarian / Vegan Restaurant,Discount Store,Coffee Shop,Farm,Farmers Market,Fast Food Restaurant,Field
3,Arlington,Speakeasy,American Restaurant,Baseball Field,Grocery Store,Farm,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop
4,Arlington Heights,Construction & Landscaping,Baseball Field,Trail,Zoo,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop,Fish Market
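
In sparse rows such as Arlington Heights, the lower ranks appear to be filled with categories whose frequency is zero, since sorting the whole row always yields ten names regardless of how many venues were found. As a hedged sketch of one way to avoid that padding, the helper below (`top_categories` is a hypothetical name, and the example rows are made up) keeps only categories with a nonzero frequency.

```python
def top_categories(frequencies, n):
    """Return up to n category names with nonzero frequency, most frequent first."""
    nonzero = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    # Sort by descending frequency, then alphabetically so ties are deterministic
    nonzero.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in nonzero[:n]]


# Made-up frequency rows: one venue only, and a small mixed neighborhood
sparse_row = {"Gas Station": 1.0, "Zoo": 0.0, "Food Truck": 0.0, "Field": 0.0}
mixed_row = {"Bar": 0.5, "Park": 0.25, "Café": 0.25, "Zoo": 0.0}

print(top_categories(sparse_row, 10))
print(top_categories(mixed_row, 2))
```

With this approach a neighborhood with a single venue reports a single category, which makes it easier to tell real patterns from filler when reading the cluster tables.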


In [99]:
#We will now cluster the neighborhoods based on similarities
# import k-means from clustering stage
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 10

pittsburgh_grouped_clustering = pittsburgh_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(pittsburgh_grouped_clustering)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

pittsburgh_merged = pittsburgh_data

# merge the sorted venues (with cluster labels) into pittsburgh_data to add latitude/longitude for each neighborhood
pittsburgh_merged = pittsburgh_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

pittsburgh_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allegheny Center,40.4538,-80.0074,1.0,Park,Deli / Bodega,Gym,Liquor Store,Coffee Shop,Exhibit,Pizza Place,Burger Joint,Café,Sculpture Garden
1,Allegheny West,40.452,-80.0158,1.0,American Restaurant,Sandwich Place,Fast Food Restaurant,Zoo,Thai Restaurant,Pub,Pharmacy,Diner,Discount Store,Restaurant
2,Allentown,40.4223,-79.9934,1.0,Italian Restaurant,Trail,Deli / Bodega,Vegetarian / Vegan Restaurant,Discount Store,Coffee Shop,Farm,Farmers Market,Fast Food Restaurant,Field
3,Arlington,40.4153,-79.971,0.0,Speakeasy,American Restaurant,Baseball Field,Grocery Store,Farm,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop
4,Arlington Heights,40.4169,-79.9598,0.0,Construction & Landscaping,Baseball Field,Trail,Zoo,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop,Fish Market


In [100]:
#Ensure there are no NaN in the clusters
#Neighborhoods that returned no venues have NaN labels after the join; assign them to label 0
pittsburgh_merged['Cluster Labels']=pittsburgh_merged['Cluster Labels'].fillna(0)
        
print(pittsburgh_merged['Cluster Labels'])

0     1.0
1     1.0
2     1.0
3     0.0
4     0.0
5     5.0
6     1.0
7     5.0
8     5.0
9     1.0
10    1.0
11    5.0
12    1.0
13    5.0
14    8.0
15    1.0
16    1.0
17    1.0
18    5.0
19    1.0
20    1.0
21    2.0
22    1.0
23    0.0
24    1.0
25    0.0
26    1.0
27    5.0
28    5.0
29    1.0
     ... 
61    0.0
62    5.0
63    5.0
64    1.0
65    1.0
66    5.0
67    7.0
68    0.0
69    1.0
70    5.0
71    1.0
72    0.0
73    1.0
74    5.0
75    2.0
76    5.0
77    1.0
78    1.0
79    5.0
80    1.0
81    0.0
82    9.0
83    1.0
84    5.0
85    1.0
86    1.0
87    5.0
88    1.0
89    0.0
90    3.0
Name: Cluster Labels, Length: 91, dtype: float64




In [116]:
#Time to visualize the clusters on the map
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(pittsburgh_merged['Latitude'], pittsburgh_merged['Longitude'], pittsburgh_merged['Neighborhood'], pittsburgh_merged['Cluster Labels']):
    cluster=int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Results

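Now we will take a look at each cluster in turn and determine any commonalities, unique aspects, or patterns. Clusters are numbered 1 through 10 in the text, which corresponds to k-means labels 0 through 9 in the code.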

In [102]:
#Now to analyze each cluster and notice the commonalities
#Cluster 1
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Arlington,40.4153,-79.971,0.0,Speakeasy,American Restaurant,Baseball Field,Grocery Store,Farm,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop
4,Arlington Heights,40.4169,-79.9598,0.0,Construction & Landscaping,Baseball Field,Trail,Zoo,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop,Fish Market
23,Crafton Heights,40.4462,-80.0507,0.0,American Restaurant,Baseball Field,Bar,Zoo,Fabric Shop,Food Court,Food & Drink Shop,Food,Flower Shop,Fish Market
25,Duquesne Heights,40.4351,-80.0202,0.0,American Restaurant,Seafood Restaurant,Scenic Lookout,Italian Restaurant,Bakery,Wings Joint,Breakfast Spot,Outdoor Sculpture,New American Restaurant,Tunnel
33,Fineview,40.4636,-80.0046,0.0,Golf Course,Theme Park,Wine Bar,Moving Target,American Restaurant,Baseball Field,Field,Farm,Farmers Market,Fast Food Restaurant
46,Lincoln-Lemington-Belmar,40.4734,-79.8997,0.0,,,,,,,,,,
55,New Homestead,40.3896,-79.9221,0.0,,,,,,,,,,
61,Overbrook,40.3863,-80.0004,0.0,Garden Center,Liquor Store,Bar,Baseball Field,American Restaurant,Trail,Farmers Market,Fast Food Restaurant,Field,Zoo
68,Saint Clair,40.4091,-79.9724,0.0,,,,,,,,,,
72,Southsore,40.4373,-80.0137,0.0,Scenic Lookout,American Restaurant,Fountain,Seafood Restaurant,Italian Restaurant,Soccer Stadium,New American Restaurant,History Museum,Field,Fish & Chips Shop


In [103]:
#Cluster 2
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allegheny Center,40.4538,-80.0074,1.0,Park,Deli / Bodega,Gym,Liquor Store,Coffee Shop,Exhibit,Pizza Place,Burger Joint,Café,Sculpture Garden
1,Allegheny West,40.452,-80.0158,1.0,American Restaurant,Sandwich Place,Fast Food Restaurant,Zoo,Thai Restaurant,Pub,Pharmacy,Diner,Discount Store,Restaurant
2,Allentown,40.4223,-79.9934,1.0,Italian Restaurant,Trail,Deli / Bodega,Vegetarian / Vegan Restaurant,Discount Store,Coffee Shop,Farm,Farmers Market,Fast Food Restaurant,Field
6,Bedford Dwellings,40.4503,-79.9745,1.0,Performing Arts Venue,Field,Bus Line,Coffee Shop,Gym / Fitness Center,Park,Ice Cream Shop,Fish & Chips Shop,Fish Market,Exhibit
9,Bloomfield,40.4622,-79.9445,1.0,Grocery Store,Bar,Pizza Place,Bookstore,Sandwich Place,Italian Restaurant,New American Restaurant,Art Gallery,Coffee Shop,Food Truck
10,Bluff/Uptown/Soho,40.4376,-79.9826,1.0,Pizza Place,Rental Car Location,Sandwich Place,Gas Station,Bar,Bank,Coffee Shop,Café,College Basketball Court,Grocery Store
12,Brighton Heights,40.4828,-80.0367,1.0,Pharmacy,Bus Station,Miscellaneous Shop,Wings Joint,Market,Pizza Place,Seafood Restaurant,Fabric Shop,Farm,Farmers Market
15,Carrick,40.3954,-79.9892,1.0,Sandwich Place,Gym,Bank,Breakfast Spot,Salon / Barbershop,Auto Garage,Donut Shop,Coffee Shop,Chinese Restaurant,Pharmacy
16,"Central Business District/Downtown/""The Golden...",40.4418,-80.0004,1.0,Hotel,Coffee Shop,Italian Restaurant,Pizza Place,Sandwich Place,Theater,Bar,American Restaurant,Restaurant,Mexican Restaurant
17,Central Business District,40.4418,-80.0004,1.0,Hotel,Coffee Shop,Italian Restaurant,Pizza Place,Sandwich Place,Theater,Bar,American Restaurant,Restaurant,Mexican Restaurant


In [104]:
#Cluster 3
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Chartiers,40.4594,-80.0717,2.0,Baseball Field,Ice Cream Shop,Zoo,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop
49,Manchester,40.4552,-80.0241,2.0,Baseball Field,Zoo,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop,Field
75,Spring Garden,40.4742,-79.9885,2.0,Baseball Field,Zoo,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop,Field


In [106]:
#Cluster 4
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 3]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
90,Windgap,40.4565,-80.0862,3.0,Gas Station,Zoo,Food Truck,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant


In [107]:
#Cluster 5
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 4]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
38,Hays,40.4031,-79.9612,4.0,American Restaurant,Zoo,Event Space,Food Court,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop,Field


In [108]:
#Cluster 6
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 5]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Banksville,40.4099,-80.0395,5.0,Park,Print Shop,Zoo,Food,Flower Shop,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market
7,Beechview,40.4135,-80.0228,5.0,Bar,Playground,Supermarket,Taco Place,Light Rail Station,Park,Zoo,Fabric Shop,Food,Flower Shop
8,Beltzhoover,40.4157,-80.0046,5.0,Metro Station,Tennis Court,Park,Zoo,Exhibit,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop
11,Bon Air,40.4077,-79.999,5.0,Home Service,Food,Light Rail Station,Park,Zoo,Event Space,Flower Shop,Fish Market,Fish & Chips Shop,Field
13,Brookline,40.3974,-80.0116,5.0,Dance Studio,Soup Place,Lawyer,Zoo,Exhibit,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop
18,Central Lawrenceville,40.4732,-79.9556,5.0,Bar,Pub,BBQ Joint,Diner,Dive Bar,Ramen Restaurant,Restaurant,Sandwich Place,Seafood Restaurant,Middle Eastern Restaurant
27,East Carnegie,40.419,-80.0717,5.0,Pub,Recreation Center,Restaurant,Liquor Store,Bar,Zoo,Event Space,Flower Shop,Fish Market,Fish & Chips Shop
28,East Hills,40.4549,-79.8759,5.0,Seafood Restaurant,Park,Discount Store,Food,Flower Shop,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market
30,Elliot,40.4402,-80.0395,5.0,Italian Restaurant,Pizza Place,Bar,Baseball Field,Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Zoo
32,Fairywood,40.4491,-80.0842,5.0,Women's Store,Moving Target,Dog Run,Zoo,Exhibit,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop


In [110]:
#Cluster 7
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 6]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
40,Highland Park,40.4799,-79.9165,6.0,Zoo Exhibit,Park,American Restaurant,Gift Shop,Playground,Salon / Barbershop,Aquarium,Arts & Crafts Store,Bakery,Bar
52,Morningside,40.4868,-79.9235,6.0,Zoo Exhibit,Tanning Salon,American Restaurant,Bar,Burger Joint,Convenience Store,Fabric Shop,Fast Food Restaurant,Gift Shop,Ice Cream Shop


In [111]:
#Cluster 8
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 7]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Glen Hazel,40.4066,-79.9291,7.0,Gym / Fitness Center,Zoo,Event Space,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant
67,Ridgemont,40.4282,-80.0325,7.0,Intersection,Gym / Fitness Center,Zoo,Event Space,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop,Field


In [112]:
#Cluster 9
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 8]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,California-Kirkbride,40.46,-80.0214,8.0,Comedy Club,Bakery,Zoo,Exhibit,Food Court,Food & Drink Shop,Food,Flower Shop,Fish Market,Fish & Chips Shop


In [113]:
#Cluster 10
pittsburgh_merged.loc[pittsburgh_merged['Cluster Labels'] == 9]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
82,Swisshelm Park,40.4198,-79.9067,9.0,Playground,Department Store,Scenic Lookout,Zoo,Event Space,Food,Flower Shop,Fish Market,Fish & Chips Shop,Field


#### Discussion

Through this analysis, we have found a large variety of attractions to bring you to each part of the city. In order to categorize them better, though, we might try and narrow down what makes each cluster unique. To analyze each cluster, we will take a look at the most common venues in each cluster and see if that tells us something about what kind of neighborhoods they are.

- Cluster 1: American restaurants, Zoos, Other outdoor activities
- Cluster 2: Italian, Pizza, Coffee shops
- Cluster 3: Zoo, Food Court
- Cluster 4: Zoo, Food trucks
- Cluster 5: American Restaurant, Zoo
- Cluster 6: Bars, Parks, Home Service
- Cluster 7: Zoo, American Restaurant
- Cluster 8: Gym, Zoo, Food and drink shop
- Cluster 9: Comedy Club, Bakery, Flower shops
- Cluster 10: Playground, Department stores
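
One way to arrive at a summary like this is to tally the top venue columns across all neighborhoods in a cluster. The sketch below does so for the three neighborhoods with k-means label 2, using their first three venue columns as they appear in the Results tables; it is an illustration of the tallying step, not necessarily how the summary above was produced.

```python
from collections import Counter

# First three "Most Common Venue" columns for the neighborhoods with label 2,
# copied from the Results tables
cluster_rows = {
    "Chartiers": ["Baseball Field", "Ice Cream Shop", "Zoo"],
    "Manchester": ["Baseball Field", "Zoo", "Exhibit"],
    "Spring Garden": ["Baseball Field", "Zoo", "Exhibit"],
}

# Count how many times each category appears across the cluster
tally = Counter(cat for cats in cluster_rows.values() for cat in cats)
print(tally.most_common())
```

Weighting the first column more heavily than the third would be a natural refinement, since rank carries information that a flat count ignores.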

Looking at the clusters this way, the first thing I notice is that clusters 3 and 4 are quite similar, as are clusters 5 and 7, so we could essentially merge each of these pairs if we liked. Once that is done, certain themes emerge among the clusters. For example, if you like American food, a neighborhood in cluster 1, 5, or 7 might suit you. If you like being close to outdoor attractions, cluster 1 might be for you. While this approach isn't exact in separating the neighborhoods, it does give you a unique look at the city.

There are a few things that could be changed in future iterations to possibly give better results. It might be of interest to vary the search radius by neighborhood in order to prevent overlap between adjacent neighborhoods. For now, a radius of 550 meters was needed to keep some of the outlying neighborhoods from returning no venues, but it caused the more densely packed neighborhoods to return a large number of venues. It could also be interesting to change how the venues are categorized. Grouping the venue categories differently might show which neighborhoods are more "party oriented", which are "foodie heavens", and which are "shopaholic dreams", for example.
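
To see how much overlap the fixed radius causes, one can compare the distance between two neighborhood centers with twice the search radius. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km and the rounded coordinates of Allegheny Center and Allegheny West from the neighborhoods dataframe shown earlier. The helper name `haversine_m` is hypothetical.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


radius_m = 550  # search radius used for every neighborhood in this analysis

# Allegheny Center and Allegheny West (rounded coordinates)
distance = haversine_m(40.4538, -80.0074, 40.4520, -80.0158)

# Two search circles overlap when their centers are closer than two radii
circles_overlap = distance < 2 * radius_m
print(round(distance), circles_overlap)
```

Running this check over every pair of neighborhoods would show where the overlap is concentrated, and a per-neighborhood radius (for example, half the distance to the nearest neighboring center) is one option to explore.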

### Conclusion

Using the Foursquare API, we can get a unique look at the different neighborhoods of a city and determine the attractions each has to offer. It is an interesting tool for deciding which neighborhood might be better suited to the needs of an incoming homeowner. I found Cluster 2 to be the most enticing, and after looking through it I settled on Lower Lawrenceville as my favorite option. Its varied selection of foods and attractions made it quite the choice. There are certainly some shortcomings to this technique, such as overlapping search areas and coarse categorization, but I think it offers a great new way to look at cities. I believe this approach could certainly help a homeowner choose the right place to live!