 # Capstone Project - The Battle of Neighborhoods

## Import Libraries

In this section we import the libraries that will be required to process the data.

The first library is Pandas.
Pandas is an open source, BSD-licensed library, providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
import urllib.request
import json
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import matplotlib.colors as colors
%matplotlib inline
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')

Libraries imported.


## Download and Explore Dataset


New York City has a total of 5 boroughs and 306 neighborhoods. To segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Luckily, this dataset exists for free on the web. Feel free to try to find this dataset on your own, but here is the link to the dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

For your convenience, I downloaded the file and placed it on the server, so you can simply run a wget command to access the data. Let's go ahead and do that.

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Data downloaded!


#### Transform the data into a *pandas* dataframe

In [3]:
neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# instantiate the dataframe
neighborhoods = pd.DataFrame(rows, columns=column_names)
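
The parsing step above can be checked without downloading anything. The sketch below applies the same extraction to two hand-written features in the shape the loop expects. `sample_features` and `features_to_rows` are illustrative names that do not appear in the notebook; the coordinates are copied from the first two rows of the `neighborhoods.head()` output.

```python
import pandas as pd

# Two hand-written features in the same shape as newyork_data['features'].
sample_features = [
    {'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
     'geometry': {'coordinates': [-73.847201, 40.894705]}},
    {'properties': {'borough': 'Bronx', 'name': 'Co-op City'},
     'geometry': {'coordinates': [-73.829939, 40.874294]}},
]

def features_to_rows(features):
    """Hypothetical helper: turn GeoJSON-style features into row dicts."""
    rows = []
    for feature in features:
        # GeoJSON stores coordinates as [longitude, latitude]
        lon, lat = feature['geometry']['coordinates'][:2]
        rows.append({'Borough': feature['properties']['borough'],
                     'Neighborhood': feature['properties']['name'],
                     'Latitude': lat,
                     'Longitude': lon})
    return rows

sample_df = pd.DataFrame(features_to_rows(sample_features),
                         columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])
print(sample_df)
```

Building the dataframe once from a list of rows avoids growing it row by row, which is slower and depends on an API that newer pandas versions no longer provide.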

In [4]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


#### Use geopy library to get the latitude and longitude values of New York City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [5]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Create a map of New York with neighborhoods superimposed on top.

In [6]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [7]:
import folium
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Borough'], manhattan_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)
    
map_newyork

## Foursquare venues


In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryIds=''):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # optionally restrict the search to the given category IDs
        if categoryIds != '':
            url = url + '&categoryId={}'.format(categoryIds)

        # make the GET request
        response = requests.get(url).json()
        results = response['response']['venues']

        # keep only the relevant information for each nearby venue,
        # skipping venues that have no category
        for v in results:
            if not v.get('categories'):
                continue
            venues_list.append((
                name,
                lat,
                lng,
                v['name'],
                v['location']['lat'],
                v['location']['lng'],
                v['categories'][0]['name']
            ))

    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category'])

    return nearby_venues
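
To see how a response is flattened into rows without calling the API, the sketch below runs the same extraction on a mocked response. The dictionary contains only the fields that `getNearbyVenues` reads, so it is an assumption about the response shape rather than a full Foursquare payload. `mock_response` and `flatten_venues` are illustrative names, and the second venue is invented to exercise the "no category" case.

```python
import pandas as pd

# Mocked response holding only the fields the function above accesses.
mock_response = {'response': {'venues': [
    {'name': "Applebee's Grill + Bar",
     'location': {'lat': 40.873685, 'lng': -73.908928},
     'categories': [{'name': 'American Restaurant'}]},
    {'name': 'Venue without a category',  # invented entry, should be skipped
     'location': {'lat': 40.0, 'lng': -73.0},
     'categories': []},
]}}

def flatten_venues(response, name, lat, lng):
    """Hypothetical helper: one tuple per venue that has a category."""
    rows = []
    for v in response['response']['venues']:
        if not v.get('categories'):
            continue  # missing key or empty list
        rows.append((name, lat, lng,
                     v['name'],
                     v['location']['lat'],
                     v['location']['lng'],
                     v['categories'][0]['name']))
    return rows

venue_columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
                 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
venues_df = pd.DataFrame(flatten_venues(mock_response, 'Marble Hill', 40.876551, -73.91066),
                         columns=venue_columns)
print(venues_df)
```

Only the first category of each venue is kept, so a venue listed under several categories is counted once, under its first one.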

In [9]:
LIMIT = 500 
radius = 5000 
CLIENT_ID = 'your-foursquare-client-id'  # replace with your Foursquare client ID
CLIENT_SECRET = 'your-foursquare-client-secret'  # replace with your Foursquare client secret
VERSION = '20181020'

In [10]:
#https://developer.foursquare.com/docs/resources/categories

neighborhoods = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
newyork_venues_NightLifeSpots = getNearbyVenues(names=neighborhoods['Neighborhood'], latitudes=neighborhoods['Latitude'], longitudes=neighborhoods['Longitude'], radius=1000, categoryIds='4d4b7105d754a06376d81259')
newyork_venues_NightLifeSpots.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Applebee's Grill + Bar,40.873685,-73.908928,American Restaurant
1,Marble Hill,40.876551,-73.91066,Rowes Wharf Bar,40.8832,-73.91,Bar
2,Marble Hill,40.876551,-73.91066,Irish Eyes,40.868928,-73.917465,Pub
3,Marble Hill,40.876551,-73.91066,Indian Road Café,40.872922,-73.918459,Café
4,Marble Hill,40.876551,-73.91066,Beach Walk at Sea Bright,40.8867,-73.9116,Beach Bar


In [11]:
newyork_venues_NightLifeSpots.shape

(1986, 7)

In [12]:
def addToMap(df, color, existingMap):
    for lat, lng, local, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Neighborhood'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, local)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)

In [13]:
map_newyork_NightLifeSpots = folium.Map(location=[latitude, longitude], zoom_start=10)
addToMap(newyork_venues_NightLifeSpots, 'red', map_newyork_NightLifeSpots)

map_newyork_NightLifeSpots

In [14]:
def addColumn(startDf, columnTitle, dataDf):
    grouped = dataDf.groupby('Neighborhood').count()
    
    for n in startDf['Neighborhood']:
        try:
            startDf.loc[startDf['Neighborhood'] == n,columnTitle] = grouped.loc[n, 'Venue']
        except KeyError:
            # the neighborhood has no venues in dataDf
            startDf.loc[startDf['Neighborhood'] == n,columnTitle] = 0

In [15]:
manhattan_grouped = newyork_venues_NightLifeSpots.groupby('Neighborhood').count()
manhattan_grouped
#print('There are {} uniques categories.'.format(len(newyork_venues_NightLifeSpots['Venue Category'].unique())))

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,50,50,50,50,50,50
Carnegie Hill,50,50,50,50,50,50
Central Harlem,50,50,50,50,50,50
Chelsea,50,50,50,50,50,50
Chinatown,50,50,50,50,50,50
Civic Center,50,50,50,50,50,50
Clinton,50,50,50,50,50,50
East Harlem,47,47,47,47,47,47
East Village,50,50,50,50,50,50
Financial District,50,50,50,50,50,50


## Analyze Each Neighborhood

In [16]:
# one hot encoding
manhattan_onehot = pd.get_dummies(newyork_venues_NightLifeSpots[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = newyork_venues_NightLifeSpots['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Arepa Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bar,Beach Bar,Beer Bar,Beer Garden,Beer Store,Bookstore,Bowling Alley,Brewery,Burger Joint,Burrito Place,Café,Chinese Restaurant,Cocktail Bar,Coffee Shop,Comedy Club,Cuban Restaurant,Diner,Dive Bar,English Restaurant,French Restaurant,Fried Chicken Joint,Gastropub,Gay Bar,General Entertainment,German Restaurant,Greek Restaurant,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lounge,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Music Venue,New American Restaurant,Nightclub,Nightlife Spot,Other Nightlife,Pet Café,Piano Bar,Pizza Place,Pool,Pub,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Sake Bar,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Smoke Shop,Snack Place,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sports Bar,Steakhouse,Strip Club,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tiki Bar,Wedding Hall,Whisky Bar,Wine Bar,Wine Shop
0,Marble Hill,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [17]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Arepa Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bar,Beach Bar,Beer Bar,Beer Garden,Beer Store,Bookstore,Bowling Alley,Brewery,Burger Joint,Burrito Place,Café,Chinese Restaurant,Cocktail Bar,Coffee Shop,Comedy Club,Cuban Restaurant,Diner,Dive Bar,English Restaurant,French Restaurant,Fried Chicken Joint,Gastropub,Gay Bar,General Entertainment,German Restaurant,Greek Restaurant,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lounge,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Music Venue,New American Restaurant,Nightclub,Nightlife Spot,Other Nightlife,Pet Café,Piano Bar,Pizza Place,Pool,Pub,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Sake Bar,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Smoke Shop,Snack Place,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sports Bar,Steakhouse,Strip Club,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tiki Bar,Wedding Hall,Whisky Bar,Wine Bar,Wine Shop
0,Battery Park City,0.08,0.0,0.0,0.0,0.0,0.0,0.24,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.02,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0
1,Carnegie Hill,0.02,0.0,0.0,0.0,0.0,0.0,0.36,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.08,0.04,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.08,0.02,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0
2,Central Harlem,0.06,0.0,0.02,0.0,0.0,0.02,0.18,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.1,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.08,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0
3,Chelsea,0.04,0.0,0.0,0.02,0.0,0.0,0.16,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.14,0.04,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.04,0.08,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.04,0.02,0.0,0.0,0.04,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
4,Chinatown,0.06,0.0,0.0,0.02,0.02,0.0,0.1,0.0,0.06,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.18,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.08,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0
5,Civic Center,0.02,0.0,0.0,0.02,0.0,0.0,0.16,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.18,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.06,0.0
6,Clinton,0.0,0.02,0.0,0.0,0.0,0.0,0.22,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.12,0.04,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.02,0.1,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.12,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0
7,East Harlem,0.021277,0.0,0.0,0.0,0.0,0.0,0.234043,0.0,0.042553,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.106383,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.170213,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.148936,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277
8,East Village,0.06,0.0,0.0,0.0,0.0,0.0,0.22,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.1,0.0,0.0,0.0,0.0,0.06,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
9,Financial District,0.04,0.0,0.0,0.0,0.0,0.0,0.28,0.0,0.02,0.04,0.0,0.02,0.0,0.0,0.04,0.0,0.02,0.0,0.14,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
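
The values in this table are category frequencies: taking the mean of the one-hot columns within a neighborhood gives the share of that neighborhood's venues in each category. A minimal sketch on made-up data (the `toy_` names and neighborhoods `A` and `B` are illustrative only):

```python
import pandas as pd

# Made-up venues: neighborhood A has four venues, B has two.
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Pub', 'Lounge', 'Bar', 'Pub'],
})

# One column per category, 1 where the venue belongs to it.
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']],
                            prefix='', prefix_sep='', dtype=int)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# The mean of 0/1 indicators is the fraction of venues in each category.
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Each row sums to 1. This is also why East Harlem, which has 47 venues rather than 50, shows values such as 0.021277 (1/47) instead of multiples of 0.02.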


In [18]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
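
As a usage sketch, the function can be applied to a single hand-built row. The frequencies below are a subset of the Chelsea row in the grouped table; the function is repeated here only so the example runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the leading 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# A few of Chelsea's category frequencies, deliberately out of order.
chelsea = pd.Series({'Neighborhood': 'Chelsea',
                     'Bar': 0.16,
                     'Gay Bar': 0.08,
                     'Nightclub': 0.06,
                     'Cocktail Bar': 0.14})

top3 = list(return_most_common_venues(chelsea, 3))
print(top3)
```

The function relies on the neighborhood name being the first entry of the row, which is why the one-hot dataframe moves that column to the front.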

In [19]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Bar,Cocktail Bar,American Restaurant,Pub,Beer Garden,Hotel Bar,Burger Joint,Wine Bar,Gastropub,Café
1,Carnegie Hill,Bar,Cocktail Bar,Pub,Sports Bar,Coffee Shop,Wine Bar,New American Restaurant,Roof Deck,Indian Restaurant,Residential Building (Apartment / Condo)
2,Central Harlem,Bar,Lounge,Cocktail Bar,Other Nightlife,Hookah Bar,American Restaurant,Tapas Restaurant,Wine Bar,Beer Bar,Nightclub
3,Chelsea,Bar,Cocktail Bar,Gay Bar,Nightclub,American Restaurant,Lounge,Mediterranean Restaurant,New American Restaurant,Gastropub,Coffee Shop
4,Chinatown,Cocktail Bar,Dive Bar,Bar,Lounge,Beer Bar,Hotel Bar,American Restaurant,Gastropub,Sports Bar,Karaoke Bar


## Cluster Neighborhoods


In [20]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 1, 3, 4, 4, 3, 1, 0, 2], dtype=int32)
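
The label values themselves carry no meaning; they only say which rows were grouped together. The sketch below shows this on a made-up frequency matrix with two obvious groups. The `toy_` names and the explicit `n_init=10` are choices made for this example and are not taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency rows: the first two are "mostly category 0",
# the last two are "mostly category 1".
toy_freq = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(toy_freq)
toy_labels = toy_kmeans.labels_
print(toy_labels)

# Rows with similar profiles share a label; which integer they get is arbitrary.
same_first_pair = toy_labels[0] == toy_labels[1]
same_second_pair = toy_labels[2] == toy_labels[3]
print(same_first_pair, same_second_pair)
```

Fixing `random_state` makes the assignment reproducible between runs, but cluster 0 in one run need not correspond to cluster 0 after the data or the number of clusters changes.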

In [21]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() 

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1,Bar,Pub,Other Nightlife,Nightclub,Nightlife Spot,Lounge,Beach Bar,Cocktail Bar,Sports Bar,Speakeasy
1,Manhattan,Chinatown,40.715618,-73.994279,4,Cocktail Bar,Dive Bar,Bar,Lounge,Beer Bar,Hotel Bar,American Restaurant,Gastropub,Sports Bar,Karaoke Bar
2,Manhattan,Washington Heights,40.851903,-73.9369,1,Bar,Lounge,Other Nightlife,Nightclub,Restaurant,Speakeasy,Cocktail Bar,Karaoke Bar,Tapas Restaurant,Wine Bar
3,Manhattan,Inwood,40.867684,-73.92121,1,Bar,Lounge,Wine Bar,Cocktail Bar,Other Nightlife,Nightclub,Pub,Hookah Bar,Café,Sports Bar
4,Manhattan,Hamilton Heights,40.823604,-73.949688,1,Bar,Other Nightlife,Cocktail Bar,Lounge,Speakeasy,Wine Bar,Hotel Bar,Nightclub,Pub,Beer Bar
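
The `join` used here keeps every row of the left dataframe and matches on the neighborhood name, whatever the row order on either side. A small sketch with made-up frames (the `toy_` names and the neighborhood `Nowhere` are invented for illustration) also shows what happens to a neighborhood with no venue profile:

```python
import pandas as pd

toy_locations = pd.DataFrame({
    'Neighborhood': ['Marble Hill', 'Chinatown', 'Nowhere'],  # 'Nowhere' is invented
    'Latitude': [40.876551, 40.715618, 0.0],
})

toy_profiles = pd.DataFrame({
    'Neighborhood': ['Chinatown', 'Marble Hill'],  # different order on purpose
    'Cluster Labels': [4, 1],
    '1st Most Common Venue': ['Cocktail Bar', 'Bar'],
})

# Left join on the 'Neighborhood' column of toy_locations.
toy_merged = toy_locations.join(toy_profiles.set_index('Neighborhood'), on='Neighborhood')
print(toy_merged)
```

The unmatched row gets `NaN` in the joined columns, which also turns `Cluster Labels` into floats. If that can occur in real data, dropping those rows (for example with `dropna(subset=['Cluster Labels'])`) and casting the labels back to `int` before using them as list indices is one way to handle it.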


In [22]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
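
The color scheme in this cell reduces to sampling `kclusters` evenly spaced colors from the rainbow colormap; the `ys` list only contributes its length. The sketch below rebuilds the palette on its own and prints which color each label receives under the `rainbow[cluster-1]` indexing. The name `palette` is illustrative.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# Sample kclusters evenly spaced RGBA colors and convert them to hex strings.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Same indexing as in the map cell: label 0 uses index -1, the last color.
for label in range(kclusters):
    print(label, palette[label - 1])
```

Because of Python's negative indexing, label 0 maps to the last color and every other label to the color one position earlier. The colors stay distinct, so the map is still readable, although indexing with `palette[label]` would be the more direct mapping.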

In [23]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Murray Hill,Bar,American Restaurant,Lounge,Cocktail Bar,Gastropub,Speakeasy,Irish Pub,Hotel Bar,Pub,Mexican Restaurant
19,East Village,Bar,Cocktail Bar,Pub,American Restaurant,Dive Bar,Speakeasy,Lounge,Café,Gastropub,Nightclub
27,Gramercy,Bar,American Restaurant,Speakeasy,Cocktail Bar,Gastropub,Mexican Restaurant,Lounge,Pub,Hookah Bar,Sake Bar
28,Battery Park City,Bar,Cocktail Bar,American Restaurant,Pub,Beer Garden,Hotel Bar,Burger Joint,Wine Bar,Gastropub,Café
34,Sutton Place,Bar,Hotel Bar,Cocktail Bar,Lounge,Beer Bar,American Restaurant,Pub,Beer Garden,Sports Bar,Restaurant
35,Turtle Bay,Bar,American Restaurant,Pub,Cocktail Bar,Beer Garden,Lounge,Café,Seafood Restaurant,Italian Restaurant,Hotel
36,Tudor City,Bar,American Restaurant,Cocktail Bar,Pub,Korean Restaurant,Seafood Restaurant,Jazz Club,Beer Bar,Hotel,Italian Restaurant
38,Flatiron,Bar,American Restaurant,Cocktail Bar,Italian Restaurant,Nightclub,Gastropub,Pub,Sports Bar,Speakeasy,Lounge


In [24]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Bar,Pub,Other Nightlife,Nightclub,Nightlife Spot,Lounge,Beach Bar,Cocktail Bar,Sports Bar,Speakeasy
2,Washington Heights,Bar,Lounge,Other Nightlife,Nightclub,Restaurant,Speakeasy,Cocktail Bar,Karaoke Bar,Tapas Restaurant,Wine Bar
3,Inwood,Bar,Lounge,Wine Bar,Cocktail Bar,Other Nightlife,Nightclub,Pub,Hookah Bar,Café,Sports Bar
4,Hamilton Heights,Bar,Other Nightlife,Cocktail Bar,Lounge,Speakeasy,Wine Bar,Hotel Bar,Nightclub,Pub,Beer Bar
5,Manhattanville,Bar,Lounge,Other Nightlife,Cocktail Bar,Pub,Wine Bar,Hookah Bar,Strip Club,Beer Garden,Dive Bar
6,Central Harlem,Bar,Lounge,Cocktail Bar,Other Nightlife,Hookah Bar,American Restaurant,Tapas Restaurant,Wine Bar,Beer Bar,Nightclub
7,East Harlem,Bar,Lounge,Other Nightlife,Cocktail Bar,Beer Bar,Wine Shop,Residential Building (Apartment / Condo),Beer Garden,Diner,Gastropub
11,Roosevelt Island,Bar,Cocktail Bar,Wine Bar,Pub,Beer Garden,Other Nightlife,Burger Joint,Café,Restaurant,Dive Bar
12,Upper West Side,Bar,Wine Bar,Dive Bar,Italian Restaurant,Pub,Speakeasy,Sports Bar,American Restaurant,Lounge,Brewery
25,Manhattan Valley,Bar,Other Nightlife,Sports Bar,Speakeasy,Lounge,Cocktail Bar,Whisky Bar,Mexican Restaurant,Wine Bar,Diner


In [25]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Upper East Side,Bar,Cocktail Bar,Pub,Sports Bar,Italian Restaurant,Wine Bar,New American Restaurant,Burger Joint,Roof Deck,Ice Cream Shop
9,Yorkville,Bar,Pub,Cocktail Bar,Wine Bar,Coffee Shop,New American Restaurant,Sports Bar,Italian Restaurant,Piano Bar,Diner
10,Lenox Hill,Bar,Cocktail Bar,Pub,Gastropub,Hotel Bar,Wine Bar,Restaurant,Beer Garden,Burger Joint,Italian Restaurant
29,Financial District,Bar,Cocktail Bar,Pub,Hotel Bar,American Restaurant,Beer Garden,Mexican Restaurant,Burger Joint,Hotel,Mediterranean Restaurant
30,Carnegie Hill,Bar,Cocktail Bar,Pub,Sports Bar,Coffee Shop,Wine Bar,New American Restaurant,Roof Deck,Indian Restaurant,Residential Building (Apartment / Condo)
37,Stuyvesant Town,Bar,Cocktail Bar,Speakeasy,Wine Bar,Mexican Restaurant,Hookah Bar,Gay Bar,Dive Bar,Pub,American Restaurant


In [26]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Lincoln Square,Bar,Gay Bar,American Restaurant,Hotel,Cocktail Bar,Pub,Lounge,Mexican Restaurant,Hotel Bar,Burrito Place
14,Clinton,Bar,Cocktail Bar,Lounge,Gay Bar,Coffee Shop,Sports Bar,Bowling Alley,English Restaurant,Pub,Gastropub
15,Midtown,Bar,Cocktail Bar,Lounge,Gay Bar,American Restaurant,Hotel,Burger Joint,Bowling Alley,Sports Bar,Japanese Restaurant
17,Chelsea,Bar,Cocktail Bar,Gay Bar,Nightclub,American Restaurant,Lounge,Mediterranean Restaurant,New American Restaurant,Gastropub,Coffee Shop
33,Midtown South,Bar,American Restaurant,Lounge,Cocktail Bar,Sports Bar,Coffee Shop,Nightclub,Bowling Alley,Tiki Bar,Seafood Restaurant
39,Hudson Yards,Bar,Cocktail Bar,Lounge,Pub,Hotel Bar,Bowling Alley,Nightclub,Gay Bar,Restaurant,Dive Bar


In [27]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Cocktail Bar,Dive Bar,Bar,Lounge,Beer Bar,Hotel Bar,American Restaurant,Gastropub,Sports Bar,Karaoke Bar
18,Greenwich Village,Cocktail Bar,Lounge,American Restaurant,Bar,Burger Joint,Beer Bar,Italian Restaurant,Pub,Hotel Bar,New American Restaurant
20,Lower East Side,Cocktail Bar,Bar,Dive Bar,Lounge,Speakeasy,American Restaurant,Café,Beer Bar,Gastropub,Hotel Bar
21,Tribeca,Cocktail Bar,Bar,Burger Joint,Pub,Speakeasy,Italian Restaurant,Wine Bar,New American Restaurant,Hotel Bar,American Restaurant
22,Little Italy,Cocktail Bar,Dive Bar,American Restaurant,Beer Bar,Hotel Bar,Lounge,Bar,Gastropub,Sports Bar,Rock Club
23,Soho,Cocktail Bar,Lounge,Hotel Bar,Dive Bar,Bar,Beer Bar,American Restaurant,Speakeasy,Wine Bar,Gastropub
24,West Village,Cocktail Bar,Bar,Italian Restaurant,New American Restaurant,Gay Bar,Speakeasy,Burger Joint,French Restaurant,Nightclub,Roof Deck
31,Noho,Cocktail Bar,Bar,American Restaurant,Lounge,Dive Bar,Beer Bar,Hotel Bar,Gastropub,Rock Club,Italian Restaurant
32,Civic Center,Cocktail Bar,Bar,Burger Joint,Wine Bar,Pub,Beer Garden,Dive Bar,Hotel Bar,Hotel,Karaoke Bar
