## Battle of the Neighborhoods

### Introduction

A while ago, Amazon dropped its plans for a New York headquarters. Choosing a headquarters location requires serious and comprehensive consideration. One important aspect is whether the location is attractive to the workforce the company needs.

Suppose a company, XYZ, is looking for a new headquarters location in either Toronto, Canada, or New York City, New York. This project examines the similarities and dissimilarities between selected neighborhoods in the two cities, especially concerning the living quality of potential employees, to determine which neighborhood is the better choice.

### Data and Methodology

To help make this decision, we need neighborhood data for both cities. The data are available from
* New York: https://geo.nyu.edu/catalog/nyu_2451_34572
* Toronto: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and http://cocl.us/Geospatial_data

A second source of data is the Foursquare API, which we query for the venues near each neighborhood.

By segmenting and clustering the neighborhoods, we will compare the similarities and dissimilarities between the two cities and provide cluster information to support the decision.

#### Gathering required libraries

In [22]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from geopy.geocoders import Nominatim

import requests # library to handle requests
from bs4 import BeautifulSoup
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
# !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Gathering data for both cities

In [12]:
# New York
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']

# Toronto
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
source = requests.get(url).text
raw_data = BeautifulSoup(source, 'lxml')

Data downloaded!


#### Forming dataframe for both cities

In [14]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one record per neighborhood, then build the dataframe in a single call
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})
newyork = pd.DataFrame(rows, columns=column_names)
newyork.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
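
The loop above reads the `properties` and `geometry` fields of each GeoJSON feature. As a minimal sketch of the same extraction on a hand-made record, the snippet below uses a hypothetical helper, `features_to_rows`, and a sample feature that only imitates the shape of one entry of `newyork_data['features']`; neither is part of the dataset itself.

```python
def features_to_rows(features):
    """Turn GeoJSON-style point features into flat dictionaries."""
    rows = []
    for feature in features:
        # GeoJSON points store coordinates as [longitude, latitude]
        lon, lat = feature['geometry']['coordinates'][:2]
        rows.append({
            'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon,
        })
    return rows


# hypothetical record shaped like one entry of newyork_data['features']
sample_features = [{
    'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
    'geometry': {'type': 'Point', 'coordinates': [-73.847201, 40.894705]},
}]

rows = features_to_rows(sample_features)
print(rows[0])
```

A list of dictionaries like `rows` can be passed to `pd.DataFrame` in one call, which avoids growing a dataframe row by row.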


In [15]:
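column_names = ['PostalCode', 'Borough', 'Neighborhood']

table = raw_data.find("table")
table_rows = table.tbody.find_all("tr")

rows = []
for tr in table_rows:
    cells = tr.find_all("td")
    row = [cell.text for cell in cells]
    # skip header rows (no <td> cells) and postal codes without an assigned borough
    if row != [] and row[1] != "Not assigned":
        row[2] = row[2].replace("\n",'')
        # a neighborhood that is not assigned takes the name of its borough
        if row[2] == "Not assigned":
            row[2] = row[1]
        rows.append({'PostalCode':row[0],'Borough':row[1], 'Neighborhood':row[2]})
# build the dataframe in one call (DataFrame.append was removed in pandas 2.0)
toronto = pd.DataFrame(rows, columns = column_names)
# combine neighborhoods that share a postal code into one comma-separated entry
toronto = toronto.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()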

geo_coord = pd.read_csv('http://cocl.us/Geospatial_data')

toronto = toronto.merge(geo_coord, left_on='PostalCode', right_on = 'Postal Code')
toronto.drop("Postal Code", axis = 1, inplace = True)
toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
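
The Toronto cell applies three rules: drop postal codes whose borough is "Not assigned", let an unassigned neighborhood take its borough's name, and join neighborhoods that share a postal code. The sketch below restates those rules in plain Python on made-up rows; the helper name `clean_postal_rows` and the sample rows are assumptions for illustration only.

```python
from collections import defaultdict


def clean_postal_rows(raw_rows):
    """Apply the cleaning rules to (postal code, borough, neighborhood) rows."""
    grouped = defaultdict(list)
    for postal_code, borough, neighborhood in raw_rows:
        if borough == 'Not assigned':
            continue  # postal code without a borough is dropped
        neighborhood = neighborhood.replace('\n', '')
        if neighborhood == 'Not assigned':
            neighborhood = borough  # fall back to the borough name
        grouped[(postal_code, borough)].append(neighborhood)
    # one output row per (postal code, borough), neighborhoods comma-joined
    return [(code, borough, ', '.join(names))
            for (code, borough), names in sorted(grouped.items())]


# hypothetical rows imitating the scraped table cells
raw_rows = [
    ('M1A', 'Not assigned', 'Not assigned\n'),
    ('M1B', 'Scarborough', 'Rouge\n'),
    ('M1B', 'Scarborough', 'Malvern\n'),
    ('M7A', "Queen's Park", 'Not assigned\n'),
]

cleaned = clean_postal_rows(raw_rows)
for entry in cleaned:
    print(entry)
```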


In [16]:
new_york_borough = newyork['Borough'].unique()
toronto_borough = toronto['Borough'].unique()

In [17]:
new_york_borough

array(['Bronx', 'Manhattan', 'Brooklyn', 'Queens', 'Staten Island'],
      dtype=object)

In [19]:
toronto_borough

array(['Scarborough', 'North York', 'East York', 'East Toronto',
       'Central Toronto', 'Downtown Toronto', 'York', 'West Toronto',
       "Queen's Park", 'Mississauga', 'Etobicoke'], dtype=object)

We now focus on the Bronx in New York and Scarborough in Toronto.

In [20]:
bronx = newyork[newyork['Borough'] == 'Bronx'].reset_index(drop=True)
scarborough = toronto[toronto['Borough'] == 'Scarborough'].reset_index(drop=True)

Next, we map the neighborhoods of each borough.

In [25]:
n_address = 'Bronx, NY'

geolocator = Nominatim(user_agent="ny_explorer")
n_location = geolocator.geocode(n_address)
n_latitude = n_location.latitude
n_longitude = n_location.longitude
print('The geographical coordinates of the Bronx are {}, {}.'.format(n_latitude, n_longitude))

The geographical coordinates of the Bronx are 40.85048545, -73.8404035580209.


In [26]:
map_bronx = folium.Map(location=[n_latitude, n_longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(bronx['Latitude'], bronx['Longitude'], bronx['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bronx)  
    
map_bronx

In [27]:
s_latitude = 43.773077
s_longitude = -79.257774
map_scarb = folium.Map(location=[s_latitude, s_longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(scarborough['Latitude'], scarborough['Longitude'], scarborough['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_scarb)  
    
map_scarb

#### Foursquare Credentials and Versions
The credentials are left blank here for privacy.

In [28]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

We borrow the **get_category_type** function from the Foursquare lab and define **getNearbyVenues** to collect the venues around every neighborhood.

In [29]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
# function that will loop through all the neighborhoods.
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [30]:
LIMIT = 100
radius = 500
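
`getNearbyVenues` formats the query string by hand and indexes `categories[0]` directly, which would raise an error for a venue with no category. The following is a minimal sketch, not the lab's code, that builds the same kind of URL with the standard library and reads the first category defensively. The helper names `build_explore_url` and `first_category_name` and the placeholder credentials are assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# same endpoint that getNearbyVenues calls
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the explore URL, letting urlencode escape the parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


def first_category_name(venue):
    """Return the first category name, or None when the list is empty."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else None


# placeholder credentials, for illustration only
url = build_explore_url('ID', 'SECRET', '20180605', 40.894705, -73.847201)
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['radius'], query['limit'])
```

Passing `radius` and `limit` as arguments also removes the dependency on the global `LIMIT`, which in the cell above must be defined before the function is first called.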

#### Bronx neighborhood information

In [31]:
bronx_venues = getNearbyVenues(names=bronx['Neighborhood'],
                                   latitudes=bronx['Latitude'],
                                   longitudes=bronx['Longitude']
                                  )
print(bronx_venues.shape)
bronx_venues.head()

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Claremont Village
Concourse Village
Mount Eden
Mount Hope
Bronxdale
Allerton
Kingsbridge Heights
(1226, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Cooler Runnings Jamaican Restaurant Inc,40.898276,-73.850381,Caribbean Restaurant
4,Wakefield,40.894705,-73.847201,Shell,40.894187,-73.845862,Gas Station


In [33]:
bronx_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allerton,29,29,29,29,29,29
Baychester,22,22,22,22,22,22
Bedford Park,34,34,34,34,34,34
Belmont,100,100,100,100,100,100
Bronxdale,13,13,13,13,13,13
Castle Hill,8,8,8,8,8,8
City Island,25,25,25,25,25,25
Claremont Village,18,18,18,18,18,18
Clason Point,10,10,10,10,10,10
Co-op City,17,17,17,17,17,17


In [34]:
print('There are {} unique categories.'.format(len(bronx_venues['Venue Category'].unique())))

There are 173 unique categories.


In [35]:
# one hot encoding
bronx_onehot = pd.get_dummies(bronx_venues[['Venue Category']], prefix="", prefix_sep="")

bronx_onehot['Neighborhood'] = bronx_venues['Neighborhood'] 
fixed_columns = [bronx_onehot.columns[-1]] + list(bronx_onehot.columns[:-1])
bronx_onehot = bronx_onehot[fixed_columns]

bronx_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Boat or Ferry,Boutique,Bowling Alley,Breakfast Spot,Brewery,Buffet,Building,Burger Joint,Bus Line,Bus Station,Bus Stop,Café,Candy Store,Caribbean Restaurant,Check Cashing Service,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distillery,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Dry Cleaner,Eastern European Restaurant,Electronics Store,Eye Doctor,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hotel,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Lake,Latin American Restaurant,Laundromat,Lawyer,Liquor Store,Locksmith,Lounge,Market,Martial Arts Dojo,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Moving Target,Music Venue,Nightclub,Optical Shop,Outdoor Sculpture,Outdoors & Recreation,Outlet Store,Paella Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pizza Place,Platform,Playground,Plaza,Pool,Print Shop,Pub,Recreation Center,Rental Car Location,Restaurant,River,Road,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Smoke Shop,Social Club,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Tattoo Parlor,Tennis Court,Tennis Stadium,Thai Restaurant,Thrift / Vintage Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waste Facility,Wings Joint,Women's Store
0,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Wakefield,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [37]:
bronx_grouped = bronx_onehot.groupby('Neighborhood').mean().reset_index()
bronx_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Boat or Ferry,Boutique,Bowling Alley,Breakfast Spot,Brewery,Buffet,Building,Burger Joint,Bus Line,Bus Station,Bus Stop,Café,Candy Store,Caribbean Restaurant,Check Cashing Service,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distillery,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Dry Cleaner,Eastern European Restaurant,Electronics Store,Eye Doctor,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hotel,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Lake,Latin American Restaurant,Laundromat,Lawyer,Liquor Store,Locksmith,Lounge,Market,Martial Arts Dojo,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Moving Target,Music Venue,Nightclub,Optical Shop,Outdoor Sculpture,Outdoors & Recreation,Outlet Store,Paella Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pizza Place,Platform,Playground,Plaza,Pool,Print Shop,Pub,Recreation Center,Rental Car Location,Restaurant,River,Road,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Smoke Shop,Social Club,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Tattoo Parlor,Tennis Court,Tennis Stadium,Thai Restaurant,Thrift / Vintage Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waste Facility,Wings Joint,Women's Store
0,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.068966,0.034483,0.034483,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.034483,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Baychester,0.045455,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bedford Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.117647,0.029412,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Belmont,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.09,0.01,0.03,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.18,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.09,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
4,Bronxdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
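
Taking the mean of the one-hot columns per neighborhood gives the share of that neighborhood's venues in each category; for example, 0.034483 for Allerton is 1 venue out of 29. The sketch below computes the same shares directly from counts on made-up data. The helper `category_frequencies` and the sample venues are illustrative assumptions, and unlike the dataframe it omits categories with a share of zero.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1

    frequencies = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        # share = count / total, which equals the mean of the one-hot column
        frequencies[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return frequencies


# hypothetical (neighborhood, venue category) pairs
venues = [
    ('A', 'Pizza Place'),
    ('A', 'Pizza Place'),
    ('A', 'Pharmacy'),
    ('A', 'Spa'),
    ('B', 'Park'),
]

freq = category_frequencies(venues)
print(freq['A'])
print(freq['B'])
```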


Create a dataframe with the top 10 venues for each neighborhood.

In [38]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [40]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
bronx_venues_sorted = pd.DataFrame(columns=columns)
bronx_venues_sorted['Neighborhood'] = bronx_grouped['Neighborhood']

for ind in np.arange(bronx_grouped.shape[0]):
    bronx_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bronx_grouped.iloc[ind, :], num_top_venues)

bronx_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allerton,Pizza Place,Spa,Chinese Restaurant,Pharmacy,Deli / Bodega,Supermarket,Bus Station,Spanish Restaurant,Breakfast Spot,Dessert Shop
1,Baychester,Donut Shop,Bank,Mexican Restaurant,Mattress Store,Pet Store,Sandwich Place,Spanish Restaurant,Moving Target,Fried Chicken Joint,Electronics Store
2,Bedford Park,Deli / Bodega,Diner,Mexican Restaurant,Sandwich Place,Pharmacy,Pizza Place,Spanish Restaurant,Chinese Restaurant,Supermarket,Bus Station
3,Belmont,Italian Restaurant,Pizza Place,Deli / Bodega,Bakery,Dessert Shop,Donut Shop,Bank,Diner,Sandwich Place,Liquor Store
4,Bronxdale,Italian Restaurant,Pizza Place,Gym,Mexican Restaurant,Breakfast Spot,Spanish Restaurant,Eastern European Restaurant,Supermarket,Chinese Restaurant,Bank
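
The column labels above come from a three-element suffix list with a "th" fallback, which is correct for the top 10 but would produce "21th" for longer rankings. Many categories also share the same frequency, so the order among ties is not obvious from the sort alone. The sketch below shows one possible way to handle both; the helpers `ordinal` and `top_categories` are hypothetical, and breaking ties alphabetically is an assumption rather than what the notebook does.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def top_categories(frequencies, n):
    """Return the n most frequent categories, ties broken alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]


columns = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 4)]
print(columns)

# hypothetical frequencies for one neighborhood
sample = {'Spa': 0.25, 'Pizza Place': 0.5, 'Pharmacy': 0.25}
print(top_categories(sample, 2))
```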


#### Do the same for Scarborough, Toronto.

In [42]:
scarb_venues = getNearbyVenues(names=scarborough['Neighborhood'],
                                   latitudes=scarborough['Latitude'],
                                   longitudes=scarborough['Longitude']
                                  )
print(scarb_venues.shape)
scarb_venues.head()

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
(96, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Affordable Toronto Movers,43.787919,-79.162977,Moving Target
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


In [43]:
scarb_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,5,5,5,5,5,5
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",3,3,3,3,3,3
"Birch Cliff, Cliffside West",4,4,4,4,4,4
Cedarbrae,7,7,7,7,7,7
"Clairlea, Golden Mile, Oakridge",8,8,8,8,8,8
"Clarks Corners, Sullivan, Tam O'Shanter",12,12,12,12,12,12
"Cliffcrest, Cliffside, Scarborough Village West",2,2,2,2,2,2
"Dorset Park, Scarborough Town Centre, Wexford Heights",9,9,9,9,9,9
"East Birchmount Park, Ionview, Kennedy Park",5,5,5,5,5,5
"Guildwood, Morningside, West Hill",7,7,7,7,7,7


In [44]:
print('There are {} unique categories.'.format(len(scarb_venues['Venue Category'].unique())))

There are 56 unique categories.


In [45]:
# one hot encoding
scarb_onehot = pd.get_dummies(scarb_venues[['Venue Category']], prefix="", prefix_sep="")

scarb_onehot['Neighborhood'] = scarb_venues['Neighborhood'] 
fixed_columns = [scarb_onehot.columns[-1]] + list(scarb_onehot.columns[:-1])
scarb_onehot = scarb_onehot[fixed_columns]

scarb_grouped = scarb_onehot.groupby('Neighborhood').mean().reset_index()
scarb_grouped.head()

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Auto Garage,Bakery,Bank,Bar,Breakfast Spot,Bus Line,Bus Station,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,College Stadium,Convenience Store,Department Store,Discount Store,Electronics Store,Fast Food Restaurant,Fried Chicken Joint,Furniture / Home Store,General Entertainment,Grocery Store,Gym,Gym Pool,Hakka Restaurant,Hobby Shop,Indian Restaurant,Intersection,Italian Restaurant,Korean Restaurant,Latin American Restaurant,Light Rail Station,Lounge,Medical Center,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Motel,Moving Target,Nail Salon,Noodle House,Park,Pet Store,Pharmacy,Pizza Place,Playground,Rental Car Location,Sandwich Place,Shopping Mall,Skating Rink,Soccer Field,Spa,Thai Restaurant,Thrift / Vintage Store,Vietnamese Restaurant
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0
1,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
3,Cedarbrae,0.0,0.142857,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0
4,"Clairlea, Golden Mile, Oakridge",0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0


In [95]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
scarb_venues_sorted = pd.DataFrame(columns=columns)
scarb_venues_sorted['Neighborhood'] = scarb_grouped['Neighborhood']

for ind in np.arange(scarb_grouped.shape[0]):
    scarb_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scarb_grouped.iloc[ind, :], num_top_venues)

scarb_venues_sorted.head()
scarb_venues_sorted['1st Most Common Venue'].unique()

array(['Chinese Restaurant', 'Playground', 'College Stadium',
       'Hakka Restaurant', 'Bakery', 'Pizza Place', 'American Restaurant',
       'Indian Restaurant', 'Hobby Shop', 'Intersection', 'Bar',
       'Fast Food Restaurant', 'Middle Eastern Restaurant', 'Spa',
       'Coffee Shop'], dtype=object)

#### Clustering and examining clusters

In [47]:
kcluster = 5

In [49]:
bronx_cluster = bronx_grouped.drop('Neighborhood', axis=1)
b_kmeans = KMeans(n_clusters = kcluster, random_state=0).fit(bronx_cluster)
b_kmeans.labels_[0:10]
bronx_venues_sorted.insert(0, 'Cluster Labels', b_kmeans.labels_)
bronx_merged = bronx
bronx_merged = bronx_merged.join(bronx_venues_sorted.set_index('Neighborhood'), on = 'Neighborhood')
bronx_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bronx,Wakefield,40.894705,-73.847201,0,Ice Cream Shop,Laundromat,Dessert Shop,Pharmacy,Donut Shop,Sandwich Place,Caribbean Restaurant,Food,Gas Station,Food Truck
1,Bronx,Co-op City,40.874294,-73.829939,0,Fast Food Restaurant,Baseball Field,Pizza Place,Grocery Store,Mattress Store,Chinese Restaurant,Liquor Store,Basketball Court,Gift Shop,Restaurant
2,Bronx,Eastchester,40.887556,-73.827806,3,Bus Station,Caribbean Restaurant,Metro Station,Bus Stop,Diner,Convenience Store,Seafood Restaurant,Fast Food Restaurant,Juice Bar,Pizza Place
3,Bronx,Fieldston,40.895437,-73.905643,2,Plaza,River,Bus Station,Playground,Dessert Shop,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Deli / Bodega
4,Bronx,Riverdale,40.890834,-73.912585,4,Park,Bus Station,Plaza,Bank,Playground,Food Truck,Home Service,Health & Beauty Service,Dog Run,Fast Food Restaurant
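
The clustering cell fits k-means on the per-neighborhood frequency matrix and attaches the resulting labels row by row. As a small illustration of what the labels mean, the sketch below runs `KMeans` on a made-up matrix with two obvious groups; the matrix and the choice of two clusters are assumptions for the example, and `n_init` is set explicitly because its default differs between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical frequency rows: two "restaurant-heavy" and two "park-heavy" areas
features = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = model.labels_

# rows with similar venue profiles receive the same label
print(labels)
```

The label numbers themselves are arbitrary; only the grouping matters. Because `labels_` follows the row order of the input, inserting it into a dataframe is only valid when that dataframe has the same row order as the matrix that was clustered.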


In [51]:
map_bronx_cluster = folium.Map(location=[n_latitude, n_longitude], zoom_start=11)

# one hex color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kcluster))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster
for lat, lon, poi, cluster in zip(bronx_merged['Latitude'], bronx_merged['Longitude'], bronx_merged['Neighborhood'], bronx_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_bronx_cluster)
       
map_bronx_cluster

In [52]:
bronx_merged.loc[bronx_merged['Cluster Labels'] == 0, bronx_merged.columns[[1] + list(range(5, bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Wakefield,Ice Cream Shop,Laundromat,Dessert Shop,Pharmacy,Donut Shop,Sandwich Place,Caribbean Restaurant,Food,Gas Station,Food Truck
1,Co-op City,Fast Food Restaurant,Baseball Field,Pizza Place,Grocery Store,Mattress Store,Chinese Restaurant,Liquor Store,Basketball Court,Gift Shop,Restaurant
5,Kingsbridge,Pizza Place,Sandwich Place,Mexican Restaurant,Bar,Latin American Restaurant,Supermarket,Deli / Bodega,Caribbean Restaurant,Spanish Restaurant,Bakery
9,Baychester,Donut Shop,Bank,Mexican Restaurant,Mattress Store,Pet Store,Sandwich Place,Spanish Restaurant,Moving Target,Fried Chicken Joint,Electronics Store
12,Bedford Park,Deli / Bodega,Diner,Mexican Restaurant,Sandwich Place,Pharmacy,Pizza Place,Spanish Restaurant,Chinese Restaurant,Supermarket,Bus Station
14,Morris Heights,Pharmacy,Playground,Bank,Recreation Center,Moving Target,Bus Station,Spanish Restaurant,Food Truck,Latin American Restaurant,Buffet
15,Fordham,Donut Shop,Pizza Place,Fast Food Restaurant,Mobile Phone Shop,Gym / Fitness Center,Bank,Shoe Store,Fried Chicken Joint,Spanish Restaurant,Supplement Shop
19,Melrose,Pharmacy,Pizza Place,Discount Store,Gym / Fitness Center,Supermarket,Department Store,Mexican Restaurant,Bus Stop,Martial Arts Dojo,Sandwich Place
20,Mott Haven,Pizza Place,Gym,Donut Shop,Bakery,Fish & Chips Shop,Burger Joint,Food,Mobile Phone Shop,Spanish Restaurant,Storage Facility
21,Port Morris,Food Truck,Restaurant,Storage Facility,Bar,Music Venue,Latin American Restaurant,Donut Shop,Furniture / Home Store,Spanish Restaurant,Distillery


In [54]:
bronx_merged.loc[bronx_merged['Cluster Labels'] == 1, bronx_merged.columns[[1] + list(range(5, bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Woodlawn,Deli / Bodega,Bar,Pizza Place,Playground,Pub,Plaza,Rental Car Location,Donut Shop,Bus Stop,Food & Drink Shop
7,Norwood,Pizza Place,Park,Bank,Mobile Phone Shop,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Caribbean Restaurant,Sandwich Place,Spanish Restaurant
10,Pelham Parkway,Pizza Place,Frozen Yogurt Shop,Chinese Restaurant,Italian Restaurant,Deli / Bodega,Coffee Shop,Metro Station,Sandwich Place,Gas Station,Sushi Restaurant
11,City Island,Harbor / Marina,Deli / Bodega,Seafood Restaurant,Thrift / Vintage Store,Park,History Museum,Smoke Shop,French Restaurant,Spanish Restaurant,Music Venue
13,University Heights,Pizza Place,Chinese Restaurant,Bakery,History Museum,Bank,Fast Food Restaurant,Food,Burger Joint,Sandwich Place,Laundromat
16,East Tremont,Pizza Place,Bank,Café,Mobile Phone Shop,Lounge,Chinese Restaurant,Fast Food Restaurant,Donut Shop,Restaurant,Outdoors & Recreation
18,High Bridge,Pharmacy,Pizza Place,Deli / Bodega,Sandwich Place,Bus Station,Market,Chinese Restaurant,Donut Shop,Latin American Restaurant,Spanish Restaurant
23,Hunts Point,Pizza Place,BBQ Joint,Food,Farmers Market,Shipping Store,Café,Spanish Restaurant,Restaurant,Gourmet Shop,Grocery Store
27,Throgs Neck,Pizza Place,Asian Restaurant,Sports Bar,Liquor Store,Coffee Shop,Baseball Field,Juice Bar,Italian Restaurant,Bar,American Restaurant
28,Country Club,Sandwich Place,Playground,Italian Restaurant,Fried Chicken Joint,Liquor Store,Comic Shop,Chinese Restaurant,Distillery,Dry Cleaner,Fast Food Restaurant


In [56]:
bronx_merged.loc[bronx_merged['Cluster Labels'] == 2, bronx_merged.columns[[1] + list(range(5, bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Fieldston,Plaza,River,Bus Station,Playground,Dessert Shop,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Deli / Bodega


In [57]:
bronx_merged.loc[bronx_merged['Cluster Labels'] == 3, bronx_merged.columns[[1] + list(range(5, bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Eastchester,Bus Station,Caribbean Restaurant,Metro Station,Bus Stop,Diner,Convenience Store,Seafood Restaurant,Fast Food Restaurant,Juice Bar,Pizza Place
8,Williamsbridge,Soup Place,Nightclub,Bar,Caribbean Restaurant,Women's Store,Electronics Store,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
17,West Farms,Bus Station,Bus Stop,Donut Shop,Bank,Bus Line,Food,Diner,Lounge,Sandwich Place,Supermarket
24,Morrisania,Discount Store,Bus Station,Pizza Place,Donut Shop,Fast Food Restaurant,Grocery Store,Fish Market,Pharmacy,Bowling Alley,Sandwich Place
25,Soundview,Grocery Store,Chinese Restaurant,Pharmacy,Bus Stop,Breakfast Spot,Fried Chicken Joint,Basketball Court,Latin American Restaurant,Bus Station,Video Store
40,Olinville,Bakery,Caribbean Restaurant,Supermarket,Laundromat,Metro Station,Food,Fried Chicken Joint,Basketball Court,Chinese Restaurant,Mexican Restaurant
42,Concourse,Bus Station,Playground,Grocery Store,Caribbean Restaurant,Sandwich Place,Chinese Restaurant,Spanish Restaurant,Liquor Store,Donut Shop,Bakery
44,Edenwald,Grocery Store,Fish Market,Bus Station,Gas Station,Supermarket,Athletics & Sports,Women's Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
45,Claremont Village,Bus Station,Pizza Place,Chinese Restaurant,Bakery,Grocery Store,Deli / Bodega,Gym,Caribbean Restaurant,Supermarket,Gift Shop


In [58]:
bronx_merged.loc[bronx_merged['Cluster Labels'] == 4, bronx_merged.columns[[1] + list(range(5, bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Riverdale,Park,Bus Station,Plaza,Bank,Playground,Food Truck,Home Service,Health & Beauty Service,Dog Run,Fast Food Restaurant
26,Clason Point,Park,South American Restaurant,Scenic Lookout,Pool,Bus Stop,Boat or Ferry,Grocery Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
34,Spuyten Duyvil,Park,Pizza Place,Asian Restaurant,Pharmacy,Scenic Lookout,Bank,Tennis Stadium,Thai Restaurant,Tennis Court,Intersection
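
The per-cluster tables above can also be condensed into a short summary. As a minimal sketch, the snippet below rebuilds the first five rows of `bronx_merged` shown earlier (two venue columns omitted for brevity) and counts neighborhoods per cluster. The names `sample`, `cluster_sizes`, and `top_venue` are illustrative, not part of the notebook.

```python
import pandas as pd

# First five rows of bronx_merged as printed above (subset of columns).
sample = pd.DataFrame({
    "Neighborhood": ["Wakefield", "Co-op City", "Eastchester", "Fieldston", "Riverdale"],
    "Cluster Labels": [0, 0, 3, 2, 4],
    "1st Most Common Venue": ["Ice Cream Shop", "Fast Food Restaurant",
                              "Bus Station", "Plaza", "Park"],
})

# Number of neighborhoods per cluster, ordered by cluster label.
cluster_sizes = sample["Cluster Labels"].value_counts().sort_index()

# Most frequent top-ranked venue within each cluster (ties resolved alphabetically).
top_venue = (sample.groupby("Cluster Labels")["1st Most Common Venue"]
                   .agg(lambda s: s.mode().iloc[0]))

print(cluster_sizes)
print(top_venue)
```

Run on the full `bronx_merged` frame, the same two lines would show at a glance which clusters are large and which contain only a single neighborhood.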


In [96]:
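kcluster = 3
# k-means needs numeric features only, so drop the name column
scarb_cluster = scarb_grouped.drop(columns='Neighborhood')
s_kmeans = KMeans(n_clusters=kcluster, random_state=0).fit(scarb_cluster)

# attach each neighborhood's cluster label to its ranked venue list
scarb_venues_sorted.insert(0, 'Cluster Labels', s_kmeans.labels_)

# left join on the full neighborhood list; neighborhoods missing from
# scarb_venues_sorted get NaN, which turns the label column into floats
scarb_merged = scarborough.join(scarb_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
scarb_merged.head()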


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,2.0,Fast Food Restaurant,Vietnamese Restaurant,Thrift / Vintage Store,Hakka Restaurant,Gym Pool,Gym,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,0.0,Bar,Moving Target,Vietnamese Restaurant,Convenience Store,Gym Pool,Gym,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,Intersection,Breakfast Spot,Rental Car Location,Pizza Place,Electronics Store,Medical Center,Mexican Restaurant,Vietnamese Restaurant,Convenience Store,Grocery Store
3,M1G,Scarborough,Woburn,43.770992,-79.216917,1.0,Coffee Shop,Korean Restaurant,Vietnamese Restaurant,Convenience Store,Gym Pool,Gym,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Hakka Restaurant,Thai Restaurant,Athletics & Sports,Bakery,Bank,Fried Chicken Joint,Caribbean Restaurant,Department Store,Gym Pool,Gym


In [97]:
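# Note: scarb_merged has a leading 'PostalCode' column, so index 1 selects 'Borough'
# and index 5 selects 'Cluster Labels'; use [2] + list(range(6, ...)) to list
# neighborhood names as in the Bronx tables.
scarb_merged.loc[scarb_merged['Cluster Labels'] == 0, scarb_merged.columns[[1] + list(range(5, scarb_merged.shape[1]))]]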

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,0.0,Bar,Moving Target,Vietnamese Restaurant,Convenience Store,Gym Pool,Gym,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint
2,Scarborough,0.0,Intersection,Breakfast Spot,Rental Car Location,Pizza Place,Electronics Store,Medical Center,Mexican Restaurant,Vietnamese Restaurant,Convenience Store,Grocery Store
4,Scarborough,0.0,Hakka Restaurant,Thai Restaurant,Athletics & Sports,Bakery,Bank,Fried Chicken Joint,Caribbean Restaurant,Department Store,Gym Pool,Gym
5,Scarborough,0.0,Spa,Playground,Convenience Store,Vietnamese Restaurant,College Stadium,Gym Pool,Gym,Grocery Store,General Entertainment,Furniture / Home Store
6,Scarborough,0.0,Hobby Shop,Department Store,Bus Station,Chinese Restaurant,Coffee Shop,Hakka Restaurant,Gym Pool,Gym,Grocery Store,General Entertainment
7,Scarborough,0.0,Bakery,Bus Line,Soccer Field,Fast Food Restaurant,Metro Station,Park,Vietnamese Restaurant,Convenience Store,Gym,Grocery Store
8,Scarborough,0.0,American Restaurant,Motel,Hobby Shop,Gym Pool,Gym,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant
9,Scarborough,0.0,College Stadium,Skating Rink,General Entertainment,Café,Vietnamese Restaurant,Gym Pool,Gym,Grocery Store,Furniture / Home Store,Fried Chicken Joint
10,Scarborough,0.0,Indian Restaurant,Pet Store,Chinese Restaurant,Furniture / Home Store,Thrift / Vintage Store,Latin American Restaurant,Light Rail Station,Vietnamese Restaurant,Rental Car Location,Playground
11,Scarborough,0.0,Middle Eastern Restaurant,Shopping Mall,Breakfast Spot,Sandwich Place,Vietnamese Restaurant,Bakery,Auto Garage,Grocery Store,Convenience Store,Gym


In [98]:
scarb_merged.loc[scarb_merged['Cluster Labels'] == 1, scarb_merged.columns[[1] + list(range(5, scarb_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Scarborough,1.0,Coffee Shop,Korean Restaurant,Vietnamese Restaurant,Convenience Store,Gym Pool,Gym,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint
14,Scarborough,1.0,Playground,Park,Coffee Shop,Vietnamese Restaurant,College Stadium,Gym Pool,Gym,Grocery Store,General Entertainment,Furniture / Home Store


In [99]:
scarb_merged.loc[scarb_merged['Cluster Labels'] == 2, scarb_merged.columns[[1] + list(range(5, scarb_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,2.0,Fast Food Restaurant,Vietnamese Restaurant,Thrift / Vintage Store,Hakka Restaurant,Gym Pool,Gym,Grocery Store,General Entertainment,Furniture / Home Store,Fried Chicken Joint


In [100]:
scarb_merged['Cluster Labels']

0     2.0
1     0.0
2     0.0
3     1.0
4     0.0
5     0.0
6     0.0
7     0.0
8     0.0
9     0.0
10    0.0
11    0.0
12    0.0
13    0.0
14    1.0
15    0.0
16    NaN
Name: Cluster Labels, dtype: float64

In [101]:
scarb_merged.loc[16,:]

PostalCode                        M1X
Borough                   Scarborough
Neighborhood              Upper Rouge
Latitude                      43.8361
Longitude                    -79.2056
Cluster Labels                    NaN
1st Most Common Venue             NaN
2nd Most Common Venue             NaN
3rd Most Common Venue             NaN
4th Most Common Venue             NaN
5th Most Common Venue             NaN
6th Most Common Venue             NaN
7th Most Common Venue             NaN
8th Most Common Venue             NaN
9th Most Common Venue             NaN
10th Most Common Venue            NaN
Name: 16, dtype: object
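
The `NaN` row above is what a left join produces when a neighborhood has no matching row on the right-hand side, as happens for Upper Rouge. The toy example below reproduces the effect under that assumption and shows one way to recover integer labels. The frames `neighborhoods` and `venues_sorted` are hypothetical stand-ins for `scarborough` and `scarb_venues_sorted`, using three rows taken from the output above.

```python
import pandas as pd

# Hypothetical stand-in for `scarborough`: every neighborhood, with coordinates.
neighborhoods = pd.DataFrame({
    "Neighborhood": ["Woburn", "Cedarbrae", "Upper Rouge"],
    "Latitude": [43.770992, 43.773136, 43.8361],
})

# Hypothetical stand-in for `scarb_venues_sorted`: only clustered neighborhoods.
venues_sorted = pd.DataFrame({
    "Neighborhood": ["Woburn", "Cedarbrae"],
    "Cluster Labels": [1, 0],
})

# Left join: the unmatched row gets NaN, and the integer labels become float64.
merged = neighborhoods.join(venues_sorted.set_index("Neighborhood"), on="Neighborhood")
unmatched = merged.loc[merged["Cluster Labels"].isna(), "Neighborhood"].tolist()

# Keep only clustered neighborhoods and restore integer labels.
clustered = merged.dropna(subset=["Cluster Labels"]).copy()
clustered["Cluster Labels"] = clustered["Cluster Labels"].astype(int)

print(unmatched)
print(clustered)
```

Dropping the unmatched rows before mapping removes the need for a `NaN` check inside the plotting loop. Whether to drop such neighborhoods or keep them as "unclustered" is a judgment call.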

In [102]:
map_scarb_cluster = folium.Map(location=[s_latitude, s_longitude], zoom_start=11)

# one hex color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kcluster))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per clustered neighborhood
for lat, lon, poi, cluster in zip(
        scarb_merged['Latitude'],
        scarb_merged['Longitude'],
        scarb_merged['Neighborhood'],
        scarb_merged['Cluster Labels']):
    # skip neighborhoods without a cluster label (NaN), e.g. Upper Rouge
    if pd.isna(cluster):
        continue
    cluster = int(cluster)  # labels are floats because the column contains NaN
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_scarb_cluster)
       
map_scarb_cluster
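
Both maps color markers by cluster label. As a small sketch of that step in isolation, the helper below builds one hex color per label from Matplotlib's `rainbow` colormap. `cluster_palette` and `color_for` are hypothetical names introduced for illustration.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_palette(n_clusters):
    """Return one hex color string per cluster label 0..n_clusters-1."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return [colors.to_hex(c) for c in rgba]


palette = cluster_palette(3)

# Explicit label -> color lookup, so each label maps to a fixed, predictable color.
color_for = {label: palette[label] for label in range(len(palette))}
print(color_for)
```

Indexing the palette directly by label keeps the mapping easy to reason about: label 0 always gets the first color, label 1 the second, and so on.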