# David Skoff - IBM Data Science Professional Coursera Capstone

## Clustering Neighborhoods in Toronto and New York

### This notebook explores and clusters the neighborhoods in Toronto and New York

In [1]:
# import libraries

import pandas as pd
import numpy as np

## 1. Scrape Toronto Borough and Neighborhood Table from Wikipedia and Pre-process

In [2]:
# scrape postal code, borough, neighborhood table for Toronto from Wikipedia

df_scrape = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

In [3]:
# read_html returns a list containing every table found on the page
# keep the first one, which holds the postal code table

df_scrape = df_scrape[0]

In [4]:
df_scrape

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [5]:
df_scrape['Borough'].value_counts()

Not assigned        77
North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
East York            5
York                 5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

### 77 rows have a Borough equal to 'Not assigned' and need to be removed from the dataframe

In [6]:
# make a dataframe that does not include rows where the Borough is Not assigned

column_names = ['Postal Code', 'Borough', 'Neighborhood']

# boolean mask: True where the Borough is anything other than 'Not assigned'
assigned = df_scrape['Borough'] != 'Not assigned'

# keep the matching rows and the three columns of interest,
# then renumber the index from 0
df_removed_na = df_scrape.loc[assigned, column_names].reset_index(drop=True)

In [7]:
print('Number of "Not assigned" rows removed:', (len(df_scrape) - len(df_removed_na)))

Number of "Not assigned" rows removed: 77


In [8]:
df_scrape.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [9]:
df_removed_na.head(6)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"


In [10]:
# check to see if any Boroughs or Neighborhoods are still equal to 'Not assigned'

count_borough = df_removed_na['Borough'].str.count('Not assigned').sum()

count_neigh = df_removed_na['Neighborhood'].str.count('Not assigned').sum()

print('# of Boroughs with Not assigned:', count_borough)
print('# of Neighborhoods with Not assigned:', count_neigh)

# of Boroughs with Not assigned: 0
# of Neighborhoods with Not assigned: 0


### No boroughs or neighborhoods are equal to 'Not assigned' in df_removed_na

In [11]:
# check to see if all postal codes are unique

postal_code_check = df_removed_na['Postal Code'].value_counts().max()
postal_code_check

1

In [12]:
# print the shape of the final dataframe

df_removed_na.shape

(103, 3)

### The final dataframe has 103 rows (77 removed during pre-processing)

## 2. Merge Toronto Latitude and Longitude Data to Each Postal Code

In [13]:
# sensitive code removed for sharing

# import latitude and longitude dataset

df_lat_lon = pd.read_csv(body)
df_lat_lon.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [14]:
df_lat_lon.shape

(103, 3)

In [15]:
df_removed_na.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [16]:
df_lat_lon.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [17]:
# compare length of df_removed_na to df_lat_lon

diff_in_len = len(df_removed_na) - len(df_lat_lon)
print('Difference in length:', diff_in_len)

Difference in length: 0


In [18]:
df_toronto_final = pd.merge(df_removed_na, df_lat_lon, how='inner', on='Postal Code', validate='one_to_one')
df_toronto_final.shape

(103, 5)
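
Equal row counts and `validate='one_to_one'` confirm that keys are not duplicated, but they do not show whether every postal code found a partner. The following is a minimal sketch, on made-up rows, of how an outer merge with `indicator=True` can surface unmatched keys. The frames `left` and `right` and all their values are hypothetical stand-ins, not data from this notebook.

```python
import pandas as pd

# Hypothetical stand-ins for df_removed_na and df_lat_lon
left = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M5A'],
                     'Borough': ['North York', 'North York', 'Downtown Toronto']})
right = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M9Z'],
                      'Latitude': [43.75, 43.72, 43.00],
                      'Longitude': [-79.33, -79.31, -79.00]})

# An outer merge keeps every key; the _merge column records which side it came from
checked = pd.merge(left, right, how='outer', on='Postal Code',
                   validate='one_to_one', indicator=True)

# Keys that are present in only one of the two frames
unmatched = checked.loc[checked['_merge'] != 'both', ['Postal Code', '_merge']]
print(unmatched)
```

If `unmatched` is empty, an inner merge loses nothing, which is what the unchanged row count of 103 suggests for the real data.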

In [19]:
df_toronto_final.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


## 3. Explore the neighborhoods in Toronto

In [20]:
df_toronto_final['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
East York            5
York                 5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

### 3a. Create a map of Toronto with neighborhoods superimposed on top.

In [23]:
# install and import folium

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library


In [237]:
# Toronto latitude and longitude
toronto_location = [43.6532, -79.3832]

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=toronto_location, zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto_final['Latitude'], df_toronto_final['Longitude'], df_toronto_final['Borough'], df_toronto_final['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

To simplify the map above, the rest of the analysis segments and clusters only the boroughs whose names contain the word Toronto. The next cell slices the original dataframe into a new dataframe that holds just those boroughs.

In [25]:
downtown = df_toronto_final[df_toronto_final['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
central = df_toronto_final[df_toronto_final['Borough'] == 'Central Toronto'].reset_index(drop=True)
west = df_toronto_final[df_toronto_final['Borough'] == 'West Toronto'].reset_index(drop=True)
east = df_toronto_final[df_toronto_final['Borough'] == 'East Toronto'].reset_index(drop=True)

toronto_data = pd.concat([downtown, central, west, east], ignore_index=True, sort=False)
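
The four equality filters above could also be written as a single substring match. The sketch below assumes a miniature, partly made-up version of `df_toronto_final`; the neighborhood names for East and West Toronto are placeholders.

```python
import pandas as pd

# Hypothetical miniature of df_toronto_final
df = pd.DataFrame({
    'Borough': ['North York', 'Downtown Toronto', 'East Toronto',
                'Etobicoke', 'West Toronto'],
    'Neighborhood': ['Parkwoods', 'Berczy Park', 'Example East',
                     'Islington Avenue', 'Example West'],
})

# Keep rows whose borough name contains the word "Toronto"
toronto_only = df[df['Borough'].str.contains('Toronto')].reset_index(drop=True)
print(toronto_only)
```

One difference to keep in mind: this keeps rows in their original order, whereas the `pd.concat` version groups them borough by borough, so `head()` would show different rows.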

In [26]:
toronto_data.shape

(39, 5)

In [27]:
toronto_data['Borough'].value_counts()

Downtown Toronto    19
Central Toronto      9
West Toronto         6
East Toronto         5
Name: Borough, dtype: int64

In [28]:
toronto_data.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576


In [238]:
# Toronto latitude and longitude
toronto_location = [43.6532, -79.3832]

# create map of Toronto using latitude and longitude values
map_just_toronto = folium.Map(location=toronto_location, zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_just_toronto)  
    
map_just_toronto

## 4. Explore the neighborhoods in New York City

In [30]:
# sensitive code removed for sharing

# import new york city geospatial dataset

df_nyc = pd.read_csv(body)
df_nyc.head()

Unnamed: 0.1,Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,0,Bronx,Wakefield,40.894705,-73.847201
1,1,Bronx,Co-op City,40.874294,-73.829939
2,2,Bronx,Eastchester,40.887556,-73.827806
3,3,Bronx,Fieldston,40.895437,-73.905643
4,4,Bronx,Riverdale,40.890834,-73.912585


In [33]:
df_nyc.drop(columns=['Unnamed: 0'], inplace=True)
df_nyc.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [36]:
df_downtown_toronto = toronto_data.drop(columns=['Postal Code'])
df_downtown_toronto.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [42]:
# compare the number of boroughs and neighborhoods in toronto and new york city

print('The Toronto dataframe has {} boroughs and {} neighborhoods.'.format(len(df_downtown_toronto['Borough'].unique()), df_downtown_toronto.shape[0]))
print('The NYC dataframe has {} boroughs and {} neighborhoods.'.format(len(df_nyc['Borough'].unique()), df_nyc.shape[0]))

The Toronto dataframe has 4 boroughs and 39 neighborhoods.
The NYC dataframe has 5 boroughs and 306 neighborhoods.


### 4a. Create a map of New York with neighborhoods superimposed on top.

In [239]:
# New York City latitude and longitude
nyc_location = [40.7127281, -74.0060152]

# create map of New York City using latitude and longitude values
map_nyc = folium.Map(location=nyc_location, zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_nyc['Latitude'], df_nyc['Longitude'], df_nyc['Borough'], df_nyc['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_nyc)  
    
map_nyc

### 4b. Merge Toronto and NYC dataframes to collect venue information and cluster

In [48]:
df_nyc.tail()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
301,Manhattan,Hudson Yards,40.756658,-74.000111
302,Queens,Hammels,40.587338,-73.80553
303,Queens,Bayswater,40.611322,-73.765968
304,Queens,Queensbridge,40.756091,-73.945631
305,Staten Island,Fox Hills,40.617311,-74.08174


In [46]:
df_downtown_toronto.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [47]:
df_tor_nyc = pd.concat([df_downtown_toronto, df_nyc], ignore_index=True)
df_tor_nyc.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [49]:
df_tor_nyc.tail()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
340,Manhattan,Hudson Yards,40.756658,-74.000111
341,Queens,Hammels,40.587338,-73.80553
342,Queens,Bayswater,40.611322,-73.765968
343,Queens,Queensbridge,40.756091,-73.945631
344,Staten Island,Fox Hills,40.617311,-74.08174


In [54]:
df_tor_nyc.shape

(345, 4)

## 5. Use Foursquare to Collect Information on Nearby Venues

In [55]:
# sensitive code removed for sharing

In [56]:
# import libraries

import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

In [57]:
# function that extracts the category of the venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [58]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
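
`getNearbyVenues` indexes straight into the response and takes `categories[0]`, so it raises an error if a response has no groups or a venue has no categories. Below is a minimal sketch of a more tolerant parsing step. The helper name `parse_explore_response` is hypothetical, and the payload shape is only assumed to mirror the keys the function above accesses; the sample payload is hand-written, not real API output.

```python
def parse_explore_response(name, lat, lng, payload):
    """Return one tuple per venue, tolerating missing groups or categories.

    `payload` is assumed to have the shape the function above indexes into:
    payload['response']['groups'][0]['items'][i]['venue'].
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written payload for illustration only
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Place B', 'location': {'lat': 1.1, 'lng': 2.1},
               'categories': []}},
]}]}}

rows = parse_explore_response('Somewhere', 1.0, 2.0, fake_payload)
empty = parse_explore_response('Nowhere', 0.0, 0.0, {'response': {}})
print(rows)
print(empty)
```

A neighborhood whose response is empty simply contributes no rows, so it is worth checking afterwards which neighborhoods are missing from the venue table.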

In [97]:
df_tor_nyc_set_1 = df_tor_nyc.iloc[0:100, :]
df_tor_nyc_set_2 = df_tor_nyc.iloc[100:200, :]
df_tor_nyc_set_3 = df_tor_nyc.iloc[200:300, :]
df_tor_nyc_set_4 = df_tor_nyc.iloc[300:(len(df_tor_nyc)), :]

In [98]:
print('Set 1:', df_tor_nyc_set_1.shape)
print('Set 2:', df_tor_nyc_set_2.shape)
print('Set 3:', df_tor_nyc_set_3.shape)
print('Set 4:', df_tor_nyc_set_4.shape)

Set 1: (100, 4)
Set 2: (100, 4)
Set 3: (100, 4)
Set 4: (45, 4)
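
The four hand-written slices above can be generalized to any length and batch size. This is a small sketch using a plain list; the helper name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(items, size):
    """Split a sequence into consecutive chunks of at most `size` elements."""
    return [items[start:start + size] for start in range(0, len(items), size)]


# 345 stand-in items, matching the number of rows in df_tor_nyc
chunks = split_into_chunks(list(range(345)), 100)
print([len(chunk) for chunk in chunks])
```

For a dataframe, the same idea applies with `df.iloc[start:start + size]` in place of the list slice.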


In [103]:
# call foursquare API on each neighborhood (set 1 - index 0 to 99)

LIMIT = 100 # limit of number of venues returned by Foursquare API

tor_nyc_venues_set_1 = getNearbyVenues(names=df_tor_nyc_set_1['Neighborhood'], 
                                       latitudes=df_tor_nyc_set_1['Latitude'], 
                                       longitudes=df_tor_nyc_set_1['Longitude']
                                      )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
North Toronto West, Lawrence Park
The Annex, North Midtown, Yorkville
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High Park, The Junction South
Parkdale, Roncesval

In [107]:
# call foursquare API on each neighborhood (set 2 - index 100 to 199)

LIMIT = 100 # limit of number of venues returned by Foursquare API

tor_nyc_venues_set_2 = getNearbyVenues(names=df_tor_nyc_set_2['Neighborhood'], 
                                       latitudes=df_tor_nyc_set_2['Latitude'], 
                                       longitudes=df_tor_nyc_set_2['Longitude']
                                      )

Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Astoria
Woodside
Jackson Heights
Elmhurst
Howard Beach
Corona
Forest Hills
Kew Gardens
Richmond Hill
Flushing
L

In [111]:
# call foursquare API on each neighborhood (set 3 - index 200 to 299)

LIMIT = 100 # limit of number of venues returned by Foursquare API

tor_nyc_venues_set_3 = getNearbyVenues(names=df_tor_nyc_set_3['Neighborhood'], 
                                       latitudes=df_tor_nyc_set_3['Latitude'], 
                                       longitudes=df_tor_nyc_set_3['Longitude']
                                      )

Oakland Gardens
Queens Village
Hollis
South Jamaica
St. Albans
Rochdale
Springfield Gardens
Cambria Heights
Rosedale
Far Rockaway
Broad Channel
Breezy Point
Steinway
Beechhurst
Bay Terrace
Edgemere
Arverne
Rockaway Beach
Neponsit
Murray Hill
Floral Park
Holliswood
Jamaica Estates
Queensboro Hill
Hillcrest
Ravenswood
Lindenwood
Laurelton
Lefrak City
Belle Harbor
Rockaway Park
Somerville
Brookville
Bellaire
North Corona
Forest Hills Gardens
St. George
New Brighton
Stapleton
Rosebank
West Brighton
Grymes Hill
Todt Hill
South Beach
Port Richmond
Mariner's Harbor
Port Ivory
Castleton Corners
New Springville
Travis
New Dorp
Oakwood
Great Kills
Eltingville
Annadale
Woodrow
Tottenville
Tompkinsville
Silver Lake
Sunnyside
Ditmas Park
Wingate
Rugby
Park Hill
Westerleigh
Graniteville
Arlington
Arrochar
Grasmere
Old Town
Dongan Hills
Midland Beach
Grant City
New Dorp Beach
Bay Terrace
Huguenot
Pleasant Plains
Butler Manor
Charleston
Rossville
Arden Heights
Greenridge
Heartland Village
Chelsea
Bloo

In [125]:
# call foursquare API on each neighborhood (set 4 - index 300 to 344)

LIMIT = 100 # limit of number of venues returned by Foursquare API

tor_nyc_venues_set_4 = getNearbyVenues(names=df_tor_nyc_set_4['Neighborhood'], 
                                       latitudes=df_tor_nyc_set_4['Latitude'], 
                                       longitudes=df_tor_nyc_set_4['Longitude']
                                      )

Paerdegat Basin
Mill Basin
Jamaica Hills
Utopia
Pomonok
Astoria Heights
Claremont Village
Concourse Village
Mount Eden
Mount Hope
Sutton Place
Hunters Point
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Sunnyside Gardens
Blissville
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Manor Heights
Willowbrook
Sandy Ground
Egbertville
Roxbury
Homecrest
Middle Village
Prince's Bay
Lighthouse Hill
Richmond Valley
Malba
Highland Park
Madison
Bronxdale
Allerton
Kingsbridge Heights
Erasmus
Hudson Yards
Hammels
Bayswater
Queensbridge
Fox Hills


In [152]:
# check to see if all latitudes are unique in df_tor_nyc

df_tor_nyc['Latitude'].value_counts().max()

1
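
The uniqueness check matters because latitude is used as the neighborhood key in the steps that follow. Neighborhood names alone would not work: the printed lists above include repeated names such as Murray Hill, Chelsea, and Bay Terrace. The sketch below shows the same two checks on made-up rows; the borough assignments and coordinates are illustrative only.

```python
import pandas as pd

# Hypothetical rows; coordinates are made up for illustration
df = pd.DataFrame({
    'Borough': ['Manhattan', 'Queens', 'Manhattan', 'Staten Island'],
    'Neighborhood': ['Murray Hill', 'Murray Hill', 'Chelsea', 'Chelsea'],
    'Latitude': [40.1, 40.2, 40.3, 40.4],
})

# Names that occur more than once cannot identify a row on their own
name_counts = df['Neighborhood'].value_counts()
repeated_names = sorted(name_counts[name_counts > 1].index)

# Latitude is unique here, so it can serve as a key
latitude_is_unique = df['Latitude'].is_unique
print(repeated_names, latitude_is_unique)
```

Floating-point keys are only safe as long as the values are copied around unchanged, which appears to be the case in this notebook.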

In [158]:
# combine venues data set 1 to 4

venue_data_list = [tor_nyc_venues_set_1, tor_nyc_venues_set_2, tor_nyc_venues_set_3, tor_nyc_venues_set_4]

tor_nyc_venues_all = pd.concat(venue_data_list, ignore_index=True)
tor_nyc_venues_all.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Corktown Common,43.655618,-79.356211,Park


In [160]:
print(tor_nyc_venues_all.shape)

(11715, 7)


In [163]:
# check how many venues were returned for each neighborhood latitude

tor_nyc_venues_all.groupby('Neighborhood Latitude').count()

Unnamed: 0_level_0,Neighborhood,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood Latitude,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
40.505334,6,6,6,6,6,6
40.506082,6,6,6,6,6,6
40.519541,12,12,12,12,12,12
40.524699,20,20,20,20,20,20
40.526264,8,8,8,8,8,8
40.530531,31,31,31,31,31,31
40.531912,8,8,8,8,8,8
40.538114,13,13,13,13,13,13
40.541140,11,11,11,11,11,11
40.541968,20,20,20,20,20,20


Let's find out how many unique categories appear among all the returned venues.

In [164]:
print('There are {} unique categories.'.format(len(tor_nyc_venues_all['Venue Category'].unique())))

There are 455 unique categories.


### 5a. Analyze Each Neighborhood

In [173]:
# one hot encoding
tor_nyc_onehot = pd.get_dummies(tor_nyc_venues_all[['Venue Category']], prefix="", prefix_sep="")

# rename venue category which is also named 'Neighborhood'
tor_nyc_onehot.rename(columns={'Neighborhood':'NeighborhoodVenue'}, inplace=True)

# add the neighborhood latitude column back to the dataframe as the grouping key
tor_nyc_onehot = pd.concat([tor_nyc_venues_all['Neighborhood Latitude'], tor_nyc_onehot], axis=1).reindex(tor_nyc_onehot.index)

tor_nyc_onehot.head()

Unnamed: 0,Neighborhood Latitude,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,...,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,43.65426,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,43.65426,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,43.65426,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,43.65426,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,43.65426,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [174]:
tor_nyc_onehot.shape

(11715, 456)

Next, let's group rows by neighborhood latitude and take the mean of each category column, which gives the frequency of occurrence of each category within a neighborhood.

In [176]:
tor_nyc_grouped = tor_nyc_onehot.groupby('Neighborhood Latitude').mean().reset_index()
tor_nyc_grouped.head()

Unnamed: 0,Neighborhood Latitude,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,...,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,40.505334,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,40.506082,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,40.519541,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,40.524699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05
4,40.526264,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [177]:
tor_nyc_grouped.shape

(344, 456)
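
To make the one-hot and group-by-mean steps concrete, here is a tiny worked example. The latitudes and venue categories are made up for illustration.

```python
import pandas as pd

# Hypothetical venues for two neighborhoods, keyed by a made-up latitude
venues = pd.DataFrame({
    'Neighborhood Latitude': [40.1, 40.1, 40.1, 40.1, 43.6, 43.6],
    'Venue Category': ['Park', 'Park', 'Bakery', 'Bank', 'Bakery', 'Bakery'],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood Latitude', venues['Neighborhood Latitude'])

# The mean of the indicators is each category's share of the neighborhood's venues
grouped = onehot.groupby('Neighborhood Latitude').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns. A neighborhood with no venue rows has nothing to group, which is consistent with 344 groups here for the 345 neighborhoods in `df_tor_nyc`.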

In [181]:
# function to sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
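
As a quick illustration of what this function returns, the sketch below repeats it and applies it to a single made-up row. The frequencies are hypothetical.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Same logic as the function above: skip the key, sort the rest descending
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Hypothetical grouped row: the first entry is the key, the rest are frequencies
row = pd.Series({'Neighborhood Latitude': 40.1,
                 'Bakery': 0.25, 'Bank': 0.10, 'Park': 0.50, 'Pub': 0.15})

top_three = list(return_most_common_venues(row, 3))
print(top_three)  # ['Park', 'Bakery', 'Pub']
```

When a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled by categories with a frequency of 0, so the tail of a top-10 list for a sparsely covered neighborhood carries no information.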

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [183]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Latitude']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Latitude'] = tor_nyc_grouped['Neighborhood Latitude']

for ind in np.arange(tor_nyc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tor_nyc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,40.505334,Cosmetics Shop,Deli / Bodega,Bus Stop,Mexican Restaurant,Thrift / Vintage Store,Italian Restaurant,Cupcake Shop,Fast Food Restaurant,Entertainment Service,Ethiopian Restaurant
1,40.506082,Baseball Field,Pool,Convenience Store,Bus Stop,Yoga Studio,Farmers Market,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service
2,40.519541,Fast Food Restaurant,Convenience Store,Coffee Shop,Bank,Train Station,Smoothie Shop,Sandwich Place,Deli / Bodega,Construction & Landscaping,Food
3,40.524699,Bank,Donut Shop,Pizza Place,Yoga Studio,Discount Store,Flower Shop,Bus Stop,Fast Food Restaurant,Bar,Bakery
4,40.526264,Pizza Place,Chinese Restaurant,Bagel Shop,Italian Restaurant,Pharmacy,Pet Store,Sushi Restaurant,Event Space,Exhibit,Ethiopian Restaurant


## 6. Cluster Neighborhoods

In [184]:
# import k-means from scikit-learn

from sklearn.cluster import KMeans

Run *k*-means to cluster the neighborhoods into 8 clusters.

In [185]:
# set number of clusters
kclusters = 8

tor_nyc_grouped_clustering = tor_nyc_grouped.drop(columns='Neighborhood Latitude')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(tor_nyc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 3, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)

In [192]:
df_tor_nyc.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [193]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,40.505334,Cosmetics Shop,Deli / Bodega,Bus Stop,Mexican Restaurant,Thrift / Vintage Store,Italian Restaurant,Cupcake Shop,Fast Food Restaurant,Entertainment Service,Ethiopian Restaurant
1,40.506082,Baseball Field,Pool,Convenience Store,Bus Stop,Yoga Studio,Farmers Market,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service
2,40.519541,Fast Food Restaurant,Convenience Store,Coffee Shop,Bank,Train Station,Smoothie Shop,Sandwich Place,Deli / Bodega,Construction & Landscaping,Food
3,40.524699,Bank,Donut Shop,Pizza Place,Yoga Studio,Discount Store,Flower Shop,Bus Stop,Fast Food Restaurant,Bar,Bakery
4,40.526264,Pizza Place,Chinese Restaurant,Bagel Shop,Italian Restaurant,Pharmacy,Pet Store,Sushi Restaurant,Event Space,Exhibit,Ethiopian Restaurant


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [194]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [195]:
tor_nyc_merged = df_tor_nyc

# merge based on neighborhood latitude
tor_nyc_merged = tor_nyc_merged.join(neighborhoods_venues_sorted.set_index('Latitude'), on='Latitude')

tor_nyc_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2.0,Coffee Shop,Bakery,Park,Breakfast Spot,Café,Theater,Pub,Yoga Studio,Electronics Store,Performing Arts Venue
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2.0,Coffee Shop,Diner,Yoga Studio,Gym,Park,Bank,Bar,Sandwich Place,Beer Bar,Arts & Crafts Store
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2.0,Clothing Store,Coffee Shop,Café,Japanese Restaurant,Bubble Tea Shop,Cosmetics Shop,Hotel,Diner,Pizza Place,Electronics Store
3,Downtown Toronto,St. James Town,43.651494,-79.375418,2.0,Restaurant,Coffee Shop,Café,Cosmetics Shop,Cocktail Bar,American Restaurant,Hotel,Seafood Restaurant,Creperie,Beer Bar
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,2.0,Coffee Shop,Cocktail Bar,Cheese Shop,Pharmacy,Beer Bar,Restaurant,Bakery,Farmers Market,Café,Seafood Restaurant
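
The join above matches rows on a floating-point latitude, so any location without a venue row silently receives NaN in the new columns. A minimal sketch, using small made-up frames in place of `df_tor_nyc` and `neighborhoods_venues_sorted`, of how `merge` with `indicator=True` could list the unmatched locations right after the join. The frame names and values here are hypothetical.

```python
import pandas as pd

# Toy stand-ins for df_tor_nyc and neighborhoods_venues_sorted (hypothetical values)
locations = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Latitude": [43.65, 40.71, 40.64],
})
venues = pd.DataFrame({
    "Latitude": [43.65, 40.71],
    "Cluster Labels": [2, 3],
    "1st Most Common Venue": ["Coffee Shop", "Deli / Bodega"],
})

# A left merge keeps every location; indicator=True adds a "_merge" column
# that says whether each row found a partner in the venues frame
checked = locations.merge(venues, on="Latitude", how="left", indicator=True)

# Rows tagged "left_only" had no venue data and therefore no cluster label
unmatched = checked.loc[checked["_merge"] == "left_only", "Neighborhood"].tolist()
print(unmatched)
```

Joining on a unique key such as the borough and neighborhood pair would avoid relying on exact float equality, assuming both frames carry those columns.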


In [197]:
tor_nyc_merged.describe()

Unnamed: 0,Latitude,Longitude,Cluster Labels
count,345.0,345.0,344.0
mean,41.036623,-74.56162,2.293605
std,0.944966,1.730017,0.799683
min,40.505334,-79.48445,0.0
25%,40.626928,-74.093483,2.0
50%,40.716805,-73.953868,2.0
75%,40.821012,-73.867041,2.0
max,43.72802,-73.708847,7.0


In [202]:
tor_nyc_merged.shape

(345, 15)

In [200]:
tor_nyc_merged.index[tor_nyc_merged['Cluster Labels'].isnull()].tolist()

[296]

In [205]:
tor_nyc_merged.iloc[296,:]

Borough                   Staten Island
Neighborhood               Howland Hook
Latitude                        40.6384
Longitude                      -74.1862
Cluster Labels                      NaN
1st Most Common Venue               NaN
2nd Most Common Venue               NaN
3rd Most Common Venue               NaN
4th Most Common Venue               NaN
5th Most Common Venue               NaN
6th Most Common Venue               NaN
7th Most Common Venue               NaN
8th Most Common Venue               NaN
9th Most Common Venue               NaN
10th Most Common Venue              NaN
Name: 296, dtype: object

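Howland Hook (Staten Island) did not return any venues, so it was never clustered and its cluster label is NaN. That single NaN is also why the Cluster Labels column is stored as float64. We will drop this location from the analysis.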
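
As a sketch of the same clean-up without hard-coding a row index, the unlabeled rows can be located with `isna` and removed with `dropna`, after which the label column can be cast to an integer type. The frame below is a toy stand-in with hypothetical values, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for tor_nyc_merged: row 1 has no cluster label
merged = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Cluster Labels": [2.0, np.nan, 0.0],
})

# Index positions of the rows that were never clustered
missing = merged.index[merged["Cluster Labels"].isna()].tolist()

# Drop those rows, then convert the labels from float to integer
cleaned = (
    merged.dropna(subset=["Cluster Labels"])
          .astype({"Cluster Labels": "int64"})
)
print(missing, cleaned["Cluster Labels"].tolist())
```

The cast to `int64` only succeeds once the NaN rows are gone, which is why the drop comes first.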

In [206]:
tor_nyc_merged_drop = tor_nyc_merged.drop(index=296)
tor_nyc_merged_drop.shape

(344, 15)

In [209]:
tor_nyc_merged_drop.describe()

Unnamed: 0,Latitude,Longitude,Cluster Labels
count,344.0,344.0,344.0
mean,41.037781,-74.562711,2.293605
std,0.946098,1.732418,0.799683
min,40.505334,-79.48445,0.0
25%,40.626646,-74.089004,2.0
50%,40.717306,-73.953562,2.0
75%,40.821256,-73.866856,2.0
max,43.72802,-73.708847,7.0


In [213]:
tor_nyc_merged_drop.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 344 entries, 0 to 344
Data columns (total 15 columns):
Borough                   344 non-null object
Neighborhood              344 non-null object
Latitude                  344 non-null float64
Longitude                 344 non-null float64
Cluster Labels            344 non-null float64
1st Most Common Venue     344 non-null object
2nd Most Common Venue     344 non-null object
3rd Most Common Venue     344 non-null object
4th Most Common Venue     344 non-null object
5th Most Common Venue     344 non-null object
6th Most Common Venue     344 non-null object
7th Most Common Venue     344 non-null object
8th Most Common Venue     344 non-null object
9th Most Common Venue     344 non-null object
10th Most Common Venue    344 non-null object
dtypes: float64(3), object(12)
memory usage: 43.0+ KB


In [216]:
tor_nyc_merged_final = tor_nyc_merged_drop.astype({'Cluster Labels': 'int64'})
tor_nyc_merged_final.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Coffee Shop,Bakery,Park,Breakfast Spot,Café,Theater,Pub,Yoga Studio,Electronics Store,Performing Arts Venue
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Diner,Yoga Studio,Gym,Park,Bank,Bar,Sandwich Place,Beer Bar,Arts & Crafts Store
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2,Clothing Store,Coffee Shop,Café,Japanese Restaurant,Bubble Tea Shop,Cosmetics Shop,Hotel,Diner,Pizza Place,Electronics Store
3,Downtown Toronto,St. James Town,43.651494,-79.375418,2,Restaurant,Coffee Shop,Café,Cosmetics Shop,Cocktail Bar,American Restaurant,Hotel,Seafood Restaurant,Creperie,Beer Bar
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,2,Coffee Shop,Cocktail Bar,Cheese Shop,Pharmacy,Beer Bar,Restaurant,Bakery,Farmers Market,Café,Seafood Restaurant


In [217]:
tor_nyc_merged_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 344 entries, 0 to 344
Data columns (total 15 columns):
Borough                   344 non-null object
Neighborhood              344 non-null object
Latitude                  344 non-null float64
Longitude                 344 non-null float64
Cluster Labels            344 non-null int64
1st Most Common Venue     344 non-null object
2nd Most Common Venue     344 non-null object
3rd Most Common Venue     344 non-null object
4th Most Common Venue     344 non-null object
5th Most Common Venue     344 non-null object
6th Most Common Venue     344 non-null object
7th Most Common Venue     344 non-null object
8th Most Common Venue     344 non-null object
9th Most Common Venue     344 non-null object
10th Most Common Venue    344 non-null object
dtypes: float64(2), int64(1), object(12)
memory usage: 43.0+ KB


Finally, let's visualize the resulting clusters.

In [218]:
# Matplotlib and associated plotting modules

import matplotlib.cm as cm
import matplotlib.colors as colors

In [240]:
# Toronto latitude and longitude
toronto_location = [43.6532, -79.3832]

# create map
map_clusters = folium.Map(location=toronto_location, zoom_start=12)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(tor_nyc_merged_final['Latitude'], tor_nyc_merged_final['Longitude'], tor_nyc_merged_final['Neighborhood'], tor_nyc_merged_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run 0..kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [241]:
# New York City latitude and longitude
nyc_location = [40.7127281, -74.0060152]

# create map
map_clusters_ny = folium.Map(location=nyc_location, zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(tor_nyc_merged_final['Latitude'], tor_nyc_merged_final['Longitude'], tor_nyc_merged_final['Neighborhood'], tor_nyc_merged_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run 0..kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters_ny)
       
map_clusters_ny
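
Both map cells rebuild the same color list. A minimal sketch of one way to factor that out, assuming the eight clusters used here: a small helper (the name `cluster_palette` is hypothetical) returns a dictionary from cluster label to hex color, so the label-to-color mapping is explicit and can be shared by both maps.

```python
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np


def cluster_palette(n_clusters):
    """Return {label: hex color} for labels 0 .. n_clusters - 1."""
    # Sample the rainbow colormap at evenly spaced points, one per cluster
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return {label: colors.rgb2hex(c) for label, c in enumerate(rgba)}


# Eight clusters, matching the labels 0 to 7 seen in this notebook
palette = cluster_palette(8)
print(palette[0], palette[7])
```

Inside the marker loop, `palette[cluster]` would then supply both `color` and `fill_color`.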

### 6a. Examine Clusters

#### Cluster 0

In [228]:
tor_nyc_merged_final.loc[tor_nyc_merged_final['Cluster Labels'] == 0, tor_nyc_merged_final.columns[[0] + [1] + list(range(5, tor_nyc_merged_final.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
211,Queens,Breezy Point,Beach,Trail,Monument / Landmark,Bus Stop,Yoga Studio,Farmers Market,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service
218,Queens,Neponsit,Beach,Yoga Studio,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit
243,Staten Island,South Beach,Beach,Pier,Deli / Bodega,Athletics & Sports,Yoga Studio,Farm,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service
341,Queens,Hammels,Beach,NeighborhoodVenue,Bus Station,Dog Run,Fast Food Restaurant,Gym / Fitness Center,Bus Stop,Diner,Shoe Store,Deli / Bodega


#### Cluster 1

In [230]:
tor_nyc_merged_final.loc[tor_nyc_merged_final['Cluster Labels'] == 1, tor_nyc_merged_final.columns[[0] + [1] + list(range(5, tor_nyc_merged_final.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,Rosedale,Park,Trail,Playground,Yoga Studio,Falafel Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant
19,Central Toronto,Lawrence Park,Park,Swim School,Bus Line,Yoga Studio,Farm,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service
26,Central Toronto,"Moore Park, Summerhill East",Park,Lawyer,Trail,Restaurant,Yoga Studio,Factory,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service
66,Bronx,Clason Point,Park,Pool,Grocery Store,Home Service,South American Restaurant,Bus Stop,Boat or Ferry,Business Service,Convenience Store,Financial or Legal Service
187,Queens,South Ozone Park,Park,Food Truck,Deli / Bodega,Fast Food Restaurant,Sandwich Place,Hotel,Donut Shop,Bar,Food,Entertainment Service
231,Queens,Somerville,Park,Yoga Studio,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit
242,Staten Island,Todt Hill,Park,Trail,Yoga Studio,Falafel Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service
283,Staten Island,Chelsea,Park,Steakhouse,Spanish Restaurant,Sandwich Place,Factory,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service
295,Staten Island,Randall Manor,Home Service,Park,Bus Stop,Playground,Bagel Shop,Falafel Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service


#### Cluster 2

In [231]:
tor_nyc_merged_final.loc[tor_nyc_merged_final['Cluster Labels'] == 2, tor_nyc_merged_final.columns[[0] + [1] + list(range(5, tor_nyc_merged_final.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",Coffee Shop,Bakery,Park,Breakfast Spot,Café,Theater,Pub,Yoga Studio,Electronics Store,Performing Arts Venue
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",Coffee Shop,Diner,Yoga Studio,Gym,Park,Bank,Bar,Sandwich Place,Beer Bar,Arts & Crafts Store
2,Downtown Toronto,"Garden District, Ryerson",Clothing Store,Coffee Shop,Café,Japanese Restaurant,Bubble Tea Shop,Cosmetics Shop,Hotel,Diner,Pizza Place,Electronics Store
3,Downtown Toronto,St. James Town,Restaurant,Coffee Shop,Café,Cosmetics Shop,Cocktail Bar,American Restaurant,Hotel,Seafood Restaurant,Creperie,Beer Bar
4,Downtown Toronto,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Pharmacy,Beer Bar,Restaurant,Bakery,Farmers Market,Café,Seafood Restaurant
5,Downtown Toronto,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Japanese Restaurant,Bubble Tea Shop,Burger Joint,Thai Restaurant,Salad Place,Bar
6,Downtown Toronto,Christie,Grocery Store,Café,Park,Restaurant,Baby Store,Italian Restaurant,Nightclub,Diner,Candy Store,Coffee Shop
7,Downtown Toronto,"Richmond, Adelaide, King",Coffee Shop,Café,Hotel,Restaurant,Thai Restaurant,Gym,Clothing Store,Deli / Bodega,Cosmetics Shop,Sushi Restaurant
8,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",Coffee Shop,Aquarium,Café,Hotel,Brewery,Sporting Goods Shop,Fried Chicken Joint,Scenic Lookout,Restaurant,History Museum
9,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",Coffee Shop,Hotel,Café,Seafood Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Salad Place,American Restaurant,Tea Room


#### Cluster 3

In [232]:
tor_nyc_merged_final.loc[tor_nyc_merged_final['Cluster Labels'] == 3, tor_nyc_merged_final.columns[[0] + [1] + list(range(5, tor_nyc_merged_final.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
67,Bronx,Throgs Neck,Deli / Bodega,Sports Bar,American Restaurant,Bar,Liquor Store,Coffee Shop,Asian Restaurant,Chinese Restaurant,Pizza Place,Baseball Field
71,Bronx,Van Nest,Deli / Bodega,Pizza Place,Middle Eastern Restaurant,Board Shop,Supermarket,Donut Shop,Bus Station,Film Studio,Diner,Hookah Bar
80,Bronx,Olinville,Deli / Bodega,Caribbean Restaurant,Fried Chicken Joint,Supermarket,Basketball Court,Laundromat,Convenience Store,Food,Ethiopian Restaurant,Farm
84,Bronx,Edenwald,Grocery Store,Supermarket,Deli / Bodega,Yoga Studio,Farmers Market,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space
111,Brooklyn,East New York,Deli / Bodega,Music Venue,Event Service,Fast Food Restaurant,Pizza Place,Plaza,Spanish Restaurant,Fried Chicken Joint,Caribbean Restaurant,Metro Station
113,Brooklyn,Canarsie,Food,Caribbean Restaurant,Gym,Deli / Bodega,Asian Restaurant,Yoga Studio,Farm,English Restaurant,Entertainment Service,Ethiopian Restaurant
115,Brooklyn,Mill Island,Pool,Yoga Studio,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space
122,Brooklyn,Marine Park,Soccer Field,Deli / Bodega,Chinese Restaurant,Basketball Court,Coffee Shop,Athletics & Sports,Baseball Field,Pizza Place,Gym,Ice Cream Shop
128,Brooklyn,Ocean Hill,Deli / Bodega,Supermarket,Food,Southern / Soul Food Restaurant,Fried Chicken Joint,Convenience Store,Seafood Restaurant,Grocery Store,Chinese Restaurant,Bakery
183,Queens,Glendale,Pizza Place,Brewery,Food & Drink Shop,Deli / Bodega,Arts & Crafts Store,Farmers Market,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service


#### Cluster 4

In [233]:
tor_nyc_merged_final.loc[tor_nyc_merged_final['Cluster Labels'] == 4, tor_nyc_merged_final.columns[[0] + [1] + list(range(5, tor_nyc_merged_final.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
40,Bronx,Co-op City,Baseball Field,Bus Station,Bagel Shop,Restaurant,Chinese Restaurant,Basketball Court,Park,Pharmacy,Discount Store,Fast Food Restaurant
41,Bronx,Eastchester,Bus Station,Caribbean Restaurant,Deli / Bodega,Diner,Food & Drink Shop,Chinese Restaurant,Bakery,Seafood Restaurant,Metro Station,Donut Shop
42,Bronx,Fieldston,Cosmetics Shop,River,Plaza,Bus Station,Farm,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant
43,Bronx,Riverdale,Park,Bus Station,Plaza,Medical Supply Store,Gym,Baseball Field,Bank,Food Truck,Department Store,Design Studio
48,Bronx,Williamsbridge,Bar,Caribbean Restaurant,Nightclub,Soup Place,Metro Station,Yoga Studio,Farm,Entertainment Service,Ethiopian Restaurant,Event Service
54,Bronx,Morris Heights,Food,Spanish Restaurant,Bank,Bus Station,Pizza Place,Supermarket,Pharmacy,Deli / Bodega,Recreation Center,Grocery Store
57,Bronx,West Farms,Bus Station,Park,Chinese Restaurant,Sandwich Place,Art Gallery,Lounge,Bank,Scenic Lookout,Outdoors & Recreation,Coffee Shop
58,Bronx,High Bridge,Pharmacy,Pizza Place,Supermarket,Bus Station,Chinese Restaurant,Gym,Check Cashing Service,Park,Latin American Restaurant,Donut Shop
64,Bronx,Morrisania,Discount Store,Bus Station,Donut Shop,Liquor Store,Grocery Store,Fast Food Restaurant,Pizza Place,Food,Pharmacy,Metro Station
65,Bronx,Soundview,Chinese Restaurant,Grocery Store,Video Store,Playground,Pharmacy,Discount Store,Bus Station,Liquor Store,Bus Stop,Fried Chicken Joint


#### Cluster 5

In [234]:
tor_nyc_merged_final.loc[tor_nyc_merged_final['Cluster Labels'] == 5, tor_nyc_merged_final.columns[[0] + [1] + list(range(5, tor_nyc_merged_final.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
342,Queens,Bayswater,Playground,Yoga Studio,Farmers Market,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space


#### Cluster 6

In [235]:
tor_nyc_merged_final.loc[tor_nyc_merged_final['Cluster Labels'] == 6, tor_nyc_merged_final.columns[[0] + [1] + list(range(5, tor_nyc_merged_final.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
265,Staten Island,Graniteville,Grocery Store,Yoga Studio,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit


#### Cluster 7

In [236]:
tor_nyc_merged_final.loc[tor_nyc_merged_final['Cluster Labels'] == 7, tor_nyc_merged_final.columns[[0] + [1] + list(range(5, tor_nyc_merged_final.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
246,Staten Island,Port Ivory,Bar,Yoga Studio,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit


### 6b. Cluster Observations

These are the trends of each cluster:

Cluster 0 - Small cluster of beach neighborhoods, all in NYC.

Cluster 1 - Small cluster spanning Toronto and NYC whose neighborhoods share parks and trails, along with some ethnic restaurants.

Cluster 2 - Largest cluster, found all over both Toronto and NYC, made up predominantly of coffee shops and cafés, restaurants and bars.

Cluster 3 - NYC-only cluster that is heavy in delis and bodegas.

Cluster 4 - NYC-only cluster with many restaurants, specifically Caribbean and other ethnic restaurants.

Cluster 5 - Single-neighborhood cluster in Queens, NYC, whose most common venues are playgrounds, yoga studios and farmers markets.

Cluster 6 - Single-neighborhood cluster in Staten Island, NYC, whose most common venues are grocery stores, yoga studios and restaurants.

Cluster 7 - Single-neighborhood cluster in Staten Island, NYC, whose most common venues are bars, yoga studios and restaurants.
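
Observations like these can be cross-checked with a per-cluster summary. The sketch below groups a toy frame (hypothetical values standing in for `tor_nyc_merged_final`) by cluster label and reports the cluster size, the most frequent top venue, and how many boroughs each cluster touches.

```python
import pandas as pd

# Toy stand-in for tor_nyc_merged_final (hypothetical rows)
final = pd.DataFrame({
    "Borough": ["Queens", "Queens", "Downtown Toronto", "Bronx", "Downtown Toronto"],
    "Cluster Labels": [0, 0, 2, 3, 2],
    "1st Most Common Venue": ["Beach", "Beach", "Coffee Shop", "Deli / Bodega", "Coffee Shop"],
})

# One row per cluster: size, most frequent top venue, number of distinct boroughs
summary = final.groupby("Cluster Labels").agg(
    neighborhoods=("Borough", "size"),
    top_venue=("1st Most Common Venue", lambda s: s.mode().iloc[0]),
    boroughs=("Borough", "nunique"),
)
print(summary)
```

A cluster with `neighborhoods` equal to 1 is one of the single-neighborhood clusters, and a low `boroughs` count points to a cluster concentrated in one part of one city.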