# Where should you live?

Clustering Neighbourhoods of Toronto and New York using Machine Learning 

## 1. Introduction

Everyone in the world needs to move at some point for better education, for better career oppurtunities or for just a better life. How does one make that decision? This project will look at two of the most happening cities in the world Toronto and New York. 

## 2. Business Problem

The aim of this project is to help people choose where they should move to depending on lifestype and experiences. What neighbourhoods in the cities offer what they want to have. This also helps people make decisions if they are thinking about migrating to Toronto or New York or even if they want to relocate neighbourhoods within the city. Our findings will help stakeholders make informed decisions and address any concerns they have including the different kinds of cuisines, provision stores and what the city has to offer.

## 3. Data Section

In order to look at the different cities we need geographical location data. Postal codes in each city will be the starting point. Using these postal codes, we will be able to find out our neighbourhoods, borough, venues, popular venues. 

### Toronto 

We will need to scrape data from the following site: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

This Wikipedia page has information about all the neighbourhoods; we limit it to London.

1. Neighbourhood: Name of Neighbourhood
2. Borough: Name of the borough
3. post_code: Postal codes for London.

This Wikipedia page lacks information about geographical locations. To solve this problem, we use data from: https://cocl.us/Geospatial_data

### New York

We will need to get our data from the following: https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json

The JSON file has data about all the neighbourhoods in New York. 

1. Neighbourhood: Name of Neighbourhood
2. Borough: Name of the borough
3. Latitude: Geographical location
4. Longitude: Geographical location

### Foursquare API Data

We will need data about different venues in different neighbourhoods. To get this information we will use Foursquare location information. Foursquare is a location data provider with information about all manner of venues and events within an area of interest. Such information includes venue names, locations, menus and even photos. As such, the foursquare location platform will be used as the sole data source since all the stated required information can be obtained through the API.

After finding the list of neighbourhoods, we then connect to the Foursquare API to gather information about venues inside each neighbourhood. For each neighbourhood, we have chosen the radius to be 500 meters.
The data retrieved from Foursquare contained information of venues within a specified distance of the longitude and latitude of the postcodes. 

The information obtained per venue as follows:

1. Neighbourhood: Name of the Neighborhood
2. Neighbourhood Latitude: Latitude of the Neighborhood
3. Neighbourhood Longitude: Longitude of the Neighborhood
4. Venue: Name of the Venue
5. Venue Latitude: Latitude of Venue
6. Venue Longitude: Longitude of Venue
7. Venue Category: Category of Venue

Based on all the information collected for both Toronto and New York, we have sufficient data to build our model. We cluster the neighbourhoods together based on similar venue categories. We then present our observations and findings. Using this data, our stakeholders can take the necessary decision.


## 4. Methodology

We will be creating our model with the help of Python so we start off by importing all the required packages.

In [45]:
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
import geocoder
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans

Package breakdown:

Pandas : To collect and manipulate data 
    
requests : Handle http requests    
    
matplotlib : Detailing the generated maps   
    
folium : Generating maps 
    
sklearn : To import Kmeans which is the machine learning model that we are using.

The approach taken here is to explore each of the cities individually, plot the map to show the neighbourhoods being considered and then build our model by clustering all of the similar neighbourhoods together and finally plot the new map with the clustered neighbourhoods. We draw insights and then compare and discuss our findings.

### 4.1 Data Collection


In the data collection stage, we begin with collecting the required data for the cities of Toronto and New York.

To collect data for Toronto, we scrape the List of areas of Toronto wikipedia page to take the 1st table using the following code:

In [4]:
#Getting data from wikipedia website
Toronto_df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

#Cleaning Data
Toronto_df = pd.DataFrame(Toronto_df[0])

The data looks like this:

In [5]:
Toronto_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


We will use the wikipedia link ("https://cocl.us/Geospatial_data") to get data for each postal code (latitude and longitude)

In [6]:
Toronto_postalcode_df = pd.read_csv("https://cocl.us/Geospatial_data")
Toronto_postalcode_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


We will use the Toronto_postalcode data frame and combine it with the Toronto data frame that contains the postal code, boroughs, neighbourhoods to get latitude and longitude. 

In [7]:
#Combining data frames

Toronto_df = Toronto_df.join(Toronto_postalcode_df.set_index('Postal Code'), on = 'Postal Code', how = 'inner')

Toronto_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.753259,-79.329656
3,M4A,North York,Victoria Village,43.725882,-79.315572
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


To get our data for New York we need to read a JSON file - https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json. The data has all the information we need for our analysis. We will load the data and put it into a dataframe called NewYork_df. 

In [9]:
#Json library is needed to handle JSON files

import json 

!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [11]:
#Loading the data

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Now that our data is downloaded let us see what the features in the data are and convert it into a dataframe. 


In [19]:
NewYork_data = newyork_data['features']
NewYork_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [35]:
# define the dataframe columns
column_names = ['Borough', 'Neighbourhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
NewYork_df = pd.DataFrame(columns = column_names)

Lets take a look at the empty dataframe to confirm that the columns are as intended.

In [36]:
NewYork_df

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude


Now we will loop through the data and fill the dataframe one row at a time.

In [37]:
for data in NewYork_data:
    borough = neighbourhood_name = data['properties']['borough'] 
    neighbourhood_name = data['properties']['name']
    neighbourhood_latlon = data['geometry']['coordinates']
    neighbourhood_lat = neighbourhood_latlon[1]
    neighbourhood_lon = neighbourhood_latlon[0]
    
    NewYork_df = NewYork_df.append({'Borough': borough,
                                          'Neighbourhood': neighbourhood_name,
                                          'Latitude': neighbourhood_lat,
                                          'Longitude': neighbourhood_lon}, ignore_index=True)

Lets take a look at our final data. 

In [38]:
NewYork_df.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


### 4.2 Data Preprocessing

We are going to reset the index for our Toronto Data Frame. 

In [39]:
# Resetting the index
Toronto_df = Toronto_df.reset_index()
Toronto_df.drop(['index'], axis = 'columns', inplace = True)
Toronto_df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


Lets print our NewYord Dataframe to see if we need to change anything.

In [40]:
NewYork_df

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
...,...,...,...,...
301,Manhattan,Hudson Yards,40.756658,-74.000111
302,Queens,Hammels,40.587338,-73.805530
303,Queens,Bayswater,40.611322,-73.765968
304,Queens,Queensbridge,40.756091,-73.945631


### 4.3 Visualizing Neighbourhoods for Toronto and NewYork

#### 4.3.1 Toronto 

Use Geopy too get the latitude and longitude values of Toronto, Ontario

In [46]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent = "toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto, Ontario are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto, Ontario are 43.6534817, -79.3839347.


Now we will create a map of Toronto with neighborhoods superimposed on top

In [47]:
#create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location = [latitude, longitude], zoom_start = 10)

#add markers
for lat, lng, borough, neighbourhood in zip(Toronto_df['Latitude'],Toronto_df['Longitude'], Toronto_df['Borough'], Toronto_df['Neighbourhood']):
    label = '{} , {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
    [lat, lng],
    radius = 5,
    popup = label,
    color = 'blue',
    fill = True,
    fill_color = '#3186cc',
    fill_opacity = 0.7,
    parse_html = False).add_to(map_toronto)

map_toronto

#### 4.3.2  New York

Use Geopy too get the latitude and longitude values of New York City, NY. 

In [48]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of New York City are 40.7127281, -74.0060152.


Now we will create a map of New York City with neighborhoods superimposed on top

In [49]:
#create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

#add markers to map
for lat, lng, borough, neighborhood in zip(NewYork_df['Latitude'], NewYork_df['Longitude'], NewYork_df['Borough'], NewYork_df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### 4.4 Get Venues Data

Initializing Foursqare API Credentials 

Now that we have visualized the neighbourhoods, we need to find out what each neighbourhood is like and what are the common venue and venue categories within a 500m radius. This is where Foursquare comes into play.

In [50]:
CLIENT_ID = 'KAMUXP5GYY2BHIAN4YP4QIJQFVJQ3SLUABDGIZFPHTW35GHH'
CLIENT_SECRET = 'EHRCZVRSOJCC0YMSZEBUQPT44KF0GULCY4VVRGTOZ2YESRBB'
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: KAMUXP5GYY2BHIAN4YP4QIJQFVJQ3SLUABDGIZFPHTW35GHH
CLIENT_SECRET:EHRCZVRSOJCC0YMSZEBUQPT44KF0GULCY4VVRGTOZ2YESRBB


#### 4.4.1 Toronto Venues Data

In [53]:
def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        #create API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
        
        #make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        #return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)

In [54]:
#getting venues data
venues_in_toronto = getNearbyVenues(Toronto_df['Neighbourhood'], Toronto_df['Latitude'], Toronto_df['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [55]:
#Checking Data
venues_in_toronto.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,Brookbanks Pool,Pool
2,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Portugril,Portuguese Restaurant


#### 4.4.2 New York City Venues

In [56]:
#getting venues data
venues_in_newyork = getNearbyVenues(NewYork_df['Neighbourhood'], NewYork_df['Latitude'], NewYork_df['Longitude'])

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [57]:
#Checking Data
venues_in_newyork.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,Dessert Shop
1,Wakefield,40.894705,-73.847201,Rite Aid,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Walgreens,Pharmacy
4,Wakefield,40.894705,-73.847201,Dunkin',Donut Shop


### 4.5 One Hot Encoding

Since we are trying to find out what are the different kinds of venue categories present in each neighbourhood and then calculate the top 10 common venues to base our similarity on, we use the One Hot Encoding to work with our categorical datatype of the venue categories. This helps to convert the categorical data into numeric data.
We perform one hot encoding and then calculate the mean of the grouped venue categories for each of the neighbourhoods.

#### 4.5.1 Toronto

In [58]:
#one hot encoding
toronto_onehot = pd.get_dummies(venues_in_toronto[['Venue Category']], prefix="", prefix_sep="")

#add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = venues_in_toronto['Neighbourhood'] 

#move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Truck Stop,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [59]:
#group rows by neighbourhood and take mean of the frequency of occurrence of each category
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Truck Stop,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
92,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
94,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### 4.5.2 New York

In [60]:
#one hot encoding
NewYork_onehot = pd.get_dummies(venues_in_newyork[['Venue Category']], prefix="", prefix_sep="")

#add neighborhood column back to dataframe
NewYork_onehot['Neighbourhood'] = venues_in_newyork['Neighbourhood'] 

#move neighborhood column to the first column
fixed_columns = [NewYork_onehot.columns[-1]] + list(NewYork_onehot.columns[:-1])
NewYork_onehot = NewYork_onehot[fixed_columns]

NewYork_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arcade,...,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Wakefield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Wakefield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Wakefield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Wakefield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Wakefield,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [61]:
#group rows by neighbourhood and take mean of the frequency of occurrence of each category
newyork_grouped = NewYork_onehot.groupby('Neighbourhood').mean().reset_index()
newyork_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arcade,...,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Allerton,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
1,Annadale,0.0,0.0,0.0,0.0,0.0,0.10,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
2,Arden Heights,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
3,Arlington,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
4,Arrochar,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
296,Woodhaven,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
297,Woodlawn,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
298,Woodrow,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
299,Woodside,0.0,0.0,0.0,0.0,0.0,0.10,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0


### 4.6 Top 10 Venues

In our next step, We need to rank and label the top venue categories in our neighbourhood.

#### 4.6.1 Toronto

In [63]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [80]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

#create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

#create a new dataframe
toronto_venues_sorted = pd.DataFrame(columns=columns)
toronto_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

toronto_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Latin American Restaurant,Skating Rink,Breakfast Spot,Yoga Studio,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop
1,"Alderwood, Long Branch",Pizza Place,Pharmacy,Skating Rink,Pub,Sandwich Place,Coffee Shop,Gym,Gas Station,Garden Center,Dog Run
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Grocery Store,Mobile Phone Shop,Bridal Shop,Diner,Sandwich Place,Deli / Bodega,Restaurant,Supermarket
3,Bayview Village,Bank,Chinese Restaurant,Japanese Restaurant,Café,Yoga Studio,Dim Sum Restaurant,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Sandwich Place,Grocery Store,Cosmetics Shop,Indian Restaurant,Butcher,Sushi Restaurant,Café,Pub


#### 4.6.2 New York

In [98]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

#create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

#create a new dataframe
newyork_venues_sorted = pd.DataFrame(columns=columns)
newyork_venues_sorted['Neighbourhood'] = newyork_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    newyork_venues_sorted.iloc[ind, 1:] = return_most_common_venues(newyork_grouped.iloc[ind, :], num_top_venues)

newyork_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allerton,Pizza Place,Deli / Bodega,Supermarket,Intersection,Pharmacy,Check Cashing Service,Martial Arts School,Fast Food Restaurant,Chinese Restaurant,Electronics Store
1,Annadale,Park,American Restaurant,Pizza Place,Diner,Restaurant,Liquor Store,Food,Bakery,Dance Studio,Train Station
2,Arden Heights,Pharmacy,Coffee Shop,Bus Stop,Pizza Place,Yoga Studio,Field,Event Space,Eye Doctor,Factory,Falafel Restaurant
3,Arlington,Bus Stop,Intersection,American Restaurant,Deli / Bodega,Fish Market,Eye Doctor,Factory,Falafel Restaurant,Farm,Farmers Market
4,Arrochar,Bus Stop,Italian Restaurant,Deli / Bodega,Bagel Shop,Athletics & Sports,Hotel,Mediterranean Restaurant,Middle Eastern Restaurant,Food Truck,Outdoors & Recreation


### 4.7 Model Building - KMEANS

We will be using KMeans Clustering Machine learning algorithm to cluster similar neighbourhoods together. We will be going with the number of clusters as 5.

#### 4.7.1 Toronto

In [66]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(toronto_grouped_clustering)

#Add cluster labels column to the top 10 common venue categories
toronto_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

#joining toronto_grouped with toronto_df data on neighbourhoods to add latitude and longitude for each neighbourhood 
toronto_merged = Toronto_df
toronto_merged = toronto_merged.join(toronto_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
toronto_merged.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1.0,Park,Pool,Food & Drink Shop,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Pizza Place,Portuguese Restaurant,Hockey Arena,French Restaurant,Coffee Shop,Eastern European Restaurant,Drugstore,Donut Shop,Deli / Bodega,Distribution Center
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0.0,Coffee Shop,Bakery,Park,Breakfast Spot,Yoga Studio,Event Space,Café,Farmers Market,Pub,Dessert Shop
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Accessories Store,Furniture / Home Store,Coffee Shop,Miscellaneous Shop,Boutique,Event Space,Vietnamese Restaurant,Dance Studio,Drugstore
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,Coffee Shop,Sushi Restaurant,College Auditorium,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place,Café,Portuguese Restaurant


In [69]:
#drop all NaN values
toronto_merged_nonan = toronto_merged.dropna(subset=['Cluster Labels'])

#getting latitude and longitude to Toronto
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent = "toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

#create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

#set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged_nonan['Latitude'], toronto_merged_nonan['Longitude'], toronto_merged_nonan['Neighbourhood'], toronto_merged_nonan['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Cluster 1 - Toronto

In [70]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 0, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,0.0,Pizza Place,Portuguese Restaurant,Hockey Arena,French Restaurant,Coffee Shop,Eastern European Restaurant,Drugstore,Donut Shop,Deli / Bodega,Distribution Center
2,Downtown Toronto,0.0,Coffee Shop,Bakery,Park,Breakfast Spot,Yoga Studio,Event Space,Café,Farmers Market,Pub,Dessert Shop
3,North York,0.0,Clothing Store,Accessories Store,Furniture / Home Store,Coffee Shop,Miscellaneous Shop,Boutique,Event Space,Vietnamese Restaurant,Dance Studio,Drugstore
4,Downtown Toronto,0.0,Coffee Shop,Sushi Restaurant,College Auditorium,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place,Café,Portuguese Restaurant
7,North York,0.0,Gym,Coffee Shop,Beer Store,Japanese Restaurant,Restaurant,Caribbean Restaurant,Sandwich Place,Bike Shop,Sporting Goods Shop,Baseball Field
9,Downtown Toronto,0.0,Café,Theater,Coffee Shop,Japanese Restaurant,Hotel,Fast Food Restaurant,Steakhouse,Ramen Restaurant,Sporting Goods Shop,Burrito Place
10,North York,0.0,Pub,Bakery,Japanese Restaurant,Italian Restaurant,Yoga Studio,Dessert Shop,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore
13,North York,0.0,Gym,Coffee Shop,Beer Store,Japanese Restaurant,Restaurant,Caribbean Restaurant,Sandwich Place,Bike Shop,Sporting Goods Shop,Baseball Field
15,Downtown Toronto,0.0,Café,Coffee Shop,Gastropub,Park,Farmers Market,Japanese Restaurant,Thai Restaurant,Diner,Restaurant,Middle Eastern Restaurant
17,Etobicoke,0.0,Pharmacy,Coffee Shop,Beer Store,Shopping Plaza,Liquor Store,Café,Pet Store,Pizza Place,Comfort Food Restaurant,Dessert Shop


Cluster 2 - Toronto

In [71]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 1, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,1.0,Park,Pool,Food & Drink Shop,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
21,York,1.0,Park,Women's Store,Bar,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop
35,East York,1.0,Park,Convenience Store,Yoga Studio,Department Store,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
52,North York,1.0,Park,Yoga Studio,Deli / Bodega,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
64,York,1.0,Park,Yoga Studio,Deli / Bodega,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
66,North York,1.0,Park,Convenience Store,Yoga Studio,Department Store,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
83,Central Toronto,1.0,Gym,Park,Yoga Studio,Deli / Bodega,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
85,Scarborough,1.0,Playground,Park,Intersection,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
91,Downtown Toronto,1.0,Park,Playground,Trail,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
98,Etobicoke,1.0,Park,River,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center


Cluster 3 - Toronto

In [72]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 2, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,North York,2.0,Cafeteria,Deli / Bodega,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store


Cluster 4 - Toronto

In [73]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 3, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,3.0,Baseball Field,Yoga Studio,Falafel Restaurant,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
101,Etobicoke,3.0,Baseball Field,Yoga Studio,Falafel Restaurant,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run


Cluster 5 - Toronto

In [74]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 4, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,4.0,Fast Food Restaurant,Print Shop,Yoga Studio,Department Store,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
8,East York,4.0,Pizza Place,Pharmacy,Intersection,Bank,Flea Market,Breakfast Spot,Athletics & Sports,Furniture / Home Store,Café,Gastropub
12,Scarborough,4.0,Construction & Landscaping,Bar,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop
14,East York,4.0,Beer Store,Park,Skating Rink,Curling Ice,Dance Studio,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Drugstore
16,York,4.0,Playground,Hockey Arena,Field,Trail,Yoga Studio,Discount Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
18,Scarborough,4.0,Restaurant,Breakfast Spot,Rental Car Location,Medical Center,Intersection,Mexican Restaurant,Bank,Yoga Studio,Discount Store,Diner
26,Scarborough,4.0,Athletics & Sports,Thai Restaurant,Hakka Restaurant,Fried Chicken Joint,Bakery,Caribbean Restaurant,Gas Station,Bank,Discount Store,Diner
27,North York,4.0,Golf Course,Athletics & Sports,Mediterranean Restaurant,Pool,Dog Run,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Distribution Center
29,East York,4.0,Indian Restaurant,Yoga Studio,Supermarket,Bank,Burger Joint,Coffee Shop,Discount Store,Fast Food Restaurant,Gas Station,Gym
31,West Toronto,4.0,Pharmacy,Bakery,Middle Eastern Restaurant,Music Venue,Park,Pizza Place,Pool,Portuguese Restaurant,Café,Brewery


#### 4.7.2 New York City

In [99]:
# set number of clusters
kclusters = 5

newyork_grouped_clustering = newyork_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(newyork_grouped_clustering)

#Add cluster labels column to the top 10 common venue categories
newyork_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

#joining toronto_grouped with toronto_df data on neighbourhoods to add latitude and longitude for each neighbourhood 
newyork_merged = NewYork_df
newyork_merged = newyork_merged.join(newyork_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
newyork_merged = newyork_merged.dropna()
newyork_merged.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Bronx,Co-op City,40.874294,-73.829939,0.0,Fast Food Restaurant,Discount Store,Liquor Store,Bagel Shop,Pharmacy,Park,Bus Station,Pizza Place,Grocery Store,Baseball Field
2,Bronx,Eastchester,40.887556,-73.827806,0.0,Deli / Bodega,Caribbean Restaurant,Diner,Bus Station,Pizza Place,Seafood Restaurant,Metro Station,Fast Food Restaurant,Automotive Shop,Donut Shop
3,Bronx,Fieldston,40.895437,-73.905643,3.0,Wine Shop,River,Bus Station,Plaza,Yoga Studio,Filipino Restaurant,Ethiopian Restaurant,Event Space,Eye Doctor,Factory
10,Bronx,Baychester,40.866858,-73.835798,3.0,Donut Shop,Convenience Store,Shopping Mall,Baseball Field,Mattress Store,Men's Store,Mexican Restaurant,Supermarket,Spanish Restaurant,Bus Station
12,Bronx,City Island,40.847247,-73.786488,3.0,Deli / Bodega,Seafood Restaurant,Thrift / Vintage Store,Italian Restaurant,History Museum,Spanish Restaurant,French Restaurant,Café,Smoke Shop,Boat or Ferry


In [100]:
#drop all NaN values
newyork_merged_nonan = newyork_merged.dropna(subset=['Cluster Labels'])

#getting latitude and longitude to Toronto
address = 'New York City, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

#create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

#set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(newyork_merged_nonan['Latitude'], newyork_merged_nonan['Longitude'], newyork_merged_nonan['Neighbourhood'], newyork_merged_nonan['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Cluster 1

In [101]:
newyork_merged_nonan.loc[newyork_merged_nonan['Cluster Labels'] == 0, newyork_merged_nonan.columns[[1] + list(range(5, newyork_merged_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Co-op City,Fast Food Restaurant,Discount Store,Liquor Store,Bagel Shop,Pharmacy,Park,Bus Station,Pizza Place,Grocery Store,Baseball Field
2,Eastchester,Deli / Bodega,Caribbean Restaurant,Diner,Bus Station,Pizza Place,Seafood Restaurant,Metro Station,Fast Food Restaurant,Automotive Shop,Donut Shop
13,Bedford Park,Diner,Pizza Place,Mexican Restaurant,Chinese Restaurant,Sandwich Place,Deli / Bodega,Bus Station,Smoke Shop,Coffee Shop,Pub
17,East Tremont,Pizza Place,Puerto Rican Restaurant,Fried Chicken Joint,Breakfast Spot,Supermarket,Mobile Phone Shop,Café,Fast Food Restaurant,Shoe Store,Lounge
34,Belmont,Italian Restaurant,Pizza Place,Deli / Bodega,Dessert Shop,Bakery,Food & Drink Shop,Bar,Market,Grocery Store,Liquor Store
39,Edgewater Park,Italian Restaurant,Deli / Bodega,Pizza Place,Asian Restaurant,Donut Shop,Coffee Shop,Park,Chinese Restaurant,Spa,Sports Bar
40,Castle Hill,Supermarket,Diner,Bus Station,Baseball Field,Bank,Pharmacy,Pizza Place,Market,Dim Sum Restaurant,Fish Market
55,Crown Heights,Pizza Place,Museum,Café,Bakery,Coffee Shop,Burger Joint,Supermarket,Bookstore,Candy Store,Farmers Market
72,East New York,Fried Chicken Joint,Plaza,Food Truck,Spanish Restaurant,Fast Food Restaurant,Metro Station,Caribbean Restaurant,Pizza Place,Chinese Restaurant,Deli / Bodega
80,Borough Park,Bank,Pizza Place,Fast Food Restaurant,Deli / Bodega,Pharmacy,Hotel,Chinese Restaurant,Coffee Shop,Farmers Market,Restaurant


Cluster 2

In [102]:
newyork_merged_nonan.loc[newyork_merged_nonan['Cluster Labels'] == 1, newyork_merged_nonan.columns[[1] + list(range(5, newyork_merged_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
172,Breezy Point,Beach,Trail,Monument / Landmark,Yoga Studio,Filipino Restaurant,Event Space,Eye Doctor,Factory,Falafel Restaurant,Farm


Cluster 3

In [103]:
newyork_merged_nonan.loc[newyork_merged_nonan['Cluster Labels'] == 2, newyork_merged_nonan.columns[[1] + list(range(5, newyork_merged_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,Clason Point,Park,Convenience Store,Grocery Store,Pool,Boat or Ferry,Bus Stop,South American Restaurant,Field,Event Space,Eye Doctor
91,Bergen Beach,Harbor / Marina,Baseball Field,Park,Athletics & Sports,Playground,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Eye Doctor,Factory
303,Bayswater,Playground,Yoga Studio,Empanada Restaurant,Ethiopian Restaurant,Event Space,Eye Doctor,Factory,Falafel Restaurant,Farm,Farmers Market


Cluster 4

In [104]:
newyork_merged_nonan.loc[newyork_merged_nonan['Cluster Labels'] == 3, newyork_merged_nonan.columns[[1] + list(range(5, newyork_merged_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Fieldston,Wine Shop,River,Bus Station,Plaza,Yoga Studio,Filipino Restaurant,Ethiopian Restaurant,Event Space,Eye Doctor,Factory
10,Baychester,Donut Shop,Convenience Store,Shopping Mall,Baseball Field,Mattress Store,Men's Store,Mexican Restaurant,Supermarket,Spanish Restaurant,Bus Station
12,City Island,Deli / Bodega,Seafood Restaurant,Thrift / Vintage Store,Italian Restaurant,History Museum,Spanish Restaurant,French Restaurant,Café,Smoke Shop,Boat or Ferry
29,Country Club,Sandwich Place,Chinese Restaurant,Athletics & Sports,Playground,Fast Food Restaurant,Entertainment Service,Ethiopian Restaurant,Event Space,Eye Doctor,Factory
43,Concourse,Grocery Store,Spanish Restaurant,Fried Chicken Joint,Bus Station,Fast Food Restaurant,Chinese Restaurant,Pharmacy,Supermarket,Food,Caribbean Restaurant
...,...,...,...,...,...,...,...,...,...,...,...
268,Concourse Village,Sandwich Place,Pharmacy,Mexican Restaurant,Diner,Sporting Goods Shop,Bus Station,Shopping Mall,Chinese Restaurant,Coffee Shop,Convenience Store
282,Broadway Junction,Hotel,Donut Shop,Gas Station,Diner,Caribbean Restaurant,Fried Chicken Joint,Burger Joint,Breakfast Spot,Nightclub,Ice Cream Shop
283,Dumbo,Yoga Studio,Art Gallery,Bakery,Bookstore,Boxing Gym,Gym,Furniture / Home Store,Salad Place,Roof Deck,Playground
287,Egbertville,Italian Restaurant,Bagel Shop,Clothing Store,Cosmetics Shop,African Restaurant,Fish Market,Eye Doctor,Factory,Falafel Restaurant,Farm


Cluster 5

In [105]:
newyork_merged_nonan.loc[newyork_merged_nonan['Cluster Labels'] == 4, newyork_merged_nonan.columns[[1] + list(range(5, newyork_merged_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


## 5. Results and Discussion

The neighbourhoods of Toronto are very diverse. There are a lot of different cafes and resturants with various backgrounds. Toronto has a lot of cafes which suggests that it maybe better suited for young individuals. It also offers decent amount of parks and gyms as well for physical activities. It has a lot of shopping options too with that of the Flea markets, flower shops, fish markets, Fishing stores, clothing stores. The main modes of transport seem to be Buses and trains.

Overall Toronto offers a multicultural and entertaining experience. 

New York doesn't offer as many cafes as Toronto but it sure offers a lot of resturants. New York city is huge compared to Toronto and it offers a variety of parks, big and small grocery store. New York City is more suited for the individuals that are between 35 and 55 years of age. The main mode of transport seems to be buses and trains. 

## 6. Conclusion

The purpose of this project was to explore Toronto and New York City and see what each city can offer to someone who is moving from their home and has a choice between New York City and Toronto. Although New York City offers more tourism activities, Toronto offers a well connected city with cafes, parks, gyms and various universities for education. 

We could see that each of the neighbourhoods in both the cities have a wide variety of experiences to offer which is unique in it's own way. The cultural diversity is quite evident which also gives the feeling of a sense of inclusion.

Overall, both cities would be fantastic to live in and it is upto the stakeholders to decide which kind of experience would they enjoy more. 

## 7. References

1. Report: https://medium.com/@oludayo.oguntoyinbo/the-battle-of-neighbourhood-my-londons-perspective-d363163771e0
2. Pandas Library: https://pandas.pydata.org/pandas-docs/stable/index.html
3. Folium Library: https://python-visualization.github.io/folium/
4. Geoplot Library: https://residentmario.github.io/geoplot/index.html
5. Foursqare API: https://developer.foursquare.com/developer/
6. Matplotlib Library: https://matplotlib.org/