# The Battle of Neighborhoods: Amazon Problem - Bolun Wu
## Project Phase I: Preparation
### Step 1: Introduction/Business Problem

#### Suppose Amazon wants to open a warehouse in Toronto to cut delivery time on some common items from up to two days to same-day delivery. The company already has a warehouse facility in NYC and has shown that the location works well for this strategy, so Amazon wants the Toronto warehouse to be opened at a location similar to that of the NYC facility. Amazon forecasts that this change will increase its revenue in the Toronto area by about 10%.

### Step 2: Dataset to be used
#### We are going to use the Toronto zipcode (postal code) data from the Week 3 project, along with the geolocation and Foursquare data. We will also reference the analysis of NYC from Lab 3, since Amazon opened its warehouse in the Cluster 1 group (Murray Hill, Cluster 1). The goal is therefore to find a location in Toronto that is similar to the Cluster 1 group in NYC.

### Step 3: Install necessary packages and libraries

In [1]:
import numpy as np 
import pandas as pd 
import seaborn as sns
!conda install -c anaconda xlrd --yes
!conda install beautifulsoup4 --yes
!conda install lxml --yes
!conda install -c conda-forge geopy --yes
import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('\nInstallation Complete\n')

print('Hello Capstone Project Course!')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - xlrd


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.10.14 |                0         128 KB  anaconda
    certifi-2020.6.20          |           py36_0         160 KB  anaconda
    openssl-1.1.1h             |       h7b6447c_0         3.8 MB  anaconda
    xlrd-1.2.0                 |           py36_0         188 KB  anaconda
    ------------------------------------------------------------
                                           Total:         4.3 MB

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    conda-forge::ca-certificates-2020.12.~ --> anaconda::ca-certificates-2020.10.14-0
  certifi            conda-forge::certifi-2020

### Step 4: Define Foursquare Credentials and Version

In [2]:
CLIENT_ID = '0ZED3FE4CKCHPDQUQF2KCBJUCCAN2DMQDD42HPBND4QIRJAU' # your Foursquare ID
CLIENT_SECRET = 'YS22VTWHYMTNL3KCJERUUBZJJXY3RL4T3D21RVR2BIPLIRRP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0ZED3FE4CKCHPDQUQF2KCBJUCCAN2DMQDD42HPBND4QIRJAU
CLIENT_SECRET:YS22VTWHYMTNL3KCJERUUBZJJXY3RL4T3D21RVR2BIPLIRRP
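Hard-coding and printing the Foursquare secret exposes it to anyone who can read the notebook. A minimal sketch of one alternative follows, assuming the credentials are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical names chosen for this illustration; they are not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names below are assumptions made for this sketch,
    not names defined by Foursquare.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + '*' * (len(secret) - visible)
```

With this in place, a cell could print `mask(CLIENT_SECRET)` instead of the full value when confirming that the credentials were loaded.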


##### The name of the Zipcode table is: wikitable sortable
##### The url: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

## Project Phase II: Import the Data of Toronto
### Step 1: Load "Beautiful Soup" and URL

In [3]:
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
wikiURL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
print('BeautifulSoup is loaded.')

BeautifulSoup is loaded.


In [4]:
# Check connection
s = requests.Session()
response = s.get(wikiURL, timeout = 5)
response

<Response [200]>

### Step 2: Construct the Dataframe and Load the Toronto Zipcode Data in

In [5]:
# Define Soup (parse the HTML fetched above, not the URL string itself)
soup = BeautifulSoup(response.text, 'lxml')



In [6]:
#Get the Table
web=urlopen(wikiURL)
source=BeautifulSoup(web, 'html.parser')
table=source.find('table', {'class': 'wikitable sortable'})
abbs=table.find_all()
values = [ele.text.strip() for ele in abbs]
print('Table Data Loaded')

Table Data Loaded


In [7]:
# Construct the dataframe
c1 = []
c2 = []
c3 = []
for row in source.findAll("tr"):
    cells = row.findAll('td')
    if len(cells)==3:
        c1.append(cells[0].find(text=True))
        c2.append(cells[1].find(text=True))
        c3.append(cells[2].find(text=True))

In [8]:
# Assign data into the dataframe
import pandas as pd
wiki_df=pd.DataFrame(c1,columns=['Postal Code'])
wiki_df['Borough']=c2
wiki_df['Neighbourhood']=c3
wiki_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A\n,Not assigned\n,Not assigned\n
1,M2A\n,Not assigned\n,Not assigned\n
2,M3A\n,North York\n,Parkwoods\n
3,M4A\n,North York\n,Victoria Village\n
4,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"


### Step 3: Clean the Data and Polish the Dataframe

In [9]:
# Get rid of \n
wiki_df = wiki_df.replace('\n','', regex=True)
print('Before cleaning the data, it contains',wiki_df.shape[0],'rows of data')

# Get rid of "Not assigned" in Borough
wiki_df = wiki_df[wiki_df.Borough != 'Not assigned']
print('After cleaning the data, it contains', wiki_df.shape[0],'rows of data')
wiki_df.head()

Before cleaning the data, it contains 180 rows of data
After cleaning the data, it contains 103 rows of data


Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [10]:
# Merge rows if they share the same Postal Code (the result is only displayed; wiki_df itself is unchanged)
wiki_df.groupby('Postal Code')[['Borough','Neighbourhood']].agg(','.join).reset_index()

  


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."
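Joining both columns with a comma would repeat the borough name if a postal code ever appeared on more than one row. A minimal sketch on a small made-up frame (the rows are illustrative and 'M5A' is split on purpose; this is not the Wikipedia table) keeps the first borough and joins only the neighbourhood names:

```python
import pandas as pd

# Illustrative rows only; 'M5A' is deliberately split across two rows.
sample = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# Keep one borough per postal code and join the neighbourhood names.
merged = (
    sample.groupby('Postal Code', as_index=False)
          .agg({'Borough': 'first', 'Neighbourhood': ', '.join})
)
print(merged)
```

Assigning the result to a variable, as done here with `merged`, is what makes the merged table available to later cells.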


### Step 4: Add Location Data to Dataframe

In [11]:
# Load the location data from a CSV file since the geocoding API doesn't work
df_location_data = pd.read_csv('https://cocl.us/Geospatial_data')
df_location_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [12]:
# Insert the data to the main dataframe
wiki_new = wiki_df.join(df_location_data.set_index('Postal Code'), on = 'Postal Code')

# Fixing the index
wiki_new.reset_index(drop=True, inplace=True)
wiki_new.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


#### Now the data of Toronto has been fully imported
## Project Phase III: Import the Data of Manhattan
### Step 1: Import Data Set of NYC

In [13]:
# Download Data Set of NYC
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [14]:
# Load the Data
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

# Give it a name
neighborhoods_data = newyork_data['features']

In [15]:
# Add data to a dataframe
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# Collect one row per neighborhood
# (DataFrame.append was removed in pandas 2.0, so the rows are gathered in a list first)
nyc_rows = []
for data in neighborhoods_data:
    nyc_borough = data['properties']['borough']
    nyc_neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    nyc_neighborhood_latlon = data['geometry']['coordinates']
    nyc_neighborhood_lat = nyc_neighborhood_latlon[1]
    nyc_neighborhood_lon = nyc_neighborhood_latlon[0]

    nyc_rows.append({'Borough': nyc_borough,
                     'Neighborhood': nyc_neighborhood_name,
                     'Latitude': nyc_neighborhood_lat,
                     'Longitude': nyc_neighborhood_lon})

# Instantiate the dataframe from the collected rows
nyc_neighborhoods = pd.DataFrame(nyc_rows, columns=column_names)

print('NYC Data is loaded into Dataframe\n')
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(nyc_neighborhoods['Borough'].unique()),
        nyc_neighborhoods.shape[0]
    )
)

NYC Data is loaded into Dataframe

The dataframe has 5 boroughs and 306 neighborhoods.


### Step 2: Get the Location Coordinates of General NYC

In [16]:
nyc_address = 'New York City, NY'

nyc_geolocator = Nominatim(user_agent="ny_explorer")
nyc_location = nyc_geolocator.geocode(nyc_address)
nyc_latitude = nyc_location.latitude
nyc_longitude = nyc_location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(nyc_latitude, nyc_longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


### Step 3: Fetch the Data of Manhattan and Get the Location Coordinates

In [17]:
manhattan_data = nyc_neighborhoods[nyc_neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

man_address = 'Manhattan, NY'

man_geolocator = Nominatim(user_agent="ny_explorer")
man_location = man_geolocator.geocode(man_address)
man_latitude = man_location.latitude
man_longitude = man_location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(man_latitude, man_longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


In [18]:
# This is the Location of Amazon Warehouse in Midtown Manhattan
nyc_amazon_warehouse_lat = 40.7495135
nyc_amazon_warehouse_lon = -73.9853705
nyc_amazon_location = nyc_amazon_warehouse_lat,nyc_amazon_warehouse_lon

#### Now the data of Manhattan has been fully imported
## Project Phase IV: Explore Manhattan and Apply Clustering and Segmentation
### Step 1: Create a Function to Repeat the Same Exploration Process for All the Neighborhoods in Manhattan

In [19]:
def nyc_getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    nyc_venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        nyc_url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(nyc_url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        nyc_venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nyc_nearby_venues = pd.DataFrame([item for nyc_venue_list in nyc_venues_list for item in nyc_venue_list])
    nyc_nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nyc_nearby_venues)

manhattan_venues = nyc_getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
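The request in the function above is built with string formatting and indexed directly into the response, so a malformed or empty response would raise an error partway through the loop. A minimal sketch of a more defensive variant follows. The endpoint and query parameter names are the ones used in the notebook; the helper names `build_explore_url` and `extract_venue_rows` are hypothetical, and the response shape is assumed to be the one the notebook already indexes (`response` → `groups` → `items`).

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


def extract_venue_rows(payload, name, lat, lng):
    """Turn one decoded JSON response into rows, tolerating missing keys."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or [{}]
        location = venue.get('location', {})
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'),
                     categories[0].get('name')))
    return rows
```

A neighborhood whose response has no groups then contributes an empty list of rows instead of stopping the whole run.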


In [20]:
# Find Out the Number of Unique Venues in Manhattan
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 331 unique categories.


### Step 2: Analyze Each Neighborhood

In [63]:
# One Hot Encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# Add Neighborhood Column Back to Dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# Move Neighborhood Column to the First Column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

In [64]:
# Group Them by Neighborhood and Get the Mean of Occurrence of Each Category
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
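Averaging one-hot columns per neighborhood amounts to computing each category's share of that neighborhood's venues. A minimal sketch using only the standard library, on made-up venue records (the helper name `category_frequencies` is hypothetical), computes the same quantity directly:

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, venue category) pairs for illustration only.
venues = [
    ('Midtown', 'Hotel'), ('Midtown', 'Hotel'), ('Midtown', 'Coffee Shop'),
    ('Midtown', 'Theater'), ('Chinatown', 'Bakery'), ('Chinatown', 'Bakery'),
]


def category_frequencies(records):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in records:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }


frequencies = category_frequencies(venues)
print(frequencies['Midtown'])
```

Because the values are shares rather than counts, a neighborhood with few venues is compared on the same scale as one that reaches the API limit.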

In [23]:
# Create a Function to Sort the Venues in Descending Order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Print Each Neighborhood Along With the Top 10 Most Common Venues
nyc_num_top_venues = 10

nyc_indicators = ['st', 'nd', 'rd']

# Create Columns According to Number of Top Venues
columns = ['Neighborhood']
for ind in np.arange(nyc_num_top_venues):
    try:
        # Use the suffix list defined above; only the first three ranks have a special suffix
        columns.append('{}{} Most Common Venue'.format(ind+1, nyc_indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# Create a New Dataframe and Put the Data in
nyc_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
nyc_neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    nyc_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], nyc_num_top_venues)

nyc_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Coffee Shop,Park,Hotel,Clothing Store,Gym,Memorial Site,Playground,Plaza,Shopping Mall,Burger Joint
1,Carnegie Hill,Coffee Shop,Café,Yoga Studio,Bookstore,French Restaurant,Wine Shop,Cosmetics Shop,Pizza Place,Gym,Gym / Fitness Center
2,Central Harlem,Cosmetics Shop,African Restaurant,Chinese Restaurant,Seafood Restaurant,American Restaurant,French Restaurant,Bar,Southern / Soul Food Restaurant,Caribbean Restaurant,Library
3,Chelsea,Coffee Shop,Art Gallery,Bakery,French Restaurant,American Restaurant,Wine Shop,Ice Cream Shop,Seafood Restaurant,Hotel,Park
4,Chinatown,Chinese Restaurant,Bakery,Cocktail Bar,Ice Cream Shop,Bubble Tea Shop,Optical Shop,American Restaurant,Hotpot Restaurant,Spa,Salon / Barbershop
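The column labels above carry ordinal numbers. A small helper is one way to generate them for any number of top venues; the sketch below is illustrative, and the function name `ordinal` is hypothetical rather than part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Build the same kind of column list used for the top-venues dataframe.
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```

Unlike a fixed three-item suffix list, this also handles values such as 11, 12, 13, and 21 if the number of top venues is ever increased.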


### Step 3: Cluster Manhattan Neighborhoods

In [24]:
# Set Number of Clusters
from sklearn.metrics.pairwise import euclidean_distances
nyc_kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', axis=1)

# Run K-Means Clustering
nyc_kmeans = KMeans(n_clusters=nyc_kclusters, random_state=0).fit(manhattan_grouped_clustering)

nyc_dists = euclidean_distances(nyc_kmeans.cluster_centers_)

# Check Cluster Labels Generated for Each Row in the Dataframe
nyc_kmeans.labels_[0:10] 

array([1, 2, 0, 2, 0, 2, 2, 3, 0, 2], dtype=int32)
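For readers unfamiliar with what `KMeans` does, the following is a minimal pure-Python sketch of the underlying idea (Lloyd's algorithm) on made-up 2-D points. It is not scikit-learn's implementation, which chooses its starting centroids differently and is far more optimized; the function name `simple_kmeans` and the starting centroids are assumptions for illustration.

```python
def simple_kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid-update steps a fixed number of times."""
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of made-up points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = simple_kmeans(points, centroids=[(0.0, 0.0), (1.0, 1.0)])
print(labels)
```

In the notebook, each "point" is a neighborhood's vector of venue-category frequencies, so neighborhoods with similar venue mixes receive the same cluster label.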

In [25]:
# Add Clustering Labels
nyc_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', nyc_kmeans.labels_)

manhattan_merged = manhattan_data

# Merge Manhattan_grouped with Manhattan_data to Add Latitude/longitude for Each Neighborhood
manhattan_merged = manhattan_merged.join(nyc_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,2,Gym,Discount Store,Sandwich Place,Coffee Shop,Yoga Studio,Pizza Place,Supplement Shop,Steakhouse,Shopping Mall,Seafood Restaurant
1,Manhattan,Chinatown,40.715618,-73.994279,0,Chinese Restaurant,Bakery,Cocktail Bar,Ice Cream Shop,Bubble Tea Shop,Optical Shop,American Restaurant,Hotpot Restaurant,Spa,Salon / Barbershop
2,Manhattan,Washington Heights,40.851903,-73.9369,3,Café,Bakery,Grocery Store,Spanish Restaurant,Chinese Restaurant,Sandwich Place,Tapas Restaurant,Italian Restaurant,Mobile Phone Shop,Coffee Shop
3,Manhattan,Inwood,40.867684,-73.92121,3,Mexican Restaurant,Café,Restaurant,Lounge,Pharmacy,Spanish Restaurant,Caribbean Restaurant,Chinese Restaurant,Frozen Yogurt Shop,Bakery
4,Manhattan,Hamilton Heights,40.823604,-73.949688,3,Pizza Place,Café,Coffee Shop,Mexican Restaurant,Deli / Bodega,Yoga Studio,Sushi Restaurant,Caribbean Restaurant,School,Chinese Restaurant


In [26]:
# Pull Out Neighborhood and Position Data for Calculating the Distance Later
nyc_cluster_pos = manhattan_merged[['Neighborhood','Latitude', 'Longitude']]
nyc_cluster_list = nyc_cluster_pos.set_index('Neighborhood').T.to_dict('list')

### Step 4: Show Visualization Maps

In [27]:
# Create Map
nyc_map_clusters = folium.Map(location=[man_latitude, man_longitude], zoom_start=11)

# Set Color Scheme For the Clusters
x1 = np.arange(nyc_kclusters)
ys1 = [i + x1 + (i*x1)**2 for i in range(nyc_kclusters)]
colors_array1 = cm.rainbow(np.linspace(0, 1, len(ys1)))
rainbow1 = [colors.rgb2hex(i) for i in colors_array1]

# Add Markers to the Map
markers_colors1 = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow1[cluster-1],
        fill=True,
        fill_color=rainbow1[cluster-1],
        fill_opacity=0.7).add_to(nyc_map_clusters)

# Add Marker for Amazon Warehouse
folium.Marker(
        [nyc_amazon_warehouse_lat,nyc_amazon_warehouse_lon],
        popup='Amazon Warehouse',
        icon=folium.Icon(icon='star')
        ).add_to(nyc_map_clusters)
       
nyc_map_clusters

In [28]:
# Calculate the Distance from Each Neighborhood to the Amazon Warehouse

from geopy.distance import distance

for city, coord in nyc_cluster_list.items():
    d = int(distance(nyc_amazon_location, coord).m)
    print('From',city,'to the Amazon Warehouse is',d,'meters\n')


From Marble Hill to the Amazon Warehouse is 15451 meters

From Chinatown to the Amazon Warehouse is 3838 meters

From Washington Heights to the Amazon Warehouse is 12083 meters

From Inwood to the Amazon Warehouse is 14195 meters

From Hamilton Heights to the Amazon Warehouse is 8761 meters

From Manhattanville to the Amazon Warehouse is 7850 meters

From Central Harlem to the Amazon Warehouse is 8193 meters

From East Harlem to the Amazon Warehouse is 5883 meters

From Upper East Side to the Amazon Warehouse is 3581 meters

From Yorkville to the Amazon Warehouse is 4363 meters

From Lenox Hill to the Amazon Warehouse is 3045 meters

From Roosevelt Island to the Amazon Warehouse is 3364 meters

From Upper West Side to the Amazon Warehouse is 4293 meters

From Lincoln Square to the Amazon Warehouse is 2666 meters

From Clinton to the Amazon Warehouse is 1399 meters

From Midtown to the Amazon Warehouse is 654 meters

From Murray Hill to the Amazon Warehouse is 609 meters

From Chelsea t
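`geopy.distance.distance` computes a geodesic distance on an ellipsoid. As a rough cross-check that does not need geopy, a haversine great-circle sketch is given below. It assumes a spherical Earth with a mean radius of 6,371 km, so at this scale it should agree with the figures above to within a few tens of meters rather than exactly. The function name `haversine_m` is hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Coordinates taken from the notebook: the warehouse and Chinatown.
warehouse = (40.7495135, -73.9853705)
chinatown = (40.715618, -73.994279)
print(round(haversine_m(*warehouse, *chinatown)))
```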

### Step 5: Examine Clusters

#### Cluster 1:

In [51]:
nyc_Cluster1 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
nyc_Cluster1.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Chinese Restaurant,Bakery,Cocktail Bar,Ice Cream Shop,Bubble Tea Shop,Optical Shop,American Restaurant,Hotpot Restaurant,Spa,Salon / Barbershop
6,Central Harlem,Cosmetics Shop,African Restaurant,Chinese Restaurant,Seafood Restaurant,American Restaurant,French Restaurant,Bar,Southern / Soul Food Restaurant,Caribbean Restaurant,Library
12,Upper West Side,Italian Restaurant,Wine Bar,Bar,Bakery,Café,Coffee Shop,Thai Restaurant,Mediterranean Restaurant,Indian Restaurant,Ice Cream Shop
19,East Village,Bar,Pizza Place,Mexican Restaurant,Wine Bar,Salon / Barbershop,Cocktail Bar,Vietnamese Restaurant,Coffee Shop,Korean Restaurant,Bagel Shop
20,Lower East Side,Chinese Restaurant,Pharmacy,Café,Bakery,Art Gallery,Coffee Shop,Japanese Restaurant,Sandwich Place,Mediterranean Restaurant,Bubble Tea Shop
25,Manhattan Valley,Coffee Shop,Bar,Yoga Studio,French Restaurant,Thai Restaurant,Pizza Place,Mexican Restaurant,Indian Restaurant,Juice Bar,Hawaiian Restaurant
27,Gramercy,Bar,Bagel Shop,American Restaurant,Pizza Place,Italian Restaurant,Wine Shop,Grocery Store,Playground,Thai Restaurant,Mexican Restaurant


#### Cluster 2:

In [52]:
nyc_Cluster2 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
nyc_Cluster2.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Midtown,Hotel,Clothing Store,Coffee Shop,Theater,Gym,Café,Sporting Goods Shop,Bookstore,Bakery,Steakhouse
23,Soho,Clothing Store,Italian Restaurant,Coffee Shop,Boutique,Mediterranean Restaurant,Sporting Goods Shop,Café,Salon / Barbershop,Bakery,Hotel
28,Battery Park City,Coffee Shop,Park,Hotel,Clothing Store,Gym,Memorial Site,Playground,Plaza,Shopping Mall,Burger Joint
33,Midtown South,Korean Restaurant,Hotel,Japanese Restaurant,Dessert Shop,American Restaurant,Hotel Bar,Gym / Fitness Center,Bakery,Burger Joint,Coffee Shop


#### Cluster 3:

In [31]:
nyc_Cluster3 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
nyc_Cluster3.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Gym,Discount Store,Sandwich Place,Coffee Shop,Yoga Studio,Pizza Place,Supplement Shop,Steakhouse,Shopping Mall,Seafood Restaurant
8,Upper East Side,Italian Restaurant,Coffee Shop,Exhibit,Bakery,American Restaurant,Gym / Fitness Center,Juice Bar,Spa,French Restaurant,Yoga Studio
9,Yorkville,Italian Restaurant,Coffee Shop,Gym,Deli / Bodega,Bar,Sushi Restaurant,Wine Shop,Japanese Restaurant,Diner,Pub
10,Lenox Hill,Italian Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Cocktail Bar,Café,Burger Joint,Gym,Gym / Fitness Center,Thai Restaurant
13,Lincoln Square,Café,Plaza,Performing Arts Venue,Gym / Fitness Center,Theater,Concert Hall,Italian Restaurant,Indie Movie Theater,Gym,Bakery


#### Cluster 4:

In [32]:
nyc_Cluster4 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
nyc_Cluster4.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Café,Bakery,Grocery Store,Spanish Restaurant,Chinese Restaurant,Sandwich Place,Tapas Restaurant,Italian Restaurant,Mobile Phone Shop,Coffee Shop
3,Inwood,Mexican Restaurant,Café,Restaurant,Lounge,Pharmacy,Spanish Restaurant,Caribbean Restaurant,Chinese Restaurant,Frozen Yogurt Shop,Bakery
4,Hamilton Heights,Pizza Place,Café,Coffee Shop,Mexican Restaurant,Deli / Bodega,Yoga Studio,Sushi Restaurant,Caribbean Restaurant,School,Chinese Restaurant
5,Manhattanville,Seafood Restaurant,Coffee Shop,Deli / Bodega,Chinese Restaurant,Mexican Restaurant,Italian Restaurant,Gastropub,Latin American Restaurant,Bike Trail,Lounge
7,East Harlem,Mexican Restaurant,Bakery,Thai Restaurant,Park,Sandwich Place,Deli / Bodega,Latin American Restaurant,Spa,Grocery Store,Gym


#### Cluster 5:

In [33]:
nyc_Cluster5 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
nyc_Cluster5

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Stuyvesant Town,Park,Boat or Ferry,Bar,Coffee Shop,Heliport,Fountain,Farmers Market,Gas Station,Skating Rink,Bistro


## Project Phase V: Analyze the Data of Toronto
### Step 1: Analyze Every Neighborhood

In [84]:
tor_address = 'Toronto, ON'

tor_geolocator = Nominatim(user_agent="ny_explorer")
tor_location = tor_geolocator.geocode(tor_address)
tor_latitude = tor_location.latitude
tor_longitude = tor_location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(tor_latitude, tor_longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [85]:
# Repeat the same exploration for every area in Toronto
def getNearbyVenues(names, neighborhood_latitude, neighborhood_longitude, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, neighborhood_latitude, neighborhood_longitude):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Toronto_venues = getNearbyVenues(names=wiki_new['Borough'],
                                   neighborhood_latitude=wiki_new['Latitude'],
                                   neighborhood_longitude=wiki_new['Longitude']
                                  )

North York
North York
Downtown Toronto
North York
Downtown Toronto
Etobicoke
Scarborough
North York
East York
Downtown Toronto
North York
Etobicoke
Scarborough
North York
East York
Downtown Toronto
York
Etobicoke
Scarborough
East Toronto
Downtown Toronto
York
Scarborough
East York
Downtown Toronto
Downtown Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
North York
North York
Scarborough
North York
North York
East Toronto
North York
York
North York
Scarborough
North York
North York
Central Toronto
Central Toronto
York
York
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Etobicoke
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Mississauga
Etobicoke
Scarborough
Central Toronto
Downtown Toronto
West Toron

In [86]:
Toronto_venues.groupby('Borough').count()

Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central Toronto,105,105,105,105,105,105
Downtown Toronto,1223,1223,1223,1223,1223,1223
East Toronto,122,122,122,122,122,122
East York,75,75,75,75,75,75
Etobicoke,75,75,75,75,75,75
Mississauga,12,12,12,12,12,12
North York,241,241,241,241,241,241
Scarborough,86,86,86,86,86,86
West Toronto,154,154,154,154,154,154
York,17,17,17,17,17,17


In [87]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 269 unique categories.


In [88]:
# One Hot Encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# Add Borough Column Back to Dataframe
Toronto_onehot['Borough'] = Toronto_venues['Borough'] 

# Move Borough Column to the First Column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Borough,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [89]:
Toronto_grouped = Toronto_onehot.groupby('Borough').mean().reset_index()
Toronto_grouped.head()

Unnamed: 0,Borough,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009524,0.0,...,0.0,0.0,0.0,0.0,0.009524,0.0,0.0,0.0,0.0,0.009524
1,Downtown Toronto,0.0,0.000818,0.000818,0.000818,0.001635,0.001635,0.001635,0.0139,0.001635,...,0.0,0.01063,0.001635,0.0,0.003271,0.0,0.006541,0.0,0.0,0.005724
2,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02459,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393
3,East York,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.0,0.013333
4,Etobicoke,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0
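
To make the two steps above concrete, here is a minimal sketch on a hypothetical four-row venue table (the rows are invented for illustration and are not taken from `Toronto_venues`). It shows why the mean of each one-hot column, grouped by borough, can be read as the share of that borough's venues in each category.

```python
import pandas as pd

# Hypothetical toy data: three venues in one borough, one in another.
toy_venues = pd.DataFrame({
    "Borough": ["York", "York", "York", "Etobicoke"],
    "Venue Category": ["Park", "Park", "Grocery Store", "Pizza Place"],
})

# One column per category, holding 1 where the venue belongs to it.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
toy_onehot.insert(0, "Borough", toy_venues["Borough"])

# The mean of a 0/1 column is the fraction of venues in that category.
toy_grouped = toy_onehot.groupby("Borough").mean().reset_index()
print(toy_grouped)
```

In this toy table, two of York's three venues are parks, so its `Park` frequency is 2/3.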


In [90]:
num_top_venues = 5

for hood in Toronto_grouped['Borough']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Borough'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Toronto----
            venue  freq
0     Coffee Shop  0.08
1  Sandwich Place  0.07
2            Café  0.06
3            Park  0.06
4     Pizza Place  0.04


----Downtown Toronto----
                 venue  freq
0          Coffee Shop  0.10
1                 Café  0.05
2                Hotel  0.03
3  Japanese Restaurant  0.03
4           Restaurant  0.03


----East Toronto----
                venue  freq
0    Greek Restaurant  0.07
1         Coffee Shop  0.06
2             Brewery  0.04
3  Italian Restaurant  0.04
4      Ice Cream Shop  0.03


----East York----
          venue  freq
0          Bank  0.05
1  Intersection  0.05
2   Coffee Shop  0.05
3  Burger Joint  0.04
4          Park  0.04


----Etobicoke----
            venue  freq
0     Pizza Place  0.09
1  Sandwich Place  0.07
2        Pharmacy  0.05
3     Coffee Shop  0.04
4          Bakery  0.04


----Mississauga----
                      venue  freq
0               Coffee Shop  0.25
1                     Hotel  0.17


In [91]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
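
As a quick check of what this helper returns, the sketch below repeats the function and calls it on a hypothetical row (the frequencies are invented for illustration). The first entry of the row is skipped because it holds the borough name rather than a frequency.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Drop the first entry (the borough name), then sort the frequencies.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row in the same layout as a row of Toronto_grouped.
toy_row = pd.Series({"Borough": "York", "Park": 0.12, "Bank": 0.03, "Field": 0.06})
top2 = return_most_common_venues(toy_row, 2)
print(list(top2))
```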

In [99]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Borough'] = Toronto_grouped['Borough']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Coffee Shop,Sandwich Place,Café,Park,Sushi Restaurant,Pizza Place,Dessert Shop,Restaurant,Diner,Italian Restaurant
1,Downtown Toronto,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Italian Restaurant,Bakery,Park,Seafood Restaurant,Gym
2,East Toronto,Greek Restaurant,Coffee Shop,Brewery,Italian Restaurant,Ice Cream Shop,Park,Pizza Place,Restaurant,Fast Food Restaurant,American Restaurant
3,East York,Intersection,Coffee Shop,Bank,Pizza Place,Burger Joint,Sporting Goods Shop,Sandwich Place,Park,Pet Store,Pharmacy
4,Etobicoke,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
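
The `indicators` list used above is enough for the ten columns built here, but it would label position 21 as "21th" if `num_top_venues` were raised past 20. A minimal sketch of a more general suffix helper follows; the name `ordinal` and the variable `example_columns` are hypothetical and not part of the original notebook.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'.
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Same column layout as above, built without try/except.
example_columns = ["Borough"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(example_columns[:4])
```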


### Step 7: Clustering 

In [100]:
# set number of clusters
kclusters = 5

# drop the label column so only the venue frequencies are clustered
Toronto_grouped_clustering = Toronto_grouped.drop(columns='Borough')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 3, 4, 2, 1, 3, 1, 0], dtype=int32)
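
The label array is easier to read once it is paired with the borough names. The sketch below assumes that the rows of `Toronto_grouped` are the ten boroughs seen in this notebook, in alphabetical order (`groupby` sorts its keys by default). That ordering is an inference and is not printed in full anywhere in the notebook.

```python
from collections import defaultdict

# Assumed row order of Toronto_grouped (alphabetical, as groupby sorts keys).
boroughs = [
    "Central Toronto", "Downtown Toronto", "East Toronto", "East York",
    "Etobicoke", "Mississauga", "North York", "Scarborough",
    "West Toronto", "York",
]
# Label array printed by kmeans.labels_[0:10] above.
labels = [1, 1, 1, 3, 4, 2, 1, 3, 1, 0]

# Collect the boroughs that share each cluster label.
members = defaultdict(list)
for borough, label in zip(boroughs, labels):
    members[label].append(borough)

for label in sorted(members):
    print(label, members[label])
```

Under that assumption, label 1 groups five boroughs while labels 0, 2, and 4 each hold a single borough.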

In [101]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Toronto_merged = wiki_new

# join the borough-level cluster labels and top venues onto the postal code table, which holds latitude/longitude
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Borough'), on='Borough')

In [103]:
Toronto_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,1,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Italian Restaurant,Bakery,Park,Seafood Restaurant,Gym
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Italian Restaurant,Bakery,Park,Seafood Restaurant,Gym
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,1,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Italian Restaurant,Bakery,Park,Seafood Restaurant,Gym
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,1,Greek Restaurant,Coffee Shop,Brewery,Italian Restaurant,Ice Cream Shop,Park,Pizza Place,Restaurant,Fast Food Restaurant,American Restaurant
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
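
Because the clustering was run on one row per borough, the join copies each borough's label and venue list onto every postal code in that borough. That is why many rows above are identical apart from their coordinates. A minimal sketch using four postal codes and labels from the table above follows; the frame names `postal` and `borough_labels` are hypothetical.

```python
import pandas as pd

# A few rows from the merged table above (coordinates and venues omitted).
postal = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A", "M8X"],
    "Borough": ["North York", "North York", "Downtown Toronto", "Etobicoke"],
})

# One row per borough, as in neighborhoods_venues_sorted.
borough_labels = pd.DataFrame({
    "Borough": ["North York", "Downtown Toronto", "Etobicoke"],
    "Cluster Labels": [1, 1, 4],
})

# Left join on the Borough column: every postal code inherits its borough's label.
merged = postal.join(borough_labels.set_index("Borough"), on="Borough")
print(merged)
```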


### Step 8: Visualize the Resulting Clusters

In [96]:
# create map
map_clusters = folium.Map(location=[tor_latitude, tor_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Borough'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run 0..kclusters-1, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Step 9: Analyze the Clusters

#### Cluster 1:

In [106]:
tor_Cluster1 = Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
tor_Cluster1

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,York,0,Park,Convenience Store,Grocery Store,Field,Skating Rink,Sandwich Place,Garden,Hockey Arena,Pool,Discount Store
21,York,0,Park,Convenience Store,Grocery Store,Field,Skating Rink,Sandwich Place,Garden,Hockey Arena,Pool,Discount Store
56,York,0,Park,Convenience Store,Grocery Store,Field,Skating Rink,Sandwich Place,Garden,Hockey Arena,Pool,Discount Store
63,York,0,Park,Convenience Store,Grocery Store,Field,Skating Rink,Sandwich Place,Garden,Hockey Arena,Pool,Discount Store
64,York,0,Park,Convenience Store,Grocery Store,Field,Skating Rink,Sandwich Place,Garden,Hockey Arena,Pool,Discount Store


#### Cluster 2:

In [125]:
tor_Cluster2 = Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
tor_Cluster2.head(33)

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
1,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
2,Downtown Toronto,1,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Italian Restaurant,Bakery,Park,Seafood Restaurant,Gym
3,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
4,Downtown Toronto,1,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Italian Restaurant,Bakery,Park,Seafood Restaurant,Gym
7,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
9,Downtown Toronto,1,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Italian Restaurant,Bakery,Park,Seafood Restaurant,Gym
10,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
13,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
15,Downtown Toronto,1,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Italian Restaurant,Bakery,Park,Seafood Restaurant,Gym


In [126]:
tor_Cluster2.tail(30)

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
52,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
53,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
54,East Toronto,1,Greek Restaurant,Coffee Shop,Brewery,Italian Restaurant,Ice Cream Shop,Park,Pizza Place,Restaurant,Fast Food Restaurant,American Restaurant
55,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
57,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
59,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
60,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
61,Central Toronto,1,Coffee Shop,Sandwich Place,Café,Park,Sushi Restaurant,Pizza Place,Dessert Shop,Restaurant,Diner,Italian Restaurant
62,Central Toronto,1,Coffee Shop,Sandwich Place,Café,Park,Sushi Restaurant,Pizza Place,Dessert Shop,Restaurant,Diner,Italian Restaurant
66,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant


In [141]:
tor_Cluster2_target = tor_Cluster2[~tor_Cluster2.Borough.str.contains("Downtown Toronto")]
tor_Cluster2_target

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
1,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
3,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
7,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
10,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
13,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
19,East Toronto,1,Greek Restaurant,Coffee Shop,Brewery,Italian Restaurant,Ice Cream Shop,Park,Pizza Place,Restaurant,Fast Food Restaurant,American Restaurant
27,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
28,North York,1,Coffee Shop,Clothing Store,Japanese Restaurant,Restaurant,Park,Pizza Place,Grocery Store,Sandwich Place,Bank,Fast Food Restaurant
31,West Toronto,1,Café,Bar,Coffee Shop,Restaurant,Italian Restaurant,Bakery,Breakfast Spot,Grocery Store,Park,Pizza Place


#### Cluster 3:

In [127]:
tor_Cluster3 = Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
tor_Cluster3

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
76,Mississauga,2,Coffee Shop,Hotel,Fried Chicken Joint,Burrito Place,Gym,Mediterranean Restaurant,American Restaurant,Sandwich Place,Middle Eastern Restaurant,Diner


#### Cluster 4:

In [128]:
tor_Cluster4 = Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
tor_Cluster4

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,3,Fast Food Restaurant,Coffee Shop,Bank,Bakery,Pizza Place,Intersection,Chinese Restaurant,Breakfast Spot,Gas Station,Thai Restaurant
8,East York,3,Intersection,Coffee Shop,Bank,Pizza Place,Burger Joint,Sporting Goods Shop,Sandwich Place,Park,Pet Store,Pharmacy
12,Scarborough,3,Fast Food Restaurant,Coffee Shop,Bank,Bakery,Pizza Place,Intersection,Chinese Restaurant,Breakfast Spot,Gas Station,Thai Restaurant
14,East York,3,Intersection,Coffee Shop,Bank,Pizza Place,Burger Joint,Sporting Goods Shop,Sandwich Place,Park,Pet Store,Pharmacy
18,Scarborough,3,Fast Food Restaurant,Coffee Shop,Bank,Bakery,Pizza Place,Intersection,Chinese Restaurant,Breakfast Spot,Gas Station,Thai Restaurant
22,Scarborough,3,Fast Food Restaurant,Coffee Shop,Bank,Bakery,Pizza Place,Intersection,Chinese Restaurant,Breakfast Spot,Gas Station,Thai Restaurant
23,East York,3,Intersection,Coffee Shop,Bank,Pizza Place,Burger Joint,Sporting Goods Shop,Sandwich Place,Park,Pet Store,Pharmacy
26,Scarborough,3,Fast Food Restaurant,Coffee Shop,Bank,Bakery,Pizza Place,Intersection,Chinese Restaurant,Breakfast Spot,Gas Station,Thai Restaurant
29,East York,3,Intersection,Coffee Shop,Bank,Pizza Place,Burger Joint,Sporting Goods Shop,Sandwich Place,Park,Pet Store,Pharmacy
32,Scarborough,3,Fast Food Restaurant,Coffee Shop,Bank,Bakery,Pizza Place,Intersection,Chinese Restaurant,Breakfast Spot,Gas Station,Thai Restaurant


#### Cluster 5:

In [129]:
tor_Cluster5 = Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]
tor_Cluster5

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
11,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
17,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
70,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
77,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
88,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
89,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
93,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
94,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store
98,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Grocery Store,Fast Food Restaurant,Bakery,Gym,Coffee Shop,Liquor Store,Beer Store


## Project Phase V: Conclusion

#### It looks like the Cluster 1 area in Toronto shares very similar venues with the Cluster 2 area in Manhattan, so the most intuitive choice is the center of the city, and the warehouse should be built somewhere near Downtown Toronto. However, the actual Amazon warehouses in Toronto are typically located to the north, west, and east of the downtown area. A likely explanation is that the cost of renting or building a warehouse in downtown Toronto exceeds the expected profit, and that logistics in Toronto are very different from those in Manhattan. Therefore, this methodology alone is insufficient for picking a logistics location for a business; further analysis with more inputs and other methods is required. As a side note, the analysis shows which areas have the most food places, so at least we know where to eat if we visit Toronto in the future, even if we have not been there before.
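
One way to make "shares very similar venues" more concrete is to measure the overlap between two top-10 venue lists. The sketch below uses Jaccard similarity, which is not a method used in this notebook. Since the Manhattan lists do not appear in this section, three Toronto boroughs from the tables above stand in for the comparison, and the helper name `jaccard` is hypothetical.

```python
def jaccard(a, b):
    """Fraction of distinct items that two lists have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Top-10 venue lists copied from the cluster tables above.
top10 = {
    "North York": [
        "Coffee Shop", "Clothing Store", "Japanese Restaurant", "Restaurant",
        "Park", "Pizza Place", "Grocery Store", "Sandwich Place", "Bank",
        "Fast Food Restaurant",
    ],
    "Downtown Toronto": [
        "Coffee Shop", "Café", "Restaurant", "Hotel", "Japanese Restaurant",
        "Italian Restaurant", "Bakery", "Park", "Seafood Restaurant", "Gym",
    ],
    "Etobicoke": [
        "Pizza Place", "Sandwich Place", "Pharmacy", "Grocery Store",
        "Fast Food Restaurant", "Bakery", "Gym", "Coffee Shop",
        "Liquor Store", "Beer Store",
    ],
}

for name in ("Downtown Toronto", "Etobicoke"):
    score = jaccard(top10["North York"], top10[name])
    print(f"North York vs {name}: {score:.2f}")
```

The same function could be applied to a Toronto list and a Manhattan list once both are available. A score like this ignores the ranking within each list, so it is only a rough first check.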