<h4>Coursera Capstone Report by Amit Bendre</h4>

<B>The Battle of Neighborhoods</B>

<B>1. Introduction</B>

In this project, we study two major cities, Manhattan and Toronto. The objective is to segment the neighborhoods of Manhattan and Toronto by their most common venue categories, using the Foursquare API.

Using segmentation and clustering, we aim to find the similarities and differences between the two cities.

The target audience of this project is tourists.

<B>2. Data</B>

We use the Foursquare API to explore the neighborhoods of both cities. The explore endpoint returns the venues near each neighborhood; from these we derive the most common venue categories in each neighborhood and use them as features to group the neighborhoods into clusters.

The New York neighborhood dataset is freely available on the web: https://geo.nyu.edu/catalog/nyu_2451_34572

For Toronto, a Wikipedia page contains all the information we need to explore and cluster the neighborhoods. We scrape this page and wrangle the data into a table.

<B>Import Packages</B>

In [9]:
import numpy as np 

import pandas as pd 
import json 

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

import requests 
from pandas import json_normalize # pandas.io.json.json_normalize is deprecated since pandas 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium 

from bs4 import BeautifulSoup

print('Download Complete!')

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Download Complete!


<b>Load and Explore Data</b>

In [15]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Data downloaded!


In [24]:
neighborhoods_data = newyork_data['features']

In [26]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [27]:
neighborhood_rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    neighborhood_rows.append({'Borough': borough,
                              'Neighborhood': neighborhood_name,
                              'Latitude': neighborhood_lat,
                              'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
neighborhoods = pd.DataFrame(neighborhood_rows, columns=column_names)
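GeoJSON stores a point as `[longitude, latitude]`, which is why the loop reads index 1 for latitude and index 0 for longitude. A minimal stdlib sketch of the same per-feature extraction, assuming each entry of `newyork_data['features']` has the `properties`/`geometry` layout used above; the helper `feature_to_row` is hypothetical and the sample values are copied from the Wakefield row shown below.

```python
def feature_to_row(feature):
    """Flatten one GeoJSON neighborhood feature into a row dict (hypothetical helper)."""
    # GeoJSON coordinate order is [longitude, latitude]; ignore any extra altitude value
    lon, lat = feature['geometry']['coordinates'][:2]
    return {
        'Borough': feature['properties']['borough'],
        'Neighborhood': feature['properties']['name'],
        'Latitude': lat,
        'Longitude': lon,
    }


sample_feature = {
    'type': 'Feature',
    'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
    'geometry': {'type': 'Point', 'coordinates': [-73.847201, 40.894705]},
}

sample_row = feature_to_row(sample_feature)
print(sample_row)
```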

In [28]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


#### Create a new dataframe with the Manhattan neighborhoods only

In [37]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


#### Use the geopy library to get the geographical coordinates of Manhattan

In [38]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


In [39]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

#### Define Foursquare Credentials and Version

In [40]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


#### Let's explore the first neighborhood in our dataframe.

In [44]:
neighborhood_latitude = manhattan_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = manhattan_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = manhattan_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Marble Hill are 40.87655077879964, -73.91065965862981.


In [53]:
LIMIT = 100 # maximum number of venues returned by the Foursquare API

radius = 500 # search radius in meters

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180605&ll=40.87655077879964,-73.91065965862981&radius=500&limit=100'
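Formatting the query string by hand works, but `urllib.parse.urlencode` escapes each value, which matters if a secret ever contains characters such as `&` or `+`. A minimal sketch producing the same parameters as the cell above; `build_explore_url` is a hypothetical helper and the credentials are placeholders. One small difference: it omits the stray `&` right after `?` in the hand-built URL.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a Foursquare v2 explore URL with the notebook's parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',  # "latitude,longitude"
        'radius': radius,      # search radius
        'limit': limit,        # maximum number of venues returned
    }
    # urlencode percent-escapes every value; safe=',' keeps the comma in "ll" readable
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


explore_url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                                40.876551, -73.91066)
print(explore_url)
```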

In [54]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
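Inside `getNearbyVenues`, each venue's category is read as `v['venue']['categories'][0]['name']`, which raises an `IndexError` if a venue comes back with an empty category list. A minimal sketch of the same flattening with that case guarded, run on a hand-built payload that follows the key path used above (`response`, then `groups[0]`, then `items`). The helper name and the second, uncategorised venue are made up for illustration, and keeping such a venue with category `None` instead of skipping it is an assumption.

```python
def venues_from_response(payload, name, lat, lng):
    """Turn one explore payload into (neighborhood, lat, lng, venue, vlat, vlng, category) tuples."""
    items = payload['response']['groups'][0]['items']
    venue_rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # assumption: keep a venue without categories, with category None
        category = categories[0]['name'] if categories else None
        venue_rows.append((name, lat, lng, venue['name'],
                           venue['location']['lat'], venue['location']['lng'], category))
    return venue_rows


sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Starbucks',
                   'location': {'lat': 43.678798, 'lng': -79.298045},
                   'categories': [{'name': 'Coffee Shop'}]}},
        {'venue': {'name': 'Unlabelled spot',  # made-up venue with no category
                   'location': {'lat': 43.677, 'lng': -79.294},
                   'categories': []}},
    ]}]}
}

venue_rows = venues_from_response(sample_payload, 'The Beaches', 43.676357, -79.293031)
for venue_row in venue_rows:
    print(venue_row)
```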

In [55]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


#### Analyze each neighborhood

In [56]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,...,Volleyball Court,Watch Shop,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
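`pd.get_dummies` turns the single `Venue Category` column into one 0/1 column per category, with exactly one 1 in each venue row. The same idea in plain Python, as a sketch on a made-up list of four venue categories; `one_hot` is a hypothetical helper and, like `get_dummies`, it orders the columns alphabetically.

```python
def one_hot(categories):
    """Return (sorted column names, one 0/1 row per input category)."""
    column_labels = sorted(set(categories))
    position = {name: i for i, name in enumerate(column_labels)}
    rows = []
    for category in categories:
        row = [0] * len(column_labels)
        row[position[category]] = 1  # exactly one 1 per venue
        rows.append(row)
    return column_labels, rows


category_columns, encoded_rows = one_hot(['Coffee Shop', 'Park', 'Coffee Shop', 'Hotel'])
print(category_columns)
for encoded in encoded_rows:
    print(encoded)
```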


In [57]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped.shape

(40, 332)
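Averaging the one-hot columns within a neighborhood gives, for every category, the share of that neighborhood's venues falling into it (count divided by total). That is what each grouped row and the `freq` values printed below represent. A counting sketch with made-up neighborhoods `A` and `B`; categories a neighborhood lacks are simply absent here, whereas the DataFrame shows them as 0.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """venues: (neighborhood, category) pairs -> {neighborhood: {category: share}}."""
    counts = defaultdict(Counter)
    for hood, category in venues:
        counts[hood][category] += 1
    frequencies = {}
    for hood, counter in counts.items():
        total = sum(counter.values())
        # equals the mean of the one-hot column over this neighborhood's rows
        frequencies[hood] = {cat: n / total for cat, n in counter.items()}
    return frequencies


sample = [('A', 'Coffee Shop'), ('A', 'Park'), ('A', 'Coffee Shop'), ('A', 'Hotel'),
          ('B', 'Park')]
freqs = category_frequencies(sample)
print(freqs)
```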

In [58]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
                venue  freq
0         Coffee Shop  0.08
1                Park  0.07
2               Hotel  0.05
3           Wine Shop  0.03
4  Italian Restaurant  0.03


----Carnegie Hill----
            venue  freq
0     Pizza Place  0.06
1     Coffee Shop  0.05
2  Cosmetics Shop  0.05
3            Café  0.04
4             Spa  0.03


----Central Harlem----
                 venue  freq
0   African Restaurant  0.07
1       Cosmetics Shop  0.05
2   Seafood Restaurant  0.05
3  American Restaurant  0.05
4    French Restaurant  0.05


----Chelsea----
                venue  freq
0         Coffee Shop  0.07
1  Italian Restaurant  0.06
2      Ice Cream Shop  0.05
3              Bakery  0.04
4           Nightclub  0.04


----Chinatown----
                   venue  freq
0     Chinese Restaurant  0.09
1        Bubble Tea Shop  0.06
2     Dim Sum Restaurant  0.04
3  Vietnamese Restaurant  0.04
4    American Restaurant  0.04


----Civic Center----
                  venue 

In [59]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
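`return_most_common_venues` skips the first entry (the neighborhood name), sorts the remaining frequencies in descending order and keeps the first `num_top_venues` labels. pandas' default sort is not guaranteed to be stable, so categories with equal frequency may come out in either order. A sketch of the same ranking over a plain dict with an explicit, deterministic tie-break by name; `most_common` is hypothetical and the input uses the rounded Battery Park City frequencies printed above, so its tie order need not match the pandas output.

```python
def most_common(freqs, n):
    """Top-n category names by share, ties broken alphabetically (hypothetical helper)."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]


battery_park_city = {'Coffee Shop': 0.08, 'Park': 0.07, 'Hotel': 0.05,
                     'Wine Shop': 0.03, 'Italian Restaurant': 0.03}
print(most_common(battery_park_city, 5))
```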

In [60]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Coffee Shop,Park,Hotel,Wine Shop,Italian Restaurant,Gym,Food Court,Memorial Site,Women's Store,BBQ Joint
1,Carnegie Hill,Pizza Place,Coffee Shop,Cosmetics Shop,Café,French Restaurant,Bar,Bookstore,Spa,Yoga Studio,Gym
2,Central Harlem,African Restaurant,French Restaurant,American Restaurant,Fried Chicken Joint,Gym / Fitness Center,Chinese Restaurant,Cosmetics Shop,Seafood Restaurant,Dessert Shop,Event Space
3,Chelsea,Coffee Shop,Italian Restaurant,Ice Cream Shop,Bakery,American Restaurant,Nightclub,Theater,Seafood Restaurant,Hotel,Art Gallery
4,Chinatown,Chinese Restaurant,Bubble Tea Shop,American Restaurant,Cocktail Bar,Vietnamese Restaurant,Dim Sum Restaurant,Hotpot Restaurant,Salon / Barbershop,Noodle House,Bakery
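The `indicators` list covers only 1st to 3rd and falls back to "th", which is correct for the ten columns used here but would produce "21th" for a 21st column. A general suffix rule, as a sketch with a hypothetical `ordinal` helper that also handles 11th to 13th:

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ... with 11th, 12th and 13th handled."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


header = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(header)
```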


## 4. Cluster Neighborhoods

In [62]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 keeps the pre-1.4 scikit-learn default explicit)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 4, 0, 0, 0, 0, 0, 4, 0, 3], dtype=int32)
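KMeans treats each neighborhood's row of category frequencies as a point and alternates two steps until the assignments stop changing: assign every point to its nearest centroid, then move each centroid to the mean of its points (Lloyd's algorithm). A plain-Python sketch of those two steps on made-up two-category frequency vectors. It seeds with the first `k` points, an assumption made for determinism; scikit-learn seeds with k-means++ by default and can keep the best of several runs, so its labels are not comparable to these. In both cases the cluster numbers are arbitrary identifiers.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids). Seeds: first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# hypothetical neighborhoods described by (share of cafes, share of parks)
points = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.85, 0.15]]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```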

In [63]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# join the top-venue table onto manhattan_data, which holds the latitude/longitude of each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1,Discount Store,Coffee Shop,Yoga Studio,Deli / Bodega,Supplement Shop,Steakhouse,Shopping Mall,Shoe Store,Seafood Restaurant,Sandwich Place
1,Manhattan,Chinatown,40.715618,-73.994279,0,Chinese Restaurant,Bubble Tea Shop,American Restaurant,Cocktail Bar,Vietnamese Restaurant,Dim Sum Restaurant,Hotpot Restaurant,Salon / Barbershop,Noodle House,Bakery
2,Manhattan,Washington Heights,40.851903,-73.9369,4,Café,Mobile Phone Shop,Bakery,Pizza Place,Tapas Restaurant,Caribbean Restaurant,Chinese Restaurant,Women's Store,Grocery Store,Shoe Store
3,Manhattan,Inwood,40.867684,-73.92121,4,Café,Mexican Restaurant,Pizza Place,Lounge,Wine Bar,Restaurant,Bakery,Deli / Bodega,Frozen Yogurt Shop,Chinese Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,4,Deli / Bodega,Mexican Restaurant,Coffee Shop,Café,Pizza Place,Liquor Store,Cocktail Bar,Sandwich Place,School,Chinese Restaurant


In [64]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],       # labels run 0..kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
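KMeans labels run from 0 to k-1, so indexing the palette directly with the label gives label i the i-th color; subtracting 1 from the label would send label 0 to the last color through Python's negative indexing. A stdlib sketch of an evenly spaced palette: it swaps matplotlib's `cm.rainbow` for `colorsys` HSV hues only to stay dependency-free, so the exact colors differ, and `cluster_colors` is a hypothetical helper.

```python
import colorsys


def cluster_colors(k):
    """k evenly spaced, fully saturated hues as '#rrggbb' strings."""
    palette = []
    for i in range(k):
        hue = i / k  # 0, 1/k, ..., (k-1)/k; stops short of 1.0, which is red again
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_colors(5)
for label in range(5):  # KMeans labels run from 0 to k-1, so the label is the index
    print(label, palette[label])
```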

## 5. Examine Clusters

In [65]:
#Cluster 1
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Chinese Restaurant,Bubble Tea Shop,American Restaurant,Cocktail Bar,Vietnamese Restaurant,Dim Sum Restaurant,Hotpot Restaurant,Salon / Barbershop,Noodle House,Bakery
6,Central Harlem,African Restaurant,French Restaurant,American Restaurant,Fried Chicken Joint,Gym / Fitness Center,Chinese Restaurant,Cosmetics Shop,Seafood Restaurant,Dessert Shop,Event Space
8,Upper East Side,Italian Restaurant,Exhibit,Art Gallery,Coffee Shop,Bakery,Juice Bar,Hotel,Boutique,Gym / Fitness Center,French Restaurant
9,Yorkville,Bar,Gym,Coffee Shop,Italian Restaurant,Pizza Place,Japanese Restaurant,Deli / Bodega,Sushi Restaurant,Mexican Restaurant,Pub
10,Lenox Hill,Italian Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Gym / Fitness Center,Sporting Goods Shop,Gym,Burger Joint,Cycle Studio,Cosmetics Shop
12,Upper West Side,Italian Restaurant,Bar,Vegetarian / Vegan Restaurant,Burger Joint,Indian Restaurant,Bakery,Coffee Shop,Wine Bar,Sushi Restaurant,Mediterranean Restaurant
13,Lincoln Square,Theater,Gym / Fitness Center,Italian Restaurant,Plaza,Concert Hall,French Restaurant,Café,Performing Arts Venue,Park,Opera House
14,Clinton,Theater,Coffee Shop,Gym / Fitness Center,American Restaurant,Italian Restaurant,Gym,Hotel,Spa,Wine Shop,Food Court
15,Midtown,Hotel,Coffee Shop,Steakhouse,Theater,Clothing Store,Food Truck,Park,Bookstore,Bakery,Sporting Goods Shop
16,Murray Hill,Coffee Shop,Hotel,Bar,Japanese Restaurant,Spa,Italian Restaurant,Gym,French Restaurant,Sandwich Place,Salon / Barbershop


In [66]:
#Cluster 2
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Discount Store,Coffee Shop,Yoga Studio,Deli / Bodega,Supplement Shop,Steakhouse,Shopping Mall,Shoe Store,Seafood Restaurant,Sandwich Place


In [67]:
#Cluster 3
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Stuyvesant Town,Bar,Boat or Ferry,Park,Playground,Gas Station,Pet Service,Farmers Market,Basketball Court,Baseball Field,Cocktail Bar


In [68]:
#Cluster 4
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Roosevelt Island,Park,Sandwich Place,Gym,Dog Run,Bus Station,Supermarket,Farmers Market,Metro Station,Outdoors & Recreation,School
26,Morningside Heights,Coffee Shop,Bookstore,American Restaurant,Park,Food Truck,Deli / Bodega,Tennis Court,Burger Joint,Sandwich Place,Ethiopian Restaurant
28,Battery Park City,Coffee Shop,Park,Hotel,Wine Shop,Italian Restaurant,Gym,Food Court,Memorial Site,Women's Store,BBQ Joint
29,Financial District,Coffee Shop,Hotel,Wine Shop,Bar,Gym,Steakhouse,Italian Restaurant,Food Truck,Pizza Place,Falafel Restaurant


In [69]:
#Cluster 5
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Café,Mobile Phone Shop,Bakery,Pizza Place,Tapas Restaurant,Caribbean Restaurant,Chinese Restaurant,Women's Store,Grocery Store,Shoe Store
3,Inwood,Café,Mexican Restaurant,Pizza Place,Lounge,Wine Bar,Restaurant,Bakery,Deli / Bodega,Frozen Yogurt Shop,Chinese Restaurant
4,Hamilton Heights,Deli / Bodega,Mexican Restaurant,Coffee Shop,Café,Pizza Place,Liquor Store,Cocktail Bar,Sandwich Place,School,Chinese Restaurant
5,Manhattanville,Italian Restaurant,Mexican Restaurant,Chinese Restaurant,Seafood Restaurant,Deli / Bodega,Japanese Curry Restaurant,Ramen Restaurant,Sushi Restaurant,Supermarket,Burger Joint
7,East Harlem,Mexican Restaurant,Bakery,Latin American Restaurant,Deli / Bodega,Thai Restaurant,Liquor Store,Street Art,Gym,Grocery Store,Coffee Shop
20,Lower East Side,Coffee Shop,Café,Ramen Restaurant,Cocktail Bar,Latin American Restaurant,Sandwich Place,Chinese Restaurant,Art Gallery,Shoe Store,Japanese Restaurant
22,Little Italy,Bakery,Café,Yoga Studio,Cocktail Bar,Bubble Tea Shop,Seafood Restaurant,Sandwich Place,Ice Cream Shop,Chinese Restaurant,Salon / Barbershop
25,Manhattan Valley,Coffee Shop,Pizza Place,Yoga Studio,Bar,Spa,Italian Restaurant,Mexican Restaurant,Café,Thai Restaurant,Indian Restaurant
30,Carnegie Hill,Pizza Place,Coffee Shop,Cosmetics Shop,Café,French Restaurant,Bar,Bookstore,Spa,Yoga Studio,Gym
36,Tudor City,Mexican Restaurant,Park,Greek Restaurant,Café,Pizza Place,Dog Run,Diner,Deli / Bodega,Hotel,Burger Joint


## Toronto

<b>Download Dataset</b>

In [72]:
# Read the wiki page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
r = requests.get(url)

<b>Extracting the data</b>

In [74]:
# Extract the raw_table info
raw_postal_code = []
raw_borough = []
raw_neighborhood = []
soup = BeautifulSoup(r.text, 'lxml')
table = soup.find_all('table')[0]
for row in table.find_all('tr'):
    # rows without <td> cells (such as a <th> header row) contribute nothing
    for position, column in enumerate(row.find_all('td'), start=1):
        text = column.get_text().strip()
        if position == 1:
            raw_postal_code.append(text)
        elif position == 2:
            raw_borough.append(text)
        else:
            raw_neighborhood.append(text)
raw_dict = {'PostalCode': raw_postal_code, 'Borough': raw_borough, 'Neighborhood': raw_neighborhood}
raw_table = pd.DataFrame(raw_dict)
print(raw_table.shape)
raw_table.head()

(289, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
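The extraction loop relies on cell position: the first `<td>` in a row is the postal code, the second the borough, and anything after that the neighborhood. The same walk with the standard library's `html.parser` in place of BeautifulSoup (swapped in only so the sketch has no dependencies), run on a small hand-written table modelled on the rows printed above. `TdRowParser` is a hypothetical class, and the live Wikipedia page's markup may differ from this sample.

```python
from html.parser import HTMLParser


class TdRowParser(HTMLParser):
    """Collect the stripped text of <td> cells, one list per <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> being read
        self._cell = None  # text fragments of the <td> being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # text inside nested tags such as <a> is kept too
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # a header row made only of <th> cells yields no entry
                self.rows.append(self._row)
            self._row = None


sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td><a href="/wiki/Parkwoods">Parkwoods</a></td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Harbourfront\n</td></tr>
</table>
"""

parser = TdRowParser()
parser.feed(sample_html)
parser.close()
postal_codes, boroughs, hood_names = (list(col) for col in zip(*parser.rows))
print(parser.rows)
```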


<b>Only process the cells that have an assigned Borough. Ignore cells with a Borough that is Not assigned.</b>

In [76]:
dropborough_raw_table = raw_table[raw_table.Borough != 'Not assigned'].reset_index(drop=True)
print(dropborough_raw_table.shape)
dropborough_raw_table.head(5)

(212, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


<b>More than one neighborhood can exist in one postal code area.</b>

In [78]:
# Make single row for each postal code
dropborough_raw_table = dropborough_raw_table.groupby(['PostalCode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
print(dropborough_raw_table.shape)
dropborough_raw_table.head()

(103, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


<b>If a cell has a Borough but a Not assigned Neighborhood, then the Neighborhood will be the same as the Borough.</b>

In [80]:
mask = dropborough_raw_table.Neighborhood == 'Not assigned'
dropborough_raw_table.loc[mask, 'Neighborhood'] = dropborough_raw_table.loc[mask, 'Borough']
print("Shape of the DataFrame:", dropborough_raw_table.shape)
dropborough_raw_table

Shape of the DataFrame: (103, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
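The three cleaning rules applied above (drop rows whose borough is Not assigned, merge the neighborhoods that share a postal code, and fall back to the borough name when the neighborhood is still Not assigned) can be sketched in plain Python on a few rows. `clean_postal_table` is a hypothetical helper; the Harbourfront and Regent Park rows come from the output above, while the Queen's Park row is an illustrative example of the third rule, not a value taken from this notebook's output.

```python
def clean_postal_table(rows):
    """rows: (postal_code, borough, neighborhood) -> one cleaned row per postal code."""
    grouped = {}
    for code, borough, hood in rows:
        if borough == 'Not assigned':  # rule 1: ignore unassigned boroughs
            continue
        # rule 2: collect every neighborhood listed under the same postal code
        grouped.setdefault((code, borough), []).append(hood)
    cleaned = []
    for code, borough in sorted(grouped):
        joined = ', '.join(grouped[(code, borough)])
        if joined == 'Not assigned':  # rule 3: use the borough name instead
            joined = borough
        cleaned.append((code, borough, joined))
    return cleaned


raw_rows = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M5A', 'Downtown Toronto', 'Harbourfront'),
    ('M5A', 'Downtown Toronto', 'Regent Park'),
    ('M3A', 'North York', 'Parkwoods'),
    ('M7A', "Queen's Park", 'Not assigned'),  # illustrative row for rule 3
]
for cleaned_row in clean_postal_table(raw_rows):
    print(cleaned_row)
```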


<b>Merge the geographical coordinates of each postal code</b>

In [82]:
coordinate_path ='http://cocl.us/Geospatial_data/Geospatial_Coordinates.csv'
coordinate_data = pd.read_csv(coordinate_path)
coordinate_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [83]:
# join on the postal code instead of relying on both tables having the same row order
neighborhoods = dropborough_raw_table.merge(coordinate_data, left_on='PostalCode', right_on='Postal Code')
neighborhoods.drop(columns=['Postal Code'], inplace=True)
print(neighborhoods.shape)
neighborhoods.head()

(103, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
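Combining the two tables by row position is only correct while both are sorted identically by postal code; joining on the postal code itself does not depend on row order. A stdlib sketch with a hypothetical `attach_coordinates` helper, using values from the outputs above but deliberately listing the rows in different orders. Unlike an inner merge, which silently drops unmatched postal codes, this sketch raises a `KeyError` for a postal code without coordinates.

```python
def attach_coordinates(neighborhood_rows, coordinate_rows):
    """Append (lat, lng) to each (postal_code, borough, neighborhood) row by postal code."""
    coords = {code: (lat, lng) for code, lat, lng in coordinate_rows}
    merged = []
    for code, borough, hood in neighborhood_rows:
        lat, lng = coords[code]  # a postal code with no coordinates raises KeyError
        merged.append((code, borough, hood, lat, lng))
    return merged


# deliberately listed in different orders: position-based pairing would mismatch them
neighborhood_rows = [('M1C', 'Scarborough', 'Highland Creek, Rouge Hill, Port Union'),
                     ('M1B', 'Scarborough', 'Rouge, Malvern')]
coordinate_rows = [('M1B', 43.806686, -79.194353),
                   ('M1C', 43.784535, -79.160497)]
for merged_row in attach_coordinates(neighborhood_rows, coordinate_rows):
    print(merged_row)
```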


<b>Filter Toronto Data</b>

In [84]:
toronto_borough = ['East Toronto','Central Toronto','Downtown Toronto','West Toronto']
toronto_data = neighborhoods[neighborhoods.Borough.isin(toronto_borough)].reset_index(drop=True)
print(toronto_data.shape)
toronto_data.head()

(38, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


<b>Geographical coordinates of Toronto</b>

In [86]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [87]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

<b>Define Foursquare Credentials</b>

In [89]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


In [90]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [91]:
LIMIT = 100
radius = 500
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The 

In [92]:
print(toronto_venues.shape)
toronto_venues.head()

(1694, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
1,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
2,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
3,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
4,"The Danforth West, Riverdale",43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop
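The `radius=500` passed to the explore endpoint is a distance in meters around each neighborhood's coordinates. As a sanity check, a haversine sketch (spherical Earth with a mean radius of 6,371 km, an approximation) measures the distance from The Beaches' center to the first venue returned for it in the rows above; `haversine_m` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# The Beaches neighborhood center and the first venue returned for it
d = haversine_m(43.676357, -79.293031, 43.679181, -79.297215)
print(f'{d:.0f} m')
```

It comes out at roughly 460 m, inside the 500 m search radius.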


In [93]:
# Count of venues returned for each neighborhood
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,55,55,55,55,55,55
"Brockton, Exhibition Place, Parkdale Village",19,19,19,19,19,19
Business Reply Mail Processing Centre 969 Eastern,19,19,19,19,19,19
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14
"Cabbagetown, St. James Town",47,47,47,47,47,47
Central Bay Street,83,83,83,83,83,83
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,15,15,15,15,15,15
Church and Wellesley,86,86,86,86,86,86


In [94]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 235 unique categories.


<b>Analyze each Neighborhood</b>

In [96]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
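One subtlety in the cell above: Foursquare returns a venue category that is literally called `Neighborhood` (see the Upper Beaches row in the venues table). `pd.get_dummies` therefore already creates a `Neighborhood` column, and assigning the neighborhood names to `toronto_onehot['Neighborhood']` appears to overwrite that dummy column in place instead of appending a new last column, which would explain why the output starts with `Yoga Studio` rather than `Neighborhood`. The sketch below reproduces this on a small hypothetical `venues` frame and shows one way around it: rename the clashing category (the label `Neighborhood (venue)` is our own choice) and insert the label column by name.

```python
import pandas as pd

# Hypothetical toy stand-in for toronto_venues; one venue's category is literally "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["The Beaches", "The Beaches", "The Beaches", "Christie"],
    "Venue Category": ["Pub", "Coffee Shop", "Neighborhood", "Coffee Shop"],
})

# Naive version (as in the notebook): get_dummies already produced a "Neighborhood"
# column, so this assignment overwrites it rather than appending a new column
naive = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
naive["Neighborhood"] = venues["Neighborhood"]
print(list(naive.columns))  # ['Coffee Shop', 'Neighborhood', 'Pub']: the last column is 'Pub'

# Safer version: rename the clashing category, then insert the labels as column 0 by name
categories = venues["Venue Category"].replace("Neighborhood", "Neighborhood (venue)")
onehot = pd.get_dummies(categories, prefix="", prefix_sep="").astype(int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(list(onehot.columns))  # ['Neighborhood', 'Coffee Shop', 'Neighborhood (venue)', 'Pub']
```

Inserting by name avoids relying on the label being the last column, so the result no longer depends on which categories Foursquare happens to return.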


In [100]:
# average the one-hot columns per neighborhood: each value is the share of that neighborhood's venues in a category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

,Neighborhood,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.012048,0.0,0.012048,0.0,0.012048,0.0,0.0
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.01,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011628,0.011628,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011628,0.0,0.011628,0.0,0.011628,0.0


<b>Let's print each neighborhood along with the top 5 most common venues</b>

In [103]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0          Coffee Shop  0.06
1                 Café  0.05
2  American Restaurant  0.04
3      Thai Restaurant  0.04
4           Steakhouse  0.04


----Berczy Park----
            venue  freq
0     Coffee Shop  0.07
1    Cocktail Bar  0.05
2      Restaurant  0.05
3             Pub  0.04
4  Farmers Market  0.04


----Brockton, Exhibition Place, Parkdale Village----
               venue  freq
0               Café  0.11
1     Breakfast Spot  0.11
2        Coffee Shop  0.11
3  Convenience Store  0.05
4      Grocery Store  0.05


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0  Light Rail Station  0.11
1       Burrito Place  0.05
2       Garden Center  0.05
3              Garden  0.05
4                 Spa  0.05


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0    Airport Lounge  0.14
1  Airport Te
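The loop above transposes each row into a two-column frame before sorting. As a sketch of an alternative (not the notebook's code), `Series.nlargest` returns the top categories directly; the small frame below, with hypothetical frequencies, stands in for `toronto_grouped`.

```python
import pandas as pd

# Hypothetical stand-in for toronto_grouped: one row per neighborhood, one column per
# venue category, values are the share of that neighborhood's venues in the category
grouped = pd.DataFrame({
    "Neighborhood": ["Berczy Park", "Christie"],
    "Coffee Shop": [0.07, 0.20],
    "Pub": [0.04, 0.00],
    "Café": [0.05, 0.13],
})

num_top_venues = 2
top_venues = {}
for _, row in grouped.iterrows():
    freqs = row.drop("Neighborhood").astype(float)  # drop the label, keep numeric frequencies
    top = freqs.nlargest(num_top_venues).round(2)   # highest-frequency categories first
    top_venues[row["Neighborhood"]] = list(top.index)
    print("----" + row["Neighborhood"] + "----")
    print(top.to_string())
```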

<b>Let's put that into a pandas dataframe</b>

In [104]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [105]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # indicators only covers 1st-3rd; every later position gets "th"
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(38, 11)


,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,American Restaurant,Thai Restaurant,Gym,Bakery,Clothing Store,Asian Restaurant,Bar
1,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Pub,Café,Cheese Shop,Seafood Restaurant,Farmers Market,Italian Restaurant,Steakhouse
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Café,Breakfast Spot,Gym,Furniture / Home Store,Pet Store,Nightclub,Climbing Gym,Caribbean Restaurant,Burrito Place
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Yoga Studio,Auto Workshop,Comic Shop,Pizza Place,Butcher,Recording Studio,Restaurant,Burrito Place,Brewery
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Terminal,Airport Service,Harbor / Marina,Boat or Ferry,Sculpture Garden,Plane,Boutique,Airport Gate,Airport
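The `indicators` list only covers 1st to 3rd and falls back to "th" for every later position. That is correct for the ten columns used here, but it would produce labels such as "21th" if `num_top_venues` were raised above 20. A small, hypothetical `ordinal` helper handles the general English rule, where 11th to 13th are the exceptions:

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th-20th, including the irregular 11th, 12th, 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns[1], "|", columns[-1])  # 1st Most Common Venue | 10th Most Common Venue
```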


<b>Cluster Neighborhoods</b>

In [106]:
# set number of clusters
kclusters = 5

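# k-means needs numeric features only, so drop the neighborhood name column
# (keyword form: passing the axis positionally is no longer accepted in pandas 2.x)
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')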

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:100]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0,
       4, 0, 1, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
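The notebook fixes `kclusters = 5` without showing how the value was chosen. One common heuristic is the elbow method: fit k-means for a range of k and look for the point where the within-cluster sum of squares (`KMeans.inertia_`) stops dropping sharply. The sketch below runs on random, hypothetical data; in the notebook you would pass `toronto_grouped_clustering` instead of `X`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix: 38 "neighborhoods" x 20 "venue categories"
rng = np.random.default_rng(0)
X = rng.random((38, 20))
X = X / X.sum(axis=1, keepdims=True)  # each row sums to 1, like per-neighborhood category shares

ks = range(1, 11)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to the nearest centroid

for k, inertia in zip(ks, inertias):
    print(f"k={k:2d}  inertia={inertia:.4f}")
```

Inertia always tends to fall as k grows, so the useful signal is the bend in the curve, not its minimum; `n_init` is set explicitly because its default has changed across scikit-learn versions.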

In [107]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# join the cluster labels and top venues onto toronto_data, which supplies latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Coffee Shop,Pub,Women's Store,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Ice Cream Shop,Bookstore,Italian Restaurant,Health Food Store,Furniture / Home Store,Pub,Pizza Place,Liquor Store
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Fast Food Restaurant,Sandwich Place,Italian Restaurant,Sushi Restaurant,Burrito Place,Pub,Burger Joint,Ice Cream Shop,Fish & Chips Shop,Movie Theater
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Italian Restaurant,Bakery,American Restaurant,Chinese Restaurant,Fish Market,Juice Bar,Latin American Restaurant,Seafood Restaurant
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,4,Bus Line,Park,Swim School,Dim Sum Restaurant,Women's Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


In [108]:
# create map centered on latitude/longitude, which must already be set earlier in the notebook
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; labels run from 0 to kclusters-1, so they index rainbow directly
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

<b>Examine Clusters</b>

In [109]:
# Cluster 1 (label 0)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0,Coffee Shop,Pub,Women's Store,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
1,East Toronto,0,Greek Restaurant,Coffee Shop,Ice Cream Shop,Bookstore,Italian Restaurant,Health Food Store,Furniture / Home Store,Pub,Pizza Place,Liquor Store
2,East Toronto,0,Fast Food Restaurant,Sandwich Place,Italian Restaurant,Sushi Restaurant,Burrito Place,Pub,Burger Joint,Ice Cream Shop,Fish & Chips Shop,Movie Theater
3,East Toronto,0,Café,Coffee Shop,Italian Restaurant,Bakery,American Restaurant,Chinese Restaurant,Fish Market,Juice Bar,Latin American Restaurant,Seafood Restaurant
5,Central Toronto,0,Clothing Store,Gym,Sandwich Place,Burger Joint,Park,Breakfast Spot,Hotel,Food & Drink Shop,Dance Studio,Dumpling Restaurant
6,Central Toronto,0,Clothing Store,Coffee Shop,Sporting Goods Shop,Fast Food Restaurant,Mexican Restaurant,Diner,Dessert Shop,Park,Chinese Restaurant,Rental Car Location
7,Central Toronto,0,Sandwich Place,Dessert Shop,Pizza Place,Sushi Restaurant,Coffee Shop,Restaurant,Café,Italian Restaurant,Seafood Restaurant,Toy / Game Store
9,Central Toronto,0,Coffee Shop,Pub,Fried Chicken Joint,Light Rail Station,Sushi Restaurant,Bagel Shop,Sports Bar,American Restaurant,Pizza Place,Convenience Store
11,Downtown Toronto,0,Coffee Shop,Restaurant,Park,Pub,Italian Restaurant,Bakery,Pizza Place,Café,Breakfast Spot,Caribbean Restaurant
12,Downtown Toronto,0,Japanese Restaurant,Coffee Shop,Sushi Restaurant,Gay Bar,Restaurant,Burger Joint,Men's Store,Gastropub,Mediterranean Restaurant,Pub


In [110]:
# Cluster 2 (label 1)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Central Toronto,1,Playground,Park,Women's Store,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
10,Downtown Toronto,1,Park,Playground,Trail,Women's Store,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


In [111]:
# Cluster 3 (label 2)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Central Toronto,2,Garden,Women's Store,Discount Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


In [112]:
# Cluster 4 (label 3)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,3,Park,Trail,Jewelry Store,Sushi Restaurant,Women's Store,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


In [113]:
# Cluster 5 (label 4)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,4,Bus Line,Park,Swim School,Dim Sum Restaurant,Women's Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
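The five cells above differ only in the label being filtered. As a sketch, a single loop over the labels can summarize every cluster at once, here by counting how often each category is a cluster member's 1st most common venue; the small frame below is a hypothetical stand-in for `toronto_merged`.

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged (only the columns used here)
merged = pd.DataFrame({
    "Borough": ["East Toronto", "East Toronto", "Central Toronto", "Central Toronto"],
    "Cluster Labels": [0, 0, 1, 4],
    "1st Most Common Venue": ["Coffee Shop", "Café", "Playground", "Bus Line"],
})

summary = {}
for label in sorted(merged["Cluster Labels"].unique()):
    members = merged[merged["Cluster Labels"] == label]
    # count how often each category is the top venue within this cluster
    counts = members["1st Most Common Venue"].value_counts()
    summary[int(label)] = counts.to_dict()
    print(f"Cluster label {label}: {len(members)} neighborhood(s); top venues {counts.to_dict()}")
```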


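<b>3. Methodology</b>

We used k-means clustering (scikit-learn's KMeans) to segment the neighborhoods by their venue-category profiles, with k = 5 clusters for Toronto and k = 3 for Manhattan.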
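In equation form: each neighborhood $n$ is described by the share of its returned venues in each category $c$ (the per-neighborhood means of the one-hot columns), and k-means with $K$ clusters chooses the clusters $C_k$ and centroids $\mu_k$ that minimize the within-cluster sum of squares, the quantity scikit-learn reports as `inertia_`:

```latex
x_{n,c} = \frac{N_{n,c}}{N_n}, \qquad
\min_{\{C_k\},\,\{\mu_k\}} \; \sum_{k=1}^{K} \sum_{n \in C_k} \lVert x_n - \mu_k \rVert_2^2,
\qquad \mu_k = \frac{1}{|C_k|} \sum_{n \in C_k} x_n
```

Here $N_n$ is the number of venues returned for neighborhood $n$ and $N_{n,c}$ the number in category $c$. For example, a neighborhood with 4 coffee shops among 55 venues has $x_{n,\text{Coffee Shop}} = 4/55 \approx 0.073$, which the top-venue tables display as 0.07 after rounding to two decimals.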

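<b>4. Results & Conclusion</b>

For Manhattan, with k = 3 clusters, the top three common venues are Discount Bar, Boat and Park.
For Toronto, with k = 5 clusters, the clusters are very uneven: the largest one (label 0) is led mostly by coffee shops, cafés, restaurants and clothing stores, while Lawrence Park forms a cluster of its own (label 4) whose top three venues are Bus Line, Park and Swim School.

The clusterings of the two cities differ considerably, so tourists should choose which city to visit according to their own preferences.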

