Continuing the project from the last notebook: we have successfully scraped the postal codes in Canada from the web and sorted them by Borough and Neighborhood. Then we used geocoder to pull the latitude and longitude for each postal code. Now we will explore that generated data set, using the analysis from the lab as a guide. First we will get all our dependencies:

In [3]:
import numpy as np
import pandas as pd
import json
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium
print('Libraries imported.')


Now let's import our data set from the last notebook:

In [6]:
df = pd.read_csv(r'C:\Users\John\Desktop\notebooks\toronto_cords.csv')
df = df.drop('Unnamed: 0', axis=1)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.811525,-79.195517
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.78573,-79.15875
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.76569,-79.175256
3,M1G,Scarborough,Woburn,43.768359,-79.21759
4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944


Now we need to pull the geospatial coordinates of the city of Toronto and plot every postal code on a map centered there:

In [19]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="TO_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207
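
One thing the cell above takes for granted is that the lookup succeeds. geopy's geocode returns None when nothing matches, in which case `location.latitude` would fail with an AttributeError. Below is a minimal sketch of a guard, assuming a hypothetical helper named `lookup_coordinates`; the fake geocoder is only there so the example runs without a network call.

```python
from collections import namedtuple


def lookup_coordinates(geocode, address):
    """Return (latitude, longitude) for address, or raise if nothing is found.

    `geocode` is any callable shaped like Nominatim(...).geocode.
    `lookup_coordinates` is a hypothetical helper, not part of geopy.
    """
    location = geocode(address)
    if location is None:  # geopy returns None when there is no match
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


# Stand-in geocoder so the sketch runs offline; the coordinates are the
# ones printed above for Toronto.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def fake_geocode(address):
    known = {'Toronto, ON': FakeLocation(43.653963, -79.387207)}
    return known.get(address)


print(lookup_coordinates(fake_geocode, 'Toronto, ON'))
```

In the notebook itself the call would be `lookup_coordinates(geolocator.geocode, address)`.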


In [20]:
map_TO = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_TO)  
    
map_TO

Wow, look at that! That is a lot of markers. Let's focus on the boroughs that have Toronto in the name. First we need the list of boroughs:

In [21]:
print(np.unique(df.Borough)) 

['Central Toronto' 'Downtown Toronto' 'East Toronto' 'East York'
 'Etobicoke' 'Mississauga' 'North York' "Queen's Park" 'Scarborough'
 'West Toronto' 'York']


OK, so it looks like we are going to focus on Central Toronto, Downtown Toronto, East Toronto, and West Toronto. Great! First, let's set up and define the function that pulls nearby venues from Foursquare for each neighborhood:

In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT=100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
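
To see what the list comprehension inside getNearbyVenues does, here is a small sketch that applies the same indexing to a hand-written payload. The payload only mimics the nesting the function reads (response, groups, items, venue); the venue names and coordinates in it are invented, and a real Foursquare response carries many more fields.

```python
import pandas as pd

# Hand-written stand-in for requests.get(url).json(); only the keys that
# getNearbyVenues reads are included, and the values are made up.
fake_json = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Sample Cafe',
               'location': {'lat': 43.7285, 'lng': -79.3829},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Sample Park',
               'location': {'lat': 43.7300, 'lng': -79.3880},
               'categories': [{'name': 'Park'}]}},
]}]}}

name, lat, lng = 'Lawrence Park', 43.72816, -79.387085
results = fake_json['response']['groups'][0]['items']

# One tuple per venue: the neighborhood's own fields, then the venue's
rows = [(name, lat, lng,
         v['venue']['name'],
         v['venue']['location']['lat'],
         v['venue']['location']['lng'],
         v['venue']['categories'][0]['name']) for v in results]

nearby = pd.DataFrame(rows, columns=['Neighborhood',
                                     'Neighborhood Latitude',
                                     'Neighborhood Longitude',
                                     'Venue',
                                     'Venue Latitude',
                                     'Venue Longitude',
                                     'Venue Category'])
print(nearby)
```

Each venue becomes one row that repeats the neighborhood's name and coordinates, which is why the resulting frame has seven columns.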

Also define the Foursquare credentials and version (substitute your own ID and secret for the placeholders):

In [137]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET
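
Credentials typed into a cell end up in every saved or shared copy of the notebook. One common alternative is to read them from environment variables. The sketch below assumes the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET and a hypothetical helper `load_foursquare_credentials`; neither is prescribed by Foursquare.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of variables.

    The variable names below are an assumption made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(
            'Missing environment variable: {}'.format(missing)) from None


# Demonstration with an explicit mapping instead of the real environment
demo_id, demo_secret = load_foursquare_credentials(
    {'FOURSQUARE_CLIENT_ID': 'demo-id',
     'FOURSQUARE_CLIENT_SECRET': 'demo-secret'})

# Confirm that something was loaded without printing the secret itself
print('CLIENT_ID loaded:', bool(demo_id))
print('CLIENT_SECRET loaded:', bool(demo_secret))
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the two hard-coded assignments.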


Great, we are ready to get started. First we need to slice our original dataframe by the Boroughs we are looking at.

In [138]:
central_toronto_data = df[df['Borough'] == 'Central Toronto'].reset_index(drop=True)
central_toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72816,-79.387085
1,M4P,Central Toronto,Davisville North,43.712815,-79.388526
2,M4R,Central Toronto,North Toronto West,43.714523,-79.40696
3,M4S,Central Toronto,Davisville,43.703395,-79.385964
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.690655,-79.383561


In [139]:
downtown_toronto_data = df[df['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.68194,-79.378474
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.66816,-79.366602
2,M4Y,Downtown Toronto,Church and Wellesley,43.666585,-79.381302
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65512,-79.36264
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657363,-79.37818


In [140]:
east_toronto_data = df[df['Borough'] == 'East Toronto'].reset_index(drop=True)
east_toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676845,-79.295225
1,M4K,East Toronto,"The Danforth West, Riverdale",43.683262,-79.35512
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.667965,-79.314673
3,M4M,East Toronto,Studio District,43.662766,-79.33483
4,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.64869,-79.38544


In [142]:
west_toronto_data = df[df['Borough'] == 'West Toronto'].reset_index(drop=True)
west_toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M6H,West Toronto,"Dovercourt Village, Dufferin",43.665087,-79.438705
1,M6J,West Toronto,"Little Portugal, Trinity",43.648485,-79.417742
2,M6K,West Toronto,"Brockton, Exhibition Place, Parkdale Village",43.63941,-79.424362
3,M6P,West Toronto,"High Park, The Junction South",43.659975,-79.462874
4,M6R,West Toronto,"Parkdale, Roncesvalles",43.64787,-79.449762


Okay, the data is all sliced up and ready for the next step. We will go through each Borough in the order above. First we will use the function defined above to pull the venue data for each neighborhood. Then, using the top ten venue categories for each neighborhood, we will perform a cluster analysis and plot the result on a map. So for Central Toronto:

In [157]:
central_toronto_venues = getNearbyVenues(names=central_toronto_data['Neighborhood'],
                                   latitudes=central_toronto_data['Latitude'],
                                   longitudes=central_toronto_data['Longitude']
                                  )

Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville


In [158]:
print(central_toronto_venues.shape)
central_toronto_venues.head()

(78, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72816,-79.387085,The Photo School – Toronto,43.730429,-79.388767,Photography Studio
1,Lawrence Park,43.72816,-79.387085,Zodiac Swim School,43.728532,-79.38286,Swim School
2,Lawrence Park,43.72816,-79.387085,TTC Bus #162 - Lawrence-Donway,43.728026,-79.382805,Bus Line
3,Davisville North,43.712815,-79.388526,Sherwood Park,43.716551,-79.387776,Park
4,Davisville North,43.712815,-79.388526,Summerhill Market North,43.715499,-79.392881,Food & Drink Shop


Now we need to analyze each neighborhood. First we perform the necessary one-hot encoding and add the neighborhood column back into the data frame. Let's check it to make sure it worked.

In [159]:
central_toronto_onehot = pd.get_dummies(central_toronto_venues[['Venue Category']], prefix="", prefix_sep="")
central_toronto_onehot['Neighborhood'] = central_toronto_venues['Neighborhood'] 
fixed_columns = [central_toronto_onehot.columns[-1]] + list(central_toronto_onehot.columns[:-1])
central_toronto_onehot = central_toronto_onehot[fixed_columns]

central_toronto_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,BBQ Joint,Breakfast Spot,Burger Joint,Bus Line,Café,Coffee Shop,Convenience Store,Cosmetics Shop,...,Sandwich Place,Seafood Restaurant,Skating Rink,Supermarket,Sushi Restaurant,Swim School,Tennis Court,Thai Restaurant,Toy / Game Store,Vegetarian / Vegan Restaurant
0,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
2,Lawrence Park,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Davisville North,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Davisville North,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now we need to group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category in that neighborhood.

In [160]:
central_toronto_grouped = central_toronto_onehot.groupby('Neighborhood').mean().reset_index()
central_toronto_grouped.head()

Unnamed: 0,Neighborhood,American Restaurant,BBQ Joint,Breakfast Spot,Burger Joint,Bus Line,Café,Coffee Shop,Convenience Store,Cosmetics Shop,...,Sandwich Place,Seafood Restaurant,Skating Rink,Supermarket,Sushi Restaurant,Swim School,Tennis Court,Thai Restaurant,Toy / Game Store,Vegetarian / Vegan Restaurant
0,Davisville,0.0,0.0,0.0,0.0,0.0,0.08,0.08,0.0,0.0,...,0.08,0.04,0.04,0.0,0.04,0.0,0.0,0.04,0.04,0.0
1,Davisville North,0.125,0.0,0.125,0.125,0.125,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.166667,0.0,...,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0
3,"Forest Hill North, Forest Hill West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Lawrence Park,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0
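
Why the mean of a one-hot column is a frequency: every venue contributes a 1 in exactly one category column, so averaging a column over a neighborhood's rows gives the share of that neighborhood's venues in that category. A toy sketch with made-up neighborhoods A and B:

```python
import pandas as pd

# Made-up venues: A has two parks and a café, B has a single gym
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Café', 'Park', 'Gym'],
})

# One indicator column per category, then put Neighborhood back in front
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of the indicators = fraction of the neighborhood's venues per category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Here A ends up with 2/3 for Park and 1/3 for Café, and each row's frequencies sum to 1, just as the 0.333333 entries for Lawrence Park come from its three venues.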


Next we write a function that sorts a neighborhood's venue categories by frequency in descending order and returns the top ones.

In [161]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now we will generate a data frame of the top 10 venue categories for each neighborhood.

In [162]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix beyond the third column
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
central_toronto_venues_sorted = pd.DataFrame(columns=columns)
central_toronto_venues_sorted['Neighborhood'] = central_toronto_grouped['Neighborhood']

for ind in np.arange(central_toronto_grouped.shape[0]):
    central_toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(central_toronto_grouped.iloc[ind, :], num_top_venues)

central_toronto_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Davisville,Dessert Shop,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Pizza Place,Fast Food Restaurant,Toy / Game Store,Indie Movie Theater,Farmers Market
1,Davisville North,American Restaurant,Food & Drink Shop,Breakfast Spot,Burger Joint,Bus Line,Gym,Park,Hotel,Historic Site,Gym Pool
2,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",Light Rail Station,Coffee Shop,Supermarket,Convenience Store,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Historic Site,Gym Pool,Gym,Garden
3,"Forest Hill North, Forest Hill West",Park,Pharmacy,Vegetarian / Vegan Restaurant,Hotel,Historic Site,Gym Pool,Gym,Garden,Furniture / Home Store,French Restaurant
4,Lawrence Park,Swim School,Bus Line,Photography Studio,Vegetarian / Vegan Restaurant,Farmers Market,Historic Site,Gym Pool,Gym,Garden,Furniture / Home Store
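
Note that a neighborhood with fewer than ten categories still gets ten entries: Lawrence Park returned only three venues, so its 4th to 10th "most common" categories all have a frequency of 0 and appear only because the sort has to put them somewhere. If that padding is unwanted, one option is to keep only the nonzero categories. The helper below, `top_nonzero_venues`, is a hypothetical variant of return_most_common_venues and not part of the lab code.

```python
import pandas as pd


def top_nonzero_venues(row, num_top_venues):
    """Like return_most_common_venues, but pad with None instead of
    listing categories whose frequency is 0. (Hypothetical helper.)"""
    freqs = row.iloc[1:].astype(float)          # skip the Neighborhood field
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    top = list(freqs.index[:num_top_venues])
    return top + [None] * (num_top_venues - len(top))


# Made-up row in the same layout as one row of central_toronto_grouped
row = pd.Series({'Neighborhood': 'Sample',
                 'Bus Line': 0.25,
                 'Gym': 0.0,
                 'Park': 0.0,
                 'Swim School': 0.75})

print(top_nonzero_venues(row, 3))
```

Whether to pad or not is a judgment call; the padded version keeps every cell filled, while this one makes sparse neighborhoods easy to spot.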


Now we will run k-means to cluster the neighborhoods into 5 clusters.

In [163]:
kclusters = 5

central_toronto_grouped_clustering = central_toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(central_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

central_toronto_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

central_toronto_merged = central_toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
central_toronto_merged = central_toronto_merged.join(central_toronto_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
central_toronto_merged = central_toronto_merged[np.isfinite(central_toronto_merged['Cluster Labels'])]
central_toronto_merged.head()


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4N,Central Toronto,Lawrence Park,43.72816,-79.387085,3.0,Swim School,Bus Line,Photography Studio,Vegetarian / Vegan Restaurant,Farmers Market,Historic Site,Gym Pool,Gym,Garden,Furniture / Home Store
1,M4P,Central Toronto,Davisville North,43.712815,-79.388526,0.0,American Restaurant,Food & Drink Shop,Breakfast Spot,Burger Joint,Bus Line,Gym,Park,Hotel,Historic Site,Gym Pool
2,M4R,Central Toronto,North Toronto West,43.714523,-79.40696,2.0,Playground,Gym Pool,Garden,Park,Hotel,Historic Site,Gym,Furniture / Home Store,French Restaurant,Food & Drink Shop
3,M4S,Central Toronto,Davisville,43.703395,-79.385964,0.0,Dessert Shop,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Pizza Place,Fast Food Restaurant,Toy / Game Store,Indie Movie Theater,Farmers Market
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.690655,-79.383561,4.0,Gym,Playground,Tennis Court,Diner,Historic Site,Gym Pool,Garden,Furniture / Home Store,French Restaurant,Food & Drink Shop
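
Cluster Labels shows up as floats (3.0, 0.0, ...) in the table above. That is typically what a join does when some rows find no match: the missing labels become NaN, which forces a float column, and the np.isfinite filter then drops those rows. Roselawn, for instance, is in the list of queried neighborhoods but not in the composite table further down, presumably because no venues came back for it. A toy sketch of the mechanism, with made-up neighborhoods A, B and C:

```python
import numpy as np
import pandas as pd

borough = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})

# Only A and C received a cluster label (B stands for "no venues found")
labels = pd.DataFrame({'Neighborhood': ['A', 'C'], 'Cluster Labels': [0, 1]})

merged = borough.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(merged)   # B has NaN, so the whole column is float

# Drop the unlabeled rows, then the labels can be integers again
kept = merged[np.isfinite(merged['Cluster Labels'])].copy()
kept['Cluster Labels'] = kept['Cluster Labels'].astype(int)
print(kept)
```

The `.copy()` before assigning keeps pandas from warning about writing into a slice of `merged`.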


Now we are going to plot the clusters!

In [164]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
central_toronto_merged['Cluster Labels'] =central_toronto_merged['Cluster Labels'].astype(int)


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(central_toronto_merged['Latitude'], central_toronto_merged['Longitude'], central_toronto_merged['Neighborhood'], central_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
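
The color setup in this cell boils down to sampling one color per cluster from matplotlib's rainbow colormap; `x` and `ys` only contribute their length. Also, `rainbow[cluster-1]` sends cluster 0 to the last color through negative indexing, which still gives every label its own color. A minimal sketch of the same palette indexed directly by label (the name `palette` is an assumption for this example):

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced rainbow color per cluster, as hex strings
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

for cluster in range(kclusters):
    # palette[cluster] and palette[cluster - 1] both give each label its own
    # color; they only hand the colors out in a different order
    print(cluster, palette[cluster], palette[cluster - 1])
```

Either indexing works for the maps here; direct indexing is just easier to read when matching a marker color to a cluster number.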

Great! Let's do the rest of the Boroughs! For the sake of simplicity, I am going to go through as many steps as possible in the fewest number of cells. Since we may be doing a similar analysis in the future, it may be worth building a function to do this. For now, however, we just need to repeat the steps for the rest of the Boroughs. First let's do Downtown Toronto:

In [189]:
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighborhood'],
                                   latitudes=downtown_toronto_data['Latitude'],
                                   longitudes=downtown_toronto_data['Longitude']
                                  )
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()

downtown_toronto_venues_sorted = pd.DataFrame(columns=columns)
downtown_toronto_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    downtown_toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
downtown_toronto_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
downtown_toronto_merged = downtown_toronto_data
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
downtown_toronto_merged = downtown_toronto_merged.join(downtown_toronto_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
downtown_toronto_merged = downtown_toronto_merged[np.isfinite(downtown_toronto_merged['Cluster Labels'])]

Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie


In [190]:
# create map
map_clusters_downtown = folium.Map(location=[latitude, longitude], zoom_start=11)
downtown_toronto_merged['Cluster Labels'] =downtown_toronto_merged['Cluster Labels'].astype(int)


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_downtown)
       
map_clusters_downtown
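
Since the per-borough steps repeat almost verbatim, here is a rough sketch of what a reusable function might look like. It covers only the one-hot encoding, the grouping, and the top-venue table; clustering and mapping are left out. The names `summarize_borough` and `ordinal` are hypothetical, and the venue data is made up so the example runs on its own.

```python
import pandas as pd


def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 3 -> '3rd', otherwise 'th'.

    Matches the column naming used above; it is only meant for small n.
    """
    return '{}{}'.format(n, {1: 'st', 2: 'nd', 3: 'rd'}.get(n, 'th'))


def summarize_borough(venues, num_top_venues=10):
    """One-hot encode, average per neighborhood, list the top categories."""
    onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
    onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
    grouped = onehot.groupby('Neighborhood').mean().reset_index()

    rows = []
    for _, row in grouped.iterrows():
        freqs = row.iloc[1:].astype(float).sort_values(ascending=False)
        top = list(freqs.index[:num_top_venues])
        top += [None] * (num_top_venues - len(top))  # fewer categories than slots
        rows.append([row['Neighborhood']] + top)

    columns = ['Neighborhood'] + [
        '{} Most Common Venue'.format(ordinal(i + 1))
        for i in range(num_top_venues)]
    return grouped, pd.DataFrame(rows, columns=columns)


# Made-up input in the same layout as the output of getNearbyVenues
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Café', 'Gym', 'Gym', 'Park'],
})
grouped, top_table = summarize_borough(venues, num_top_venues=2)
print(top_table)
```

With something like this in place, each borough would need one call on its venues frame instead of a dozen copied lines, which also removes the risk of a leftover variable name from the previous borough.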

Next let's do East Toronto:

In [168]:
east_toronto_venues = getNearbyVenues(names=east_toronto_data['Neighborhood'],
                                   latitudes=east_toronto_data['Latitude'],
                                   longitudes=east_toronto_data['Longitude']
                                  )
east_toronto_onehot = pd.get_dummies(east_toronto_venues[['Venue Category']], prefix="", prefix_sep="")
east_toronto_onehot['Neighborhood'] = east_toronto_venues['Neighborhood'] 
fixed_columns = [east_toronto_onehot.columns[-1]] + list(east_toronto_onehot.columns[:-1])
east_toronto_onehot = east_toronto_onehot[fixed_columns]
east_toronto_grouped = east_toronto_onehot.groupby('Neighborhood').mean().reset_index()

east_toronto_venues_sorted = pd.DataFrame(columns=columns)
east_toronto_venues_sorted['Neighborhood'] = east_toronto_grouped['Neighborhood']

for ind in np.arange(east_toronto_grouped.shape[0]):
    east_toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(east_toronto_grouped.iloc[ind, :], num_top_venues)

east_toronto_venues_sorted.head()

east_toronto_grouped_clustering = east_toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(east_toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
east_toronto_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
east_toronto_merged = east_toronto_data
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
east_toronto_merged = east_toronto_merged.join(east_toronto_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
east_toronto_merged = east_toronto_merged[np.isfinite(east_toronto_merged['Cluster Labels'])]

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Business Reply Mail Processing Centre 969 Eastern
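
East Toronto lists only five neighborhoods, and kclusters is still 5. If all five returned venues, k-means has as many clusters as points, so the grouping carries little information; if one had returned none, KMeans would raise an error because n_clusters would exceed the number of rows. A hedged guard is to cap the cluster count at the row count. The data below is made up, and the cap is a suggestion rather than something the lab prescribes.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three made-up "neighborhoods" described by two category frequencies
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])

requested = 5
k = min(requested, len(X))   # KMeans rejects n_clusters > number of rows

labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X).labels_
print(k, labels)
```

For a borough this small, a lower k (or clustering all four boroughs together) would probably say more about how the neighborhoods relate.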


In [179]:
# create map
map_clusters_east = folium.Map(location=[latitude, longitude], zoom_start=11)
east_toronto_merged['Cluster Labels'] =east_toronto_merged['Cluster Labels'].astype(int)


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(east_toronto_merged['Latitude'], east_toronto_merged['Longitude'], east_toronto_merged['Neighborhood'], east_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_east)
       
map_clusters_east

Next let's do West Toronto:

In [183]:
west_toronto_venues = getNearbyVenues(names=west_toronto_data['Neighborhood'],
                                   latitudes=west_toronto_data['Latitude'],
                                   longitudes=west_toronto_data['Longitude']
                                  )
west_toronto_onehot = pd.get_dummies(west_toronto_venues[['Venue Category']], prefix="", prefix_sep="")
west_toronto_onehot['Neighborhood'] = west_toronto_venues['Neighborhood'] 
fixed_columns = [west_toronto_onehot.columns[-1]] + list(west_toronto_onehot.columns[:-1])
west_toronto_onehot = west_toronto_onehot[fixed_columns]
west_toronto_grouped = west_toronto_onehot.groupby('Neighborhood').mean().reset_index()

west_toronto_venues_sorted = pd.DataFrame(columns=columns)
west_toronto_venues_sorted['Neighborhood'] = west_toronto_grouped['Neighborhood']

for ind in np.arange(west_toronto_grouped.shape[0]):
    west_toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(west_toronto_grouped.iloc[ind, :], num_top_venues)

west_toronto_venues_sorted.head()

west_toronto_grouped_clustering = west_toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(west_toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
west_toronto_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
west_toronto_merged = west_toronto_data
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
west_toronto_merged = west_toronto_merged.join(west_toronto_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
west_toronto_merged = west_toronto_merged[np.isfinite(west_toronto_merged['Cluster Labels'])]

Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction South
Parkdale, Roncesvalles
Runnymede, Swansea


In [185]:
# create map
map_clusters_west = folium.Map(location=[latitude, longitude], zoom_start=11)
west_toronto_merged['Cluster Labels'] =west_toronto_merged['Cluster Labels'].astype(int)


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(west_toronto_merged['Latitude'], west_toronto_merged['Longitude'], west_toronto_merged['Neighborhood'], west_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_west)
       
map_clusters_west

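Great! Now let's see if we can pull it all together and make a single composite map for all Boroughs! Each Borough was clustered separately, so we offset its labels (by 5, 10, and 15) to keep them distinct in the combined table.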

In [187]:
# Offset each borough's labels so they stay unique across boroughs (5 clusters each).
# .copy() leaves the per-borough frames unchanged.
downtown_holder = downtown_toronto_merged.copy()
downtown_holder['Cluster Labels'] = downtown_holder['Cluster Labels'] + 5
east_holder = east_toronto_merged.copy()
east_holder['Cluster Labels'] = east_holder['Cluster Labels'] + 10
west_holder = west_toronto_merged.copy()
west_holder['Cluster Labels'] = west_holder['Cluster Labels'] + 15
toronto_merged = pd.concat([central_toronto_merged, downtown_holder, east_holder, west_holder],
                           ignore_index=True)
toronto_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4N,Central Toronto,Lawrence Park,43.72816,-79.387085,3,Swim School,Bus Line,Photography Studio,Vegetarian / Vegan Restaurant,Farmers Market,Historic Site,Gym Pool,Gym,Garden,Furniture / Home Store
1,M4P,Central Toronto,Davisville North,43.712815,-79.388526,0,American Restaurant,Food & Drink Shop,Breakfast Spot,Burger Joint,Bus Line,Gym,Park,Hotel,Historic Site,Gym Pool
2,M4R,Central Toronto,North Toronto West,43.714523,-79.40696,2,Playground,Gym Pool,Garden,Park,Hotel,Historic Site,Gym,Furniture / Home Store,French Restaurant,Food & Drink Shop
3,M4S,Central Toronto,Davisville,43.703395,-79.385964,0,Dessert Shop,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Pizza Place,Fast Food Restaurant,Toy / Game Store,Indie Movie Theater,Farmers Market
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.690655,-79.383561,4,Gym,Playground,Tennis Court,Diner,Historic Site,Gym Pool,Garden,Furniture / Home Store,French Restaurant,Food & Drink Shop
5,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686083,-79.402335,0,Light Rail Station,Coffee Shop,Supermarket,Convenience Store,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Historic Site,Gym Pool,Gym,Garden
6,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.694785,-79.414405,1,Park,Pharmacy,Vegetarian / Vegan Restaurant,Hotel,Historic Site,Gym Pool,Gym,Garden,Furniture / Home Store,French Restaurant
7,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67484,-79.403698,0,Sandwich Place,Café,Coffee Shop,Pizza Place,Vegetarian / Vegan Restaurant,Mexican Restaurant,BBQ Joint,Burger Joint,Cosmetics Shop,French Restaurant
8,M4W,Downtown Toronto,Rosedale,43.68194,-79.378474,6,Playground,Park,Bank,Tennis Court,Building,Dance Studio,Creperie,Farmers Market,Farm,Falafel Restaurant
9,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.66816,-79.366602,5,Coffee Shop,Restaurant,Park,Café,Bakery,Italian Restaurant,Pizza Place,Deli / Bodega,Snack Place,Pet Store
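
The +5, +10 and +15 offsets used to build this table follow one pattern: the borough's position times kclusters. A small sketch of that pattern as a loop, using made-up frames and the hypothetical names `frames`, `shifted` and `combined`:

```python
import pandas as pd

kclusters = 5

# Made-up per-borough results, each with labels in the range 0..kclusters-1
frames = {
    'Central': pd.DataFrame({'Neighborhood': ['A', 'B'],
                             'Cluster Labels': [3, 0]}),
    'Downtown': pd.DataFrame({'Neighborhood': ['C', 'D'],
                              'Cluster Labels': [1, 0]}),
}

shifted = []
for i, (borough, frame) in enumerate(frames.items()):
    part = frame.copy()                      # leave the original untouched
    part['Cluster Labels'] = part['Cluster Labels'] + i * kclusters
    shifted.append(part)

combined = pd.concat(shifted, ignore_index=True)
print(combined)
```

Keep in mind that the shifted numbers are only identifiers: cluster 5 in one borough and cluster 0 in another were fitted independently, so they are not comparable in content.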


In [191]:
# create map
map_clusters_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(20)
ys = [i + x + (i*x)**2 for i in range(20)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
toronto_merged['Cluster Labels'] =toronto_merged['Cluster Labels'].astype(int)


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_toronto)
       
map_clusters_toronto

There we have a single map of the borough-by-borough neighborhood cluster analysis for all four Toronto boroughs!