# IBM Data Science Capstone Week 3 Assignment #
**Please note that this Notebook contains Parts, 1, 2 & 3 of the assignment combined
Please scroll to the relevant section of the notebook.

## Week 3 Assignment Part 1

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto. Start by creating a new Notebook for this assignment. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned. More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park. Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making. In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.


In [48]:
# Load & Import Required Libraries
!conda install -c conda-forge beautifulsoup4 --yes
!conda install -c conda-forge folium
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from pandas.io.json import json_normalize
from geopy.geocoders import Nominatim
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



In [49]:
# Open Canada Post Codes Wikipedia page with Beautiful Soup
data = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(data, 'html.parser')

In [50]:
# create empty lists to store data on postcode, borough and neighborhood
postcodeL = []
boroughL = []
neighbourhoodL = []

In [51]:
# Parse scraped data into empty lists and at teh same time clean up the "\n" format in the neighborhood column
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if(len(cells) > 0):
        postcodeL.append(cells[0].text)
        boroughL.append(cells[1].text)
        neighbourhoodL.append(cells[2].text.rstrip('\n')) 

In [52]:
# Convert lists into pandas dataframe
tor_nhood = [('Postcode', postcodeL),
                      ('Borough', boroughL),
                      ('Neighborhood', neighbourhoodL)]
tor_df = pd.DataFrame.from_dict(dict(tor_nhood))
tor_df.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Not assigned


In [53]:
# Remove rows where Borough is not assigned and reset index
tor_df_na = tor_df[tor_df.Borough != 'Not assigned'].reset_index(drop=True)
tor_df_na.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
5,M7A,Downtown Toronto,Queen's Park
6,M9A,Queen's Park,Not assigned
7,M1B,Scarborough,Rouge
8,M1B,Scarborough,Malvern
9,M3B,North York,Don Mills North


In [54]:
# Group Neighbourhoods with the same Postcode and Borough in the same row,
# separated by comma
tor_df_group = tor_df_na.groupby(['Postcode','Borough'], as_index=False).agg(lambda x: ','.join(x))
tor_df_group.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


In [55]:
# Change Neighborhood not assigned values
# to the corresponding borough name
nhood_rows = tor_df_group.Neighborhood == 'Not assigned'
tor_df_group.loc[nhood_rows, 'Neighborhood'] = tor_df_group.loc[nhood_rows, 'Borough']

In [56]:
# Rename clean dataframe and check shape
tor_df_clean = tor_df_group
tor_df_clean.shape

(103, 3)

## IBM Capstone Week 3 Assignment Part 2 ##

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood. In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html. The problem with this Package is you have to be persistent sometimes in order to get the geographical coordinates of a given postal code. So you can make a call to get the latitude and longitude coordinates of a given postal code and the result would be None, and then make the call again and you would get the coordinates. So, in order to make sure that you get the coordinates for all of our neighborhoods, you can run a while loop for each postal code.
Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data Use the Geocoder package or the csv file to create the dataframe.

In [57]:
# Download co-ordinates dataframe csv file into a pands datfarme called coord_df
!wget -q -O "toronto_coordinates.csv" http://cocl.us/Geospatial_data
coord_df = pd.read_csv('toronto_coordinates.csv')

In [58]:
# Inspect the dataframe
coord_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [59]:
# check shape of dataframe to confirm same number of rows as
# the datframe created in part 1 of the assignment
coord_df.shape

(103, 3)

In [60]:
# As both dataframes contain 103 rows, they can be merged
# However the Postal Code Column will need to be changed to Postcode
# to have the same column name in both dataframes
coord_df.columns = ['Postcode','Latitude','Longitude']

In [61]:
# Merge both dataframes together
torComb_df = pd.merge(tor_df_clean, coord_df, on='Postcode')
torComb_df.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [62]:
# check shape of new datframe to confirm 103 rows
torComb_df.shape

(103, 5)

## IBM Capstone Week 3 Assignment Part 3 ## 
Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto
and then replicate the same analysis we did to the New York City data. It is up to you. 
Just make sure to add enough Markdown cells to explain what you decided to do and to report any observations you make and to generate maps to visualize your neighborhoods and how they cluster together. 
Once you are happy with your analysis, submit a link to the new Notebook on your Github repository.

In [63]:
# Use geopy library to get geographical co-ordinates for Toronto
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="can_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


In [64]:
# Create folium map for toronto
tor_map = folium.Map(location=[latitude, longitude], zoom_start=11)
tor_map

In [65]:
# Add markers to the map for each of the boroughs
for lat, long, post, borough, neigh in zip(torComb_df['Latitude'], torComb_df['Longitude'], torComb_df['Postcode'], torComb_df['Borough'], torComb_df['Neighborhood']):
    label = "{} ({}): {}".format(borough, post, neigh)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(tor_map)
    
tor_map

In [78]:
# Access Foursquare API
LIMIT = 100

CLIENT_ID = 'JXVETAYDYL2KYQXYGROHGMZFPA0C1L34ORVUO2LVO3D2H0Y1' # your Foursquare ID
CLIENT_SECRET = 'U3TG2OKOUA5FXKXWUSRBWK022LG4CGUZRV4EZD0PSP2WASPS' # your Foursquare Secret

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
VERSION = '202001010'

Your credentials:
CLIENT_ID: JXVETAYDYL2KYQXYGROHGMZFPA0C1L34ORVUO2LVO3D2H0Y1
CLIENT_SECRET:U3TG2OKOUA5FXKXWUSRBWK022LG4CGUZRV4EZD0PSP2WASPS


In [70]:
# Reduce number of boroughs to only those containing the word Toronto
torBoro = ['East Toronto', 'Central Toronto', 'Downtown Toronto', 'West Toronto']
torRed_df = torComb_df[torComb_df['Borough'].isin(torBoro)].reset_index(drop=True)
torRed_df.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049


In [79]:
# Update map
tor_map2 = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, long, post, borough, neigh in zip(torRed_df['Latitude'], torRed_df['Longitude'], torRed_df['Postcode'], torRed_df['Borough'], torRed_df['Neighborhood']):
    label = "{} ({}): {}".format(borough, post, neigh)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(tor_map2)
    
tor_map2

In [86]:
# Explore to get recommended places close to each borough in reduced dataframe
LIMIT = 100
radius = 500
venues = []

for lat, long, post, borough, neighborhood in zip(torRed_df['Latitude'], torRed_df['Longitude'], torRed_df['Postcode'], torRed_df['Borough'], torRed_df['Neighborhood']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            post, 
            borough,
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [88]:
torVen_df = pd.DataFrame(venues)
torVen_df.columns = ['Postcode', 'Borough', 'Neighborhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(torVen_df.shape)
torVen_df.head()

(1719, 9)


Unnamed: 0,Postcode,Borough,Neighborhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,M4E,East Toronto,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,M4E,East Toronto,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood


In [92]:
# Return number of venues for each neighbourhood
torVen_df.groupby(['Postcode', 'Borough', 'Neighborhood'])['VenueName'].count()

Postcode  Borough           Neighborhood                                                                                        
M4E       East Toronto      The Beaches                                                                                               5
M4K       East Toronto      The Danforth West,Riverdale                                                                              43
M4L       East Toronto      The Beaches West,India Bazaar                                                                            19
M4M       East Toronto      Studio District                                                                                          43
M4N       Central Toronto   Lawrence Park                                                                                             4
M4P       Central Toronto   Davisville North                                                                                          9
M4R       Central Toronto   North Toronto West         

In [97]:
# Identify number of unique categories returned
uniqCat = len(venues_df['VenueCategory'].unique())
print('There are', uniqCat, 'unique categories')

There are 239 unique categories


In [106]:
# Analyze with onehot encoding
torOH_df = pd.get_dummies(torVen_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
torOH_df['Postcode'] = torVen_df['Postcode'] 
torOH_df['Borough'] = torVen_df['Borough'] 
torOH_df['Neighborhood'] = torVen_df['Neighborhood'] 

# move postcode, borough and neighborhood columns to the first 3 columns
fixed_columns = list(torOH_df.columns[-3:]) + list(torOH_df.columns[:-3])
torOH_df = torOH_df[fixed_columns]

torOH_df.head()

Unnamed: 0,Yoga Studio,Postcode,Borough,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,0,M4E,East Toronto,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
1,0,M4E,East Toronto,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,M4E,East Toronto,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,M4E,East Toronto,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,M4E,East Toronto,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [118]:
# Get frequency of each category by area
torFreq_df = torOH_df.groupby(['Postcode', 'Borough', 'Neighborhood']).mean().reset_index()
print(torFreq_df.shape)
torFreq_df.head()

(39, 241)


Unnamed: 0,Postcode,Borough,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,M4E,East Toronto,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,East Toronto,"The Danforth West,Riverdale",0.023256,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M4L,East Toronto,"The Beaches West,India Bazaar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,East Toronto,Studio District,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,...,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0
4,M4N,Central Toronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
# Print the categories and top 5 most common category each neighborhood
numCat = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['Postcode', 'Borough', 'Neighborhood']
freqColumns = []
for i in np.arange(numCat):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(i+1, indicators[i]))
    except:
        freqColumns.append('{}th Most Common Venue'.format(i+1))
columns = areaColumns+freqColumns
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postcode'] = torFreq_df['Postcode']
neighborhoods_venues_sorted['Borough'] = torFreq_df['Borough']
neighborhoods_venues_sorted['Neighborhood'] = torFreq

In [124]:
# Print the categories and top 5 most common category each neighborhood
numCat = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['Postcode', 'Borough', 'Neighborhood']
freqColumns = []
for i in np.arange(numCat):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(i+1, indicators[i]))
    except:
        freqColumns.append('{}th Most Common Venue'.format(i+1))
columns = areaColumns+freqColumns
# create a new dataframe
venSort_df = pd.DataFrame(columns=columns)
venSort_df['Postcode'] = torFreq_df['Postcode']
venSort_df['Borough'] = torFreq_df['Borough']
venSort_df['Neighborhood'] = torFreq_df['Neighborhood']

for i in np.arange(torFreq_df.shape[0]):
    row_categories = torFreq_df.iloc[i, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    venSort_df.iloc[i, 3:] = row_categories_sorted.index.values[0:numCat]

venSort_df.sort_values(freqColumns, inplace=True)
venSort_df

Unnamed: 0,Postcode,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
27,M5V,Downtown Toronto,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Airport Lounge,Airport Terminal,Boutique,Harbor / Marina
32,M6J,West Toronto,"Little Portugal,Trinity",Bar,Restaurant,Asian Restaurant,Vietnamese Restaurant,Men's Store
33,M6K,West Toronto,"Brockton,Exhibition Place,Parkdale Village",Café,Coffee Shop,Breakfast Spot,Gym,Stadium
3,M4M,East Toronto,Studio District,Café,Coffee Shop,Gastropub,Bakery,Brewery
36,M6S,West Toronto,"Runnymede,Swansea",Café,Coffee Shop,Pizza Place,Italian Restaurant,Sushi Restaurant
25,M5S,Downtown Toronto,"Harbord,University of Toronto",Café,Restaurant,Bakery,Bar,Bookstore
26,M5T,Downtown Toronto,"Chinatown,Grange Park,Kensington Market",Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Coffee Shop
19,M5J,Downtown Toronto,"Harbourfront East,Toronto Islands,Union Station",Coffee Shop,Aquarium,Hotel,Italian Restaurant,Café
11,M4X,Downtown Toronto,"Cabbagetown,St. James Town",Coffee Shop,Bakery,Café,Market,Italian Restaurant
29,M5X,Downtown Toronto,"First Canadian Place,Underground city",Coffee Shop,Café,Gym,Restaurant,Steakhouse


In [135]:
# Create 3 clusters
kclusters = 3

torFreqClust_df = torFreq_df.drop(['Postcode', 'Borough', 'Neighborhood'], 1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(torFreqClust_df)
torRedClust_df = torRed_df
torRedClust_df['Cluster'] = kmeans.labels_

torRedClust_df = torRedClust_df.join(neighborhoods_venues_sorted.drop(['Borough', 'Neighborhood'], 1).set_index('Postcode'), on='Postcode')
torRedClust_df.sort_values(['Cluster'] + freqColumns, inplace=True)
torRedClust_df



#toronto_central_venues_freq_clustering = toronto_central_venues_freq.drop(['PostalCode', 'Borough', 'Neighborhoods'], 1)

#kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_central_venues_freq_clustering)

#toronto_central_clustered_df = toronto_central_df
#toronto_central_clustered_df['Cluster'] = kmeans.labels_

#toronto_central_clustered_df = toronto_central_clustered_df.join(neighborhoods_venues_sorted.drop(['Borough', 'Neighborhoods'], 1).set_index('PostalCode'), on='PostalCode')
#toronto_central_clustered_df.sort_values(['Cluster'] + freqColumns, inplace=True)
#toronto_central_clustered_df

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Airport Service,Airport Lounge,Airport Terminal,Boutique,Harbor / Marina
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049,0,Café,Coffee Shop,Breakfast Spot,Gym,Stadium
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Gastropub,Bakery,Brewery
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Café,Coffee Shop,Pizza Place,Italian Restaurant,Sushi Restaurant
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Café,Restaurant,Bakery,Bar,Bookstore
23,M5P,Central Toronto,"Forest Hill North,Forest Hill West",43.696948,-79.411307,0,Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Coffee Shop
13,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,0,Coffee Shop,Aquarium,Hotel,Italian Restaurant,Café
17,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Bakery,Café,Market,Italian Restaurant
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572,0,Coffee Shop,Café,Gym,Restaurant,Steakhouse
37,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,0,Coffee Shop,Café,Italian Restaurant,Ice Cream Shop,Japanese Restaurant


In [136]:
# Show the 3 clusters on a map
# create map
torclust_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# set map color scheme
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster in zip(torRedClust_df['Latitude'], torRedClust_df['Longitude'], torRedClust_df['Postcode'], torRed_df['Borough'], torRedClust_df['Neighborhood'], toronto_central_clustered_df['Cluster']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(torclust_map)
       
torclust_map