## Problem 1:
+ The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
- Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
+ More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia  page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11  in the above table.
+ If a cell has a borough but a Not assigned  neighborhood, then the neighborhood will be the same as the borough.
+ Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
+ In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup as BS

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
r = requests.get(url).text
data = BS(r)

### Fetch raw data from Wikipedia page and parsed into text file by using Beautifulsoup. Then in the table info, postcode is always in the tag 'b', and parsed out the borough and neighborhood. If is not 'Not assigned', added it into df.

In [51]:
column_names = ['Postalcode','Borough','Neighborhood']
toronto = pd.DataFrame(columns = column_names)

content = data.find('div', class_='mw-parser-output')
table = content.table.tbody
postcode = 0
borough = 0
neighborhood = 0

for tr in table.find_all('tr'):
    
    for td in tr.find_all('td'):
        postcode = td.find('b').text
        borough = td.text.strip('\n')[3:].split('(')[0]
        neighborhood = td.text.strip('\n')[3:].split('(')[-1].rstrip(')')
        if borough != 'Not assigned':
            neighborhood = neighborhood.replace('/',',')
            toronto = toronto.append({'Postalcode': postcode,'Borough': borough,'Neighborhood': neighborhood},ignore_index=True)

In [52]:
toronto

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park , Harbourfront"
3,M6A,North York,"Lawrence Manor , Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,..."


In [53]:
toronto.shape

(103, 3)

In [64]:
toronto.dtypes

Postalcode      object
Borough         object
Neighborhood    object
dtype: object

## Problem 2

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In [54]:
coordinate = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv')


In [66]:
coordinate = coordinate.rename(columns = {'Postal Code':'Postalcode'})

In [67]:
data = toronto.merge(coordinate,how = 'left', on = 'Postalcode')

In [68]:
data

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,...",43.636258,-79.498509


## Problem 3

### Tried to use folium to draw the toronto map and all boroughs, and the all boroughs that contain the word Toronto.

In [69]:
pip install folium

Collecting foliumNote: you may need to restart the kernel to use updated packages.
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)

Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


In [70]:
import folium

In [72]:
latitude = 43.653963 
longitude = -79.387207
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
for lat, lng, borough, neighborhood in zip(data['Latitude'], data['Longitude'], data['Borough'], data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5,
        parse_html=False).add_to(map_toronto)
map_toronto

In [73]:
B_contain_toronto = data[data['Borough'].str.contains("Toronto")].reset_index(drop=True)

In [74]:
df = B_contain_toronto.copy()

In [76]:
df

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Richmond , Adelaide , King",43.650571,-79.384568
8,M6H,West Toronto,"Dufferin , Dovercourt Village",43.669005,-79.442259
9,M4J,East YorkEast Toronto,The Danforth East,43.685347,-79.338106


In [77]:
latitude = 43.653963 
longitude = -79.387207
map_contain_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5,
        parse_html=False).add_to(map_contain_toronto)
map_contain_toronto

### K Mean Clustering

In [79]:
from sklearn.cluster import KMeans

In [80]:
df_cor = df[['Latitude','Longitude']]

In [81]:
CLIENT_ID = '' # Put Your Client Id
CLIENT_SECRET = '' # Put You Client Secret 
VERSION = '20180604'
LIMIT = 30

In [82]:
data.loc[0, 'Neighborhood']

'Parkwoods'

In [83]:
neighborhood_latitude = data.loc[0, 'Latitude']
neighborhood_longitude = data.loc[0, 'Longitude'] 

neighborhood_name = data.loc[0, 'Neighborhood'] 

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


In [84]:
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=DPBYY4JUY3DU20ALPSUV4ONY2K1GOJJKJ1NIHBB32XEMOVYY&client_secret=1MV443TYEP4HUO0WDUW5NQ5W10L2Y4G05NWG11WIR3NUGC5B&v=20180604&ll=43.7532586,-79.3296565&radius=500&limit=30'

In [85]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '608f629004ef220d873c7ca5'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 43.757758604500005,
    'lng': -79.32343823984928},
   'sw': {'lat': 43.7487585955, 'lng': -79.33587476015072}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
        'c

In [86]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [89]:
from pandas.io.json import json_normalize

In [90]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  nearby_venues = json_normalize(venues) # flatten JSON


Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114


###  Explore Neighborhoods in Toronto

In [87]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [91]:
toronto_venues = getNearbyVenues(names=data['Neighborhood'],
                                   latitudes=data['Latitude'],
                                   longitudes=data['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park , Harbourfront
Lawrence Manor , Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern , Rouge
Don Mills)North
Parkview Hill , Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park , Princess Gardens , Martin Grove , Islington , Cloverdale
Rouge Hill , Port Union , Highland Creek
Flemingdon Park
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate , Bloordale Gardens , Old Burnhamthorpe , Markland Wood
Guildwood , Morningside , West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor , Wilson Heights , Downsview North
Thorncliffe Park
Richmond , Adelaide , King
Dufferin , Dovercourt Village
Scarborough Village
Fairview , Henry Farm , Oriole
Northwood Park , York University
The Danforth  East
Harbourfront East , Union Station , Toronto Islands
Little Portugal , Trinity
Kennedy Park , Ionview , East Birchmount Park
Bayview Village

### Analyze Each Neighborhood

In [92]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [93]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
1,"Alderwood , Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
2,"Bathurst Manor , Wilson Heights , Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
4,"Bedford Park , Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,Willowdale)South,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.033333,0.0,0.0,0.0,0.0
94,Willowdale)West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
95,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0
96,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.111111,0.000000,0.0,0.0,0.0,0.0


In [94]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [96]:
import numpy as np

In [97]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt),Latin American Restaurant,Clothing Store,Lounge,Breakfast Spot,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
1,"Alderwood , Long Branch",Pizza Place,Coffee Shop,Pub,Gym,Sandwich Place,Deli / Bodega,Cuban Restaurant,Curling Ice,Dance Studio,Dessert Shop
2,"Bathurst Manor , Wilson Heights , Downsview North",Coffee Shop,Bank,Park,Mobile Phone Shop,Diner,Sandwich Place,Deli / Bodega,Bridal Shop,Restaurant,Ice Cream Shop
3,Bayview Village,Chinese Restaurant,Japanese Restaurant,Bank,Café,Women's Store,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store
4,"Bedford Park , Lawrence Manor East",Coffee Shop,Restaurant,Italian Restaurant,Sandwich Place,Butcher,Thai Restaurant,Juice Bar,Liquor Store,Indian Restaurant,Pub


### Cluster Neighborhoods


In [98]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 0, 1, 0, 1, 1, 1, 3, 1])

In [110]:
neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Agincourt),Latin American Restaurant,Clothing Store,Lounge,Breakfast Spot,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
1,0,"Alderwood , Long Branch",Pizza Place,Coffee Shop,Pub,Gym,Sandwich Place,Deli / Bodega,Cuban Restaurant,Curling Ice,Dance Studio,Dessert Shop
2,0,"Bathurst Manor , Wilson Heights , Downsview North",Coffee Shop,Bank,Park,Mobile Phone Shop,Diner,Sandwich Place,Deli / Bodega,Bridal Shop,Restaurant,Ice Cream Shop
3,1,Bayview Village,Chinese Restaurant,Japanese Restaurant,Bank,Café,Women's Store,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store
4,0,"Bedford Park , Lawrence Manor East",Coffee Shop,Restaurant,Italian Restaurant,Sandwich Place,Butcher,Thai Restaurant,Juice Bar,Liquor Store,Indian Restaurant,Pub
...,...,...,...,...,...,...,...,...,...,...,...,...
93,1,Willowdale)South,Ramen Restaurant,Coffee Shop,Pizza Place,Bubble Tea Shop,Café,Sandwich Place,Movie Theater,Electronics Store,Lounge,Bank
94,1,Willowdale)West,Pharmacy,Pizza Place,Coffee Shop,Butcher,Discount Store,Women's Store,Department Store,Curling Ice,Dance Studio,Deli / Bodega
95,1,Woburn,Coffee Shop,Korean BBQ Restaurant,Pharmacy,College Auditorium,Curling Ice,College Arts Building,Donut Shop,Dog Run,Distribution Center,Discount Store
96,1,Woodbine Heights,Park,Bus Stop,Skating Rink,Beer Store,Intersection,Dance Studio,Curling Ice,Athletics & Sports,Video Store,Deli / Bodega


In [113]:
kmeans.labels_

array([1, 0, 0, 1, 0, 1, 1, 1, 3, 1, 3, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0,
       1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 3, 1, 3, 1, 0, 1, 1, 1, 1,
       4, 3, 1, 1, 1, 3, 1, 3, 1, 1, 1, 3, 0, 0, 1, 3, 1, 1, 3, 1, 1, 1,
       3, 1, 1, 3, 1, 2, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 3, 1, 1, 0, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 1, 3])

In [111]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)

toronto_merged = data

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude,Cluster Label,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3.0,3.0,Park,Food & Drink Shop,Women's Store,Creperie,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,1.0,Coffee Shop,Portuguese Restaurant,Intersection,Hockey Arena,Pizza Place,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Creperie
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636,1.0,1.0,Coffee Shop,Park,Bakery,Theater,Breakfast Spot,Café,Restaurant,Pub,Chocolate Shop,Yoga Studio
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763,1.0,1.0,Clothing Store,Miscellaneous Shop,Arts & Crafts Store,Accessories Store,Boutique,Vietnamese Restaurant,Coffee Shop,Athletics & Sports,Furniture / Home Store,Carpet Store
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,1.0,1.0,Coffee Shop,Sushi Restaurant,Yoga Studio,College Auditorium,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Restaurant,Burrito Place


In [103]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [109]:
kclusters

5

In [125]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
toronto_merged['Cluster Label'] = toronto_merged['Cluster Label'].fillna(4)
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [124]:
toronto_merged['Cluster Label'].fillna(0).tolist()

[3.0,
 1.0,
 1.0,
 1.0,
 1.0,
 4.0,
 1.0,
 1.0,
 1.0,
 1.0,
 3.0,
 4.0,
 2.0,
 0.0,
 1.0,
 1.0,
 3.0,
 1.0,
 0.0,
 1.0,
 1.0,
 3.0,
 1.0,
 1.0,
 1.0,
 0.0,
 1.0,
 1.0,
 0.0,
 0.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 3.0,
 1.0,
 1.0,
 1.0,
 1.0,
 3.0,
 1.0,
 1.0,
 1.0,
 1.0,
 4.0,
 0.0,
 1.0,
 1.0,
 3.0,
 1.0,
 1.0,
 4.0,
 1.0,
 1.0,
 0.0,
 0.0,
 4.0,
 1.0,
 1.0,
 0.0,
 3.0,
 1.0,
 0.0,
 1.0,
 0.0,
 3.0,
 0.0,
 3.0,
 1.0,
 0.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 3.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 0.0,
 1.0,
 3.0,
 1.0,
 1.0,
 1.0,
 0.0,
 0.0,
 3.0,
 1.0,
 0.0,
 1.0,
 4.0,
 1.0,
 1.0,
 1.0,
 1.0,
 1.0,
 3.0,
 0.0]