# Segmenting and Clustering Neighborhoods in Toronto 
##### This is a peer-graded assignment on segmenting and clustering neighborhoods in Toronto. It is part of week three of the IBM Applied Data Science Capstone course on Coursera.

## This notebook explores, segments, and clusters the neighborhoods in the city of Toronto.
#### The assignment has three parts, so this notebook is divided into three parts.
##### 01. Scrape the Wikipedia page, wrangle and clean the data, and read it into a pandas DataFrame so that it is in a structured format. The DataFrame will consist of three columns: PostalCode, Borough, and Neighborhood.
##### 02. Get the latitude and longitude coordinates of each neighborhood (this notebook reads them from the provided geospatial CSV file rather than calling the Geocoder Python package) and add them to the existing DataFrame.
##### 03. Explore and cluster the neighborhoods in Toronto, and generate maps to visualize the neighborhoods and how they cluster.

## Environment Setup 

In [283]:
pip install BeautifulSoup4

Note: you may need to restart the kernel to use updated packages.


In [284]:
pip install html5lib

Note: you may need to restart the kernel to use updated packages.


In [285]:
pip install lxml

Note: you may need to restart the kernel to use updated packages.


In [286]:
pip install folium

Note: you may need to restart the kernel to use updated packages.


In [287]:
pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [288]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


## Import Libraries 

In [289]:
import pandas as pd
import numpy as np
from pandas.io.html import read_html
import geocoder
import matplotlib.pyplot as plt
from matplotlib import cm, colors
import folium
import requests
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
%matplotlib inline 
print("Imported")

Imported


# Task 01: Scrape the Wikipedia page and Prepare DataFrame

In [290]:
#Getting the table from wikipedia
url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
table = read_html(url,  attrs={"class":"wikitable"})

##### Check the number of tables returned

In [291]:
len(table)

1

##### We have one table. View its first rows to confirm it is the desired table

In [292]:
table[0].head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


###### This is our desired table

##### Convert the table to a DataFrame for further processing

In [293]:
df=pd.DataFrame(data=table[0])
df.head(5)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


##### DataFrame Shape

In [294]:
df.shape

(287, 3)

##### Rename the column 'Postcode' to 'PostalCode'

In [295]:
df.rename(columns={"Postcode": "PostalCode"}, inplace=True)

##### Ignore the rows whose Borough value is 'Not assigned'

In [296]:
df = df[df['Borough'] != 'Not assigned']
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


##### Combine rows that share a PostalCode by joining their Neighborhood values with commas
##### Replace a 'Not assigned' Neighborhood value with the Borough value

In [297]:
df = df.groupby(['PostalCode', 'Borough'], as_index=False)['Neighborhood'].agg(lambda x: ', '.join(x))
# use .loc on the frame itself; chained indexing (df.Neighborhood[mask] = ...) may assign to a copy
mask = df['Neighborhood'] == 'Not assigned'
df.loc[mask, 'Neighborhood'] = df.loc[mask, 'Borough']
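
The aggregation step above can be checked on a toy frame. This is a minimal sketch, not the scraped data: the three rows and the borough names are invented for illustration.

```python
import pandas as pd

# Toy rows, invented for illustration (not the scraped Wikipedia data).
toy = pd.DataFrame({
    "PostalCode": ["M1A", "M1A", "M2B"],
    "Borough": ["Alpha", "Alpha", "Beta"],
    "Neighborhood": ["North", "South", "Not assigned"],
})

# One row per postal code; neighborhood names joined with a comma.
combined = (
    toy.groupby(["PostalCode", "Borough"], as_index=False)["Neighborhood"]
       .agg(lambda names: ", ".join(names))
)

# Fall back to the borough name where no neighborhood was assigned.
mask = combined["Neighborhood"] == "Not assigned"
combined.loc[mask, "Neighborhood"] = combined.loc[mask, "Borough"]

print(combined)
```

The two `M1A` rows collapse into one, and the unassigned neighborhood takes its borough's name.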

##### View the same postal codes that are shown in the assignment description

In [298]:
example = ['M5G', 'M2H', 'M4B', 'M1J', 'M4G', 'M4M', 'M1R', 'M9V', 'M9L', 'M5V', 'M1B', 'M5A']
df[df['PostalCode'].isin(example)]

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
5,M1J,Scarborough,Scarborough Village
11,M1R,Scarborough,"Maryvale, Wexford"
17,M2H,North York,Hillcrest Village
35,M4B,East York,"Woodbine Gardens, Parkview Hill"
38,M4G,East York,Leaside
43,M4M,East Toronto,Studio District
53,M5A,Downtown Toronto,Harbourfront
57,M5G,Downtown Toronto,Central Bay Street
68,M5V,Downtown Toronto,"CN Tower, Bathurst Quay, Island airport, Harbo..."


#### Printing the number of rows of the Final DataFrame

In [299]:
df.shape

(103, 3)

# Task 02: Merging the latitude and longitude coordinates into the DataFrame

##### Getting geographical Data from provided data source

In [300]:
geo_data = 'http://cocl.us/Geospatial_data'
geo = pd.read_csv(geo_data)
geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


##### Rename the column 'Postal Code' to 'PostalCode' so that it can be used as the merge key

In [301]:
geo.rename(columns={"Postal Code": "PostalCode"}, inplace=True)
geo.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


##### Merge the coordinates into the DataFrame

In [302]:
dfc=pd.merge(df, geo, how='inner', on = 'PostalCode')
dfc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
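
An inner merge drops any postal code that has no match in the coordinate file without reporting it. The sketch below shows one way to surface such rows, using the `indicator` and `validate` parameters of `DataFrame.merge`; the two small frames are invented for illustration and are not the notebook's data.

```python
import pandas as pd

# Toy frames, invented for illustration.
left = pd.DataFrame({"PostalCode": ["M1A", "M2B", "M3C"],
                     "Borough": ["Alpha", "Beta", "Gamma"]})
coords = pd.DataFrame({"PostalCode": ["M1A", "M2B"],
                       "Latitude": [43.80, 43.78],
                       "Longitude": [-79.19, -79.16]})

# A left merge with indicator=True keeps every postal code and records
# whether a coordinate row was found for it. validate="one_to_one" raises
# if either side contains a duplicated PostalCode.
checked = left.merge(coords, how="left", on="PostalCode",
                     validate="one_to_one", indicator=True)

missing = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print("Postal codes without coordinates:", missing)
```

If `missing` is empty, the inner merge used in this notebook loses nothing.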


##### View the same postal codes that are shown in the assignment description

In [303]:
dfc[dfc['PostalCode'].isin(example)]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
11,M1R,Scarborough,"Maryvale, Wexford",43.750072,-79.295849
17,M2H,North York,Hillcrest Village,43.803762,-79.363452
35,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
38,M4G,East York,Leaside,43.70906,-79.363452
43,M4M,East Toronto,Studio District,43.659526,-79.340923
53,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
57,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
68,M5V,Downtown Toronto,"CN Tower, Bathurst Quay, Island airport, Harbo...",43.628947,-79.39442


# Task 03: Cluster Toronto and generate maps to visualize neighborhoods

##### Create a new Geo-Toronto DataFrame 'dfgt' for exploration, analysis, clustering, and visualization

In [304]:
dfgt = dfc.copy() # copy, so that later changes to dfgt do not affect dfc
dfgt.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [305]:
print('Toronto has {} boroughs and {} neighborhoods.'.format(
        len(dfgt['Borough'].unique()),
        dfgt.shape[0]))

Toronto has 11 boroughs and 103 neighborhoods.


##### Foursquare API setup

In [306]:
import os

# Read the Foursquare credentials from the environment instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20200101' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# Print only the non-sensitive settings; credentials should not appear in notebook output.
print('LIMIT: ',  LIMIT)
print('Radius:', radius)

LIMIT:  100
Radius: 500


##### Get the coordinates of Toronto

In [307]:
address = 'Toronto, CA'
geolocator = Nominatim(user_agent="Foursquare_agent")
location = geolocator.geocode(address)
Toronto_latitude = location.latitude
Toronto_longitude = location.longitude
print('Toronto geographical coordinates are Latitude={}, Longitude={}'.format(Toronto_latitude, Toronto_longitude))

Toronto geographical coordinates are Latitude=43.653963, Longitude=-79.387207


### Exploring Toronto Data

##### Visualization of Toronto with markers

In [308]:
Toronto_map = folium.Map(location=[Toronto_latitude, Toronto_longitude], zoom_start=11, control_scale = True)

for lat, lng, Borough, Neighbourhood in zip(dfgt['Latitude'], dfgt['Longitude'], dfgt['Borough'], dfgt['Neighborhood']):
    tag = '{}, {}'.format(Neighbourhood, Borough)
    tag = folium.Popup(tag, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=tag,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.6).add_to(Toronto_map)

# add a folium feature that shows the latitude and longitude of any point clicked on the map
Toronto_map.add_child(folium.LatLngPopup())
    
Toronto_map

##### Generate a new DataFrame 'dfgt_nbr' with only the rows whose 'Borough' contains 'Toronto'

In [309]:
dfgt_nbr = dfgt[dfgt['Borough'].str.contains('Toronto')]
print(dfgt_nbr.shape)
dfgt_nbr.head()

(39, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
43,M4M,East Toronto,Studio District,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


##### Define a function to get Nearby venues

In [310]:
def getNearbyVenues(names, latitude, longitude, radius=500, limit=100):

    venue_lst = [] # one tuple per venue, across all neighbourhoods

    for name, lat, lng in zip(names, latitude, longitude):
        print(name)

        # Foursquare API explore request for the neighbourhood's lat / lng
        # (CLIENT_ID, CLIENT_SECRET and VERSION come from the setup cell above)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # GET request; keep only the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venue_lst.extend([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # build the DataFrame directly from the flat list of tuples
    venue_df = pd.DataFrame(venue_lst, columns=[
                  'Neighbourhood',
                  'Neigh Latitude',
                  'Neigh Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category'])

    print('{} venues were returned by Foursquare.'.format(venue_df.shape[0]))

    return venue_df
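
The function above indexes `['groups'][0]` and `['categories'][0]` directly, so it raises an error if a response has no groups or a venue has no category. Below is a minimal sketch of more defensive parsing, assuming the same nesting that the function above reads. The helper `parse_explore_items` and the sample payload are hypothetical; neither is part of the Foursquare API or of this notebook.

```python
def parse_explore_items(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload.

    Hypothetical helper: missing keys yield None or an empty list
    instead of raising KeyError / IndexError.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((venue.get("name"), location.get("lat"),
                     location.get("lng"), category))
    return rows


# Hand-written payload imitating the nesting the function above reads.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 43.65, "lng": -79.38},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Spot B", "location": {"lat": 43.66, "lng": -79.39},
               "categories": []}},
]}]}}

print(parse_explore_items(sample))
print(parse_explore_items({"response": {}}))
```

A venue without a category is kept with `None` as its category, and a response without groups yields an empty list.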

##### Get the venues for each Toronto neighborhood

In [311]:
dfgt_vnu = getNearbyVenues (names=dfgt_nbr['Neighborhood'], 
                                     latitude=dfgt_nbr['Latitude'], 
                                     longitude=dfgt_nbr['Longitude'])

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction Sout

##### Number of venues

In [312]:
dfgt_vnu.shape[0]

1722

##### Venue DataFrame

In [313]:
dfgt_vnu.head()

Unnamed: 0,Neighbourhood,Neigh Latitude,Neigh Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


##### Check how many venues were returned for each Toronto neighborhood

In [314]:
dfgt_vnu['Venue'].groupby(dfgt_vnu.Neighbourhood).count()

Neighbourhood
Adelaide, King, Richmond                                                                                      100
Berczy Park                                                                                                    56
Brockton, Exhibition Place, Parkdale Village                                                                   21
Business Reply Mail Processing Centre 969 Eastern                                                              17
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara     16
Cabbagetown, St. James Town                                                                                    44
Central Bay Street                                                                                             86
Chinatown, Grange Park, Kensington Market                                                                      90
Christie                                                                  

##### Unique categories in returned venues

In [315]:
print('There are {} unique categories.'.format(len(dfgt_vnu['Venue Category'].unique())))

There are 237 unique categories.


### Analyze Neighborhood

##### One-hot encode the Toronto venue categories

In [316]:
dfgt_nbr_onht = pd.get_dummies(dfgt_vnu[['Venue Category']], prefix="", prefix_sep="")

##### Add neighborhood column back to dataframe

In [317]:
dfgt_nbr_onht['Neighbourhood'] = dfgt_vnu['Neighbourhood'] 

##### Move neighborhood column to the first column

In [318]:
fixed_columns = [dfgt_nbr_onht.columns[-1]] + list(dfgt_nbr_onht.columns[:-1])
dfgt_nbr_onht = dfgt_nbr_onht[fixed_columns]

##### View Toronto Neighborhood Onehot DataFrame

In [319]:
dfgt_nbr_onht.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"The Danforth West, Riverdale",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


##### View Toronto Neighborhood Onehot DataFrame size

In [320]:
dfgt_nbr_onht.shape

(1722, 238)

##### Group rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category

In [321]:
dfgt_grp = dfgt_nbr_onht.groupby('Neighbourhood').mean().reset_index()
dfgt_grp.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0625,0.0625,0.0625,0.0625,0.1875,0.0625,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


##### Grouped DataFrame size

In [322]:
dfgt_grp.shape

(39, 238)
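
The one-hot encoding and grouping steps above can be traced on a toy venue list. This is a minimal sketch with invented rows. It passes `dtype=int` to `pd.get_dummies` so the indicator columns are 0/1 regardless of the pandas version, which the notebook cell does not do.

```python
import pandas as pd

# Toy venue list, invented for illustration.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Pub", "Park", "Park"],
})

# One indicator column per category.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# The mean of a 0/1 column is the share of the neighbourhood's venues
# that fall in that category, so each row sums to 1.
freq = onehot.groupby("Neighbourhood").mean().reset_index()
print(freq)
```

Because each row holds shares rather than counts, a neighbourhood with 16 venues and one with 100 are compared on the same scale.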

##### Print each neighborhood with the top 3 common venues

In [323]:
num_top_venues = 3

for hood in dfgt_grp['Neighbourhood']:
    print("----"+hood+"----")
    temp = dfgt_grp[dfgt_grp['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
         venue  freq
0  Coffee Shop  0.08
1         Café  0.05
2   Steakhouse  0.04


----Berczy Park----
          venue  freq
0   Coffee Shop  0.09
1  Cocktail Bar  0.05
2      Beer Bar  0.04


----Brockton, Exhibition Place, Parkdale Village----
            venue  freq
0            Café  0.14
1     Coffee Shop  0.10
2  Breakfast Spot  0.10


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0  Light Rail Station  0.12
1          Comic Shop  0.06
2       Auto Workshop  0.06


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
             venue  freq
0  Airport Service  0.19
1  Harbor / Marina  0.06
2          Airport  0.06


----Cabbagetown, St. James Town----
                venue  freq
0         Coffee Shop  0.07
1  Italian Restaurant  0.05
2                Park  0.05


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.1

##### Write a function to sort the venues in descending order

In [324]:
def rtn_mst_cmn_venues(row, ntop_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:ntop_venues]
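
To see what the function above returns, it can be applied to a single hand-made row. This is a minimal sketch: `top_categories` restates the same logic under a hypothetical name so that the block is self-contained, and the row values are invented.

```python
import pandas as pd


def top_categories(row, ntop):
    """Same logic as the function above: skip the first entry
    (the neighbourhood name) and return the ntop largest categories."""
    categories = row.iloc[1:]
    return categories.sort_values(ascending=False).index.values[0:ntop]


# One row shaped like a row of the grouped frame: name first, then frequencies.
row = pd.Series({"Neighbourhood": "A", "Café": 0.5, "Park": 0.2, "Pub": 0.3})

print(list(top_categories(row, 2)))
```

The result holds category names only, not their frequencies, which is what the table of most common venues needs.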

##### Create the new DataFrame and display the top 5 venues for each neighborhood

In [325]:
ntop_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(ntop_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
nbr_vnu_st = pd.DataFrame(columns=columns)
nbr_vnu_st['Neighbourhood'] = dfgt_grp['Neighbourhood']

for ind in np.arange(dfgt_grp.shape[0]):
    nbr_vnu_st.iloc[ind, 1:] = rtn_mst_cmn_venues(dfgt_grp.iloc[ind, :], ntop_venues)

nbr_vnu_st.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Restaurant
1,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Steakhouse,Bakery
2,"Brockton, Exhibition Place, Parkdale Village",Café,Coffee Shop,Breakfast Spot,Pet Store,Bakery
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Auto Workshop,Brewery,Spa,Farmers Market
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Harbor / Marina,Bar,Plane,Coffee Shop
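
The column labels above rely on a three-element suffix list plus an exception for every later position, which would produce labels such as "21th" if more than twenty venues were requested. A small sketch of an alternative follows, assuming a hypothetical helper named `ordinal`; it is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


ntop_venues = 5
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, ntop_venues + 1)
]
print(columns)
```

For five venues this yields the same labels as the cell above.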


### Clustering Toronto by neighborhoods!

##### set number of clusters

In [326]:
k = 3

##### Create a new DataFrame for clustering

In [327]:
dfgt_grp_clustering = dfgt_grp.drop(columns='Neighbourhood')

##### run k-means clustering

In [328]:
kmeans = KMeans(n_clusters=k, random_state=0).fit(dfgt_grp_clustering)

##### check cluster labels generated for each row in the dataframe

In [329]:
kmeans.labels_[0:30]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       0, 1, 0, 1, 1, 1, 0, 2], dtype=int32)
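
Most of the labels printed above are 1, so it is worth counting the cluster sizes before interpreting the clusters. The sketch below counts only the first 30 labels shown in the output above; the full `kmeans.labels_` array is longer, so the totals for the whole data set will differ.

```python
import numpy as np

# First 30 labels as printed above (the full labels_ array is longer).
labels = np.array([1] * 17 + [0] + [1] * 4 + [0, 1, 0, 1, 1, 1, 0, 2])

# Number of neighbourhoods assigned to each cluster label.
sizes = np.bincount(labels, minlength=3)
for cluster, size in enumerate(sizes):
    print(f"cluster {cluster}: {size} of {labels.size}")
```

A cluster holding a single neighbourhood may simply be an outlier, which is one reason to try other values of `k`.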

##### Create a new dataframe that includes the cluster as well as the top 5 venues for each neighborhood

##### Add clustering labels

In [330]:
nbr_vnu_st.insert(0, 'Cluster_Labels', kmeans.labels_)

In [331]:
nbr_vnu_st.head(2)

Unnamed: 0,Cluster_Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Restaurant
1,1,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Steakhouse,Bakery


In [332]:
# rename() returns a new DataFrame, so dfgt_nbr itself is not modified
Toronto_merged = dfgt_nbr.rename(columns={"Neighborhood":"Neighbourhood"})

# join nbr_vnu_st onto Toronto_merged to add the cluster label and top venues for each neighborhood
Toronto_merged = Toronto_merged.join(nbr_vnu_st.set_index('Neighbourhood'), on='Neighbourhood')

Toronto_merged.head(10)


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
37,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Neighborhood,Trail,Health Food Store,Pub,Dim Sum Restaurant
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,1,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,1,Park,Sandwich Place,Liquor Store,Steakhouse,Fish & Chips Shop
43,M4M,East Toronto,Studio District,43.659526,-79.340923,1,Café,Coffee Shop,Gastropub,Bakery,Italian Restaurant
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Park,Swim School,Bus Line,Yoga Studio,Diner
45,M4P,Central Toronto,Davisville North,43.712751,-79.390197,1,Gym,Park,Sandwich Place,Breakfast Spot,Food & Drink Shop
46,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,1,Clothing Store,Sporting Goods Shop,Coffee Shop,Yoga Studio,Chinese Restaurant
47,M4S,Central Toronto,Davisville,43.704324,-79.38879,1,Pizza Place,Sandwich Place,Dessert Shop,Gym,Italian Restaurant
48,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,0,Park,Playground,Restaurant,Tennis Court,Comfort Food Restaurant
49,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049,1,Light Rail Station,Coffee Shop,Pub,American Restaurant,Bagel Shop
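
A join like the one above leaves `NaN` in the label column for any neighbourhood that has no row in the venue table, for example one for which no venues were returned. The sketch below shows how to detect and handle that case before the labels are used as list indices for the map colours; the two frames are invented for illustration.

```python
import pandas as pd

# Toy frames, invented for illustration.
places = pd.DataFrame({"Neighbourhood": ["A", "B", "C"],
                       "Latitude": [43.65, 43.66, 43.67]})
labels = pd.DataFrame({"Neighbourhood": ["A", "B"],
                       "Cluster_Labels": [1, 0]})

merged = places.join(labels.set_index("Neighbourhood"), on="Neighbourhood")

# Rows with no match get NaN, which also turns the label column into floats.
unlabelled = merged.loc[merged["Cluster_Labels"].isna(), "Neighbourhood"].tolist()
print("Neighbourhoods without a cluster label:", unlabelled)

# Drop them and restore integer labels.
merged = merged.dropna(subset=["Cluster_Labels"]).astype({"Cluster_Labels": int})
print(merged)
```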


#### Visualize the clusters!

In [333]:
# create map
map_clusters = folium.Map(location=[Toronto_latitude, Toronto_longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighbourhood'], Toronto_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)], # labels run from 0 to k-1, so index directly
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters