# Final Assignment: Segmenting and Clustering Neighborhoods in Toronto, Canada

#### This notebook extracts the list of Toronto postal codes from this [Wikipedia page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) and then clusters the neighborhoods located in one of Toronto's boroughs based on the venues in each neighborhood

### Table of Contents
1. [PART 1 - Scrape the Wikipedia page = Reading the data, cleaning and preparing](#part_1)
2. [PART 2 - Adding the _latitude_ and the _longitude_ coordinates of each neighborhood](#part_2)
3. [PART 3 - Explore and cluster the neighborhoods in _Downtown_ Toronto](#part_3)

## PART 1 : Scrape the Wikipedia page = Reading the data, cleaning and preparing
Input data [Wikipedia: List of postal codes of Canada: M](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)

In [106]:
from bs4 import BeautifulSoup
import requests

# Reading the source:
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(website_url,'html.parser')

my_table = soup.find('tbody')
# print(my_table.prettify())
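
`soup.find('tbody')` returns the first table body in the document, so it only picks the postal-code table if that table comes first. The sketch below shows a more targeted lookup on a made-up HTML fragment. It assumes the target table carries a `wikitable` class, which is an assumption about the page markup and not something verified here.

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment standing in for the real page.
html = """
<table class="infobox"><tbody><tr><td>not the data</td></tr></tbody></table>
<table class="wikitable">
  <tbody>
    <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
    <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
    <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find('tbody') grabs whichever table body appears first in the document.
first_tbody_text = soup.find("tbody").get_text(strip=True)

# Selecting the table by class (assumed name) avoids that dependency on order.
table = soup.find("table", class_="wikitable")

rows = []
for tr in table.find_all("tr"):
    # get_text(strip=True) also removes the trailing newline in the last cell.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 3:  # the header row has <th> cells only, so it is skipped
        rows.append(dict(zip(["Postalcode", "Borough", "Neighbourhood"], cells)))

print(rows)
```

Stripping the text while reading the cells means no separate `str.replace('\n', '')` step is needed afterwards.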


### Making a DataFrame: convert the table of postal codes to a pandas DataFrame

In [107]:
table_data = []

for row in my_table.findAll('tr'):
    row_data = []

    for cell in row.findAll('td'):
        row_data.append(cell.text)

    if(len(row_data) > 0):
        data_item = {
            "Postalcode": row_data[0],
            "Borough": row_data[1],
            "Neighbourhood": row_data[2],
        }
        table_data.append(data_item)

In [108]:
## Convert table to Pandas DataFrame
#table_data
import pandas as pd
df = pd.DataFrame(table_data)
df.head()

Unnamed: 0,Borough,Neighbourhood,Postalcode
0,Not assigned,Not assigned\n,M1A
1,Not assigned,Not assigned\n,M2A
2,North York,Parkwoods\n,M3A
3,North York,Victoria Village\n,M4A
4,Downtown Toronto,Harbourfront\n,M5A


In [109]:
type(df)

pandas.core.frame.DataFrame

In [110]:
df['Neighbourhood'] = df['Neighbourhood'].str.replace('\n','')

### Ignore cells with a borough that is "Not assigned"

In [111]:
df_filtered = df.loc[df['Borough'] != 'Not assigned']
df_filtered.head()

Unnamed: 0,Borough,Neighbourhood,Postalcode
2,North York,Parkwoods,M3A
3,North York,Victoria Village,M4A
4,Downtown Toronto,Harbourfront,M5A
5,Downtown Toronto,Regent Park,M5A
6,North York,Lawrence Heights,M6A


### If a cell's neighbourhood is "Not assigned", the neighbourhood is set to the same value as the borough

In [112]:
# Use a boolean mask with .loc on an explicit copy to avoid chained assignment.
df_filtered = df_filtered.copy()
not_assigned = df_filtered['Neighbourhood'] == 'Not assigned'
df_filtered.loc[not_assigned, 'Neighbourhood'] = df_filtered.loc[not_assigned, 'Borough']


### More than one neighborhood can exist in one postal code area
Notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park.

In [113]:
df_toronto = df_filtered.groupby(['Postalcode','Borough'])['Neighbourhood'].apply(','.join).reset_index()
df_toronto

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"
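
The three cleaning steps (dropping unassigned boroughs, filling unassigned neighbourhoods, combining neighbourhoods per postal code) can be sketched end to end on a tiny frame. The rows below are made up for illustration and only mimic the shape of the scraped table.

```python
import pandas as pd

# Made-up rows mimicking the scraped table.
raw = pd.DataFrame({
    "Postalcode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# 1. Drop rows without a borough; .copy() so later assignments hit a real frame.
clean = raw.loc[raw["Borough"] != "Not assigned"].copy()

# 2. Where the neighbourhood is missing, fall back to the borough name.
missing = clean["Neighbourhood"] == "Not assigned"
clean.loc[missing, "Neighbourhood"] = clean.loc[missing, "Borough"]

# 3. One row per postal code, neighbourhoods joined by commas.
combined = (
    clean.groupby(["Postalcode", "Borough"])["Neighbourhood"]
    .apply(",".join)
    .reset_index()
)
print(combined)
```

Assigning through a single `.loc[mask, column]` call on a copied frame is what keeps pandas from emitting a `SettingWithCopyWarning`.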


Final assignment requirement: the shape of the dataframe is shown.

In [114]:
df_toronto.shape

(103, 3)

In [115]:
df_toronto.head(10) 

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


## PART 2 :  Adding the latitude and the longitude coordinates of each neighborhood

### Geocoder doesn't work for me; I get 'None' as the response every time.  
> Therefore I downloaded the 'Geospatial_Coordinates.csv' and got geocoding from that file.

In [116]:
# Reading CSV file
df_locations = pd.read_csv('https://cocl.us/Geospatial_data')
df_locations.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [117]:
df_toronto_loc = pd.merge(left=df_toronto ,right=df_locations, left_on='Postalcode', right_on='Postal Code', how='inner')

In [118]:
df_toronto_loc[['Postalcode','Borough','Neighbourhood','Postal Code','Latitude','Longitude']].head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",M1B,43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",M1C,43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",M1E,43.763573,-79.188711
3,M1G,Scarborough,Woburn,M1G,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,M1H,43.773136,-79.239476
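
The merged frame carries the key twice (`Postalcode` and `Postal Code`). A minimal sketch of a merge that checks the keys and drops the duplicate column is shown below. The frames and coordinates are made up, and `M9Z` is a hypothetical code with no match on the left side.

```python
import pandas as pd

# Made-up stand-ins for df_toronto and df_locations.
postcodes = pd.DataFrame({
    "Postalcode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1C", "M1B", "M9Z"],
    "Latitude": [43.78, 43.81, 43.70],
    "Longitude": [-79.16, -79.19, -79.50],
})

merged = pd.merge(
    postcodes,
    coords,
    left_on="Postalcode",
    right_on="Postal Code",
    how="left",              # keep every postal code, even without coordinates
    validate="one_to_one",   # raise if a key is duplicated on either side
    indicator=True,          # adds a _merge column describing each row's match
)

# Postal codes that found no coordinates would show up here.
unmatched = merged[merged["_merge"] == "left_only"]

# The right-hand key and the indicator are no longer needed.
merged = merged.drop(columns=["Postal Code", "_merge"])
print(merged)
```

With `how="left"`, a postal code missing from the coordinate file stays visible as a row with empty coordinates instead of silently disappearing, as it would with an inner join.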


In [119]:
df_toronto.tail()

Unnamed: 0,Postalcode,Borough,Neighbourhood
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie..."
101,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam..."
102,M9W,Etobicoke,Northwest


In [120]:
df_toronto_loc.shape

(103, 6)

## PART 3 : Explore and cluster the neighborhoods in Downtown Toronto

In [121]:
#Use geopy library to get the latitude and longitude values of Toronto City.
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

- **Select Toronto boroughs with word "Toronto" in the name.**

In [122]:
df_toronto = df_toronto[df_toronto['Borough'].str.contains('Toronto')].reset_index(drop=True)
print(df_toronto.shape)
df_toronto.head()

(38, 3)


Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M4E,East Toronto,The Beaches
1,M4K,East Toronto,"The Danforth West,Riverdale"
2,M4L,East Toronto,"The Beaches West,India Bazaar"
3,M4M,East Toronto,Studio District
4,M4N,Central Toronto,Lawrence Park


In [123]:
df_toronto.tail()

Unnamed: 0,Postalcode,Borough,Neighbourhood
33,M6K,West Toronto,"Brockton,Exhibition Place,Parkdale Village"
34,M6P,West Toronto,"High Park,The Junction South"
35,M6R,West Toronto,"Parkdale,Roncesvalles"
36,M6S,West Toronto,"Runnymede,Swansea"
37,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern


In [124]:
address = 'Toronto, Canada'

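# Nominatim expects a user_agent identifying the application; the name below is an arbitrary placeholder.
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.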


- **Build Toronto map including markers for boroughs.**  

### Create a map of Toronto with neighborhoods superimposed on top.

In [125]:
!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge


In [126]:
df_toronto_merge = df_toronto_loc[['Postalcode','Borough','Neighbourhood','Postal Code','Latitude','Longitude']]
df_toronto_merge.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",M1B,43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",M1C,43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",M1E,43.763573,-79.188711
3,M1G,Scarborough,Woburn,M1G,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,M1H,43.773136,-79.239476


In [127]:
map_city = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto_merge['Latitude'], df_toronto_merge['Longitude'], df_toronto_merge['Borough'], df_toronto_merge['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_city)  
    
map_city


__Work with only boroughs that contain the word Toronto__ 

In [128]:
df_toronto_data =  df_toronto_merge[df_toronto_merge['Borough'].str.contains('Toronto')]
df_toronto_data

Unnamed: 0,Postalcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude
37,M4E,East Toronto,The Beaches,M4E,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West,Riverdale",M4K,43.679557,-79.352188
42,M4L,East Toronto,"The Beaches West,India Bazaar",M4L,43.668999,-79.315572
43,M4M,East Toronto,Studio District,M4M,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,M4N,43.72802,-79.38879
45,M4P,Central Toronto,Davisville North,M4P,43.712751,-79.390197
46,M4R,Central Toronto,North Toronto West,M4R,43.715383,-79.405678
47,M4S,Central Toronto,Davisville,M4S,43.704324,-79.38879
48,M4T,Central Toronto,"Moore Park,Summerhill East",M4T,43.689574,-79.38316
49,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",M4V,43.686412,-79.400049


## Let's use the Foursquare API to explore the neighborhoods and segment them

In [131]:

CLIENT_ID = '4RB5IXVXDPPO4XJCRVLBCZBSFA5TAKFNQ0RY3TFFPVGT4LOT' # your Foursquare ID
CLIENT_SECRET = 'DSIJ4WL2JZDRVLIE3JFID4K0Y54PPIJ3I0JYHYFT12XYYJOL' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 4RB5IXVXDPPO4XJCRVLBCZBSFA5TAKFNQ0RY3TFFPVGT4LOT
CLIENT_SECRET:DSIJ4WL2JZDRVLIE3JFID4K0Y54PPIJ3I0JYHYFT12XYYJOL


## Explore Neighborhoods
#### Let's create a function to find Nearby venues to all the neighborhoods

In [136]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [137]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 100 # limit of number of venues returned by Foursquare API
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
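
`getNearbyVenues` indexes straight into the JSON, so a response without `groups` raises a `KeyError` and a venue with an empty `categories` list raises an `IndexError`. The sketch below flattens a response more defensively. `venues_from_response` is a hypothetical helper, and the sample payload is made up to mirror only the keys the function above reads.

```python
def venues_from_response(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use None when a venue has no category instead of failing on [0].
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows


# Made-up payload with the same nesting the function above relies on.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A",
               "location": {"lat": 43.1, "lng": -79.1},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Venue B",
               "location": {"lat": 43.2, "lng": -79.2},
               "categories": []}},
]}]}}

rows = venues_from_response(sample, "Toy Neighbourhood", 43.0, -79.0)
empty = venues_from_response({}, "Toy Neighbourhood", 43.0, -79.0)
print(rows)
print(empty)
```

Returning an empty list for an unusable payload lets the loop over neighbourhoods continue instead of stopping at the first failed request.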

In [138]:
toronto_venues = getNearbyVenues(names=df_toronto_data['Neighbourhood'],
                                   latitudes=df_toronto_data['Latitude'],
                                   longitudes=df_toronto_data['Longitude']
                                  )
toronto_venues

The Beaches
The Danforth West,Riverdale
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvall

Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
1,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
2,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
3,The Beaches,43.676357,-79.293031,Beaches Fitness,43.680319,-79.290991,Gym / Fitness Center
4,The Beaches,43.676357,-79.293031,Dip 'n Sip,43.678897,-79.297745,Coffee Shop
5,"The Danforth West,Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
6,"The Danforth West,Riverdale",43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop
7,"The Danforth West,Riverdale",43.679557,-79.352188,MenEssentials,43.677820,-79.351265,Cosmetics Shop
8,"The Danforth West,Riverdale",43.679557,-79.352188,Messini Authentic Gyros,43.677827,-79.350569,Greek Restaurant
9,"The Danforth West,Riverdale",43.679557,-79.352188,Mezes,43.677962,-79.350196,Greek Restaurant


In [140]:
print(toronto_venues.shape)
toronto_venues.head(10)

(1708, 7)


Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
1,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
2,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
3,The Beaches,43.676357,-79.293031,Beaches Fitness,43.680319,-79.290991,Gym / Fitness Center
4,The Beaches,43.676357,-79.293031,Dip 'n Sip,43.678897,-79.297745,Coffee Shop
5,"The Danforth West,Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
6,"The Danforth West,Riverdale",43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop
7,"The Danforth West,Riverdale",43.679557,-79.352188,MenEssentials,43.67782,-79.351265,Cosmetics Shop
8,"The Danforth West,Riverdale",43.679557,-79.352188,Messini Authentic Gyros,43.677827,-79.350569,Greek Restaurant
9,"The Danforth West,Riverdale",43.679557,-79.352188,Mezes,43.677962,-79.350196,Greek Restaurant


In [141]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 239 unique categories.


## Analyze Each Neighborhood

In [145]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [146]:
toronto_onehot.shape

(1708, 239)

In [147]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0
7,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.05,0.0,0.04,0.01,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011494,0.011494,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011494,0.011494,0.0,0.011494,0.0
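
Each value in the grouped frame is the share of a neighbourhood's venues that fall in a category, because the mean of a 0/1 column is a frequency. The venue list above also contains a category that is itself called `Neighborhood` (the Upper Beaches row), so a key column with that same name can overwrite that category's dummy column. The sketch below uses made-up venues and a hypothetical key label, `Neighbourhood Name`, to keep the two apart.

```python
import pandas as pd

# Made-up venues; one of them has the category "Neighborhood".
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B"],
    "Venue Category": ["Café", "Neighborhood", "Park", "Pub"],
})

# One 0/1 column per category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Hypothetical key label chosen so it cannot clash with a category name.
key = "Neighbourhood Name"
onehot.insert(0, key, venues["Neighbourhood"])

# Mean of 0/1 indicators = relative frequency of each category per neighbourhood.
grouped = onehot.groupby(key).mean().reset_index()
print(grouped)
```

`DataFrame.insert` places the key in the first position directly and raises an error if a column with that name already exists, which makes a name clash visible.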


## Build the Top 10 venues into a dataframe

In [148]:
# Return the num_top_venues most frequent venue categories for one neighbourhood row.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [149]:
import numpy as np

In [154]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)


print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(38, 11)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Steakhouse,American Restaurant,Thai Restaurant,Restaurant,Asian Restaurant,Bakery,Bar,Clothing Store
1,Berczy Park,Coffee Shop,Restaurant,Cocktail Bar,Café,Farmers Market,Steakhouse,Pub,Seafood Restaurant,Bakery,Cheese Shop
2,"Brockton,Exhibition Place,Parkdale Village",Coffee Shop,Café,Breakfast Spot,Bar,Stadium,Burrito Place,Italian Restaurant,Climbing Gym,Office,Furniture / Home Store
3,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Auto Workshop,Park,Pizza Place,Moving Target,Recording Studio,Restaurant,Burrito Place,Brewery,Skate Park
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Lounge,Airport Service,Airport Terminal,Plane,Boutique,Harbor / Marina,Airport,Airport Food Court,Airport Gate,Boat or Ferry
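
The ranking step can be sketched on a small frequency table. The data below is made up, and `ordinal` is a hypothetical helper that replaces the `try`/`except` used to build the column names. Note that when a neighbourhood has fewer distinct categories than the number of slots requested, the remaining slots are filled with zero-frequency categories, so the lower ranks of sparse neighbourhoods carry no information.

```python
import pandas as pd

# Made-up frequency table in the same layout as toronto_grouped.
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Café": [0.5, 0.0],
    "Park": [0.3, 0.2],
    "Pub": [0.2, 0.8],
})


def ordinal(n):
    """Format 1 as '1st', 2 as '2nd', 3 as '3rd', 4 as '4th', 11 as '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


top_n = 2
freqs = grouped.set_index("Neighborhood")

# For each row, keep the names of the top_n largest frequencies.
top = freqs.apply(lambda row: pd.Series(row.nlargest(top_n).index), axis=1)
top.columns = [f"{ordinal(i + 1)} Most Common Venue" for i in range(top_n)]
top = top.reset_index()
print(top)
```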


## Cluster Neighborhoods (5 clusters)

 **Calculate clustering using k-means algorithm.**

In [155]:
# import k-means from clustering stage
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 3, 2, 2, 2, 2, 1,
       2, 0, 2, 2, 0, 4, 2, 2, 2, 2, 2, 2, 2, 1, 2], dtype=int32)
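
To make the clustering step less of a black box, here is a minimal NumPy sketch of the basic k-means loop (Lloyd's iterations). It is not scikit-learn's implementation: the starting centroids are passed in by hand, and the toy frequency vectors are made up.

```python
import numpy as np


def kmeans_sketch(points, init_centroids, n_iter=10):
    """Plain Lloyd iterations: assign points to the nearest centroid,
    then move each centroid to the mean of its assigned points."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Made-up frequency vectors: two rows dominated by the first category,
# two rows dominated by the second.
toy = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centroids = kmeans_sketch(toy, init_centroids=toy[[0, 2]])
print(labels)
print(centroids)
```

The label array has one entry per input row, in the same order as the rows passed in. For the notebook, that order is the row order of `toronto_grouped`.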

#### Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [156]:
toronto_merged = df_toronto_data.copy()  # work on a copy to avoid SettingWithCopyWarning

# add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

print(toronto_merged.shape)
toronto_merged.head() # check the last columns!

(38, 17)


Unnamed: 0,Postalcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,M4E,East Toronto,The Beaches,M4E,43.676357,-79.293031,2,Coffee Shop,Gym / Fitness Center,Pub,Women's Store,Discount Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
41,M4K,East Toronto,"The Danforth West,Riverdale",M4K,43.679557,-79.352188,2,Greek Restaurant,Coffee Shop,Ice Cream Shop,Bookstore,Italian Restaurant,Yoga Studio,Cosmetics Shop,Brewery,Bubble Tea Shop,Restaurant
42,M4L,East Toronto,"The Beaches West,India Bazaar",M4L,43.668999,-79.315572,2,Park,Pizza Place,Pub,Fast Food Restaurant,Board Shop,Liquor Store,Fish & Chips Shop,Burger Joint,Sandwich Place,Burrito Place
43,M4M,East Toronto,Studio District,M4M,43.659526,-79.340923,1,Café,Coffee Shop,Bakery,Italian Restaurant,American Restaurant,Yoga Studio,Coworking Space,Seafood Restaurant,Sandwich Place,Cheese Shop
44,M4N,Central Toronto,Lawrence Park,M4N,43.72802,-79.38879,2,Bus Line,Park,Dim Sum Restaurant,Swim School,Women's Store,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
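
`kmeans.labels_` follows the row order of `toronto_grouped`, which `groupby` sorts by neighbourhood name, while the location frame is ordered by postal code. Assigning the array positionally can therefore attach a label to the wrong neighbourhood. The sketch below attaches labels by name instead. The frames, names, and labels are made-up stand-ins.

```python
import numpy as np
import pandas as pd

# Toy stand-ins: `grouped` is sorted by name (as groupby output is),
# `locations` is in a different, postal-code-like order.
grouped = pd.DataFrame({"Neighborhood": ["Annex", "Beaches", "Christie"]})
labels = np.array([0, 1, 0])  # one label per row of `grouped`
locations = pd.DataFrame({
    "Neighbourhood": ["Christie", "Annex", "Beaches"],
    "Latitude": [43.67, 43.67, 43.68],
})

# Pair each label with its neighbourhood name first...
by_name = grouped.assign(**{"Cluster Labels": labels}).set_index("Neighborhood")

# ...then join on the name, so row order no longer matters.
merged = locations.join(by_name, on="Neighbourhood")
print(merged)
```

Here a positional assignment would have given Christie, Annex and Beaches the labels 0, 1, 0, whereas the join by name gives 0, 0, 1.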


## Finally, let's visualize the resulting clusters

  - **Build cluster dataset and plot the map**

In [157]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [159]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine clusters

### Cluster 1

In [160]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
65,Central Toronto,-79.405678,0,Coffee Shop,Café,Sandwich Place,Pizza Place,History Museum,Indian Restaurant,Pharmacy,Cosmetics Shop,Pub,Burger Joint
68,Downtown Toronto,-79.39442,0,Airport Lounge,Airport Service,Airport Terminal,Plane,Boutique,Harbor / Marina,Airport,Airport Food Court,Airport Gate,Boat or Ferry


### Cluster 2

In [161]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
43,East Toronto,-79.340923,1,Café,Coffee Shop,Bakery,Italian Restaurant,American Restaurant,Yoga Studio,Coworking Space,Seafood Restaurant,Sandwich Place,Cheese Shop
52,Downtown Toronto,-79.38316,1,Japanese Restaurant,Coffee Shop,Sushi Restaurant,Gay Bar,Restaurant,Burger Joint,Café,Pub,Bubble Tea Shop,Men's Store
63,Central Toronto,-79.416936,1,Pool,Music Venue,Garden,Women's Store,Dog Run,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
84,West Toronto,-79.48445,1,Coffee Shop,Pizza Place,Café,Sushi Restaurant,Diner,Italian Restaurant,Gym,Bookstore,Food,Bar


### Cluster 3

In [162]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,East Toronto,-79.293031,2,Coffee Shop,Gym / Fitness Center,Pub,Women's Store,Discount Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
41,East Toronto,-79.352188,2,Greek Restaurant,Coffee Shop,Ice Cream Shop,Bookstore,Italian Restaurant,Yoga Studio,Cosmetics Shop,Brewery,Bubble Tea Shop,Restaurant
42,East Toronto,-79.315572,2,Park,Pizza Place,Pub,Fast Food Restaurant,Board Shop,Liquor Store,Fish & Chips Shop,Burger Joint,Sandwich Place,Burrito Place
44,Central Toronto,-79.38879,2,Bus Line,Park,Dim Sum Restaurant,Swim School,Women's Store,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
45,Central Toronto,-79.390197,2,Park,Breakfast Spot,Burger Joint,Food & Drink Shop,Dance Studio,Clothing Store,Sandwich Place,Hotel,Grocery Store,Gym
46,Central Toronto,-79.405678,2,Sporting Goods Shop,Coffee Shop,Clothing Store,Salon / Barbershop,Sandwich Place,Diner,Rental Car Location,Furniture / Home Store,Dessert Shop,Chinese Restaurant
47,Central Toronto,-79.38879,2,Pizza Place,Dessert Shop,Sandwich Place,Italian Restaurant,Seafood Restaurant,Café,Sushi Restaurant,Coffee Shop,Chinese Restaurant,Diner
48,Central Toronto,-79.38316,2,Playground,Gym,Tennis Court,Park,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Diner,Electronics Store
49,Central Toronto,-79.400049,2,Pub,Coffee Shop,American Restaurant,Light Rail Station,Sushi Restaurant,Bagel Shop,Sports Bar,Supermarket,Fried Chicken Joint,Pizza Place
50,Downtown Toronto,-79.377529,2,Park,Playground,Trail,Diner,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


### Cluster 4

In [163]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, \
                   toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,Downtown Toronto,-79.387383,3,Coffee Shop,Café,Italian Restaurant,Burger Joint,Bar,Thai Restaurant,Falafel Restaurant,Bubble Tea Shop,Chinese Restaurant,Indian Restaurant


### Cluster 5

In [164]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, \
                   toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
69,Downtown Toronto,-79.374846,4,Coffee Shop,Restaurant,Café,Italian Restaurant,Beer Bar,Seafood Restaurant,Pub,Hotel,Cocktail Bar,Bakery
