# Instructions

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

1. Start by creating a new Notebook for this assignment.
2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

Format: ![What dataframe should look like](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/7JXaz3NNEeiMwApe4i-fLg_40e690ae0e927abda2d4bde7d94ed133_Screen-Shot-2018-06-18-at-7.17.57-PM.png?expiry=1561766400000&hmac=d8kll666HCK-Njmyedf9N6OuZwJlu8ASZ3bvcJG7ST8)

3. To create the above dataframe:

   - The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.
   - Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
   - More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
   - If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
   - Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
   - In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
4. Submit a link to your Notebook on your GitHub repository. (10 marks)

Note: There are different web scraping libraries and packages in Python. One of the most common packages is BeautifulSoup. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/

The package is so popular that there is a plethora of tutorials and examples of how to use it. Here is a very good YouTube video on how to use the BeautifulSoup package: https://www.youtube.com/watch?v=ng2o98k983k

Use the BeautifulSoup package or any other way you are comfortable with to transform the data in the table on the Wikipedia page into the above pandas dataframe.

# Project

### Get data from website

In [128]:
import pandas as pd
import numpy as np

In [129]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' 

In [130]:
# read_html returns a list with one DataFrame per table found on the page
tables = pd.read_html(url, header=0)

# we want the first table in the list
df = tables[0]

In [131]:
df.head(5)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


### Clean Data

In [132]:
df.replace("Not assigned", np.nan, inplace=True)    
df.dropna(subset=["Borough"], axis=0, inplace=True) # drop NaN values in Borough column

In [133]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


In [134]:
# Reset index because we dropped some rows.
df.reset_index(drop=True, inplace=True) 
df.head(2)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village


In [135]:
# Sort with NaN first to surface the one row that is missing a Neighbourhood.
df.sort_values(by=["Neighbourhood"], na_position='first', inplace=True) 
df.head(2)

Unnamed: 0,Postcode,Borough,Neighbourhood
6,M7A,Queen's Park,
49,M5H,Downtown Toronto,Adelaide


In [136]:
# This only works because there is exactly one missing value (M7A, Queen's Park).
# With more than one, each row would need to be filled from its own Borough.
df.replace(np.nan, "Queen's Park", inplace=True)
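
The cell above hard-codes the single missing value. A more general approach, sketched here on made-up rows (the `toy` frame and the `M9Z` / `Example Borough` values are illustrative assumptions, not scraped data), fills each missing neighbourhood from the borough in the same row:

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the scraped table (illustrative values only).
toy = pd.DataFrame({
    "Postcode": ["M7A", "M3A", "M9Z"],
    "Borough": ["Queen's Park", "North York", "Example Borough"],
    "Neighbourhood": [np.nan, "Parkwoods", np.nan],
})

# Fill each missing neighbourhood with the borough from the same row.
# fillna aligns the two Series on the index, so rows are matched one to one.
toy["Neighbourhood"] = toy["Neighbourhood"].fillna(toy["Borough"])
print(toy)
```

Because the fill is row-wise, it should behave the same whether one row or many rows lack a neighbourhood.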

In [137]:
df.sort_index(inplace=True)
df.head(2)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village


In [138]:
# Group the dataframe by the postcodes
grouped = df.groupby(['Postcode'])

# Isolate Neighbourhood
group_Neighbourhood = grouped['Neighbourhood']

#Find Unique Values
uniqNeigh = group_Neighbourhood.unique()

# Do the same with Borough
group_borough = grouped['Borough']
uniqBor = group_borough.unique()

# Reconstruct dataframe
newDF = pd.DataFrame(uniqBor)
newDF['Neighbourhood'] = uniqNeigh
df = newDF.reset_index()

In [139]:
df.head(2)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,[Scarborough],"[Rouge, Malvern]"
1,M1C,[Scarborough],"[Highland Creek, Rouge Hill, Port Union]"


In [140]:
# Pull borough names out of lists format
lst = []
for i in df['Borough']:
    lst.append(i[0])

In [141]:
# Re-add names
df['Borough'] = lst

In [142]:
df.head(2)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"[Rouge, Malvern]"
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]"


In [143]:
# Pull out Neighbourhood series
seriesNeigh = df['Neighbourhood']

# Change it to string
seriesNeigh = seriesNeigh.apply(', '.join)

# Replace column values
df['Neighbourhood'] = seriesNeigh

In [144]:
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
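
The grouping cells above (In [138] to In [143]) could probably be condensed into a single `groupby(...).agg(...)` call. The following is only a sketch on a small hand-made frame (the `toy` rows are illustrative); unlike `unique()`, it does not remove duplicate neighbourhood names within a postcode:

```python
import pandas as pd

# Toy rows (illustrative), one row per neighbourhood as in the scraped table.
toy = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M5A", "M5A", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Downtown Toronto",
                "Downtown Toronto", "Scarborough"],
    "Neighbourhood": ["Rouge", "Malvern", "Harbourfront", "Regent Park", "Woburn"],
})

# One row per postcode: keep the first borough, join neighbourhoods with commas.
combined = (
    toy.groupby("Postcode", as_index=False)
       .agg({"Borough": "first", "Neighbourhood": ", ".join})
)
print(combined)
```

This yields plain strings directly, so the separate steps that unpack the one-element Borough lists and join the Neighbourhood lists are not needed.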


In [154]:
df.shape

(103, 6)

### Add Lat & Long Coordinates for Neighborhood

In [155]:
# Install geocoder
!conda install -c conda-forge geocoder --yes

Solving environment: \ ^C
failed

CondaError: KeyboardInterrupt



I tried running the code below, but the loop hung and never finished, so I downloaded the .csv file of coordinates instead.

In [146]:
# import geocoder 

# lat = []
# long = []

# for i in range(len(df['Postcode'])):
#     # loop until you get the coordinates
#     postal_code = df['Postcode']
#     g = geocoder.google('{}, Toronto, Ontario'.format(str(postal_code[i])))
#     lat_lng_coords = g.latlng

#     lat.append(lat_lng_coords[0])
#     long.append(lat_lng_coords[1])
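
One way to keep a geocoding loop from hanging is to cap the number of attempts per postal code. The sketch below is an assumption-laden illustration, not the notebook's method: `lookup_with_retries` and `fake_lookup` are hypothetical names, and the fake lookup stands in for a real geocoding call so that the example runs offline.

```python
import time


def lookup_with_retries(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is any callable returning a (lat, lng) pair or None. It is a
    stand-in for a real geocoding call (an assumption of this sketch).
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # pause before trying again
    return None  # give up instead of looping forever


# Fake lookup for demonstration: returns None twice, then a coordinate pair.
calls = {"n": 0}


def fake_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else (43.806686, -79.194353)


result = lookup_with_retries(fake_lookup, "M1B, Toronto, Ontario")
always_none = lookup_with_retries(lambda q: None, "M1C, Toronto, Ontario",
                                  max_attempts=3)
print(result, always_none)
```

A caller can then record `None` for postal codes that never resolve and move on to the next one.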

In [147]:
# Download coordinate data
!wget -q -O 'coordinates.csv' http://cocl.us/Geospatial_data

In [148]:
coordDF = pd.read_csv('coordinates.csv')

In [149]:
coordDF.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [150]:
coordDF = coordDF.sort_values(by=['Postal Code'])
coordDF.head(2)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497


In [151]:
df.head(2)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"


In [152]:
# Combine the two dataframes on their index; the rows line up only because
# both frames are ordered by postal code
df = pd.merge(df, coordDF, left_index=True, right_index=True)

In [153]:
df.drop(columns=['Latitude_y', 'Longitude_y'], inplace=True)

ValueError: labels ['Latitude_y' 'Longitude_y'] not contained in axis
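
pandas only appends `_x` / `_y` suffixes when both frames share a column name, which may be why there is no `Latitude_y` column to drop here. Separately, merging on the postal code itself rather than on row position would not depend on both frames being in the same order. A minimal sketch on toy frames (`left` and `coords` are illustrative, with the coordinate rows deliberately shuffled):

```python
import pandas as pd

# Toy frames (illustrative); the coordinate rows are in a different order.
left = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough"] * 3,
})
coords = pd.DataFrame({
    "Postal Code": ["M1E", "M1B", "M1C"],
    "Latitude": [43.763573, 43.806686, 43.784535],
    "Longitude": [-79.188711, -79.194353, -79.160497],
})

# Match rows by key; validate raises if either side has duplicate keys.
merged = pd.merge(
    left, coords,
    left_on="Postcode", right_on="Postal Code",
    how="left", validate="one_to_one",
)
print(merged)
```

Both key columns are kept because their names differ, which matches the `Postcode` and `Postal Code` columns that appear side by side later in this notebook.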

### Data Clustering & Analysis

In [156]:
# The code was removed by Watson Studio for sharing.

In [41]:
!conda install -c conda-forge folium=0.5.0 --yes 

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ca-certificates-2019.6.16  |       hecc5488_0         145 KB  conda-forge
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    altair-2.2.2               |           py35_1         462 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.0 MB

The following NEW packages will

In [157]:
import json
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import requests

In [158]:
dfDowntown = df[df['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
dfDowntown.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,M4W,43.679563,-79.377529
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",M4X,43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,M4Y,43.66586,-79.38316
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",M5A,43.65426,-79.360636
4,M5B,Downtown Toronto,"Ryerson, Garden District",M5B,43.657162,-79.378937
5,M5C,Downtown Toronto,St. James Town,M5C,43.651494,-79.375418
6,M5E,Downtown Toronto,Berczy Park,M5E,43.644771,-79.373306
7,M5G,Downtown Toronto,Central Bay Street,M5G,43.657952,-79.387383
8,M5H,Downtown Toronto,"Adelaide, King, Richmond",M5H,43.650571,-79.384568
9,M5J,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",M5J,43.640816,-79.381752


In [159]:
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET, 
    version, 
    lat, 
    long, 
    radius, 
    LIMIT)

In [160]:
results = requests.get(url).json()

In [161]:
results

{'meta': {'code': 200, 'requestId': '5d1a6d6dad1789002cb70bf6'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4adcb343f964a520e32e21e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/food_grocery_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d118951735',
         'name': 'Grocery Store',
         'pluralName': 'Grocery Stores',
         'primary': True,
         'shortName': 'Grocery Store'}],
       'id': '4adcb343f964a520e32e21e3',
       'location': {'address': '446 Summerhill Ave',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'crossStreet': 'btwn. MacLennan Ave. and Glen Rd.',
        'distance': 764,
        'formattedAddress': ['446 Summerhill Ave (btwn. MacLennan Ave. and Glen Rd.)',
         'Toronto

In [162]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list=[]
    for name, lat, long in zip(names, latitudes, longitudes):
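        # note: this prints the whole `names` Series on every iteration;
        # print(name) would log only the current neighbourhood
        print(names)
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(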
            CLIENT_ID,
            CLIENT_SECRET, 
            version, 
            lat, 
            long, 
            radius, 
            LIMIT)
        
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(name, lat, long, 
                             v['venue']['name'], 
                             v['venue']['location']['lat'], 
                             v['venue']['location']['lng'], 
                             v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                                 'Neighborhood Latitude', 
                                 'Neighborhood Longitude', 
                                 'Venue', 
                                 'Venue Latitude', 
                                 'Venue Longitude', 
                                 'Venue Category']
        
    return nearby_venues
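
`getNearbyVenues` indexes straight into the response, so a failed request or a venue with no category would raise an exception. The sketch below shows one defensive way to flatten a response. `extract_venue_rows` is a hypothetical helper, and `sample` is a hand-written dictionary shaped like the output printed earlier, with made-up values:

```python
def extract_venue_rows(name, lat, lng, response_json):
    """Flatten one explore-style response into tuples, tolerating missing keys."""
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0].get("name") if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written sample shaped like the response shown above (values made up).
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Grocer",
               "location": {"lat": 43.68, "lng": -79.37},
               "categories": [{"name": "Grocery Store"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 43.67, "lng": -79.38},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Rosedale", 43.679563, -79.377529, sample)
empty = extract_venue_rows("Nowhere", 0.0, 0.0, {"meta": {"code": 400}})
print(rows, empty)
```

An error response simply contributes no rows, so one bad request would not abort the whole loop.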

In [163]:
dfDowntown.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,M4W,43.679563,-79.377529
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",M4X,43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,M4Y,43.66586,-79.38316
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",M5A,43.65426,-79.360636
4,M5B,Downtown Toronto,"Ryerson, Garden District",M5B,43.657162,-79.378937


In [164]:
toronto_venues = getNearbyVenues(names=dfDowntown['Neighbourhood'],
                                latitudes= dfDowntown['Latitude'],
                                longitudes= dfDowntown['Longitude']
                                )

0                                              Rosedale
1                           Cabbagetown, St. James Town
2                                  Church and Wellesley
3                             Harbourfront, Regent Park
4                              Ryerson, Garden District
5                                        St. James Town
6                                           Berczy Park
7                                    Central Bay Street
8                              Adelaide, King, Richmond
9     Harbourfront East, Toronto Islands, Union Station
10             Design Exchange, Toronto Dominion Centre
11                       Commerce Court, Victoria Hotel
12                       Harbord, University of Toronto
13            Chinatown, Grange Park, Kensington Market
14    CN Tower, Bathurst Quay, Island airport, Harbo...
15                      Stn A PO Boxes 25 The Esplanade
16               First Canadian Place, Underground city
17                                             C

In [165]:
print(toronto_venues.shape)
toronto_venues.head()

(1288, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rosedale,43.679563,-79.377529,Mooredale House,43.678631,-79.380091,Building
1,Rosedale,43.679563,-79.377529,Rosedale Park,43.682328,-79.378934,Playground
2,Rosedale,43.679563,-79.377529,Whitney Park,43.682036,-79.373788,Park
3,Rosedale,43.679563,-79.377529,Alex Murray Parkette,43.6783,-79.382773,Park
4,Rosedale,43.679563,-79.377529,Milkman's Lane,43.676352,-79.373842,Trail


In [95]:
print("There are {} unique categories.".format(len(toronto_venues['Venue Category'].unique())))

There are 209 unique categories.


In [166]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,55,55,55,55,55,55
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",16,16,16,16,16,16
"Cabbagetown, St. James Town",46,46,46,46,46,46
Central Bay Street,88,88,88,88,88,88
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,15,15,15,15,15,15
Church and Wellesley,87,87,87,87,87,87
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100


In [167]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0


In [101]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0,0.0,0.0625,0.0625,0.0625,0.125,0.125,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.011364,0.0,0.011364,0.0,0.011364,0.0,0.0
5,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.06,0.0,0.0,0.03,0.01,0.0,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Church and Wellesley,0.011494,0.0,0.011494,0.011494,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011494,0.0,0.011494,0.0,0.011494,0.0
8,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0
9,"Design Exchange, Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0


In [169]:
toronto_onehot.shape

(1288, 209)

In [174]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.shape

(18, 209)
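
To make the one-hot and group-mean steps concrete, here is a small sketch on invented data (the `toy` frame is illustrative). Each row of the result is the share of a neighbourhood's venues that fall into each category. It uses `DataFrame.insert`, which raises a `ValueError` if the column name already exists. That guard may matter here: if a venue category is itself called "Neighborhood", plain assignment would overwrite that dummy column in place, which would be consistent with "Yoga Studio" appearing as the first column in the one-hot output above.

```python
import pandas as pd

# Toy venue list (illustrative): four venues in A, two in B.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Café", "Park", "Park"],
})

# One indicator column per category, stored as floats.
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)

# insert() refuses to overwrite an existing column of the same name.
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# Mean of 0/1 indicators = relative frequency of each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so the values can be read as proportions and compared across neighbourhoods with different venue counts.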

In [172]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0          Coffee Shop  0.06
1                 Café  0.05
2  American Restaurant  0.04
3           Steakhouse  0.04
4                  Bar  0.04


----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1        Cocktail Bar  0.05
2  Italian Restaurant  0.04
3                Café  0.04
4            Beer Bar  0.04


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0    Airport Lounge  0.12
1  Airport Terminal  0.12
2   Airport Service  0.12
3             Plane  0.06
4       Coffee Shop  0.06


----Cabbagetown, St. James Town----
                venue  freq
0         Coffee Shop  0.09
1          Restaurant  0.07
2              Bakery  0.04
3  Italian Restaurant  0.04
4         Pizza Place  0.04


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.16
1                Café  0.05
2  Italian 

In [175]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [176]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past 3rd have no entry in `indicators`
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Bar,American Restaurant,Thai Restaurant,Cosmetics Shop,Bakery,Gym,Hotel
1,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Cheese Shop,Steakhouse,Beer Bar,Italian Restaurant,Farmers Market,Café,Bakery
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Terminal,Airport Service,Harbor / Marina,Boutique,Sculpture Garden,Bar,Plane,Boat or Ferry,Airport Gate
3,"Cabbagetown, St. James Town",Coffee Shop,Restaurant,Park,Italian Restaurant,Café,Pizza Place,Bakery,Pub,Bar,Bank
4,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Middle Eastern Restaurant,Sandwich Place,Burger Joint,Japanese Restaurant,Bubble Tea Shop,Bar,Bakery
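
The column-naming loop above falls back to "th" whenever the `indicators` lookup fails. That is fine for ten columns, but it would label position 21 as "21th". A small helper (hypothetical, named `ordinal` here) that handles the general case might look like this:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Build the same column names as the cell above for the top ten venues.
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```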


In [181]:
k = 5
toronto_grouped_cluster = toronto_grouped.drop(columns='Neighborhood')

kMeans = KMeans(n_clusters=k, random_state=0).fit(toronto_grouped_cluster)

kMeans.labels_[0:10]

array([0, 0, 3, 0, 0, 2, 4, 0, 0, 0], dtype=int32)
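
As a rough illustration of what the labels mean, the sketch below runs `KMeans` on four invented frequency rows (the `features` array is made up, with two coffee-heavy and two park-heavy areas). The label numbers themselves are arbitrary. Only the grouping carries meaning, which is why descriptive cluster names have to be chosen after inspecting each cluster's members.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency rows (illustrative): columns are [coffee share, park share].
features = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

# random_state fixes the initialisation so repeated runs give the same labels.
model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(features)
labels = model.labels_
print(labels)
```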

In [188]:
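# add clustering labels; the merged output below has a 'Cluster Labels' column,
# so this step is assumed to have been run
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kMeans.labels_)

# rename 'Neighbourhood' to 'Neighborhood' so it matches the venues table
dfDowntown['Neighborhood'] = dfDowntown['Neighbourhood']
dfDowntown.drop(columns=['Neighbourhood'], inplace=True)
dfDowntown.head()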

Unnamed: 0,Postcode,Borough,Postal Code,Latitude,Longitude,Neighborhood
0,M4W,Downtown Toronto,M4W,43.679563,-79.377529,Rosedale
1,M4X,Downtown Toronto,M4X,43.667967,-79.367675,"Cabbagetown, St. James Town"
2,M4Y,Downtown Toronto,M4Y,43.66586,-79.38316,Church and Wellesley
3,M5A,Downtown Toronto,M5A,43.65426,-79.360636,"Harbourfront, Regent Park"
4,M5B,Downtown Toronto,M5B,43.657162,-79.378937,"Ryerson, Garden District"


In [189]:
toronto_merged = dfDowntown

# join the downtown dataframe (which has latitude/longitude) with the sorted venues table
# to add the cluster label and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns

Unnamed: 0,Postcode,Borough,Postal Code,Latitude,Longitude,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,M4W,43.679563,-79.377529,Rosedale,1,Park,Building,Playground,Trail,Women's Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run
1,M4X,Downtown Toronto,M4X,43.667967,-79.367675,"Cabbagetown, St. James Town",0,Coffee Shop,Restaurant,Park,Italian Restaurant,Café,Pizza Place,Bakery,Pub,Bar,Bank
2,M4Y,Downtown Toronto,M4Y,43.66586,-79.38316,Church and Wellesley,0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Burger Joint,Hotel,Mediterranean Restaurant,Café,Bubble Tea Shop
3,M5A,Downtown Toronto,M5A,43.65426,-79.360636,"Harbourfront, Regent Park",0,Coffee Shop,Bakery,Pub,Park,Mexican Restaurant,Breakfast Spot,Restaurant,Café,Theater,Cosmetics Shop
4,M5B,Downtown Toronto,M5B,43.657162,-79.378937,"Ryerson, Garden District",0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Bubble Tea Shop,Tea Room,Japanese Restaurant,Lingerie Store,Italian Restaurant


In [192]:
# create map
map_clusters = folium.Map(location=[lat, long], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Caffeine Central

In [193]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,"Cabbagetown, St. James Town",0,Coffee Shop,Restaurant,Park,Italian Restaurant,Café,Pizza Place,Bakery,Pub,Bar,Bank
2,Downtown Toronto,Church and Wellesley,0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Burger Joint,Hotel,Mediterranean Restaurant,Café,Bubble Tea Shop
3,Downtown Toronto,"Harbourfront, Regent Park",0,Coffee Shop,Bakery,Pub,Park,Mexican Restaurant,Breakfast Spot,Restaurant,Café,Theater,Cosmetics Shop
4,Downtown Toronto,"Ryerson, Garden District",0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Bubble Tea Shop,Tea Room,Japanese Restaurant,Lingerie Store,Italian Restaurant
5,Downtown Toronto,St. James Town,0,Coffee Shop,Café,Restaurant,Hotel,Bakery,Cosmetics Shop,Breakfast Spot,Gastropub,Italian Restaurant,Cocktail Bar
6,Downtown Toronto,Berczy Park,0,Coffee Shop,Cocktail Bar,Seafood Restaurant,Cheese Shop,Steakhouse,Beer Bar,Italian Restaurant,Farmers Market,Café,Bakery
7,Downtown Toronto,Central Bay Street,0,Coffee Shop,Café,Italian Restaurant,Middle Eastern Restaurant,Sandwich Place,Burger Joint,Japanese Restaurant,Bubble Tea Shop,Bar,Bakery
8,Downtown Toronto,"Adelaide, King, Richmond",0,Coffee Shop,Café,Steakhouse,Bar,American Restaurant,Thai Restaurant,Cosmetics Shop,Bakery,Gym,Hotel
9,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",0,Coffee Shop,Hotel,Aquarium,Italian Restaurant,Café,Brewery,Scenic Lookout,Fried Chicken Joint,Bakery,Restaurant
10,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",0,Coffee Shop,Café,Hotel,Restaurant,Bakery,Italian Restaurant,Bar,Deli / Bodega,Gastropub,Seafood Restaurant


### Walk in the Park

In [194]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,Rosedale,1,Park,Building,Playground,Trail,Women's Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


### Young Aspiring Professionals

In [195]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Downtown Toronto,"Harbord, University of Toronto",2,Café,Italian Restaurant,Bar,Japanese Restaurant,Bookstore,Restaurant,Bakery,Pub,Beer Bar,Sandwich Place
13,Downtown Toronto,"Chinatown, Grange Park, Kensington Market",2,Café,Vegetarian / Vegan Restaurant,Bar,Dumpling Restaurant,Mexican Restaurant,Coffee Shop,Bakery,Vietnamese Restaurant,Chinese Restaurant,Noodle House


### Grown Ups Grocery

In [196]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Downtown Toronto,Christie,4,Grocery Store,Café,Park,Baby Store,Italian Restaurant,Diner,Nightclub,Convenience Store,Restaurant,Coffee Shop
