<h1>Parse the table on the Wikipedia page and load it into a dataframe</h1>

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup


url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html = requests.get(url).text  # raw HTML of the page
soup = BeautifulSoup(html, 'lxml')

In [2]:
table = soup.find('table',{'class':'wikitable sortable'})
table_rows = table.find_all('tr')

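table1 = []
for tr in table_rows:
    cells = tr.find_all('td')
    # keep the stripped text of each non-empty cell; header rows have no <td> and are skipped
    row = [cell.text.strip() for cell in cells if cell.text.strip()]
    if row:
        table1.append(row)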
    
df = pd.DataFrame(table1, columns=["Postcode", "Borough", "Neighbourhood"])
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
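
To make the row extraction above easier to follow, here is a minimal offline sketch that applies the same idea (one list of `<td>` texts per `<tr>`, header rows skipped) to a tiny inline table. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without network access. The class name `TableRows` and the sample HTML are illustrative assumptions, not part of the notebook.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the stripped text of every <td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the <tr> currently being read
        self._cell = None   # text fragments of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # text outside a <td> (for example inside <th>) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # header rows hold only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


# Hypothetical sample that mimics the shape of the Wikipedia table
sample = """
<table>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample)
rows = parser.rows
print(rows)
```

Unlike the notebook cell, this sketch keeps empty cells in place, which avoids shifting later values into the wrong column if a cell happens to be blank.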


<b>Ignore cells whose borough is Not assigned.</b>

In [3]:
index = df[ df['Borough'] =='Not assigned'].index
df.drop(index , inplace=True)
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Queen's Park,Not assigned
9,M9A,Queen's Park,Queen's Park
10,M1B,Scarborough,Rouge
11,M1B,Scarborough,Malvern
13,M3B,North York,Don Mills North
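
As an alternative to `drop(..., inplace=True)`, the same filter can be written as a boolean mask. The sketch below runs on a small hand-made frame (the name `toy` is hypothetical) and also resets the index, which the notebook leaves with gaps until the later `groupby`.

```python
import pandas as pd

# Hypothetical three-row sample with the same columns as df
toy = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows whose borough is assigned, then renumber the index from 0
assigned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```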


<b>More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. <br>These two rows will be combined into one row, with the neighborhoods separated by a comma, as shown in row 11 of the above table.</b>

In [4]:
group = df.groupby(['Postcode','Borough'], sort=False).agg( ', '.join)
df=group.reset_index()
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Not assigned
5,M9A,Queen's Park,Queen's Park
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"
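
The grouping step can be checked on a few rows in isolation. This is a small sketch on hand-made data (the frame `toy` is hypothetical) showing how `groupby(..., sort=False)` keeps the order of first appearance while `', '.join` concatenates the neighbourhood names of each postal code.

```python
import pandas as pd

# Hypothetical sample: M6A appears twice, the other postcodes once
toy = pd.DataFrame({
    "Postcode": ["M6A", "M6A", "M1B", "M3B"],
    "Borough": ["North York", "North York", "Scarborough", "North York"],
    "Neighbourhood": ["Lawrence Heights", "Lawrence Manor", "Rouge", "Don Mills North"],
})

combined = (
    toy.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)      # join all neighbourhood names within one group
    .reset_index()
)
print(combined)
```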


<b>If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. <br>So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.</b>

In [5]:
df.loc[df['Neighbourhood'] =='Not assigned' , 'Neighbourhood'] = df['Borough']
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Queen's Park,Queen's Park
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"
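
The replacement rule can also be expressed with `Series.mask`, which swaps in the borough wherever the condition holds. A minimal sketch on hand-made data (the frame `toy` is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "Borough": ["Queen's Park", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods"],
})

# Where the neighbourhood is unassigned, take the value from the Borough column
unassigned = toy["Neighbourhood"] == "Not assigned"
toy["Neighbourhood"] = toy["Neighbourhood"].mask(unassigned, toy["Borough"])
print(toy)
```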


In [6]:
df.shape

(103, 3)

<h1>Import geographical coordinates of the postal codes from a CSV file</h1>

In [7]:
!wget -q -O 'Toronto_coordinates.csv'  http://cocl.us/Geospatial_data
df_coordinates = pd.read_csv('Toronto_coordinates.csv')
df_coordinates.columns=['Postcode','Latitude','Longitude']
df_coordinates.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<b>Merge the two data frames based on the postal code</b>

In [8]:
df_Toronto = pd.merge(df,
                 df_coordinates[['Postcode','Latitude', 'Longitude']],
                 on='Postcode')
df_Toronto

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
5,M9A,Queen's Park,Queen's Park,43.667856,-79.532242
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
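
`pd.merge` defaults to an inner join, so a postal code without coordinates would silently disappear from `df_Toronto`. The sketch below shows one way to surface such rows with a left join and `indicator=True`. The data is hand-made, and `M9Z` is a made-up postal code used only to trigger the unmatched case.

```python
import pandas as pd

areas = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],   # M9Z is hypothetical
    "Borough": ["North York", "North York", "Example Borough"],
})
coords = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# Left join keeps every area; validate raises if a postcode is duplicated
checked = areas.merge(coords, on="Postcode", how="left",
                      validate="one_to_one", indicator=True)

# Rows marked 'left_only' have no coordinates
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```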


<h1>Explore and cluster neighborhoods in Toronto</h1>

Install and import all dependencies

In [9]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2019.9.11  |       hecc5488_0         144 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

    cer

In [10]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
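
geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` would raise an `AttributeError`. Below is a small guard, sketched so that it can be tried offline: the helper name `coordinates_or_none` and the fake geocoder are assumptions for illustration. In the notebook you would pass `geolocator.geocode` as the first argument.

```python
from types import SimpleNamespace


def coordinates_or_none(geocode, address):
    """Return (latitude, longitude) for address, or None if nothing was found."""
    location = geocode(address)
    if location is None:
        return None
    return location.latitude, location.longitude


# Stand-in for geolocator.geocode so the sketch runs without network access
def fake_geocode(address):
    if address == "Toronto, ON":
        return SimpleNamespace(latitude=43.653963, longitude=-79.387207)
    return None


found = coordinates_or_none(fake_geocode, "Toronto, ON")
missing = coordinates_or_none(fake_geocode, "Nowhere in particular")
print(found, missing)
```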


### Create a map of Toronto with neighborhoods superimposed on top.

In [16]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_Toronto['Latitude'], df_Toronto['Longitude'], df_Toronto['Borough'], df_Toronto['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Limit to neighbourhoods whose name contains the word Toronto

In [19]:
df_limited = df_Toronto[df_Toronto['Neighbourhood'].str.contains("Toronto")]

# create map of Toronto using latitude and longitude values
map_limited = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_limited['Latitude'], df_limited['Longitude'], df_limited['Borough'], df_limited['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_limited)  
    
map_limited

#### Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

In [11]:
 # @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET
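
Credentials are safer kept out of the notebook itself. One common approach, sketched here, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper name are assumptions, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Return (client_id, client_secret) read from environment variables."""
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")  # assumed names
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return environ[names[0]], environ[names[1]]


# Demonstration with a plain dict standing in for the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID loaded:", bool(client_id))
```

Printing only whether a value was loaded, rather than the value itself, keeps the secret out of the saved cell output.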


In [21]:
LIMIT = 100
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
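
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so an empty result or a venue without a category raises an exception. The sketch below separates the parsing step and makes it tolerant of those cases. The helper name `extract_venue_rows` and the sample payload are hypothetical, and the payload only mirrors the keys that the notebook code already reads.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Turn one explore-style JSON payload into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical payload with one categorised and one uncategorised venue
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 43.68, "lng": -79.33},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 43.69, "lng": -79.34},
               "categories": []}},
]}]}}

venue_rows = extract_venue_rows(sample_payload, "East Toronto", 43.685347, -79.338106)
empty_rows = extract_venue_rows({"response": {}}, "East Toronto", 43.685347, -79.338106)
print(venue_rows)
```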

In [22]:
limited_venues = getNearbyVenues(names=df_limited['Neighbourhood'],
                                   latitudes=df_limited['Latitude'],
                                   longitudes=df_limited['Longitude']
                                  )

East Toronto
Harbourfront East, Toronto Islands, Union Station
CFB Toronto, Downsview East
Design Exchange, Toronto Dominion Centre
North Toronto West
Harbord, University of Toronto
Humber Bay Shores, Mimico South, New Toronto


In [23]:
limited_venues.shape

(281, 7)

#### Check how many venues have been returned for each neighbourhood

In [28]:
limited_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"CFB Toronto, Downsview East",3,3,3,3,3,3
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100
East Toronto,5,5,5,5,5,5
"Harbord, University of Toronto",37,37,37,37,37,37
"Harbourfront East, Toronto Islands, Union Station",100,100,100,100,100,100
"Humber Bay Shores, Mimico South, New Toronto",14,14,14,14,14,14
North Toronto West,22,22,22,22,22,22


#### Let's find out how many unique categories can be curated from all the returned venues

In [29]:
print('There are {} unique categories.'.format(len(limited_venues['Venue Category'].unique())))

There are 100 unique categories.


### Analysing each neighbourhood

In [32]:
# one hot encoding
toronto_onehot = pd.get_dummies(limited_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = limited_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Airport,American Restaurant,Aquarium,Art Gallery,Asian Restaurant,Bakery,Bar,Baseball Stadium,Basketball Stadium,Beer Bar,Beer Store,Bistro,Bookstore,Brewery,Bubble Tea Shop,Burger Joint,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store,Electronics Store,Event Space,Fast Food Restaurant,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gastropub,General Travel,Gift Shop,Greek Restaurant,Gym,Gym / Fitness Center,Health & Beauty Service,History Museum,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Lake,Liquor Store,Lounge,Mexican Restaurant,Monument / Landmark,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plaza,Poutine Place,Pub,Rental Car Location,Restaurant,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Wine Bar,Yoga Studio
0,East Toronto,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,East Toronto,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,East Toronto,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,East Toronto,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,East Toronto,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [33]:
toronto_onehot.shape

(281, 101)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [34]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Airport,American Restaurant,Aquarium,Art Gallery,Asian Restaurant,Bakery,Bar,Baseball Stadium,Basketball Stadium,Beer Bar,Beer Store,Bistro,Bookstore,Brewery,Bubble Tea Shop,Burger Joint,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store,Electronics Store,Event Space,Fast Food Restaurant,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gastropub,General Travel,Gift Shop,Greek Restaurant,Gym,Gym / Fitness Center,Health & Beauty Service,History Museum,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Lake,Liquor Store,Lounge,Mexican Restaurant,Monument / Landmark,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plaza,Poutine Place,Pub,Rental Car Location,Restaurant,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Wine Bar,Yoga Studio
0,"CFB Toronto, Downsview East",0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Design Exchange, Toronto Dominion Centre",0.0,0.04,0.0,0.01,0.02,0.02,0.03,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.08,0.01,0.0,0.01,0.15,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.03,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.06,0.01,0.01,0.0,0.0,0.03,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.04,0.0,0.02,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.0
2,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Harbord, University of Toronto",0.0,0.0,0.0,0.0,0.0,0.054054,0.054054,0.0,0.0,0.027027,0.027027,0.0,0.081081,0.0,0.0,0.0,0.135135,0.027027,0.0,0.0,0.0,0.027027,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.054054,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.054054,0.0,0.0,0.027027,0.0,0.0
4,"Harbourfront East, Toronto Islands, Union Station",0.0,0.0,0.05,0.01,0.0,0.02,0.02,0.02,0.01,0.01,0.0,0.01,0.0,0.03,0.01,0.0,0.04,0.01,0.01,0.0,0.13,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.05,0.01,0.01,0.01,0.0,0.03,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.02,0.01,0.01,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.01,0.03,0.01,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.0
5,"Humber Bay Shores, Mimico South, New Toronto",0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.071429,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,North Toronto West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.045455,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455
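
The two steps above (one-hot encoding, then the per-neighbourhood mean) are easier to verify on a toy example. In this sketch the neighbourhood names `A` and `B` and their venues are made up. The mean of a 0/1 column within a group is simply the share of that group's venues that fall in the category.

```python
import pandas as pd

# Hypothetical venues: A has two parks and one coffee shop, B has one coffee shop
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B"],
    "Venue Category": ["Park", "Park", "Coffee Shop", "Coffee Shop"],
})

# One 0/1 column per category (dtype=int avoids boolean columns)
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Mean of each 0/1 column per neighbourhood = relative frequency
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Each row of `freq` sums to 1, which is a quick sanity check for the real `toronto_grouped` frame as well.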


#### Let's print each neighborhood along with the top 5 most common venues

In [36]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----CFB Toronto, Downsview East----
               venue  freq
0            Airport  0.33
1  Electronics Store  0.33
2               Park  0.33
3       Noodle House  0.00
4                Pub  0.00


----Design Exchange, Toronto Dominion Centre----
                 venue  freq
0          Coffee Shop  0.15
1                 Café  0.08
2                Hotel  0.06
3  American Restaurant  0.04
4           Restaurant  0.04


----East Toronto----
               venue  freq
0               Park   0.4
1  Convenience Store   0.2
2        Coffee Shop   0.2
3       Intersection   0.2
4       Liquor Store   0.0


----Harbord, University of Toronto----
                 venue  freq
0                 Café  0.14
1            Bookstore  0.08
2                  Bar  0.05
3   Italian Restaurant  0.05
4  Japanese Restaurant  0.05


----Harbourfront East, Toronto Islands, Union Station----
            venue  freq
0     Coffee Shop  0.13
1           Hotel  0.05
2        Aquarium  0.05
3            Café  0.

### Let's put that into a *pandas* dataframe

In [37]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"CFB Toronto, Downsview East",Airport,Park,Electronics Store,Food Court,Cosmetics Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store
1,"Design Exchange, Toronto Dominion Centre",Coffee Shop,Café,Hotel,American Restaurant,Restaurant,Gastropub,Steakhouse,Italian Restaurant,Seafood Restaurant,Bar
2,East Toronto,Park,Intersection,Coffee Shop,Convenience Store,Yoga Studio,Food Court,Dance Studio,Deli / Bodega,Dessert Shop,Diner
3,"Harbord, University of Toronto",Café,Bookstore,Bakery,Sandwich Place,Italian Restaurant,Restaurant,Bar,Japanese Restaurant,Theater,Gym
4,"Harbourfront East, Toronto Islands, Union Station",Coffee Shop,Hotel,Aquarium,Café,Italian Restaurant,Scenic Lookout,Brewery,Fried Chicken Joint,Restaurant,Bar
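
The `indicators` list with a fallback to `th` works for the ten columns used here, but it would produce labels such as `21th` for larger values of `num_top_venues`. A small helper, sketched below, handles the general case. The function name `ordinal` is an assumption for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"                      # 11th, 12th, 13th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```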


## Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [40]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 1, 3, 0, 4, 0], dtype=int32)
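
For readers who want to see what the clustering step does conceptually, here is a bare-bones NumPy sketch of the basic k-means loop: assign each point to its nearest center, then move each center to the mean of its points. It is not scikit-learn's implementation, which is more elaborate (for example in how it chooses the initial centers). The function name `simple_kmeans` and the naive "first k rows" initialisation are assumptions made to keep the example deterministic.

```python
import numpy as np


def simple_kmeans(points, k, n_iter=100):
    """Basic k-means loop; returns (labels, centers)."""
    points = np.asarray(points, dtype=float)
    centers = points[:k].copy()            # naive init: first k rows (assumption)
    labels = np.full(len(points), -1)
    for _ in range(n_iter):
        # distances of every point to every center, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members):               # leave an empty cluster's center as is
                centers[j] = members.mean(axis=0)
    return labels, centers


# Two well-separated groups of hypothetical 2-D points
data = [[0, 0], [10, 10], [0, 1], [10, 11]]
labels, centers = simple_kmeans(data, k=2)
print(labels, centers)
```

Note that the notebook clusters only 7 neighbourhoods into 5 clusters, so most clusters necessarily contain a single neighbourhood.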

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [41]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_limited

# join the cluster labels and top venues onto df_limited, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,M4J,East York,East Toronto,43.685347,-79.338106,1,Park,Intersection,Coffee Shop,Convenience Store,Yoga Studio,Food Court,Dance Studio,Deli / Bodega,Dessert Shop,Diner
36,M5J,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",43.640816,-79.381752,0,Coffee Shop,Hotel,Aquarium,Café,Italian Restaurant,Scenic Lookout,Brewery,Fried Chicken Joint,Restaurant,Bar
40,M3K,North York,"CFB Toronto, Downsview East",43.737473,-79.464763,2,Airport,Park,Electronics Store,Food Court,Cosmetics Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store
42,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.647177,-79.381576,0,Coffee Shop,Café,Hotel,American Restaurant,Restaurant,Gastropub,Steakhouse,Italian Restaurant,Seafood Restaurant,Bar
73,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Coffee Shop,Clothing Store,Sporting Goods Shop,Health & Beauty Service,Chinese Restaurant,Salon / Barbershop,Restaurant,Rental Car Location,Cosmetics Shop,Burger Joint


Finally, let's visualize the resulting clusters

In [44]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
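
The map only needs one distinct hex color per cluster. The notebook takes these from matplotlib's `rainbow` colormap. As a dependency-free alternative, the sketch below spreads `k` hues evenly around the color wheel using the standard library's `colorsys`. The helper name `cluster_colors` is an assumption, and the resulting colors differ from the ones matplotlib produces.

```python
import colorsys


def cluster_colors(k):
    """Return k distinct hex colors with evenly spaced hues."""
    colors = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)   # full saturation and value
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


palette = cluster_colors(5)
print(palette)
```

With such a list, `palette[cluster]` maps cluster label 0 to the first color, label 1 to the second, and so on.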

## Examine Clusters

#### Cluster 1

In [48]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Downtown Toronto,0,Coffee Shop,Hotel,Aquarium,Café,Italian Restaurant,Scenic Lookout,Brewery,Fried Chicken Joint,Restaurant,Bar
42,Downtown Toronto,0,Coffee Shop,Café,Hotel,American Restaurant,Restaurant,Gastropub,Steakhouse,Italian Restaurant,Seafood Restaurant,Bar
73,Central Toronto,0,Coffee Shop,Clothing Store,Sporting Goods Shop,Health & Beauty Service,Chinese Restaurant,Salon / Barbershop,Restaurant,Rental Car Location,Cosmetics Shop,Burger Joint


#### Cluster 2

In [49]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,East York,1,Park,Intersection,Coffee Shop,Convenience Store,Yoga Studio,Food Court,Dance Studio,Deli / Bodega,Dessert Shop,Diner


#### Cluster 3

In [50]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
40,North York,2,Airport,Park,Electronics Store,Food Court,Cosmetics Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store


#### Cluster 4

In [51]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
80,Downtown Toronto,3,Café,Bookstore,Bakery,Sandwich Place,Italian Restaurant,Restaurant,Bar,Japanese Restaurant,Theater,Gym


#### Cluster 5

In [52]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
88,Etobicoke,4,Fast Food Restaurant,Fried Chicken Joint,Pizza Place,Restaurant,Sandwich Place,Seafood Restaurant,Café,Pet Store,Mexican Restaurant,Liquor Store
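
The five cells above repeat the same selection with a different label. A small helper can express it once. The sketch below uses a hand-made frame with the same leading columns as `toronto_merged` and only one venue column. The function name `rows_in_cluster` is an assumption.

```python
import pandas as pd


def rows_in_cluster(merged, label, first_venue_col=5):
    """Select Borough plus the cluster and venue columns for one cluster label."""
    cols = [1] + list(range(first_venue_col, merged.shape[1]))
    return merged.loc[merged["Cluster Labels"] == label, merged.columns[cols]]


# Reduced sample built from rows shown earlier in the notebook
merged = pd.DataFrame({
    "Postcode": ["M4J", "M5J", "M5K"],
    "Borough": ["East York", "Downtown Toronto", "Downtown Toronto"],
    "Neighbourhood": ["East Toronto",
                      "Harbourfront East, Toronto Islands, Union Station",
                      "Design Exchange, Toronto Dominion Centre"],
    "Latitude": [43.685347, 43.640816, 43.647177],
    "Longitude": [-79.338106, -79.381752, -79.381576],
    "Cluster Labels": [1, 0, 0],
    "1st Most Common Venue": ["Park", "Coffee Shop", "Coffee Shop"],
})

cluster_0 = rows_in_cluster(merged, 0)
for label in sorted(merged["Cluster Labels"].unique()):
    print("Cluster", label + 1)
    print(rows_in_cluster(merged, label))
```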
