# Question 1:
For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

1. Start by creating a new Notebook for this assignment.

2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

3. To create the above dataframe:

   - The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.
   - Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
   - More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma, as shown in row 11 in the above table.
   - If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
   - Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
   - In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
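As a rough illustration of the data rules listed above, the sketch below applies them with pandas to a handful of made-up rows. The toy `raw` frame and its values are assumptions for demonstration only, and `groupby(...).agg(', '.join)` is one common way to combine neighborhoods, not necessarily the approach used later in this notebook.

```python
import pandas as pd

# Hypothetical toy rows standing in for the scraped Wikipedia table
raw = pd.DataFrame(
    [
        ["M1A", "Not assigned", "Not assigned"],
        ["M5A", "Downtown Toronto", "Harbourfront"],
        ["M5A", "Downtown Toronto", "Regent Park"],
        ["M7R", "Mississauga", "Not assigned"],
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Rule 1: drop rows whose borough is "Not assigned"
clean = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 2: a "Not assigned" neighborhood takes the borough's name
mask = clean["Neighborhood"] == "Not assigned"
clean.loc[mask, "Neighborhood"] = clean.loc[mask, "Borough"]

# Rule 3: one row per postal code, neighborhoods joined with ", "
combined = (
    clean.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
print(combined.shape)  # (2, 3)
```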


In [8]:
# install the libraries used in this notebook
import sys
!{sys.executable} -m pip install beautifulsoup4  # to extract data from Wikipedia
!{sys.executable} -m pip install requests
!{sys.executable} -m pip install geopy

print('Installation finished')

Collecting geopy
  Downloading geopy-2.0.0-py3-none-any.whl (111 kB)
Collecting geographiclib<2,>=1.49
  Downloading geographiclib-1.50-py3-none-any.whl (38 kB)
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-2.0.0
Installation finished


In [2]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import numpy as np
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas import json_normalize  # transform JSON into a pandas dataframe (pandas >= 1.0)

import folium # map rendering library

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Successfully imported')

Successfully imported


In [19]:
# install the lxml parser (optional: the built-in html.parser is used below)
import sys
!{sys.executable} -m pip install lxml



In [9]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response=requests.get(url).text

soup=BeautifulSoup(response, 'html.parser')
#print(soup.prettify())

In [10]:
mytable=soup.find('table',{'class':'wikitable sortable'})
#mytable


In [11]:
# names of the dataframe's columns
column_names = ['Postalcode','Borough','Neighborhood']
df = pd.DataFrame(columns = column_names)


In [12]:
# walk through every row of the Wikipedia table and append each row that has exactly three <td> cells (Postalcode, Borough, Neighborhood) to df
rows=mytable.find_all('tr')

for tr in rows:
    row_data=[]
    for td in tr.find_all('td'):
        row_data.append(td.text.strip())
    if(len(row_data)==3):
        df.loc[len(df)]=row_data
        
    
    

In [13]:
df.head()

|   | Postalcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
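Appending with `df.loc[len(df)] = row_data` grows the dataframe one row at a time. A commonly recommended alternative is to collect the rows in a plain list and call the `DataFrame` constructor once. The sketch below shows that pattern; `parsed_rows` is a hypothetical stand-in for the `<td>` texts that BeautifulSoup extracts.

```python
import pandas as pd

# Hypothetical parsed rows: each inner list holds the stripped <td> texts of one <tr>.
# The header row uses <th> cells, so it arrives as an empty list.
parsed_rows = [
    [],
    ["M3A", "North York", "Parkwoods"],
    ["M4A", "North York", "Victoria Village"],
    ["M5A", "Downtown Toronto"],  # malformed row, skipped below
]

# Keep only complete three-cell rows, then build the DataFrame in one call
records = [row for row in parsed_rows if len(row) == 3]
df_rows = pd.DataFrame(records, columns=["Postalcode", "Borough", "Neighborhood"])
print(df_rows.shape)  # (2, 3)
```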


In [14]:
#removing the rows whose 'Borough' value is equal to 'Not assigned'
df=df[df.Borough!='Not assigned']
# or
# df=df.query("Borough!='Not assigned'")  # query with one condition
#df.query('Borough=="Not assigned" & Neighborhood!="Not assigned"') # query with two conditions

print('The shape of the dataframe is: ',df.shape)
df.head()

The shape of the dataframe is:  (103, 3)


|   | Postalcode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [42]:
# If a Neighborhood is 'Not assigned', use the Borough name instead
df.loc[df['Neighborhood'] == 'Not assigned', 'Neighborhood'] = df['Borough']

In [16]:
# Group rows that share a Postalcode and join their Neighborhood values with a comma
df1 = df.groupby('Postalcode')['Neighborhood'].agg(', '.join)
df1 = df1.reset_index()
df1.rename(columns={'Neighborhood': 'new_Neighborhood'}, inplace=True)
df1

|   | Postalcode | new_Neighborhood |
|---|---|---|
| 0 | M1B | Malvern, Rouge |
| 1 | M1C | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Guildwood, Morningside, West Hill |
| 3 | M1G | Woburn |
| 4 | M1H | Cedarbrae |
| ... | ... | ... |
| 98 | M9N | Weston |
| 99 | M9P | Westmount |
| 100 | M9R | Kingsview Village, St. Phillips, Martin Grove ... |
| 101 | M9V | South Steeles, Silverstone, Humbergate, Jamest... |


In [17]:
#Merging the old dataframe with new dataframe on the basis of Postalcode
df2=pd.merge(df, df1, on='Postalcode')
df2.head(10)

|   | Postalcode | Borough | Neighborhood | new_Neighborhood |
|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | Parkwoods |
| 1 | M4A | North York | Victoria Village | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge | Malvern, Rouge |
| 7 | M3B | North York | Don Mills | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | Garden District, Ryerson |


In [18]:
df2.drop(['Neighborhood'], axis=1, inplace=True)

In [19]:
df2.rename(columns={'new_Neighborhood':'Neighborhood'}, inplace=True)
df2.head()

|   | Postalcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [20]:
print('The shape of the dataframe is: ',df2.shape)

The shape of the dataframe is:  (103, 3)


# Question 2
Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code: a call may return None, and repeating the same call may then return the coordinates. To make sure that you get the coordinates for all of the neighborhoods, you can run a while loop for each postal code.
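A `while` loop with no exit condition never terminates if a postal code cannot be resolved at all. The sketch below caps the number of attempts instead. `geocode_with_retry` and `flaky_lookup` are hypothetical names, and `flaky_lookup` only simulates an unreliable service; no real geocoding call is made.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, delay_s=0.0):
    """Call lookup(query) until it returns a non-None value or attempts run out.

    `lookup` is any callable returning (lat, lng) or None; it is a
    hypothetical stand-in for a Geocoder call such as reading g.latlng.
    """
    for _ in range(max_attempts):
        result = lookup(query)
        if result is not None:
            return result
        time.sleep(delay_s)  # optional pause before the next attempt
    return None  # give up instead of looping forever

# Hypothetical flaky service: fails twice, then succeeds
calls = {"n": 0}

def flaky_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else (43.65, -79.38)

print(geocode_with_retry(flaky_lookup, "M5A, Toronto, Ontario"))  # (43.65, -79.38)
```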

Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

Use the Geocoder package or the csv file to create the following dataframe:



In [94]:
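import geocoder  # third-party package: pip install geocoder

def get_geocode(postal_code, max_attempts=10):
    # geocoder.google may require a Google Maps API key; this helper is not
    # called below because the coordinates are read from the CSV file instead
    for _ in range(max_attempts):  # retry, but give up after max_attempts calls
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        if g.latlng is not None:
            latitude, longitude = g.latlng
            return latitude, longitude
    return None, None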

In [21]:
df_geo=pd.read_csv('https://cocl.us/Geospatial_data')


In [25]:
df_geo.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [26]:
df_geo.rename(columns={'Postal Code':'Postalcode'}, inplace=True)

In [27]:
df_join=pd.merge(df2, df_geo, on='Postalcode')
df_join.head()

|   | Postalcode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
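The merge above is an inner join, so any postal code missing from the CSV would be dropped without warning. One way to check for that, sketched here with made-up frames (`M9Z` is a hypothetical code with no coordinates), is a left merge with `indicator=True` and `validate='one_to_one'`:

```python
import pandas as pd

# Hypothetical miniature versions of df2 (scraped) and df_geo (CSV)
scraped = pd.DataFrame({"Postalcode": ["M3A", "M4A", "M9Z"],
                        "Borough": ["North York", "North York", "Etobicoke"]})
geo = pd.DataFrame({"Postalcode": ["M3A", "M4A"],
                    "Latitude": [43.753259, 43.725882],
                    "Longitude": [-79.329656, -79.315572]})

# validate= raises MergeError if a postal code is duplicated on either side;
# indicator=True adds a _merge column recording where each row was found
checked = pd.merge(scraped, geo, on="Postalcode", how="left",
                   validate="one_to_one", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postalcode"].tolist()
print(unmatched)  # ['M9Z']
```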


# Question 3
Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

In [28]:
toronto_data=df_join[df_join['Borough'].str.contains("Toronto")]
toronto_data.head()

|   | Postalcode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
| 15 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| 19 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |


In [60]:
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version

In [30]:
toronto_data.loc[2, 'Neighborhood']


'Regent Park, Harbourfront'

In [31]:
neighborhood_latitude = toronto_data.loc[2, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[2, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_data.loc[2, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Regent Park, Harbourfront are 43.6542599, -79.3606359.


In [None]:
LIMIT = 100  # maximum number of venues returned by the Foursquare API
radius = 500  # search radius in meters

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    neighborhood_latitude,
    neighborhood_longitude,
    radius,
    LIMIT)

url
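The same request URL can also be assembled with the standard library's `urllib.parse.urlencode`, which percent-encodes each query value. This is a sketch: `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the parameter names simply mirror the format string above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble a venues/explore URL with every query value percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # the comma is encoded as %2C
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; substitute your own
url_demo = build_explore_url("MY_ID", "MY_SECRET", "20180605", 43.6542599, -79.3606359)
print(url_demo)
```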

In [36]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f1acb8ddf83fc0d8ca83d1f'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 45,
  'suggestedBounds': {'ne': {'lat': 43.6587599045, 'lng': -79.3544279001486},
   'sw': {'lat': 43.6497598955, 'lng': -79.36684389985142}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54ea41ad498e9a11e9e13308',
       'name': 'Roselle Desserts',
       'location': {'address': '362 King St E',
        'crossStreet': 'Trinity St',
        'lat': 43.653446723052674,
        'lng': -79.3620167174383,
        'labeledLatLngs': [{'label': 'display',
 

In [37]:
# return the name of a venue's first (primary) category, or None if it has none
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [44]:
venues=results['response']['groups'][0]['items']
venues

[{'reasons': {'count': 0,
   'items': [{'summary': 'This spot is popular',
     'type': 'general',
     'reasonName': 'globalInteractionReason'}]},
  'venue': {'id': '54ea41ad498e9a11e9e13308',
   'name': 'Roselle Desserts',
   'location': {'address': '362 King St E',
    'crossStreet': 'Trinity St',
    'lat': 43.653446723052674,
    'lng': -79.3620167174383,
    'labeledLatLngs': [{'label': 'display',
      'lat': 43.653446723052674,
      'lng': -79.3620167174383}],
    'distance': 143,
    'postalCode': 'M5A 1K9',
    'cc': 'CA',
    'city': 'Toronto',
    'state': 'ON',
    'country': 'Canada',
    'formattedAddress': ['362 King St E (Trinity St)',
     'Toronto ON M5A 1K9',
     'Canada']},
   'categories': [{'id': '4bf58dd8d48988d16a941735',
     'name': 'Bakery',
     'pluralName': 'Bakeries',
     'shortName': 'Bakery',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/bakery_',
      'suffix': '.png'},
     'primary': True}],
   'photos': {'count': 0, 'grou

In [47]:
#Now we are ready to clean the json and structure it into a pandas dataframe.
nearby_venues=json_normalize(venues)
filtered_columns=['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues=nearby_venues.loc[:, filtered_columns]
nearby_venues['venue.categories']=nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns=[col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()



|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Roselle Desserts | Bakery | 43.653447 | -79.362017 |
| 1 | Tandem Coffee | Coffee Shop | 43.653559 | -79.361809 |
| 2 | Cooper Koo Family YMCA | Distribution Center | 43.653249 | -79.358008 |
| 3 | Body Blitz Spa East | Spa | 43.654735 | -79.359874 |
| 4 | Impact Kitchen | Restaurant | 43.656369 | -79.35698 |
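To see what `json_normalize` and the category extraction do without calling the API, the sketch below runs the same steps on two hand-written items shaped like the Foursquare response (the second venue and all coordinates are hypothetical). Nested dictionaries become dotted column names, while the `categories` list stays in a single cell, which is why a separate step pulls out the first category name.

```python
import pandas as pd

# Hypothetical, trimmed-down items shaped like results['response']['groups'][0]['items']
items = [
    {"venue": {"name": "Roselle Desserts",
               "location": {"lat": 43.6534, "lng": -79.3620},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Unlabelled Spot",
               "location": {"lat": 43.6540, "lng": -79.3600},
               "categories": []}},
]

flat = pd.json_normalize(items)  # nested keys become dotted column names
flat["venue.categories"] = flat["venue.categories"].map(
    lambda cats: cats[0]["name"] if cats else None  # first category, if any
)
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat[["name", "categories", "lat", "lng"]])
```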


In [51]:
print("{} venues were returned by foursquare".format(nearby_venues.shape[0]))

45 venues were returned by foursquare


# Creating a function to repeat the above process for all neighborhoods in Toronto

In [66]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=10):
    # limit caps the number of venues requested per neighborhood (10 in this analysis)
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        # (venues without any category are skipped)
        venues_list.extend([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    # build the dataframe once, after every neighborhood has been queried
    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])

    return nearby_venues

In [67]:
#Calling above function for each neighborhood of Toronto
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


In [68]:
print(toronto_venues.shape)
toronto_venues.head()

(351, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Roselle Desserts | 43.653447 | -79.362017 | Bakery |
| 1 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |
| 2 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Cooper Koo Family YMCA | 43.653249 | -79.358008 | Distribution Center |
| 3 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Body Blitz Spa East | 43.654735 | -79.359874 | Spa |
| 4 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Impact Kitchen | 43.656369 | -79.35698 | Restaurant |


In [69]:
toronto_venues.groupby('Neighborhood').count()


| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Berczy Park | 10 | 10 | 10 | 10 | 10 | 10 |
| Brockton, Parkdale Village, Exhibition Place | 10 | 10 | 10 | 10 | 10 | 10 |
| Business reply mail Processing Centre, South Central Letter Processing Plant Toronto | 10 | 10 | 10 | 10 | 10 | 10 |
| CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport | 10 | 10 | 10 | 10 | 10 | 10 |
| Central Bay Street | 10 | 10 | 10 | 10 | 10 | 10 |
| Christie | 10 | 10 | 10 | 10 | 10 | 10 |
| Church and Wellesley | 10 | 10 | 10 | 10 | 10 | 10 |
| Commerce Court, Victoria Hotel | 10 | 10 | 10 | 10 | 10 | 10 |
| Davisville | 10 | 10 | 10 | 10 | 10 | 10 |
| Davisville North | 10 | 10 | 10 | 10 | 10 | 10 |


In [71]:
print('There are {} unique categories.'.format(toronto_venues['Venue Category'].nunique()))


There are 122 unique categories.


In [88]:
# Analyze each neighborhood

# one-hot encoding: one indicator column per venue category
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# the venue categories here include one literally named 'Neighborhood'; drop that
# indicator column (if present) so the neighborhood names can be inserted as the first column
toronto_onehot.drop(columns=['Neighborhood'], errors='ignore', inplace=True)
toronto_onehot.insert(loc=0, column='Neighborhood', value=toronto_venues['Neighborhood'])

toronto_onehot.head()

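|   | Neighborhood | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Terminal | American Restaurant | Antique Shop | Art Gallery | Arts & Crafts Store | ... | Tea Room | Tennis Court | Thai Restaurant | Theater | Theme Restaurant | Trail | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |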
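The one-hot encoding followed by `groupby(...).mean()` turns venue rows into per-neighborhood category frequencies. The sketch below repeats both steps on a tiny hypothetical `venues` frame. Note that `DataFrame.insert` raises a `ValueError` when a column of the same name already exists, which is why an indicator column for a venue category literally called `Neighborhood` has to be removed before the name column is inserted.

```python
import pandas as pd

# Hypothetical venue rows: neighborhood plus venue category
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Bar", "Park", "Park"],
})

# One indicator column per category (dtype=int so the means are plain fractions)
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of each indicator per neighborhood = share of that category among its venues
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```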


In [89]:
toronto_onehot.shape


(351, 122)

In [77]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

|   | Neighborhood | Yoga Studio | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Terminal | American Restaurant | Antique Shop | Art Gallery | ... | Swim School | Tea Room | Tennis Court | Thai Restaurant | Theater | Theme Restaurant | Trail | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 |
| 1 | Brockton, Parkdale Village, Exhibition Place | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Business reply mail Processing Centre, South C... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | CN Tower, King and Spadina, Railway Lands, Har... | 0.0 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Central Bay Street | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Christie | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Church and Wellesley | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Commerce Court, Victoria Hotel | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Davisville | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Davisville North | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [91]:
toronto_grouped.shape


(39, 122)

In [99]:
# print the 5 most common venues of each neighborhood
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    # print("----" + hood + "----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()  # transpose: one row per venue category
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

                           venue  freq
0                   Liquor Store   0.1
1                   Concert Hall   0.1
2  Vegetarian / Vegan Restaurant   0.1
3                 Farmers Market   0.1
4                           Park   0.1


                    venue  freq
0             Coffee Shop   0.2
1                     Gym   0.1
2  Furniture / Home Store   0.1
3                     Bar   0.1
4                  Bakery   0.1


                  venue  freq
0           Pizza Place   0.1
1         Garden Center   0.1
2  Fast Food Restaurant   0.1
3         Burrito Place   0.1
4        Farmers Market   0.1


             venue  freq
0   Airport Lounge   0.2
1  Harbor / Marina   0.1
2          Airport   0.1
3            Plane   0.1
4              Bar   0.1


                        venue  freq
0                 Coffee Shop   0.3
1            Sushi Restaurant   0.1
2  Modern European Restaurant   0.1
3   Middle Eastern Restaurant   0.1
4                         Spa   0.1


                ve

In [102]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
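With at most 10 venues per neighborhood, many categories tie (for example several at 0.1 in the listings above), so the order of tied categories coming out of `sort_values` is not something to rely on. The sketch below breaks ties alphabetically instead; `top_categories` is a hypothetical helper and the frequency row is made up.

```python
import pandas as pd

def top_categories(freq_row, n):
    """Return the n category names with the highest frequency, ties broken alphabetically."""
    ranked = sorted(freq_row.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]

# Hypothetical frequency row for one neighborhood (Neighborhood column already removed)
row = pd.Series({"Bar": 0.1, "Café": 0.3, "Park": 0.1, "Spa": 0.5})
print(top_categories(row, 3))  # ['Spa', 'Café', 'Bar']
```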

In [104]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | Park | Farmers Market | Vegetarian / Vegan Restaurant | Cocktail Bar | Museum | Thai Restaurant | Concert Hall | Restaurant | Beer Bar | Liquor Store |
| 1 | Brockton, Parkdale Village, Exhibition Place | Coffee Shop | Bakery | Café | Breakfast Spot | Italian Restaurant | Pet Store | Bar | Gym | Furniture / Home Store | Donut Shop |
| 2 | Business reply mail Processing Centre, South C... | Skate Park | Auto Workshop | Pizza Place | Burrito Place | Restaurant | Brewery | Farmers Market | Fast Food Restaurant | Comic Shop | Garden Center |
| 3 | CN Tower, King and Spadina, Railway Lands, Har... | Airport Lounge | Airport | Airport Food Court | Airport Gate | Airport Terminal | Coffee Shop | Bar | Harbor / Marina | Plane | Wine Bar |
| 4 | Central Bay Street | Coffee Shop | Gastropub | Sushi Restaurant | Pizza Place | Modern European Restaurant | Middle Eastern Restaurant | Japanese Restaurant | Spa | Flea Market | Diner |


# Now Cluster Neighborhoods 

In [111]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 1, 3, 1, 0, 1, 2, 1, 1, 2])
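For intuition about what `KMeans` is doing, here is a bare-bones NumPy version of Lloyd's algorithm: assign each row to its nearest centroid, then move each centroid to the mean of its rows, and repeat until nothing changes. It is an illustrative sketch, not scikit-learn's implementation, which by default also uses k-means++ initialization and, depending on the version, several restarts (`n_init`). `kmeans_lloyd` is a hypothetical name and the data are made up.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on the rows of X; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k distinct rows as initial centroids
    for _ in range(n_iter):
        # squared Euclidean distance from every row to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its rows (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical venue-frequency rows: two "café-heavy" and two "park-heavy" neighborhoods
X = np.array([[0.6, 0.1], [0.5, 0.2], [0.1, 0.7], [0.0, 0.8]])
labels, _ = kmeans_lloyd(X, k=2)
print(labels)
```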

In [116]:
# add the cluster labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# join the top-venue table onto toronto_data, which holds the latitude/longitude of each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

|   | Postalcode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 1 | Park | Breakfast Spot | Coffee Shop | Spa | Restaurant | Bakery | Pub | Historic Site | Distribution Center | Gym / Fitness Center |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 0 | Coffee Shop | Sushi Restaurant | Creperie | Burrito Place | Distribution Center | Park | Arts & Crafts Store | Yoga Studio | Italian Restaurant | Furniture / Home Store |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | 2 | Clothing Store | Café | Music Venue | Plaza | Burrito Place | Tea Room | Pizza Place | Thai Restaurant | Theater | Comic Shop |
| 15 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 | 0 | Coffee Shop | Japanese Restaurant | Restaurant | Cosmetics Shop | Creperie | Italian Restaurant | Gym | Food Truck | Middle Eastern Restaurant | Distribution Center |
| 19 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | 3 | Health Food Store | Trail | Asian Restaurant | Pub | Wine Bar | Dog Run | Distribution Center | Diner | Dessert Shop | Department Store |


In [113]:
# latitude and longitude of the city whose map is to be created
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [115]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; cluster labels run from 0 to kclusters - 1
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
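Cluster labels from `KMeans` run from 0 to `kclusters - 1`, so a palette with one color per cluster can be indexed directly by the label. As a dependency-free sketch that swaps matplotlib's `rainbow` colormap for evenly spaced hues from the standard `colorsys` module, a hypothetical `cluster_palette` helper could look like this:

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced, fully saturated hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)  # spread hues around the color wheel
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(5)
for label in [0, 3, 4]:
    print(label, palette[label])  # color used for each cluster label
```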

# Examining clusters

# Cluster 1

In [117]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Downtown Toronto | 0 | Coffee Shop | Sushi Restaurant | Creperie | Burrito Place | Distribution Center | Park | Arts & Crafts Store | Yoga Studio | Italian Restaurant | Furniture / Home Store |
| 15 | Downtown Toronto | 0 | Coffee Shop | Japanese Restaurant | Restaurant | Cosmetics Shop | Creperie | Italian Restaurant | Gym | Food Truck | Middle Eastern Restaurant | Distribution Center |
| 24 | Downtown Toronto | 0 | Coffee Shop | Gastropub | Sushi Restaurant | Pizza Place | Modern European Restaurant | Middle Eastern Restaurant | Japanese Restaurant | Spa | Flea Market | Diner |
| 73 | Central Toronto | 0 | Yoga Studio | Spa | Fast Food Restaurant | Diner | Mexican Restaurant | Coffee Shop | Clothing Store | Restaurant | Salon / Barbershop | Chinese Restaurant |
| 86 | Central Toronto | 0 | Coffee Shop | American Restaurant | Restaurant | Pub | Sports Bar | Supermarket | Sushi Restaurant | Liquor Store | Fried Chicken Joint | Cuban Restaurant |


# Cluster 2

In [119]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Downtown Toronto | 1 | Park | Breakfast Spot | Coffee Shop | Spa | Restaurant | Bakery | Pub | Historic Site | Distribution Center | Gym / Fitness Center |
| 25 | Downtown Toronto | 1 | Café | Grocery Store | Candy Store | Italian Restaurant | Diner | Restaurant | Coffee Shop | Cuban Restaurant | Donut Shop | Dance Studio |
| 31 | West Toronto | 1 | Bakery | Bar | Middle Eastern Restaurant | Brewery | Bank | Supermarket | Café | Grocery Store | Music Venue | Dance Studio |
| 42 | Downtown Toronto | 1 | Coffee Shop | Restaurant | Bakery | Gym | Gym / Fitness Center | Tea Room | Café | Pub | Hotel | Wine Bar |
| 43 | West Toronto | 1 | Coffee Shop | Bakery | Café | Breakfast Spot | Italian Restaurant | Pet Store | Bar | Gym | Furniture / Home Store | Donut Shop |
| 48 | Downtown Toronto | 1 | Coffee Shop | Gym | Restaurant | Bakery | Pub | Gym / Fitness Center | Tea Room | Café | Museum | Japanese Restaurant |
| 54 | East Toronto | 1 | Pet Store | Fish Market | Gay Bar | Café | Sandwich Place | Bookstore | Bakery | Ice Cream Shop | Coffee Shop | Flea Market |
| 74 | Central Toronto | 1 | Café | Donut Shop | Indian Restaurant | BBQ Joint | Burger Joint | History Museum | Park | Coffee Shop | Vegetarian / Vegan Restaurant | Cuban Restaurant |
| 75 | West Toronto | 1 | Gift Shop | Restaurant | Coffee Shop | Italian Restaurant | Dog Run | Dessert Shop | Eastern European Restaurant | Movie Theater | Cuban Restaurant | Dance Studio |
| 79 | Central Toronto | 1 | Dessert Shop | Park | Café | Italian Restaurant | Indian Restaurant | Sushi Restaurant | Seafood Restaurant | Pizza Place | Coffee Shop | Cosmetics Shop |


# Cluster 3

In [120]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | Downtown Toronto | 2 | Clothing Store | Café | Music Venue | Plaza | Burrito Place | Tea Room | Pizza Place | Thai Restaurant | Theater | Comic Shop |
| 20 | Downtown Toronto | 2 | Park | Farmers Market | Vegetarian / Vegan Restaurant | Cocktail Bar | Museum | Thai Restaurant | Concert Hall | Restaurant | Beer Bar | Liquor Store |
| 30 | Downtown Toronto | 2 | Vegetarian / Vegan Restaurant | Restaurant | Concert Hall | Hotel | Pizza Place | Speakeasy | Gym / Fitness Center | Plaza | Steakhouse | Wine Bar |
| 36 | Downtown Toronto | 2 | Plaza | Performing Arts Venue | Hotel | Lake | Salad Place | Dessert Shop | Sporting Goods Shop | Park | Fish Market | Fish & Chips Shop |
| 61 | Central Toronto | 2 | Park | Photography Studio | Bus Line | Swim School | Dance Studio | Dog Run | Distribution Center | Diner | Dessert Shop | Department Store |
| 67 | Central Toronto | 2 | Hotel | Breakfast Spot | Park | Pizza Place | Gym | Gym / Fitness Center | Food & Drink Shop | Sandwich Place | Department Store | Distribution Center |
| 69 | West Toronto | 2 | Bar | Antique Shop | Gastropub | Mexican Restaurant | Italian Restaurant | Speakeasy | Arts & Crafts Store | Flea Market | Park | Furniture / Home Store |
| 91 | Downtown Toronto | 2 | Park | Trail | Playground | Donut Shop | Coffee Shop | Comic Shop | Concert Hall | Cosmetics Shop | Creperie | Cuban Restaurant |
| 92 | Downtown Toronto | 2 | Park | Fountain | Cocktail Bar | Restaurant | Concert Hall | Food Truck | Beer Bar | Thai Restaurant | Museum | Vegetarian / Vegan Restaurant |
| 99 | Downtown Toronto | 2 | Bookstore | Bubble Tea Shop | Park | Breakfast Spot | Theme Restaurant | Restaurant | Mexican Restaurant | Dance Studio | Beer Bar | Ramen Restaurant |


# Cluster 4

In [124]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | East Toronto | 3 | Health Food Store | Trail | Asian Restaurant | Pub | Wine Bar | Dog Run | Distribution Center | Diner | Dessert Shop | Department Store |
| 37 | West Toronto | 3 | Wine Bar | Art Gallery | Ice Cream Shop | Korean Restaurant | Cuban Restaurant | New American Restaurant | Pizza Place | Brewery | Asian Restaurant | Cocktail Bar |
| 41 | East Toronto | 3 | Greek Restaurant | Ice Cream Shop | Yoga Studio | Cosmetics Shop | Italian Restaurant | Brewery | Juice Bar | Restaurant | Distribution Center | Dog Run |
| 47 | East Toronto | 3 | Park | Fish & Chips Shop | Italian Restaurant | Pub | Ice Cream Shop | Burrito Place | Fast Food Restaurant | Sushi Restaurant | Gym | Brewery |
| 68 | Central Toronto | 3 | Jewelry Store | Trail | Mexican Restaurant | Sushi Restaurant | Department Store | Dog Run | Distribution Center | Diner | Dessert Shop | Wine Bar |
| 100 | East Toronto | 3 | Skate Park | Auto Workshop | Pizza Place | Burrito Place | Restaurant | Brewery | Farmers Market | Fast Food Restaurant | Comic Shop | Garden Center |


# Cluster 5

In [125]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 62 | Central Toronto | 4 | Garden | Pool | Wine Bar | Eastern European Restaurant | Comic Shop | Concert Hall | Cosmetics Shop | Creperie | Cuban Restaurant | Dance Studio |
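The per-cluster tables are easier to compare with a one-line summary per cluster. The sketch below uses a hypothetical miniature of `toronto_merged` and pandas named aggregation to report each cluster's size and its most frequent 1st Most Common Venue.

```python
import pandas as pd

# Hypothetical slice of toronto_merged: cluster label and top venue per neighborhood
merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 1],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Park", "Café", "Park"],
})

summary = merged.groupby("Cluster Labels").agg(
    size=("1st Most Common Venue", "size"),  # neighborhoods per cluster
    top_venue=("1st Most Common Venue", lambda s: s.value_counts().idxmax()),  # most frequent top venue
)
print(summary)
```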
