## 1. Creater Notebook for the project

Notebook "exploring Toronto neighborhood through ML" created

## 2. Build the code to scrape the Wikipedia page for Canada neighbourhood data

In [1]:
#Download and save the web page into local directory
!wget -q -O 'Canada_data' https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In [2]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.1.0               |             py_1         614 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    certifi-2020.6.20          |   py36h9f0ad1d_0         151 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ------------------------------------------------------------
                       

In [3]:
import json # library to handle JSON files

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

import pandas as pd

import numpy as np

import requests

# import k-means from clustering stage
from sklearn.cluster import KMeans


In [4]:
#Scrape the Neighborhood table into pandas dataframe

df_list = pd.read_html('Canada_data')
print(df_list[0])    # The table of interest

#Convert list to datrtaframe
df = pd.DataFrame(data=df_list[0])       


    Postal Code           Borough  \
0           M1A      Not assigned   
1           M2A      Not assigned   
2           M3A        North York   
3           M4A        North York   
4           M5A  Downtown Toronto   
5           M6A        North York   
6           M7A  Downtown Toronto   
7           M8A      Not assigned   
8           M9A         Etobicoke   
9           M1B       Scarborough   
10          M2B      Not assigned   
11          M3B        North York   
12          M4B         East York   
13          M5B  Downtown Toronto   
14          M6B        North York   
15          M7B      Not assigned   
16          M8B      Not assigned   
17          M9B         Etobicoke   
18          M1C       Scarborough   
19          M2C      Not assigned   
20          M3C        North York   
21          M4C         East York   
22          M5C  Downtown Toronto   
23          M6C              York   
24          M7C      Not assigned   
25          M8C      Not assigned   
2

## 3. create the dataframe:

#### * The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood

In [5]:
#Rename columns - especially change 'Postal Code' to 'PostalCode'
df.columns = ['PostalCode', 'Borough', 'Neighborhood']

#Store the table into a CSV file
df.to_csv('Canada_data.csv', index=False)

In [6]:
df = pd.read_csv('Canada_data.csv')
df.head(), df.shape

(  PostalCode           Borough               Neighborhood
 0        M1A      Not assigned               Not assigned
 1        M2A      Not assigned               Not assigned
 2        M3A        North York                  Parkwoods
 3        M4A        North York           Victoria Village
 4        M5A  Downtown Toronto  Regent Park, Harbourfront, (180, 3))

#### * Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [7]:
df = df[df['Borough'] != 'Not assigned']
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


#### * Assumption: If a cell has a borough but a "Not assigned" neighborhood, then the neighborhood will be the same as the borough.

In [8]:
print(df[df['Neighborhood'] == 'Not assigned'])

Empty DataFrame
Columns: [PostalCode, Borough, Neighborhood]
Index: []


Conclusion: There are no cells with "Not assigned" neighbourhood

#### * Combine records of multiple neighborhoods which are in one postal code area into one row with the neighborhoods separated with a comma

In [9]:
# Creating combined dataframe where neighborhoods within the same Boroughs are listed in a single row
df_com = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(list).reset_index()
df_com['Neighborhood'] = df_com['Neighborhood'].str.join(',')
df_com.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


#### * use the .shape method to print the number of rows of the dataframe

In [10]:
df_com.shape

(103, 3)

### ========================================= END OF FIRST PART - SUBMITTED ON GITHUB =======================================
                                                                            --------------------- 


## 4. Install geocoder and obtain latitude and longitudes of Toronto neighbourhoods

In [10]:
!pip install geocoder

Collecting geocoder
[?25l  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
[K     |████████████████████████████████| 102kB 6.8MB/s ta 0:00:011
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [11]:
import geocoder # import geocoder

coord = pd.DataFrame(columns=['Latitude', 'Longitude'])
postal_code = df_com['PostalCode']

for i in range(len(postal_code)-1):
    
    # initialize your variable to None
    lat_lng_coords = None
    
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.google('{}, Toronto, Parkwoods'.format(postal_code[i]))
        lat_lng_coords = g.latlng
        
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    
    coord = coord.append({'Latitude': latitude, 'Longitude': longitude}, ignore_inex=True)
    i+=1

KeyboardInterrupt: 

#### geocoder did not respond. As a result, did not receive Latitude Longitude values. Had to interrupt the process

In [12]:
#Obtained lat, long info using the file mentioned in the Assignment
!wget -q -O "lat_long.csv" http://cocl.us/Geospatial_data

In [13]:
lat_long = pd.read_csv('lat_long.csv')
lat_long.columns = ['PostalCode', 'Latitude', 'Longitude']
lat_long.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [14]:
lat_long.shape

(103, 3)

In [15]:
# Created one database having neighborhood data and lat long 
df2 = pd.merge(df_com, lat_long, how='inner', on='PostalCode')
df2.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [16]:
#toronto_neigh = df2[df2['Borough'].str.contains('Toronto')]
toronto_neigh = df2
toronto_neigh.reset_index(drop=True, inplace=True)
toronto_neigh.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [17]:
CLIENT_ID = 'XETBJTAFF0JQNLQ55QGHFW2EFGRSQSQQQG5Z4MTCKPFDRSAC' # my Foursquare ID
CLIENT_SECRET = 'FJY50XE0LAR1L4C5LW4BIWY0HPG3QMP5GZLZ2ZM3Q3SRUBXZ' # my Foursquare Secret
VERSION = '20180604' # Foursquare API version


print('My credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentails:
CLIENT_ID: XETBJTAFF0JQNLQ55QGHFW2EFGRSQSQQQG5Z4MTCKPFDRSAC
CLIENT_SECRET:FJY50XE0LAR1L4C5LW4BIWY0HPG3QMP5GZLZ2ZM3Q3SRUBXZ


In [18]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [19]:
import requests

LIMIT = 100
RADIUS = 500

venues_list=[]
    
for i in range(toronto_neigh.shape[0]-1):
    name=toronto_neigh.loc[i, 'Neighborhood']
    neigh_lat = toronto_neigh.loc[i, 'Latitude']
    neigh_long = toronto_neigh.loc[i, 'Longitude']

    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neigh_lat, neigh_long, VERSION, RADIUS, LIMIT)

    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    venues_list.append([(
            name, 
            neigh_lat, 
            neigh_long, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    toronto_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    
    toronto_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

toronto_venues.head(), toronto_venues.shape


(                             Neighborhood  Neighborhood Latitude  \
 0                          Malvern, Rouge              43.806686   
 1  Rouge Hill, Port Union, Highland Creek              43.784535   
 2       Guildwood, Morningside, West Hill              43.763573   
 3       Guildwood, Morningside, West Hill              43.763573   
 4       Guildwood, Morningside, West Hill              43.763573   
 
    Neighborhood Longitude                  Venue  Venue Latitude  \
 0              -79.194353                Wendy’s       43.807448   
 1              -79.160497  Royal Canadian Legion       43.782533   
 2              -79.188711         RBC Royal Bank       43.766790   
 3              -79.188711      G & G Electronics       43.765309   
 4              -79.188711             Sail Sushi       43.765951   
 
    Venue Longitude        Venue Category  
 0       -79.199056  Fast Food Restaurant  
 1       -79.163085                   Bar  
 2       -79.191151                 

## 5. Analyze Each Neighborhood

In [20]:
# Figured out that 'Neighborhood' is listed as a Venue catogory for few neighborhoods. These rows need to be deleted.
ven_cat_nei = toronto_venues[toronto_venues['Venue Category'] == 'Neighborhood']
ven_cat_nei

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
308,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
438,Studio District,43.659526,-79.340923,Leslieville,43.66207,-79.337856,Neighborhood
1031,"Richmond, Adelaide, King",43.650571,-79.384568,Downtown Toronto,43.653232,-79.385296,Neighborhood
1118,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,Harbourfront,43.639526,-79.380688,Neighborhood


In [21]:
toronto_venues.drop(toronto_venues[toronto_venues['Venue Category']=='Neighborhood'].index, inplace=True)
toronto_venues.shape


(2125, 7)

In [22]:
# Create table consisting of columns of venue categories and rows of neighborhoods 
toronto_ven_cat = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_ven_cat.head()

# add neighborhood column back to dataframe
toronto_ven_cat['Neighborhood'] = toronto_venues['Neighborhood'] 
toronto_ven_cat.head()

# move neighborhood column to the first column
fixed_columns = [toronto_ven_cat.columns[-1]] + list(toronto_ven_cat.columns[:-1])
toronto_ven_cat = toronto_ven_cat[fixed_columns]

toronto_ven_cat.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category

In [23]:
toronto_ven_grp = toronto_ven_cat.groupby('Neighborhood').mean().reset_index()
toronto_ven_grp

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
1,"Alderwood, Long Branch",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
2,"Bathurst Manor, Wilson Heights, Downsview North",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
3,Bayview Village,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
4,"Bedford Park, Lawrence Manor East",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.043478,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
5,Berczy Park,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.017241,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
6,"Birch Cliff, Cliffside West",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
7,"Brockton, Parkdale Village, Exhibition Place",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000
8,"Business reply mail Processing Centre, South C...",0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.066667
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.000000,0.000000,0.066667,0.066667,0.133333,0.133333,0.133333,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000


In [24]:
toronto_ven_grp.shape

(94, 269)

#### Print each neighborhood along with the top 10 most common venues

In [25]:
top_venues_cnt=10
col=['Neighborhood', '1st Most Common Venue', '2nd Most Common Venue', '3rd Most Common Venue', '4th Most Common Venue', '5th Most Common Venue', '6th Most Common Venue', '7th Most Common Venue', '8th Most Common Venue', '9th Most Common Venue', '10th Most Common Venue']
top_venues=pd.DataFrame(columns=col)
top_venues['Neighborhood'] = toronto_ven_grp['Neighborhood']

for i in range(toronto_ven_grp.shape[0]):
    temp = toronto_ven_grp[toronto_ven_grp['Neighborhood']==toronto_ven_grp['Neighborhood'][i]].T.reset_index()
    temp = temp.iloc[1:,:]
    temp.columns=['venue', 'freq']
    temp['freq']=temp['freq'].astype('float')
    temp.sort_values('freq', ascending=False, inplace=True, na_position='last')
    temp = temp.reset_index(drop=True).head(top_venues_cnt)
    for j in range(10):
        top_venues.iloc[i, j+1] = temp.iloc[j, 0]

top_venues


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Breakfast Spot,Skating Rink,Lounge,Clothing Store,Accessories Store,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant
1,"Alderwood, Long Branch",Pizza Place,Pharmacy,Skating Rink,Sandwich Place,Dance Studio,Pub,Coffee Shop,Gym,Mexican Restaurant,Middle Eastern Restaurant
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Pharmacy,Shopping Mall,Middle Eastern Restaurant,Mobile Phone Shop,Fried Chicken Joint,Frozen Yogurt Shop,Sandwich Place,Supermarket
3,Bayview Village,Japanese Restaurant,Bank,Chinese Restaurant,Café,Music Venue,Movie Theater,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant
4,"Bedford Park, Lawrence Manor East",Restaurant,Coffee Shop,Italian Restaurant,Sandwich Place,Greek Restaurant,Sushi Restaurant,Juice Bar,Liquor Store,Pub,Thai Restaurant
5,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Bakery,Café,Beer Bar,Seafood Restaurant,Cheese Shop,Shopping Mall,Steakhouse
6,"Birch Cliff, Cliffside West",Café,College Stadium,General Entertainment,Skating Rink,Accessories Store,Mobile Phone Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant
7,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Coffee Shop,Bakery,Intersection,Nightclub,Bar,Climbing Gym,Restaurant,Stadium
8,"Business reply mail Processing Centre, South C...",Yoga Studio,Auto Workshop,Gym / Fitness Center,Garden Center,Garden,Light Rail Station,Fast Food Restaurant,Farmers Market,Park,Pizza Place
9,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Airport Terminal,Rental Car Location,Airport,Airport Food Court,Bar,Plane,Harbor / Marina,Sculpture Garden


## 6. Cluster Neighborhoods

In [26]:
# set number of clusters
kclusters = 5

toronto_grp_cluster = toronto_ven_grp.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grp_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0,
       0, 0, 3, 0, 0, 0, 1, 0, 3, 0, 3, 0, 0, 2, 0, 0, 3, 0, 0, 0, 3, 0,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 3,
       0, 0, 0, 0, 0, 3], dtype=int32)

### Generate a database of borough + neighborhood + lat/long + cluster_id + top 10 venues

In [27]:
# add clustering labels
top_venues.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_neigh

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(top_venues.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0.0,Fast Food Restaurant,Accessories Store,Movie Theater,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,0.0,Bar,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Middle Eastern Restaurant,Market
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,Electronics Store,Rental Car Location,Medical Center,Intersection,Restaurant,Bank,Mexican Restaurant,Breakfast Spot,Middle Eastern Restaurant,Miscellaneous Shop
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0.0,Coffee Shop,Pharmacy,Korean Restaurant,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Bank,Bakery,Caribbean Restaurant,Gas Station,Thai Restaurant,Hakka Restaurant,Fried Chicken Joint,Athletics & Sports,Molecular Gastronomy Restaurant,Motel


In [28]:
# Don't need the postal codes
toronto_merged.drop('PostalCode', axis=1, inplace=True)
toronto_merged

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0.0,Fast Food Restaurant,Accessories Store,Movie Theater,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
1,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,0.0,Bar,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Middle Eastern Restaurant,Market
2,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,Electronics Store,Rental Car Location,Medical Center,Intersection,Restaurant,Bank,Mexican Restaurant,Breakfast Spot,Middle Eastern Restaurant,Miscellaneous Shop
3,Scarborough,Woburn,43.770992,-79.216917,0.0,Coffee Shop,Pharmacy,Korean Restaurant,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
4,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Bank,Bakery,Caribbean Restaurant,Gas Station,Thai Restaurant,Hakka Restaurant,Fried Chicken Joint,Athletics & Sports,Molecular Gastronomy Restaurant,Motel
5,Scarborough,Scarborough Village,43.744734,-79.239476,1.0,Playground,Accessories Store,Middle Eastern Restaurant,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Mexican Restaurant
6,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,0.0,Coffee Shop,Hobby Shop,Chinese Restaurant,Department Store,Bus Station,Modern European Restaurant,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant
7,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577,0.0,Bakery,Bus Line,Park,Metro Station,Ice Cream Shop,Mobile Phone Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant
8,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476,0.0,Motel,American Restaurant,Middle Eastern Restaurant,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Accessories Store
9,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848,0.0,Café,College Stadium,General Entertainment,Skating Rink,Accessories Store,Mobile Phone Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant


In [29]:
# Drop Neighborhood Business reply mail Processing Centre" since the clustering database did not include this neighborhood - the venue and cluster values are NaN
toronto_merged.dropna(inplace=True)
toronto_merged

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0.0,Fast Food Restaurant,Accessories Store,Movie Theater,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
1,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,0.0,Bar,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Middle Eastern Restaurant,Market
2,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,Electronics Store,Rental Car Location,Medical Center,Intersection,Restaurant,Bank,Mexican Restaurant,Breakfast Spot,Middle Eastern Restaurant,Miscellaneous Shop
3,Scarborough,Woburn,43.770992,-79.216917,0.0,Coffee Shop,Pharmacy,Korean Restaurant,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
4,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Bank,Bakery,Caribbean Restaurant,Gas Station,Thai Restaurant,Hakka Restaurant,Fried Chicken Joint,Athletics & Sports,Molecular Gastronomy Restaurant,Motel
5,Scarborough,Scarborough Village,43.744734,-79.239476,1.0,Playground,Accessories Store,Middle Eastern Restaurant,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Mexican Restaurant
6,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029,0.0,Coffee Shop,Hobby Shop,Chinese Restaurant,Department Store,Bus Station,Modern European Restaurant,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant
7,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577,0.0,Bakery,Bus Line,Park,Metro Station,Ice Cream Shop,Mobile Phone Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant
8,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476,0.0,Motel,American Restaurant,Middle Eastern Restaurant,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Accessories Store
9,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848,0.0,Café,College Stadium,General Entertainment,Skating Rink,Accessories Store,Mobile Phone Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant


In [30]:
toronto_merged.shape

(98, 15)

In [31]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


In [32]:
address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto City are 43.6534817, -79.3839347.


In [33]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


In [34]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_neigh['Latitude'], toronto_neigh['Longitude'], toronto_neigh['Borough'], toronto_neigh['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [35]:
toronto_merged['Cluster Labels']=toronto_merged['Cluster Labels'].astype('int32')

In [36]:
# create map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examining clusters

#### Cluster 1

In [42]:
cluster_1=toronto_merged.loc[toronto_merged['Cluster Labels']==0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
cluster_1

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Malvern, Rouge",Fast Food Restaurant,Accessories Store,Movie Theater,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
1,"Rouge Hill, Port Union, Highland Creek",Bar,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Middle Eastern Restaurant,Market
2,"Guildwood, Morningside, West Hill",Electronics Store,Rental Car Location,Medical Center,Intersection,Restaurant,Bank,Mexican Restaurant,Breakfast Spot,Middle Eastern Restaurant,Miscellaneous Shop
3,Woburn,Coffee Shop,Pharmacy,Korean Restaurant,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
4,Cedarbrae,Bank,Bakery,Caribbean Restaurant,Gas Station,Thai Restaurant,Hakka Restaurant,Fried Chicken Joint,Athletics & Sports,Molecular Gastronomy Restaurant,Motel
6,"Kennedy Park, Ionview, East Birchmount Park",Coffee Shop,Hobby Shop,Chinese Restaurant,Department Store,Bus Station,Modern European Restaurant,Motel,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant
7,"Golden Mile, Clairlea, Oakridge",Bakery,Bus Line,Park,Metro Station,Ice Cream Shop,Mobile Phone Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant
8,"Cliffside, Cliffcrest, Scarborough Village West",Motel,American Restaurant,Middle Eastern Restaurant,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Accessories Store
9,"Birch Cliff, Cliffside West",Café,College Stadium,General Entertainment,Skating Rink,Accessories Store,Mobile Phone Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant
10,"Dorset Park, Wexford Heights, Scarborough Town...",Indian Restaurant,Light Rail Station,Pet Store,Gaming Cafe,Chinese Restaurant,Vietnamese Restaurant,Miscellaneous Shop,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant


#### Cluster 2

In [43]:
cluster_2=toronto_merged.loc[toronto_merged['Cluster Labels']==1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
cluster_2

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough Village,Playground,Accessories Store,Middle Eastern Restaurant,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Mexican Restaurant
14,"Milliken, Agincourt North, Steeles East, L'Amo...",Playground,Park,Accessories Store,Middle Eastern Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Metro Station


#### Cluster 3

In [44]:
cluster_3=toronto_merged.loc[toronto_merged['Cluster Labels']==2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
cluster_3

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,"Old Mill South, King's Mill Park, Sunnylea, Hu...",Baseball Field,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Middle Eastern Restaurant,Market
97,"Humberlea, Emery",Food Service,Baseball Field,Accessories Store,Middle Eastern Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Mexican Restaurant


#### Cluster 4

In [45]:
cluster_4=toronto_merged.loc[toronto_merged['Cluster Labels']==3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
cluster_4

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,York Mills West,Convenience Store,Park,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Mexican Restaurant
25,Parkwoods,Food & Drink Shop,Park,Accessories Store,Middle Eastern Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Metro Station
40,"East Toronto, Broadview North (Old East York)",Convenience Store,Park,Metro Station,Accessories Store,Miscellaneous Shop,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop
44,Lawrence Park,Photography Studio,Park,Swim School,Bus Line,Miscellaneous Shop,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Middle Eastern Restaurant
48,"Moore Park, Summerhill East",Tennis Court,Park,Mexican Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Metro Station
50,Rosedale,Park,Playground,Trail,Accessories Store,Middle Eastern Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop
64,"Forest Hill North & West, Forest Hill Road Park",Jewelry Store,Park,Sushi Restaurant,Trail,Accessories Store,Miscellaneous Shop,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Middle Eastern Restaurant
74,Caledonia-Fairbanks,Park,Women's Store,Pool,Accessories Store,Middle Eastern Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop
79,"North Park, Maple Leaf Park, Upwood Park",Bakery,Trail,Construction & Landscaping,Park,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop
90,"The Kingsway, Montgomery Road, Old Mill North",River,Park,Accessories Store,Middle Eastern Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Mexican Restaurant


#### Cluster 5

In [46]:
cluster_5=toronto_merged.loc[toronto_merged['Cluster Labels']==4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
cluster_5

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
94,"West Deane Park, Princess Gardens, Martin Grov...",Golf Course,Accessories Store,Moroccan Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Movie Theater


In [51]:
cluster_5['1st Most Common Venue'].unique()

array(['Golf Course'], dtype=object)

#### List of databases created for this project --  for my reference

In [None]:
df
df_com  
lat_long
df2
toronto_neigh
toronto_venues
toronto_ven_cat
toronto_ven_grp
top_venues
toronto_grp_cluster
toronto_merged