This project covers how to convert addresses into their
equivalent latitude and longitude values. Also, It uses the
Foursquare API to explore neighborhoods in Toronto. You uses
the **explore** function to get the most common venue categories in each
neighborhood, and then use this feature to group the neighborhoods into
clusters. The *k*-means clustering algorithm is used to complete
this task. Finally, the Folium library is used to visualize the
neighborhoods in Toronto and their emerging clusters

### 1 Import and download the necesary libraries

In [1]:
import numpy as np       # library to handle data in a vectorized manner
import pandas as pd      # library for data analsysis
import json              # library to handle JSON files
!conda install -c conda-forge geopy --yes

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    openssl-1.1.1f             |       h516909a_0         2.1 MB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1

In [2]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


In [3]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_row', None)

In [4]:
from geopy.geocoders import Nominatim     # convert an address into latitude and longitude values
import requests             # library to handle requests            
from pandas.io.json  import json_normalize     # tranform JSON file into a pandas dataframe


In [5]:
# import k-means from clustering stage
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium
print ('Library imported')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.1.0               |             py_1         614 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         713 KB

The following NEW packages will be INSTALLED:

    altair:  4.1.0-py_1 conda-forge
    branca:  0.4.0-py_0 conda-forge
    folium:  0.5.0-py_0 conda-forge
    vincent: 0.4.4-py_1 conda-forge


Downloading and Extracting Packages
altair-4.1.0         | 614 KB    | #####

###   2.  Create a dataframe with info regarding Postal Codes in Toronto from an Internet Page

In [6]:
# read the file into a pandas dataframe
import requests
d = pd.read_html('http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
df = d[0]
print('dataframe dimensions are: ',df.shape)
df[0:5]


dataframe dimensions are:  (180, 3)


Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


In [7]:
# select only the data where the Borough is assigned

subDF1 = df[df['Borough'] != 'Not assigned']                      
print('dataframe dimensions are: ',subDF1.shape)
subDF1[0:5]
#subDF1

dataframe dimensions are:  (103, 3)


Unnamed: 0,Postal code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


In [8]:
# select from the dataframe only the requested values of Postal code

exposedPostallist =['M5G', 'M2H', 'M4B', 'M1J', 'M4G', 'M4M', 'M1R', 'M9V', 'M9L', 'M5V', 'M1B', 'M5A']
sub = subDF1[subDF1['Postal code'].isin(exposedPostallist)]

#create a new index list with the new position of the postal code
newIndex = []
oldIndex = list(sub.index)
for i in range(0, len(exposedPostallist)):
    for j in range(0, len(oldIndex)):
        if exposedPostallist[i] == sub.iloc[j,0]:
            newIndex.append(j)
            
# get the sort index            
sort_index = []         
for i in range(0, sub.shape[0]):
    sort_index.append(oldIndex[newIndex[i]])
    
# change the order of dataframe using sort_index    
sub = pd.DataFrame(sub, index = sort_index)
sub.reset_index(drop=True, inplace=True)
                          
print('sort_index= ',sort_index)                          
sub

sort_index=  [40, 46, 12, 54, 39, 84, 108, 143, 80, 139, 9, 4]


Unnamed: 0,Postal code,Borough,Neighborhood
0,M5G,Downtown Toronto,Central Bay Street
1,M2H,North York,Hillcrest Village
2,M4B,East York,Parkview Hill / Woodbine Gardens
3,M1J,Scarborough,Scarborough Village
4,M4G,East York,Leaside
5,M4M,East Toronto,Studio District
6,M1R,Scarborough,Wexford / Maryvale
7,M9V,Etobicoke,South Steeles / Silverstone / Humbergate / Jam...
8,M9L,North York,Humber Summit
9,M5V,Downtown Toronto,CN Tower / King and Spadina / Railway Lands / ...


In [9]:
# create a function which receive a string and change the character '/' with  ',' as requested
def changeit(astr, chold , chnew):
    chold = '/'  
    chnew = ','
    wstr = astr
    need_more = True
    while (need_more == True):
        for i in range(0, len(wstr)):
            if (wstr[i] == '/'):
                need_more = True
                wstr = wstr[:(i)] + ',' + wstr[(i+1):] 
            else:    
                need_more = False   
    return wstr   

# change the Neighborhood column 
wlst = list(sub['Neighborhood'])

for i in range (0,len(wlst)):
    res = ''
    res = changeit(wlst[i],'/',',')
    wlst[i] = res
    
sub['Neighborhood'] = wlst                          
sub    


Unnamed: 0,Postal code,Borough,Neighborhood
0,M5G,Downtown Toronto,Central Bay Street
1,M2H,North York,Hillcrest Village
2,M4B,East York,"Parkview Hill , Woodbine Gardens"
3,M1J,Scarborough,Scarborough Village
4,M4G,East York,Leaside
5,M4M,East Toronto,Studio District
6,M1R,Scarborough,"Wexford , Maryvale"
7,M9V,Etobicoke,"South Steeles , Silverstone , Humbergate , Jam..."
8,M9L,North York,Humber Summit
9,M5V,Downtown Toronto,"CN Tower , King and Spadina , Railway Lands , ..."


#### This  is the 1st dataframe in the form asked by the assignment project

In [10]:
# read the file containing location information (latitude and longitude)

df1 = pd.read_csv('http://cocl.us/Geospatial_data')    
print('dataframe dimensions : ', df1.shape)
df1.head()

dataframe dimensions :  (103, 3)


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [11]:
# extract from dataframe omly requested postal codes

exposedPostallist =['M5G', 'M2H', 'M4B', 'M1J', 'M4G', 'M4M', 'M1R', 'M9V', 'M9L', 'M5V', 'M1B', 'M5A']
sub1 = df1[df1['Postal Code'].isin(exposedPostallist)]
sub1


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
5,M1J,43.744734,-79.239476
11,M1R,43.750072,-79.295849
17,M2H,43.803762,-79.363452
35,M4B,43.706397,-79.309937
38,M4G,43.70906,-79.363452
43,M4M,43.659526,-79.340923
53,M5A,43.65426,-79.360636
57,M5G,43.657952,-79.387383
68,M5V,43.628947,-79.39442


In [12]:
# create a new list of index, so the order of the data will be the imposed one
newIndex = []
oldIndex = list(sub1.index)
for i in range(0, len(exposedPostallist)):
    for j in range(0, len(oldIndex)):
        if exposedPostallist[i] == sub1.iloc[j,0]:
            newIndex.append(j)
            
# change the Neighborhood column             
sort_index = []         
for i in range(0, sub1.shape[0]):
    sort_index.append(oldIndex[newIndex[i]])
sub1 = pd.DataFrame(sub1, index = sort_index)
sub1.reset_index(drop=True, inplace=True)
                          
print('sort_index= ',sort_index)                          
sub1

sort_index=  [57, 17, 35, 5, 38, 43, 11, 101, 96, 68, 0, 53]


Unnamed: 0,Postal Code,Latitude,Longitude
0,M5G,43.657952,-79.387383
1,M2H,43.803762,-79.363452
2,M4B,43.706397,-79.309937
3,M1J,43.744734,-79.239476
4,M4G,43.70906,-79.363452
5,M4M,43.659526,-79.340923
6,M1R,43.750072,-79.295849
7,M9V,43.739416,-79.588437
8,M9L,43.756303,-79.565963
9,M5V,43.628947,-79.39442


In [13]:
# add the Latitude and Longitude columns to the initial dataset
sub['Latitude'] = sub1['Latitude'].values
sub['Longitude'] = sub1['Longitude'].values

# change the name of dataframe to be more expresive
toronto_data = sub
toronto_data

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
1,M2H,North York,Hillcrest Village,43.803762,-79.363452
2,M4B,East York,"Parkview Hill , Woodbine Gardens",43.706397,-79.309937
3,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
4,M4G,East York,Leaside,43.70906,-79.363452
5,M4M,East Toronto,Studio District,43.659526,-79.340923
6,M1R,Scarborough,"Wexford , Maryvale",43.750072,-79.295849
7,M9V,Etobicoke,"South Steeles , Silverstone , Humbergate , Jam...",43.739416,-79.588437
8,M9L,North York,Humber Summit,43.756303,-79.565963
9,M5V,Downtown Toronto,"CN Tower , King and Spadina , Railway Lands , ...",43.628947,-79.39442


## This is the dataframe in the last form as requested by the assignment project

##  3.      Create a venues dataframe and visualize neighborhoods in Toronto 

In [14]:
# Use Nomonatim to get the latitude and longitude of Toronto Canada

address = 'Toronto, Canada'
geolocator = Nominatim(user_agent = 'toronto_explorer')
location = geolocator.geocode(address)
#geolocator = Nominatim(user_agent = 'toronto_explorer')
latitude = location.latitude
longitude = location.longitude
print('The geografical coordinate of Toronto are {}, {}'.format(latitude, longitude)) 

The geografical coordinate of Toronto are 43.6534817, -79.3839347


In [15]:
# Use folium to generate a map of Toronto, Canada with postal codes included

map_Toronto = folium.Map(location =[latitude,longitude], zoom_start =10)

for lat, lng, borough, postal_code  in zip( sub['Latitude'], sub['Longitude'], sub['Borough'], sub['Postal code']):
    label = '{}, {}'.format(postal_code, borough) 
    label = folium.Popup(label, parse_html= True)
    folium.CircleMarker([lat, lng], radius = 5, popup = label, color = 'blue', fill= True, fill_color= '#3186cc', fill_opacity= 0.7, parse_htlm= False).add_to(map_Toronto)
map_Toronto


###                                                                                                 The position of the selected postal codes on the Toronto map

In [44]:
# Create the client credentials


#print('Your Credentials: ') 
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET: ' + CLIENT_SECRET)
#print('VERSION: '  + VERSION)
#---------------------------------------------


In [45]:
# create url info for an explore request around Toronto---using foursquare

radius = 500
LIMIT = 50
url ='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        latitude,
        longitude,
        radius,
        LIMIT)

In [46]:
#Send the GET request and examine the resutls

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e912165b4b684001b423b7e'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 90,
  'suggestedBounds': {'ne': {'lat': 43.6579817045, 'lng': -79.37772678059432},
   'sw': {'lat': 43.6489816955, 'lng': -79.39014261940568}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          'lng'

In [47]:
# function that extracts the category of the venue
#    def get_category_type(row):\


def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        
        categories_list = row['venue.categories']
        return None
    else:
        return categories_list
    
    if len(categories_list) == 0:
        return None
    else:
        return categories_list [0]['name']
    
    

In [48]:
venues = results['response']['groups'][0]['items']
venues

[{'reasons': {'count': 0,
   'items': [{'summary': 'This spot is popular',
     'type': 'general',
     'reasonName': 'globalInteractionReason'}]},
  'venue': {'id': '5227bb01498e17bf485e6202',
   'name': 'Downtown Toronto',
   'location': {'lat': 43.65323167517444,
    'lng': -79.38529600606677,
    'labeledLatLngs': [{'label': 'display',
      'lat': 43.65323167517444,
      'lng': -79.38529600606677}],
    'distance': 113,
    'cc': 'CA',
    'city': 'Toronto',
    'state': 'ON',
    'country': 'Canada',
    'formattedAddress': ['Toronto ON', 'Canada']},
   'categories': [{'id': '4f2a25ac4b909258e854f55f',
     'name': 'Neighborhood',
     'pluralName': 'Neighborhoods',
     'shortName': 'Neighborhood',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/neighborhood_',
      'suffix': '.png'},
     'primary': True}],
   'photos': {'count': 0, 'groups': []}},
  'referralId': 'e-0-5227bb01498e17bf485e6202-0'},
 {'reasons': {'count': 0,
   'items': [{'summar

In [49]:
#Now we are ready to clean the json structure it 
#into a *pandas* dataframe.

venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]
    
# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
    
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head(10)


Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,,43.653232,-79.385296
1,Nathan Phillips Square,,43.65227,-79.383516
2,Eggspectation Bell Trinity Square,,43.653144,-79.38198
3,Indigo,,43.653515,-79.380696
4,LUSH,,43.653557,-79.3804
5,Poke Guys,,43.654895,-79.385052
6,Chatime 日出茶太,,43.655542,-79.384684
7,CF Toronto Eaton Centre,,43.65454,-79.380677
8,JOEY Eaton Centre,,43.655404,-79.381929
9,Noodle King,,43.651706,-79.383046


In [50]:
#The no of venues were returned by Foursquare?
 print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))



50 venues were returned by Foursquare.


### 4. Explore Neighborhoods in Toronto


Let's create a function to repeat the same process to all the neighborhoods
and postal codes in Toronto


In [52]:
def getNearbyVenues(names, latitudes, longitudes):
      
    LIMIT = 100
    radius = 500
    
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
    # create the API request URL
        url ='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION, 
            lat,
            lng,
            radius,
            LIMIT)
    
    # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

    #return only relevant information for each nearby venue 
        venues_list.append([(name,lat,lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'], 
            v['venue']['categories'][0]['name'])    for v in results]) 
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list]) 
    nearby_venues.columns = ['Postal code', 'Neighborhood Latitude', 'Neighborhood Longitude',
                    'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    return(nearby_venues)


In [53]:
#Now write the code to run the above function on the requested postal codes in Toronto, Canada.

toronto_venues = getNearbyVenues(names = toronto_data['Postal code'],
                                  latitudes = toronto_data['Latitude'],
                                  longitudes= toronto_data['Longitude']
                                      )


M5G
M2H
M4B
M1J
M4G
M4M
M1R
M9V
M9L
M5V
M1B
M5A


In [29]:
# Find out the dimension of the venues data frame
print(toronto_venues.shape)
toronto_venues.head(100)



(251, 7)


Unnamed: 0,Postal code,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M5G,43.657952,-79.387383,Jimmy's Coffee,43.658421,-79.385613,Coffee Shop
1,M5G,43.657952,-79.387383,Tim Hortons,43.65857,-79.385123,Coffee Shop
2,M5G,43.657952,-79.387383,Neo Coffee Bar,43.66014,-79.38587,Coffee Shop
3,M5G,43.657952,-79.387383,Hailed Coffee,43.658833,-79.383684,Coffee Shop
4,M5G,43.657952,-79.387383,The Queen and Beaver Public House,43.657472,-79.383524,Gastropub
5,M5G,43.657952,-79.387383,The Elm Tree Restaurant,43.657397,-79.383761,Modern European Restaurant
6,M5G,43.657952,-79.387383,Mercatto,43.660391,-79.387664,Italian Restaurant
7,M5G,43.657952,-79.387383,College Park Area,43.659453,-79.383785,Park
8,M5G,43.657952,-79.387383,KAKA,43.657457,-79.384192,Japanese Restaurant
9,M5G,43.657952,-79.387383,Chatime 日出茶太,43.655542,-79.384684,Bubble Tea Shop


In [54]:
#check how many venues were returned for each requested Postal code in descending order
toronto_venues['Postal code'].value_counts()


M5G    73
M5A    45
M4M    40
M4G    33
M5V    16
M4B    11
M9V     9
M1R     7
M2H     6
M1B     2
M9L     2
M1J     1
Name: Postal code, dtype: int64

In [55]:
#check how many venues were returned for each requested Postal code
toronto_venues.groupby('Postal code').count()


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Postal code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
M1B,2,2,2,2,2,2
M1J,1,1,1,1,1,1
M1R,7,7,7,7,7,7
M2H,6,6,6,6,6,6
M4B,11,11,11,11,11,11
M4G,33,33,33,33,33,33
M4M,40,40,40,40,40,40
M5A,45,45,45,45,45,45
M5G,73,73,73,73,73,73
M5V,16,16,16,16,16,16


In [56]:
#find out how many unique categories can be curated from all the returned venues

print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))


There are 113 uniques categories.


### 4. Analyze Each Neighborhood##


In [57]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
    
# add Postal code column back to dataframe
toronto_onehot['Postal code'] = toronto_venues['Postal code'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
    
toronto_onehot.head(115)

Unnamed: 0,Postal code,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Asian Restaurant,Athletics & Sports,Auto Garage,Bagel Shop,Bakery,Bank,Bar,Beer Store,Bike Shop,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Bus Line,Café,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gastropub,Gay Bar,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Historic Site,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Liquor Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Neighborhood,Office,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Poke Place,Pool,Pub,Rental Car Location,Restaurant,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Spa,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,M5G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [58]:
# the dimensions of toronto_onehot dataframe
toronto_onehot.shape

(245, 114)

In [59]:
# Next, group rows by Postal code and by taking the mean of the frequency of occurrence of each category
toronto_grouped = toronto_onehot.groupby('Postal code',sort=False ).mean().reset_index()
toronto_grouped


Unnamed: 0,Postal code,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Asian Restaurant,Athletics & Sports,Auto Garage,Bagel Shop,Bakery,Bank,Bar,Beer Store,Bike Shop,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Bus Line,Café,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gastropub,Gay Bar,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Historic Site,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Latin American Restaurant,Liquor Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Neighborhood,Office,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Poke Place,Pool,Pub,Rental Car Location,Restaurant,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Spa,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,M5G,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027397,0.027397,0.0,0.054795,0.0,0.013699,0.0,0.0,0.164384,0.0,0.013699,0.0,0.0,0.0,0.0,0.013699,0.013699,0.013699,0.013699,0.0,0.0,0.013699,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.013699,0.013699,0.013699,0.013699,0.0,0.0,0.0,0.013699,0.027397,0.0,0.0,0.0,0.013699,0.027397,0.013699,0.0,0.054795,0.041096,0.0,0.013699,0.0,0.0,0.0,0.0,0.041096,0.013699,0.013699,0.0,0.013699,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0,0.027397,0.041096,0.0,0.013699,0.0,0.0,0.027397,0.0,0.0,0.0,0.013699,0.0,0.013699,0.013699,0.027397,0.0,0.013699,0.013699,0.0,0.013699
1,M2H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M4B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1J,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M4G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.060606,0.0,0.030303,0.030303,0.0,0.0,0.0,0.030303,0.030303,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.090909,0.030303,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.025,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.05,0.0,0.0,0.0,0.1,0.025,0.0,0.0,0.025,0.075,0.025,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.05,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.025
6,M1R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,M9V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M9L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M5V,0.0625,0.0625,0.0625,0.125,0.125,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [60]:
#confirm the new size
toronto_grouped.shape


(12, 114)

In [61]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5
    
for hood in toronto_grouped['Postal code']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Postal code'] == hood].T.reset_index()   # trebuie sa hotaresc ce coloana folosesc
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----M5G----
                       venue  freq
0                Coffee Shop  0.16
1         Italian Restaurant  0.05
2                       Café  0.05
3  Middle Eastern Restaurant  0.04
4             Sandwich Place  0.04


----M2H----
                      venue  freq
0               Golf Course  0.17
1                      Pool  0.17
2        Athletics & Sports  0.17
3      Fast Food Restaurant  0.17
4  Mediterranean Restaurant  0.17


----M4B----
                  venue  freq
0           Pizza Place  0.18
1  Gym / Fitness Center  0.09
2    Athletics & Sports  0.09
3             Pet Store  0.09
4              Bus Line  0.09


----M1J----
                   venue  freq
0             Playground   1.0
1                Airport   0.0
2               Pharmacy   0.0
3  Performing Arts Venue   0.0
4                   Park   0.0


----M4G----
                    venue  freq
0     Sporting Goods Shop  0.09
1             Coffee Shop  0.09
2            Burger Joint  0.06
3                    Ban

 put data into a *pandas* dataframe


In [62]:
#First,write a function to sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


Now create the new dataframe and display the top 10 venues for
each requested postal code.


In [63]:
num_top_venues = 10
    
indicators = ['st', 'nd', 'rd']
    
# create columns according to number of top venues
columns = ['Postal code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
    
# create a new dataframe
toronto_venues_sorted = pd.DataFrame(columns=columns)
toronto_venues_sorted['Postal code'] = toronto_grouped['Postal code']
    
for ind in np.arange(toronto_grouped.shape[0]):
    toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)
    
toronto_venues_sorted.head(15)





Unnamed: 0,Postal code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5G,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Sandwich Place,Middle Eastern Restaurant,Thai Restaurant,Burger Joint,Bubble Tea Shop,Spa
1,M2H,Athletics & Sports,Golf Course,Dog Run,Fast Food Restaurant,Mediterranean Restaurant,Pool,Yoga Studio,Department Store,Coworking Space,Cosmetics Shop
2,M4B,Pizza Place,Pet Store,Gym / Fitness Center,Pharmacy,Bus Line,Bank,Athletics & Sports,Intersection,Gastropub,Fast Food Restaurant
3,M1J,Playground,Yoga Studio,Discount Store,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop
4,M4G,Coffee Shop,Sporting Goods Shop,Burger Joint,Furniture / Home Store,Bank,Bagel Shop,Mexican Restaurant,Juice Bar,Brewery,Pet Store
5,M4M,Café,Coffee Shop,American Restaurant,Bakery,Brewery,Gastropub,Yoga Studio,Diner,Neighborhood,Middle Eastern Restaurant
6,M1R,Middle Eastern Restaurant,Breakfast Spot,Auto Garage,Shopping Mall,Bakery,Sandwich Place,Convenience Store,Cosmetics Shop,Discount Store,Coworking Space
7,M9V,Grocery Store,Beer Store,Liquor Store,Pizza Place,Pharmacy,Sandwich Place,Fast Food Restaurant,Fried Chicken Joint,Dessert Shop,Comic Shop
8,M9L,Pizza Place,Empanada Restaurant,Yoga Studio,Discount Store,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store
9,M5V,Airport Lounge,Airport Service,Airport,Harbor / Marina,Plane,Boutique,Rental Car Location,Boat or Ferry,Bar,Sculpture Garden


###  5. Cluster Neighborhoods


Run *k*-means to cluster the neighborhood into 5 clusters.


In [64]:
# set number of clusters
kclusters = 5
    
toronto_grouped_clustering = toronto_grouped.drop('Postal code', 1)
    
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
    
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([0, 0, 0, 3, 0, 0, 2, 0, 1, 0], dtype=int32)

Create a new dataframe that includes the cluster as well as the
top 10 venues for each neighborhood.


In [65]:
# add clustering labels
toronto_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
    
toronto_merged = toronto_data
    
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(toronto_venues_sorted.set_index('Postal code'), on='Postal code')
    
toronto_merged.head(20) # check the last columns!



Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Sandwich Place,Middle Eastern Restaurant,Thai Restaurant,Burger Joint,Bubble Tea Shop,Spa
1,M2H,North York,Hillcrest Village,43.803762,-79.363452,0,Athletics & Sports,Golf Course,Dog Run,Fast Food Restaurant,Mediterranean Restaurant,Pool,Yoga Studio,Department Store,Coworking Space,Cosmetics Shop
2,M4B,East York,"Parkview Hill , Woodbine Gardens",43.706397,-79.309937,0,Pizza Place,Pet Store,Gym / Fitness Center,Pharmacy,Bus Line,Bank,Athletics & Sports,Intersection,Gastropub,Fast Food Restaurant
3,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,3,Playground,Yoga Studio,Discount Store,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop
4,M4G,East York,Leaside,43.70906,-79.363452,0,Coffee Shop,Sporting Goods Shop,Burger Joint,Furniture / Home Store,Bank,Bagel Shop,Mexican Restaurant,Juice Bar,Brewery,Pet Store
5,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,American Restaurant,Bakery,Brewery,Gastropub,Yoga Studio,Diner,Neighborhood,Middle Eastern Restaurant
6,M1R,Scarborough,"Wexford , Maryvale",43.750072,-79.295849,2,Middle Eastern Restaurant,Breakfast Spot,Auto Garage,Shopping Mall,Bakery,Sandwich Place,Convenience Store,Cosmetics Shop,Discount Store,Coworking Space
7,M9V,Etobicoke,"South Steeles , Silverstone , Humbergate , Jam...",43.739416,-79.588437,0,Grocery Store,Beer Store,Liquor Store,Pizza Place,Pharmacy,Sandwich Place,Fast Food Restaurant,Fried Chicken Joint,Dessert Shop,Comic Shop
8,M9L,North York,Humber Summit,43.756303,-79.565963,1,Pizza Place,Empanada Restaurant,Yoga Studio,Discount Store,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store
9,M5V,Downtown Toronto,"CN Tower , King and Spadina , Railway Lands , ...",43.628947,-79.39442,0,Airport Lounge,Airport Service,Airport,Harbor / Marina,Plane,Boutique,Rental Car Location,Boat or Ferry,Bar,Sculpture Garden


Finally, visualize the resulting clusters


In [66]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
    
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
    
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], 
toronto_merged['Postal code'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
           
map_clusters


### 6. Examine Clusters


Now, you can examine each cluster and determine the discriminating venue
categories that distinguish each cluster. Based on the defining
categories, you can then assign a name to each cluster. I will leave
this exercise to you.


Cluster 1


In [67]:


toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]



Unnamed: 0,Postal code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5G,0,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Sandwich Place,Middle Eastern Restaurant,Thai Restaurant,Burger Joint,Bubble Tea Shop,Spa
1,M2H,0,Athletics & Sports,Golf Course,Dog Run,Fast Food Restaurant,Mediterranean Restaurant,Pool,Yoga Studio,Department Store,Coworking Space,Cosmetics Shop
2,M4B,0,Pizza Place,Pet Store,Gym / Fitness Center,Pharmacy,Bus Line,Bank,Athletics & Sports,Intersection,Gastropub,Fast Food Restaurant
4,M4G,0,Coffee Shop,Sporting Goods Shop,Burger Joint,Furniture / Home Store,Bank,Bagel Shop,Mexican Restaurant,Juice Bar,Brewery,Pet Store
5,M4M,0,Café,Coffee Shop,American Restaurant,Bakery,Brewery,Gastropub,Yoga Studio,Diner,Neighborhood,Middle Eastern Restaurant
7,M9V,0,Grocery Store,Beer Store,Liquor Store,Pizza Place,Pharmacy,Sandwich Place,Fast Food Restaurant,Fried Chicken Joint,Dessert Shop,Comic Shop
9,M5V,0,Airport Lounge,Airport Service,Airport,Harbor / Marina,Plane,Boutique,Rental Car Location,Boat or Ferry,Bar,Sculpture Garden
11,M5A,0,Coffee Shop,Park,Pub,Bakery,Café,Mexican Restaurant,Breakfast Spot,Dessert Shop,Health Food Store,Historic Site


Cluster 2


In [68]:

toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]




Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,North York,1,Pizza Place,Empanada Restaurant,Yoga Studio,Discount Store,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store


Cluster 3


In [69]:

toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,2,Middle Eastern Restaurant,Breakfast Spot,Auto Garage,Shopping Mall,Bakery,Sandwich Place,Convenience Store,Cosmetics Shop,Discount Store,Coworking Space


Cluster 4


In [70]:

toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Scarborough,3,Playground,Yoga Studio,Discount Store,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop


Cluster 5


In [71]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Scarborough,4,Fast Food Restaurant,Construction & Landscaping,Yoga Studio,Discount Store,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Convenience Store,Cosmetics Shop
