# Finding a place to live in Toronto, Canada
In this lab, we will use data from Wikipedia and Foursquare to find a place to live in Toronto, Canada, based on preferences about the surrounding environment. Different parts of a city suit different people depending on their personalities. For example, some people enjoy an energetic nightlife, while others prefer a quieter atmosphere.

Using Foursquare, we will search for venues near each Toronto postal code using the keyword "club". This word is chosen because it matches bars and nightclubs as well as tamer settings such as sports centers and sandwich shops. To start, we first need to find all of the postal codes for Toronto, Canada, all of which begin with the letter 'M'.

# Scraping Wikipedia for data
In this section, we will get a Canadian postal code table for codes that begin with 'M'.  

To start, we will import the needed libraries.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import geopy.distance

We will obtain the table from the Wikipedia page by using BeautifulSoup to find the table whose class ("wikitable sortable") matches the one shown in the page's element information.
Then we will load it into a Pandas DataFrame and drop the postal codes that are not assigned to a borough.

In [2]:
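from io import StringIO

wiki_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wiki_html = requests.get(wiki_url).content
wiki_soup = BeautifulSoup(wiki_html, 'html.parser')
# the postal code table is the table with these CSS classes
table = wiki_soup.find('table', class_="wikitable sortable")
# newer pandas versions expect a file-like object rather than a literal HTML string
wiki_df = pd.read_html(StringIO(str(table)))[0]
# drop postal codes that have no borough
wiki_df = wiki_df[wiki_df['Borough'] != 'Not assigned']
wiki_df.columns = ['PostalCode','Borough','Neighborhood']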
wiki_df.head(10)

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
| 7 | M7A | Downtown Toronto | Queen's Park |
| 9 | M9A | Etobicoke | Islington Avenue |
| 10 | M1B | Scarborough | Rouge |
| 11 | M1B | Scarborough | Malvern |
| 13 | M3B | North York | Don Mills North |
13,M3B,North York,Don Mills North


Some Postal Codes are assigned to a Borough but have no associated Neighborhood. For these cases, we will use the Borough name as the Neighborhood name.

To make sure no PostalCode is repeated, the Neighborhood values for each code will be combined into a single string with a comma separator.
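
Both steps can also be expressed with a pandas `groupby`. The sketch below is an alternative to the loop in the next cell, not the code that produced the output shown; it assumes each postal code belongs to a single borough and that missing neighborhoods carry the literal 'Not assigned' used in the Wikipedia table. The toy rows are illustrative, and the `M9X` row is made up to exercise the fallback.

```python
import pandas as pd

def combine_neighborhoods(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per PostalCode with its neighborhoods joined by ', '."""
    df = df.copy()
    # Fall back to the borough name where no neighborhood is assigned
    missing = df['Neighborhood'] == 'Not assigned'
    df.loc[missing, 'Neighborhood'] = df.loc[missing, 'Borough']
    # sort=False keeps postal codes in their original order of appearance
    return (df.groupby(['PostalCode', 'Borough'], sort=False)['Neighborhood']
              .agg(', '.join)
              .reset_index())

# Toy input; 'M9X' is a hypothetical row with no neighborhood
raw = pd.DataFrame({
    'PostalCode': ['M3A', 'M6A', 'M6A', 'M9X'],
    'Borough': ['North York', 'North York', 'North York', 'Etobicoke'],
    'Neighborhood': ['Parkwoods', 'Lawrence Heights', 'Lawrence Manor', 'Not assigned'],
})
combined = combine_neighborhoods(raw)
print(combined)
```

Grouping on both `PostalCode` and `Borough` carries the borough through to the result without a separate merge.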

In [3]:
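# use the Borough as the Neighborhood where no neighborhood is assigned
# ('Not assigned' matches the capitalization used in the Wikipedia table)
not_assigned = wiki_df[wiki_df['Neighborhood'] == 'Not assigned'].index
wiki_df.loc[not_assigned,'Neighborhood'] = wiki_df.loc[not_assigned,'Borough']

# keep the first row of each postal code, holding all of its neighborhoods joined by ', '
codes = list(wiki_df['PostalCode'].unique())
for c in codes:
    code_df = wiki_df[wiki_df['PostalCode']==c]
    neighborhood_list = list(code_df['Neighborhood'].values)
    index_list = list(code_df.index)
    wiki_df.loc[index_list[0],'Neighborhood'] = ', '.join(neighborhood_list)
    # drop the now-redundant duplicate rows
    wiki_df.drop(index_list[1:],axis = 0,inplace = True)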

wiki_df.reset_index(drop = True, inplace = True)
wiki_df.head(10)

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M7A | Downtown Toronto | Queen's Park |
| 5 | M9A | Etobicoke | Islington Avenue |
| 6 | M1B | Scarborough | Rouge, Malvern |
| 7 | M3B | North York | Don Mills North |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District |


In [4]:
print('The dataframe has {} unique postal codes (rows).'.format(wiki_df.shape[0]))

The dataframe has 103 unique postal codes (rows).


# Getting coordinates
After grouping neighborhoods by their postal codes, we need to find the coordinates for each postal code.  

I first attempted to use the geocoder.google Python library, but access was restricted, so I used the geopy library instead, which was used in other labs.

The geopy library was not able to find all of the postal codes; some returned None. I wrote a try-except chain that falls back to the neighborhood name and then to the borough, but this led to many postal codes sharing the same coordinates. In the end, I decided it was best to use the provided CSV file (Geospatial_Coordinates.csv) to get accurate coordinates. The code I attempted to use is kept below, disabled inside a string literal, for reference.
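
The fallback chain described above can be written with explicit `None` checks instead of catching the `AttributeError` raised on `None.latitude`, and it can report which field produced the match, which makes the "many postal codes share coordinates" problem easy to measure. This is a minimal sketch: the geocoder is passed in as any callable that returns an object with `.latitude`/`.longitude` or `None` (geopy's `Nominatim(...).geocode` behaves this way), and the fake geocoder and its coordinates below are made up for illustration.

```python
from types import SimpleNamespace
from typing import Callable, Optional, Tuple

def geocode_with_fallback(row: dict, geocode: Callable[[str], Optional[object]],
                          suffix: str = ', Toronto, Ontario') -> Tuple[float, float, str]:
    """Try PostalCode, then Neighborhood, then Borough; return (lat, lon, matched_field)."""
    for field in ('PostalCode', 'Neighborhood', 'Borough'):
        location = geocode(row[field] + suffix)
        if location is not None:  # explicit check instead of a bare except
            return location.latitude, location.longitude, field
    raise LookupError('No match for ' + row['PostalCode'])

# Hypothetical stand-in geocoder: only knows one address (made-up coordinates)
known = {'Parkwoods, Toronto, Ontario': SimpleNamespace(latitude=43.75, longitude=-79.33)}
row = {'PostalCode': 'M3A', 'Neighborhood': 'Parkwoods', 'Borough': 'North York'}
print(geocode_with_fallback(row, known.get))  # (43.75, -79.33, 'Neighborhood')
```

Counting how many rows matched on `'Borough'` shows how many postal codes collapsed onto a borough-level point. With real Nominatim requests, geopy's `geopy.extra.rate_limiter.RateLimiter` can wrap `geocode` to respect the service's request-rate policy.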

In [5]:
LL_df = pd.read_csv('Geospatial_Coordinates.csv')
LL_df.columns = ['PostalCode','Latitude','Longitude']
LL_df.head()

| | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [6]:
'''
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="can_explorer")

LL_df = pd.DataFrame(columns = ['PostalCode','Latitude','Longitude'])
for i,r in wiki_df.iterrows():
    
    try:
        address = r['PostalCode'] + ", Toronto, Ontario"
        location = geolocator.geocode(address)
        latitude = location.latitude
        print(i,end = ' ')
    except:
        try:
            address = r['Neighborhood'] + ", Toronto, Ontario"
            location = geolocator.geocode(address)
            latitude = location.latitude
            print(i,end = '_ ')
        except:
            address = r['Borough'] + ", Toronto, Ontario"
            location = geolocator.geocode(address)
            latitude = location.latitude
            print(i,end = '- ')
    longitude = location.longitude
    LL_df.loc[i] = (r['PostalCode'],latitude,longitude)
    
print()

'''


The latitude and longitude data needs to be merged with the Borough/Neighborhood data, using the PostalCode column as the join key.
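
`pd.merge` defaults to an inner join, so a postal code missing from the CSV would be dropped silently. A small sketch of how the merge could be checked, using pandas' `validate` and `indicator` options on toy data (the coordinates are copied from the outputs in this notebook, and `M5A`'s are omitted on purpose):

```python
import pandas as pd

codes = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A'],
                      'Borough': ['North York', 'North York', 'Downtown Toronto']})
coords = pd.DataFrame({'PostalCode': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

# validate='one_to_one' raises MergeError if either side repeats a PostalCode;
# indicator=True adds a _merge column saying which side(s) each row came from
checked = pd.merge(codes, coords, on='PostalCode', how='left',
                   validate='one_to_one', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)  # ['M5A']
```

An empty `unmatched` list confirms that every postal code received coordinates before the `_merge` column is dropped.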

In [7]:
wiki_df = pd.merge(wiki_df,LL_df.set_index('PostalCode'), on = 'PostalCode')

wiki_df.head(10)

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills North | 43.745906 | -79.352188 |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District | 43.657162 | -79.378937 |


The default search radius we will use is 1000 meters. However, some postal code centers are more than 2000 meters from their nearest neighbor, so 1000-meter search circles would leave gaps between them. To capture as much information as possible for these locations, we set the search radius to half of the distance to the closest neighboring postal code center, but never less than 1000 meters. We will call this the Radius for the postal code.
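
The same radius rule can be computed without the double Python loop. The sketch below swaps geopy's ellipsoidal geodesic distance for the spherical haversine formula (an approximation assuming a mean Earth radius of 6,371 km) and uses NumPy broadcasting to get all pairwise distances at once. Like the next cell, it ignores zero distances (a point to itself or exact duplicates).

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; haversine treats the Earth as a sphere

def search_radii(lat_deg, lon_deg, min_radius=1000.0):
    """Half the distance (m) to each point's nearest other point, floored at min_radius."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Pairwise differences via broadcasting: shape (n, n)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    # Ignore zero distances (the point itself and exact duplicates)
    dist[dist == 0] = np.inf
    return np.maximum(dist.min(axis=1) / 2, min_radius)

radii = search_radii([0.0, 0.0, 0.0], [0.0, 1.0, 10.0])
print(radii)
```

With this notebook's data, the call would be `search_radii(wiki_df['Latitude'], wiki_df['Longitude'])`. Haversine and geodesic distances typically differ by well under 1%, which is negligible when choosing a search radius.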

In [8]:
def closest_coordinate(coordinates):
    """Return, for each point, half the distance (in meters) to its nearest other point."""
    dist = np.zeros(len(coordinates))
    for i,c in enumerate(coordinates):
        # geodesic distance from point i to every other point
        dist2 = np.zeros(len(coordinates))
        for j,o in enumerate(coordinates):
            if i==j: continue
            dist2[j] = geopy.distance.distance(c, o).km*1000
        # ignore zero distances (the point itself and any exact duplicates)
        dist2 = np.where(dist2 == 0,np.inf,dist2)
        dist[i] = np.min(dist2)/2
    return dist

points = wiki_df[['Latitude','Longitude']].values
distances = closest_coordinate(points)
# never search with less than the default 1000 m radius
distances = np.where(distances < 1000,1000,distances)
wiki_df['Radius'] = distances
wiki_df['Radius'] = wiki_df['Radius'].astype(int)
wiki_df.head(10)

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | Radius |
|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 1000 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 1021 |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.65426 | -79.360636 | 1000 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 | 1000 |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 | 1000 |
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 | 1289 |
| 6 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 | 1697 |
| 7 | M3B | North York | Don Mills North | 43.745906 | -79.352188 | 1000 |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.706397 | -79.309937 | 1000 |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District | 43.657162 | -79.378937 | 1000 |


# Downloading Data from Foursquare
Now that we have the PostalCode and location information, we can start learning what kinds of venues are around each postal code. First, we store the Foursquare client credentials and request parameters in variables.

To check that the API URL is built correctly, we will first send a test request for the first row of the wiki_df DataFrame.
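
The URL in the next cell is assembled with `str.format`, which does no escaping: a query with spaces is sent unencoded, and a list such as `['club']` would be inserted literally as `['club']`. A minimal sketch of building the same v2 `venues/search` URL with the standard library's `urlencode` (the credentials below are placeholders); `requests.get(base_url, params=params)` performs the same encoding.

```python
from urllib.parse import urlencode

def foursquare_search_url(client_id, client_secret, lat, lng, version,
                          query, radius, limit, intent='browse'):
    """Build the venues/search URL used in this notebook, with proper escaping."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'v': version,
        'query': query,
        'radius': int(radius),
        'limit': limit,
        'intent': intent,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials; coordinates are the M3A center from above
url = foursquare_search_url('MY_ID', 'MY_SECRET', 43.753259, -79.329656,
                            '20200101', 'club', 1000, 50)
print(url)
```

With this approach, a multi-word query such as 'night club' is encoded as `night+club`.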

In [9]:
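import os

# read the Foursquare credentials from environment variables instead of hard-coding them
# (the variable names and defaults below are placeholders; use your own)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret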
VERSION = '20200101' # Foursquare API version

radius = 1000
LIMIT = 50
query = 'club'
intent = 'browse'
i = 0

latitude = wiki_df.iloc[i]['Latitude']
longitude = wiki_df.iloc[i]['Longitude']
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}&intent={}'\
.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, query, radius, LIMIT,intent)


In [10]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e59670a1e152c001b7e5fdf'},
 'response': {'venues': [{'id': '5197f208498e8a6befd6f0b2',
    'name': 'Parkway Valley Tennis Club',
    'location': {'address': '230 Cassandra Blvd',
     'lat': 43.754481101445634,
     'lng': -79.31828498840332,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.754481101445634,
       'lng': -79.31828498840332}],
     'distance': 924,
     'postalCode': 'M3A 1T6',
     'cc': 'CA',
     'city': 'Toronto',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['230 Cassandra Blvd',
      'Toronto ON M3A 1T6',
      'Canada']},
    'categories': [{'id': '4e39a956bd410d7aed40cbc3',
      'name': 'Tennis Court',
      'pluralName': 'Tennis Courts',
      'shortName': 'Tennis Court',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/stadium_tennis_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1582917435',
    'hasPerk': False},
   ...

Now that we know that the API is configured correctly, we can start using it to request data en masse.  

To clean up the data, we will use the get_category_type function from previous labs, which reduces each venue's list of categories to the name of its first category, making the category data more succinct and readable.

In [11]:
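def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the endpoint
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        # keep only the name of the first listed category
        return categories_list[0]['name']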

Using the above function, we can clean up and reshape the results from the last API call and transform them into a pandas DataFrame.
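
To show what the flattening step does, here is a small sketch on a toy payload shaped like `results['response']['venues']` above: `pd.json_normalize` turns nested dictionaries into dotted column names (`location.lat`), but leaves lists such as `categories` untouched, which is why a separate step is needed to pull out the category name. The first venue's values are copied from the output above; the second venue is hypothetical.

```python
import pandas as pd

venues = [
    {'name': 'Parkway Valley Tennis Club',
     'location': {'lat': 43.754481101445634, 'lng': -79.31828498840332},
     'categories': [{'name': 'Tennis Court', 'primary': True}]},
    {'name': 'Uncategorized venue',  # hypothetical venue with no categories
     'location': {'lat': 43.75, 'lng': -79.33},
     'categories': []},
]

flat = pd.json_normalize(venues)  # nested dicts become 'location.lat', 'location.lng'
# categories is still a list per row: keep the first category's name, or None
flat['categories'] = flat['categories'].map(lambda cats: cats[0]['name'] if cats else None)
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat[['name', 'categories', 'lat', 'lng']])
```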

In [12]:
# json_normalize is available as pd.json_normalize (the pandas.io.json import path is deprecated)
venues = results['response']['venues']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Parkway Valley Tennis Club | Tennis Court | 43.754481 | -79.318285 |
| 1 | The Bargin Club | Miscellaneous Shop | 43.745652 | -79.32484 |
| 2 | Sam's Club | Department Store | 43.761463 | -79.333725 |


Using a version of the getNearbyVenues function from a previous lab, modified to take a separate search radius for each postal code, we will loop through all of the postal codes with the search query 'club'.
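
The loop in the next cell indexes `["response"]["venues"]` directly, so a failed request (for example, bad credentials or an exhausted quota) would presumably surface as an unhelpful `KeyError`. A small sketch of a guard based on the `meta.code` field visible in the test response above; the `errorDetail` key is an assumption about the shape of error responses, so the code falls back to `'unknown'` if it is absent.

```python
def extract_venues(payload: dict) -> list:
    """Return payload['response']['venues'], failing loudly if the API reported an error."""
    meta = payload.get('meta', {})
    if meta.get('code') != 200:
        # 'errorDetail' is an assumed field name; fall back to 'unknown' if it is missing
        raise RuntimeError(f"Foursquare error {meta.get('code')}: {meta.get('errorDetail', 'unknown')}")
    return payload.get('response', {}).get('venues', [])
```

Inside the loop, `extract_venues(requests.get(url).json())` would replace the direct indexing.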

In [13]:
def getNearbyVenues(names, latitudes, longitudes, query, radius):
    """Search Foursquare around each point and return one row per venue found."""
    venues_list=[]
    for name, lat, lng, rad in zip(names, latitudes, longitudes, radius):
        print(name, end = ', ')
            
        # create the API request URL, using this postal code's own search radius
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'\
                .format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, query, rad, LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]["venues"]

        # keep a placeholder row for postal codes with no matches so they are still clustered
        if len(results) == 0:
            venues_list.append([(name, lat, lng, 'No Results', lat, lng, 'NA')])
            continue

        # return only relevant information for each nearby venue that has a category
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],  
            v['categories'][0]['name']) for v in results if len(v['categories']) > 0] )

    # flatten the per-postal-code lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['PostalCode', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [14]:
query = 'club'  # a plain string; a list would be formatted into the URL as "['club']"
print('Postal codes searched:',end = ' ')
toronto_clubs = getNearbyVenues(names=wiki_df['PostalCode'],
                                   latitudes=wiki_df['Latitude'],
                                   longitudes=wiki_df['Longitude'],
                                   query = query,
                                   radius = wiki_df['Radius']
                                  )
print('Done.\nThere were',toronto_clubs.shape[0],'venues found.')

Postal codes searched: M3A, M4A, M5A, M6A, M7A, M9A, M1B, M3B, M4B, M5B, M6B, M9B, M1C, M3C, M4C, M5C, M6C, M9C, M1E, M4E, M5E, M6E, M1G, M4G, M5G, M6G, M1H, M2H, M3H, M4H, M5H, M6H, M1J, M2J, M3J, M4J, M5J, M6J, M1K, M2K, M3K, M4K, M5K, M6K, M1L, M2L, M3L, M4L, M5L, M6L, M9L, M1M, M2M, M3M, M4M, M5M, M6M, M9M, M1N, M2N, M3N, M4N, M5N, M6N, M9N, M1P, M2P, M4P, M5P, M6P, M9P, M1R, M2R, M4R, M5R, M6R, M7R, M9R, M1S, M4S, M5S, M6S, M1T, M4T, M5T, M1V, M4V, M5V, M8V, M9V, M1W, M4W, M5W, M8W, M9W, M1X, M4X, M5X, M8X, M4Y, M7Y, M8Y, M8Z, Done.
There were 1138 venues found.


Below is a list of venue categories and an associated count for how many there are of each category from our search results.

In [15]:
counted_types = toronto_clubs['Venue Category'].value_counts().to_frame()
counted_types.columns = ['count']
counted_types

| Venue Category | count |
|---|---|
| Gym | 105 |
| Nightclub | 63 |
| Event Space | 52 |
| Gym / Fitness Center | 48 |
| Residential Building (Apartment / Condo) | 48 |
| Tennis Court | 47 |
| Lounge | 37 |
| Athletics & Sports | 36 |
| General Entertainment | 35 |
| Harbor / Marina | 27 |


In order to perform clustering on this data, we need to encode each venue numerically according to its category. We do this by one-hot encoding the categories and then computing, for each postal code, the fraction of its venues that fall in each category. This is a value between 0 and 1, where 0 means none of the postal code's venues match the category and 1 means all of them do.
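
A toy illustration of this encoding (the rows are made up): one-hot encode the categories, then average the indicator columns per postal code. Because each venue has exactly one category, each postal code's frequencies sum to 1.

```python
import pandas as pd

toy = pd.DataFrame({
    'PostalCode': ['M3A', 'M3A', 'M3A', 'M4A'],
    'Venue Category': ['Tennis Court', 'Department Store', 'Tennis Court', 'Gym'],
})

onehot = pd.get_dummies(toy['Venue Category'], dtype=int)  # one 0/1 column per category
onehot.insert(0, 'PostalCode', toy['PostalCode'])
freq = onehot.groupby('PostalCode').mean()  # fraction of venues in each category
print(freq)
```

In this example M3A gets 2/3 for Tennis Court and 1/3 for Department Store, and M4A gets 1.0 for Gym. Passing `dtype=int` keeps the 0/1 display; recent pandas versions return booleans from `get_dummies` by default, which average the same way.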

In [16]:
# one hot encoding: one column per venue category
clubs_onehot = pd.get_dummies(toronto_clubs[['Venue Category']], prefix="", prefix_sep="")

# add postal code column back to dataframe
clubs_onehot['PostalCode'] = toronto_clubs['PostalCode']

# move postal code column to the first column
fixed_columns = [clubs_onehot.columns[-1]] + list(clubs_onehot.columns[:-1])
clubs_onehot = clubs_onehot[fixed_columns]

clubs_onehot.head()

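| | PostalCode | American Restaurant | Animal Shelter | Art Gallery | Arts & Entertainment | Asian Restaurant | Athletics & Sports | Badminton Court | Bar | Baseball Stadium | ... | Tennis Court | Thrift / Vintage Store | Toy / Game Store | Travel Lounge | Turkish Restaurant | University | Volleyball Court | Voting Booth | Wine Bar | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | M3A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | M3A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | M3A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | M4A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |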


In [17]:
print('As expected, there are {} venues (rows) and {} categories (columns besides PostalCode).'.format(clubs_onehot.shape[0], clubs_onehot.shape[1] - 1))

As expected, there are 1138 venues (rows) and 146 categories (columns besides PostalCode).


In [18]:
clubs_grouped = clubs_onehot.groupby('PostalCode').mean().reset_index()
clubs_grouped.shape

(102, 147)

# Clustering the data
Now that we have a set of features describing each postal code, we can group postal codes by the type and frequency of the venue categories returned by our search. First, using the return_most_common_venues function, we will find the ten most common venue categories for each postal code and save them to a DataFrame.

In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [20]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['PostalCode'] = clubs_grouped['PostalCode']

for ind in np.arange(clubs_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(clubs_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | PostalCode | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M1B | Athletics & Sports | Curling Ice | Gay Bar | Yoga Studio | Dive Bar | Factory | Event Space | Electronics Store | Drugstore | Dog Run |
| 1 | M1C | Sports Club | Tennis Court | Yoga Studio | Dessert Shop | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 2 | M1E | Spa | Student Center | Tennis Court | Dessert Shop | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 3 | M1G | Bus Line | Bus Stop | Building | Creperie | Cultural Center | Curling Ice | Daycare | Department Store | Dessert Shop | Convention Center |
| 4 | M1H | Building | Gym / Fitness Center | Tennis Court | Athletics & Sports | Pool | Gym Pool | Fast Food Restaurant | Convenience Store | Convention Center | Creperie |
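
Many postal codes have fewer than ten distinct categories, so the later columns appear to be filled by categories with a frequency of 0 (the recurring Dog Run, Dive Bar, Drugstore entries), whose order is just an artifact of sorting ties. A sketch of a variant that leaves such slots empty; `top_categories` and its `Rank n` column names are hypothetical, and it expects the frequency table with PostalCode as the index, e.g. `top_categories(clubs_grouped.set_index('PostalCode'))`.

```python
import pandas as pd

def top_categories(freq: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Top-n categories per row; slots whose frequency is 0 are left empty (None)."""
    out = {}
    for code, row in freq.iterrows():
        ranked = row.sort_values(ascending=False).head(n)
        names = [cat if share > 0 else None for cat, share in ranked.items()]
        out[code] = names + [None] * (n - len(names))  # pad if there are fewer than n columns
    cols = [f'Rank {i + 1}' for i in range(n)]
    return pd.DataFrame.from_dict(out, orient='index', columns=cols)

# Toy frequencies (made up)
freq = pd.DataFrame({'Gym': [0.6, 0.0], 'Bar': [0.4, 0.0], 'Spa': [0.0, 1.0]},
                    index=['M1A', 'M1B'])
top = top_categories(freq, n=3)
print(top)
```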


Using scikit-learn's KMeans implementation, we group postal codes with similar venue-category frequencies into clusters, each identified by a numeric label. The labels are added to the above DataFrame, which is re-printed below.
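
To make explicit what K-Means does with the frequency vectors, here is a minimal NumPy sketch of Lloyd's algorithm: assign each postal code to its nearest centroid, move each centroid to the mean of its members, and repeat until nothing changes. This is an illustration, not scikit-learn's implementation, which adds k-means++ seeding and several restarts (`n_init`), with `random_state` making the seeding reproducible. The toy vectors and starting centroids are made up.

```python
import numpy as np

def lloyd_kmeans(X, init_centroids, n_iter=100):
    """Plain Lloyd iterations from given starting centroids; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every row to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members (unchanged if empty)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                        for k in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy category-frequency vectors: two "gym-like" and two "marina-like" postal codes
X = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
labels, centroids = lloyd_kmeans(X, init_centroids=[[1, 0, 0], [0, 0, 1]])
print(labels)  # [0 0 1 1]
```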

In [21]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 4

toronto_grouped_clustering = clubs_grouped.drop(columns='PostalCode')

# run k-means clustering (n_init=10 was the scikit-learn default before version 1.4)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 0, 1, 0, 1, 3, 1, 2, 1])

In [22]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = wiki_df

# join the cluster labels and top venues onto the postal code data (which holds latitude/longitude)
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('PostalCode'), on='PostalCode')

manhattan_merged.head() # check the last columns!

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | Radius | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 1000 | 0.0 | Department Store | Golf Course | Tennis Court | Miscellaneous Shop | Creperie | Cultural Center | Curling Ice | Convention Center | Daycare | Convenience Store |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 1021 | 1.0 | Shoe Store | Gym / Fitness Center | Daycare | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar | Dessert Shop | Department Store |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.65426 | -79.360636 | 1000 | 1.0 | Boat or Ferry | Office | Comedy Club | Harbor / Marina | Recreation Center | Residential Building (Apartment / Condo) | Curling Ice | Event Space | Lounge | Speakeasy |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 | 1000 | 1.0 | Athletics & Sports | Gym / Fitness Center | Yoga Studio | Dessert Shop | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 | 1000 | 1.0 | Residential Building (Apartment / Condo) | Gym | Event Space | Nightclub | Athletics & Sports | Sports Bar | Clothing Store | Student Center | Indian Restaurant | Karaoke Bar |


# Visualizing the Data
Using the newly labeled data, we can visualize each postal code's assigned cluster on a map using the Folium library. Numeric cluster labels do not give much intuition, so we will first give each numeric label an intuitive name by exploring the venues within each cluster.

In [23]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0]

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | Radius | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 1000 | 0.0 | Department Store | Golf Course | Tennis Court | Miscellaneous Shop | Creperie | Cultural Center | Curling Ice | Convention Center | Daycare | Convenience Store |
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 | 1289 | 0.0 | Historic Site | Residential Building (Apartment / Condo) | Golf Course | Tennis Court | Office | Department Store | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 7 | M3B | North York | Don Mills North | 43.745906 | -79.352188 | 1000 | 0.0 | Tennis Court | Smoke Shop | Gym | Golf Course | Yoga Studio | Department Store | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 10 | M6B | North York | Glencairn | 43.709577 | -79.445073 | 1000 | 0.0 | Tennis Court | Gym | Daycare | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar | Dessert Shop | Department Store |
| 12 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 | 1626 | 0.0 | Sports Club | Tennis Court | Yoga Studio | Dessert Shop | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 18 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 | 1208 | 0.0 | Spa | Student Center | Tennis Court | Dessert Shop | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 23 | M4G | East York | Leaside | 43.70906 | -79.363452 | 1000 | 0.0 | Athletics & Sports | Gym | Curling Ice | Tennis Court | Yoga Studio | Dessert Shop | Factory | Event Space | Electronics Store | Drugstore |
| 26 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 | 1000 | 0.0 | Building | Gym / Fitness Center | Tennis Court | Athletics & Sports | Pool | Gym Pool | Fast Food Restaurant | Convenience Store | Convention Center | Creperie |
| 27 | M2H | North York | Hillcrest Village | 43.803762 | -79.363452 | 1301 | 0.0 | Gym | Golf Course | Strip Club | Pet Store | Tennis Court | Yoga Studio | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 29 | M4H | East York | Thorncliffe Park | 43.705369 | -79.349372 | 1000 | 0.0 | Garden | Residential Building (Apartment / Condo) | Martial Arts Dojo | Curling Ice | Tennis Court | Yoga Studio | Electronics Store | Drugstore | Dog Run | Dive Bar |


In [24]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1]

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | Radius | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 1021 | 1.0 | Shoe Store | Gym / Fitness Center | Daycare | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar | Dessert Shop | Department Store |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.654260 | -79.360636 | 1000 | 1.0 | Boat or Ferry | Office | Comedy Club | Harbor / Marina | Recreation Center | Residential Building (Apartment / Condo) | Curling Ice | Event Space | Lounge | Speakeasy |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 | 1000 | 1.0 | Athletics & Sports | Gym / Fitness Center | Yoga Studio | Dessert Shop | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 | 1000 | 1.0 | Residential Building (Apartment / Condo) | Gym | Event Space | Nightclub | Athletics & Sports | Sports Bar | Clothing Store | Student Center | Indian Restaurant | Karaoke Bar |
| 6 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 | 1697 | 1.0 | Athletics & Sports | Curling Ice | Gay Bar | Yoga Studio | Dive Bar | Factory | Event Space | Electronics Store | Drugstore | Dog Run |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.706397 | -79.309937 | 1000 | 1.0 | Event Space | Residential Building (Apartment / Condo) | Gym / Fitness Center | Athletics & Sports | Curling Ice | Skating Rink | Yoga Studio | Dessert Shop | Electronics Store | Drugstore |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District | 43.657162 | -79.378937 | 1000 | 1.0 | Gym | Gym / Fitness Center | Event Space | Nightclub | Residential Building (Apartment / Condo) | Sports Bar | Salon / Barbershop | Lounge | Clothing Store | Performing Arts Venue |
| 11 | M9B | Etobicoke | Cloverdale, Islington, Martin Grove, Princess ... | 43.650943 | -79.554724 | 1000 | 1.0 | Miscellaneous Shop | Gym | Golf Course | Office | Dessert Shop | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 13 | M3C | North York | Flemingdon Park, Don Mills South | 43.725900 | -79.340923 | 1021 | 1.0 | Miscellaneous Shop | Restaurant | Café | Dessert Shop | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar |
| 14 | M4C | East York | Woodbine Heights | 43.695344 | -79.318389 | 1000 | 1.0 | Athletics & Sports | Café | Event Space | Sports Club | Curling Ice | Skating Rink | Yoga Studio | Dive Bar | Electronics Store | Drugstore |


In [25]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2]

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | Radius | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 47 | M4L | East Toronto | The Beaches West, India Bazaar | 43.668999 | -79.315572 | 1000 | 2.0 | Harbor / Marina | Real Estate Office | Breakfast Spot | Athletics & Sports | Martial Arts Dojo | Yoga Studio | Dive Bar | Event Space | Electronics Store | Drugstore |
| 51 | M1M | Scarborough | Cliffcrest, Cliffside, Scarborough Village West | 43.716316 | -79.239476 | 1114 | 2.0 | Harbor / Marina | Tennis Court | Daycare | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar | Dessert Shop | Department Store |
| 81 | M6S | West Toronto | Runnymede, Swansea | 43.651571 | -79.48445 | 1000 | 2.0 | Harbor / Marina | Skate Park | Tennis Court | Pool | Conference Room | Convenience Store | Convention Center | Creperie | Cultural Center | Curling Ice |
| 87 | M5V | Downtown Toronto | CN Tower, Bathurst Quay, Island airport, Harbo... | 43.628947 | -79.39442 | 1000 | 2.0 | Harbor / Marina | Art Gallery | Gym / Fitness Center | Bar | Bowling Alley | Dive Bar | Boat or Ferry | Yoga Studio | Dog Run | Fast Food Restaurant |
| 100 | M7Y | East Toronto | Business Reply Mail Processing Centre 969 Eastern | 43.662744 | -79.321558 | 1000 | 2.0 | Harbor / Marina | Breakfast Spot | Athletics & Sports | Martial Arts Dojo | Yoga Studio | Dive Bar | Factory | Event Space | Electronics Store | Drugstore |
| 101 | M8Y | Etobicoke | Humber Bay, King's Mill Park, Kingsway Park So... | 43.636258 | -79.498509 | 1000 | 2.0 | Harbor / Marina | Music Store | Yoga Studio | Dessert Shop | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar |


In [26]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3]

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | Radius | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 38 | M1K | Scarborough | East Birchmount Park, Ionview, Kennedy Park | 43.727929 | -79.262029 | 1114 | 3.0 | | Yoga Studio | Concert Hall | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar | Dessert Shop |
| 39 | M2K | North York | Bayview Village | 43.786947 | -79.385975 | 1000 | 3.0 | | Yoga Studio | Concert Hall | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar | Dessert Shop |
| 46 | M3L | North York | Downsview West | 43.739015 | -79.506944 | 1000 | 3.0 | | Yoga Studio | Concert Hall | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar | Dessert Shop |
| 71 | M1R | Scarborough | Maryvale, Wexford | 43.750072 | -79.295849 | 1000 | 3.0 | | Yoga Studio | Concert Hall | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar | Dessert Shop |
| 77 | M9R | Etobicoke | Kingsview Village, Martin Grove Gardens, Richv... | 43.688905 | -79.554724 | 1000 | 3.0 | | Yoga Studio | Concert Hall | Factory | Event Space | Electronics Store | Drugstore | Dog Run | Dive Bar | Dessert Shop |


Based on the information in the tables, the first cluster appears to have many fitness and sports facilities, the second has more bars and nightclubs, the third is dominated by waterfronts, and the fourth contains postal codes whose first (and only) result was the 'No Results' placeholder (category 'NA'), meaning that no venues matching the query 'club' were found near the postal code.
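
The cluster names above come from reading the tables. A slightly more systematic check, offered here as an optional sketch, is to average the category frequencies within each cluster and list each cluster's top categories. With this notebook's variables the call would be `cluster_profiles(clubs_grouped.set_index('PostalCode'), kmeans.labels_)` (the rows of `clubs_grouped` are in the same order as the labels); the toy data below is made up.

```python
import pandas as pd

def cluster_profiles(freq: pd.DataFrame, labels, top_n: int = 3) -> dict:
    """Mean category frequency per cluster; return each cluster's top_n categories."""
    means = freq.groupby(pd.Series(labels, index=freq.index)).mean()
    return {label: row.nlargest(top_n).index.tolist() for label, row in means.iterrows()}

freq = pd.DataFrame({'Gym': [0.8, 0.6, 0.0],
                     'Harbor / Marina': [0.0, 0.1, 0.9],
                     'Nightclub': [0.2, 0.3, 0.1]},
                    index=['A', 'B', 'C'])
profiles = cluster_profiles(freq, [0, 0, 1], top_n=2)
print(profiles)  # {0: ['Gym', 'Nightclub'], 1: ['Harbor / Marina', 'Nightclub']}
```

Averaging over all members of a cluster keeps a few postal codes shown in a truncated table from dominating the name chosen for the cluster.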

Finally, we can display a map of the clustered postal codes, sorted into colored groups and labeled according to their postal code and cluster name.

In [30]:
cluster_text = ['Fitness','Nightlife','Water and Sports','No Clubs']

In [31]:
#!conda install -c conda-forge folium=0.5.0 --yes
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

In [32]:
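# drop postal codes that received no cluster label (present in wiki_df but absent from clubs_grouped)
manhattan_merged.dropna(subset = ['Cluster Labels'], inplace = True)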

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['PostalCode'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) +' '+ str(cluster_text[int(cluster)]), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Since the map will not display correctly on GitHub, a screenshot is included here.

![Map of Toronto postal codes colored by cluster](TorontoMapScreenshot.png)

[If the screenshot does not appear, click here.](https://github.com/ddjanke/IBM_Capstone/blob/master/TorontoMapScreenshot.PNG)