# IBM Applied Data Science Capstone

## What is the purpose of this notebook? 
We will explore venue data from the Foursquare API and use it to find and group similar neighborhoods in Toronto.

In [3]:
import numpy as np
import pandas as pd

print("Hello Capstone Project Course!")

Hello Capstone Project Course!


## Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

### Pull the Wikipedia page and scrape its tables with pandas:

In [4]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
tables_from_wiki = pd.read_html(url)

### The first table returned is the postal code table we want, so we use that one:

In [5]:
tables_from_wiki

[    Postal Code           Borough  \
 0           M1A      Not assigned   
 1           M2A      Not assigned   
 2           M3A        North York   
 3           M4A        North York   
 4           M5A  Downtown Toronto   
 ..          ...               ...   
 175         M5Z      Not assigned   
 176         M6Z      Not assigned   
 177         M7Z      Not assigned   
 178         M8Z         Etobicoke   
 179         M9Z      Not assigned   
 
                                          Neighbourhood  
 0                                         Not assigned  
 1                                         Not assigned  
 2                                            Parkwoods  
 3                                     Victoria Village  
 4                            Regent Park, Harbourfront  
 ..                                                 ...  
 175                                       Not assigned  
 176                                       Not assigned  
 177                

### Use the first table as our dataframe:

In [6]:
toronto_df = tables_from_wiki[0]
toronto_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


### Drop rows whose Borough is "Not assigned", as the assignment requires:

In [7]:
toronto_cleaned_df = toronto_df[toronto_df["Borough"] != "Not assigned"]

### Compare the shape of the data after and before cleaning:

In [8]:
print(toronto_cleaned_df.shape)
print(toronto_df.shape)
toronto_cleaned_df

(103, 3)
(180, 3)


Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [9]:
toronto_cleaned_df[toronto_cleaned_df["Neighbourhood"] == "Not assigned"]

Unnamed: 0,Postal Code,Borough,Neighbourhood


### The result above is empty: every row with an assigned Borough also has an assigned Neighbourhood, so no further cleaning is needed for that case.

### Here is the cleaned dataframe, but its index still has gaps from the dropped rows:

In [10]:
toronto_cleaned_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


### Reset the index so it starts at 0 with no gaps:

In [11]:
toronto_cleaned_df = toronto_cleaned_df.reset_index(drop=True)
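
A minimal sketch of the two cleaning steps above on a toy four-row frame (rows copied from the scraped output; the names `raw` and `cleaned` are illustrative only). It shows why the index has gaps after boolean filtering and what `drop=True` changes.

```python
import pandas as pd

# Toy stand-in for the scraped Wikipedia table (rows copied from the output above)
raw = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# The boolean mask keeps only assigned boroughs; the surviving rows keep their old labels
cleaned = raw[raw["Borough"] != "Not assigned"]
print(cleaned.index.tolist())  # [2, 3]

# drop=True renumbers from 0 and discards the old labels instead of
# turning them into a new "index" column
cleaned = cleaned.reset_index(drop=True)
print(cleaned.index.tolist())  # [0, 1]
```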

### Here's the final dataframe:

In [12]:
toronto_cleaned_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


### And the shape as requested:

In [13]:
toronto_cleaned_df.shape

(103, 3)

## Getting Coordinates!

In [14]:
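import geocoder # third-party geocoding library

def get_coordinates(postal_code, max_attempts=10):
    """Return (latitude, longitude) for a Toronto postal code using geocoder's Google provider."""
    lat_lng_coords = None
    attempts = 0

    # retry a limited number of times; an unbounded loop never ends if the service keeps returning no result
    while lat_lng_coords is None and attempts < max_attempts:
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        attempts += 1

    if lat_lng_coords is None:
        raise RuntimeError('Could not geocode postal code {}'.format(postal_code))

    latitude, longitude = lat_lng_coords
    return latitude, longitude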

In [None]:
get_coordinates("M1B")

In [15]:
postal_code_df = pd.read_csv("Geospatial_Coordinates.csv")

In [30]:
postal_code_df

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [31]:
final_toronto_df = pd.merge(toronto_cleaned_df, postal_code_df, on="Postal Code")
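
To make the merge behavior concrete, here is a small sketch with toy frames (values copied from the outputs above, but the coordinate frame deliberately leaves out M5A and adds M1B). `pd.merge` defaults to an inner join, so a postal code missing from either side silently disappears; a left join with `validate` is one way to catch that.

```python
import pandas as pd

# Toy versions of the two frames; M5A is intentionally missing from coords
neighbourhoods = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M1B"],
    "Latitude": [43.753259, 43.725882, 43.806686],
    "Longitude": [-79.329656, -79.315572, -79.194353],
})

# Default inner join: only postal codes present in BOTH frames survive
merged = pd.merge(neighbourhoods, coords, on="Postal Code")
print(merged["Postal Code"].tolist())  # ['M3A', 'M4A']

# Left join keeps every neighbourhood row; validate raises MergeError on duplicate keys
checked = pd.merge(neighbourhoods, coords, on="Postal Code", how="left", validate="one_to_one")
print(checked["Latitude"].isna().sum())  # 1 (M5A has no coordinates in this toy frame)
```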

### Final Toronto Dataframe with Latitude and Longitude! 

In [32]:
final_toronto_df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


## Time to Start Clustering Neighborhoods!

In [22]:
!pip install folium
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium 

Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


In [24]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_view")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Ontario are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Ontario are 43.6534817, -79.3839347.


In [40]:
toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for pc, lat, lng, borough, neighborhood in zip(final_toronto_df['Postal Code'], final_toronto_df['Latitude'], final_toronto_df['Longitude'], final_toronto_df['Borough'], final_toronto_df['Neighbourhood']):
    label = '{}, {}, {}'.format(neighborhood, borough, pc)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto)  
    
toronto

In [44]:
# Replace the placeholders with your own Foursquare credentials; avoid printing or sharing real secrets
CLIENT_ID = 'your-client-ID' # your Foursquare ID
CLIENT_SECRET = 'your-client-secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


### Include Only Boroughs with Toronto in the Name

In [41]:
only_toronto = final_toronto_df[final_toronto_df['Borough'].str.contains('Toronto')]

In [46]:
only_toronto = only_toronto.reset_index(drop=True)
only_toronto

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259
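
A quick check of the `str.contains` filter on a few borough names taken from the outputs above (the Series `boroughs` is a toy stand-in). The match is case-sensitive, so a borough spelled with a lowercase "toronto" would be dropped.

```python
import pandas as pd

boroughs = pd.Series(["Downtown Toronto", "North York", "East Toronto", "Etobicoke", "Central Toronto"])

# Case-sensitive substring match; regex=False treats "Toronto" as plain text
mask = boroughs.str.contains("Toronto", regex=False)
print(boroughs[mask].tolist())  # ['Downtown Toronto', 'East Toronto', 'Central Toronto']
```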


### Map Boroughs with Toronto in the Name to Visualize How Much We've Reduced Our Dataset. 

In [60]:
only_toronto_neighborhoods = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for pc, lat, lng, borough, neighborhood in zip(only_toronto['Postal Code'], only_toronto['Latitude'], only_toronto['Longitude'], only_toronto['Borough'], only_toronto['Neighbourhood']):
    label = '{}, {}, {}'.format(neighborhood, borough, pc)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(only_toronto_neighborhoods)
    
only_toronto_neighborhoods


### Pick Roselawn Arbitrarily to Test That the API Call Works

In [56]:
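# look up Roselawn by name rather than relying on its row position (19)
roselawn = only_toronto[only_toronto['Neighbourhood'] == 'Roselawn'].iloc[0]
roselawn_latitude = roselawn['Latitude']
roselawn_longitude = roselawn['Longitude']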
roselawn_latitude

43.7116948

### Build an API Call to Explore Popular Venues

In [57]:
url = "https://api.foursquare.com/v2/venues/explore"

params = dict(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    v='20180323',
    ll='{},{}'.format(roselawn_latitude, roselawn_longitude),
    radius=500,
    limit=100
)
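
`requests.get(url, params)` turns the `params` dict into a query string. As a rough offline illustration, assuming placeholder credentials and the Toronto centre coordinates printed earlier in place of Roselawn's, the standard library's `urlencode` produces a comparable string; note that the comma in `ll` is percent-encoded.

```python
from urllib.parse import urlencode

# Placeholder credentials and the Toronto centre coordinates from the Nominatim output above
params = dict(
    client_id="YOUR_ID",
    client_secret="YOUR_SECRET",
    v="20180323",
    ll="{},{}".format(43.6534817, -79.3839347),
    radius=500,
    limit=100,
)

# Key/value pairs joined with "&"; the comma in ll becomes %2C
query = urlencode(params)
print("https://api.foursquare.com/v2/venues/explore?" + query)
```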


In [58]:
import requests
import json

results = requests.get(url, params).json()
results

{'meta': {'code': 200, 'requestId': '6020425c2aaae823b6d51a53'},
  'headerLocation': 'Lawrence Park South',
  'headerFullLocation': 'Lawrence Park South, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 43.7161948045, 'lng': -79.41072165393975},
   'sw': {'lat': 43.707194795499994, 'lng': -79.42314954606023}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e6e176c45dd293273b74e3c',
       'name': "Rosalind's Garden Oasis",
       'contact': {},
       'location': {'lat': 43.71218888050602,
        'lng': -79.41197784736922,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.71218888050602,
          'lng': -79.41197784736922}],
        'distance': 402,
        'cc': 'CA',
        'city': 

### Reuse the getNearbyVenues Function from the New York Exercise to Fetch Venues for Each Neighborhood

In [62]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each point and return one row per venue."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # API request parameters; requests builds the query string from this dict
        params = dict(
            client_id=CLIENT_ID,
            client_secret=CLIENT_SECRET,
            v=VERSION,
            ll='{},{}'.format(lat, lng),
            radius=radius,
            limit=LIMIT)

        # make the GET request and keep only the list of recommended venues
        results = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        # (a venue without categories gets None instead of raising IndexError)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue'].get('categories') else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
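
The list comprehension in `getNearbyVenues` walks `response`, then `groups[0]`, then `items`, then `venue`. The sketch below runs the same extraction on a small hand-made response (hypothetical venue names and coordinates, keeping only the keys the function reads) so the parsing can be checked without calling the API. The empty `categories` list is an assumed edge case, included to show why guarding `categories[0]` helps.

```python
# Hypothetical response shaped like the explore JSON shown above (only the keys we read)
mock = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Venue A",
                           "location": {"lat": 43.71, "lng": -79.41},
                           "categories": [{"name": "Park"}]}},
                {"venue": {"name": "Venue B",
                           "location": {"lat": 43.72, "lng": -79.42},
                           "categories": []}},  # assumed edge case: no category listed
            ]
        }]
    }
}

items = mock["response"]["groups"][0]["items"]

rows = []
for v in items:
    venue = v["venue"]
    categories = venue.get("categories") or [{}]
    rows.append((venue["name"],
                 venue["location"]["lat"],
                 venue["location"]["lng"],
                 categories[0].get("name")))  # None when the category list is empty

print(rows)  # [('Venue A', 43.71, -79.41, 'Park'), ('Venue B', 43.72, -79.42, None)]
```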

### Build the Dataframe of Venues

In [63]:
venues = getNearbyVenues(only_toronto['Neighbourhood'], only_toronto['Latitude'], only_toronto['Longitude'])

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West, Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
R

In [64]:
venues.head(5)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [69]:
print(venues.shape)
venues.groupby('Neighborhood').count()



(1585, 7)


Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,54,54,54,54,54,54
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",16,16,16,16,16,16
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15
Central Bay Street,61,61,61,61,61,61
Christie,15,15,15,15,15,15
Church and Wellesley,78,78,78,78,78,78
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,34,34,34,34,34,34
Davisville North,8,8,8,8,8,8


In [73]:
len(venues['Venue Category'].unique())

229

### One-Hot Encode the Venue Categories so ML Algorithms Can Use Them

In [85]:
toronto_one_hot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
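
A toy illustration of the encoding step (the three venues are hypothetical). With `prefix=""` and `prefix_sep=""` each column is named after the category itself; `.astype(int)` is added because recent pandas versions return boolean dummy columns rather than 0/1 integers.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighborhood": ["Berczy Park", "Berczy Park", "Christie"],
    "Venue Category": ["Coffee Shop", "Bakery", "Coffee Shop"],
})

# One indicator column per category, named exactly after the category
one_hot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="").astype(int)
print(one_hot.columns.tolist())  # ['Bakery', 'Coffee Shop']
```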

In [88]:
toronto_one_hot

Unnamed: 0,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Aquarium,Art Gallery,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1580,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1581,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1582,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1583,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Add the Neighborhood Names Back as the First Column (the Names Themselves Are Not One-Hot Encoded)

In [None]:
toronto_one_hot.insert(0, "Neighborhoods",venues['Neighborhood'])

In [90]:
toronto_one_hot

Unnamed: 0,Neighborhoods,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Aquarium,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1580,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1581,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1582,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1583,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Get the mean encoded value for each neighborhood by grouping venues by neighborhood

This gives the "composition" of each neighborhood's venues: each column holds the fraction (0 to 1) of that neighborhood's venues that fall in the category.

In [137]:
toronto_category_means_grouped = toronto_one_hot.groupby("Neighborhoods").mean().reset_index()
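
Why the mean of a one-hot column is a share: within a group, averaging 0/1 indicators gives the fraction of rows that are 1, so each neighborhood's shares add up to 1. A toy sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical one-hot rows: two venues in Berczy Park, one in Christie
one_hot = pd.DataFrame({
    "Neighborhoods": ["Berczy Park", "Berczy Park", "Christie"],
    "Bakery": [0, 1, 0],
    "Coffee Shop": [1, 0, 1],
})

# Group mean of each 0/1 column = share of that neighborhood's venues in the category
shares = one_hot.groupby("Neighborhoods").mean().reset_index()
print(shares)
```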

In [138]:
toronto_category_means_grouped

Unnamed: 0,Neighborhoods,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Aquarium,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.066667,0.066667,0.066667,0.133333,0.2,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.016393
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.012821,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [139]:
print(toronto_category_means_grouped.shape)
print(only_toronto.shape)

(39, 230)
(39, 5)


### Print the top five venue categories and their approximate shares for each neighborhood

In [140]:
for n in toronto_category_means_grouped['Neighborhoods']: 
    neighborhood = toronto_category_means_grouped[toronto_category_means_grouped['Neighborhoods'] == n].T.reset_index()
    neighborhood.columns = ['venue','freq']
    neighborhood = neighborhood.iloc[1:]
    neighborhood['freq'] = neighborhood['freq'].astype(float)
    neighborhood = neighborhood.round({'freq': 2})
    print(n)
    print(neighborhood.sort_values('freq', ascending=False).reset_index(drop=True).head())
    print("")

Berczy Park
            venue  freq
0     Coffee Shop  0.09
1    Cocktail Bar  0.06
2     Cheese Shop  0.04
3  Farmers Market  0.04
4          Bakery  0.04

Brockton, Parkdale Village, Exhibition Place
            venue  freq
0            Café  0.13
1     Coffee Shop  0.09
2       Nightclub  0.09
3  Breakfast Spot  0.09
4             Gym  0.04

Business reply mail Processing Centre, South Central Letter Processing Plant Toronto
                venue  freq
0  Light Rail Station  0.12
1                 Spa  0.06
2    Recording Studio  0.06
3      Farmers Market  0.06
4         Pizza Place  0.06

CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
             venue  freq
0  Airport Service  0.20
1   Airport Lounge  0.13
2         Boutique  0.07
3          Airport  0.07
4              Bar  0.07

Central Bay Street
                venue  freq
0         Coffee Shop  0.18
1      Sandwich Place  0.05
2  Italian Restaurant  0.05
3         

### Helper function that returns a row's most common venue categories

In [185]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
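
A self-contained check of `return_most_common_venues` on a hypothetical row shaped like `toronto_category_means_grouped.iloc[i, :]` (name first, then category shares). The helper is copied from above; ties between equal shares may come back in either order.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name), then sort shares from high to low
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: neighborhood name followed by category shares
row = pd.Series({"Neighborhoods": "Christie", "Bakery": 0.1, "Café": 0.3, "Park": 0.2})
top = return_most_common_venues(row, 2)
print(list(top))  # ['Café', 'Park']
```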

### Build a dataframe of NAMED venue types per neighborhood instead of encodings

In [143]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create column names: "1st", "2nd", "3rd", then "4th" onward
columns = ['Neighborhoods']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe with one row per neighborhood
toronto_venue_types_by_neighborhood_sorted = pd.DataFrame(columns=columns)
toronto_venue_types_by_neighborhood_sorted['Neighborhoods'] = toronto_category_means_grouped['Neighborhoods']

# fill each row with that neighborhood's top venue category names
for ind in np.arange(toronto_category_means_grouped.shape[0]):
    toronto_venue_types_by_neighborhood_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_category_means_grouped.iloc[ind, :], num_top_venues)

toronto_venue_types_by_neighborhood_sorted

Unnamed: 0,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Bakery,Farmers Market,Cheese Shop,Seafood Restaurant,Restaurant,Park,Jazz Club
1,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Nightclub,Coffee Shop,Climbing Gym,Burrito Place,Stadium,Restaurant,Italian Restaurant,Intersection
2,"Business reply mail Processing Centre, South C...",Light Rail Station,Auto Workshop,Pizza Place,Comic Shop,Recording Studio,Restaurant,Burrito Place,Brewery,Skate Park,Spa
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Rental Car Location,Airport,Airport Food Court,Airport Gate,Bar,Sculpture Garden,Harbor / Marina,Boutique
4,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Salad Place,Bubble Tea Shop,Burger Joint,Yoga Studio,Business Service,Bike Rental / Bike Share
5,Christie,Grocery Store,Café,Park,Nightclub,Italian Restaurant,Candy Store,Restaurant,Baby Store,Coffee Shop,Cosmetics Shop
6,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Pub,Men's Store,Mediterranean Restaurant,Hotel,Dance Studio
7,"Commerce Court, Victoria Hotel",Coffee Shop,Restaurant,Café,Hotel,American Restaurant,Italian Restaurant,Gym,Cocktail Bar,Seafood Restaurant,Japanese Restaurant
8,Davisville,Pizza Place,Sandwich Place,Dessert Shop,Gym,Coffee Shop,Italian Restaurant,Café,Sushi Restaurant,Thai Restaurant,Seafood Restaurant
9,Davisville North,Gym,Breakfast Spot,Hotel,Food & Drink Shop,Department Store,Park,Sandwich Place,Pizza Place,Antique Shop,Dessert Shop


In [144]:
from sklearn.cluster import KMeans

In [147]:
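# keep only the numeric category shares as k-means features (drop(columns=...) works in current pandas)
toronto_k_means_df = toronto_category_means_grouped.drop(columns='Neighborhoods')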

### Run k-means with 5 clusters to group the neighborhoods

In [171]:
num_clusters = 5
toronto_k_means = KMeans(n_clusters=num_clusters, random_state = 0) 
toronto_k_means_model = toronto_k_means.fit(toronto_k_means_df)
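
For intuition about what `KMeans.fit` does with these category-share vectors, here is a minimal NumPy sketch of Lloyd's algorithm on four hypothetical two-category rows. It is not scikit-learn's implementation: `KMeans` also picks starting centroids with k-means++ and can keep the best of several runs (`n_init`), while this sketch starts from fixed, hand-picked centroids.

```python
import numpy as np

def assign_clusters(X, centroids):
    # Squared Euclidean distance from every point (rows of X) to every centroid
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def lloyd_kmeans(X, init_centroids, n_iter=100):
    """Plain Lloyd's algorithm with fixed starting centroids (single run, no k-means++)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        labels = assign_clusters(X, centroids)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = assign_clusters(X, centroids)
    # Sum of squared distances to the assigned centroid (what scikit-learn reports as inertia_)
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia

# Hypothetical share rows: two "coffee-heavy" and two "park-heavy" neighborhoods
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centroids, inertia = lloyd_kmeans(X, init_centroids=X[[0, 2]])
print(labels)  # [0 0 1 1]
```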

In [155]:
toronto_k_means.labels_

array([0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 3, 0, 0, 0, 1, 0, 1, 0, 2, 0,
       0, 0, 0, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], dtype=int32)

In [166]:
# Uncomment to remove the cluster labels before re-running with a different number of clusters
# toronto_category_means_grouped = toronto_category_means_grouped.drop(columns='Cluster Labels', errors='ignore')
# toronto_venue_types_by_neighborhood_sorted = toronto_venue_types_by_neighborhood_sorted.drop(columns='Cluster Labels', errors='ignore')

### Attach the cluster labels to the neighborhood data to make the results interpretable

In [167]:
# copy() so that inserting the labels does not also modify toronto_category_means_grouped
toronto_category_means_grouped_clustered = toronto_category_means_grouped.copy()
toronto_category_means_grouped_clustered.insert(0, 'Cluster Labels', toronto_k_means.labels_)
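
Plain assignment in pandas binds a second name to the same DataFrame rather than copying it, so inserting a column through one name changes the other as well. A toy demonstration with hypothetical frames:

```python
import pandas as pd

grouped = pd.DataFrame({"Neighborhoods": ["A", "B"], "Park": [0.5, 0.0]})

alias = grouped  # both names now refer to the same object
alias.insert(0, "Cluster Labels", [0, 1])
print("Cluster Labels" in grouped.columns)  # True: the original changed too

grouped2 = pd.DataFrame({"Neighborhoods": ["A", "B"], "Park": [0.5, 0.0]})
independent = grouped2.copy()  # .copy() returns an independent frame
independent.insert(0, "Cluster Labels", [0, 1])
print("Cluster Labels" in grouped2.columns)  # False
```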

In [168]:
toronto_category_means_grouped_clustered

Unnamed: 0,Cluster Labels,Neighborhoods,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,American Restaurant,Antique Shop,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0
1,0,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.066667,0.066667,0.066667,0.133333,0.2,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.016393
5,0,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0,Church and Wellesley,0.012821,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641
7,0,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
8,0,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,1,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [169]:
toronto_venue_types_by_neighborhood_sorted.insert(0, 'Cluster Labels', toronto_k_means.labels_)

In [170]:
toronto_venue_types_by_neighborhood_sorted

Unnamed: 0,Cluster Labels,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Bakery,Farmers Market,Cheese Shop,Seafood Restaurant,Restaurant,Park,Jazz Club
1,0,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Nightclub,Coffee Shop,Climbing Gym,Burrito Place,Stadium,Restaurant,Italian Restaurant,Intersection
2,1,"Business reply mail Processing Centre, South C...",Light Rail Station,Auto Workshop,Pizza Place,Comic Shop,Recording Studio,Restaurant,Burrito Place,Brewery,Skate Park,Spa
3,1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Rental Car Location,Airport,Airport Food Court,Airport Gate,Bar,Sculpture Garden,Harbor / Marina,Boutique
4,0,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Salad Place,Bubble Tea Shop,Burger Joint,Yoga Studio,Business Service,Bike Rental / Bike Share
5,0,Christie,Grocery Store,Café,Park,Nightclub,Italian Restaurant,Candy Store,Restaurant,Baby Store,Coffee Shop,Cosmetics Shop
6,0,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Pub,Men's Store,Mediterranean Restaurant,Hotel,Dance Studio
7,0,"Commerce Court, Victoria Hotel",Coffee Shop,Restaurant,Café,Hotel,American Restaurant,Italian Restaurant,Gym,Cocktail Bar,Seafood Restaurant,Japanese Restaurant
8,0,Davisville,Pizza Place,Sandwich Place,Dessert Shop,Gym,Coffee Shop,Italian Restaurant,Café,Sushi Restaurant,Thai Restaurant,Seafood Restaurant
9,1,Davisville North,Gym,Breakfast Spot,Hotel,Food & Drink Shop,Department Store,Park,Sandwich Place,Pizza Place,Antique Shop,Dessert Shop


### Merge to get coordinates into one dataframe with neighborhood clustering data

In [174]:

# attach postal code, borough and coordinates to each clustered neighborhood
# ('Neighborhoods' on the left is matched against the 'Neighbourhood' index on the right)
toronto_coords_and_clusters = toronto_venue_types_by_neighborhood_sorted.join(only_toronto.set_index('Neighbourhood'), on='Neighborhoods')

toronto_coords_and_clusters.head()
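
The join works even though the key is spelled `Neighborhoods` on one side and `Neighbourhood` on the other, because `on=` names the left column while the right frame supplies its index. A toy sketch (rows copied from the outputs above) comparing it with the equivalent `merge`:

```python
import pandas as pd

clusters = pd.DataFrame({
    "Cluster Labels": [0, 0],
    "Neighborhoods": ["Christie", "Davisville"],
})
coords = pd.DataFrame({
    "Postal Code": ["M6G", "M4S"],
    "Neighbourhood": ["Christie", "Davisville"],
    "Latitude": [43.669542, 43.704324],
    "Longitude": [-79.422564, -79.38879],
})

# join(on=...) is a left join of the "Neighborhoods" column against the right frame's index
joined = clusters.join(coords.set_index("Neighbourhood"), on="Neighborhoods")

# The same left join written with merge, naming the differently spelled key on each side
merged = clusters.merge(coords, left_on="Neighborhoods", right_on="Neighbourhood", how="left")
print(joined[["Neighborhoods", "Postal Code", "Latitude"]])
```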


Unnamed: 0,Cluster Labels,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Postal Code,Borough,Latitude,Longitude
0,0,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Bakery,Farmers Market,Cheese Shop,Seafood Restaurant,Restaurant,Park,Jazz Club,M5E,Downtown Toronto,43.644771,-79.373306
1,0,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Nightclub,Coffee Shop,Climbing Gym,Burrito Place,Stadium,Restaurant,Italian Restaurant,Intersection,M6K,West Toronto,43.636847,-79.428191
2,1,"Business reply mail Processing Centre, South C...",Light Rail Station,Auto Workshop,Pizza Place,Comic Shop,Recording Studio,Restaurant,Burrito Place,Brewery,Skate Park,Spa,M7Y,East Toronto,43.662744,-79.321558
3,1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Rental Car Location,Airport,Airport Food Court,Airport Gate,Bar,Sculpture Garden,Harbor / Marina,Boutique,M5V,Downtown Toronto,43.628947,-79.39442
4,0,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Salad Place,Bubble Tea Shop,Burger Joint,Yoga Studio,Business Service,Bike Rental / Bike Share,M5G,Downtown Toronto,43.657952,-79.387383


### Visualize the clusters on a map

In [178]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# create a map centred on Toronto (latitude/longitude were geocoded earlier)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, num_clusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(toronto_coords_and_clusters['Latitude'], toronto_coords_and_clusters['Longitude'], toronto_coords_and_clusters['Neighborhoods'], toronto_coords_and_clusters['Cluster Labels']):
    cluster = int(cluster)  # labels are 0-based, so index the palette directly
    label = folium.Popup(f'{poi} Cluster {cluster}', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
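
The cell above gives each cluster its own color by sampling matplotlib's `rainbow` colormap at evenly spaced points and indexing the result by cluster label. As a dependency-free illustration of the same idea, here is a sketch that spaces hues around the HSV color wheel with the standard-library `colorsys` module instead. This is a swapped-in technique, so the exact colors differ from `cm.rainbow`, and `cluster_palette` is a hypothetical helper name.

```python
import colorsys


def cluster_palette(num_clusters: int) -> list:
    """Return one hex color per cluster label 0..num_clusters-1.

    Hues are spaced evenly around the HSV color wheel. k / num_clusters is
    used instead of an inclusive 0..1 range because hue 0.0 and hue 1.0
    are both red and would collide.
    """
    palette = []
    for k in range(num_clusters):
        r, g, b = colorsys.hsv_to_rgb(k / num_clusters, 0.85, 0.95)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
# Index with the 0-based cluster label itself, e.g. palette[0] for cluster 0
print(palette)
```

Because the palette has exactly one entry per label, `palette[cluster]` maps cluster 0 to the first color and the highest label to the last. Offsetting the index (for example `cluster - 1`) would still run in Python thanks to negative indexing, but it silently shifts every cluster's color.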

In [180]:
toronto_coords_and_clusters[toronto_coords_and_clusters['Cluster Labels']==0]

Unnamed: 0,Cluster Labels,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Postal Code,Borough,Latitude,Longitude
0,0,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Bakery,Farmers Market,Cheese Shop,Seafood Restaurant,Restaurant,Park,Jazz Club,M5E,Downtown Toronto,43.644771,-79.373306
1,0,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Nightclub,Coffee Shop,Climbing Gym,Burrito Place,Stadium,Restaurant,Italian Restaurant,Intersection,M6K,West Toronto,43.636847,-79.428191
4,0,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Salad Place,Bubble Tea Shop,Burger Joint,Yoga Studio,Business Service,Bike Rental / Bike Share,M5G,Downtown Toronto,43.657952,-79.387383
5,0,Christie,Grocery Store,Café,Park,Nightclub,Italian Restaurant,Candy Store,Restaurant,Baby Store,Coffee Shop,Cosmetics Shop,M6G,Downtown Toronto,43.669542,-79.422564
6,0,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Pub,Men's Store,Mediterranean Restaurant,Hotel,Dance Studio,M4Y,Downtown Toronto,43.66586,-79.38316
7,0,"Commerce Court, Victoria Hotel",Coffee Shop,Restaurant,Café,Hotel,American Restaurant,Italian Restaurant,Gym,Cocktail Bar,Seafood Restaurant,Japanese Restaurant,M5L,Downtown Toronto,43.648198,-79.379817
8,0,Davisville,Pizza Place,Sandwich Place,Dessert Shop,Gym,Coffee Shop,Italian Restaurant,Café,Sushi Restaurant,Thai Restaurant,Seafood Restaurant,M4S,Central Toronto,43.704324,-79.38879
10,0,"Dufferin, Dovercourt Village",Pharmacy,Bakery,Furniture / Home Store,Liquor Store,Café,Middle Eastern Restaurant,Supermarket,Bar,Bank,Music Venue,M6H,West Toronto,43.669005,-79.442259
11,0,"First Canadian Place, Underground city",Coffee Shop,Café,Hotel,Gym,Restaurant,Japanese Restaurant,Seafood Restaurant,Steakhouse,Asian Restaurant,Deli / Bodega,M5X,Downtown Toronto,43.648429,-79.38228
13,0,"Garden District, Ryerson",Coffee Shop,Clothing Store,Japanese Restaurant,Cosmetics Shop,Café,Bubble Tea Shop,Middle Eastern Restaurant,Italian Restaurant,Hotel,Pizza Place,M5B,Downtown Toronto,43.657162,-79.378937


In [181]:
toronto_coords_and_clusters[toronto_coords_and_clusters['Cluster Labels']==1]

Unnamed: 0,Cluster Labels,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Postal Code,Borough,Latitude,Longitude
2,1,"Business reply mail Processing Centre, South C...",Light Rail Station,Auto Workshop,Pizza Place,Comic Shop,Recording Studio,Restaurant,Burrito Place,Brewery,Skate Park,Spa,M7Y,East Toronto,43.662744,-79.321558
3,1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Rental Car Location,Airport,Airport Food Court,Airport Gate,Bar,Sculpture Garden,Harbor / Marina,Boutique,M5V,Downtown Toronto,43.628947,-79.39442
9,1,Davisville North,Gym,Breakfast Spot,Hotel,Food & Drink Shop,Department Store,Park,Sandwich Place,Pizza Place,Antique Shop,Dessert Shop,M4P,Central Toronto,43.712751,-79.390197
16,1,"India Bazaar, The Beaches West",Fast Food Restaurant,Park,Pizza Place,Gym,Sushi Restaurant,Food & Drink Shop,Ice Cream Shop,Fish & Chips Shop,Italian Restaurant,Liquor Store,M4L,East Toronto,43.668999,-79.315572
18,1,Lawrence Park,Park,Swim School,Bus Line,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,M4N,Central Toronto,43.72802,-79.38879
35,1,The Beaches,Neighborhood,Health Food Store,Trail,Pub,Dance Studio,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,M4E,East Toronto,43.676357,-79.293031


In [182]:
toronto_coords_and_clusters[toronto_coords_and_clusters['Cluster Labels']==2]

Unnamed: 0,Cluster Labels,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Postal Code,Borough,Latitude,Longitude
20,2,"Moore Park, Summerhill East",Restaurant,Trail,Tennis Court,Dance Studio,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,M4T,Central Toronto,43.689574,-79.38316


In [183]:
toronto_coords_and_clusters[toronto_coords_and_clusters['Cluster Labels']==3]

Unnamed: 0,Cluster Labels,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Postal Code,Borough,Latitude,Longitude
12,3,"Forest Hill North & West, Forest Hill Road Park",Park,Trail,Jewelry Store,Sushi Restaurant,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,M5P,Central Toronto,43.696948,-79.411307
26,3,Rosedale,Park,Playground,Trail,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,M4W,Downtown Toronto,43.679563,-79.377529


In [184]:
toronto_coords_and_clusters[toronto_coords_and_clusters['Cluster Labels']==4]

Unnamed: 0,Cluster Labels,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Postal Code,Borough,Latitude,Longitude
27,4,Roselawn,Music Venue,Garden,Yoga Studio,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,M5N,Central Toronto,43.711695,-79.416936
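
The five cells above inspect one cluster at a time by boolean filtering on `Cluster Labels`. To compare clusters side by side, it can help to summarize them in a single pass. Below is a minimal standard-library sketch, assuming rows shaped as (label, neighborhood, 1st most common venue) tuples. `summarize_clusters` is a hypothetical helper. In the notebook itself the same grouping would more naturally be done with pandas' `groupby('Cluster Labels')`.

```python
from collections import Counter, defaultdict

# A handful of (cluster label, neighborhood, 1st Most Common Venue) rows
# copied from the outputs above; this is a subset, not the full dataframe.
rows = [
    (0, "Berczy Park", "Coffee Shop"),
    (0, "Central Bay Street", "Coffee Shop"),
    (0, "Christie", "Grocery Store"),
    (0, "Church and Wellesley", "Coffee Shop"),
    (1, "Davisville North", "Gym"),
    (1, "Lawrence Park", "Park"),
    (1, "The Beaches", "Neighborhood"),
    (2, "Moore Park, Summerhill East", "Restaurant"),
    (3, "Forest Hill North & West, Forest Hill Road Park", "Park"),
    (3, "Rosedale", "Park"),
    (4, "Roselawn", "Music Venue"),
]


def summarize_clusters(rows):
    """Group rows by cluster label and report size and dominant 1st venue."""
    groups = defaultdict(list)
    for label, hood, venue in rows:
        groups[label].append((hood, venue))
    summary = {}
    for label in sorted(groups):
        members = groups[label]
        # most_common(1) keeps the first-seen venue when counts are tied
        top_venue, _ = Counter(v for _, v in members).most_common(1)[0]
        summary[label] = {
            "size": len(members),
            "top_1st_venue": top_venue,
            "neighborhoods": [h for h, _ in members],
        }
    return summary


for label, info in summarize_clusters(rows).items():
    print(label, info["size"], info["top_1st_venue"])
```

On this subset, cluster 0 is dominated by coffee shops and cluster 3 by parks, which matches the full tables above. Clusters 2 and 4 each contain a single neighborhood, so their "dominant" venue says little on its own.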
