### GitHub will not display the folium maps, apparently because it blocks the JavaScript. NBViewer renders the notebook from the GitHub repo with the maps displayed:
https://nbviewer.jupyter.org/github/rareal/Coursera_Capstone/blob/master/Toronto_neigh_Cluster.ipynb

This is the link to the notebook on GitHub (it is also contained in the path of the link above):
https://github.com/rareal/Coursera_Capstone/blob/master/Toronto_neigh_Cluster.ipynb


# Toronto Neighbourhoods - clustering
#### This is part of the Course [<u><b>Applied Data Science Capstone</b></u>](https://www.coursera.org/learn/applied-data-science-capstone/) on Coursera, to complete the Specialization <u><b>IBM Data Science Professional Certificate</b></u>

This exercise clusters the Toronto neighbourhoods, using the geocodes obtained in the second [notebook](https://github.com/rareal/Coursera_Capstone/blob/master/Toronto_Neighborhoods_LatLong.ipynb).

---------------
Importing dependencies:

In [204]:
import requests
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

Read the dataset generated in the second notebook, with the PostalCode, Borough, Neighbourhood and the coordinates.

In [4]:
ToNeighLL = pd.read_csv('Toronto_neigh_latlon.csv',index_col=0)

In [5]:
ToNeighLL.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.6555,-79.3626
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.7223,-79.4504
4,M7A,Queen's Park,Queen's Park,43.6641,-79.3889


To centre the map we need coordinates for Toronto itself. I'll use LocationIQ, as in the second notebook.

In [6]:
apikey = pd.read_table('locationiq_api_key',header=None).iloc[0,0]

In [7]:
# Search / Forward Geocoding url
search_url = "https://us1.locationiq.com/v1/search.php"
data = {'key': apikey,'q': 'Toronto, Ontario, CA','format': 'json'}
response = requests.get(search_url, params=data)

In [8]:
TO = response.json()[0]
print(TO['display_name'])
print('latitude: ',TO['lat'])
print('longitude: ',TO['lon'])

Toronto, Ontario, M6K 1X9, Canada
latitude:  43.653963
longitude:  -79.387207
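
LocationIQ returns `lat` and `lon` as strings, which is why the map cells wrap them in `float(...)`. A minimal sketch of doing that conversion once, right after the request. `parse_location` is a hypothetical helper name, and the sample dictionary copies only the three fields printed above:

```python
def parse_location(result):
    """Return (display_name, latitude, longitude) with numeric coordinates."""
    return result['display_name'], float(result['lat']), float(result['lon'])

# Sample entry holding only the fields printed above (a real response has more keys).
sample = {'display_name': 'Toronto, Ontario, M6K 1X9, Canada',
          'lat': '43.653963',
          'lon': '-79.387207'}

name, to_lat, to_lon = parse_location(sample)
print(name, to_lat, to_lon)
```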


#### Create a Map for Toronto and plot the Neighbourhoods. 

In [25]:
# Toronto map
toto_map = folium.Map(location=[float(TO['lat']), float(TO['lon'])], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(ToNeighLL['Latitude'], ToNeighLL['Longitude'], 
                                           ToNeighLL['Borough'], ToNeighLL['Neighbourhood'], ToNeighLL['PostalCode']):
    label = '{}, {}, {}'.format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(  # Circle has a defined radius in meters; CircleMarker keeps the same pixel size at every zoom level
        [lat, lng],
        radius=200,  # meters
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toto_map)  
    
toto_map

Some neighbourhoods are really close together, less than 200 m apart. We can see this if we zoom into Downtown Toronto.

In [27]:
# Toronto map
toto_map = folium.Map(location=[float(TO['lat']), float(TO['lon'])], zoom_start=14)

# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(ToNeighLL['Latitude'], ToNeighLL['Longitude'], 
                                           ToNeighLL['Borough'], ToNeighLL['Neighbourhood'], ToNeighLL['PostalCode']):
    label = '{}, {}, {}'.format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(  # Circle has a defined radius in meters; CircleMarker keeps the same pixel size at every zoom level
        [lat, lng],
        radius=200,  # meters
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toto_map)  
    
toto_map
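
The "closer than 200 m" observation can also be checked numerically rather than by eye. A minimal sketch using the haversine formula with a mean Earth radius of 6371 km (an approximation); `haversine_m` is a hypothetical helper, and the two points are the M5X and M5H coordinates that appear in the dataset:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# M5X (First Canadian Place) and M5H (Adelaide, King, Richmond)
d = haversine_m(43.6492, -79.3823, 43.6496, -79.3833)
print(f'{d:.0f} m apart, closer than 200 m: {d < 200}')
```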

In [12]:
ToNeighLL.Borough.unique()

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [147]:
ToNeighLL_to = ToNeighLL[ToNeighLL.Borough.str.contains('Toronto')].sort_values(by='Borough').reset_index(drop=True)

In [148]:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

In [149]:
Clus_dataSet = ToNeighLL_to[['Latitude','Longitude']]
Clus_dataSet = np.nan_to_num(Clus_dataSet)
Clus_dataSet = StandardScaler().fit_transform(Clus_dataSet)

In [155]:
# Compute DBSCAN
db = DBSCAN(eps=0.2).fit(Clus_dataSet)
db.labels_

array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  0, -1,  0,
       -1,  0, -1,  0,  0,  0, -1, -1, -1,  0, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1])

In [156]:
set(db.labels_)

{-1, 0}
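
A label of `-1` marks points that DBSCAN treats as noise, and `0` is the single cluster it found. Because the coordinates were standardized first, `eps=0.2` is measured in standard deviations rather than meters. A sketch of one alternative, not what this notebook does: run DBSCAN on the raw coordinates with the haversine metric, so that `eps` can be given as a physical distance. The 1 km radius, `min_samples=5` (the scikit-learn default) and the mean Earth radius of 6371 km are assumptions for illustration. The first seven points are downtown postal codes from the dataset and the last two are Parkwoods and Victoria Village.

```python
import numpy as np
from sklearn.cluster import DBSCAN

coords_deg = np.array([
    [43.6437, -79.3787], [43.6492, -79.3823], [43.6492, -79.3823],
    [43.6513, -79.3756], [43.6469, -79.3823], [43.6456, -79.3754],
    [43.6496, -79.3833],                       # downtown postal codes
    [43.7545, -79.3300], [43.7276, -79.3148],  # Parkwoods, Victoria Village
])

EARTH_RADIUS_KM = 6371.0
eps_km = 1.0  # assumed neighbourhood radius

# The haversine metric expects [lat, lon] in radians and returns distances in
# radians, so eps is the physical distance divided by the Earth radius.
hav = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=5,
             algorithm='ball_tree', metric='haversine')
labels = hav.fit_predict(np.radians(coords_deg))
print(labels)
```

With these inputs the seven downtown points lie within 1 km of each other, so they form one cluster, while the two North York points are left as noise.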

In [157]:
dbto = ToNeighLL_to.copy()
dbto['lables'] = db.labels_

dbto[dbto['lables']==0]

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,lables
14,M5W,Downtown Toronto,Stn A PO Boxes 25 The Esplanade,43.6437,-79.3787,0
16,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.6492,-79.3823,0
18,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.6492,-79.3823,0
20,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,0
21,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.6469,-79.3823,0
22,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754,0
26,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.6496,-79.3833,0


In [160]:
# grouped neighborhoods
dbto_gr = dbto[dbto['lables']==0].groupby('Borough')[['Latitude','Longitude']].mean().reset_index()

In [161]:
dbto_gr['Neighbourhood'] = dbto[dbto['lables']==0].groupby('Borough')['Neighbourhood'].apply(lambda x: ', '.join(x)).iloc[0]

In [162]:
dbto_gr['PostalCode'] = dbto[dbto['lables']==0].groupby('Borough')['PostalCode'].apply(lambda x: ', '.join(x)).iloc[0]

In [163]:
dbto_gr[dbto.columns.values[:-1]]

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,"M5W, M5X, M5L, M5C, M5K, M5E, M5H",Downtown Toronto,"Stn A PO Boxes 25 The Esplanade, First Canadia...",43.647929,-79.379986


In [164]:
dbto_gr_joined = pd.concat([dbto[dbto['lables']==-1].iloc[:,:-1],
                            dbto_gr[dbto.columns.values[:-1]]],axis=0).reset_index(drop=True)

In [168]:
# Toronto map
toto_map = folium.Map(location=[float(TO['lat']), float(TO['lon'])], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(dbto_gr_joined['Latitude'], dbto_gr_joined['Longitude'], 
                                           dbto_gr_joined['Borough'], dbto_gr_joined['Neighbourhood'],
                                            dbto_gr_joined['PostalCode']):
    label = '{}, {}, {}'.format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(  # Circle has a defined radius in meters; CircleMarker keeps the same pixel size at every zoom level
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toto_map)  
toto_map

In [175]:
# M7Y is too far from the rest, dropping it
dbto_gr_joined2 = dbto_gr_joined[dbto_gr_joined.PostalCode!='M7Y'].reset_index(drop=True)

In [241]:
# Toronto map
toto_map = folium.Map(location=[float(TO['lat'])+0.025, float(TO['lon'])], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(dbto_gr_joined2['Latitude'], dbto_gr_joined2['Longitude'], 
                                           dbto_gr_joined2['Borough'], dbto_gr_joined2['Neighbourhood'],
                                            dbto_gr_joined2['PostalCode']):
    label = '{}, {}, {}'.format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(  # Circle has a defined radius in meters; CircleMarker keeps the same pixel size at every zoom level
        [lat, lng],
        radius=500,  # meters
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toto_map)  
toto_map

In [183]:
foursquare_cred = pd.read_csv('foursquare_cred') # saved locally and ignored in .gitignore

In [195]:
CLIENT_ID = foursquare_cred.CLIENT_ID[0] # your Foursquare ID
CLIENT_SECRET = foursquare_cred.CLIENT_SECRET[0] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [199]:
latitude = dbto_gr_joined2.iloc[-1].Latitude
longitude = dbto_gr_joined2.iloc[-1].Longitude
radius = 500
LIMIT = 100

url = ('https://api.foursquare.com/v2/venues/explore?'
       'client_id={}&client_secret={}&ll={},{}&v={}&'
       'radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT))
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c17fcdff594df03f057c91c'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4ad4c05df964a52059f620e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/default_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1c4941735',
         'name': 'Restaurant',
         'pluralName': 'Restaurants',
         'primary': True,
         'shortName': 'Restaurant'}],
       'id': '4ad4c05df964a52059f620e3',
       'location': {'address': '66 Wellington St West',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'crossStreet': 'at Bay Street',
        'distance': 119,
        'formattedAddress': ['66 Wellington St West (at Bay Street)',
         'Toronto ON M5K 1H6',
         'Canada'],
        'labele

In [201]:
[x['venue']['name'] for x in results['response']['groups'][0]['items']][:10]

['Canoe',
 'Equinox Bay Street',
 'Mos Mos Coffee',
 'King Taps',
 'Maman',
 'Hockey Hall Of Fame (Hockey Hall of Fame)',
 'Adelaide Club Toronto',
 'Brick Street Bakery',
 'Design Exchange',
 'Pilot Coffee Roasters']
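
The explore request above assembles its query string by hand with `str.format`. A sketch of building the same URL with the standard library's `urlencode`, which escapes each parameter. `build_explore_url` is a hypothetical helper and the credentials shown are placeholders:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng,
                      version='20180605', radius=500, limit=100):
    """Build a Foursquare v2 venues/explore URL with an escaped query string."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; the coordinates are the merged downtown centroid.
url = build_explore_url('YOUR_ID', 'YOUR_SECRET', 43.647929, -79.379986)
print(url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` performs equivalent escaping without building the string at all.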

In [202]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [205]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Canoe,Restaurant,43.647452,-79.38132
1,Equinox Bay Street,Gym,43.6481,-79.379989
2,Mos Mos Coffee,Café,43.648159,-79.378745
3,King Taps,Gastropub,43.648476,-79.382058
4,Maman,Café,43.648309,-79.382253


In [283]:
def getNearbyVenues(Borough, Neighborhood, latitudes, longitudes, radius=500):
    venues_list=[]
    for Borough, Neighborhood, lat, lng in zip(Borough, Neighborhood, latitudes, longitudes):
        print(Neighborhood,end='\r')
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET,VERSION,lat,lng,radius,LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            Borough, Neighborhood, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough','Neighborhood','Neighborhood Latitude','Neighborhood Longitude',
                             'Venue','Venue Latitude','Venue Longitude', 'Venue Category']
    return nearby_venues
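
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises `IndexError` for a venue that has no category. A sketch of a more tolerant row builder, assuming each item has the shape shown in the raw response earlier. `venue_rows` is a hypothetical helper, and the second sample item is made up to exercise the empty case:

```python
def venue_rows(items, borough, neighbourhood, lat, lng):
    """Flatten explore items into tuples, using None when a venue has no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((borough, neighbourhood, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

items = [
    {'venue': {'name': 'Canoe',
               'location': {'lat': 43.647452, 'lng': -79.38132},
               'categories': [{'name': 'Restaurant'}]}},
    # Hypothetical venue without a category, to show the fallback.
    {'venue': {'name': 'Unnamed spot',
               'location': {'lat': 43.6480, 'lng': -79.3800},
               'categories': []}},
]
rows = venue_rows(items, 'Downtown Toronto', 'Design Exchange', 43.647929, -79.379986)
for row in rows:
    print(row)
```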

In [208]:
toronto_venues = getNearbyVenues(dbto_gr_joined2.Borough,dbto_gr_joined2.Neighbourhood,
                                   dbto_gr_joined2.Latitude,
                                   dbto_gr_joined2.Longitude)

Roselawn
Davisville North
Forest Hill North, Forest Hill West
North Toronto West
The Annex, North Midtown, Yorkville
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Lawrence Park
Chinatown, Grange Park, Kensington Market
Church and Wellesley
Harbord, University of Toronto
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Rosedale
Cabbagetown, St. James Town
Harbourfront, Regent Park
Ryerson, Garden District
Central Bay Street
Christie
Harbourfront East, Toronto Islands, Union Station
The Beaches
Studio District
The Danforth West, Riverdale
The Beaches West, India Bazaar
Little Portugal, Trinity
Parkdale, Roncesvalles
High Park, The Junction South
Brockton, Exhibition Place, Parkdale Village
Dovercourt Village, Dufferin
Runnymede, Swansea
Stn A PO Boxes 25 The Esplanade, First Canadian Place, Underground city, Commerce Court, Victoria Hotel, St. James Town, Design Exchange, 

In [209]:
print(toronto_venues.shape)
toronto_venues.head()

(1027, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Davisville North,43.7135,-79.3887,Sherwood Park,43.716551,-79.387776,Park
1,Davisville North,43.7135,-79.3887,Summerhill Market North,43.715499,-79.392881,Food & Drink Shop
2,Davisville North,43.7135,-79.3887,Homeway Restaurant & Brunch,43.712641,-79.391557,Breakfast Spot
3,Davisville North,43.7135,-79.3887,Dogs Off-Leash Area,43.716589,-79.384246,Dog Run
4,"Forest Hill North, Forest Hill West",43.6966,-79.412,Kay Gardner Beltline Trail,43.700726,-79.410101,Trail


In [219]:
# count venues per Neighborhood
toronto_venues.groupby('Neighborhood').count()[['Venue']].sort_values(by='Venue')

Unnamed: 0_level_0,Venue
Neighborhood,Unnamed: 1_level_1
"High Park, The Junction South",1
Lawrence Park,2
"Forest Hill North, Forest Hill West",3
Davisville North,4
"Harbourfront East, Toronto Islands, Union Station",4
North Toronto West,4
"Moore Park, Summerhill East",5
The Beaches,5
Rosedale,6
"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",6


In [215]:
# unique venue categories
len(toronto_venues['Venue Category'].unique())

211

In [229]:
toronto_venues2 = toronto_venues.copy()
toronto_venues2['Borough'] = [dbto_gr_joined2.Borough[dbto_gr_joined2.Neighbourhood==x].values[0] for x in toronto_venues.Neighborhood]

In [234]:
toronto_venues2.groupby(['Neighborhood','Borough']).count()[['Venue']].sort_values(by='Venue')

Unnamed: 0_level_0,Unnamed: 1_level_0,Venue
Neighborhood,Borough,Unnamed: 2_level_1
"High Park, The Junction South",West Toronto,1
Lawrence Park,Central Toronto,2
"Forest Hill North, Forest Hill West",Central Toronto,3
Davisville North,Central Toronto,4
"Harbourfront East, Toronto Islands, Union Station",Downtown Toronto,4
North Toronto West,Central Toronto,4
"Moore Park, Summerhill East",Central Toronto,5
The Beaches,East Toronto,5
Rosedale,Downtown Toronto,6
"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",Central Toronto,6


In [284]:
toronto_venues3 = getNearbyVenues(ToNeighLL.Borough,ToNeighLL.Neighbourhood,
                                   ToNeighLL.Latitude,
                                   ToNeighLL.Longitude)

Dovercourt Village, Dufferinrth, Wilson Heights Burnhamthorpeeane Park

KeyboardInterrupt: 
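
The progress line above looks garbled because `print(..., end='\r')` moves the cursor back to the start of the line without clearing it, so a short name leaves the tail of a longer one visible. A sketch of one fix, padding each name to a fixed width; `progress_line` and the width of 80 are assumptions:

```python
def progress_line(name, width=80):
    """Pad or truncate a name so it fully overwrites the previous progress line."""
    return name[:width].ljust(width)

for name in ['Harbourfront, Regent Park', 'Christie']:
    print(progress_line(name), end='\r')
print()  # move to a fresh line once the loop is done
```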

In [246]:
toronto_venues4 = toronto_venues3.groupby(['Neighborhood','Borough']).count()[['Venue']].sort_values(by='Venue')

In [280]:
compare = list(toronto_venues4[['Venue']][toronto_venues4.Venue>20].reset_index()['Neighborhood'].values)
ref = ToNeighLL.Neighbourhood.values
bool_ind = [x in compare for x in ref]
ToNeighLL[bool_ind]
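
The list comprehension above can also be written with `Series.isin`. A sketch on a small stand-in frame: the neighbourhood names come from the dataset, but which ones pass the 20-venue threshold here is made up for illustration:

```python
import pandas as pd

neigh = pd.DataFrame({'Neighbourhood': ['Parkwoods', 'Victoria Village',
                                        'Christie', 'Studio District']})
compare = ['Christie', 'Studio District']  # hypothetical: names with more than 20 venues

# Boolean mask built by pandas instead of a Python-level loop.
selected = neigh[neigh['Neighbourhood'].isin(compare)]
print(selected)
```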


In [282]:
df=ToNeighLL[bool_ind]
# Toronto map
toto_map = folium.Map(location=[float(TO['lat']), float(TO['lon'])], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(df['Latitude'], df['Longitude'], 
                                           df['Borough'], df['Neighbourhood'],
                                            df['PostalCode']):
    label = '{}, {}, {}'.format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(  # Circle has a defined radius in meters; CircleMarker keeps the same pixel size at every zoom level
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toto_map)  
toto_map

In [78]:
ToNeighLL_br = ToNeighLL.groupby('Borough')[['Latitude','Longitude']].mean().reset_index()

In [80]:
# Toronto map
toto_map = folium.Map(location=[float(TO['lat']), float(TO['lon'])], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood, postcode in zip(ToNeighLL['Latitude'], ToNeighLL['Longitude'], 
                                           ToNeighLL['Borough'], ToNeighLL['Neighbourhood'], ToNeighLL['PostalCode']):
    label = '{}, {}, {}'.format(neighborhood, borough, postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(  # Circle has a defined radius in meters; CircleMarker keeps the same pixel size at every zoom level
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toto_map)  
# add one larger red marker per borough at its mean coordinates
for lat, lng, borough in zip(ToNeighLL_br['Latitude'], ToNeighLL_br['Longitude'],ToNeighLL_br['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(  # Circle has a defined radius in meters; CircleMarker keeps the same pixel size at every zoom level
        [lat, lng],
        radius=10,
        popup=label,
        color='red',
        fill=True,
        fill_color='#ed2a2d',
        fill_opacity=0.7,
        parse_html=False).add_to(toto_map)  
    

    
toto_map