# <p style='text-align: center;'> Segmenting and Clustering Neighborhoods in Toronto </p> 

## Question 1 Creation of the data frame

Importing the libraries

In [1]:
import pandas as pd 
import requests 
!conda install -c conda lxml

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wiki_url = requests.get(url)
wiki_url

<Response [200]>

Connection established

In [3]:
wiki_data = pd.read_html(wiki_url.text)
wiki_data

[    Postal Code           Borough  \
 0           M1A      Not assigned   
 1           M2A      Not assigned   
 2           M3A        North York   
 3           M4A        North York   
 4           M5A  Downtown Toronto   
 ..          ...               ...   
 175         M5Z      Not assigned   
 176         M6Z      Not assigned   
 177         M7Z      Not assigned   
 178         M8Z         Etobicoke   
 179         M9Z      Not assigned   
 
                                          Neighbourhood  
 0                                         Not assigned  
 1                                         Not assigned  
 2                                            Parkwoods  
 3                                     Victoria Village  
 4                            Regent Park, Harbourfront  
 ..                                                 ...  
 175                                       Not assigned  
 176                                       Not assigned  
 177                

In [4]:
len(wiki_data), type(wiki_data)

(3, list)

Taking the table alone from the webpage

In [5]:
wiki_data = wiki_data[0]
wiki_data

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


Removing Not assigned values from Borough

In [6]:
df = wiki_data[wiki_data["Borough"] != 'Not assigned']
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


Grouping the data by the postal code

In [7]:
df = df.groupby(['Postal Code']).head()
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


Checking for Not assigned values in Neighbourhood column

In [8]:
df.Neighbourhood.str.count("Not assigned").sum()

0

In [9]:
df = df.reset_index()
df

Unnamed: 0,index,Postal Code,Borough,Neighbourhood
0,2,M3A,North York,Parkwoods
1,3,M4A,North York,Victoria Village
2,4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,5,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...,...
98,160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,165,M4Y,Downtown Toronto,Church and Wellesley
100,168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [10]:
df.drop(['index'], axis = 'columns', inplace = True)
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


Shape of the dataframe

In [11]:
df.shape

(103, 3)

## Question 2 Adding location data

Installing geocoder

In [12]:
pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [13]:
import geocoder # import geocoder

I was not able to get the geographical coordinates using this code

Using the CSV file now to obtain the coordinates

In [14]:
coor = pd.read_csv("https://cocl.us/Geospatial_data")
coor

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [15]:
print('Shape of dataframe wiki data is', df.shape)
print('Shape of dataframe wiki data is', coor.shape)

Shape of dataframe wiki data is (103, 3)
Shape of dataframe wiki data is (103, 3)


In [16]:
coor.dtypes

Postal Code     object
Latitude       float64
Longitude      float64
dtype: object

In [17]:
df.dtypes

Postal Code      object
Borough          object
Neighbourhood    object
dtype: object

We checked the dtypes and shapes to identify if the dataframes are of the same dimensions so that we can join them

In [18]:
data = df.join(coor.set_index('Postal Code'), on='Postal Code', how = 'inner')
data

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


In [19]:
data.shape

(103, 5)

The combined dataframe is obtained

## Question 3 

Exploring and clustering the neighborhoods in Toronto. Using the New York cluster we obtain inspiration, similar to that
we use KMeans and Foursquare API

In [20]:
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [21]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of New York City are 43.6534817, -79.3839347.


Latitude and Longitude of Toronto obtained

In [22]:
import folium

In [23]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(data['Latitude'], data['Longitude'], data['Borough'], data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Map of Toronto with markers

Initializing FourSquare API credentials

In [24]:
CLIENT_ID = 'CWSCXUF1DOJL1BCTXTIM4LSMTZBQHDMXK2HTYOWQ0YXYNNTH' # your Foursquare ID
CLIENT_SECRET = 'BK4KX4153GRSTRQJIL24WMQZAOOSCI0KADA2EL5LRDM4LVO5' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: CWSCXUF1DOJL1BCTXTIM4LSMTZBQHDMXK2HTYOWQ0YXYNNTH
CLIENT_SECRET:BK4KX4153GRSTRQJIL24WMQZAOOSCI0KADA2EL5LRDM4LVO5


Creating a function to get all venue categories in Toronto

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)

Collecting venues in Toronto in each neighbourhood

In [26]:
venues_in_toronto = getNearbyVenues(data['Neighbourhood'], data['Latitude'], data['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [27]:
venues_in_toronto.shape

(1316, 5)

Checking the shape, thus we find we have 1334 venues

In [28]:
venues_in_toronto.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,Portuguese Restaurant


Checking a sample of the data

In [29]:
venues_in_toronto.groupby('Neighbourhood').head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,Portuguese Restaurant
...,...,...,...,...,...
1304,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Wingporium,Wings Joint
1305,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,South St. Burger,Burger Joint
1306,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Dollarama,Discount Store
1307,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Healthy Planet,Supplement Shop


Grouping by neighbourhoods

In [30]:
venues_in_toronto.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Ardene Shoes Outlet
Airport,Downsview,43.737473,-79.394420,Toronto Downsview Airport (YZD)
Airport Food Court,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Billy Bishop Café
Airport Gate,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Gate 8
Airport Lounge,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Porter Lounge
...,...,...,...,...
Warehouse Store,Thorncliffe Park,43.705369,-79.349372,Costco
Wine Bar,Studio District,43.659526,-79.340923,Paris Paris Bar
Wings Joint,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Wingporium
Women's Store,Caledonia-Fairbanks,43.689026,-79.453512,Maximum Woman


Max number of venues grouped by its venue category

### One Hot Encoding

In [31]:
toronto_onehot = pd.get_dummies(venues_in_toronto[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot

Unnamed: 0,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1311,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1312,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1313,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1314,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
toronto_onehot.shape

(1316, 235)

Adding neighbourhood to one hot 

In [33]:
toronto_onehot['Neighbourhood'] = venues_in_toronto['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Grouping by Neighbourhood

In [34]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


A function to return most common venues

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [36]:
import numpy as np

Taking top 10 cluster the neighborhoods of venue categories

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Breakfast Spot,Latin American Restaurant,Clothing Store,Yoga Studio,Dim Sum Restaurant,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
1,"Alderwood, Long Branch",Pizza Place,Gym,Sandwich Place,Coffee Shop,Skating Rink,Pub,Distribution Center,Dim Sum Restaurant,Diner,Discount Store
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Pharmacy,Mobile Phone Shop,Bridal Shop,Diner,Sandwich Place,Deli / Bodega,Restaurant,Supermarket
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio,Falafel Restaurant,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
4,"Bedford Park, Lawrence Manor East",Sandwich Place,Italian Restaurant,Coffee Shop,Pizza Place,Thai Restaurant,Indian Restaurant,Pub,Sushi Restaurant,Japanese Restaurant,Restaurant


Lets make a model to cluster 

In [38]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [39]:
# set number of clusters
k_num_clusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=k_num_clusters, random_state=0).fit(toronto_grouped_clustering)
kmeans

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)

Checking labels

In [40]:
kmeans.labels_[0:100]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 2, 1, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 1, 0, 0, 0, 1,
       4, 4, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 0, 0, 0, 1])

In [41]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1.0,Park,Food & Drink Shop,Yoga Studio,Dessert Shop,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Pizza Place,Hockey Arena,French Restaurant,Coffee Shop,Portuguese Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Donut Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0.0,Coffee Shop,Park,Café,Breakfast Spot,Theater,Bakery,French Restaurant,Performing Arts Venue,Chocolate Shop,Pub
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Furniture / Home Store,Accessories Store,Coffee Shop,Shoe Store,Event Space,Athletics & Sports,Boutique,Vietnamese Restaurant,Dim Sum Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,Coffee Shop,Yoga Studio,Park,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Chinese Restaurant,Persian Restaurant


In [42]:
toronto_merged_nonan = toronto_merged.dropna(subset=['Cluster Labels'])

In [43]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [44]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged_nonan['Latitude'], toronto_merged_nonan['Longitude'], toronto_merged_nonan['Neighbourhood'], toronto_merged_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters)
        
map_clusters

The clusters are plotted on the map

Lets verify each cluster

Cluster 1 

In [45]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 0, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,0.0,Pizza Place,Hockey Arena,French Restaurant,Coffee Shop,Portuguese Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Donut Shop
2,Downtown Toronto,0.0,Coffee Shop,Park,Café,Breakfast Spot,Theater,Bakery,French Restaurant,Performing Arts Venue,Chocolate Shop,Pub
3,North York,0.0,Clothing Store,Furniture / Home Store,Accessories Store,Coffee Shop,Shoe Store,Event Space,Athletics & Sports,Boutique,Vietnamese Restaurant,Dim Sum Restaurant
4,Downtown Toronto,0.0,Coffee Shop,Yoga Studio,Park,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Chinese Restaurant,Persian Restaurant
7,North York,0.0,Gym,Japanese Restaurant,Restaurant,Coffee Shop,Beer Store,Dim Sum Restaurant,Sporting Goods Shop,Discount Store,Café,Caribbean Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...
97,Downtown Toronto,0.0,Café,Coffee Shop,Restaurant,Seafood Restaurant,Pizza Place,Steakhouse,Speakeasy,Pub,Japanese Restaurant,Bakery
98,Etobicoke,0.0,River,Smoke Shop,Park,Pool,Dance Studio,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
99,Downtown Toronto,0.0,Coffee Shop,Steakhouse,Bookstore,Breakfast Spot,Bubble Tea Shop,Salon / Barbershop,Restaurant,Ramen Restaurant,Pub,Café
100,East Toronto,0.0,Light Rail Station,Yoga Studio,Auto Workshop,Comic Shop,Pizza Place,Restaurant,Burrito Place,Brewery,Skate Park,Spa


Cluster 2

In [46]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 1, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,1.0,Park,Food & Drink Shop,Yoga Studio,Dessert Shop,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
21,York,1.0,Park,Women's Store,Pool,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
32,Scarborough,1.0,Playground,Jewelry Store,Yoga Studio,Dessert Shop,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
35,East York,1.0,Park,Coffee Shop,Convenience Store,Yoga Studio,Dessert Shop,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
52,North York,1.0,Piano Bar,Park,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
61,Central Toronto,1.0,Park,Swim School,Bus Line,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
64,York,1.0,Park,Jewelry Store,Yoga Studio,Dessert Shop,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
66,North York,1.0,Park,Convenience Store,Yoga Studio,Dessert Shop,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
68,Central Toronto,1.0,Park,Jewelry Store,Trail,Sushi Restaurant,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
83,Central Toronto,1.0,Playground,Park,Summer Camp,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop


Cluster 3

In [47]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 2, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,2.0,Fast Food Restaurant,Print Shop,Department Store,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run


Cluster 4

In [48]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 3, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,3.0,Baseball Field,Furniture / Home Store,Yoga Studio,Farmers Market,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
101,Etobicoke,3.0,Baseball Field,Yoga Studio,Fast Food Restaurant,Falafel Restaurant,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop


Cluster 5

In [49]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 4, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Scarborough,4.0,Bar,Home Service,Yoga Studio,Dim Sum Restaurant,Falafel Restaurant,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
62,Central Toronto,4.0,Garden,Home Service,Department Store,Event Space,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run


All clusters have been verified

# Lets do the same for London cluster

### Data collection
Clustering London

In [50]:
url_london = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
wiki_london_url = requests.get(url_london)
wiki_london_url

<Response [200]>

Established connection

In [51]:
wiki_london_data = pd.read_html(wiki_london_url.text)
wiki_london_data
wiki_london_data = wiki_london_data[1]
wiki_london_data

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
...,...,...,...,...,...,...
528,Woolwich,Greenwich,LONDON,SE18,020,TQ435795
529,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,020,TQ225655
530,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12,020,TQ225815
531,Yeading,Hillingdon,HAYES,UB4,020,TQ115825


Scraping the webpage and creating the dataframe

Data preprocessing

In [52]:
wiki_london_data.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)
wiki_london_data

Unnamed: 0,Location,London borough,Post_town,Postcode district,Dial code,OS_grid_ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
...,...,...,...,...,...,...
528,Woolwich,Greenwich,LONDON,SE18,020,TQ435795
529,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,020,TQ225655
530,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12,020,TQ225815
531,Yeading,Hillingdon,HAYES,UB4,020,TQ115825


Feature selection and renaming 

In [53]:
df1 = wiki_london_data.drop( [ wiki_london_data.columns[0], wiki_london_data.columns[4], wiki_london_data.columns[5] ], axis=1)
df1.columns = ['borough','town','post_code']
df1

Unnamed: 0,borough,town,post_code
0,"Bexley, Greenwich [7]",LONDON,SE2
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4"
2,Croydon[8],CROYDON,CR0
3,Croydon[8],CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
...,...,...,...
528,Greenwich,LONDON,SE18
529,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4
530,Hammersmith and Fulham,LONDON,W12
531,Hillingdon,HAYES,UB4



Let's remove the Square brackets [ ] and numbers from the borough column

In [54]:
df1['borough'] = df1['borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
df1

Unnamed: 0,borough,town,post_code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Croydon,CROYDON,CR0
3,Croydon,CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
...,...,...,...
528,Greenwich,LONDON,SE18
529,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4
530,Hammersmith and Fulham,LONDON,W12
531,Hillingdon,HAYES,UB4



We can only focusing on the neighbourhoods of London, so performing the changes

In [55]:
df1 = df1[df1['town'].str.contains('LONDON')]
df1

Unnamed: 0,borough,town,post_code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
6,City,LONDON,EC3
7,Westminster,LONDON,WC2
9,Bromley,LONDON,SE20
...,...,...,...
523,Redbridge,LONDON,"IG8, E18"
524,"Redbridge, Waltham Forest","LONDON, WOODFORD GREEN",IG8
527,Barnet,LONDON,N12
528,Greenwich,LONDON,SE18


Checking the shape

In [56]:
df1.shape

(310, 3)

In [57]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 310 entries, 0 to 530
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   borough    310 non-null    object
 1   town       310 non-null    object
 2   post_code  310 non-null    object
dtypes: object(3)
memory usage: 9.7+ KB


## Geolocations of the London Neighbourhoods
## ArcGis API
We need to get the geographical co-ordinates for the neighbourhoods to plot out map. We will use the arcgis package to do so.

Arcgis doesn't have a limitation on the number of API calls made so it fits our use case perfectly.

In [58]:
pip install arcgis

Note: you may need to restart the kernel to use updated packages.


In [59]:
from arcgis.geocoding import geocode
from arcgis.gis import GIS
gis = GIS()

In [60]:
def get_x_y_uk(address1):
    lat_coords = 0
    lng_coords = 0
    g = geocode(address='{}, London, England, GBR'.format(address1))[0]
    lng_coords = g['location']['x']
    lat_coords = g['location']['y']
    return str(lat_coords) +","+ str(lng_coords)

In [61]:
geo_coordinates_uk = df1['post_code']    
geo_coordinates_uk

0           SE2
1        W3, W4
6           EC3
7           WC2
9          SE20
         ...   
523    IG8, E18
524         IG8
527         N12
528        SE18
530         W12
Name: post_code, Length: 310, dtype: object

Passing the postal codes to get geo coordinates

In [62]:
coordinates_latlng_uk = geo_coordinates_uk.apply(lambda x: get_x_y_uk(x))
coordinates_latlng_uk


0       51.492450000000076,0.12127000000003818
1        51.51324000000005,-0.2674599999999714
6       51.51200000000006,-0.08057999999994081
7       51.51651000000004,-0.11967999999995982
9       51.41009000000008,-0.05682999999993399
                        ...                   
523    51.589770000000044,0.030520000000024083
524      51.50642000000005,-0.1272099999999341
527     51.615920000000074,-0.1767399999999384
528      51.48207000000008,0.07143000000002075
530      51.50645000000003,-0.2369099999999662
Name: post_code, Length: 310, dtype: object

Latitude and Longitude info are obtained

In [63]:
lat_uk = coordinates_latlng_uk.apply(lambda x: x.split(',')[0])
lat_uk
lng_uk = coordinates_latlng_uk.apply(lambda x: x.split(',')[1])
lng_uk

0       0.12127000000003818
1       -0.2674599999999714
6      -0.08057999999994081
7      -0.11967999999995982
9      -0.05682999999993399
               ...         
523    0.030520000000024083
524     -0.1272099999999341
527     -0.1767399999999384
528     0.07143000000002075
530     -0.2369099999999662
Name: post_code, Length: 310, dtype: object

We now have the geographical co-ordinates of the London Neighbourhoods.

We proceed with Merging our source data with the geographical co-ordinates to make our dataset ready for the next stage

In [64]:
london_merged = pd.concat([df1,lat_uk.astype(float), lng_uk.astype(float)], axis=1)
london_merged.columns= ['borough','town','post_code','latitude','longitude']
london_merged

Unnamed: 0,borough,town,post_code,latitude,longitude
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746
6,City,LONDON,EC3,51.51200,-0.08058
7,Westminster,LONDON,WC2,51.51651,-0.11968
9,Bromley,LONDON,SE20,51.41009,-0.05683
...,...,...,...,...,...
523,Redbridge,LONDON,"IG8, E18",51.58977,0.03052
524,"Redbridge, Waltham Forest","LONDON, WOODFORD GREEN",IG8,51.50642,-0.12721
527,Barnet,LONDON,N12,51.61592,-0.17674
528,Greenwich,LONDON,SE18,51.48207,0.07143



Co-ordinates for London
Getting the geocode for London to help visualize it on the map

In [65]:
london = geocode(address='London, England, GBR')[0]
london_lng_coords = london['location']['x']
london_lat_coords = london['location']['y']
print (london_lng_coords, london_lat_coords)

-0.1272099999999341 51.50642000000005


Creating a map of london

In [66]:
# Creating the map of London
map_London = folium.Map(location=[london_lat_coords, london_lng_coords], zoom_start=12)
map_London

# adding markers to map
for latitude, longitude, borough, town in zip(london_merged['latitude'], london_merged['longitude'], london_merged['borough'], london_merged['town']):
    label = '{}, {}'.format(town, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_London)  
    
map_London

Venues in London¶
To proceed with the next part, we need to define Foursquare API credentials.

Using Foursquare API, we are able to get the venue and venue categories around each neighbourhood in London.

In [67]:
CLIENT_ID = 'CWSCXUF1DOJL1BCTXTIM4LSMTZBQHDMXK2HTYOWQ0YXYNNTH' # your Foursquare ID
CLIENT_SECRET = 'BK4KX4153GRSTRQJIL24WMQZAOOSCI0KADA2EL5LRDM4LVO5' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)

Your credentails:
CLIENT_ID: CWSCXUF1DOJL1BCTXTIM4LSMTZBQHDMXK2HTYOWQ0YXYNNTH
CLIENT_SECRET:BK4KX4153GRSTRQJIL24WMQZAOOSCI0KADA2EL5LRDM4LVO5


Getting the venues in London

In [68]:
venues_in_London = getNearbyVenues(london_merged['borough'], london_merged['latitude'], london_merged['longitude'])

Bexley, Greenwich 
Ealing, Hammersmith and Fulham
City
Westminster
Bromley
Islington
Islington
Barnet
Enfield
Wandsworth
Southwark
City
Richmond upon Thames
Barnet
Islington
Wandsworth
Westminster
Bromley
Newham
Ealing
Westminster
Lewisham
Camden
Southwark
Tower Hamlets
Bexley
City
Lewisham
Greenwich
Tower Hamlets
Camden
Haringey
Tower Hamlets
Haringey
Barnet
Brent
Lambeth
Lewisham
Tower Hamlets
Kensington and ChelseaHammersmith and Fulham
Brent
Barnet
Barnet
Southwark
Tower Hamlets
Camden
Tower Hamlets
Waltham Forest
Newham
Islington
Richmond upon Thames
Lewisham
Camden
Westminster
Greenwich
Kensington and Chelsea
Barnet
Westminster
Lewisham
Waltham Forest
Hounslow, Ealing, Hammersmith and Fulham
Brent
Barnet
Lambeth, Wandsworth
Islington
Barnet
Merton
Barnet
Westminster
Barnet, Brent, Camden
Lewisham
Bexley
Haringey
Bromley
Tower Hamlets
Newham
Hackney
Dartford
Islington
Southwark
Lewisham
Brent
Southwark
Ealing
Kensington and Chelsea
Wandsworth
Southwark
Barnet
Newham
Richmond upon 

Sampling data

In [69]:
venues_in_London.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,"Bexley, Greenwich",51.49245,0.12127,Sainsbury's,Supermarket
1,"Bexley, Greenwich",51.49245,0.12127,Lesnes Abbey,Historic Site
2,"Bexley, Greenwich",51.49245,0.12127,Lidl,Supermarket
3,"Bexley, Greenwich",51.49245,0.12127,Abbey Wood Railway Station (ABW),Train Station
4,"Bexley, Greenwich",51.49245,0.12127,Bean @ Work,Coffee Shop


Checking shape

In [70]:
venues_in_London.shape

(6756, 5)

Grouping by venue category

In [71]:
venues_in_London.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
African Restaurant,Westminster,51.52587,-0.08808,Red Sea Restaurant
American Restaurant,Waltham Forest,51.63261,0.02912,Spielburger
Antique Shop,Kensington and Chelsea,51.51244,-0.20639,Alice's
Arepa Restaurant,Tower Hamlets,51.52669,-0.06257,Arepa & Co
Argentinian Restaurant,Wandsworth,51.61568,-0.09568,The Argentine Grill
...,...,...,...,...
Wings Joint,Camden and Islington,51.54187,-0.12273,Wingmans
Women's Store,Kensington and ChelseaHammersmith and Fulham,51.55457,-0.11478,Vivien of Holloway
Xinjiang Restaurant,Southwark,51.47480,-0.09313,Silk Road
Yoga Studio,Westminster,51.55457,-0.06257,yogahaven


## One Hot encoding

In [72]:
London_venue_cat = pd.get_dummies(venues_in_London[['Venue Category']], prefix="", prefix_sep="")
London_venue_cat

Unnamed: 0,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6751,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6752,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6753,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6754,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [73]:
London_venue_cat['Neighbourhood'] = venues_in_London['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [London_venue_cat.columns[-1]] + list(London_venue_cat.columns[:-1])
London_venue_cat = London_venue_cat[fixed_columns]

London_venue_cat.head()

Unnamed: 0,Neighbourhood,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Venue categories mean value     
We will group the Neighbourhoods and calculate the mean venue categories value in each Neighbourhood

In [74]:
London_grouped = London_venue_cat.groupby('Neighbourhood').mean().reset_index()
London_grouped.head()

Unnamed: 0,Neighbourhood,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,Barnet,0.0,0.00789,0.0,0.0,0.00789,0.0,0.0,0.0,0.021696,...,0.0,0.001972,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Barnet, Brent, Camden",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Bexley, Greenwich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bexley, Greenwich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [75]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Returning common venues

In [76]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

Top venue categories        
Getting the top venue categories in London

In [77]:
# create a new dataframe for London
neighborhoods_venues_sorted_london = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_london['Neighbourhood'] = London_grouped['Neighbourhood']

for ind in np.arange(London_grouped.shape[0]):
    neighborhoods_venues_sorted_london.iloc[ind, 1:] = return_most_common_venues(London_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_london.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barnet,Coffee Shop,Café,Grocery Store,Pub,Supermarket,Italian Restaurant,Pharmacy,Bus Stop,Turkish Restaurant,Indian Restaurant
1,"Barnet, Brent, Camden",Gym / Fitness Center,Convenience Store,Clothing Store,Supermarket,Bus Station,Fountain,Food Truck,French Restaurant,Food Service,Fast Food Restaurant
2,Bexley,Supermarket,Historic Site,Train Station,Platform,Coffee Shop,Convenience Store,Park,Bus Stop,Construction & Landscaping,Golf Course
3,"Bexley, Greenwich",Park,Sports Club,Bus Stop,Golf Course,Convenience Store,Construction & Landscaping,Historic Site,Flower Shop,Fish & Chips Shop,Fishing Store
4,"Bexley, Greenwich",Supermarket,Historic Site,Coffee Shop,Platform,Convenience Store,Train Station,Flower Shop,Fast Food Restaurant,Fish & Chips Shop,Fishing Store


KMeans clustering

In [78]:
# set number of clusters
k_num_clusters = 5

London_grouped_clustering = London_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans_london = KMeans(n_clusters=k_num_clusters, random_state=0).fit(London_grouped_clustering)
kmeans_london

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)

In [79]:
kmeans_london.labels_[0:100]

array([3, 1, 2, 2, 2, 4, 3, 4, 0, 2, 3, 0, 0, 0, 3, 0, 3, 4, 3, 3, 3, 3,
       3, 3, 3, 3, 0, 3, 3, 3, 0, 3, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       0, 3, 3, 3, 3, 3, 0])

In [80]:
neighborhoods_venues_sorted_london.insert(0, 'Cluster Labels', kmeans_london.labels_ +1)

In [81]:
london_data = london_merged
london_data = london_data.join(neighborhoods_venues_sorted_london.set_index('Neighbourhood'), on='borough')
london_data.head()

Unnamed: 0,borough,town,post_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127,3,Supermarket,Historic Site,Coffee Shop,Platform,Convenience Store,Train Station,Flower Shop,Fast Food Restaurant,Fish & Chips Shop,Fishing Store
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746,5,Grocery Store,Indian Restaurant,Park,Breakfast Spot,Train Station,Flower Shop,Fast Food Restaurant,Fish & Chips Shop,Fishing Store,Flea Market
6,City,LONDON,EC3,51.512,-0.08058,1,Coffee Shop,Hotel,Falafel Restaurant,Vietnamese Restaurant,Gym / Fitness Center,Food Truck,Wine Bar,Beer Bar,Pub,Cocktail Bar
7,Westminster,LONDON,WC2,51.51651,-0.11968,1,Coffee Shop,Hotel,Theater,Pub,Restaurant,French Restaurant,Café,Sandwich Place,Juice Bar,Sporting Goods Shop
9,Bromley,LONDON,SE20,51.41009,-0.05683,3,Supermarket,Convenience Store,Fast Food Restaurant,Hotel,Grocery Store,Park,Bus Stop,Café,Gastropub,Bistro


In [82]:

london_data_nonan = london_data.dropna(subset=['Cluster Labels'])

In [83]:
map_clusters_london = folium.Map(location=[london_lat_coords, london_lng_coords], zoom_start=12)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_data_nonan['latitude'], london_data_nonan['longitude'], london_data_nonan['borough'], london_data_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters_london)
        
map_clusters_london

In [84]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 1, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,town,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,LONDON,1,Coffee Shop,Hotel,Falafel Restaurant,Vietnamese Restaurant,Gym / Fitness Center,Food Truck,Wine Bar,Beer Bar,Pub,Cocktail Bar
7,LONDON,1,Coffee Shop,Hotel,Theater,Pub,Restaurant,French Restaurant,Café,Sandwich Place,Juice Bar,Sporting Goods Shop
18,LONDON,1,Coffee Shop,Hotel,Falafel Restaurant,Vietnamese Restaurant,Gym / Fitness Center,Food Truck,Wine Bar,Beer Bar,Pub,Cocktail Bar
28,LONDON,1,Coffee Shop,Hotel,Theater,Pub,Restaurant,French Restaurant,Café,Sandwich Place,Juice Bar,Sporting Goods Shop
35,LONDON,1,Coffee Shop,Hotel,Theater,Pub,Restaurant,French Restaurant,Café,Sandwich Place,Juice Bar,Sporting Goods Shop
49,LONDON,1,Coffee Shop,Hotel,Falafel Restaurant,Vietnamese Restaurant,Gym / Fitness Center,Food Truck,Wine Bar,Beer Bar,Pub,Cocktail Bar
68,LONDON,1,Bookstore,Ice Cream Shop,Women's Store,Japanese Restaurant,Bakery,Restaurant,Plaza,Boxing Gym,Clothing Store,Coffee Shop
87,LONDON,1,Coffee Shop,Hotel,Theater,Pub,Restaurant,French Restaurant,Café,Sandwich Place,Juice Bar,Sporting Goods Shop
91,LONDON,1,Italian Restaurant,Bookstore,Restaurant,Exhibit,Café,Cocktail Bar,Bakery,Science Museum,Clothing Store,Pub
95,LONDON,1,Coffee Shop,Hotel,Theater,Pub,Restaurant,French Restaurant,Café,Sandwich Place,Juice Bar,Sporting Goods Shop


In [85]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 2, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,town,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
121,LONDON,2,Gym / Fitness Center,Convenience Store,Clothing Store,Supermarket,Bus Station,Fountain,Food Truck,French Restaurant,Food Service,Fast Food Restaurant


In [86]:

london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 3, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,town,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,LONDON,3,Supermarket,Historic Site,Coffee Shop,Platform,Convenience Store,Train Station,Flower Shop,Fast Food Restaurant,Fish & Chips Shop,Fishing Store
9,LONDON,3,Supermarket,Convenience Store,Fast Food Restaurant,Hotel,Grocery Store,Park,Bus Stop,Café,Gastropub,Bistro
29,"BECKENHAM, LONDON",3,Supermarket,Convenience Store,Fast Food Restaurant,Hotel,Grocery Store,Park,Bus Stop,Café,Gastropub,Bistro
45,"BEXLEYHEATH, LONDON",3,Supermarket,Historic Site,Train Station,Platform,Coffee Shop,Convenience Store,Park,Bus Stop,Construction & Landscaping,Golf Course
124,LONDON,3,Supermarket,Historic Site,Train Station,Platform,Coffee Shop,Convenience Store,Park,Bus Stop,Construction & Landscaping,Golf Course
127,LONDON,3,Supermarket,Convenience Store,Fast Food Restaurant,Hotel,Grocery Store,Park,Bus Stop,Café,Gastropub,Bistro
168,"LONDON, WELLING",3,Park,Sports Club,Bus Stop,Golf Course,Convenience Store,Construction & Landscaping,Historic Site,Flower Shop,Fish & Chips Shop,Fishing Store
292,"LONDON, SIDCUP",3,Supermarket,Historic Site,Train Station,Platform,Coffee Shop,Convenience Store,Park,Bus Stop,Construction & Landscaping,Golf Course
317,LONDON,3,Supermarket,Convenience Store,Fast Food Restaurant,Hotel,Grocery Store,Park,Bus Stop,Café,Gastropub,Bistro
361,LONDON,3,Supermarket,Convenience Store,Fast Food Restaurant,Hotel,Grocery Store,Park,Bus Stop,Café,Gastropub,Bistro


In [87]:

london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 4, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,town,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,LONDON,4,Coffee Shop,Pub,Café,Vietnamese Restaurant,Cocktail Bar,Food Truck,Burger Joint,Grocery Store,Bike Shop,Park
12,LONDON,4,Coffee Shop,Pub,Café,Vietnamese Restaurant,Cocktail Bar,Food Truck,Burger Joint,Grocery Store,Bike Shop,Park
14,"BARNET, LONDON",4,Coffee Shop,Café,Grocery Store,Pub,Supermarket,Italian Restaurant,Pharmacy,Bus Stop,Turkish Restaurant,Indian Restaurant
15,LONDON,4,Coffee Shop,Fast Food Restaurant,Italian Restaurant,Pizza Place,Supermarket,Gym / Fitness Center,Grocery Store,Turkish Restaurant,Pub,Sandwich Place
16,LONDON,4,Pub,Coffee Shop,Indian Restaurant,Bar,Portuguese Restaurant,Café,Pizza Place,Burger Joint,Gym / Fitness Center,Gym
...,...,...,...,...,...,...,...,...,...,...,...,...
522,LONDON,4,Café,Pub,Coffee Shop,Park,Grocery Store,French Restaurant,Food & Drink Shop,Train Station,Hardware Store,South American Restaurant
523,LONDON,4,Pub,Grocery Store,Coffee Shop,Café,Bar,Bridal Shop,BBQ Joint,Seafood Restaurant,Park,Pizza Place
527,LONDON,4,Coffee Shop,Café,Grocery Store,Pub,Supermarket,Italian Restaurant,Pharmacy,Bus Stop,Turkish Restaurant,Indian Restaurant
528,LONDON,4,Pub,Grocery Store,Indian Restaurant,Bus Stop,Coffee Shop,Construction & Landscaping,Turkish Restaurant,Gym / Fitness Center,Golf Course,Middle Eastern Restaurant


In [88]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 5, london_data_nonan.columns[[1] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,town,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,LONDON,5,Grocery Store,Indian Restaurant,Park,Breakfast Spot,Train Station,Flower Shop,Fast Food Restaurant,Fish & Chips Shop,Fishing Store,Flea Market
61,LONDON,5,Grocery Store,Café,Sandwich Place,Warehouse Store,Indian Restaurant,Convenience Store,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Pub
69,LONDON,5,Grocery Store,Café,Sandwich Place,Warehouse Store,Indian Restaurant,Convenience Store,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Pub
100,LONDON,5,Grocery Store,Café,Sandwich Place,Warehouse Store,Indian Restaurant,Convenience Store,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Pub
138,LONDON,5,Grocery Store,Café,Sandwich Place,Warehouse Store,Indian Restaurant,Convenience Store,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Pub
218,LONDON,5,Grocery Store,Café,Sandwich Place,Warehouse Store,Indian Restaurant,Convenience Store,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Pub
261,LONDON,5,Grocery Store,Café,Sandwich Place,Warehouse Store,Indian Restaurant,Convenience Store,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Pub
270,LONDON,5,Grocery Store,Café,Sandwich Place,Warehouse Store,Indian Restaurant,Convenience Store,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Pub
320,LONDON,5,Grocery Store,Café,Sandwich Place,Warehouse Store,Indian Restaurant,Convenience Store,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Pub
358,LONDON,5,Grocery Store,Sandwich Place,Fast Food Restaurant,Café,Pharmacy,Warehouse Store,Convenience Store,Chinese Restaurant,Zoo Exhibit,Fish & Chips Shop
