# Capstone Project - New British or Old British: Toronto vs London
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Toronto Analysis](#toronto)
* [London Analysis](#london)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Introduction: Business Problem <a name="introduction"></a>

Toronto and London are two of the world's major cities, both for the people who live in them and for those who want to move there. They are known as cities full of opportunities, full of immigrants and, of course, full of things to do: restaurants (with a huge variety of cuisines), bus stops, hotels, coffee shops, shopping malls, stores, culture and history.

Toronto and London are extremely popular tourist and immigration destinations for people all around the world. They are diverse and multicultural and offer a wide variety of experiences that are widely sought after. In this work we group the neighbourhoods of London and Toronto to gain insight into what they look like.

The main objective of this work is to help tourists and prospective immigrants. Depending on the experience and infrastructure that the neighbourhoods offer, people can choose these two cities for a vacation or to immigrate and, if they already live there, find other neighbourhoods that are equal to or better than the one they live in and perhaps relocate within the city. The findings will help stakeholders and clients of tourism and immigration companies make informed decisions and address their concerns, including questions about neighbourhood infrastructure.


# Data Description <a name="data"></a>

I used geolocation data for both Toronto and London. As a starting point, the postal codes of each city are used to find the neighbourhoods, boroughs, venues and their most popular venue categories.


## London

Data scraped from: https://en.wikipedia.org/wiki/List_of_areas_of_London

This Wikipedia page lists all the neighbourhoods; the data was limited to London.

1. **Borough**: Name of Neighbourhood
2. **Town**: Name of borough
3. **Post_Code**: Postal codes for London.
4. **london_merge**: Latitude and longitude of the Neighbourhoods

The Wikipedia page did not include geographical coordinates. To solve this, I used the ArcGIS API.

### ArcGIS API

ArcGIS Online lets you connect locations, people, and data using interactive maps, with smart, data-driven styles and intuitive analysis tools that deliver location intelligence.

I used ArcGIS to get the geolocations of the neighbourhoods of London. The columns below were added to the raw dataset.

1. **Latitude**: Latitude for Neighbourhood
2. **Longitude**: Longitude for Neighbourhood

## Toronto

Data scraped from: https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1011037969

This Wikipedia page lists all the neighbourhoods; the data was limited to Toronto.

1. **Postal_Code**: Postal codes for Toronto
2. **Neighbourhood**: Name of Neighbourhoods in Toronto
3. **Borough**: Name of the boroughs
4. **venues_toronto**: Latitude and longitude of the Neighbourhoods.


## Foursquare API Data

Data about the venues in each neighbourhood of a given borough is needed. To get that information, the Foursquare location data was used. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus and even photos. The Foursquare location platform was therefore used as the sole source of venue data, since all the required information could be obtained through its API.

After finding the list of neighbourhoods, we connected to the Foursquare API to get information about the venues inside each neighbourhood. For each neighbourhood, we chose a radius of 500 meters.

The data retrieved from Foursquare contains the venues within the specified distance of the latitude and longitude of each postcode. The information obtained per venue is as follows:

1. **Neighbourhood**: Name of the Neighbourhood
2. **Neighbourhood Latitude**: Latitude of the Neighbourhood
3. **Neighbourhood Longitude**: Longitude of the Neighbourhood
4. **Venue** : Name of the Venue
5. **Venue Latitude**: Latitude of Venue
6. **Venue Longitude**: Longitude of Venue
7. **Venue Category**: Category of Venue
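
The request described above can be sketched as follows. This is a minimal sketch, assuming the v2 `venues/explore` endpoint and the parameter names used later in this notebook; `build_explore_url` is a hypothetical helper, and the credential strings are placeholders rather than real keys.

```python
from urllib.parse import urlencode

# Endpoint as used later in this notebook
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret, version, radius=500):
    """Build (but do not send) an explore request URL for venues near (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, e.g. '20210321'
        "ll": f"{lat},{lng}",    # latitude,longitude of the neighbourhood
        "radius": radius,        # search radius in meters
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

# Placeholder credentials (assumption): substitute your own keys
url = build_explore_url(43.753259, -79.329656, "YOUR_ID", "YOUR_SECRET", "20210321")
print(url)
```

Building the query string with `urlencode` instead of string formatting keeps special characters (such as the comma in `ll`) correctly escaped.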


After collecting all the information about Toronto and London, I had enough data to build the model. The neighbourhoods were clustered together based on similar venue categories, and the observations and findings were then presented. With this data, stakeholders can make decisions, and useful visualizations can be produced to show to clients.

# Methodology <a name="methodology"></a>

The model is built in Python, so we start by importing the required packages.

In this project we direct our efforts towards detecting neighbourhoods of Toronto and London that have good infrastructure for tourists and immigrants, such as restaurants, coffee shops, stores, squares and shopping malls.

We will cluster the neighbourhoods of both cities to see which ones offer the widest variety of venues, both for people looking for a vacation and for people considering immigration.

**Let's get started!**

In [1]:
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

# import k-means for the clustering stage
from sklearn.cluster import KMeans

# Toronto's Neighbourhoods <a name="toronto"></a>

### Extracting Data
We begin by collecting and refining the data needed for our business solution.
To get the neighbourhoods in Toronto, we start by scraping the Wikipedia list of postal codes of Canada beginning with M.

In [2]:
wiki_url = "https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1011037969"
wiki = requests.get(wiki_url)
wiki

<Response [200]>

In [3]:
# Just need the first table

wiki_data = pd.read_html(wiki.text)
wiki_data = wiki_data[0]
wiki_data

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


### Data Preprocessing and Feature Engineering

##### Drop boroughs that are "Not assigned"

In [4]:
df = wiki_data[wiki_data["Borough"] != "Not assigned"]
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


##### Grouping everything based on Postal Code

In [5]:
df = df.groupby(['Postal Code']).head()
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
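
Note that `groupby(...).head()` returns the first five rows of each group, so when every postal code appears on a single row (as in the output above) the frame comes back unchanged. If a source table instead listed one neighbourhood per row, the rows could be combined per postal code. A minimal sketch on toy data (the `toy` frame is hypothetical, not the scraped table):

```python
import pandas as pd

# Hypothetical input: one row per neighbourhood, postal codes repeated
toy = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

# One row per (Postal Code, Borough), neighbourhood names joined with commas
combined = (
    toy.groupby(["Postal Code", "Borough"], as_index=False)["Neighbourhood"]
       .agg(", ".join)
)
print(combined)
```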


In [6]:
df = df.reset_index()
df

Unnamed: 0,index,Postal Code,Borough,Neighbourhood
0,2,M3A,North York,Parkwoods
1,3,M4A,North York,Victoria Village
2,4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,5,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...,...
98,160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,165,M4Y,Downtown Toronto,Church and Wellesley
100,168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


###### Counting records where the neighbourhood is "Not assigned"

In [7]:
df.Neighbourhood.str.count("Not assigned").sum()

0

In [8]:
df.drop(['index'], axis = 'columns', inplace = True)
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [9]:
df.shape

(103, 3)

##  Toronto's Neighbourhoods Geolocations

In [10]:
pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [11]:
import geocoder

In [12]:
geo_url = 'https://cocl.us/Geospatial_data'

In [13]:
geo_data = pd.read_csv(geo_url)
geo_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [14]:
geo_data.shape

(103, 3)

##### Checking the column types of the dataframes, especially Postal Code column since we are trying to join on it

In [15]:
df.dtypes

Postal Code      object
Borough          object
Neighbourhood    object
dtype: object

In [16]:
geo_data.dtypes

Postal Code     object
Latitude       float64
Longitude      float64
dtype: object

In [17]:
df_join = df.join(geo_data.set_index('Postal Code'), on='Postal Code', how='inner')
df_join

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


In [18]:
df_join.shape

(103, 5)
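
An inner join silently drops postal codes that have no coordinates. One way to check for that, sketched here on toy data, is a left `merge` with `indicator=True` and `validate="one_to_one"`; the frames and the code `M9Z` below are made up for illustration.

```python
import pandas as pd

# Hypothetical frames standing in for df and geo_data
areas = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Etobicoke"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# validate raises if a postal code is duplicated on either side;
# indicator adds a _merge column showing where each row came from
merged = areas.merge(coords, on="Postal Code", how="left",
                     validate="one_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

If `unmatched` is empty, the inner join used above loses nothing.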

### Geographical Coordinates and Plotting

In [19]:
!conda install -c conda-forge geocoder --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\jnsju\anaconda3

  added / updated specs:
    - geocoder


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.10.0               |   py38haa244fe_1         3.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.1 MB

The following packages will be UPDATED:

  conda                                4.9.2-py38haa244fe_0 --> 4.10.0-py38haa244fe_1




In [20]:
conda install -c conda-forge geopy

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


In [21]:
conda install -c conda-forge/label/gcc7 folium

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


###### Clustering Toronto based on the similarities of the venue categories, using K-Means clustering and the Foursquare API

In [22]:
import geocoder
from geopy.geocoders import Nominatim

In [23]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The coordinates of Toronto are 43.6534817, -79.3839347.


In [None]:
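# Creating the map of London
# NOTE: this cell was not executed here (In [None]). It relies on `london_merge`, which is
# presumably built in the London analysis, and expects `latitude`/`longitude` to hold London's coordinates.
map_London = folium.Map(location=[latitude, longitude], zoom_start=11)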

# adding markers to map
for Latitude, Longitude, Borough, Town in zip(london_merge['Latitude'], london_merge['Longitude'], london_merge['Borough'], london_merge['Town']):
    label = '{}, {}'.format(Town, Borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [Latitude, Longitude],
        radius=7,
        popup=label,
        color='blue',
        fill=True
        ).add_to(map_London)  
    
map_London

In [93]:
import folium

# Creating the map of Toronto
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# adding markers to map
# lat/lng are used as loop variables so the city-centre `latitude`/`longitude` are not overwritten
for lat, lng, borough, neighbourhood in zip(df_join['Latitude'], df_join['Longitude'], df_join['Borough'], df_join['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_Toronto)  

In [94]:
map_Toronto

In [26]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # placeholder: the real credential is redacted and should not be published
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # placeholder: the real credential is redacted
VERSION = '20210321'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


###### Getting all venue categories in Toronto

In [27]:
def getNearbyVenues2(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # returning only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)
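
The function above indexes straight into the JSON response, so it raises if a neighbourhood returns no groups or a venue has no category. A more defensive parser could look like the sketch below. It assumes the same response layout as the code above; `extract_venues` and the sample payload are hypothetical.

```python
def extract_venues(payload):
    """Return (venue name, category name) pairs from an explore-style JSON payload."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        # Some venues may come back without a category; keep them with None
        category = categories[0]["name"] if categories else None
        rows.append((venue.get("name"), category))
    return rows

# Hand-written sample payload (not real API output)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Brookbanks Park", "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed Spot", "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({}))  # an empty or error response yields an empty list
```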

###### Venues for each Neighbourhood

In [28]:
venues_toronto = getNearbyVenues2(df_join['Neighbourhood'], df_join['Latitude'], df_join['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [29]:
venues_toronto.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,Brookbanks Pool,Pool
2,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Portugril,Portuguese Restaurant


In [30]:
venues_toronto.shape

(1335, 5)

##### Venues based on Neighbourhood

In [122]:
venues_toronto.groupby('Neighbourhood').head(10)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,Park
1,Parkwoods,43.753259,-79.329656,Brookbanks Pool,Pool
2,Parkwoods,43.753259,-79.329656,Variety Store,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Portugril,Portuguese Restaurant
...,...,...,...,...,...
1325,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Subway,Sandwich Place
1326,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,7-Eleven,Convenience Store
1327,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,McDonald's,Fast Food Restaurant
1328,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Jim & Maria's No Frills,Grocery Store


###### Maximum venue categories

In [32]:
venues_toronto.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Ardene Shoes Outlet
Airport,Downsview,43.737473,-79.394420,Toronto Downsview Airport (YZD)
Airport Food Court,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Billy Bishop Café
Airport Gate,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Gate 8
Airport Lounge,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Porter Lounge
...,...,...,...,...
Warehouse Store,Thorncliffe Park,43.705369,-79.349372,Costco
Wine Bar,"Little Portugal, Trinity",43.653206,-79.400049,Paris Paris Bar
Wings Joint,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Wingporium
Women's Store,Caledonia-Fairbanks,43.689026,-79.453512,Maximum Woman


In [33]:
# Minimum venue categories

venues_toronto.groupby('Venue Category').min()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Ardene Shoes Outlet
Airport,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.464763,Billy Bishop Toronto City Airport (YTZ) (Billy...
Airport Food Court,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Billy Bishop Café
Airport Gate,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Gate 8
Airport Lounge,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Crew Room
...,...,...,...,...
Warehouse Store,Thorncliffe Park,43.705369,-79.349372,Costco
Wine Bar,"Kensington Market, Chinatown, Grange Park",43.647927,-79.419750,Grey Gardens
Wings Joint,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,St. Louis Bar and Grill
Women's Store,Caledonia-Fairbanks,43.689026,-79.453512,Maximum Woman


In [34]:
venues_toronto.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Agincourt,4,4,4,4
"Alderwood, Long Branch",9,9,9,9
"Bathurst Manor, Wilson Heights, Downsview North",21,21,21,21
Bayview Village,4,4,4,4
"Bedford Park, Lawrence Manor East",24,24,24,24
...,...,...,...,...
"Willowdale, Willowdale East",30,30,30,30
"Willowdale, Willowdale West",5,5,5,5
Woburn,3,3,3,3
Woodbine Heights,7,7,7,7


###### How many categories can we find?

### One Hot encoding

In [127]:
toronto_onehot = pd.get_dummies(venues_toronto[['Venue Category']], prefix="", prefix_sep="")

# Adding neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = venues_toronto['Neighbourhood'] 

# Moving neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head(20)

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [36]:
toronto_onehot.shape

(1335, 243)

##### Grouping Neighbourhoods and calculating the mean venue categories in each Neighbourhood

In [37]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head(10)

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.058824,0.058824,0.058824,0.117647,0.117647,0.117647,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
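
To make the meaning of these numbers concrete: each value is the share of a neighbourhood's venues that fall in that category, so every row sums to 1. A small sketch on made-up data (neighbourhoods `A` and `B` are hypothetical):

```python
import pandas as pd

# Hypothetical venue list: neighbourhood A has 4 venues, B has 2
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Bakery", "Pool", "Bakery", "Bakery"],
})

# One indicator column per category, then the per-neighbourhood mean
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Here two of the four venues in `A` are parks, so its `Park` frequency is 0.5.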


###### Function to get top most common venue categories

In [38]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
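
A quick usage sketch of this function on a single hand-made row (the row values are hypothetical). The first element is skipped because it holds the neighbourhood name; the remaining frequencies are sorted in descending order.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # drop the 'Neighbourhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical grouped row: name first, then category frequencies
row = pd.Series({"Neighbourhood": "Example", "Park": 0.5, "Pool": 0.2, "Bakery": 0.3})
top2 = return_most_common_venues(row, 2)
print(list(top2))
```

One consequence worth keeping in mind: when a neighbourhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is zero, so the lower-ranked entries for sparsely covered neighbourhoods carry no information.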

##### Top 10 for each neighbourhood

In [135]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Breakfast Spot,Chinese Restaurant,Lounge,Yoga Studio,Dessert Shop,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore
1,"Alderwood, Long Branch",Pizza Place,Skating Rink,Coffee Shop,Pub,Dance Studio,Sandwich Place,Pharmacy,Gym,Gas Station,Dog Run
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Grocery Store,Supermarket,Bridal Shop,Shopping Mall,Sandwich Place,Restaurant,Pizza Place,Mobile Phone Shop
3,Bayview Village,Chinese Restaurant,Café,Japanese Restaurant,Bank,Yoga Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Department Store
4,"Bedford Park, Lawrence Manor East",Sushi Restaurant,Coffee Shop,Sandwich Place,Italian Restaurant,Butcher,Restaurant,Café,Pub,Pizza Place,Grocery Store
5,Berczy Park,Seafood Restaurant,Beer Bar,Cocktail Bar,Farmers Market,Breakfast Spot,Bakery,Basketball Stadium,Japanese Restaurant,Jazz Club,Bistro
6,"Birch Cliff, Cliffside West",College Stadium,Farm,General Entertainment,Café,Skating Rink,Yoga Studio,Discount Store,Dessert Shop,Dim Sum Restaurant,Diner
7,"Brockton, Parkdale Village, Exhibition Place",Café,Bakery,Coffee Shop,Breakfast Spot,Grocery Store,Performing Arts Venue,Pet Store,Convenience Store,Climbing Gym,Restaurant
8,"Business reply mail Processing Centre, South C...",Light Rail Station,Yoga Studio,Auto Workshop,Gym / Fitness Center,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Comic Shop,Park
9,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Airport Terminal,Plane,Harbor / Marina,Coffee Shop,Rental Car Location,Sculpture Garden,Boutique,Bar


### Clustering Neighbourhoods by using K-Means

In [40]:
# import k-means for the clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:100]

array([0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 4, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 1, 4, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 4, 2, 2, 2, 4,
       3, 2, 2, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 3, 2,
       4, 2, 4, 2, 2, 2, 2, 4])
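
The notebook fixes `kclusters = 5` without showing how that value was chosen, and most neighbourhoods end up in cluster 2. One common way to compare candidate values of k is the silhouette score. The sketch below uses synthetic data rather than `toronto_grouped_clustering`, so it only illustrates the procedure and says nothing about the best k for Toronto.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data (assumption): three well-separated blobs in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(30, 2)) for c in (0.0, 3.0, 6.0)])

# Fit k-means for several k and record the mean silhouette score of each
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

Applied to the real feature matrix, the same loop would give a rough indication of whether five clusters is a reasonable choice.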

In [41]:
# Adding Clustering Label Column to top 10 common venue categories

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# Join df_join with neighborhoods_venues_sorted on neighbourhood, so each row carries latitude, longitude, cluster label and top venues for plotting

merge_toronto = df_join

merge_toronto = merge_toronto.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

merge_toronto.head(40)



Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4.0,Pool,Park,Food & Drink Shop,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,2.0,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena,Financial or Legal Service,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2.0,Coffee Shop,Park,Bakery,Pub,Café,Breakfast Spot,French Restaurant,Performing Arts Venue,Chocolate Shop,Restaurant
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,2.0,Clothing Store,Furniture / Home Store,Accessories Store,Coffee Shop,Gift Shop,Vietnamese Restaurant,Boutique,Discount Store,Dessert Shop,Dim Sum Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2.0,Coffee Shop,Sushi Restaurant,Gym,Fried Chicken Joint,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place,Café
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242,,,,,,,,,,,
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,1.0,Fast Food Restaurant,Yoga Studio,Falafel Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
7,M3B,North York,Don Mills,43.745906,-79.352188,2.0,Gym,Coffee Shop,Restaurant,Supermarket,Italian Restaurant,Japanese Restaurant,Discount Store,Dim Sum Restaurant,Clothing Store,Chinese Restaurant
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,2.0,Pizza Place,Pharmacy,Flea Market,Bank,Breakfast Spot,Athletics & Sports,Intersection,Gastropub,Pet Store,Gym / Fitness Center
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2.0,Café,Ramen Restaurant,Clothing Store,Theater,Steakhouse,Shopping Mall,Tanning Salon,Hotel,Fast Food Restaurant,Burger Joint


In [42]:
toronto_merged_cleaned = merge_toronto.dropna(subset=['Cluster Labels'])
toronto_merged_cleaned

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4.0,Pool,Park,Food & Drink Shop,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,2.0,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena,Financial or Legal Service,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,2.0,Coffee Shop,Park,Bakery,Pub,Café,Breakfast Spot,French Restaurant,Performing Arts Venue,Chocolate Shop,Restaurant
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,2.0,Clothing Store,Furniture / Home Store,Accessories Store,Coffee Shop,Gift Shop,Vietnamese Restaurant,Boutique,Discount Store,Dessert Shop,Dim Sum Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2.0,Coffee Shop,Sushi Restaurant,Gym,Fried Chicken Joint,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place,Café
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,4.0,Park,River,Yoga Studio,Dance Studio,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,2.0,Gay Bar,Indian Restaurant,Steakhouse,Escape Room,Beer Bar,Italian Restaurant,Japanese Restaurant,Bookstore,Breakfast Spot,Bubble Tea Shop
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,2.0,Light Rail Station,Yoga Studio,Auto Workshop,Gym / Fitness Center,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Comic Shop,Park
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,0.0,Breakfast Spot,Baseball Field,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop


In [43]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# Creating Map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Setting color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Adding markers to the map
for lat, lon, poi, cluster in zip(toronto_merged_cleaned['Latitude'], toronto_merged_cleaned['Longitude'], toronto_merged_cleaned['Neighbourhood'], toronto_merged_cleaned['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # Cluster labels here are 0-based floats (0.0 to 4.0), so they index the palette directly
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.5).add_to(map_clusters)
       
map_clusters

### Verifying each cluster

In [44]:
# Cluster 1

toronto_merged_cleaned.loc[toronto_merged_cleaned['Cluster Labels'] == 0, toronto_merged_cleaned.columns[[1] + list(range(5, toronto_merged_cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough,0.0,Playground,College Auditorium,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store
57,North York,0.0,Baseball Field,Food Service,Yoga Studio,Department Store,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
63,Toronto/York,0.0,Grocery Store,Breakfast Spot,Caribbean Restaurant,Department Store,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
78,Scarborough,0.0,Latin American Restaurant,Breakfast Spot,Chinese Restaurant,Lounge,Yoga Studio,Dessert Shop,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore
101,Etobicoke,0.0,Breakfast Spot,Baseball Field,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop


In [45]:
# Verifying each cluster

# Cluster 2

toronto_merged_cleaned.loc[toronto_merged_cleaned['Cluster Labels'] == 1, toronto_merged_cleaned.columns[[1] + list(range(5, toronto_merged_cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,1.0,Fast Food Restaurant,Yoga Studio,Falafel Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center


In [46]:
# Verifying each cluster

# Cluster 3

toronto_merged_cleaned.loc[toronto_merged_cleaned['Cluster Labels'] == 2, toronto_merged_cleaned.columns[[1] + list(range(5, toronto_merged_cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,2.0,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena,Financial or Legal Service,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
2,Downtown Toronto,2.0,Coffee Shop,Park,Bakery,Pub,Café,Breakfast Spot,French Restaurant,Performing Arts Venue,Chocolate Shop,Restaurant
3,North York,2.0,Clothing Store,Furniture / Home Store,Accessories Store,Coffee Shop,Gift Shop,Vietnamese Restaurant,Boutique,Discount Store,Dessert Shop,Dim Sum Restaurant
4,Downtown Toronto,2.0,Coffee Shop,Sushi Restaurant,Gym,Fried Chicken Joint,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place,Café
7,North York,2.0,Gym,Coffee Shop,Restaurant,Supermarket,Italian Restaurant,Japanese Restaurant,Discount Store,Dim Sum Restaurant,Clothing Store,Chinese Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...
96,Downtown Toronto,2.0,Italian Restaurant,Café,Restaurant,Coffee Shop,Bakery,Pet Store,Gastropub,Beer Store,Butcher,Pub
97,Downtown Toronto,2.0,Café,Coffee Shop,Restaurant,Seafood Restaurant,Gastropub,Tea Room,Concert Hall,Pizza Place,Pub,Bookstore
99,Downtown Toronto,2.0,Gay Bar,Indian Restaurant,Steakhouse,Escape Room,Beer Bar,Italian Restaurant,Japanese Restaurant,Bookstore,Breakfast Spot,Bubble Tea Shop
100,East Toronto,2.0,Light Rail Station,Yoga Studio,Auto Workshop,Gym / Fitness Center,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Comic Shop,Park


In [47]:
# Verifying each cluster

# Cluster 4

toronto_merged_cleaned.loc[toronto_merged_cleaned['Cluster Labels'] == 3, toronto_merged_cleaned.columns[[1] + list(range(5, toronto_merged_cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Etobicoke,3.0,Home Service,Yoga Studio,Deli / Bodega,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
62,Central Toronto,3.0,Garden,Home Service,Yoga Studio,Deli / Bodega,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run


In [136]:
# Verifying each cluster

# Cluster 5

toronto_merged_cleaned.loc[toronto_merged_cleaned['Cluster Labels'] == 4, toronto_merged_cleaned.columns[[1] + list(range(5, toronto_merged_cleaned.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,4.0,Pool,Park,Food & Drink Shop,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
21,York,4.0,Park,Women's Store,Pool,Dance Studio,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
35,East York,4.0,Pizza Place,Park,Convenience Store,Intersection,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
52,North York,4.0,Park,Yoga Studio,Deli / Bodega,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
64,York,4.0,Park,Jewelry Store,Yoga Studio,Deli / Bodega,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
66,North York,4.0,Park,Convenience Store,Yoga Studio,Deli / Bodega,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
68,Central Toronto,4.0,Sushi Restaurant,Park,Trail,Jewelry Store,Distribution Center,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Yoga Studio
85,Scarborough,4.0,Playground,Park,Intersection,Dance Studio,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
91,Downtown Toronto,4.0,Park,Playground,Trail,Yoga Studio,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Distribution Center
98,Etobicoke,4.0,Park,River,Yoga Studio,Dance Studio,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center


# London's Neighbourhoods <a name="london"></a>

### Extracting Data

To get the neighbourhoods in London, we start by scraping the Wikipedia page "List of areas of London".

In [49]:
london_url = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
wiki_london_url = requests.get(london_url)
wiki_london_url

<Response [200]>

##### A status code of 200 means the request succeeded, so we can continue.

In [50]:
wiki_london_data = pd.read_html(wiki_london_url.text)
wiki_london_data

[                                                   0
 0  Map all coordinates in "Category:Areas of Lond...
 1                       Download coordinates as: KML,
             Location                     London borough       Post town  \
 0         Abbey Wood              Bexley, Greenwich [7]          LONDON   
 1              Acton  Ealing, Hammersmith and Fulham[8]          LONDON   
 2          Addington                         Croydon[8]         CROYDON   
 3         Addiscombe                         Croydon[8]         CROYDON   
 4        Albany Park                             Bexley  BEXLEY, SIDCUP   
 ..               ...                                ...             ...   
 526         Woolwich                          Greenwich          LONDON   
 527   Worcester Park       Sutton, Kingston upon Thames  WORCESTER PARK   
 528  Wormwood Scrubs             Hammersmith and Fulham          LONDON   
 529          Yeading                         Hillingdon           HAYES   
 

##### We only need the second table in the list (index 1)

In [51]:
wiki_london_data = wiki_london_data[1]
wiki_london_data

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
...,...,...,...,...,...,...
526,Woolwich,Greenwich,LONDON,SE18,020,TQ435795
527,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,020,TQ225655
528,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12,020,TQ225815
529,Yeading,Hillingdon,HAYES,UB4,020,TQ115825


### Data Preprocessing

##### Next, we replace the spaces in the column names with underscores.

In [52]:
wiki_london_data.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)
wiki_london_data

Unnamed: 0,Location,London borough,Post_town,Postcode district,Dial code,OS_grid_ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
...,...,...,...,...,...,...
526,Woolwich,Greenwich,LONDON,SE18,020,TQ435795
527,Worcester Park,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4,020,TQ225655
528,Wormwood Scrubs,Hammersmith and Fulham,LONDON,W12,020,TQ225815
529,Yeading,Hillingdon,HAYES,UB4,020,TQ115825


##### Some column names still contain spaces after the previous step, probably because they contain special characters rather than ordinary spaces.
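
One possible cause, which is an assumption and not something the notebook confirms, is that the scraped headers contain non-breaking spaces (`\xa0`) that `replace(" ", "_")` does not match. A minimal sketch of a rename that treats any whitespace run as a separator is shown here; `normalise_columns` and the `demo` frame are hypothetical names used only for illustration.

```python
import re
import pandas as pd

def normalise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose column names use underscores for any whitespace run."""
    # For str patterns, \s matches Unicode whitespace, including the non-breaking space \xa0
    return df.rename(columns=lambda name: re.sub(r"\s+", "_", name.strip()))

# Hypothetical frame whose headers mix ordinary and non-breaking spaces
demo = pd.DataFrame(columns=["Post town", "London\xa0borough", "OS grid ref"])
print(list(normalise_columns(demo).columns))
```

Since the next step selects columns by position, the analysis does not depend on this rename, but consistent names make later cells easier to read.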

### Feature Selection

##### Keeping only the borough, post town and postcode district columns and dropping the rest.

In [53]:
df_london = wiki_london_data.drop( [ wiki_london_data.columns[0], wiki_london_data.columns[4], wiki_london_data.columns[5] ], axis=1)

In [54]:
df_london.head()

Unnamed: 0,London borough,Post_town,Postcode district
0,"Bexley, Greenwich [7]",LONDON,SE2
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4"
2,Croydon[8],CROYDON,CR0
3,Croydon[8],CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"


##### Renaming the columns to simpler names

In [55]:
df_london.columns = ['Borough','Town','Post_Code']
df_london

Unnamed: 0,Borough,Town,Post_Code
0,"Bexley, Greenwich [7]",LONDON,SE2
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4"
2,Croydon[8],CROYDON,CR0
3,Croydon[8],CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
...,...,...,...
526,Greenwich,LONDON,SE18
527,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4
528,Hammersmith and Fulham,LONDON,W12
529,Hillingdon,HAYES,UB4


Let's remove the footnote markers (square brackets [ ] and the numbers inside them) from the Borough column

In [57]:
df_london['Borough'] = df_london['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
df_london

Unnamed: 0,Borough,Town,Post_Code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Croydon,CROYDON,CR0
3,Croydon,CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
...,...,...,...
526,Greenwich,LONDON,SE18
527,"Sutton, Kingston upon Thames",WORCESTER PARK,KT4
528,Hammersmith and Fulham,LONDON,W12
529,Hillingdon,HAYES,UB4
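
Note that the chained `rstrip` calls leave a trailing space when the marker is preceded by one: `"Bexley, Greenwich [7]"` becomes `"Bexley, Greenwich "`, which later appears as a separate group from `"Bexley, Greenwich"`. A possible alternative, sketched here on a few values copied from the table above, removes the marker with a regular expression and then trims the result. The `boroughs` and `cleaned` names are illustrative only.

```python
import pandas as pd

# Sample values copied from the Borough column shown above
boroughs = pd.Series([
    "Bexley, Greenwich [7]",
    "Ealing, Hammersmith and Fulham[8]",
    "Croydon[8]",
    "Bexley",
])

# Remove "[n]" markers together with any whitespace before them, then trim both ends
cleaned = boroughs.str.replace(r"\s*\[\d+\]", "", regex=True).str.strip()
print(cleaned.tolist())
```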


In [59]:
df_london.shape

(531, 3)

##### Columns = 3
##### Records = 531

### Feature Engineering

##### Keeping only the rows whose post town contains LONDON

In [60]:
df_london = df_london[df_london['Town'].str.contains('LONDON')]
df_london

Unnamed: 0,Borough,Town,Post_Code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
6,City,LONDON,EC3
7,Westminster,LONDON,WC2
9,Bromley,LONDON,SE20
...,...,...,...
521,Redbridge,LONDON,"IG8, E18"
522,"Redbridge, Waltham Forest","LONDON, WOODFORD GREEN",IG8
525,Barnet,LONDON,N12
526,Greenwich,LONDON,SE18


In [61]:
df_london.shape

(308, 3)

In [62]:
df_london.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 308 entries, 0 to 528
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Borough    308 non-null    object
 1   Town       308 non-null    object
 2   Post_Code  308 non-null    object
dtypes: object(3)
memory usage: 9.6+ KB


## London's Neighbourhoods Geolocations

### ArcGIS API

##### Using ArcGIS to get the geographical co-ordinates needed to plot the map.

In [69]:
pip install arcgis

Note: you may need to restart the kernel to use updated packages.


In [70]:
from arcgis.geocoding import geocode
from arcgis.gis import GIS
gis = GIS()

##### Function for latitude and longitude

In [64]:
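def get_x_y_london(address1):
    # Geocode the address and keep the first (best) match
    g = geocode(address='{}, London, England, GBR'.format(address1))[0]
    lng_coords = g['location']['x']  # x is the longitude
    lat_coords = g['location']['y']  # y is the latitude
    # Return the pair as a "latitude,longitude" string
    return str(lat_coords) + "," + str(lng_coords)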
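
The function above indexes `[0]` directly, so it would raise an `IndexError` if the geocoder returned no match. A hedged sketch of a more defensive variant follows. It assumes the geocoder returns an empty list when nothing matches, and it takes the geocoder as a parameter so that it can be tried without a network call. `lat_lng_or_none` and `fake_geocode` are hypothetical names; in the notebook the real `arcgis.geocoding.geocode` would be passed instead.

```python
from typing import Callable, Optional, Tuple

def lat_lng_or_none(address: str, geocoder: Callable[..., list]) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for the first match, or None when there is no match."""
    matches = geocoder(address='{}, London, England, GBR'.format(address))
    if not matches:
        return None
    location = matches[0]['location']
    return location['y'], location['x']

# Stand-in geocoder for illustration only (not the ArcGIS service)
def fake_geocode(address):
    known = {'SE2, London, England, GBR': [{'location': {'x': 0.12127, 'y': 51.49245}}]}
    return known.get(address, [])

print(lat_lng_or_none('SE2', fake_geocode))
print(lat_lng_or_none('ZZ99', fake_geocode))
```

Returning a tuple of floats instead of a string would also make the later split-and-convert steps unnecessary.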

##### Selecting the London postal codes to pass to the geocoding function above

In [72]:
geo_coordinates_london = df_london['Post_Code']    
geo_coordinates_london

0           SE2
1        W3, W4
6           EC3
7           WC2
9          SE20
         ...   
521    IG8, E18
522         IG8
525         N12
526        SE18
528         W12
Name: Post_Code, Length: 308, dtype: object

Passing the postal codes of London to the function to get the geographical co-ordinates

In [73]:
latlong_london = geo_coordinates_london.apply(lambda x: get_x_y_london(x))
latlong_london

0       51.492450000000076,0.12127000000003818
1        51.51324000000005,-0.2674599999999714
6       51.51200000000006,-0.08057999999994081
7       51.51651000000004,-0.11967999999995982
9       51.41009000000008,-0.05682999999993399
                        ...                   
521    51.589770000000044,0.030520000000024083
522      51.50642000000005,-0.1272099999999341
525     51.615920000000074,-0.1767399999999384
526      51.48207000000008,0.07143000000002075
528      51.50645000000003,-0.2369099999999662
Name: Post_Code, Length: 308, dtype: object

### Latitude

Extracting Latitude from coordinates above

In [74]:
lat_london = latlong_london.apply(lambda x: x.split(',')[0])
lat_london

0      51.492450000000076
1       51.51324000000005
6       51.51200000000006
7       51.51651000000004
9       51.41009000000008
              ...        
521    51.589770000000044
522     51.50642000000005
525    51.615920000000074
526     51.48207000000008
528     51.50645000000003
Name: Post_Code, Length: 308, dtype: object

### Longitude

Extracting Longitude from coordinates above

In [75]:
long_london = latlong_london.apply(lambda x: x.split(',')[1])
long_london

0       0.12127000000003818
1       -0.2674599999999714
6      -0.08057999999994081
7      -0.11967999999995982
9      -0.05682999999993399
               ...         
521    0.030520000000024083
522     -0.1272099999999341
525     -0.1767399999999384
526     0.07143000000002075
528     -0.2369099999999662
Name: Post_Code, Length: 308, dtype: object

##### Time to merge the raw data with the geographical co-ordinates

In [76]:
london_merge = pd.concat([df_london,lat_london.astype(float), long_london.astype(float)], axis=1)
london_merge.columns= ['Borough','Town','Post_Code','Latitude','Longitude']
london_merge

Unnamed: 0,Borough,Town,Post_Code,Latitude,Longitude
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746
6,City,LONDON,EC3,51.51200,-0.08058
7,Westminster,LONDON,WC2,51.51651,-0.11968
9,Bromley,LONDON,SE20,51.41009,-0.05683
...,...,...,...,...,...
521,Redbridge,LONDON,"IG8, E18",51.58977,0.03052
522,"Redbridge, Waltham Forest","LONDON, WOODFORD GREEN",IG8,51.50642,-0.12721
525,Barnet,LONDON,N12,51.61592,-0.17674
526,Greenwich,LONDON,SE18,51.48207,0.07143
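
The latitude and longitude extraction plus the type conversion can also be done in one step. The sketch below uses two coordinate strings in the same "latitude,longitude" format as `latlong_london`; the variable names are illustrative, and this is an alternative, not the code the notebook ran.

```python
import pandas as pd

# Two "latitude,longitude" strings in the format returned by get_x_y_london
latlong = pd.Series(["51.49245,0.12127", "51.51324,-0.26746"], index=[0, 1], name="Post_Code")

# Split each string into two columns and convert both to float at once
coords = latlong.str.split(",", expand=True).astype(float)
coords.columns = ["Latitude", "Longitude"]
print(coords)
```

Because `coords` keeps the original index, it can be concatenated with `df_london` along `axis=1` in the same way as above.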


In [77]:
london_merge.dtypes

Borough       object
Town          object
Post_Code     object
Latitude     float64
Longitude    float64
dtype: object

### Co-ordinates for London

Getting the geocode of London to visualize on the map

In [78]:
london = geocode(address='London, England, GBR')[0]
london_long_coords = london['location']['x']
london_lat_coords = london['location']['y']
london_lat_coords

51.50642000000005

In [79]:
london_long_coords

-0.1272099999999341
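
These centre co-ordinates are identical to the ones returned for row 522 (postcode IG8) in the merged table above, which suggests that at least one postcode was resolved to the city centre and not to its own district. A minimal sketch of a check for such rows is shown here, using three rows copied from `london_merge`; the `sample` and `at_centre` names are illustrative.

```python
import numpy as np
import pandas as pd

# Rows copied from the london_merge output above
sample = pd.DataFrame(
    {"Post_Code": ["SE2", "IG8", "N12"],
     "Latitude": [51.49245, 51.50642, 51.61592],
     "Longitude": [0.12127, -0.12721, -0.17674]},
    index=[0, 522, 525],
)
centre_lat, centre_lng = 51.50642000000005, -0.1272099999999341

# Flag rows whose co-ordinates match the city centre to within a tiny absolute tolerance
at_centre = (
    np.isclose(sample["Latitude"], centre_lat, rtol=0, atol=1e-6)
    & np.isclose(sample["Longitude"], centre_lng, rtol=0, atol=1e-6)
)
print(sample[at_centre])
```

Rows flagged this way could be geocoded again or left out before the venue search.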

## London's Map

In [95]:
# Creating the map of London
map_London = folium.Map(location=[london_lat_coords, london_long_coords], zoom_start=12)

# adding markers to map
for Latitude, Longitude, Borough, Town in zip(london_merge['Latitude'], london_merge['Longitude'], london_merge['Borough'], london_merge['Town']):
    label = '{}, {}'.format(Town, Borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [Latitude, Longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True
        ).add_to(map_London)  
    
map_London

### London's Venues

We use the Foursquare API to retrieve the venues.

In [81]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # placeholder: real credentials should not be published
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # placeholder: real credentials should not be published
VERSION = '20210404' # Foursquare API version

##### Getting the venues near each location. This gives us the venue categories, which are the basis of our analysis.

In [82]:
LIMIT=100

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return nearby_venues
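
The function above assumes that every response contains at least one group and that every venue has at least one category. The following sketch shows one way to extract the same five fields while tolerating missing pieces. `extract_venues` is a hypothetical helper, and `fake_payload` is a hand-written dictionary in the shape the function above indexes into, not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Turn one explore response into (name, lat, lng, venue, category) rows."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'], category))
    return rows

# Hand-written payload for illustration only
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe', 'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorised Place', 'categories': []}},
]}]}}

print(extract_venues(fake_payload, 'City', 51.512, -0.08058))
print(extract_venues({'response': {}}, 'City', 51.512, -0.08058))
```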

##### Getting the venues in London

In [87]:
london_venues = getNearbyVenues(london_merge['Borough'], london_merge['Latitude'], london_merge['Longitude'])

Bexley, Greenwich 
Ealing, Hammersmith and Fulham
City
Westminster
Bromley
Islington
Islington
Barnet
Enfield
Wandsworth
Southwark
City
Richmond upon Thames
Barnet
Islington
Wandsworth
Westminster
Bromley
Newham
Ealing
Westminster
Lewisham
Camden
Southwark
Tower Hamlets
Bexley
City
Lewisham
Greenwich
Tower Hamlets
Camden
Haringey
Tower Hamlets
Haringey
Barnet
Brent
Lambeth
Lewisham
Tower Hamlets
Kensington and Chelsea, Hammersmith and Fulham
Brent
Barnet
Barnet
Southwark
Tower Hamlets
Camden
Tower Hamlets
Waltham Forest
Newham
Islington
Richmond upon Thames
Lewisham
Camden
Westminster
Greenwich
Kensington and Chelsea
Barnet
Westminster
Lewisham
Waltham Forest
Hounslow, Ealing, Hammersmith and Fulham
Brent
Barnet
Lambeth, Wandsworth
Islington
Barnet
Merton
Barnet
Westminster
Barnet, Brent, Camden
Lewisham
Bexley
Haringey
Bromley
Tower Hamlets
Newham
Hackney
Islington
Southwark
Lewisham
Brent
Southwark
Ealing
Kensington and Chelsea
Wandsworth
Southwark
Barnet
Newham
Richmond upon Thames


In [119]:
london_venues.head(15)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,"Bexley, Greenwich",51.49245,0.12127,Lesnes Abbey,Historic Site
1,"Bexley, Greenwich",51.49245,0.12127,Sainsbury's,Supermarket
2,"Bexley, Greenwich",51.49245,0.12127,Lidl,Supermarket
3,"Bexley, Greenwich",51.49245,0.12127,Abbey Wood Railway Station (ABW),Train Station
4,"Bexley, Greenwich",51.49245,0.12127,Bean @ Work,Coffee Shop
5,"Bexley, Greenwich",51.49245,0.12127,Platform 1,Platform
6,"Ealing, Hammersmith and Fulham",51.51324,-0.26746,Sainsbury's Local,Grocery Store
7,"Ealing, Hammersmith and Fulham",51.51324,-0.26746,Acton Main Line Railway Station (AML),Train Station
8,"Ealing, Hammersmith and Fulham",51.51324,-0.26746,Co-op Food,Grocery Store
9,"Ealing, Hammersmith and Fulham",51.51324,-0.26746,The Balti House,Indian Restaurant


In [97]:
london_venues.shape

(10343, 5)

##### Records = 10343

### Grouping by Venue Categories
Checking how many venue categories there are before we process them.

In [98]:
london_venues.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,Westminster,51.51656,-0.11968,James Smith & Sons
Adult Boutique,Islington,51.52969,-0.08697,Sh! Women's Erotic Emporium
African Restaurant,Westminster,51.52587,-0.08808,Red Sea Restaurant
American Restaurant,Waltham Forest,51.61780,0.02795,Spielburger
Antique Shop,Westminster,51.51651,-0.11968,The London Silver Vaults
...,...,...,...,...
Wings Joint,Hammersmith and Fulham,51.54187,-0.19795,Wingmans
Women's Store,Westminster,51.55457,-0.11478,Vivien of Holloway
Xinjiang Restaurant,Southwark,51.47480,-0.09313,Silk Road
Yoga Studio,Westminster,51.55457,-0.03558,yogahaven


The result has 306 rows, one per venue category, which shows how diverse the venues are.
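
If only the number of categories is needed, it can be read directly without grouping. A small sketch on four rows copied from the `london_venues` preview above (the `venues` name is illustrative):

```python
import pandas as pd

# Rows copied from the london_venues preview above
venues = pd.DataFrame({
    "Venue": ["Lesnes Abbey", "Sainsbury's", "Lidl", "Bean @ Work"],
    "Venue Category": ["Historic Site", "Supermarket", "Supermarket", "Coffee Shop"],
})

# Number of distinct categories
print(venues["Venue Category"].nunique())

# How often each category occurs, most frequent first
print(venues["Venue Category"].value_counts())
```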

### One Hot Encoding
K-means needs numeric input, so we one-hot encode the venue categories before clustering.

In [134]:
london_venue_cat = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")
london_venue_cat.tail(20)

Unnamed: 0,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
10323,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10324,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10325,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10326,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10327,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10328,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10329,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10330,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10331,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10332,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


##### Adding the Neighbourhood column to the encoded data.

In [100]:
london_venue_cat['Neighbourhood'] = london_venues['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [london_venue_cat.columns[-1]] + list(london_venue_cat.columns[:-1])
london_venue_cat = london_venue_cat[fixed_columns]

london_venue_cat.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Venue categories mean value
Grouping the rows by neighbourhood and calculating the mean of each venue category column, which gives the share of each category among that neighbourhood's venues.

In [101]:
london_grouped = london_venue_cat.groupby('Neighbourhood').mean().reset_index()
london_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,Barnet,0.0,0.0,0.0,0.001825,0.0,0.0,0.0,0.007299,0.0,...,0.001825,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Barnet, Brent, Camden",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Bexley, Greenwich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bexley, Greenwich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
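
To make the meaning of these means concrete, here is a toy sketch with two made-up neighbourhoods, "A" and "B". All names and values are illustrative and do not come from the London data.

```python
import pandas as pd

# Toy data: neighbourhood A has four venues, neighbourhood B has two
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pub", "Pub", "Bakery", "Park", "Park", "Park"],
})

# One indicator column per category, then the neighbourhood label in front
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# The mean of an indicator column is the fraction of venues in that category
shares = onehot.groupby("Neighbourhood").mean()
print(shares)
```

Each row of `shares` sums to 1, so neighbourhoods with very different venue counts become comparable.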



##### Function to return the most common venue categories for one neighbourhood

In [102]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
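
A quick way to see what the function returns is to call it on a hand-made row. The row below is illustrative only; the function body is repeated so that the sketch runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighbourhood name) and sort the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: a neighbourhood name followed by category frequencies
row = pd.Series({"Neighbourhood": "Example", "Bakery": 0.1, "Park": 0.2, "Pub": 0.6, "Zoo": 0.1})
top_two = return_most_common_venues(row, 2)
print(top_two)
```

When a neighbourhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is zero. This may explain the repeated runs such as "Escape Room, Electronics Store, Eastern European Restaurant" in the tables.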


##### There are a lot of venue categories, so we keep only the 10 most common per neighbourhood when summarising each one.

Building the column labels (1st, 2nd, 3rd, 4th, ...) for the top venues

In [103]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
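
The same labels can be built without relying on an exception. This is a sketch of an alternative, not the notebook's code; `ordinal_label` is a hypothetical helper that is only correct for positions 1 to 20.

```python
def ordinal_label(position: int) -> str:
    """Return '1st', '2nd', '3rd', '4th', ... (valid for positions 1 to 20)."""
    suffixes = {1: 'st', 2: 'nd', 3: 'rd'}
    return '{}{}'.format(position, suffixes.get(position, 'th'))

num_top_venues = 10
columns = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal_label(i)) for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```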


### Top venue categories

Getting the top venue categories in London

In [104]:
# create a new dataframe for London
neighborhoods_venues_sorted_london = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_london['Neighbourhood'] = london_grouped['Neighbourhood']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted_london.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_london.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barnet,Coffee Shop,Café,Grocery Store,Bus Stop,Pub,Italian Restaurant,Supermarket,Pharmacy,Turkish Restaurant,Gym / Fitness Center
1,"Barnet, Brent, Camden",Gym / Fitness Center,Hardware Store,Clothing Store,Supermarket,Zoo Exhibit,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market
2,Bexley,Supermarket,Historic Site,Train Station,Coffee Shop,Platform,Park,Construction & Landscaping,Golf Course,Bus Stop,Fish Market
3,"Bexley, Greenwich",Bus Stop,Park,Golf Course,Home Service,Construction & Landscaping,Historic Site,Sports Club,Daycare,Food & Drink Shop,Flower Shop
4,"Bexley, Greenwich",Supermarket,Platform,Train Station,Historic Site,Coffee Shop,Film Studio,Event Space,Exhibit,Falafel Restaurant,Farmers Market


## Model Building

### K Means
We use K-means to group London's neighbourhoods into 5 clusters, which keeps the analysis manageable.

In [105]:
# set number of clusters
k_num_clusters = 5

London_grouped_clustering = london_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans_london = KMeans(n_clusters=k_num_clusters, random_state=0).fit(London_grouped_clustering)
kmeans_london

KMeans(n_clusters=5, random_state=0)

### Labelling Clustered Data

In [106]:
kmeans_london.labels_

array([0, 3, 2, 0, 2, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0])
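
Before interpreting the clusters, it helps to see how many groups fall into each label. The sketch below counts the 50 labels printed above, with the values copied from that output.

```python
from collections import Counter

# Labels copied from kmeans_london.labels_ above: the first 22 values, then 28 zeros
labels = [0, 3, 2, 0, 2, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0] + [0] * 28

# Map each label to the number of groups assigned to it
sizes = Counter(labels)
print(sorted(sizes.items()))
```

Label 0 covers 45 of the 50 groups, so the remaining four clusters describe only a handful of outlying groups.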

In [107]:
neighborhoods_venues_sorted_london.insert(0, 'Cluster Labels', kmeans_london.labels_ +1)

##### Join london_merge with the sorted neighbourhood venues so that each neighbourhood has its latitude, longitude and cluster label, ready for plotting.

In [108]:
london_data = london_merge

london_data = london_data.join(neighborhoods_venues_sorted_london.set_index('Neighbourhood'), on='Borough')

london_data.head()

Unnamed: 0,Borough,Town,Post_Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127,3,Supermarket,Platform,Train Station,Historic Site,Coffee Shop,Film Studio,Event Space,Exhibit,Falafel Restaurant,Farmers Market
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746,2,Grocery Store,Park,Indian Restaurant,Breakfast Spot,Train Station,Filipino Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant
6,City,LONDON,EC3,51.512,-0.08058,1,Hotel,Coffee Shop,Gym / Fitness Center,Italian Restaurant,Pub,Restaurant,Sandwich Place,Wine Bar,Garden,French Restaurant
7,Westminster,LONDON,WC2,51.51651,-0.11968,1,Coffee Shop,Hotel,Pub,Café,Sandwich Place,Italian Restaurant,Theater,Burger Joint,Sushi Restaurant,Bakery
9,Bromley,LONDON,SE20,51.41009,-0.05683,1,Supermarket,Convenience Store,Fast Food Restaurant,Hotel,Grocery Store,Park,Indian Restaurant,Bus Stop,Gastropub,Bistro



Drop the rows that have no cluster label (NaN), since they cannot be coloured on the map

In [109]:
london_data_nonan = london_data.dropna(subset=['Cluster Labels'])

### Visualizing the clustered neighbourhood
Let's plot the clusters

In [110]:
map_clusters_london = folium.Map(location=[london_lat_coords, london_long_coords], zoom_start=12)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_data_nonan['Latitude'], london_data_nonan['Longitude'], london_data_nonan['Borough'], london_data_nonan['Cluster Labels']):
    # 'Cluster Labels' is already 1-based (labels_ + 1), so show it unchanged
    label = folium.Popup('Cluster ' + str(int(cluster)) + '\n' + str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters_london)
        
map_clusters_london
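
The colour setup in the cell above builds one hex colour per cluster, and `ys` is used only for its length. The sketch below shows the same idea in a shorter form and makes explicit how a 1-based cluster label maps to a 0-based palette index. It assumes `k_num_clusters = 5`, the value implied by the five clusters examined in the next section; `palette` and `cluster_colour` are illustrative names, not part of the notebook.

```python
import numpy as np
from matplotlib import cm, colors

k_num_clusters = 5  # assumed: the notebook examines clusters 1 to 5

# One evenly spaced rainbow colour per cluster, converted to hex strings
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k_num_clusters))]

def cluster_colour(label):
    """Return the hex colour for a 1-based cluster label."""
    return palette[int(label) - 1]

for label in range(1, k_num_clusters + 1):
    print(label, cluster_colour(label))
```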

## Examining our Clusters

Cluster 1

In [111]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 1, london_data_nonan.columns[[0] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,City,1,Hotel,Coffee Shop,Gym / Fitness Center,Italian Restaurant,Pub,Restaurant,Sandwich Place,Wine Bar,Garden,French Restaurant
7,Westminster,1,Coffee Shop,Hotel,Pub,Café,Sandwich Place,Italian Restaurant,Theater,Burger Joint,Sushi Restaurant,Bakery
9,Bromley,1,Supermarket,Convenience Store,Fast Food Restaurant,Hotel,Grocery Store,Park,Indian Restaurant,Bus Stop,Gastropub,Bistro
10,Islington,1,Coffee Shop,Pub,Food Truck,Café,Park,Vietnamese Restaurant,Italian Restaurant,Cocktail Bar,Hotel,Gym / Fitness Center
12,Islington,1,Coffee Shop,Pub,Food Truck,Café,Park,Vietnamese Restaurant,Italian Restaurant,Cocktail Bar,Hotel,Gym / Fitness Center
...,...,...,...,...,...,...,...,...,...,...,...,...
521,Redbridge,1,Café,Pub,Coffee Shop,Convenience Store,Bakery,Bar,Grocery Store,Park,Liquor Store,BBQ Joint
522,"Redbridge, Waltham Forest",1,Hotel,Café,Monument / Landmark,Plaza,Garden,Theater,Pub,Bakery,Pharmacy,Sandwich Place
525,Barnet,1,Coffee Shop,Café,Grocery Store,Bus Stop,Pub,Italian Restaurant,Supermarket,Pharmacy,Turkish Restaurant,Gym / Fitness Center
526,Greenwich,1,Pub,Grocery Store,Bus Stop,Indian Restaurant,Coffee Shop,Café,Historic Site,Pier,Convenience Store,Construction & Landscaping


Cluster 2

In [112]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 2, london_data_nonan.columns[[0] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Ealing, Hammersmith and Fulham",2,Grocery Store,Park,Indian Restaurant,Breakfast Spot,Train Station,Filipino Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant


Cluster 3

In [113]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 3, london_data_nonan.columns[[0] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bexley, Greenwich",3,Supermarket,Platform,Train Station,Historic Site,Coffee Shop,Film Studio,Event Space,Exhibit,Falafel Restaurant,Farmers Market
45,Bexley,3,Supermarket,Historic Site,Train Station,Coffee Shop,Platform,Park,Construction & Landscaping,Golf Course,Bus Stop,Fish Market
124,Bexley,3,Supermarket,Historic Site,Train Station,Coffee Shop,Platform,Park,Construction & Landscaping,Golf Course,Bus Stop,Fish Market
291,Bexley,3,Supermarket,Historic Site,Train Station,Coffee Shop,Platform,Park,Construction & Landscaping,Golf Course,Bus Stop,Fish Market
505,Bexley,3,Supermarket,Historic Site,Train Station,Coffee Shop,Platform,Park,Construction & Landscaping,Golf Course,Bus Stop,Fish Market


Cluster 4

In [114]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 4, london_data_nonan.columns[[0] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
121,"Barnet, Brent, Camden",4,Gym / Fitness Center,Hardware Store,Clothing Store,Supermarket,Zoo Exhibit,Filipino Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market


Cluster 5

In [115]:
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 5, london_data_nonan.columns[[0] + list(range(5, london_data_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
356,"Brent, Ealing",5,Convenience Store,Fast Food Restaurant,Warehouse Store,Chinese Restaurant,Pharmacy,Zoo Exhibit,Film Studio,Event Space,Exhibit,Falafel Restaurant
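
The per-cluster tables above can also be condensed into a short summary. The sketch below, on a made-up sample with the same column names as `london_data_nonan`, counts the rows in each cluster and finds the most frequent top venue per cluster. The sample values are illustrative only and do not reproduce the project's results.

```python
import pandas as pd

# Made-up sample in the same shape as london_data_nonan
sample = pd.DataFrame({
    "Borough": ["City", "Westminster", "Islington", "Bexley", "Bexley"],
    "Cluster Labels": [1, 1, 1, 3, 3],
    "1st Most Common Venue": ["Hotel", "Coffee Shop", "Coffee Shop",
                              "Supermarket", "Supermarket"],
})

# Number of rows in each cluster
sizes = sample["Cluster Labels"].value_counts().sort_index()

# Most frequent top venue within each cluster
top_venue = sample.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    lambda s: s.value_counts().idxmax()
)

print(sizes)
print(top_venue)
```

A summary like this makes it easier to see which clusters are large and general and which contain only one or two boroughs.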


# Results and Discussion <a name="results"></a>

Although smaller than London, Toronto is a super city! It has a huge number of cool places to go and it is very supportive in terms of infrastructure. We can see a huge variety of cuisines, such as Caribbean, Portuguese, Mexican, Chinese and Vietnamese. There are a lot of hangout spots: many pubs, coffee shops, parks and restaurants. On the supportive side, there are a lot of options too: gyms, parks, drugstores, stores, convenience stores, etc. Overall, Toronto seems like a relaxing vacation place with a mix of fun outdoor places, hangout spots, food spots and a wide variety of cuisines to try out. And if you are an immigrant, it seems perfect for building a new life. The neighbourhoods have a bit of everything, which makes Toronto very supportive: when the places you need are close to home, you can reach them quickly without relying on public or private transportation.

Bigger than Toronto, London seems to be super multicultural, bohemian and also supportive. There is a huge variety of cuisines, including Indian, Italian, Turkish and Chinese. London far exceeds Toronto in the number of fun places to go: restaurants, bars, pubs, bakeries, coffee shops, fish and chips shops and breakfast spots. An interesting point: there are a lot of hotels too, many more than in Toronto. It's impressive, because the city seems to be built for fun. As for infrastructure, it has a lot of shopping and supportive places, such as supermarkets, farmers markets, train stations, grocery stores, fish markets and clothing stores. The main modes of transport seem to be buses and trains. For outdoor leisure, the neighbourhoods also have lots of parks, golf courses, zoos, gyms and historic sites.

# Conclusion <a name="conclusion"></a>

The main objective of this project was to explore the cities of Toronto and London to see how attractive they are to potential tourists and immigrants. Both cities were explored using their postal codes, from which the common venues in each neighbourhood were extracted. Similar neighbourhoods were then clustered together.

As a result, the neighbourhoods in both cities show a huge variety of leisure places and experiences to offer. Judging by the variety of cuisines and the many different kinds of activities, both cities seem ready to welcome anyone, tourist or immigrant, because they give a strong feeling of inclusion.

Both Toronto and London seem to offer a perfect vacation stay or an excellent beginning to a new life. They have a lot of places to explore, all kinds of outdoor activities, great infrastructure and a huge variety of culture.

Toronto has the perfect balance. You have everything, everywhere. The neighbourhoods are well balanced, with a lot of places to go for fun and a lot of supportive places for the daily routine. It's perfect for immigrants and good for tourists.

London excels in cultural diversity and fun. It seems to be a paradise for tourists. The neighbourhoods have a wide variety of places to go: all kinds of restaurants, bars, cuisines, pubs and hotels. It's impressive. It is probably expensive, because it seems to be a city built for tourism, so, despite its good infrastructure (good public transportation and some supportive places), immigrants should be careful.

In the end, it's up to the stakeholders to decide which experience they would prefer. They should pay close attention to each client's needs in order to meet their expectations.