# Toronto Neighbourhoods 

**NOTE:** To see all rendered maps please visit [this link](https://nbviewer.jupyter.org/github/MarketaM/Coursera_Capstone/blob/main/notebooks/Toronto_neighborhoods.ipynb). 

## Part 1: 
Scrape a table from Wikipedia using pandas, then drop missing values and clean the table.

In [1]:
import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
df_list = pd.read_html(url)
len(df_list)

3

In [2]:
# Take the first table in the list and save it as df_toronto.
df_toronto = df_list[0]
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
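
Picking the table by position (`df_list[0]`) works as long as the page layout stays the same. As a hedged alternative, the sketch below selects the first table that contains the expected columns. The helper name `pick_table` and the sample frames are made up for illustration; they stand in for the list returned by `pd.read_html`.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required names."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"no table has columns {sorted(required)}")


# Stand-ins for the list returned by pd.read_html (made-up data).
tables = [
    pd.DataFrame({"Legend": ["a", "b"]}),
    pd.DataFrame({
        "Postal Code": ["M1A", "M3A"],
        "Borough": ["Not assigned", "North York"],
        "Neighbourhood": ["Not assigned", "Parkwoods"],
    }),
]

picked = pick_table(tables, ["Postal Code", "Borough", "Neighbourhood"])
print(picked.shape)
```

If no table matches, the helper raises an error instead of silently returning the wrong table.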


In [3]:
# Drop rows whose borough is "Not assigned" and renumber the index.
df_toronto = df_toronto[df_toronto.Borough != "Not assigned"].reset_index(drop=True)
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [4]:
import numpy as np

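# Check that each postal code appears in exactly one row.
# Note: value_counts is referenced without parentheses below, so the second print
# shows the bound method (the column itself) rather than per-code counts.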
print(df_toronto.shape)
print(df_toronto["Postal Code"].value_counts)

(103, 3)
<bound method IndexOpsMixin.value_counts of 0      M3A
1      M4A
2      M5A
3      M6A
4      M7A
      ... 
98     M8X
99     M4Y
100    M7Y
101    M8Y
102    M8Z
Name: Postal Code, Length: 103, dtype: object>
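
Because `value_counts` is printed without being called, the output above lists the column rather than how often each code occurs. A minimal sketch of a direct uniqueness check follows, run on made-up sample data with one deliberately repeated postal code; the same two calls could be applied to `df_toronto["Postal Code"]`.

```python
import pandas as pd

# Made-up sample with one deliberately repeated postal code.
sample = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A", "M5A"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Regent Park", "Harbourfront"],
})

codes = sample["Postal Code"]

# True only when every postal code occurs once.
print(codes.is_unique)

# Calling value_counts() returns the count per code; keep the repeated ones.
counts = codes.value_counts()
repeated = counts[counts > 1]
print(repeated)
```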


In [5]:
# Confirm that no neighbourhood is "Not assigned" (the result is empty).
df_toronto.loc[df_toronto["Neighbourhood"] == "Not assigned"]

Unnamed: 0,Postal Code,Borough,Neighbourhood


In [6]:
df_toronto.shape

(103, 3)

## Part 2:
Get latitude and longitude coordinates for each postal code.

In [7]:
import pgeocode

# Look up the coordinates of each postal code with pgeocode.
nomi = pgeocode.Nominatim('ca')
postal_code = df_toronto["Postal Code"].to_list()
location = nomi.query_postal_code(postal_code)
df_location = pd.DataFrame(data=location)
df_location.rename(columns={"postal_code":"Postal Code"}, inplace=True)
# Left-join the coordinates onto the neighbourhood table, then capitalise the new column names.
df_toronto = df_toronto.merge(df_location[["latitude", "longitude", "Postal Code"]], on="Postal Code", how="left")
df_toronto.rename({"latitude":"Latitude", "longitude":"Longitude"}, axis="columns", inplace=True)

In [8]:
# drop rows with null values
df_toronto.dropna(axis=0, inplace=True)
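
`dropna` removes rows silently, so it can help to look at which postal codes came back without coordinates before dropping them. The sketch below does that on made-up data; the postal code `M9Z` and its neighbourhood name are hypothetical and only serve to produce a missing value.

```python
import pandas as pd

# Made-up neighbourhood table and a lookup result with one missing coordinate.
hoods = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Latitude": [43.7545, 43.7276, float("nan")],
    "Longitude": [-79.33, -79.3148, float("nan")],
})

merged = hoods.merge(coords, on="Postal Code", how="left")

# Rows that dropna would remove because a coordinate is missing.
missing = merged[merged[["Latitude", "Longitude"]].isna().any(axis=1)]
print(missing["Postal Code"].tolist())

# Drop only on the coordinate columns, so nulls elsewhere do not remove rows.
cleaned = merged.dropna(subset=["Latitude", "Longitude"])
print(len(merged), "->", len(cleaned))
```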

## Part 3:
Explore and cluster Toronto neighbourhoods. Then visualise them with maps.


In [9]:
# get Toronto's coordinates

from geopy.geocoders import Nominatim

address = "Toronto, Canada"

geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
toronto_lat = location.latitude
toronto_lng = location.longitude
print(toronto_lat, toronto_lng)

43.6534817 -79.3839347


In [10]:
# create a map of Toronto with marked neighbourhoods

import folium 

toronto_map = folium.Map(location=[toronto_lat, toronto_lng], zoom_start=11)

for lat, lng, label in zip(df_toronto["Latitude"], df_toronto["Longitude"], df_toronto["Neighbourhood"]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        location=[lat, lng],
        radius=5,
        popup=label,
        color="cadetblue",
        fill=True,
        fill_color="lightblue",
        fill_opacity=0.7).add_to(toronto_map)
    
toronto_map

In [11]:
# get Foursquare client information
CLIENT_ID = 'paste_your_client_id'
CLIENT_SECRET = 'paste_your_client_secret' 
VERSION = '20180605' 
LIMIT = 100
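
Rather than pasting the credentials into the notebook, they could be read from environment variables. This is only a sketch: the helper `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions chosen for this example, not names required by Foursquare.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


# Demonstration with a plain dict standing in for os.environ.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print(CLIENT_ID)
```

In a real session the function would be called without arguments, so that it reads from `os.environ`.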

In [12]:
import requests

# define a function that will get venues and their coordinates from Foursquare

def getVenues(names, latitudes, longitudes, radius=500):
    
    venues_list = []
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
    
        venues_list.append([(
            name, 
            lat, 
            lng,
            v["venue"]["name"],
            v["venue"]["location"]["lat"],
            v["venue"]["location"]["lng"],
            v["venue"]["categories"][0]["name"]) for v in results])
        
    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
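
The list comprehension in `getVenues` assumes that every venue has at least one category. As a hedged sketch, the parsing step could be pulled into a helper that tolerates missing fields. The name `parse_venue_items` and the sample items are made up; the items only mimic the shape of the `["response"]["groups"][0]["items"]` list used above.

```python
def parse_venue_items(name, lat, lng, items):
    """Flatten Foursquare-style items into rows, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use the first category name when present, otherwise None.
        category = categories[0].get("name") if categories else None
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows


# Made-up items; the second one has an empty category list.
fake_items = [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unlabelled Place",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]

rows = parse_venue_items("Parkwoods", 43.7545, -79.33, fake_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or relabelled before the one-hot encoding step.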
    


In [13]:
# run the function to get Toronto neighbourhood venues
toronto_venues = getVenues(names=df_toronto["Neighbourhood"], latitudes=df_toronto["Latitude"], longitudes=df_toronto["Longitude"])
toronto_venues.head()

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.7545,-79.33,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.7545,-79.33,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.7276,-79.3148,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.7276,-79.3148,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.7276,-79.3148,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [14]:
# one-hot encode the venue categories and move the Neighbourhood column to the front
toronto_onehot = pd.get_dummies(toronto_venues[["Venue Category"]], prefix="", prefix_sep="")
toronto_onehot["Neighbourhood"] = toronto_venues["Neighbourhood"]
toronto_onehot = toronto_onehot[[toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])]
toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [15]:
# average the one-hot rows per neighbourhood to get each category's share of its venues
toronto_grouped = toronto_onehot.groupby("Neighbourhood").mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
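
To make the two steps above concrete, here is a small sketch on made-up data. Neighbourhoods `A` and `B` and their venues are invented. After one-hot encoding, the mean per neighbourhood is the share of that neighbourhood's venues falling into each category, so every row of the result sums to 1.

```python
import pandas as pd

# Made-up venues for two hypothetical neighbourhoods.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Coffee Shop", "Bakery", "Bar", "Bar"],
})

# One column per category, with 1 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# The mean of the 0/1 columns is the category's share of venues.
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Neighbourhood `A` has two parks out of four venues, so its `Park` value is 0.5.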


### Clustering - K Means
Cluster the neighbourhoods into 5 clusters based on the similarity of venue occurrence.

In [16]:
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

number_of_clusters = 5
toronto_clustering = toronto_grouped.drop("Neighbourhood", axis=1)
k_means = KMeans(n_clusters = number_of_clusters, random_state = 4).fit(toronto_clustering)
toronto_grouped.insert(0, "Cluster Group", k_means.labels_)
df_toronto = df_toronto.join(toronto_grouped.set_index("Neighbourhood"), on="Neighbourhood")

In [17]:
df_toronto.dropna(axis=0, inplace=True)
df_toronto["Cluster Group"] = df_toronto["Cluster Group"].astype(int)
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Group,Accessories Store,Afghan Restaurant,Airport,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M3A,North York,Parkwoods,43.7545,-79.33,0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4A,North York,Victoria Village,43.7276,-79.3148,1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504,1,0.014085,0.0,0.0,0.014085,...,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.042254,0.0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
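
For intuition about what the clustering step does with the frequency rows, the sketch below implements plain Lloyd's algorithm in NumPy on made-up data. It is an illustration only, not scikit-learn's implementation (`KMeans` adds refinements such as k-means++ initialisation), and the function name `lloyd_kmeans` is hypothetical.

```python
import numpy as np


def nearest_centre(points, centres):
    """Index of the closest centre (squared Euclidean distance) for each point."""
    dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def lloyd_kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: assign to the nearest centre, then recompute centres."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centre(points, centres)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return nearest_centre(points, centres), centres


# Made-up frequency rows: two park-heavy and two coffee-heavy neighbourhoods.
freq = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.9, 0.1],
])

labels, centres = lloyd_kmeans(freq, k=2)
print(labels)
```

The two park-heavy rows end up with one label and the two coffee-heavy rows with the other; which group receives label 0 depends on the random start.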


In [18]:
# Map the clusters.

cluster_map = folium.Map(location=[toronto_lat, toronto_lng], zoom_start=11)

# One rainbow colour per cluster. The markers below look colours up with
# cluster - 1, so cluster 0 wraps around to the last colour (red).
colors_array = cm.rainbow(np.linspace(0, 1, number_of_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lng, hood, cluster in zip(df_toronto["Latitude"], df_toronto["Longitude"], df_toronto["Neighbourhood"], df_toronto["Cluster Group"]):
    label = folium.Popup(str(hood) + ", cluster " + str(cluster), parse_html=True)
    folium.CircleMarker(
        location=[lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster - 1],
        fill=True,
        fill_color=rainbow[cluster - 1],
        fill_opacity=0.7).add_to(cluster_map)

cluster_map
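
The marker colour is looked up with `cluster - 1`, which relies on Python's negative indexing for cluster 0. The sketch below shows the resulting assignment using a plain list; the colour names are informal, approximate labels for five evenly spaced colours of matplotlib's rainbow colormap and are an assumption of this example.

```python
# Approximate names for five evenly spaced rainbow colours, from the start
# of the scale to the end (informal labels, assumed for illustration).
palette = ["purple", "blue", "turquoise", "orange", "red"]

# Same lookup as the map code: index cluster - 1, so cluster 0 uses index -1.
assigned = {cluster: palette[cluster - 1] for cluster in range(len(palette))}

for cluster, colour in assigned.items():
    print(cluster, colour)
```

Cluster 0 therefore takes the last colour in the list and every other cluster takes the colour one position before its own number.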

In [19]:
# average the venue frequencies within each cluster

cluster_grouped = df_toronto.loc[:, "Cluster Group":]
cluster_grouped = cluster_grouped.groupby("Cluster Group").mean()
cluster_grouped.reset_index(inplace=True)


In [20]:
# Show what venues are the most frequent within each cluster.

for cluster in cluster_grouped["Cluster Group"]:
    print("Cluster ", cluster)
    temp = cluster_grouped[cluster_grouped['Cluster Group'] == cluster].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head())
    print('\n')

Cluster  0
                                      venue  freq
0                                      Park  0.56
1                         Convenience Store  0.14
2  Residential Building (Apartment / Condo)  0.08
3                         Food & Drink Shop  0.08
4                        Photography Studio  0.08


Cluster  1
            venue  freq
0     Coffee Shop  0.07
1     Pizza Place  0.04
2  Sandwich Place  0.03
3        Pharmacy  0.03
4            Café  0.03


Cluster  2
               venue  freq
0             Bakery   1.0
1  Accessories Store   0.0
2      Movie Theater   0.0
3       Neighborhood   0.0
4        Music Venue   0.0


Cluster  3
           venue  freq
0           Park  0.23
1          Trail  0.09
2     Playground  0.07
3  Grocery Store  0.06
4            Gym  0.05


Cluster  4
                       venue  freq
0                        Bar   1.0
1          Accessories Store   0.0
2  Middle Eastern Restaurant   0.0
3         Miscellaneous Shop   0.0
4          Mobile 
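
The per-cluster loop above could also be written more compactly with `nlargest`. This is a sketch on made-up values (they are not the notebook's results), with a frame shaped like `cluster_grouped` after setting `Cluster Group` as the index.

```python
import pandas as pd

# Made-up mean frequencies per cluster; the index plays the role of "Cluster Group".
cluster_means = pd.DataFrame(
    {
        "Park": [0.50, 0.05, 0.20],
        "Coffee Shop": [0.00, 0.30, 0.10],
        "Trail": [0.10, 0.00, 0.40],
    },
    index=pd.Index([0, 1, 2], name="Cluster Group"),
)

# For each cluster, keep the names of the two most frequent categories.
top = {
    cluster: row.nlargest(2).index.tolist()
    for cluster, row in cluster_means.iterrows()
}

for cluster, categories in top.items():
    print(cluster, categories)
```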

### Cluster 0 (red dots on the map) 
- comprises residential neighbourhoods within reach of parks, convenience stores or food and drink shops.

### Cluster 1 (purple dots) 
- is the most common cluster, with typical urban venues such as coffee shops, cafés and pizza places.

### Cluster 2 (blue dot) 
- a single neighbourhood with a bakery nearby.

### Cluster 3 (turquoise dots) 
- more residential neighbourhoods with parks, children's playgrounds, grocery stores etc.

### Cluster 4 (orange dot) 
- a single neighbourhood on the city outskirts... what to do there? Maybe go to a bar?