# Toronto Neighbourhoods 

**NOTE:** To see the rendered maps, open [the notebook on nbviewer](https://nbviewer.jupyter.org/github/MarketaM/Coursera_Capstone/blob/main/notebooks/Toronto_neighborhoods.ipynb).

## Part 1: 
Scrape the table of Toronto postal codes from Wikipedia using pandas, then drop the rows with missing ("Not assigned") values and clean the table.

In [1]:
import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
df_list = pd.read_html(url)
len(df_list)

3

In [2]:
# The postal code table is the first element of the list; save it as df_toronto.
df_toronto = df_list[0]
df_toronto.head()

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
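
Taking `df_list[0]` assumes the postal code table is always the first table on the page. A minimal sketch of a more defensive selection, assuming the column names shown above; `find_table_with_columns` is a hypothetical helper written for illustration, not part of pandas, and the small frames stand in for the list returned by `pd.read_html(url)`:

```python
import pandas as pd


def find_table_with_columns(tables, required_columns):
    """Return the first DataFrame whose columns include every required name.

    Hypothetical helper for illustration; raises if no table matches.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"no table contains all of the columns {sorted(required)}")


# Stand-in for the list returned by pd.read_html(url): two small frames,
# only the second of which looks like the postal code table.
tables = [
    pd.DataFrame({"Key": ["a"], "Value": ["b"]}),
    pd.DataFrame(
        {
            "Postal Code": ["M1A", "M3A"],
            "Borough": ["Not assigned", "North York"],
            "Neighbourhood": ["Not assigned", "Parkwoods"],
        }
    ),
]

df_example = find_table_with_columns(tables, ["Postal Code", "Borough", "Neighbourhood"])
print(df_example.columns.tolist())
```

`pd.read_html` also accepts a `match` argument that keeps only tables containing the given text, which may be enough on its own for this page.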


In [3]:
# Drop rows whose borough is "Not assigned", then renumber the index from 0.
df_toronto = df_toronto[df_toronto["Borough"] != "Not assigned"].reset_index(drop=True)
df_toronto.head()

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [4]:
import numpy as np

# Check that each postal code appears on exactly one row, i.e. that all of its
# neighbourhoods are listed together in that row.
print(df_toronto.shape)
# value_counts must be called; without parentheses only the bound method is printed.
print(df_toronto["Postal Code"].value_counts())

(103, 3)
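
To confirm that each postal code occupies exactly one row, the counts themselves can be inspected. A minimal sketch on a small sample frame; the rows are copied from the cleaned table shown earlier, and `df_sample` is an illustrative name rather than a variable from the notebook:

```python
import pandas as pd

# Sample rows copied from the cleaned table shown earlier.
df_sample = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M4A", "M5A"],
        "Borough": ["North York", "North York", "Downtown Toronto"],
        "Neighbourhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
    }
)

# value_counts() must be called (note the parentheses) to get the counts.
counts = df_sample["Postal Code"].value_counts()

# Every postal code should appear exactly once.
print(counts.max() == 1)
print(df_sample["Postal Code"].is_unique)

# List any postal codes that occur on more than one row (none in this sample).
duplicates = df_sample.loc[df_sample["Postal Code"].duplicated(keep=False), "Postal Code"]
print(duplicates.tolist())
```

The same three checks can be run on `df_toronto` directly; `is_unique` is the shortest of them, while the `duplicated` filter shows which codes would need attention.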


In [5]:
# Confirm that no remaining row has a "Not assigned" neighbourhood (the result should be empty).
df_toronto.loc[df_toronto["Neighbourhood"] == "Not assigned"]

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|


In [6]:
df_toronto.shape

(103, 3)

## Part 2:
Get latitude and longitude coordinates for each postal code.

In [7]:
import pgeocode

# Query pgeocode's GeoNames-based dataset for every postal code at once.
nomi = pgeocode.Nominatim("ca")
postal_codes = df_toronto["Postal Code"].to_list()
df_location = nomi.query_postal_code(postal_codes)  # returns a DataFrame for a list input
df_location = df_location.rename(columns={"postal_code": "Postal Code"})

# Left-merge so that every row of df_toronto is kept, then capitalise the new column names.
df_toronto = df_toronto.merge(
    df_location[["latitude", "longitude", "Postal Code"]], on="Postal Code", how="left"
)
df_toronto = df_toronto.rename(columns={"latitude": "Latitude", "longitude": "Longitude"})

In [8]:
# Drop rows with null values, such as postal codes for which no coordinates were returned.
df_toronto.dropna(axis=0, inplace=True)
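
Before dropping rows, it can help to see which postal codes failed to resolve. A sketch under stated assumptions: `df_lookup` stands in for the pgeocode result, its coordinates are placeholder values rather than real lookup output, and `validate="one_to_one"` is an extra safety check that the notebook itself does not use:

```python
import numpy as np
import pandas as pd

df_codes = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M7Y"]})

# Stand-in for the pgeocode result; the numbers are placeholders, not real lookups.
df_lookup = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M4A", "M7Y"],
        "latitude": [43.75, 43.73, np.nan],
        "longitude": [-79.33, -79.31, np.nan],
    }
)

# validate="one_to_one" raises a MergeError if either side repeats a postal code.
merged = df_codes.merge(df_lookup, on="Postal Code", how="left", validate="one_to_one")

# Record the codes without coordinates before removing them.
missing = merged.loc[merged["latitude"].isna(), "Postal Code"].tolist()
print(f"{len(missing)} postal code(s) without coordinates: {missing}")

# Drop only rows lacking coordinates, leaving nulls in other columns untouched.
merged = merged.dropna(subset=["latitude", "longitude"]).reset_index(drop=True)
print(merged.shape)
```

Restricting `dropna` to the coordinate columns keeps a row from being discarded because of an unrelated null elsewhere in the table.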

## Part 3:
Explore and cluster Toronto neighbourhoods. Then visualise them with maps.

In [9]:
# Get Toronto's coordinates with geopy's Nominatim geocoder.

from geopy.geocoders import Nominatim

address = "Toronto, Canada"

geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
toronto_lat = location.latitude
toronto_lng = location.longitude
print(toronto_lat, toronto_lng)

43.6534817 -79.3839347
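
`geocode` returns `None` when an address cannot be resolved, in which case `location.latitude` raises an `AttributeError`. A minimal guard, assuming a hypothetical helper `resolve_coordinates` that accepts any geocoding callable so that it can be exercised without network access; `fake_geocode` is an offline stub, not part of geopy:

```python
from types import SimpleNamespace


def resolve_coordinates(geocode, address):
    """Return (latitude, longitude) for address, or raise if nothing is found.

    `geocode` is any callable that returns an object with `latitude` and
    `longitude` attributes, or None when the address cannot be resolved.
    """
    location = geocode(address)
    if location is None:
        raise LookupError(f"could not geocode {address!r}")
    return location.latitude, location.longitude


# Offline stub standing in for geolocator.geocode; it returns the
# coordinates printed above for the one address it knows.
def fake_geocode(address):
    if address == "Toronto, Canada":
        return SimpleNamespace(latitude=43.6534817, longitude=-79.3839347)
    return None


toronto_lat, toronto_lng = resolve_coordinates(fake_geocode, "Toronto, Canada")
print(toronto_lat, toronto_lng)
```

With geopy, the call would be `resolve_coordinates(geolocator.geocode, address)`.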


In [10]:
# Create a map of Toronto with a marker for each neighbourhood.

import folium 

toronto_map = folium.Map(location=[toronto_lat, toronto_lng], zoom_start=11)

for lat, lng, label in zip(df_toronto["Latitude"], df_toronto["Longitude"], df_toronto["Neighbourhood"]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        location=[lat, lng],
        radius=5,
        popup=label,
        color="cadetblue",
        fill=True,
        fill_color="lightblue",
        fill_opacity=0.7).add_to(toronto_map)
    
toronto_map
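
A fixed `zoom_start` may cut off outlying markers. One option is to compute the bounding box of the marker coordinates and pass it to the map's `fit_bounds` method. A sketch, assuming a hypothetical helper `marker_bounds` and illustrative coordinates that are not taken from the notebook's data:

```python
import pandas as pd


def marker_bounds(df, lat_col="Latitude", lng_col="Longitude"):
    """Return [[south, west], [north, east]] covering every row of df."""
    south_west = [float(df[lat_col].min()), float(df[lng_col].min())]
    north_east = [float(df[lat_col].max()), float(df[lng_col].max())]
    return [south_west, north_east]


# Illustrative coordinates only; in the notebook, df_toronto would be passed.
df_points = pd.DataFrame(
    {
        "Latitude": [43.65, 43.75, 43.70],
        "Longitude": [-79.38, -79.33, -79.50],
    }
)

bounds = marker_bounds(df_points)
print(bounds)

# With folium available, the map could then be fitted to the markers:
# toronto_map.fit_bounds(bounds)
```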