## Canada Postal Codes and Coordinates

Scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M to obtain the data in the table of postal codes, and transform that data into a pandas dataframe.

* The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
* Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
* More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
* If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
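The three cleaning rules above can be sketched on a tiny, made-up table. The rows in `raw` are hypothetical sample data, not the real Wikipedia table, and the names `raw`, `clean`, and `missing` are chosen only for this illustration. The notebook itself applies the rules row by row while scraping; this is just one possible pandas formulation.

```python
import pandas as pd

# Hypothetical sample rows mimicking the raw table (not the real data).
raw = pd.DataFrame(
    {
        "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
    }
)

# Rule 1: drop rows whose borough is "Not assigned".
clean = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 2: a "Not assigned" neighborhood falls back to the borough name.
missing = clean["Neighborhood"] == "Not assigned"
clean.loc[missing, "Neighborhood"] = clean.loc[missing, "Borough"]

# Rule 3: combine rows that share a postal code, joining neighborhoods with a comma.
clean = (
    clean.groupby(["PostalCode", "Borough"], as_index=False)["Neighborhood"]
    .agg(", ".join)
)
print(clean)
```

The explicit `groupby` step only matters when the source table lists a postal code on several rows; if the table already holds one row per postal code, it leaves the data unchanged.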

In [82]:
import pandas as pd
import numpy as np
# !pip install beautifulsoup4
from bs4 import BeautifulSoup
import requests as rq

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = rq.get(url)
html_doc = response.text

soup = BeautifulSoup(html_doc, 'html.parser')

postal_table = soup.find('table', {'class': 'wikitable sortable'})

columns = [th.text.replace('\n', '') for th in postal_table.find('tr').find_all('th')]
print(columns)
trs = postal_table.find_all('tr')[1:]
rows = list()
for tr in trs:
    postal_row = []
    for td in tr.find_all('td'):
        postal_row.append(td.text.replace('\n', ''))
        
    if (postal_row[1] != 'Not assigned'):
        if (postal_row[2] == 'Not assigned'):
            postal_row[2] = postal_row[1]
        rows.append(postal_row)

postal_df = pd.DataFrame(data=rows, columns=columns)
postal_df.head()

['Postal Code', 'Borough', 'Neighbourhood']


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [83]:
cor_df = pd.read_csv('Geospatial_Coordinates.csv')
cor_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [84]:
# join postal_df and cor_df on their shared 'Postal Code' column
postal_cor_df = pd.merge(postal_df, cor_df, on='Postal Code')
postal_cor_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
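An inner join silently drops any postal code that has no coordinates. As a hedged sketch of how that could be checked, the example below uses two small hypothetical frames (`postal` and `coords`, including the made-up code `M9Z`) together with the `how`, `indicator`, and `validate` parameters of `pd.merge`. It is not part of the original notebook.

```python
import pandas as pd

# Hypothetical frames standing in for postal_df and cor_df.
postal = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M4A", "M9Z"],  # "M9Z" is a made-up code with no coordinates
        "Borough": ["North York", "North York", "Etobicoke"],
    }
)
coords = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M4A"],
        "Latitude": [43.753259, 43.725882],
        "Longitude": [-79.329656, -79.315572],
    }
)

# A left join keeps every postal code; `indicator` adds a "_merge" column
# showing where each row came from, and `validate` raises if a key repeats.
merged = pd.merge(
    postal, coords, on="Postal Code", how="left",
    indicator=True, validate="one_to_one",
)

# Postal codes that found no matching coordinates.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

If `unmatched` is empty for the real data, the default inner join used in the notebook loses no rows.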


Explore and cluster the neighborhoods in Toronto:
1. Add enough Markdown cells to explain what you decided to do and to report any observations you make.
2. Generate maps to visualize your neighborhoods and how they cluster together.

In [86]:
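# keep only boroughs whose name contains 'Toronto'; copy so that columns can be added later without a SettingWithCopyWarning
toronto_df = postal_cor_df[postal_cor_df['Borough'].str.contains('Toronto', regex=False)].copy()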
toronto_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [78]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [87]:
# set number of clusters
kclusters = 5

toronto_df_clustering = toronto_df[['Latitude', 'Longitude']]

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_df_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 4, 0, 0, 3, 0, 1], dtype=int32)

In [88]:
toronto_df.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_df.head()

Unnamed: 0,Cluster Labels,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,0,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,0,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,0,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,4,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [94]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[toronto_df['Latitude'].mean(), toronto_df['Longitude'].mean()], zoom_start=12)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per postal code, colored by its cluster label
for lat, lon, poi, cluster in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [73]:
toronto_df.groupby('Borough').size()

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
West Toronto         6
dtype: int64

Although there are only four boroughs in Toronto, I tried dividing the postal code areas into five clusters with k-means. The East and Central clusters match their assigned boroughs. However, parts of Downtown Toronto and West Toronto are grouped into a new cluster. West Toronto may fail to form a single cluster because its postal code areas are sparsely distributed. The Downtown Toronto cluster also picked up some areas from other boroughs, including West and Central Toronto. This may be caused by the imbalance in the number of postal code areas per borough: Downtown Toronto accounts for about half of them (19 of 39). Unlike the official boroughs, which are assigned by city planning, this report uses only coordinates to cluster the areas. More features should be considered in this analysis.
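One way to make the comparison between assigned boroughs and k-means clusters concrete is a contingency table. The sketch below uses `pd.crosstab` on a small hypothetical frame named `labelled`; the rows are illustrative and are not the notebook's actual cluster assignments. With the real data, the same call could be applied to `toronto_df['Borough']` and `toronto_df['Cluster Labels']`.

```python
import pandas as pd

# Hypothetical borough / cluster pairs, for illustration only.
labelled = pd.DataFrame(
    {
        "Borough": [
            "Downtown Toronto", "Downtown Toronto", "West Toronto",
            "East Toronto", "Central Toronto",
        ],
        "Cluster Labels": [0, 0, 0, 4, 1],
    }
)

# Rows are boroughs, columns are cluster labels, cells count postal code areas.
table = pd.crosstab(labelled["Borough"], labelled["Cluster Labels"])
print(table)
```

A borough whose counts fall in a single column maps cleanly onto one cluster, while a column with counts from several boroughs shows a cluster that mixes them.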