<h1>Segmenting and Clustering Neighbourhoods in Toronto - Week 3 - Part 1</h1>

<h2>Import required libraries</h2>

In [1]:
import pandas as pd
import numpy as np
import requests as req
import csv
from bs4 import BeautifulSoup as bsp

<h2>Get the data from the Wikipedia page for the list of postal codes in Canada</h2>

In [2]:
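# Wikipedia page listing the Canadian postal codes that start with "M" (Toronto)
scrape = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

# read_html returns a list with one DataFrame per HTML table found on the page
df = pd.read_html(scrape)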

<h3>Display the first table on the page</h3>

In [3]:
df[0]

Unnamed: 0,0,1,2
0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
8,M6A,North York,Lawrence Manor
9,M7A,Queen's Park,Not assigned


<h2>Convert the above data into a dataframe and clean it.</h2>

<h3>Build the dataframe and name its columns</h3>

In [4]:
# Take the first table and give its positional columns readable names
df_tronto = pd.DataFrame(df[0])
df_tronto.rename(columns={0: 'PostalCode', 1: 'Borough', 2: 'Neighborhood'}, inplace=True)

df_tronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
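
The table above still carries the page's header as row 0, which is why `Postcode, Borough, Neighbourhood` appears as data. One way to deal with this earlier, sketched here on a small hand-made frame rather than the live page, is to promote the first row to column names and drop it. The helper name `promote_first_row` and the sample rows are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def promote_first_row(raw: pd.DataFrame) -> pd.DataFrame:
    """Use row 0 as the column names and return the remaining rows."""
    out = raw.iloc[1:].copy()
    out.columns = raw.iloc[0].tolist()
    return out.reset_index(drop=True)

# Hypothetical sample mimicking the shape of df[0] above
raw = pd.DataFrame([
    ['Postcode', 'Borough', 'Neighbourhood'],
    ['M1A', 'Not assigned', 'Not assigned'],
    ['M3A', 'North York', 'Parkwoods'],
])

clean = promote_first_row(raw)
print(clean)
```

`pd.read_html` also accepts a `header=0` argument, which may make this step unnecessary depending on how the page's table is parsed.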


<h3>Clean the dataframe</h3>

In [5]:
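# Drop rows with no borough, then the leftover header row (index 0)
df_tc = df_tronto[df_tronto['Borough'] != 'Not assigned'].copy()
df_tc.drop(0, inplace=True)
# Where the neighbourhood is unassigned, reuse the borough from the same row
unassigned = df_tc['Neighborhood'] == 'Not assigned'
df_tc.loc[unassigned, 'Neighborhood'] = df_tc.loc[unassigned, 'Borough']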

df_tc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights
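
Since `head()` only shows the first five rows, it does not confirm that every `Not assigned` value is gone. A small check along the following lines could help. It is a sketch on hypothetical sample rows, and the helper name `count_unassigned` is an assumption introduced for illustration.

```python
import pandas as pd

def count_unassigned(frame: pd.DataFrame) -> pd.Series:
    """Count 'Not assigned' entries in each column."""
    return (frame == 'Not assigned').sum()

# Hypothetical rows in the same shape as the scraped table (header row removed)
before = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Keep rows with a known borough, then fall back to the borough name
# wherever the neighbourhood is still unassigned
after = before[before['Borough'] != 'Not assigned'].copy()
after['Neighborhood'] = after['Neighborhood'].mask(
    after['Neighborhood'] == 'Not assigned', after['Borough']
)

print(count_unassigned(before))
print(count_unassigned(after))
```

Running the same helper on `df_tc` should report zero for every column if the cleaning step worked as intended.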


<h2>Group the dataframe by postal code and combine the neighbourhoods</h2>

In [6]:
# One row per postal code: keep the first borough, join neighbourhood names with ', '
df_tcf = df_tc.groupby('PostalCode').agg({'Borough': 'first', 'Neighborhood': ', '.join}).reset_index()

df_tcf.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
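
To make the effect of the grouping step easier to see, here is a minimal sketch that applies the same `groupby` and `agg` pattern to a few rows taken from the scraped table shown earlier. The names `sample` and `grouped` are placeholders used only for this illustration.

```python
import pandas as pd

# A handful of rows from the scraped table, including repeated postal codes
sample = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M6A', 'M6A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto',
                'North York', 'North York', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park',
                     'Lawrence Heights', 'Lawrence Manor', 'Parkwoods'],
})

# Collapse to one row per postal code, joining neighbourhood names
grouped = (
    sample.groupby('PostalCode')
          .agg({'Borough': 'first', 'Neighborhood': ', '.join})
          .reset_index()
)

print(grouped)

# After grouping, every postal code should appear exactly once
print(grouped['PostalCode'].is_unique)
```

Taking the `first` borough assumes every row of a postal code shares the same borough, which holds for the rows shown here.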


<h2>Download the CSV file containing the geospatial data</h2>

In [7]:
!wget -O GeoCord.csv http://cocl.us/Geospatial_data/

--2019-05-26 17:18:38--  http://cocl.us/Geospatial_data/
Resolving cocl.us (cocl.us)... 169.48.113.201
Connecting to cocl.us (cocl.us)|169.48.113.201|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data/ [following]
--2019-05-26 17:18:38--  https://cocl.us/Geospatial_data/
Connecting to cocl.us (cocl.us)|169.48.113.201|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-05-26 17:18:39--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.27.197
Connecting to ibm.box.com (ibm.box.com)|107.152.27.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-05-26 17:18:39--  https://ibm.box.com/public/static/9afzr83pps4pwf2sm

<h2>Load the CSV file into a dataframe</h2>

In [9]:
df_gc = pd.read_csv('GeoCord.csv')
df_gc.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<h2>Rename the "Postal Code" column to "PostalCode" so the two dataframes share a merge key</h2>

In [10]:
df_gc.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)

df_gc.head(2)

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497


<h2>Merge the two dataframes on PostalCode</h2>

In [13]:
df_mer=pd.merge(df_tcf, df_gc, how='left', on='PostalCode')

df_mer.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
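
A left merge keeps every postal code from the neighbourhood table, so any code missing from the coordinates file ends up with empty `Latitude` and `Longitude` values. The sketch below shows one possible way to surface such rows using the `indicator` and `validate` options of `pd.merge`. The frames are small stand-ins, and `M9Z` with its borough is a made-up entry added only to trigger the unmatched case.

```python
import pandas as pd

neigh = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M9Z'],  # 'M9Z' is hypothetical, with no coordinates
    'Borough': ['Scarborough', 'Scarborough', 'Hypothetical'],
})
coords = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# validate raises an error if a postal code repeats on either side;
# indicator adds a '_merge' column recording where each row came from
merged = pd.merge(neigh, coords, how='left', on='PostalCode',
                  validate='one_to_one', indicator=True)

# Postal codes that found no coordinates in the right-hand table
unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

The same two options could be passed to the `pd.merge` call on `df_tcf` and `df_gc` to confirm that every postal code received coordinates.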
