# Clustering Neighborhoods in Toronto 
### Part 1 - Wiki Scrape

Install the BeautifulSoup module

In [21]:
!pip install beautifulsoup4



Import necessary libraries

In [33]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

I will use the BeautifulSoup module with Python's html.parser to scrape the wiki page and read the table into a pandas DataFrame.  
Then I will drop the rows whose Borough is "Not assigned" and reset the index.  
Finally, I will check for any Neighbourhood that is "Not assigned" and replace it with the Borough value.

In [82]:
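r = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
r.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(r.text, "html.parser")
# Take the first 'wikitable' on the page and let pandas parse it
table = soup.find_all('table', class_='wikitable')[0]
df = pd.read_html(str(table))[0]
# Drop rows without an assigned borough
df = df[df["Borough"] != "Not assigned"].reset_index(drop=True)
# Where the neighbourhood is not assigned, fall back to the borough name
df["Neighbourhood"] = df["Neighbourhood"].where(df["Neighbourhood"] != "Not assigned", df["Borough"])
df.head()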

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [83]:
df.shape

(103, 3)
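
The two cleaning steps can also be checked offline on a small hand-made table. This is only a sketch: the postal codes and borough names below are invented placeholders, not rows from the wiki page.

```python
import pandas as pd

# Invented sample rows (placeholders, not real Toronto data)
toy = pd.DataFrame({
    "Postal Code": ["X1A", "X2A", "X3A", "X4A"],
    "Borough": ["Not assigned", "Borough A", "Borough B", "Borough C"],
    "Neighbourhood": ["Not assigned", "Area 1", "Area 2, Area 3", "Not assigned"],
})

# Step 1: drop rows whose borough is not assigned, then reset the index
cleaned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)

# Step 2: where the neighbourhood is not assigned, use the borough name instead
cleaned["Neighbourhood"] = cleaned["Neighbourhood"].where(
    cleaned["Neighbourhood"] != "Not assigned", cleaned["Borough"]
)

print(cleaned)
```

Here `Series.where` keeps a value when the condition is true and takes the replacement otherwise, so only the "Not assigned" neighbourhoods are overwritten.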

### Part 2 - Incorporate latitude/longitude

Geocoder does not yet seem to be compatible with Python 3.7, so I will use the CSV file instead (I have omitted the error output because it was too long).

In [89]:
!pip install wget
import wget
wget.download('http://cocl.us/Geospatial_data', 'coords.csv')

Collecting wget
  Downloading wget-3.2.zip (10 kB)
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9681 sha256=d8efaa99483bbcf3c8eeace2155899f824e97c84237846277a80bf6724c54fb0
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/a1/b6/7c/0e63e34eb06634181c63adacca38b79ff8f35c37e3c13e3c02
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2


'coords.csv'

Read the coordinates file into a pandas DataFrame

In [90]:
coords = pd.read_csv('coords.csv')
coords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
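
Since `pd.read_csv` accepts a local path, a URL, or a file-like object, the loading step could be wrapped in a small helper that also checks the expected columns. This is a sketch under assumptions: `load_coords` is a hypothetical name, and the column names are taken from the output above. The demo reads from an in-memory string so it runs without a network connection.

```python
from io import StringIO

import pandas as pd

# Column names as they appear in the coords.head() output above
EXPECTED_COLUMNS = {"Postal Code", "Latitude", "Longitude"}


def load_coords(source):
    """Hypothetical helper: read coordinates from a path, URL, or file-like object."""
    coords = pd.read_csv(source)
    missing = EXPECTED_COLUMNS - set(coords.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return coords


# In-memory sample using the first two rows shown above
sample = StringIO(
    "Postal Code,Latitude,Longitude\n"
    "M1B,43.806686,-79.194353\n"
    "M1C,43.784535,-79.160497\n"
)
coords_sample = load_coords(sample)
print(coords_sample)
```

With this helper, `load_coords('coords.csv')` would read the downloaded file in the same way.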


Merge both DataFrames on the postal code

In [93]:
dfcoords = pd.merge(df, coords, on='Postal Code', how='left')
dfcoords.head()


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
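
A left merge keeps every row of the left table, so a postal code with no match in the coordinates file would silently end up with empty latitude and longitude. The sketch below shows one way to detect that, using the `validate` and `indicator` options of `pd.merge`. The rows are invented placeholders, not data from this notebook.

```python
import pandas as pd

# Invented placeholder rows; "X3A" deliberately has no coordinates
left = pd.DataFrame({
    "Postal Code": ["X1A", "X2A", "X3A"],
    "Borough": ["Borough A", "Borough B", "Borough C"],
})
right = pd.DataFrame({
    "Postal Code": ["X1A", "X2A"],
    "Latitude": [43.1, 43.2],
    "Longitude": [-79.1, -79.2],
})

merged = pd.merge(
    left,
    right,
    on="Postal Code",
    how="left",
    validate="one_to_one",  # raise if a postal code is duplicated on either side
    indicator=True,         # add a "_merge" column describing where each row matched
)

# Rows marked "left_only" found no coordinates
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

Applied to the real tables, an empty `unmatched` list would mean every postal code received coordinates.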
