# Segmenting and Clustering Neighborhoods in Toronto

##### Table of contents:

Part 1 - Create initial table 


Part 2 - Get the latitude and the longitude coordinates of each neighborhood


Part 3 - Generate maps to visualize neighborhoods

## Part 1 - Create initial table

### Web scraping using the BeautifulSoup package

In [None]:
import numpy as np 
import pandas as pd 

In [53]:
from bs4 import BeautifulSoup
import requests

source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'lxml')
table = soup.find(class_='wikitable')
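# Column names come from the <th> cells; data rows come from the <td> cells of each <tr>.
columns = [head.find_all(string=True)[0].strip() for head in table.find_all("th")]
data = [[td.find_all(string=True)[0].strip() for td in tr.find_all("td")] for tr in table.find_all("tr")]
# Drop rows whose cell count does not match the header (e.g. the header row itself, which has no <td>).
data = [row for row in data if len(row) == len(columns)]
# Build the dataframe that the following cells work on.
df = pd.DataFrame(data, columns=columns)
df.head()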

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
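
The row filter in the scraping cell can be illustrated without a network call. The sketch below uses hypothetical, hand-written rows (not scraped data) to show how comparing each row's length to the number of column names drops the empty header row before the dataframe is built; the name `toy` is only for illustration.

```python
import pandas as pd

columns = ["Postal code", "Borough", "Neighborhood"]

# Hypothetical parsed rows: a header <tr> contains no <td> cells,
# so it would come through as an empty list.
rows = [
    [],
    ["M1A", "Not assigned", ""],
    ["M3A", "North York", "Parkwoods"],
]

# Keep only rows with exactly one value per column.
rows = [r for r in rows if len(r) == len(columns)]

toy = pd.DataFrame(rows, columns=columns)
print(toy)
```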


### Data preprocessing

In [54]:
#Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

df2 = df[df.Borough != "Not assigned"].reset_index(drop=True)
df2.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


In [56]:
# More than one neighborhood can exist in one postal code area. If a postal code appears in several rows
# (for example, M5A with the two neighborhoods Harbourfront and Regent Park), those rows are combined
# into one row, with the neighborhoods separated by a comma.

df3 = df2.groupby(['Postal code','Borough'])['Neighborhood'].apply(', '.join).reset_index()
df3.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
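
To see what the `groupby` step does when a postal code really is duplicated, here is a minimal sketch on a toy frame. The three rows are hypothetical and only mirror the M5A example from the comment; `toy` and `combined` are illustrative names.

```python
import pandas as pd

# Hypothetical input: M5A appears twice, once per neighborhood.
toy = pd.DataFrame({
    "Postal code": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["Harbourfront", "Regent Park", "Parkwoods"],
})

# Group by postal code and borough, then join the neighborhood names of each group.
combined = (
    toy.groupby(["Postal code", "Borough"])["Neighborhood"]
       .apply(", ".join)
       .reset_index()
)
print(combined)
```

Since `groupby` sorts the group keys by default, the result comes back ordered by postal code, which is why the output above starts at M1B rather than M3A.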


In [57]:
# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

# Assign through .loc with a boolean mask so the change is written to df3 itself.
not_assigned = df3["Neighborhood"] == "Not assigned"
df3.loc[not_assigned, "Neighborhood"] = df3.loc[not_assigned, "Borough"]
        
df3.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
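
A note on the fill step: writing to the `row` objects yielded by `iterrows()` is not guaranteed to change the dataframe, because each row may be a copy. A minimal sketch of the mask-based alternative on a toy frame (the two rows are hypothetical):

```python
import pandas as pd

# Hypothetical input: one row has a borough but no assigned neighborhood.
toy = pd.DataFrame({
    "Postal code": ["M7A", "M3A"],
    "Borough": ["Queen's Park", "North York"],
    "Neighborhood": ["Not assigned", "Parkwoods"],
})

# Select the rows to fix, then copy the borough into the neighborhood column.
mask = toy["Neighborhood"] == "Not assigned"
toy.loc[mask, "Neighborhood"] = toy.loc[mask, "Borough"]
print(toy)
```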


In [63]:
#Save for task 2
df3.to_csv('Canada_data.csv',index=False)

In [64]:
# Use the .shape attribute to check the number of rows and columns of the dataframe.

df3.shape

(103, 3)

## Part 2 - Get the latitude and the longitude coordinates of each neighborhood

In [65]:
#Load the coordinates from the csv file on Coursera
coordinates = pd.read_csv("Geospatial_Coordinates.csv")
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [74]:
# Rename the column "Postal Code" so that it matches the column name in df3
coordinates.rename(columns={"Postal Code": "Postal code"}, inplace=True)

# Merge the two tables on the column "Postal code"
df_toronto = df3.merge(coordinates, on="Postal code", how="left")
df_toronto.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
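
A left merge keeps every postal code from the left table even if no coordinates are found for it, in which case `Latitude` and `Longitude` are `NaN`. One way to check for such rows is the `indicator=True` option of `merge`, sketched here on toy data. The postal code `M9Z` and the names `left`, `coords`, `merged`, and `unmatched` are made up for illustration.

```python
import pandas as pd

# Hypothetical left table; "M9Z" has no entry in the coordinates table.
left = pd.DataFrame({
    "Postal code": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# indicator=True adds a "_merge" column saying where each row came from.
merged = left.merge(coords, on="Postal code", how="left", indicator=True)

# Rows marked "left_only" found no matching coordinates.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```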


## Part 3 - Generate maps to visualize neighborhoods

In [73]:
from sklearn.cluster import KMeans
import folium # map rendering library

In [76]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=13)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{},{}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto
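
The map above is centered on a hard-coded coordinate pair. As an alternative, the center could be derived from the data by averaging the coordinates. This is only a sketch: the three points are copied from the first rows of the coordinates table, and `pts` and `center` are illustrative names.

```python
import pandas as pd

# A few coordinates taken from the first rows of the table above.
pts = pd.DataFrame({
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

# The mean latitude and longitude give a rough center for the points.
center = [pts["Latitude"].mean(), pts["Longitude"].mean()]
print(center)
```

With the full dataframe, the same two means computed on `df_toronto` could be passed as `location=center` to `folium.Map`.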