# Segmenting and Clustering Neighborhoods in Toronto

The objective of this assignment is to explore, segment, and cluster the neighborhoods in the city of Toronto.

### Part 1: Scrape the Wikipedia page for Toronto neighborhood data, clean it, and display the top 10 rows

Import the required packages (pandas, BeautifulSoup, requests) and transform the HTML table on the web page into a dataframe.

In [1]:
# Import the required packages

from bs4 import BeautifulSoup
import requests
import pandas as pd

In [2]:
# Download the page HTML from the internet

url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
source = requests.get(url).text
CanadaPostaldata = BeautifulSoup(source, 'lxml')
# print(CanadaPostaldata)

In [3]:
# Create a new DataFrame

column_names = ['PostalCode','Borough','Neighborhood']
toronto = pd.DataFrame(columns = column_names)


In [4]:
#Scrape the Toronto postal codes from the HTML table

content = CanadaPostaldata.find('div', class_ = 'mw-parser-output')
table = content.table.tbody
rows = []
for tr in table.find_all('tr'):
    cells = [td.text.strip() for td in tr.find_all('td')]
    # Skip the header row (it has no <td> cells) and any incomplete rows
    if len(cells) >= 3:
        rows.append({'PostalCode': cells[0], 'Borough': cells[1], 'Neighborhood': cells[2]})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
toronto = pd.DataFrame(rows, columns = column_names)

# Drop postal codes that have no assigned borough
toronto = toronto[toronto.Borough != 'Not assigned']
toronto.reset_index(drop = True, inplace = True)

# Where the neighborhood is not assigned, fall back to the borough name.
# A .loc mask is used because chained indexing (iloc[i][2] = ...) may write to a copy.
unassigned = toronto.Neighborhood == 'Not assigned'
toronto.loc[unassigned, 'Neighborhood'] = toronto.loc[unassigned, 'Borough']

#Format the Neighborhood column
df = toronto.groupby(['PostalCode','Borough'])['Neighborhood'].apply(', '.join).reset_index()

# Drop NA & remove Not assigned values
df = df.dropna()
empty = 'Not assigned'
df = df[(df.PostalCode != empty ) & (df.Borough != empty) & (df.Neighborhood != empty)]

indexName = df[ df['PostalCode'] == '' ].index
df.drop(indexName , inplace=True)
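
The cleaning rules in this cell can be checked on a small hand-made table without touching the network. The sketch below is illustrative only: the `sample` rows and the helper name `clean_postal_table` are assumptions made for this example and are not part of the scraped data.

```python
import pandas as pd

def clean_postal_table(raw):
    """Apply the cleaning rules to a PostalCode/Borough/Neighborhood frame."""
    # Drop rows whose borough is not assigned
    out = raw[raw['Borough'] != 'Not assigned'].copy()
    # Fall back to the borough name when the neighborhood is not assigned
    missing = out['Neighborhood'] == 'Not assigned'
    out.loc[missing, 'Neighborhood'] = out.loc[missing, 'Borough']
    # Combine neighborhoods that share a postal code into one comma-separated string
    return (out.groupby(['PostalCode', 'Borough'])['Neighborhood']
               .apply(', '.join)
               .reset_index())

# Hypothetical sample rows, invented for illustration
sample = pd.DataFrame({
    'PostalCode':   ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough':      ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

cleaned = clean_postal_table(sample)
print(cleaned)
```

In this sample, the `M1A` row is dropped, the two `M5A` rows collapse into one, and `M7A` takes its borough name as its neighborhood.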

In [14]:
# Group Neighborhood together
def group_neighborhood_list(grouped):    
    return ', '.join(sorted(grouped['Neighborhood'].tolist()))
                    
grouped_toronto = df.groupby(['PostalCode', 'Borough'])
df_totonto = grouped_toronto.apply(group_neighborhood_list).reset_index(name='Neighborhood')
df_totonto.columns = ['Postal Code','Borough','Neighborhood']
df_totonto.head(10)

,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [15]:
# Print the shape for the final dataframe
print('The shape of dataframe is',df_totonto.shape)

The shape of dataframe is (103, 3)
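
Besides the shape, it can be worth confirming that each postal code appears only once after grouping. A minimal sketch follows, assuming a frame with a `Postal Code` column; the helper name `duplicated_codes` and the two small frames are hypothetical and exist only for this example.

```python
import pandas as pd

def duplicated_codes(frame, key='Postal Code'):
    """Return the postal codes that appear in more than one row."""
    mask = frame[key].duplicated(keep=False)
    return sorted(frame.loc[mask, key].unique().tolist())

# Hypothetical frames: one already grouped, one with a repeated code
grouped_ok = pd.DataFrame({'Postal Code': ['M1B', 'M1C', 'M1E']})
not_grouped = pd.DataFrame({'Postal Code': ['M1B', 'M1B', 'M1C']})

print(duplicated_codes(grouped_ok))
print(duplicated_codes(not_grouped))
```

An empty list means the grouping left exactly one row per postal code.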


### Part 2: Get the latitude & longitude of the Toronto postal codes and link them with the earlier data set

Download the geospatial CSV file, read it into a dataframe indexed by postal code, and merge it with the neighborhood dataframe from Part 1.

In [16]:
# URL of the geographical coordinates file
geoCordinates_toronto = 'https://cocl.us/Geospatial_data'

# Download the file ($name makes IPython substitute the Python variable)
!wget -q -O 'toronto_m.geospatial_data.csv' $geoCordinates_toronto

# Read the downloaded CSV and index it by postal code
df_geo_toronto = pd.read_csv('toronto_m.geospatial_data.csv').set_index("Postal Code")
df_geo_toronto.head()

Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476
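
Setting `Postal Code` as the index makes coordinate lookups by code direct. The sketch below reproduces that step offline by inlining the first two rows from the output above as text; the variable names `csv_text` and `geo` are assumptions for this example.

```python
import io
import pandas as pd

# First rows of the geospatial file as shown in the output above, inlined as text
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
"""

# read_csv accepts any file-like object, so StringIO stands in for the downloaded file
geo = pd.read_csv(io.StringIO(csv_text)).set_index('Postal Code')

# With the postal code as the index, .loc looks up coordinates by code
lat, lon = geo.loc['M1B', ['Latitude', 'Longitude']]
print(lat, lon)
```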


In [19]:
# Merge the two data sets on postal code
df = pd.merge(df_totonto, df_geo_toronto, on='Postal Code')
df.head()


,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
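
`pd.merge` performs an inner join by default, so a postal code missing from either table would be dropped without warning. One way to check for that is a left join with `indicator=True`. This is a sketch on hypothetical miniature tables: the frames `neighborhoods` and `coordinates` and the code `M9Z` are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables; 'M9Z' is an invented code with no coordinates
neighborhoods = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Example Borough'],
})
coordinates = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# how='left' keeps every neighborhood row; indicator=True adds a '_merge' column
checked = pd.merge(neighborhoods, coordinates, on='Postal Code', how='left', indicator=True)

# Rows marked 'left_only' found no match in the coordinates table
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

If the list of unmatched codes is empty, the inner join loses no rows from the neighborhood table.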
