<h1>Segmenting and Clustering Neighborhoods in Toronto</h1>
<hr>
<h3><i>Importing the data using the pandas library</i></h3>

In [1]:
import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df_list = pd.read_html(url)

<b><i>We only need the first table on the HTML page</i></b>

In [2]:
req_df = df_list[0]

In [3]:
type(req_df)

pandas.core.frame.DataFrame
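Indexing with `df_list[0]` assumes the postal-code table stays the first table on the page. A slightly more defensive sketch picks the first table that has a `Postal Code` column instead. `pick_table` is a hypothetical helper, and the two small DataFrames are made-up stand-ins for what `pd.read_html` returns. `pd.read_html` also accepts a `match` argument that keeps only tables containing matching text.

```python
import pandas as pd


def pick_table(tables, required_column):
    """Return the first DataFrame in `tables` that has `required_column`."""
    for table in tables:
        if required_column in table.columns:
            return table
    raise ValueError(f"no table has a {required_column!r} column")


# Hypothetical stand-ins for the list returned by pd.read_html
tables = [
    pd.DataFrame({"Navigation": ["A", "B"]}),
    pd.DataFrame({
        "Postal Code": ["M3A"],
        "Borough": ["North York"],
        "Neighbourhood": ["Parkwoods"],
    }),
]

req_df = pick_table(tables, "Postal Code")
print(req_df.shape)  # (1, 3)
```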

<b><i>Checking for null values in the DataFrame</i></b>

In [4]:
req_df.isnull().sum()

Postal Code      0
Borough          0
Neighbourhood    0
dtype: int64
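The zero counts do not mean the table is complete: unassigned entries are stored as the string `Not assigned`, which `isnull()` does not treat as missing. A small sketch on made-up rows shows the difference and counts the placeholder explicitly:

```python
import pandas as pd

# Small made-up sample in the same shape as the Wikipedia table
df = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A"],
    "Borough": ["Not assigned", "Not assigned", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods"],
})

# isnull() only recognises real missing values (NaN/None), so every count is 0
print(df.isnull().sum())

# The placeholder string has to be counted explicitly, column by column
print(df.eq("Not assigned").sum())
```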

In [5]:
req_df.head()

,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [6]:
import numpy as np

<b><i>Converting "Not assigned" boroughs to NaN so that pandas treats them as missing values</i></b>

In [7]:
req_df['Borough'] = req_df['Borough'].replace('Not assigned', np.nan)

In [8]:
req_df.head()

,Postal Code,Borough,Neighbourhood
0,M1A,,Not assigned
1,M2A,,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
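Calling `replace(..., inplace=True)` on a selected column such as `req_df['Borough']` is a chained operation; under pandas Copy-on-Write it may update a temporary copy rather than the DataFrame itself. Assigning the result back avoids that ambiguity. A minimal sketch on made-up rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Postal Code": ["M1A", "M3A"],
    "Borough": ["Not assigned", "North York"],
})

# Assign the replaced column back instead of modifying it in place
df["Borough"] = df["Borough"].replace("Not assigned", np.nan)

# The placeholder is now a real missing value that isna() can see
print(df["Borough"].isna().sum())  # 1
```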


<b><i>Dropping all rows that contain null values</i></b>

In [9]:
req_df.dropna(inplace=True)

In [10]:
req_df.head()

,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
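`dropna()` with no arguments drops a row if any column is NaN; since only `Borough` was converted, the result matches `dropna(subset=['Borough'])`. The NaN round trip can also be skipped entirely by filtering with a boolean mask, as in this sketch on made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M5A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Regent Park, Harbourfront"],
})

# Keep only rows whose borough has been assigned
assigned = df[df["Borough"] != "Not assigned"]
print(assigned["Postal Code"].tolist())  # ['M3A', 'M5A']
```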


<b><i>After the deletion the index is no longer a contiguous 0, 1, 2, … sequence, so we rebuild it</i></b>

In [11]:
req_df_1 = pd.DataFrame(req_df.values)

In [12]:
req_df_1.head()

,0,1,2
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
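Rebuilding the frame from `.values` renumbers the rows but also discards the column names, which is why a rename step is needed next. `reset_index(drop=True)` renumbers the index while keeping the column names; a sketch on made-up rows:

```python
import pandas as pd

df = pd.DataFrame(
    {"Postal Code": ["M3A", "M4A"], "Borough": ["North York", "North York"]},
    index=[2, 3],  # gaps left behind by dropped rows
)

# drop=True discards the old labels instead of inserting them as a column
clean = df.reset_index(drop=True)
print(clean.index.tolist())    # [0, 1]
print(clean.columns.tolist())  # ['Postal Code', 'Borough']
```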


<b><i>Building a new DataFrame from the raw values replaced the column names with 0, 1, 2, so we rename the columns</i></b>

In [13]:
req_df_1.rename(columns={0: 'Postal Code', 1: 'Borough', 2: 'Neighbourhood'}, inplace=True)

In [14]:
req_df_1.head(10)

,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


<b><i>Printing the shape of the cleaned DataFrame</i></b>

In [15]:
req_df_1.shape

(103, 3)
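Because coordinates are joined by postal code in the next step, it is worth confirming that each postal code appears only once; a repeated key would duplicate rows in the merge. The sketch below uses made-up rows with a repeated code, checks `is_unique`, and shows one way to collapse duplicates by joining the neighbourhood names.

```python
import pandas as pd

# Made-up rows in which one postal code appears twice
df = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

print(df["Postal Code"].is_unique)  # False

# One row per postal code, with neighbourhood names joined by ", "
combined = (
    df.groupby(["Postal Code", "Borough"], as_index=False)["Neighbourhood"]
      .agg(", ".join)
)
print(combined)
```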

In [16]:
lat_long = pd.read_csv('Geospatial_Coordinates.csv')

In [17]:
lat_long.head()

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<b><i>Merging both DataFrames on the Postal Code column</i></b>

In [18]:
req_df_2 = pd.merge(req_df_1, lat_long, on='Postal Code')

In [19]:
req_df_2.head(10)

,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
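`pd.merge` defaults to an inner join, so any postal code without coordinates would be dropped silently. A sketch of a more explicit version: `validate="one_to_one"` raises a `MergeError` if a postal code repeats on either side, and `how="left"` with `indicator=True` keeps unmatched codes and flags them in a `_merge` column. The rows are made up; `M9Z` is a hypothetical code with no coordinates.

```python
import pandas as pd

codes = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],  # M9Z is hypothetical
    "Borough": ["North York", "North York", "Etobicoke"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# Keep every postal code, check key uniqueness, and record where each row came from
merged = pd.merge(codes, coords, on="Postal Code", how="left",
                  validate="one_to_one", indicator=True)

# Postal codes that found no coordinates
missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"]
print(missing.tolist())  # ['M9Z']
```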


<b><i>Visualizing the map of Toronto with a marker for each postal code whose borough name contains "Toronto"</i></b>

In [53]:
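import folium

# Centre the map on central Toronto
m = folium.Map(
    location=[43.6634, -79.3832],
    zoom_start=12
)

# Keep only postal codes whose borough name contains 'Toronto' (e.g. Downtown Toronto)
toronto_df = req_df_2[req_df_2['Borough'].str.contains('Toronto', regex=False)]

# Add one marker per selected postal code
for lat, lng in zip(toronto_df['Latitude'], toronto_df['Longitude']):
    folium.Marker([lat, lng]).add_to(m)

m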
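The map is centred on a hard-coded point. One alternative, sketched here with pandas only, is to centre it on the mean latitude and longitude of the postal codes being plotted; the resulting list could then be passed as the `location` argument of `folium.Map`. The three rows are a small sample taken from the merged table above.

```python
import pandas as pd

# A few rows in the shape of req_df_2 (values copied from the merge output)
df = pd.DataFrame({
    "Borough": ["Downtown Toronto", "North York", "Downtown Toronto"],
    "Latitude": [43.65426, 43.753259, 43.657162],
    "Longitude": [-79.360636, -79.329656, -79.378937],
})

# Same filter as the map cell: boroughs whose name contains 'Toronto'
toronto = df[df["Borough"].str.contains("Toronto", regex=False)]

# Mean position of the selected postal codes, usable as a map centre
centre = [toronto["Latitude"].mean(), toronto["Longitude"].mean()]
print(len(toronto), centre)
```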