<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto: Final Part</font></h1>

## Final Statement

Explore and cluster the neighborhoods in Toronto. You may restrict the analysis to boroughs whose names contain the word "Toronto" and then replicate the analysis we applied to the New York City data. The choice is yours.

Just make sure to:

- add enough Markdown cells to explain what you decided to do and to report any observations you make;
- generate maps to visualize your neighborhoods and how they cluster together.

In [63]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

###### Scrape the List of postal codes of Canada

In [46]:
postal_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
postal_data = requests.get(postal_url).text

In [47]:
# The Wikipedia page is HTML, so parse it with an HTML parser rather than the XML one
soup_fetch = BeautifulSoup(postal_data, 'html.parser')

In [48]:
table=soup_fetch.find('table')

In [49]:
# The dataframe will consist of three columns: Postalcode, Borough, and Neighbourhood
column_names = ['Postalcode','Borough','Neighbourhood']
postal_df = pd.DataFrame(columns = column_names)

In [50]:
# Collect the postal code, borough, and neighbourhood from every table row that has exactly three cells
for tr_cell in table.find_all('tr'):
    row_data=[]
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)==3:
        postal_df.loc[len(postal_df)] = row_data

In [51]:
postal_df.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


###### Data Processing

In [53]:
postal_df['Borough'].value_counts()

Not assigned        77
North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
East Toronto         5
York                 5
East York            5
Mississauga          1
Name: Borough, dtype: int64

In [54]:
# Dropping the rows where Borough is 'Not assigned'
postal_df_NA = postal_df[postal_df.Borough != 'Not assigned']

# Combining the neighbourhoods with same Postalcode
postal_final = postal_df_NA.groupby(['Postalcode','Borough'], sort=False).agg(', '.join)
postal_final.reset_index(inplace=True)

# Replacing the name of the neighbourhoods which are 'Not assigned' with names of Borough
postal_final['Neighbourhood'] = np.where(postal_final['Neighbourhood'] == 'Not assigned',postal_final['Borough'], postal_final['Neighbourhood'])

postal_final.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [55]:
postal_final.shape

(103, 3)
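
To make the three clean-up steps above easier to follow, here is a minimal sketch that applies them to a tiny hand-made frame. The frame `toy` and its rows are invented for illustration (they are not taken from the Wikipedia table), and the borough name in the last row is a placeholder chosen only to show the fallback.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, invented for illustration only.
toy = pd.DataFrame(
    {
        "Postalcode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighbourhood": ["Not assigned", "Regent Park", "Harbourfront", "Not assigned"],
    }
)

# 1. Drop rows whose borough is 'Not assigned'.
toy = toy[toy["Borough"] != "Not assigned"]

# 2. Join the neighbourhoods that share a postal code and borough.
toy = (
    toy.groupby(["Postalcode", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)

# 3. Where the neighbourhood is still 'Not assigned', fall back to the borough name.
toy["Neighbourhood"] = np.where(
    toy["Neighbourhood"] == "Not assigned", toy["Borough"], toy["Neighbourhood"]
)

print(toy)
```

The two `M5A` rows collapse into one row with a comma-separated neighbourhood, and the `M7A` row takes its borough name as its neighbourhood.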

## Combining GeoSpatial Data with our dataframe

In [57]:
geo_data=pd.read_csv('http://cocl.us/Geospatial_data')

In [59]:
geo_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [60]:
geo_data.rename(columns={'Postal Code':'Postalcode'},inplace=True)
geo_merged = pd.merge(geo_data, postal_final, on='Postalcode')

In [61]:
geo_merged=geo_merged[['Postalcode','Borough','Neighbourhood','Latitude','Longitude']]

In [62]:
geo_merged.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
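
`pd.merge` performs an inner join by default, so any postal code present in only one of the two frames is dropped without a message. The sketch below shows one possible way to check for such rows, using an outer join with `indicator=True`. The frames `geo` and `postal` are small hypothetical stand-ins for `geo_data` and `postal_final`, and the code `M9Z` with its coordinates is made up.

```python
import pandas as pd

# Hypothetical stand-ins for geo_data and postal_final (values for M9Z are invented).
geo = pd.DataFrame(
    {
        "Postalcode": ["M1B", "M1C", "M9Z"],
        "Latitude": [43.806686, 43.784535, 43.0],
        "Longitude": [-79.194353, -79.160497, -79.0],
    }
)
postal = pd.DataFrame(
    {
        "Postalcode": ["M1B", "M1C", "M4E"],
        "Borough": ["Scarborough", "Scarborough", "East Toronto"],
    }
)

# An outer join keeps every row; the _merge column records where each row came from.
check = pd.merge(geo, postal, on="Postalcode", how="outer", indicator=True)

# Rows that an inner join would have dropped.
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Postalcode", "_merge"]])
```

If `unmatched` is empty on the real data, the inner join used in the notebook loses nothing.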


In [68]:
!pip install folium
import folium # plotting library
from sklearn.cluster import KMeans

Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


## Explore and cluster the neighborhoods in Toronto

In [69]:
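# Keep only the boroughs whose name contains 'Toronto'; copy so that later column inserts do not act on a view of geo_merged
toronto_data = geo_merged[geo_merged['Borough'].str.contains('Toronto',regex=False)].copy()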
toronto_data.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
42,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
43,M4M,East Toronto,Studio District,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [72]:
# Base map centred on Toronto
toronto_map = folium.Map(location=[43.651070,-79.347015],zoom_start=10)

# Add one circle marker per neighbourhood, with a 'neighbourhood, borough' popup
for lat,lng,borough,neighbourhood in zip(toronto_data['Latitude'],toronto_data['Longitude'],toronto_data['Borough'],toronto_data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.6).add_to(toronto_map)
toronto_map

In [74]:
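# Cluster the neighbourhoods on their coordinates only (Latitude and Longitude)
k=6
toronto_clusters = toronto_data.drop(['Postalcode','Borough','Neighbourhood'],axis=1)
kmeans = KMeans(n_clusters = k,random_state=0).fit(toronto_clusters)
# Store the cluster label of each neighbourhood as the first column
toronto_data.insert(0, 'Clusters', kmeans.labels_)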
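
The clustering above measures Euclidean distance directly on degrees of latitude and longitude. Away from the equator, one degree of longitude covers less ground than one degree of latitude; at Toronto's latitude the ratio is roughly cos(43.65°) ≈ 0.72. Over a single city the effect is modest, but it can be removed by converting the coordinates to approximate local kilometres before clustering. The sketch below uses the equirectangular approximation. The helper `to_local_km` is a hypothetical name, the reference point is the map centre used in this notebook, and 111.32 km per degree is an approximate constant.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude, in km


def to_local_km(lat, lon, ref_lat=43.651070, ref_lon=-79.347015):
    """Return approximate (east_km, north_km) offsets from the reference point.

    Uses the equirectangular approximation, which is reasonable over
    city-sized areas but not over long distances.
    """
    lon_scale = math.cos(math.radians(ref_lat))  # shrink factor for longitude
    east_km = (lon - ref_lon) * KM_PER_DEGREE * lon_scale
    north_km = (lat - ref_lat) * KM_PER_DEGREE
    return east_km, north_km


# Example: the M4E (The Beaches) coordinates from the table above.
east, north = to_local_km(43.676357, -79.293031)
print(round(east, 2), round(north, 2))
```

Feeding the two resulting columns to `KMeans` instead of raw degrees would make distances comparable in both directions; whether that changes the clusters here has not been checked.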

In [75]:
toronto_data.head()

Unnamed: 0,Clusters,Postalcode,Borough,Neighbourhood,Latitude,Longitude
37,5,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
42,5,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
43,5,M4M,East Toronto,Studio District,43.659526,-79.340923
44,2,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
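
The notebook fixes `k=6` without stating how the value was chosen. One common heuristic, shown here only as a sketch, is to fit `KMeans` for several values of `k` and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. The points below are synthetic and generated around three made-up centres; they are not the Toronto coordinates.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points around three invented centres (illustration only).
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centres])

# Fit k-means for k = 1..6 and record the inertia of each fit.
inertias = {}
for n in range(1, 7):
    model = KMeans(n_clusters=n, n_init=10, random_state=0).fit(points)
    inertias[n] = model.inertia_

for n, value in inertias.items():
    print(n, round(value, 2))
```

On data like this the inertia falls steeply up to the true number of groups and flattens afterwards. On the Toronto coordinates the curve may have no clear elbow, in which case the choice of `k` remains a judgment call.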


In [76]:
import matplotlib.cm as cm
import matplotlib.colors as colors

cluster_map = folium.Map(location=[43.651070,-79.347015],zoom_start=10)

# set color scheme for the clusters: one color per cluster label (0 to k-1)
colors_array = cm.Accent(np.linspace(0, 1, k))
color_scheme = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster
for lat, lon, neighbourhood, cluster in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood'], toronto_data['Clusters']):
    label = folium.Popup('{} Cluster {}'.format(neighbourhood, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=color_scheme[cluster],
        fill=True,
        fill_color=color_scheme[cluster],
        fill_opacity=0.7).add_to(cluster_map)

cluster_map

## Links to the maps, as maps are not displayed on GitHub
[Link to Location Map](https://github.com/Roshandev95/Coursera_Capstone/raw/main/1.jpg)<br>
[Link to Cluster Map](https://github.com/Roshandev95/Coursera_Capstone/raw/main/map2.jpg)