# Segmenting and Clustering Neighborhoods in Toronto

## 1. Get the Neighborhoods

We need to scrape the table of postal codes from a Wikipedia page and load it into a pandas DataFrame.

In [186]:
import pandas as pd
import numpy as np
import matplotlib as mp
import matplotlib.pyplot as plt
import json 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests 
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium 



In [190]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
wiki = pd.read_html(url)  # returns a list with one DataFrame per HTML table on the page
toronto_df = wiki[0]  # the first table holds the postal codes
toronto_df.columns = ['PostalCode', 'Borough', 'Neighborhood']  # rename the columns

# Where the Neighborhood is 'Not assigned', fall back to the Borough name (if there is one).
# A single .loc assignment avoids chained indexing (df.iloc[i][col] = ...), which can write to a copy.
not_assigned = toronto_df['Neighborhood'] == 'Not assigned'
toronto_df.loc[not_assigned, 'Neighborhood'] = toronto_df.loc[not_assigned, 'Borough']


toronto_df.head(10)




| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |
| 7 | M6A | North York | Lawrence Manor |
| 8 | M7A | Queen's Park | Queen's Park |
| 9 | M8A | Not assigned | Not assigned |


__Then, we're going to drop all the rows where Borough is 'Not assigned'__

In [191]:


toronto_df.replace('Not assigned', np.nan, inplace=True)  # replace 'Not assigned' with NaN
toronto_df.dropna(subset=['Borough'], axis=0, inplace=True)  # drop every row whose Borough is NaN

toronto_df.head(10)


| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |
| 7 | M6A | North York | Lawrence Manor |
| 8 | M7A | Queen's Park | Queen's Park |
| 10 | M9A | Etobicoke | Islington Avenue |
| 11 | M1B | Scarborough | Rouge |
| 12 | M1B | Scarborough | Malvern |
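
As an aside, the same filtering can be expressed without the intermediate NaN step. The snippet below is a minimal sketch on a hand-made toy frame (the name `sample` and its four rows are illustrative, not the scraped data); it keeps rows with a boolean mask and, like `dropna`, preserves the original index labels.

```python
import pandas as pd

# Toy rows mirroring the start of the scraped table (illustrative only).
sample = pd.DataFrame({
    "PostalCode": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighborhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only the rows whose Borough is known; index labels 2 and 3 survive unchanged.
assigned = sample[sample["Borough"] != "Not assigned"]
print(assigned)
```

Unlike the `replace` call above, this leaves any remaining 'Not assigned' strings in other columns untouched, so which variant fits depends on whether NaN markers are wanted later.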


__Now we need to merge the rows that share the same PostalCode__

In [193]:

toronto_merged = pd.DataFrame(columns=['PostalCode', 'Borough', 'Neighborhood'])  # new dataframe with 3 columns
toronto_merged['PostalCode'] = toronto_df['PostalCode'].unique()  # one row per unique postal code

for i, pos in enumerate(toronto_merged['PostalCode']):
    temp = toronto_df[toronto_df['PostalCode'] == pos]  # all rows for this postal code
    toronto_merged.loc[i, 'Borough'] = temp['Borough'].iloc[0]  # the Borough is the same for every row of a postal code
    toronto_merged.loc[i, 'Neighborhood'] = ', '.join(temp['Neighborhood'])  # comma-separated neighborhoods
    
toronto_merged.head()


| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M7A | Queen's Park | Queen's Park |
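
The merge above can also be sketched with `groupby`, which avoids the explicit loop. This is an alternative under stated assumptions, not the method used in this notebook: `sample` is a small hand-made frame, and `sort=False` is passed so the postal codes keep their order of first appearance.

```python
import pandas as pd

# Toy rows mirroring the cleaned table (illustrative only).
sample = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M5A", "M5A", "M6A", "M6A"],
    "Borough": ["North York", "North York", "Downtown Toronto",
                "Downtown Toronto", "North York", "North York"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Harbourfront",
                     "Regent Park", "Lawrence Heights", "Lawrence Manor"],
})

# One row per postal code: take the first Borough and join the neighborhoods.
merged = (
    sample.groupby("PostalCode", sort=False)
    .agg({"Borough": "first", "Neighborhood": ", ".join})
    .reset_index()
)
print(merged)
```

Taking the first Borough assumes every row of a postal code carries the same Borough, which is the same assumption the loop makes with `.iloc[0]`.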


__Now we get the shape of our dataframe__

In [194]:
toronto_merged.shape

(103, 3)
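
Beyond the shape, a couple of quick checks can confirm that the merge did what was intended. The sketch below runs on a hypothetical three-row frame named `merged_sample` rather than on `toronto_merged`; the same two checks could be applied to the real dataframe.

```python
import pandas as pd

# Hypothetical stand-in for the merged dataframe (illustrative only).
merged_sample = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Harbourfront, Regent Park"],
})

n_rows, n_cols = merged_sample.shape

# After merging, every postal code should appear exactly once.
codes_unique = merged_sample["PostalCode"].is_unique

# No row should be left without a Borough.
boroughs_present = merged_sample["Borough"].notna().all()

print(n_rows, n_cols, codes_unique, boroughs_present)
```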