# City of Toronto's Neighborhoods

In this work, I explore, segment, and cluster the City of Toronto's neighborhoods. I start by extracting the neighborhood information available on the Wikipedia page and geocoding it with Python's Geocoder library.
Then I explore each neighborhood using the Foursquare API and cluster the returned venues by category. These clusters are used to define the segments.

In [27]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

from bs4 import BeautifulSoup

#!conda install -c conda-forge geocoder --yes
import geocoder

print('Libraries imported.')

Libraries imported.


## Getting Neighborhood Data from Wikipedia

In [49]:
website_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
website_data=requests.get(website_url).text

In [50]:

# Empty frame that receives one row per table entry
df_tor=pd.DataFrame(columns=['Postcode','Borough','Neighborhood'])

# Parse the page and collect every row of the postal code table
soup = BeautifulSoup(website_data,'lxml')
neighborhood_table = soup.find('table',{'class':'wikitable sortable'})
rows=neighborhood_table.findAll('tr')

brackets = "[]"  # characters to strip from the cell text

# Skip the first row of the table and copy every other row into df_tor
for row in rows[1:]:
    cells = row.findAll('td')
    items_list = []
    for cell in cells:
        item = cell.text.strip()
        for char in brackets:
            item = item.replace(char, "")
        items_list.append(item)
    df_tor.loc[len(df_tor)] = items_list

#df_tor[df_tor.apply(lambda x: x.str.contains('Not assigned'))] = 'NA'

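# Drop rows whose borough is not assigned, then sort by postcode
df_tor = df_tor[~df_tor.Borough.str.contains('Not assigned')]
df_tor = df_tor.sort_values('Postcode')
df_tor = df_tor.reset_index(drop=True)

# Combine neighborhoods that share a postcode and borough into one comma-separated row
df_tor = df_tor.groupby(['Postcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()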


In [51]:
df_tor.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Port Union, Rouge Hill, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
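
The row-parsing step can be tried offline on a small inline table. This is only a sketch: `SAMPLE_HTML` and `parse_rows` are hypothetical names, the sample rows are made up to mimic the layout the notebook expects (they are not copied from the live page), and Python's built-in `html.parser` is used here in place of `lxml` so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the table layout the notebook assumes
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M1B</td><td>Scarborough</td><td>[[Rouge]]</td></tr>
  <tr><td>M1B</td><td>Scarborough</td><td>Malvern</td></tr>
</table>
"""


def parse_rows(html):
    """Return one list of cleaned cell strings per data row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable sortable"})
    records = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            # The header row holds only <th> cells, so it yields no data
            continue
        cleaned = [
            cell.get_text().strip().replace("[", "").replace("]", "")
            for cell in cells
        ]
        records.append(cleaned)
    return records


rows = parse_rows(SAMPLE_HTML)
for record in rows:
    print(record)
```

Skipping rows without `<td>` cells is an alternative to counting rows; it assumes the header uses `<th>` cells, which may not hold for every table.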


In [52]:
df_tor.shape

(103, 3)
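
To see what the filtering and grouping steps do, here is a minimal sketch on a toy frame. The frame `sample` and the names `assigned` and `merged` are illustrative only; the four rows are a hand-made input, not the scraped data.

```python
import pandas as pd

# Toy input: one unassigned postcode and one postcode listed twice
sample = pd.DataFrame(
    [
        ["M1A", "Not assigned", "Not assigned"],
        ["M1B", "Scarborough", "Rouge"],
        ["M1B", "Scarborough", "Malvern"],
        ["M1G", "Scarborough", "Woburn"],
    ],
    columns=["Postcode", "Borough", "Neighborhood"],
)

# Keep only rows whose borough is assigned
assigned = sample[~sample["Borough"].str.contains("Not assigned")]

# Join the neighborhoods that share a postcode and borough
merged = (
    assigned.groupby(["Postcode", "Borough"])["Neighborhood"]
    .apply(", ".join)
    .reset_index()
)

print(merged)
```

The two `M1B` rows collapse into a single row whose `Neighborhood` value is `Rouge, Malvern`, which is the same shape as the first row of the notebook's output.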

## Geocoding the Postcodes

In [53]:

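def get_longlat(postal_code):
    """Return (latitude, longitude) for a Toronto postal code."""
    lat_lng_coords = None

    # The geocoder may return None, so repeat the query until coordinates come back.
    # Note: there is no retry limit, so this loop never ends if the service keeps failing.
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng

    # Despite the function name, latitude is returned first
    return lat_lng_coords[0], lat_lng_coords[1]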
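
Because the loop above retries without a limit, a bounded variant may be easier to run unattended. The following is a sketch, not the notebook's method: `geocode_with_retry`, `make_flaky_lookup`, `max_attempts`, and `delay_seconds` are hypothetical names, and the geocoder is passed in as a plain function so the logic can be exercised without network access.

```python
import time


def geocode_with_retry(postal_code, lookup, max_attempts=5, delay_seconds=0.0):
    """Return (latitude, longitude), or None after max_attempts failed lookups.

    `lookup` is any function that takes a query string and returns
    [lat, lng] or None. With the notebook's library this could be
    `lambda q: geocoder.google(q).latlng` (an assumption, untested here).
    """
    query = "{}, Toronto, Ontario".format(postal_code)
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords[0], coords[1]
        time.sleep(delay_seconds)  # pause before asking again
    return None


def make_flaky_lookup(fail_times, result):
    """Build a stand-in geocoder that fails `fail_times` times, then succeeds."""
    calls = {"n": 0}

    def lookup(query):
        calls["n"] += 1
        return None if calls["n"] <= fail_times else result

    return lookup


# Succeeds on the third attempt
print(geocode_with_retry("M1B", make_flaky_lookup(2, [43.806686, -79.194353])))

# Gives up after three attempts
print(geocode_with_retry("M1B", make_flaky_lookup(10, [43.806686, -79.194353]), max_attempts=3))
```

Returning `None` instead of looping forever leaves it to the caller to decide whether to skip the postcode or try another source.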


In [54]:

#df_tor['long']='abcdef'
df_tor.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Port Union, Rouge Hill, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [55]:
long_list=[]
lat_list=[]

for p in df_tor['Postcode']:
    lat, long = get_longlat(p)
    long_list.append(long)
    lat_list.append(lat)

In [62]:
df_tor['Latitude']=lat_list
df_tor['Longitude']=long_list

103
103


In [65]:
df_tor.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Port Union, Rouge Hill, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
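
The coordinate columns can also be built from a single list of `(lat, lng)` pairs, which keeps the two values of each postcode together. This is a sketch under assumptions: `KNOWN_COORDS` and `lookup` are hypothetical stand-ins for the geocoding call, filled with the first three coordinate pairs from the table above so that it runs offline.

```python
import pandas as pd

# Stand-in for the geocoder, using three rows from the output above
KNOWN_COORDS = {
    "M1B": (43.806686, -79.194353),
    "M1C": (43.784535, -79.160497),
    "M1E": (43.763573, -79.188711),
}


def lookup(postcode):
    """Return (latitude, longitude) for a postcode from the local table."""
    return KNOWN_COORDS[postcode]


df = pd.DataFrame({"Postcode": ["M1B", "M1C", "M1E"]})

# One lookup per postcode, kept as pairs so the order cannot drift apart
coords = [lookup(p) for p in df["Postcode"]]
df["Latitude"] = [lat for lat, _ in coords]
df["Longitude"] = [lng for _, lng in coords]

# Basic sanity check: values must be valid geographic coordinates
assert df["Latitude"].between(-90, 90).all()
assert df["Longitude"].between(-180, 180).all()

print(df)
```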
