## Toronto Neighborhood Analysis

Importing all the required libraries!

In [1]:
import requests
import lxml.html as lh
import pandas as pd
from pandas import json_normalize  # top-level import since pandas 1.0; the pandas.io.json path is deprecated
from sklearn.cluster import KMeans
import folium
from IPython.display import display

Using the requests library to get the HTML content of the Wikipedia page, then extracting every **tr** element, since each one is a row of the table.

In [2]:
URL = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
toronto_page = requests.get(URL)

doc = lh.fromstring(toronto_page.content)

rows = doc.xpath('//tr')

The first extracted row holds the column names, so it is read into a list. The `[:-1]` slice drops the last character of each cell's text, which here is a trailing newline.

In [3]:
cols = []

for column in rows[0]:
    cols.append(column.text_content()[:-1])
    
cols

['Postal Code', 'Borough', 'Neighborhood']

Getting all the other rows and storing them in a list named _'data'_.

In [4]:
data = []

for index, r in enumerate(rows):
    if 0 < index < 181:    # Skip the header row; index 180 marks the end of the table.
        row = []
        for value in r:
            row.append(value.text_content()[:-1])
        data.append(row)
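
The header-then-body extraction above can be tried offline. The sketch below is a minimal stand-in that uses the standard library's `html.parser` instead of `lxml`, and runs on a made-up three-row table rather than the live Wikipedia page. The class name `TableRowParser` and the sample markup are assumptions for illustration only.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # strip() removes the trailing newline that [:-1] removes above
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample markup, shaped like the rows of the Wikipedia table.
SAMPLE_HTML = """
<table>
<tr><th>Postal Code
</th><th>Borough
</th><th>Neighborhood
</th></tr>
<tr><td>M1A
</td><td>Not assigned
</td><td>Not assigned
</td></tr>
<tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
header, *body = parser.rows                                 # first row = column names
assigned = [row for row in body if row[1] != "Not assigned"]
```

Using `strip()` rather than a fixed slice keeps the cell text intact even when a cell does not end in a newline.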

The **data** list is converted to a pandas dataframe, and every row whose _Borough_ is _Not assigned_ is removed.

In [5]:
toronto_neighborhood = pd.DataFrame(data, columns=cols)
toronto_neighborhood = toronto_neighborhood[toronto_neighborhood['Borough'] != 'Not assigned'].reset_index(drop=True)

toronto_neighborhood.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [6]:
toronto_neighborhood.shape

(103, 3)

In [7]:
print(toronto_neighborhood['Postal Code'].nunique())
print("-----------------")
print(toronto_neighborhood[toronto_neighborhood['Neighborhood'] == 'Not assigned'].count())

103
-----------------
Postal Code     0
Borough         0
Neighborhood    0
dtype: int64


The table has 103 rows and 103 unique postal codes, so no postal code appears twice. There are also no neighborhoods with a **Not assigned** entry.
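
Comparing the row count with the number of unique values shows whether duplicates exist, but not which values are duplicated. A small sketch of how they could be listed, using only the standard library; the helper name `find_duplicates` and the sample codes are made up for illustration.

```python
from collections import Counter


def find_duplicates(values):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value, n in counts.items() if n > 1]


# Hypothetical postal codes; "M5A" is repeated on purpose.
sample_codes = ["M3A", "M4A", "M5A", "M6A", "M5A"]
duplicates = find_duplicates(sample_codes)
```

An empty result corresponds to the situation in this notebook, where every postal code is unique.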

Downloading the geodata of the Toronto neighborhoods!

In [8]:
!wget http://cocl.us/Geospatial_data

--2020-07-02 12:49:57--  http://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 161.202.50.39
Connecting to cocl.us (cocl.us)|161.202.50.39|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data [following]
--2020-07-02 12:49:57--  https://cocl.us/Geospatial_data
Connecting to cocl.us (cocl.us)|161.202.50.39|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-07-02 12:49:59--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 103.116.4.197
Connecting to ibm.box.com (ibm.box.com)|103.116.4.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-07-02 12:50:00--  https://ibm.box.com/public/static/9afzr83pps4pwf2smjjcf1y5m

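Merging the two dataframes. No key is passed to `pd.merge`, so it joins on the column the two frames share, which here is _Postal Code_; `how='inner'` keeps only the rows present in both.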

In [9]:
geodata = pd.read_csv("Geospatial_data")
toronto_geodata = pd.merge(toronto_neighborhood, geodata, how='inner')

toronto_geodata.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
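
To make the inner-join behaviour concrete, here is a minimal sketch of the same idea on plain dictionaries. It is not how pandas implements `merge`; the function `inner_join` and the second sample row are made up, and the sketch assumes the key is unique on the right-hand side.

```python
def inner_join(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key`, keeping left-side order."""
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:        # rows without a partner are dropped
            joined.append({**row, **match})
    return joined


neighborhoods = [
    {"Postal Code": "M3A", "Borough": "North York", "Neighborhood": "Parkwoods"},
    {"Postal Code": "M0X", "Borough": "Nowhere", "Neighborhood": "Made-up row"},
]
coordinates = [
    {"Postal Code": "M3A", "Latitude": 43.753259, "Longitude": -79.329656},
]

merged = inner_join(neighborhoods, coordinates, "Postal Code")
```

Because unmatched rows disappear silently, comparing the row count before and after the merge is a cheap sanity check.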


Analysing all the boroughs which have the word **York** in their name.

In [10]:
york_data = toronto_geodata[toronto_geodata['Borough'].apply(lambda x: 'York' in x)].reset_index(drop = True)

## Exploring Places near Neighborhoods

Using Foursquare API Credentials

In [11]:
CLIENT_ID = 'XR2NWQZAARITNVZDYKA5USHASBG0Y2K4WX3NBCVQBFQZX1HV' 
CLIENT_SECRET = '3QFZVGYG4I25JKUUH5ZFCP30WWEOMJAXAD5CS2XH3O5SYOMH' 
VERSION = '20180604'
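
Credentials written directly into a notebook are visible to anyone who can read it. A common alternative, sketched below, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions, not something the Foursquare API requires.

```python
import os


def load_credentials(environ=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    try:
        return environ["FOURSQUARE_CLIENT_ID"], environ["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None
```

With the variables exported in the shell that starts the notebook, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would replace the two literal assignments.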

Limiting the API results to 100 venues per neighborhood and searching within a radius of 1000 meters of each neighborhood's coordinates.

In [12]:
LIMIT = 100
radius = 1000
venues = []

for latitude, longitude in zip(york_data['Latitude'], york_data['Longitude']):
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
    venues.append(requests.get(url).json())
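
The request URL can also be assembled from a dictionary of parameters, which takes care of escaping and keeps each parameter next to its name. This is a sketch using `urllib.parse` from the standard library; the endpoint and parameter names are the ones in the cell above, while the helper `build_explore_url` and the placeholder credentials are made up.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, latitude, longitude,
                      version="20180604", radius=1000, limit=100):
    """Build the explore URL for one neighborhood's coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{latitude},{longitude}",
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; no request is sent here.
url = build_explore_url("ID", "SECRET", 43.753259, -79.329656)
query = parse_qs(urlsplit(url).query)   # decode again to inspect the parameters
```

The same dictionary can be handed to `requests.get(EXPLORE_ENDPOINT, params=params, timeout=10)`, where the timeout keeps one slow response from stalling the whole loop.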

In [13]:
york_popular = pd.DataFrame(columns=['Postal Code', 'Borough', 'Neighborhood', 'Latitude', 'Longitude', 'Place'])
row_nums = 0

for index, row in enumerate(venues):
    try:
        for i in range(len(row['response']['groups'][0]['items'])):
            york_popular.loc[row_nums] = list(york_data.iloc[index]) + [row['response']['groups'][0]['items'][i]['venue']['categories'][0]['name']]
            row_nums += 1
    except (KeyError, IndexError):
        pass    # skip responses that lack the expected structure
    
york_popular.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Place
0,M3A,North York,Parkwoods,43.753259,-79.329656,Caribbean Restaurant
1,M3A,North York,Parkwoods,43.753259,-79.329656,Park
2,M3A,North York,Parkwoods,43.753259,-79.329656,Café
3,M3A,North York,Parkwoods,43.753259,-79.329656,Grocery Store
4,M3A,North York,Parkwoods,43.753259,-79.329656,Fish & Chips Shop
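
In the loop above, the `try` block wraps the whole inner loop, so one venue without a category ends the processing of that neighborhood's remaining venues. The sketch below shows one way to pull the category names out of a response venue by venue instead. The nesting `response` → `groups` → `items` → `venue` → `categories` is taken from the cell above; the helper `venue_categories` and the sample payload are made up.

```python
def venue_categories(payload):
    """Return the first category name of each venue in an explore response.

    Malformed or empty responses yield an empty list instead of raising.
    """
    try:
        items = payload["response"]["groups"][0]["items"]
    except (KeyError, IndexError, TypeError):
        return []
    names = []
    for item in items:
        categories = item.get("venue", {}).get("categories", [])
        if categories:                 # skip venues without any category
            names.append(categories[0]["name"])
    return names


# Made-up payload with the same nesting as the real responses.
sample_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"categories": [{"name": "Park"}]}},
                    {"venue": {"categories": []}},
                    {"venue": {"categories": [{"name": "Coffee Shop"}]}},
                ]
            }
        ]
    }
}

names = venue_categories(sample_payload)
```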


Now we have the places around each neighborhood with the help of the Foursquare API. Next, I find the eight most common venue categories for each neighborhood.

In [14]:
york_onehot = pd.get_dummies(york_popular['Place'])
york_onehot.insert(0, 'Neighborhood', york_popular['Neighborhood'])

york_grouped = york_onehot.groupby('Neighborhood').mean().reset_index()

york_sorted = pd.DataFrame(columns=['Neighborhood', '1', '2', '3', '4', '5', '6', '7', '8'])
york_sorted['Neighborhood'] = york_grouped['Neighborhood']

for index in range(york_grouped.shape[0]):
    temp = york_grouped.iloc[index, 1:]
    temp_sort = temp.sort_values(ascending=False)
    york_sorted.iloc[index, 1:] = list(temp_sort.index.values[0:8])

york_sorted.head()

Unnamed: 0,Neighborhood,1,2,3,4,5,6,7,8
0,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Pizza Place,Bank,Mobile Phone Shop,Ski Area,Shopping Mall,Mediterranean Restaurant,Dog Run
1,Bayview Village,Japanese Restaurant,Bank,Grocery Store,Gas Station,Restaurant,Park,Café,Trail
2,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Sandwich Place,Bank,Restaurant,Pharmacy,Butcher,Café
3,Caledonia-Fairbanks,Pharmacy,Bus Stop,Park,Pizza Place,Japanese Restaurant,Fast Food Restaurant,Falafel Restaurant,Sporting Goods Shop
4,"Del Ray, Mount Dennis, Keelsdale and Silverthorn",Furniture / Home Store,Discount Store,Grocery Store,Restaurant,Fast Food Restaurant,Sandwich Place,Dessert Shop,Gas Station
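
The ranking step can be illustrated without pandas. Within one neighborhood the mean of a one-hot column is just that category's count divided by the number of venues, so ranking by count gives the same order. The sketch below uses `collections.Counter`; the helper `top_categories` and the sample pairs are made up.

```python
from collections import Counter, defaultdict


def top_categories(pairs, n=8):
    """Map each neighborhood to its `n` most frequent venue categories.

    `pairs` is an iterable of (neighborhood, category) tuples.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        neighborhood: [category for category, _ in counter.most_common(n)]
        for neighborhood, counter in counts.items()
    }


# Made-up (neighborhood, category) pairs.
sample_pairs = [
    ("Parkwoods", "Park"),
    ("Parkwoods", "Park"),
    ("Parkwoods", "Pharmacy"),
    ("Victoria Village", "Coffee Shop"),
]

top = top_categories(sample_pairs, n=2)
```

One difference from the pandas version: the code above ranks every category column, so a neighborhood with fewer than eight distinct categories is padded with categories of frequency zero, whereas this sketch simply returns a shorter list.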


Based on the frequency of venue categories, I divide the neighborhoods into four clusters using the KMeans clustering method.

In [15]:
model = KMeans(init='k-means++', n_clusters=4, random_state=1).fit(york_grouped.iloc[:, 1:])

york_sorted.insert(1, 'Cluster Label', model.labels_)

york_final = york_data.copy()
york_final = york_final.join(york_sorted.set_index('Neighborhood'), on='Neighborhood')
york_final.dropna(axis=0, inplace=True)

york_final.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Label,1,2,3,4,5,6,7,8
0,M3A,North York,Parkwoods,43.753259,-79.329656,3.0,Park,Shopping Mall,Pharmacy,Bus Stop,Convenience Store,Chinese Restaurant,Discount Store,Fast Food Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,3.0,Coffee Shop,Hockey Arena,Intersection,Playground,Pizza Place,Park,Café,Sporting Goods Shop
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Furniture / Home Store,Coffee Shop,Fast Food Restaurant,Fried Chicken Joint,Vietnamese Restaurant,Sushi Restaurant,Restaurant
3,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Coffee Shop,Restaurant,Japanese Restaurant,Gym,Burger Joint,Supermarket,Asian Restaurant,Bank
4,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,3.0,Pizza Place,Gym / Fitness Center,Fast Food Restaurant,Brewery,Bakery,Pharmacy,Pet Store,Coffee Shop
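
For readers who want to see what the clustering step does, here is a minimal sketch of the basic k-means loop (Lloyd's algorithm) in plain Python. It is not scikit-learn's implementation: it starts from randomly sampled points instead of k-means++ and runs only once. The function names and the sample vectors are made up.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=1):
    """Plain Lloyd's algorithm; returns (labels, centroids).

    Initial centroids are k randomly sampled points, not k-means++.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break                      # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:                # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Made-up frequency vectors: two "park-heavy" rows and two "coffee-heavy" rows.
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(points, k=2)
```

The label numbers themselves carry no meaning; only which rows share a label matters.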


## Visualization

All the clusters made are visualized on the map below using the Folium library.  

In [16]:
latitude = 43.6957
longitude = -79.4504

york_map = folium.Map(location=[latitude, longitude], zoom_start=11)

colors = ['red', 'blue', 'green', 'yellow']

for lat, lon, neighborhood, cluster in zip(york_final['Latitude'], york_final['Longitude'], york_final['Neighborhood'], york_final['Cluster Label']):
    label = folium.Popup(str(neighborhood))
    folium.CircleMarker(
        [lat, lon], 
        radius=5,
        popup=label,
        color=colors[int(cluster)],
        fill=True, 
        fill_opacity=0.7
    ).add_to(york_map)
    
display(york_map)
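
The cluster labels arrive as floats such as `3.0`, because the join leaves `NaN` in rows without a match before `dropna` removes them, which is why the loop converts them with `int(cluster)`. A small sketch of a lookup that also tolerates missing or out-of-range labels; the helper `color_for_cluster` and the gray fallback are assumptions.

```python
COLORS = ["red", "blue", "green", "yellow"]


def color_for_cluster(label, palette=COLORS, fallback="gray"):
    """Return the palette color for a cluster label such as 3 or 3.0."""
    try:
        index = int(label)
    except (TypeError, ValueError):
        return fallback                # e.g. None or NaN
    return palette[index] if 0 <= index < len(palette) else fallback
```

Passing `color=color_for_cluster(cluster)` to `folium.CircleMarker` would then draw unexpected labels in gray instead of raising an error.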

## Examining the clusters

The clusters formed can be analysed individually.

In [17]:
york_final[york_final['Cluster Label'] == 0]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Label,1,2,3,4,5,6,7,8
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Furniture / Home Store,Coffee Shop,Fast Food Restaurant,Fried Chicken Joint,Vietnamese Restaurant,Sushi Restaurant,Restaurant
3,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Coffee Shop,Restaurant,Japanese Restaurant,Gym,Burger Joint,Supermarket,Asian Restaurant,Bank
5,M6B,North York,Glencairn,43.709577,-79.445073,0.0,Grocery Store,Fast Food Restaurant,Italian Restaurant,Gas Station,Park,Coffee Shop,Pizza Place,Fish Market
6,M3C,North York,Don Mills,43.7259,-79.340923,0.0,Coffee Shop,Restaurant,Japanese Restaurant,Gym,Burger Joint,Supermarket,Asian Restaurant,Bank
9,M6E,York,Caledonia-Fairbanks,43.689026,-79.453512,0.0,Pharmacy,Bus Stop,Park,Pizza Place,Japanese Restaurant,Fast Food Restaurant,Falafel Restaurant,Sporting Goods Shop
10,M4G,East York,Leaside,43.70906,-79.363452,0.0,Coffee Shop,Sporting Goods Shop,Grocery Store,Electronics Store,Furniture / Home Store,Sandwich Place,Bank,Restaurant
12,M3H,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,0.0,Coffee Shop,Pizza Place,Bank,Mobile Phone Shop,Ski Area,Shopping Mall,Mediterranean Restaurant,Dog Run
14,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,0.0,Coffee Shop,Clothing Store,Sandwich Place,Japanese Restaurant,Restaurant,Bank,Bakery,Pharmacy
17,M2K,North York,Bayview Village,43.786947,-79.385975,0.0,Japanese Restaurant,Bank,Grocery Store,Gas Station,Restaurant,Park,Café,Trail
22,M9L,North York,Humber Summit,43.756303,-79.565963,0.0,Pizza Place,Pharmacy,Arts & Crafts Store,Electronics Store,Italian Restaurant,Shopping Mall,Park,Bakery


In [18]:
york_final[york_final['Cluster Label'] == 1]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Label,1,2,3,4,5,6,7,8
19,M2L,North York,"York Mills, Silver Hills",43.75749,-79.374714,1.0,Park,Pool,Yoga Studio,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Electronics Store


In [19]:
york_final[york_final['Cluster Label'] == 2]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Label,1,2,3,4,5,6,7,8
27,M9M,North York,"Humberlea, Emery",43.724766,-79.532242,2.0,Discount Store,Park,Convenience Store,Intersection,Golf Course,Storage Facility,Gas Station,Bakery


In [20]:
york_final[york_final['Cluster Label'] == 3]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Label,1,2,3,4,5,6,7,8
0,M3A,North York,Parkwoods,43.753259,-79.329656,3.0,Park,Shopping Mall,Pharmacy,Bus Stop,Convenience Store,Chinese Restaurant,Discount Store,Fast Food Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,3.0,Coffee Shop,Hockey Arena,Intersection,Playground,Pizza Place,Park,Café,Sporting Goods Shop
4,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,3.0,Pizza Place,Gym / Fitness Center,Fast Food Restaurant,Brewery,Bakery,Pharmacy,Pet Store,Coffee Shop
7,M4C,East York,Woodbine Heights,43.695344,-79.318389,3.0,Park,Coffee Shop,Sandwich Place,Skating Rink,Pizza Place,Pub,Beer Store,Curling Ice
8,M6C,York,Humewood-Cedarvale,43.693781,-79.428191,3.0,Pizza Place,Bagel Shop,Convenience Store,Park,Coffee Shop,Grocery Store,Field,Bank
11,M2H,North York,Hillcrest Village,43.803762,-79.363452,3.0,Pharmacy,Coffee Shop,Park,Pizza Place,Shopping Mall,Sandwich Place,Chinese Restaurant,Korean Restaurant
13,M4H,East York,Thorncliffe Park,43.705369,-79.349372,3.0,Coffee Shop,Indian Restaurant,Grocery Store,Pizza Place,Gym,Afghan Restaurant,Brewery,Burger Joint
15,M3J,North York,"Northwood Park, York University",43.76798,-79.487262,3.0,Furniture / Home Store,Coffee Shop,Pizza Place,Bank,Bar,Falafel Restaurant,Metro Station,Massage Studio
18,M3K,North York,Downsview,43.737473,-79.464763,3.0,Vietnamese Restaurant,Coffee Shop,Hotel,Pizza Place,Gas Station,Park,Pharmacy,Discount Store
20,M3L,North York,Downsview,43.739015,-79.506944,3.0,Vietnamese Restaurant,Coffee Shop,Hotel,Pizza Place,Gas Station,Park,Pharmacy,Discount Store
