# Toronto Neighborhood Visualizations
## Maps for Potential Movers

#### Author: Patrick de Guzman 

# Introduction
This report is the final project for the capstone course of the Applied Data Science Specialization (offered by Coursera). 

The Foursquare API is used to identify the types of venues within neighbourhoods of Toronto, Ontario. Data from the [City of Toronto website](https://www.toronto.ca/) is used to create choropleth maps that visualize several dimensions per neighbourhood: the distribution of average ages, household sizes, unit sizes, and income.

By clustering neighbourhoods by venue type with k-means and combining the clusters with a choropleth map for each dimension above, we provide a simple tool that helps people considering a move to Toronto understand the character of each neighbourhood and the distribution of its residents. 



## Table of Contents
1. Downloading and Preprocessing the Dataset  
    a) Toronto Postal Code & Neighborhood Names  
    b) Toronto Neighborhood Coordinates  
    c) GeoJSON Data  
    d) City Data - Age, Income, Household Sizes, Unit Sizes
2. Building Neighborhood Maps  
    a) Foursquare API Request & Wrangling  
    b) K-means Clustering  
    c) Neighborhood Maps: 
        i) Average Age Distribution  
        ii) Average Income Distribution  
        iii) Average Household Sizes  
        iv) Average Unit Sizes
    d) Neighborhood Cluster Listings  
3. References  


##### Required Dependencies

In [44]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from bs4 import BeautifulSoup # import BeautifulSoup for web scraping

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

from zipfile import ZipFile

## 1. Downloading and Preprocessing the Dataset 

### a) Obtaining Toronto Postal Codes & Neighborhoods

First, we will download a table of Toronto postal codes and their accompanying Borough and Neighborhood names: 

In [45]:
# Set up url of wikipedia page to extract table of Toronto postal codes
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'lxml')

# From BeautifulSoup object, extract table
my_table = soup.find('table',{'class':'wikitable sortable'})

df = pd.read_html(str(my_table))[0]
df['Borough'] = df['Borough'].replace('Not assigned', np.nan) # assign back rather than replacing in place on an attribute
df = df.dropna(axis = 0) 
df.reset_index(drop = True, inplace = True)
df.columns = ['PostalCode', 'Borough','Neighborhood']
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Not assigned
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


Additional cleaning will be performed to do the following: 
1. Assign Borough names to Neighborhood names where Borough is present but Neighborhood is 'Not assigned', and
2. Combine neighbourhoods together in the event of duplicate postal codes

First, we assign the borough names to neighborhoods with missing values: 

In [46]:
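# Where the neighborhood is 'Not assigned', fall back to the borough name of the same row
df['Neighborhood'] = df['Neighborhood'].mask(df['Neighborhood'] == 'Not assigned', df['Borough'])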
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


Next, we replace the 'Neighborhood' column with a concatenated version of all neighborhoods within each unique 'PostalCode'. To do this, we loop over each unique postal code, subset the original data frame to that postal code, and join the neighborhood values in the subset into a single comma-separated string. 

The result is a data frame with one row per postal code, holding its borough and a string listing its neighborhoods (several names for postal codes that cover more than one neighborhood).

In [47]:
bor_list = []
nh_list = []

for pcode in df.PostalCode.unique(): 
    subset = df[df.PostalCode == pcode] # subset original df by postal code
    
    neighborhoods = "" # initialize empty neighborhood string
    for neighborhood in subset.Neighborhood:
        neighborhoods += str(neighborhood) + ", "
    
    neighborhoods = neighborhoods.rstrip()[:-1] # remove trailing spaces + ending comma

    bor_list.append(subset.Borough.iloc[0])
    nh_list.append(neighborhoods)


df_final = pd.DataFrame(data = {'PostalCode':df.PostalCode.unique(), 'Borough': bor_list, 'Neighborhoods': nh_list})

df_final = df_final.sort_values(by = ['PostalCode'])
df_final.reset_index(drop = True, inplace = True) 
df_final.head()

Unnamed: 0,PostalCode,Borough,Neighborhoods
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
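
The loop above can also be written as a single `groupby` aggregation. The following is a minimal sketch on a hand-made toy frame (`toy` and `toy_final` are illustrative names, not part of the notebook), assuming the same three columns as the scraped table:

```python
import pandas as pd

# Toy rows mirroring the structure of the scraped postal code table
toy = pd.DataFrame({
    'PostalCode': ['M1B', 'M1B', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighborhood': ['Rouge', 'Malvern', 'Woburn'],
})

# One row per postal code: keep the first borough, join the neighborhood names
toy_final = (
    toy.groupby('PostalCode', as_index=False)
       .agg({'Borough': 'first', 'Neighborhood': ', '.join})
       .rename(columns={'Neighborhood': 'Neighborhoods'})
)

print(toy_final)
```

Because `groupby` sorts its keys by default, the result is already ordered by postal code, so the separate `sort_values` step would not be needed in this variant.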


### b) Obtaining Neighborhood Coordinates

Now that we have the preprocessed dataframe, we obtain the coordinates for each postal code in it. Rather than geocoding each one, we use the coordinate data provided [here](https://cocl.us/Geospatial_data):

In [48]:
coordinates_url = 'https://cocl.us/Geospatial_data'
coordinates = pd.read_csv(coordinates_url)

df_coordinates = pd.merge(left = df_final, right = coordinates, left_on = 'PostalCode', right_on = 'Postal Code')
df_coordinates.drop('Postal Code', axis = 1, inplace = True)
df_coordinates.head()

Unnamed: 0,PostalCode,Borough,Neighborhoods,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
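
`pd.merge` performs an inner join by default, so a postal code without a row in the coordinates file would be dropped silently. One way to check for this, sketched here on toy data (the code `M9Z` and its neighborhood are made up for illustration), is a left join with `indicator=True`:

```python
import pandas as pd

toy_codes = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M9Z'],  # 'M9Z' is a hypothetical code with no coordinates
    'Neighborhoods': ['Rouge, Malvern', 'Highland Creek, Rouge Hill, Port Union', 'Hypothetical'],
})
toy_coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# A left join keeps every postal code; the _merge column records where each row was found
checked = pd.merge(toy_codes, toy_coords, how='left',
                   left_on='PostalCode', right_on='Postal Code', indicator=True)

unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that the inner join lost nothing.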


### c) Downloading GeoJSON File 

Next, we will download the following data:  
- GeoJSON data to create a choropleth map of Toronto, broken down by neighborhood (SOURCE: [adamw523](http://adamw523.com/toronto-geojson/)) 
- CSV data with population, age, and income figures, broken down by neighborhood (SOURCE: [City of Toronto](https://open.toronto.ca/dataset/neighbourhood-profiles/))

In [49]:
import urllib.request
import zipfile

zipurl = 'https://github.com/adamw523/toronto-geojson/zipball/master'

urllib.request.urlretrieve(zipurl, filename = 'geojsondata.zip')
zipfile.ZipFile('geojsondata.zip').extractall()

In [50]:
TorontoJSON = 'adamw523-toronto-geojson-3b02b53/simple.geojson'

with open(TorontoJSON) as f: 
    gj = json.load(f)
    
gj['features'][0]


{'type': 'Feature',
 'properties': {'DAUID': '35200879',
  'PRUID': '35',
  'CSDUID': '3520005',
  'HOODNUM': 81,
  'HOOD': 'Trinity-Bellwoods',
  'FULLHOOD': 'Trinity-Bellwoods (81)'},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-79.40428280044927, 43.64797961606815],
    [-79.403956753622, 43.64718271074494],
    [-79.42236786578222, 43.643467621011894],
    [-79.42640543946513, 43.65360764326518],
    [-79.41868792113178, 43.65521730993704],
    [-79.41769878521191, 43.65524323486715],
    [-79.41514736685951, 43.65496322517198],
    [-79.40767889826175, 43.65646442447146],
    [-79.40428280044927, 43.64797961606815]]]}}

In [51]:
# Collect the neighbourhood name stored in each GeoJSON feature
hoods_list = [feature['properties']['HOOD'] for feature in gj['features']]

hoods_list[0:20]

['Trinity-Bellwoods',
 'West Humber-Clairville',
 'Mount Olive-Silverstone-Jamestown',
 'Humber Summit',
 'Thistletown-Beaumond Heights',
 'Humbermede',
 'Rexdale-Kipling',
 'Elms-Old Rexdale',
 'Pelmo Park-Humberlea',
 'Downsview-Roding-CFB',
 'Kingsview Village-The Westway',
 'Weston',
 'Rustic',
 'Humber Heights-Westmount',
 'Brookhaven-Amesbury',
 'Mount Dennis',
 'Willowridge-Martingrove-Richview',
 'Princess-Rosethorn',
 'Eringate-Centennial-West Deane',
 'Edenbridge-Humber Valley']

### d) Downloading City of Toronto Data - Age, Income, Household Sizes, Unit Sizes

In [52]:
data_url = 'https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv'

urllib.request.urlretrieve(data_url, filename = 'torontodata.csv')


('torontodata.csv', <http.client.HTTPMessage at 0x25cd9ae8b70>)

In [53]:
df = pd.read_csv('torontodata.csv')

df.head()

Unnamed: 0,_id,Category,Topic,Data Source,Characteristic,City of Toronto,Agincourt North,Agincourt South-Malvern West,Alderwood,Annex,Banbury-Don Mills,Bathurst Manor,Bay Street Corridor,Bayview Village,Bayview Woods-Steeles,Bedford Park-Nortown,Beechborough-Greenbrook,Bendale,Birchcliffe-Cliffside,Black Creek,Blake-Jones,Briar Hill-Belgravia,Bridle Path-Sunnybrook-York Mills,Broadview North,Brookhaven-Amesbury,Cabbagetown-South St. James Town,Caledonia-Fairbank,Casa Loma,Centennial Scarborough,Church-Yonge Corridor,Clairlea-Birchmount,Clanton Park,Cliffcrest,Corso Italia-Davenport,Danforth,Danforth East York,Don Valley Village,Dorset Park,Dovercourt-Wallace Emerson-Junction,Downsview-Roding-CFB,Dufferin Grove,East End-Danforth,Edenbridge-Humber Valley,Eglinton East,Elms-Old Rexdale,Englemount-Lawrence,Eringate-Centennial-West Deane,Etobicoke West Mall,Flemingdon Park,Forest Hill North,Forest Hill South,Glenfield-Jane Heights,Greenwood-Coxwell,Guildwood,Henry Farm,High Park North,High Park-Swansea,Highland Creek,Hillcrest Village,Humber Heights-Westmount,Humber Summit,Humbermede,Humewood-Cedarvale,Ionview,Islington-City Centre West,Junction Area,Keelesdale-Eglinton West,Kennedy Park,Kensington-Chinatown,Kingsview Village-The Westway,Kingsway South,Lambton Baby Point,L'Amoreaux,Lansing-Westgate,Lawrence Park North,Lawrence Park South,Leaside-Bennington,Little Portugal,Long Branch,Malvern,Maple Leaf,Markland Wood,Milliken,Mimico (includes Humber Bay Shores),Morningside,Moss Park,Mount Dennis,Mount Olive-Silverstone-Jamestown,Mount Pleasant East,Mount Pleasant West,New Toronto,Newtonbrook East,Newtonbrook West,Niagara,North Riverdale,North St. 
James Town,Oakridge,Oakwood Village,O'Connor-Parkview,Old East York,Palmerston-Little Italy,Parkwoods-Donalda,Pelmo Park-Humberlea,Playter Estates-Danforth,Pleasant View,Princess-Rosethorn,Regent Park,Rexdale-Kipling,Rockcliffe-Smythe,Roncesvalles,Rosedale-Moore Park,Rouge,Runnymede-Bloor West Village,Rustic,Scarborough Village,South Parkdale,South Riverdale,St.Andrew-Windfields,Steeles,Stonegate-Queensway,Tam O'Shanter-Sullivan,Taylor-Massey,The Beaches,Thistletown-Beaumond Heights,Thorncliffe Park,Trinity-Bellwoods,University,Victoria Village,Waterfront Communities-The Island,West Hill,West Humber-Clairville,Westminster-Branson,Weston,Weston-Pelham Park,Wexford/Maryvale,Willowdale East,Willowdale West,Willowridge-Martingrove-Richview,Woburn,Woodbine Corridor,Woodbine-Lumsden,Wychwood,Yonge-Eglinton,Yonge-St.Clair,York University Heights,Yorkdale-Glen Park
0,1,Neighbourhood Information,Neighbourhood Information,City of Toronto,Neighbourhood Number,,129,128,20,95,42,34,76,52,49,39,112,127,122,24,69,108,41,57,30,71,109,96,133,75,120,33,123,92,66,59,47,126,93,26,83,62,9,138,5,32,11,13,44,102,101,25,65,140,53,88,87,134,48,8,21,22,106,125,14,90,110,124,78,6,15,114,117,38,105,103,56,84,19,132,29,12,130,17,135,73,115,2,99,104,18,50,36,82,68,74,121,107,54,58,80,45,23,67,46,10,72,4,111,86,98,131,89,28,139,85,70,40,116,16,118,61,63,3,55,81,79,43,77,136,1,35,113,91,119,51,37,7,137,64,60,94,100,97,27,31
1,2,Neighbourhood Information,Neighbourhood Information,City of Toronto,TSNS2020 Designation,,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,Emerging Neighbourhood,No Designation,NIA,No Designation,No Designation,No Designation,NIA,NIA,Emerging Neighbourhood,No Designation,No Designation,NIA,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,Emerging Neighbourhood,NIA,NIA,No Designation,NIA,No Designation,No Designation,NIA,NIA,No Designation,NIA,No Designation,No Designation,Emerging Neighbourhood,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,Emerging Neighbourhood,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,NIA,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,NIA,NIA,NIA,No Designation,No Designation,Emerging Neighbourhood,No Designation,No Designation,NIA,No Designation,NIA,NIA,No Designation,No Designation,NIA,No Designation,NIA,No Designation,Emerging Neighbourhood,NIA,NIA,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,Emerging Neighbourhood
2,3,Population,Population and dwellings,Census Profile 98-316-X2016001,"Population, 2016",2731571,29113,23757,12054,30526,27695,15873,25797,21396,13154,23236,6577,29960,22291,21737,7727,14257,9266,11499,17757,11669,9955,10968,13362,31340,26984,16472,15935,14133,9666,17180,27051,25003,36625,35052,11785,21381,15535,22776,9456,22372,18588,11848,21933,12806,10732,30491,14417,9917,15723,22162,23925,12494,16934,10948,12416,15545,14365,13641,43965,14366,11058,17123,17945,22000,9271,7985,43993,16164,14607,15179,16828,15559,10084,43794,10111,10554,26572,33964,17455,20506,13593,32954,16775,29658,11463,16097,23831,31180,11916,18615,13845,21210,18675,9233,13826,34805,10722,7804,15818,11051,10803,10529,22246,14974,20923,46496,10070,9941,16724,21849,27876,17812,24623,25051,27446,15683,21567,10360,21108,16556,7607,17510,65913,27392,33312,26274,17992,11098,27917,50434,16936,22156,53485,12541,7865,14349,11817,12528,27593,14804
3,4,Population,Population and dwellings,Census Profile 98-316-X2016001,"Population, 2011",2615060,30279,21988,11904,29177,26918,15434,19348,17671,13530,23185,6488,27876,21856,22057,7763,14302,8713,11563,17787,12053,9851,10487,13093,28349,24770,14612,15703,13743,9444,16712,26739,24363,34631,34659,11449,20839,14943,22829,9550,22086,18810,10927,22168,12474,10926,31390,14083,9816,11333,21292,21740,13097,17656,10583,12525,15853,14108,13091,38084,14027,10638,17058,18495,21723,9170,7921,44919,14642,14541,15070,17011,12050,9632,45086,10197,10436,27167,26541,17587,16306,13145,32788,15982,28593,10900,16423,23052,21274,12191,17832,13497,21073,18316,9118,13746,34617,8710,7653,16144,11197,10007,10488,22267,15050,20631,45912,9632,9951,16609,21251,25642,17958,25017,24691,27398,15594,21130,10138,19225,16802,7782,17182,43361,26547,34100,25446,18170,12010,27018,45041,15004,21343,53350,11703,7826,13986,10578,11652,27713,14687
4,5,Population,Population and dwellings,Census Profile 98-316-X2016001,Population Change 2011-2016,4.50%,-3.90%,8.00%,1.30%,4.60%,2.90%,2.80%,33.30%,21.10%,-2.80%,0.20%,1.40%,7.50%,2.00%,-1.50%,-0.50%,-0.30%,6.30%,-0.60%,-0.20%,-3.20%,1.10%,4.60%,2.10%,10.60%,8.90%,12.70%,1.50%,2.80%,2.40%,2.80%,1.20%,2.60%,5.80%,1.10%,2.90%,2.60%,4.00%,-0.20%,-1.00%,1.30%,-1.20%,8.40%,-1.10%,2.70%,-1.80%,-2.90%,2.40%,1.00%,38.70%,4.10%,10.10%,-4.60%,-4.10%,3.40%,-0.90%,-1.90%,1.80%,4.20%,15.40%,2.40%,3.90%,0.40%,-3.00%,1.30%,1.10%,0.80%,-2.10%,10.40%,0.50%,0.70%,-1.10%,29.10%,4.70%,-2.90%,-0.80%,1.10%,-2.20%,28.00%,-0.80%,25.80%,3.40%,0.50%,5.00%,3.70%,5.20%,-2.00%,3.40%,46.60%,-2.30%,4.40%,2.60%,0.70%,2.00%,1.30%,0.60%,0.50%,23.10%,2.00%,-2.00%,-1.30%,8.00%,0.40%,-0.10%,-0.50%,1.40%,1.30%,4.50%,-0.10%,0.70%,2.80%,8.70%,-0.80%,-1.60%,1.50%,0.20%,0.60%,2.10%,2.20%,9.80%,-1.50%,-2.20%,1.90%,52.00%,3.20%,-2.30%,3.30%,-1.00%,-7.60%,3.30%,12.00%,12.90%,3.80%,0.30%,7.20%,0.50%,2.60%,11.70%,7.50%,-0.40%,0.80%


For this analysis, we remove unneeded data from the file and reshape it so that each neighborhood occupies one row and each characteristic of interest becomes a column.

In [54]:
del df['Data Source']
del df['Category']
del df['City of Toronto']

rowtopics_to_keep = ['Neighbourhood Information',
                     'Age characteristics',
                     'Household and dwelling characteristics', 
                     'Household characteristics', 
                     'Income of individuals in 2015']
rowcharacteristics_to_keep = ['Children (0-14 years)',
                              'Youth (15-24 years)',
                              'Working Age (25-54 years)',
                              'Pre-retirement (55-64 years)',
                              'Seniors (65+ years)',
                              'Older Seniors (85+ years)',
                              '  Single-detached house',
                              '  Apartment in a building that has five or more storeys',
                              '    Semi-detached house',
                              '    Row house',
                              '    Apartment or flat in a duplex',
                              '    Apartment in a building that has fewer than five storeys',
                              '    Other single-attached house',
                              '  Movable dwelling',
                              '  1 person',
                              '  2 persons',
                              '  3 persons',
                              '  4 persons',
                              '  5 or more persons',
                              ' Average household size',
                              '  No bedrooms',
                              '  1 bedroom',
                              '  2 bedrooms',
                              '  3 bedrooms',
                              '  4 or more bedrooms',
                              'Neighbourhood Number',
                              '    Under $10,000 (including loss)',
                              '    $10,000 to $19,999',
                              '    $20,000 to $29,999',
                              '    $30,000 to $39,999',
                              '    $40,000 to $49,999',
                              '    $50,000 to $59,999',
                              '    $60,000 to $69,999',
                              '    $70,000 to $79,999',
                              '    $80,000 and over']

df = df[df.Topic.isin(rowtopics_to_keep)]
df = df[df.Characteristic.isin(rowcharacteristics_to_keep)]



In [55]:
# Remove duplicates of the income ranges: the original data set repeats each range label
# (it is not laid out according to tidy data principles), so only rows with _id 989-997 are kept
ids_to_remove = df.loc[(df.Topic == 'Income of individuals in 2015') & ~df['_id'].isin(range(989, 998)), '_id']

df.set_index('_id', inplace = True)
df.drop(ids_to_remove, inplace = True)
del df['Topic']

  


In [56]:
# TRANSPOSE DATA 
df.transpose().to_csv('torontocleaned.csv', sep = ",", header = False)

In [57]:
df = pd.read_csv('torontocleaned.csv', thousands = ",")
df.rename(columns = {'Characteristic':'Hood'}, inplace = True)

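# Normalize column names; regex=False treats the parentheses as literal characters
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False)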
df.head()

Unnamed: 0,hood,neighbourhood_number,children_0-14_years,youth_15-24_years,working_age_25-54_years,pre-retirement_55-64_years,seniors_65+_years,older_seniors_85+_years,single-detached_house,apartment_in_a_building_that_has_five_or_more_storeys,semi-detached_house,row_house,apartment_or_flat_in_a_duplex,apartment_in_a_building_that_has_fewer_than_five_storeys,other_single-attached_house,movable_dwelling,1_person,2_persons,3_persons,4_persons,5_or_more_persons,average_household_size,"under_$10,000_including_loss","$10,000_to_$19,999","$20,000_to_$29,999","$30,000_to_$39,999","$40,000_to_$49,999","$50,000_to_$59,999","$60,000_to_$69,999","$70,000_to_$79,999","$80,000_and_over",no_bedrooms,1_bedroom,2_bedrooms,3_bedrooms,4_or_more_bedrooms
0,Agincourt North,129,3840,3705,11305,4230,6045,925,3345,2120,805,1440,645,735,15,5,1350,2370,1995,1750,1645,3.16,5225,6475,3945,2780,2005,1220,745,525,720,20,965,1790,2860,3490
1,Agincourt South-Malvern West,128,3075,3360,9965,3265,4105,555,2790,3145,330,515,695,610,65,0,1610,2325,1680,1335,1175,2.88,4590,4640,3075,2270,1665,1150,695,500,710,15,915,2235,2560,2405
2,Alderwood,20,1760,1235,5220,1825,2015,320,2840,255,545,85,330,560,0,0,1105,1440,885,795,390,2.6,1395,1565,1590,1325,1170,930,625,510,830,20,325,1145,2055,1090
3,Annex,95,2360,3750,15040,3480,5910,1040,645,8165,1185,595,455,4795,90,0,7885,5220,1540,885,390,1.8,3995,3790,2955,2590,2370,1930,1630,1215,5200,855,6995,4555,2000,1540
4,Banbury-Don Mills,42,3605,2730,10810,3555,6975,1640,3485,6270,285,740,40,1315,0,0,4360,3820,1755,1515,675,2.23,3320,3265,2725,2450,2360,1905,1550,1135,3675,50,3010,4245,2545,2280
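
The transpose step above goes through an intermediate CSV file. The same reshape can be sketched in memory on a toy frame, assuming the counts are stored as strings that may carry thousands separators (which the `thousands = ","` argument suggests); the names `wide` and `tidy` are illustrative:

```python
import pandas as pd

# Toy frame in the same wide layout as the city file:
# one row per characteristic, one column per neighbourhood
wide = pd.DataFrame({
    'Characteristic': ['Neighbourhood Number', 'Children (0-14 years)'],
    'Agincourt North': ['129', '3,840'],
    'Alderwood': ['20', '1,760'],
})

# Transpose so that neighbourhoods become rows and characteristics become columns
tidy = wide.set_index('Characteristic').T

# Strip thousands separators and convert every column to a numeric dtype
tidy = tidy.apply(lambda col: pd.to_numeric(col.str.replace(',', '', regex=False)))

# Move the neighbourhood names from the index into a regular 'hood' column
tidy = tidy.rename_axis('hood').reset_index()
tidy.columns.name = None

print(tidy)
```

Writing to disk and reading back, as the notebook does, has the side benefit of letting `read_csv` infer the dtypes; the in-memory variant makes that conversion explicit.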


We will subset the data according to the following features for development of our final maps:  
- Age,  
- Income,  
- Household sizes,  
- Unit sizes  

#### Subset: Ages 

For this subset, we compute a weighted average age per neighborhood: the midpoint of each age bracket is weighted by the number of residents in that bracket. The resulting column is used to shade the neighborhoods in the choropleth map. 

In [58]:
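from statistics import mean

# Midpoint of each age bracket, used as the representative age of that bracket
age_bins = [mean([0,14]),mean([15,24]),mean([25,54]),mean([55,64]),mean([65,84]),mean([85,100])]
cols = df.columns[2:8]

df_ages = df.iloc[:,0:8].copy() # explicit copy avoids SettingWithCopyWarning

# Weight each bracket's population count by its midpoint
for col, midpoint in zip(cols, age_bins):
    df_ages[col] = df_ages[col]*midpoint

# Weighted average age = sum(count * midpoint) / sum(count)
df_ages['average_age'] = df_ages.iloc[:,2:8].sum(axis = 1)/df.iloc[:,2:8].sum(axis = 1)
    
df_ages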

Unnamed: 0,hood,neighbourhood_number,children_0-14_years,youth_15-24_years,working_age_25-54_years,pre-retirement_55-64_years,seniors_65+_years,older_seniors_85+_years,average_age
0,Agincourt North,129,26880,72247.5,446547.5,251685.0,450352.5,85562.5,44.368552
1,Agincourt South-Malvern West,128,21525,65520.0,393617.5,194267.5,305822.5,51337.5,42.429188
2,Alderwood,20,12320,24082.5,206190.0,108587.5,150117.5,29600.0,42.900808
3,Annex,95,16520,73125.0,594080.0,207060.0,440295.0,96200.0,45.195693
4,Banbury-Don Mills,42,25235,53235.0,426995.0,211522.5,519637.5,151700.0,47.358861
5,Bathurst Manor,34,16275,37830.0,262872.5,120785.0,219030.0,65675.0,43.522139
6,Bay Street Corridor,76,11865,133770.0,516067.5,104720.0,180290.0,30525.0,37.399062
7,Bayview Village,52,16905,48847.5,407245.0,151130.0,269317.5,56425.0,43.185724
8,Bayview Woods-Steeles,49,10605,31882.5,177355.0,108587.5,274532.5,68450.0,48.337833
9,Bedford Park-Nortown,39,31885,62595.0,332195.0,182962.5,296510.0,61050.0,40.485454
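
One point worth checking in this calculation: the "Seniors (65+ years)" bracket appears to include the "Older Seniors (85+ years)" bracket. For Agincourt North the first five brackets already sum to 29,125, close to the 2016 population of 29,113 shown earlier, so the 85+ group seems to be counted twice. The sketch below reproduces the value in the table for that neighborhood and compares it with an adjusted version that subtracts the 85+ count from the 65+ count; the adjustment is an assumption about the data, not part of the original analysis.

```python
# Counts for Agincourt North, taken from the cleaned table (before weighting)
counts = {
    'children_0-14': 3840,
    'youth_15-24': 3705,
    'working_age_25-54': 11305,
    'pre-retirement_55-64': 4230,
    'seniors_65+': 6045,      # appears to include the 85+ group
    'older_seniors_85+': 925,
}

# Bracket midpoints, matching age_bins in the notebook
midpoints = {
    'children_0-14': 7.0,
    'youth_15-24': 19.5,
    'working_age_25-54': 39.5,
    'pre-retirement_55-64': 59.5,
    'seniors_65+': 74.5,
    'older_seniors_85+': 92.5,
}

def weighted_average(counts, midpoints):
    """Return sum(count * midpoint) / sum(count) over all brackets."""
    total = sum(counts.values())
    return sum(counts[k] * midpoints[k] for k in counts) / total

# Value as computed in the notebook (85+ counted in two brackets)
as_computed = weighted_average(counts, midpoints)

# Adjusted: treat the 65+ bracket as 65-84 by removing the 85+ residents from it
adjusted_counts = dict(counts)
adjusted_counts['seniors_65+'] = counts['seniors_65+'] - counts['older_seniors_85+']
adjusted = weighted_average(adjusted_counts, midpoints)

print(round(as_computed, 2), round(adjusted, 2))
```

For this neighborhood the adjustment lowers the estimate by about one year, so the ranking of neighborhoods in the choropleth may shift slightly where the share of older seniors is high.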


#### Subset: Household Size

In [60]:
df_hhsize = df.iloc[:,np.r_[0:2,16:22]]
df_hhsize.head()

Unnamed: 0,hood,neighbourhood_number,1_person,2_persons,3_persons,4_persons,5_or_more_persons,average_household_size
0,Agincourt North,129,1350,2370,1995,1750,1645,3.16
1,Agincourt South-Malvern West,128,1610,2325,1680,1335,1175,2.88
2,Alderwood,20,1105,1440,885,795,390,2.6
3,Annex,95,7885,5220,1540,885,390,1.8
4,Banbury-Don Mills,42,4360,3820,1755,1515,675,2.23


#### Subset: Income

In [61]:
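df_income = df.iloc[:,np.r_[0:2,22:31]].copy() # explicit copy avoids SettingWithCopyWarning

# Midpoint of each income bracket; the open-ended top bracket is assumed to end at $150,000
income_bins = [mean([0,9999]),
               mean([10000,19999]),
               mean([20000,29999]),
               mean([30000,39999]),
               mean([40000,49999]),
               mean([50000,59999]),
               mean([60000,69999]),
               mean([70000,79999]),
               mean([80000,150000])]
cols = df_income.columns[2:11]

# Weight each bracket's count by its midpoint
for col, midpoint in zip(cols, income_bins):
    df_income[col] = df_income[col]*midpoint

# Weighted average income = sum(count * midpoint) / sum(count)
df_income['average_income'] = df_income.iloc[:,2:11].sum(axis = 1)/df.iloc[:,22:31].sum(axis = 1)
df_income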


Unnamed: 0,hood,neighbourhood_number,"under_$10,000_including_loss","$10,000_to_$19,999","$20,000_to_$29,999","$30,000_to_$39,999","$40,000_to_$49,999","$50,000_to_$59,999","$60,000_to_$69,999","$70,000_to_$79,999","$80,000_and_over",average_income
0,Agincourt North,129,26122387.5,97121762.5,98623027.5,97298610.0,90223997.5,67099390.0,48424627.5,39374737.5,82800000,27372.611675
1,Agincourt South-Malvern West,128,22947705.0,69597680.0,76873462.5,79448865.0,74924167.5,63249425.0,45174652.5,37499750.0,81650000,28575.574372
2,Alderwood,20,6974302.5,23474217.5,39749205.0,46374337.5,52649415.0,51149535.0,40624687.5,38249745.0,95450000,39707.791247
3,Annex,95,19973002.5,56848105.0,73873522.5,90648705.0,106648815.0,106149035.0,105949185.0,91124392.5,598000000,48654.907984
4,Banbury-Don Mills,42,16598340.0,48973367.5,68123637.5,85748775.0,106198820.0,104774047.5,100749225.0,85124432.5,422625000,46411.24168
5,Bathurst Manor,34,10798920.0,38548715.0,50748985.0,54074227.5,53999400.0,49224552.5,43874662.5,35249765.0,131100000,36863.951715
6,Bay Street Corridor,76,35446455.0,44923502.5,54498910.0,68774017.5,77849135.0,86074217.5,100424227.5,84374437.5,312225000,37738.537866
7,Bayview Village,52,19448055.0,42073597.5,53873922.5,63174097.5,76499150.0,76999300.0,72149445.0,64499570.0,234025000,39568.814048
8,Bayview Woods-Steeles,49,11073892.5,29774007.5,35624287.5,37974457.5,41174542.5,40149635.0,37699710.0,34124772.5,135700000,38154.711921
9,Bedford Park-Nortown,39,12398760.0,32098930.0,51248975.0,57224182.5,60749325.0,61874437.5,60774532.5,60749595.0,592825000,55992.292845
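
The estimate depends on where the open-ended "$80,000 and over" bracket is assumed to end; the notebook uses $150,000. The sketch below recomputes the Agincourt North average for a few alternative caps to show that sensitivity. The function name `average_income` and the alternative caps are illustrative choices.

```python
# Bracket counts for Agincourt North, from the cleaned table (before weighting)
counts = [5225, 6475, 3945, 2780, 2005, 1220, 745, 525, 720]

def average_income(counts, top_cap):
    """Weighted average income, with the top bracket assumed to span 80,000..top_cap."""
    # Midpoints of the eight closed brackets: 0-9,999, 10,000-19,999, ..., 70,000-79,999
    midpoints = [(lower + lower + 9999) / 2 for lower in range(0, 80000, 10000)]
    # Midpoint of the open-ended top bracket under the assumed cap
    midpoints.append((80000 + top_cap) / 2)
    return sum(c * m for c, m in zip(counts, midpoints)) / sum(counts)

for cap in (150000, 200000, 300000):
    print(cap, round(average_income(counts, cap), 2))
```

With the $150,000 cap this reproduces the value in the table above. Neighborhoods with a large top bracket, such as Annex or Bedford Park-Nortown, would move the most under a different cap.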


#### Subset: Unit Sizes (i.e., # of bedrooms)

In [62]:
df_bedrooms = df.iloc[:,np.r_[0:2,31:36]].copy() # explicit copy avoids SettingWithCopyWarning

# Bedrooms per bracket: 'no bedrooms' uses a near-zero weight, '4 or more' is counted as 4
br_bins = [0.0000000000001,1,2,3,4]
cols = df_bedrooms.columns[2:7]

# Weight each bracket's unit count by its number of bedrooms
for col, bedrooms in zip(cols, br_bins):
    df_bedrooms[col] = df_bedrooms[col]*bedrooms

# Weighted average bedrooms = sum(count * bedrooms) / sum(count)
df_bedrooms['average_brs'] = df_bedrooms.iloc[:,2:7].sum(axis = 1)/df.iloc[:,-5:].sum(axis = 1)

df_bedrooms.head()


Unnamed: 0,hood,neighbourhood_number,no_bedrooms,1_bedroom,2_bedrooms,3_bedrooms,4_or_more_bedrooms,average_brs
0,Agincourt North,129,2e-12,965,3580,8580,13960,2.968219
1,Agincourt South-Malvern West,128,1.5e-12,915,4470,7680,9620,2.790283
2,Alderwood,20,2e-12,325,2290,6165,4360,2.834951
3,Annex,95,8.55e-11,6995,9110,6000,6160,1.772656
4,Banbury-Don Mills,42,5e-12,3010,8490,7635,9120,2.329349


## 2. Building Maps 

Using the Foursquare API, we can develop a map of Toronto's neighborhoods based on venue types.

In [63]:
# Obtain coordinates of Toronto, Canada
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [64]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # replace with your own secret; never publish real credentials
VERSION = '20190829'

### a) Foursquare API Request & Wrangling

In [82]:
# FUNCTION TO QUERY THE FOURSQUARE API
LIMIT = 100  # maximum number of venues returned for each neighborhood queried

def getNearbyVenues(names, latitudes, longitudes, radius=250):
    
    venues_list=[] # initialize empty list of venues for each neighborhood
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
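
The function above indexes straight into the response, so a reply without groups, or a venue without categories, would raise an exception. A more defensive way to flatten one response is sketched below. The helper `extract_venue_rows` and the sample payload are hypothetical: the payload is hand-written in the shape the function above indexes into and is not real API output.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore response into tuples matching the nearby_venues columns."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None when a venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample payload (hypothetical venues)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.80, 'lng': -79.19},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Spot',
               'location': {'lat': 43.81, 'lng': -79.20},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Rouge, Malvern', 43.806686, -79.194353, sample)
empty = extract_venue_rows('Nowhere', 0.0, 0.0, {'response': {}})

print(rows)
print(empty)
```

Keeping the parsing separate from the HTTP request also makes it possible to test the flattening logic without network access or API credentials.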

In [83]:
# Run the function to extract venues for each neighborhood; each neighborhood name is printed as it is processed
toronto_venues = getNearbyVenues(names = df_coordinates['Neighborhoods'], 
                                 latitudes = df_coordinates['Latitude'],
                                 longitudes = df_coordinates['Longitude']
                                )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

By using one-hot encoding on the venue types per neighborhood, we can quantify the presence of certain venue types in preparation for use within a k-means clustering model. 
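
To make the arithmetic concrete, here is a minimal sketch in plain Python of what one-hot encoding followed by a per-neighborhood mean produces. The venue rows are made-up toy data and `venue_frequencies` is a hypothetical helper name, not part of the notebook, which does the same work with `pd.get_dummies` and `groupby(...).mean()`.

```python
from collections import Counter, defaultdict

# Toy (neighborhood, venue category) pairs; illustrative data only.
toy_venues = [
    ("A", "Coffee Shop"),
    ("A", "Coffee Shop"),
    ("A", "Park"),
    ("A", "Pizza Place"),
    ("B", "Park"),
]

def venue_frequencies(rows):
    """Return {neighborhood: {category: share of that neighborhood's venues}}.

    This matches one-hot encoding each row and averaging the indicator
    columns per neighborhood: each share is count / total venues.
    """
    categories = sorted({category for _, category in rows})
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {
            category: counter[category] / sum(counter.values())
            for category in categories
        }
        for neighborhood, counter in counts.items()
    }

frequencies = venue_frequencies(toy_venues)
print(frequencies["A"])
print(frequencies["B"])
```

Each neighborhood's shares sum to 1, so a neighborhood with a single recorded venue gets a frequency of 1.0 for that one category and 0.0 everywhere else.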

In [84]:
# One-hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix = "", prefix_sep = "")

# Add neighborhood column back to dataframe and move to first column
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

fixed_columns = ['Neighborhood'] + [col for col in toronto_onehot.columns if col != 'Neighborhood']
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head(10)

# Average the one-hot rows per neighborhood to get the frequency of each venue type
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()

print(toronto_grouped.shape)
toronto_grouped.head(10)

(857, 191)
(76, 191)


Unnamed: 0,Neighborhood,Adult Boutique,Airport Lounge,Airport Service,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Stop,Business Service,Café,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Festival,Field,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Hardware Store,Health & Beauty Service,History Museum,Home Service,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Housing Development,Ice Cream Shop,Indian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Luggage Store,Mac & Cheese Joint,Malay Restaurant,Martial Arts Dojo,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Museum,Music Store,Music Venue,New American Restaurant,Noodle House,Opera House,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Portuguese Restaurant,Print Shop,Pub,Ramen Restaurant,Record Shop,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Smoothie Shop,Snack Place,Social Club,Soup Place,Spa,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.027027,0.0,0.0,0.027027,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Bathurst Manor, Downsview North, Wilson Heights",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Bloordale Gardens, Eringate, Markland Wood, Ol...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0


Next, we sort the venue-type frequencies within each neighborhood and rearrange the dataframe so that each row lists that neighborhood's most common venue types in ranked column order. 

In [85]:
## Function to sort the venues 
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

## Set up the dataframe parameters 
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Steakhouse,Coffee Shop,Hotel,Bar,Asian Restaurant,Seafood Restaurant,Salad Place,Pharmacy,Café,Japanese Restaurant
1,Agincourt,Sandwich Place,Concert Hall,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant
2,"Alderwood, Long Branch",Pizza Place,Pharmacy,Dance Studio,Coffee Shop,Donut Shop,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant
3,"Bathurst Manor, Downsview North, Wilson Heights",Coffee Shop,Video Store,Sandwich Place,Restaurant,Fast Food Restaurant,Sushi Restaurant,Deli / Bodega,Fried Chicken Joint,Middle Eastern Restaurant,Pizza Place
4,Bayview Village,Shopping Plaza,Yoga Studio,Concert Hall,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store
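
One caveat is visible in the table above: when a neighborhood has fewer distinct venue types than the number of ranks requested (Agincourt has a single category with frequency 1.0), the remaining ranks are filled with zero-frequency categories in whatever order the sort leaves the ties. The sketch below shows one possible variant that leaves those ranks empty instead. It is a plain-Python illustration; `top_venues` and the toy frequencies are hypothetical and not part of the notebook.

```python
def top_venues(frequencies, num_top_venues):
    """Rank categories by frequency, keeping only those that actually occur.

    Ties are broken alphabetically so the result is deterministic; ranks that
    cannot be filled are padded with None.
    """
    present = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    ranked = sorted(present, key=lambda item: (-item[1], item[0]))
    names = [cat for cat, _ in ranked[:num_top_venues]]
    return names + [None] * (num_top_venues - len(names))

# Toy row: one venue type present, two absent.
toy_row = {"Sandwich Place": 1.0, "Concert Hall": 0.0, "Field": 0.0}
print(top_venues(toy_row, 3))
```

Whether to pad with empty values or keep the zero-frequency fillers is a presentation choice; the clustering step works on the frequency table and is unaffected either way.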


### b) K-Means Clustering

Using the venue-type frequencies above, we apply a k-means clustering algorithm to group similar neighborhoods together. Note that a relatively high number of clusters (k = 20) is used to better highlight the diversity of Toronto's neighborhoods.

In [104]:
kclusters = 20

toronto_grouped_clustering = toronto_grouped.drop(columns = 'Neighborhood')

# Run k-means Clustering 
kmeans = KMeans(init = 'k-means++', n_clusters = kclusters, random_state = 0, n_init = 12).fit(toronto_grouped_clustering)

kmeans.labels_

array([ 0, 19,  0,  0,  7,  0, 17, 16,  0,  0, 17, 17,  0, 12, 17,  0, 17,
        0,  0,  8, 13, 17, 14,  0,  0, 17,  0,  0,  0,  3, 17,  5, 15,  0,
        2,  0,  0,  0,  0,  0,  0,  0, 17,  1,  0,  2, 17,  0, 17,  0,  0,
        2, 18,  2, 17,  2,  4, 17,  0,  9,  0, 17,  0, 11, 17, 17, 17,  3,
        0,  0, 12,  2,  6, 17, 10, 17])
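
For intuition about what `KMeans` does with the frequency table, the sketch below implements the basic assign-then-update loop (Lloyd's algorithm) in plain Python on toy frequency vectors. It is not scikit-learn's implementation: it uses plain random initialization rather than `k-means++`, runs a fixed number of iterations, and the names `kmeans_sketch` and `toy_points` are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_sketch(points, k, iterations=20, seed=0):
    """Assign each point to its nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # plain random init, not k-means++
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Toy venue-frequency vectors: two "coffee-heavy" and two "park-heavy" rows.
toy_points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centroids = kmeans_sketch(toy_points, k=2)
print(labels)
```

Rows with similar venue mixes end up with the same label; the label numbers themselves are arbitrary identifiers and carry no ordering.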

In [105]:
# add clustering labels
if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.drop(['Cluster Labels'], axis = 'columns', inplace = True)

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_coordinates

# merge the ranked venue table into df_coordinates so each neighborhood row carries its coordinates, cluster label, and top venues
toronto_merged = pd.merge(left = toronto_merged, right = neighborhoods_venues_sorted.set_index('Neighborhood'), left_on='Neighborhoods', right_on='Neighborhood')

toronto_merged # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhoods,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1G,Scarborough,Woburn,43.770992,-79.216917,6,Korean Restaurant,Yoga Studio,Flower Shop,Fish & Chips Shop,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant
1,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,17,Hakka Restaurant,Lounge,Thai Restaurant,Dumpling Restaurant,Fish & Chips Shop,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant
2,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,9,Playground,Dog Run,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant
3,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577,8,Business Service,Yoga Studio,Convenience Store,Fish & Chips Shop,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant
4,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476,17,American Restaurant,Motel,Movie Theater,Yoga Studio,Dumpling Restaurant,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant
5,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848,16,Pizza Place,Café,Yoga Studio,Donut Shop,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant
6,M1S,Scarborough,Agincourt,43.7942,-79.262029,19,Sandwich Place,Concert Hall,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store,Dumpling Restaurant
7,M1T,Scarborough,"Clarks Corners, Sullivan, Tam O'Shanter",43.781638,-79.304302,13,Shopping Mall,Yoga Studio,Concert Hall,Field,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store
8,M1W,Scarborough,L'Amoreaux West,43.799525,-79.318389,0,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Pharmacy,Sandwich Place,Thrift / Vintage Store,Pizza Place,Grocery Store,Dance Studio,Convenience Store
9,M2H,North York,Hillcrest Village,43.803762,-79.363452,17,Mediterranean Restaurant,Pool,Golf Course,Yoga Studio,Dog Run,Festival,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant


### c) Neighborhood Maps

To build the neighborhood maps, we use the folium package with a choropleth map for each dimension (i.e., age, income, etc.) and add markers based on the neighborhood coordinate data (colored by clusters created from the k-means algorithm). 
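
The marker colors only need to be distinct per cluster label. As a minimal sketch of that idea, the helper below builds one hex color per cluster using the standard-library `colorsys` module in place of the matplotlib `rainbow` colormap used in the notebook. `cluster_palette` is a hypothetical name, and the saturation and brightness values are arbitrary choices.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters hex color strings with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        # Spread hues evenly around the color wheel; fixed saturation/brightness
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.85, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(20)
print(palette[:3])
```

A list built this way could be indexed by cluster label when setting `color` and `fill_color` on each `folium.CircleMarker`.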

#### i) Age Distribution (Weighted Average by Neighborhood)

In [106]:
# Choose data to plot on choropleth
data = df_ages
variable = "average_age"

# Create map of Toronto, ON 
age_map = folium.Map(location = [latitude, longitude], zoom_start = 11)

folium.Choropleth(
    geo_data = gj,
    data = data,
    columns = ['neighbourhood_number', variable],
    name = 'choropleth',
    key_on = 'feature.properties.HOODNUM',
    fill_color='BuPu').add_to(age_map)

# Set color scheme for clusters (one color per cluster label)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to cluster map 
# Note: rainbow[cluster-1] sends label 0 to the last palette entry; every label still gets its own color
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhoods'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat,lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster-1],
        fill = True, 
        fill_color = rainbow[cluster-1],
        fill_opacity=0.7).add_to(age_map)

age_map

#### ii) Income Distribution

In [108]:
# Choose data to plot on choropleth
data = df_income
variable = "average_income"

# Create map of Toronto, ON 
income_map = folium.Map(location = [latitude, longitude], zoom_start = 11)

folium.Choropleth(
    geo_data = gj,
    data = data,
    columns = ['neighbourhood_number', variable],
    name = 'choropleth',
    key_on = 'feature.properties.HOODNUM',
    fill_color='YlGn').add_to(income_map)

# Set color scheme for clusters (one color per cluster label)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to cluster map 
# Note: rainbow[cluster-1] sends label 0 to the last palette entry; every label still gets its own color
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhoods'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat,lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster-1],
        fill = True, 
        fill_color = rainbow[cluster-1],
        fill_opacity=0.7).add_to(income_map)

income_map

#### iii) Household Sizes

In [109]:
# Choose data to plot on choropleth
data = df_hhsize
variable = "average_household_size"

# Create map of Toronto, ON 
hh_map = folium.Map(location = [latitude, longitude], zoom_start = 11)

folium.Choropleth(
    geo_data = gj,
    data = data,
    columns = ['neighbourhood_number', variable],
    name = 'choropleth',
    key_on = 'feature.properties.HOODNUM',
    fill_color='Blues').add_to(hh_map)

# Set color scheme for clusters (one color per cluster label)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to cluster map 
# Note: rainbow[cluster-1] sends label 0 to the last palette entry; every label still gets its own color
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhoods'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat,lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster-1],
        fill = True, 
        fill_color = rainbow[cluster-1],
        fill_opacity=0.7).add_to(hh_map)

hh_map

#### iv) Average Number of Bedrooms 

In [110]:
# Choose data to plot on choropleth
data = df_bedrooms
variable = "average_brs"

# Create map of Toronto, ON 
br_map = folium.Map(location = [latitude, longitude], zoom_start = 11)

folium.Choropleth(
    geo_data = gj,
    data = data,
    columns = ['neighbourhood_number', variable],
    name = 'choropleth',
    key_on = 'feature.properties.HOODNUM',
    fill_color='YlOrRd').add_to(br_map)

# Set color scheme for clusters (one color per cluster label)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to cluster map 
# Note: rainbow[cluster-1] sends label 0 to the last palette entry; every label still gets its own color
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhoods'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat,lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster-1],
        fill = True, 
        fill_color = rainbow[cluster-1],
        fill_opacity=0.7).add_to(br_map)

br_map

### d) Neighborhood Character Listings (by Cluster)

In [156]:
for cluster in np.unique(kmeans.labels_): 
    print("---- Cluster #" + str(cluster) + " ----")
    # Keep Borough, Cluster Labels, and the ranked venue columns for neighborhoods in this cluster
    cluster_table = toronto_merged.loc[toronto_merged['Cluster Labels'] == cluster, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
    # Stack the ranked venue columns into one column and keep the 10 venue types that appear most often
    cluster_table = pd.DataFrame(pd.melt(cluster_table, id_vars = ['Borough','Cluster Labels'])['value'].value_counts().head(10)).reset_index().iloc[:,0]
    print(cluster_table)
    print('\n')

---- Cluster #0 ----
0             Coffee Shop
1              Restaurant
2                    Café
3    Fast Food Restaurant
4          Sandwich Place
5             Pizza Place
6          Farmers Market
7                Festival
8      Italian Restaurant
9      Falafel Restaurant
Name: index, dtype: object


---- Cluster #1 ----
0       Electronics Store
1             Pizza Place
2                Festival
3                 Dog Run
4             Yoga Studio
5          Farmers Market
6    Ethiopian Restaurant
7                   Field
8      Falafel Restaurant
9    Fast Food Restaurant
Name: index, dtype: object


---- Cluster #2 ----
0      Falafel Restaurant
1                Festival
2          Farmers Market
3                   Field
4                    Park
5             Yoga Studio
6    Ethiopian Restaurant
7    Fast Food Restaurant
8       Electronics Store
9                 Dog Run
Name: index, dtype: object


---- Cluster #3 ----
0      Falafel Restaurant
1                Festiv

## References 

- Toronto Postal Code & Neighborhoods ([Wikipedia](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M))  
- Toronto GeoJSON Data ([Adam Wisniewski GitHub](http://adamw523.com/toronto-geojson/))  
- Toronto Neighborhood Data: Age, Income, Household Sizes, Unit Sizes ([City of Toronto](https://open.toronto.ca/dataset/neighbourhood-profiles/))  
- Toronto Neighborhood Venue Data ([Foursquare](https://foursquare.com/))  