Segmenting and Clustering Neighborhoods in Toronto

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

Start by creating a new Notebook for this assignment. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below.

Import libraries

In [1]:
import pandas as pd
import numpy as np

#json tools
import json
from pandas.io.json import json_normalize

#scraping
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup

#geocoders
from geopy.geocoders import Nominatim

#visualization libraries
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

#kmeans clustering
from sklearn.cluster import KMeans

print('Done!')

ImportError: No module named 'folium'

Task 1 - Scraping Wikipedia page, creating Pandas DF, cleaning data

Using BeautifulSoup and URLopen libraries

In [2]:
wlink = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
raw_page = urlopen(wlink).read().decode('utf-8')
page = BeautifulSoup(raw_page, 'html.parser')
table = page.body.table.tbody

Next, transforming the table data to Pandas Dataframe

In [3]:
#functions for getting cell and row data

def table_cell(i):
    cells = i.find_all('td')
    row = []
    
    for cell in cells:
        if cell.a:            
            if (cell.a.text):
                row.append(cell.a.text)
                continue
        row.append(cell.string.strip())
        
    return row

def table_row():    
    data = []  
    
    for tr in table.find_all('tr'):
        row = table_cell(tr)
        if len(row) != 3:
            continue
        data.append(row)        
    
    return data

In [4]:
#writing into pandas dataframe
data = table_row()
columns = ['Postcode', 'Borough', 'Neighbourhood']
df = pd.DataFrame(data, columns=columns)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Cleaning the data:
1.Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
2.More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
3.If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
4.Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
5.In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

In [5]:
#dropping the "Not Assigned" borough
df1 = df[df.Borough != 'Not assigned']
df1 = df1.sort_values(by=['Postcode','Borough'])

df1.reset_index(inplace=True)
df1.drop('index',axis=1,inplace=True)
df1.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,Rouge
1,M1B,Scarborough,Malvern
2,M1C,Scarborough,Highland Creek
3,M1C,Scarborough,Rouge Hill
4,M1C,Scarborough,Port Union


In [6]:
#Consolidating the neighbourhoods that share the postcode

df_postcodes = df1['Postcode']
df_postcodes.drop_duplicates(inplace=True)
df2 = pd.DataFrame(df_postcodes)
df2['Borough'] = '';
df2['Neighbourhood'] = '';


df2.reset_index(inplace=True)
df2.drop('index', axis=1, inplace=True)
df1.reset_index(inplace=True)
df1.drop('index', axis=1, inplace=True)

for i in df2.index:
    for j in df1.index:
        if df2.iloc[i, 0] == df1.iloc[j, 0]:
            df2.iloc[i, 1] = df1.iloc[j, 1]
            df2.iloc[i, 2] = df2.iloc[i, 2] + ',' + df1.iloc[j, 2]
            
for i in df2.index:
    s = df2.iloc[i, 2]
    if s[0] == ',':
        s =s [1:]
    df2.iloc[i,2 ] = s
    
df2.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [7]:
#Checking dataframe shape
df2.shape

(103, 3)

Task 2 - Get Coordinates

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

Using the provided Geospatial_Coordinates.csv file to get the coordinates:

In [8]:

#reading the file to coord dataframe
df2['Latitude'] = '0';
df2['Longitude'] = '0';

coord = pd.read_csv('https://cocl.us/Geospatial_data')

In [9]:

#merging dataframe that contain coordinates with the one that contains borough names
for i in df2.index:
    for j in coord.index:
        if df2.iloc[i, 0] == coord.iloc[j, 0]:
            df2.iloc[i, 3] = coord.iloc[j, 1]
            df2.iloc[i, 4] = coord.iloc[j, 2]

#checking the results            
df2.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.8067,-79.1944
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.7845,-79.1605
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.7636,-79.1887
3,M1G,Scarborough,Woburn,43.771,-79.2169
4,M1H,Scarborough,Cedarbrae,43.7731,-79.2395


Task 3 - Analysis


Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:

1.To add enough Markdown cells to explain what you decided to do and to report any observations you make.

2.To generate maps to visualize your neighborhoods and how they cluster together.

3.1 Select only the neighbourhoods of Downtown Toronto

Choose the neighbourhoods that contain word " Downtown Toronto"

In [10]:
toronto = df2[df2['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
toronto.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.6796,-79.3775
1,M4X,Downtown Toronto,"Cabbagetown,St. James Town",43.668,-79.3677
2,M4Y,Downtown Toronto,Church and Wellesley,43.6659,-79.3832
3,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.6543,-79.3606
4,M5B,Downtown Toronto,"Ryerson,Garden District",43.6572,-79.3789


In [11]:
#get the coordinates for Toronto
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Toronto are 43.653963, -79.387207.


In [12]:
#create the Folium map of Downtown Toronto
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(toronto['Latitude'], toronto['Longitude'], toronto['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

NameError: name 'folium' is not defined

3.2 Utilizing the Foursquare API to get top 100 venues in Downtown Toronto

In [13]:
#set credintials
CLIENT_ID = 'IPTYUZQHVW5OCDTT331BXA1SFQCJ3QCNQ2NVFZHQI5M4ZJLY' # your Foursquare ID
CLIENT_SECRET = '4ARF5SHATZIHJ2FJURJBFIUBZWYKR0UZ4FP5XHAGRE4BCZJ1' # your Foursquare Secret
VERSION = '20190323' # Foursquare API version

Borrowing the function from the lab to get Top 100 venues in Downtown Toronto within a radius of 500m:

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now write the code to run the above function on each neighborhood and create a new dataframe called manhattan_venues.

In [15]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

downtown_venues = getNearbyVenues(names=toronto['Neighbourhood'],
                                   latitudes=toronto['Latitude'],
                                   longitudes=toronto['Longitude']
                                  )

Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie


In [16]:
#checking the size of venues dataframe
downtown_venues.shape

(1281, 7)

In [17]:
#checking how many unique categories of venues are there
print('There are {} unique categories.'.format(len(downtown_venues['Venue Category'].unique())))

There are 207 unique categories.


3.3 Analyze each neighbourhood

In [18]:
# one hot encoding
dt_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_onehot['Neighborhood'] = downtown_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dt_onehot.columns[-1]] + list(dt_onehot.columns[:-1])
dt_onehot = dt_onehot[fixed_columns]

dt_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
#checking the dataframe size
dt_onehot.shape

(1281, 207)

Gouping rows by neighborhood and by taking the mean of the frequency of occurrence of each category:

In [20]:
dt_grouped = dt_onehot.groupby('Neighborhood').mean().reset_index()
dt_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.214286,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.011364,0.0,0.0
5,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.01,0.0,0.0,0.06,0.0,0.03,0.01,0.0,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Church and Wellesley,0.011364,0.011364,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011364,0.011364,0.0,0.011364,0.0
8,"Commerce Court,Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0
9,"Design Exchange,Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0


In [21]:
#checking the grouped dataframe size
dt_grouped.shape

(18, 207)

Printing out each neighborhood along with the top 5 most common venues in it:

In [22]:
num_top_venues = 5

for hood in dt_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dt_grouped[dt_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
                 venue  freq
0          Coffee Shop  0.06
1                 Café  0.05
2      Thai Restaurant  0.04
3                  Bar  0.04
4  American Restaurant  0.04


----Berczy Park----
          venue  freq
0   Coffee Shop  0.07
1  Cocktail Bar  0.05
2    Restaurant  0.04
3          Café  0.04
4        Bakery  0.04


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
              venue  freq
0   Airport Service  0.21
1    Airport Lounge  0.14
2  Airport Terminal  0.14
3   Harbor / Marina  0.07
4  Sculpture Garden  0.07


----Cabbagetown,St. James Town----
                venue  freq
0         Coffee Shop  0.10
1                Café  0.05
2  Italian Restaurant  0.05
3                 Pub  0.05
4         Pizza Place  0.05


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.16
1                Café  0.06
2  Italian Restaurant  0.05
3        Burger Joint  0.03


Converting the results to Pandas dataframe:

In [23]:
#function to sort the venues in descending order:

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [24]:
#create the new dataframe and display the top 10 venues for each neighborhood:

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dt_grouped['Neighborhood']

for ind in np.arange(dt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,American Restaurant,Bar,Steakhouse,Thai Restaurant,Hotel,Cosmetics Shop,Bakery,Burger Joint
1,Berczy Park,Coffee Shop,Cocktail Bar,Farmers Market,Italian Restaurant,Restaurant,Cheese Shop,Beer Bar,Seafood Restaurant,Bakery,Steakhouse
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Airport Terminal,Airport Lounge,Harbor / Marina,Sculpture Garden,Airport,Airport Food Court,Airport Gate,Boutique,Boat or Ferry
3,"Cabbagetown,St. James Town",Coffee Shop,Café,Pizza Place,Bakery,Restaurant,Pub,Italian Restaurant,Park,Deli / Bodega,General Entertainment
4,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Burger Joint,Sandwich Place,Spa,Indian Restaurant,Japanese Restaurant,Sushi Restaurant,Middle Eastern Restaurant


3.4 Cluster Neighborhoods

Run k-means to cluster the neighborhood into 5 clusters.

In [25]:
# set number of clusters
kclusters = 4

dt_grouped_clustering = dt_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

NameError: name 'KMeans' is not defined

In [26]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Labels', kmeans.labels_)

dt_merged = toronto

# merge downtown_grouped with toronto data to add latitude/longitude for each neighborhood
# I realized that I've misspelled the NeighboUrhood column name in Toronto dataframe. oops...
dt_merged = dt_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

dt_merged.head()

NameError: name 'kmeans' is not defined

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dt_merged['Latitude'], dt_merged['Longitude'], dt_merged['Neighbourhood'], dt_merged['Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=9,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

NameError: name 'folium' is not defined

3.5 Examine the clusters

In [28]:
#Cluster 1
dt_merged.loc[dt_merged['Labels'] == 0, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

NameError: name 'dt_merged' is not defined

In [29]:
#Cluster 2
dt_merged.loc[dt_merged['Labels'] == 1, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

NameError: name 'dt_merged' is not defined

In [30]:
#Cluster 3
dt_merged.loc[dt_merged['Labels'] == 2, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

NameError: name 'dt_merged' is not defined

In [31]:
#Cluster 4
dt_merged.loc[dt_merged['Labels'] == 3, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

NameError: name 'dt_merged' is not defined

3.6 Conclusion:

As seen from the above dataframes corresponding to each cluster label, the following conclusions can be made:

Cluster 1: the most common venue type is Coffee Shop, followed by restaurants and bars.
Cluster 2: the most common venue type is Park or Playground.
Cluster 3: the most common venue type is Airport Lounge.
Cluster 4: the most common venue type is Grocery Store.
The most popular venue type in Downtown Toronto is Coffee Shop, containing 16 venues total.

1) Introduction/Business Problem
The basis of this study is to help a small group of investors planning to open their first U.S. based brewery / restaurant expansion in Toronto. They are interested in building in an area that meets the following criteria:

A neighborhood with an average to above average total population
Above average populations of 25-40 year old male and female professionals
A high concentration of the population having secondary education
Average to above average median net household incomes
The objective is to locate and recommend to the investors, the target audience, which neighborhood(s) of Toronto will be the best choice to start their international growth plan. The information gained will assist in chosing the right location by providing data about the population of each neighborhood, in addition to other established venues present in these areas.

Additionally, this information could be of interest to other potential investors looking to open a new restaurant in Toronto.

2) Data
The necessary information needed by the investing group will come from the following sources:

City of Toronto Neighborhood Profiles for providing an overview of the neighborhoods in Toronto
City of Toronto Open Data Catalogue : The Census of Population is held across Canada every five years (the last being in 2016), and collects data about age and sex, families and households, language, immigration and internal migration, ethnocultural diversity, Aboriginal peoples, housing, education, income, and labour. City of Toronto Neighborhood Profiles use this Census data to provide a portrait of the demographic, social and economic characteristics of the people and households in each City of Toronto neighborhood. The profiles present selected highlights from the data, but these accompanying data files provide the full data set assembled for each neighborhood.

In these profiles of the City of Toronto's 140 social planning neighbourhoods. These social planning neighbourhoods were developed by the City of Toronto to help government and community organizations with local planning by providing socio-economic data at a meaningful geographic area. The boundaries of these social planning neighbourhoods are consistent over time, allowing for comparison between Census years. Neighbourhood level data from a variety of other sources are also available through the City's Wellbeing Toronto mapping application and here on the Open Data portal.

Each data point in this file is presented for the City's 140 neighbourhoods, as well as for the City of Toronto as a whole. The data is sourced from a number of Census tables released by Statistics Canada. The general Census Profile is the main source table for this data, but other Census tables have also been used to provide additional information. CSV File

City of Toronto Neighborhood Shapes for mapping : GeoJSON File
Wikipedia for Toronto Neighborhood Borough Designation : Each of the 140 social planning neighborhoods of Toronto reside within a defined borough. While the City of Toronto is by definition a singular municipality, the 140 neighbordhoods are still grouped into six destinct boroughs.
Foursquare API to collect information on other venues/competitors in the neighborhoods
To assess the neighborhoods and provide guidance to the investors, we will be utilizing the data from the 2016 Toronto Census, Toronto Neighborhood shapes to map the neighborhoods, and the Foursquare API to collect information on venues/competitors in the neighborhoods.

Import and install the necessary libaries and tools

In [32]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

import requests
from pandas.io.json import json_normalize

from bs4 import BeautifulSoup
import xml

import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cm

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium

print('Libraries imported and loaded.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    ca-certificates-2019.3.9   |       hecc5488_0         146 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

   

Pull in the Toronto Census data file and create a dataframe

In [33]:
# City of Toronto Open Data Catalogue - City of Toronto Neighborhoods Profile Census CSV File

path = 'https://www.toronto.ca/ext/open_data/catalog/data_set_files/2016_neighbourhood_profiles.csv'
df = pd.read_csv(path,encoding='latin1')
df.head()

Unnamed: 0,Category,Topic,Data Source,Characteristic,City of Toronto,Agincourt North,Agincourt South-Malvern West,Alderwood,Annex,Banbury-Don Mills,Bathurst Manor,Bay Street Corridor,Bayview Village,Bayview Woods-Steeles,Bedford Park-Nortown,Beechborough-Greenbrook,Bendale,Birchcliffe-Cliffside,Black Creek,Blake-Jones,Briar Hill-Belgravia,Bridle Path-Sunnybrook-York Mills,Broadview North,Brookhaven-Amesbury,Cabbagetown-South St. James Town,Caledonia-Fairbank,Casa Loma,Centennial Scarborough,Church-Yonge Corridor,Clairlea-Birchmount,Clanton Park,Cliffcrest,Corso Italia-Davenport,Danforth,Danforth East York,Don Valley Village,Dorset Park,Dovercourt-Wallace Emerson-Junction,Downsview-Roding-CFB,Dufferin Grove,East End-Danforth,Edenbridge-Humber Valley,Eglinton East,Elms-Old Rexdale,Englemount-Lawrence,Eringate-Centennial-West Deane,Etobicoke West Mall,Flemingdon Park,Forest Hill North,Forest Hill South,Glenfield-Jane Heights,Greenwood-Coxwell,Guildwood,Henry Farm,High Park North,High Park-Swansea,Highland Creek,Hillcrest Village,Humber Heights-Westmount,Humber Summit,Humbermede,Humewood-Cedarvale,Ionview,Islington-City Centre West,Junction Area,Keelesdale-Eglinton West,Kennedy Park,Kensington-Chinatown,Kingsview Village-The Westway,Kingsway South,Lambton Baby Point,L'Amoreaux,Lansing-Westgate,Lawrence Park North,Lawrence Park South,Leaside-Bennington,Little Portugal,Long Branch,Malvern,Maple Leaf,Markland Wood,Milliken,Mimico (includes Humber Bay Shores),Morningside,Moss Park,Mount Dennis,Mount Olive-Silverstone-Jamestown,Mount Pleasant East,Mount Pleasant West,New Toronto,Newtonbrook East,Newtonbrook West,Niagara,North Riverdale,North St. James Town,Oakridge,Oakwood Village,O'Connor-Parkview,Old East York,Palmerston-Little Italy,Parkwoods-Donalda,Pelmo Park-Humberlea,Playter Estates-Danforth,Pleasant View,Princess-Rosethorn,Regent Park,Rexdale-Kipling,Rockcliffe-Smythe,Roncesvalles,Rosedale-Moore Park,Rouge,Runnymede-Bloor West Village,Rustic,Scarborough Village,South Parkdale,South Riverdale,St.Andrew-Windfields,Steeles,Stonegate-Queensway,Tam O'Shanter-Sullivan,Taylor-Massey,The Beaches,Thistletown-Beaumond Heights,Thorncliffe Park,Trinity-Bellwoods,University,Victoria Village,Waterfront Communities-The Island,West Hill,West Humber-Clairville,Westminster-Branson,Weston,Weston-Pelham Park,Wexford/Maryvale,Willowdale East,Willowdale West,Willowridge-Martingrove-Richview,Woburn,Woodbine Corridor,Woodbine-Lumsden,Wychwood,Yonge-Eglinton,Yonge-St.Clair,York University Heights,Yorkdale-Glen Park
0,Neighbourhood Information,Neighbourhood Information,City of Toronto,Neighbourhood Number,,129,128,20,95,42,34,76,52,49,39,112,127,122,24,69,108,41,57,30,71,109,96,133,75,120,33,123,92,66,59,47,126,93,26,83,62,9,138,5,32,11,13,44,102,101,25,65,140,53,88,87,134,48,8,21,22,106,125,14,90,110,124,78,6,15,114,117,38,105,103,56,84,19,132,29,12,130,17,135,73,115,2,99,104,18,50,36,82,68,74,121,107,54,58,80,45,23,67,46,10,72,4,111,86,98,131,89,28,139,85,70,40,116,16,118,61,63,3,55,81,79,43,77,136,1,35,113,91,119,51,37,7,137,64,60,94,100,97,27,31
1,Neighbourhood Information,Neighbourhood Information,City of Toronto,TSNS2020 Designation,,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,Emerging Neighbourhood,No Designation,NIA,No Designation,No Designation,No Designation,NIA,NIA,Emerging Neighbourhood,No Designation,No Designation,NIA,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,Emerging Neighbourhood,NIA,NIA,No Designation,NIA,No Designation,No Designation,NIA,NIA,No Designation,NIA,No Designation,No Designation,Emerging Neighbourhood,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,Emerging Neighbourhood,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,NIA,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,NIA,NIA,NIA,No Designation,No Designation,Emerging Neighbourhood,No Designation,No Designation,NIA,No Designation,NIA,NIA,No Designation,No Designation,NIA,No Designation,NIA,No Designation,Emerging Neighbourhood,NIA,NIA,No Designation,No Designation,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,Emerging Neighbourhood
2,Population,Population and dwellings,Census Profile 98-316-X2016001,"Population, 2016",2731571,29113,23757,12054,30526,27695,15873,25797,21396,13154,23236,6577,29960,22291,21737,7727,14257,9266,11499,17757,11669,9955,10968,13362,31340,26984,16472,15935,14133,9666,17180,27051,25003,36625,35052,11785,21381,15535,22776,9456,22372,18588,11848,21933,12806,10732,30491,14417,9917,15723,22162,23925,12494,16934,10948,12416,15545,14365,13641,43965,14366,11058,17123,17945,22000,9271,7985,43993,16164,14607,15179,16828,15559,10084,43794,10111,10554,26572,33964,17455,20506,13593,32954,16775,29658,11463,16097,23831,31180,11916,18615,13845,21210,18675,9233,13826,34805,10722,7804,15818,11051,10803,10529,22246,14974,20923,46496,10070,9941,16724,21849,27876,17812,24623,25051,27446,15683,21567,10360,21108,16556,7607,17510,65913,27392,33312,26274,17992,11098,27917,50434,16936,22156,53485,12541,7865,14349,11817,12528,27593,14804
3,Population,Population and dwellings,Census Profile 98-316-X2016001,"Population, 2011",2615060,30279,21988,11904,29177,26918,15434,19348,17671,13530,23185,6488,27876,21856,22057,7763,14302,8713,11563,17787,12053,9851,10487,13093,28349,24770,14612,15703,13743,9444,16712,26739,24363,34631,34659,11449,20839,14943,22829,9550,22086,18810,10927,22168,12474,10926,31390,14083,9816,11333,21292,21740,13097,17656,10583,12525,15853,14108,13091,38084,14027,10638,17058,18495,21723,9170,7921,44919,14642,14541,15070,17011,12050,9632,45086,10197,10436,27167,26541,17587,16306,13145,32788,15982,28593,10900,16423,23052,21274,12191,17832,13497,21073,18316,9118,13746,34617,8710,7653,16144,11197,10007,10488,22267,15050,20631,45912,9632,9951,16609,21251,25642,17958,25017,24691,27398,15594,21130,10138,19225,16802,7782,17182,43361,26547,34100,25446,18170,12010,27018,45041,15004,21343,53350,11703,7826,13986,10578,11652,27713,14687
4,Population,Population and dwellings,Census Profile 98-316-X2016001,Population Change 2011-2016,4.50%,-3.90%,8.00%,1.30%,4.60%,2.90%,2.80%,33.30%,21.10%,-2.80%,0.20%,1.40%,7.50%,2.00%,-1.50%,-0.50%,-0.30%,6.30%,-0.60%,-0.20%,-3.20%,1.10%,4.60%,2.10%,10.60%,8.90%,12.70%,1.50%,2.80%,2.40%,2.80%,1.20%,2.60%,5.80%,1.10%,2.90%,2.60%,4.00%,-0.20%,-1.00%,1.30%,-1.20%,8.40%,-1.10%,2.70%,-1.80%,-2.90%,2.40%,1.00%,38.70%,4.10%,10.10%,-4.60%,-4.10%,3.40%,-0.90%,-1.90%,1.80%,4.20%,15.40%,2.40%,3.90%,0.40%,-3.00%,1.30%,1.10%,0.80%,-2.10%,10.40%,0.50%,0.70%,-1.10%,29.10%,4.70%,-2.90%,-0.80%,1.10%,-2.20%,28.00%,-0.80%,25.80%,3.40%,0.50%,5.00%,3.70%,5.20%,-2.00%,3.40%,46.60%,-2.30%,4.40%,2.60%,0.70%,2.00%,1.30%,0.60%,0.50%,23.10%,2.00%,-2.00%,-1.30%,8.00%,0.40%,-0.10%,-0.50%,1.40%,1.30%,4.50%,-0.10%,0.70%,2.80%,8.70%,-0.80%,-1.60%,1.50%,0.20%,0.60%,2.10%,2.20%,9.80%,-1.50%,-2.20%,1.90%,52.00%,3.20%,-2.30%,3.30%,-1.00%,-7.60%,3.30%,12.00%,12.90%,3.80%,0.30%,7.20%,0.50%,2.60%,11.70%,7.50%,-0.40%,0.80%


After reviewing the data, create a list of neighbordhoods in Toronto

In [34]:
neighborhoods = list(df.columns.values)
neighborhoods = neighborhoods[5:]
print(neighborhoods)

['Agincourt North', 'Agincourt South-Malvern West', 'Alderwood', 'Annex', 'Banbury-Don Mills', 'Bathurst Manor', 'Bay Street Corridor', 'Bayview Village', 'Bayview Woods-Steeles', 'Bedford Park-Nortown', 'Beechborough-Greenbrook', 'Bendale', 'Birchcliffe-Cliffside', 'Black Creek', 'Blake-Jones', 'Briar Hill-Belgravia', 'Bridle Path-Sunnybrook-York Mills', 'Broadview North', 'Brookhaven-Amesbury', 'Cabbagetown-South St. James Town', 'Caledonia-Fairbank', 'Casa Loma', 'Centennial Scarborough', 'Church-Yonge Corridor', 'Clairlea-Birchmount', 'Clanton Park', 'Cliffcrest', 'Corso Italia-Davenport', 'Danforth', 'Danforth East York', 'Don Valley Village', 'Dorset Park', 'Dovercourt-Wallace Emerson-Junction', 'Downsview-Roding-CFB', 'Dufferin Grove', 'East End-Danforth', 'Edenbridge-Humber Valley', 'Eglinton East', 'Elms-Old Rexdale', 'Englemount-Lawrence', 'Eringate-Centennial-West Deane', 'Etobicoke West Mall', 'Flemingdon Park', 'Forest Hill North', 'Forest Hill South', 'Glenfield-Jane Heig

Create dataframe idexing the neighborhoods of Toronto and populate the dataframe with necessary data

In [35]:
toronto_hoods = pd.DataFrame(index=neighborhoods, columns=["population", "male", "female",'higher_education', "after_tax_income"])

# population = Population 2016 per Census Profile 98-316-X2016001
# m2 = Male: 25 to 29 years
# m3 = Male: 30 to 34 years
# m4 = Male: 35 to 39 years
# f2 = Female: 25 to 29 years
# f3 = Female: 30 to 34 years
# f4 = Female: 35 to 39 years
# higher_education = Total - University certificate, diploma or degree at bachelor level or above for the population aged 25 to 64 years in private households - 25% sample data
# after_tax_income =   After-tax income: Average amount ($)

for index, row in toronto_hoods.iterrows():
    toronto_hoods.at[index, 'population'] = df[index][2]
    toronto_hoods.at[index, 'm2'] = df[index][20]
    toronto_hoods.at[index, 'm3'] = df[index][21]
    toronto_hoods.at[index, 'm4'] = df[index][22]
    toronto_hoods.at[index, 'f2'] = df[index][41]
    toronto_hoods.at[index, 'f3'] = df[index][42]
    toronto_hoods.at[index, 'f4'] = df[index][43]
    toronto_hoods.at[index, 'higher_education'] = df[index][1723]
    toronto_hoods.at[index, 'after_tax_income'] = df[index][2354]
toronto_hoods.reset_index(inplace=True)
toronto_hoods.head()

Unnamed: 0,index,population,male,female,higher_education,after_tax_income,m2,m3,m4,f2,f3,f4
0,Agincourt North,29113,,,4240,26955,1015,835,680,1005,935,775
1,Agincourt South-Malvern West,23757,,,4615,27928,1045,820,625,975,835,715
2,Alderwood,12054,,,1980,39159,355,410,455,350,430,450
3,Annex,30526,,,12640,80138,2080,1610,1055,2265,1675,1040
4,Banbury-Don Mills,27695,,,8060,51874,645,735,735,745,860,895


In [36]:
toronto_hoods.columns = toronto_hoods.columns.str.replace('index', 'neighborhood')
toronto_hoods['population'] = toronto_hoods['population'].str.replace(',','').astype(int)
toronto_hoods['male'] = (toronto_hoods['m2'].astype(int) + toronto_hoods['m3'].astype(int) + toronto_hoods['m4'].astype(int))
toronto_hoods['female'] = (toronto_hoods['f2'].astype(int) + toronto_hoods['f3'].astype(int) + toronto_hoods['f4'].astype(int))
toronto_hoods['higher_education'] = toronto_hoods['higher_education'].astype(int)
toronto_hoods['after_tax_income'] = toronto_hoods['after_tax_income'].str.replace(',','').astype(int)
toronto_hoods.head()

Unnamed: 0,neighborhood,population,male,female,higher_education,after_tax_income,m2,m3,m4,f2,f3,f4
0,Agincourt North,29113,2530,2715,4240,26955,1015,835,680,1005,935,775
1,Agincourt South-Malvern West,23757,2490,2525,4615,27928,1045,820,625,975,835,715
2,Alderwood,12054,1220,1230,1980,39159,355,410,455,350,430,450
3,Annex,30526,4745,4980,12640,80138,2080,1610,1055,2265,1675,1040
4,Banbury-Don Mills,27695,2115,2500,8060,51874,645,735,735,745,860,895


Combine male age group demoographics and female age group demographics into male and female categories

In [37]:
toronto_hoods = toronto_hoods.drop(['m2', 'm3', 'm4', 'f2', 'f3', 'f4'], axis=1)
toronto_hoods.head()

Unnamed: 0,neighborhood,population,male,female,higher_education,after_tax_income
0,Agincourt North,29113,2530,2715,4240,26955
1,Agincourt South-Malvern West,23757,2490,2525,4615,27928
2,Alderwood,12054,1220,1230,1980,39159
3,Annex,30526,4745,4980,12640,80138
4,Banbury-Don Mills,27695,2115,2500,8060,51874


Web scrapping borough data from Wikipedia page via BeautifulSoup

In [38]:
# Wikipedia for Toronto Neighborhood Borough Designation
url = requests.get('https://en.wikipedia.org/wiki/List_of_city-designated_neighbourhoods_in_Toronto').text
soup = BeautifulSoup(url,'lxml')

In [39]:
table_post = soup.find('table')
fields = table_post.find_all('td')

CDN_number = []
city_designated_area = []
former_city_borough = []


for i in range(0, len(fields), 5):
    CDN_number.append(fields[i].text.strip())
    city_designated_area.append(fields[i+1].text.strip())
    former_city_borough.append(fields[i+2].text.strip())
            
df_pc = pd.DataFrame(data=[CDN_number, city_designated_area, former_city_borough]).transpose()
df_pc.columns = ['CDN', 'neighborhood', 'borough']
df_pc = df_pc.drop(['CDN'], axis=1)
df_pc.reset_index()
df_pc.head()

Unnamed: 0,neighborhood,borough
0,Agincourt North,Scarborough
1,Agincourt South-Malvern West,Scarborough
2,Alderwood,Etobicoke
3,Annex,Old City of Toronto
4,Banbury-Don Mills,North York


In [40]:
df_xy = pd.merge(df_pc, toronto_hoods, left_index=True, right_index=True, how='inner')
df_xy = df_xy.drop(['neighborhood_y'], axis=1)
df_xy = df_xy.rename(columns={'neighborhood_x':'neighborhood'})
df_xy.head()

Unnamed: 0,neighborhood,borough,population,male,female,higher_education,after_tax_income
0,Agincourt North,Scarborough,29113,2530,2715,4240,26955
1,Agincourt South-Malvern West,Scarborough,23757,2490,2525,4615,27928
2,Alderwood,Etobicoke,12054,1220,1230,1980,39159
3,Annex,Old City of Toronto,30526,4745,4980,12640,80138
4,Banbury-Don Mills,North York,27695,2115,2500,8060,51874


Pull in the Toronto shape data file, create a dataframe, sort by AREA_NAME

In [41]:
path2 = 'https://ckan0.cf.opendata.inter.sandbox-toronto.ca/download_resource/1d02b0f0-d735-4469-8f71-ea6d96b319e4?format=csv&projection=4326'
df_2 = pd.read_csv(path2,encoding='latin1')
df_2.sort_values('AREA_NAME').head()

Unnamed: 0,_id,AREA_ID,AREA_ATTR_ID,PARENT_AREA_ID,AREA_SHORT_CODE,AREA_LONG_CODE,AREA_NAME,AREA_DESC,X,Y,LONGITUDE,LATITUDE,OBJECTID,Shape__Area,Shape__Length,geometry
74,1335,25886428,25926736,49885,129,129,Agincourt North (129),Agincourt North (129),,,-79.266712,43.805441,16492689,13951450.0,17159.740667,"{u'type': u'Polygon', u'coordinates': (((-79.2..."
75,1336,25886449,25926737,49885,128,128,Agincourt South-Malvern West (128),Agincourt South-Malvern West (128),,,-79.265612,43.788658,16492705,15117360.0,21320.849547,"{u'type': u'Polygon', u'coordinates': (((-79.2..."
76,1337,25886794,25926738,49885,20,20,Alderwood (20),Alderwood (20),,,-79.541611,43.604937,16492721,9502180.0,12667.013917,"{u'type': u'Polygon', u'coordinates': (((-79.5..."
77,1338,25886874,25926739,49885,95,95,Annex (95),Annex (95),,,-79.404001,43.671585,16492737,5337192.0,10513.883143,"{u'type': u'Polygon', u'coordinates': (((-79.3..."
78,1339,25886643,25926740,49885,42,42,Banbury-Don Mills (42),Banbury-Don Mills (42),,,-79.349718,43.737657,16492753,19248970.0,25141.57229,"{u'type': u'Polygon', u'coordinates': (((-79.3..."


Remove unnessary data and sort our dataframe

In [42]:

df_3 = df_2.filter(['AREA_NAME','AREA_DESC','LONGITUDE','LATITUDE'], axis=1)
df_3['AREA_NAME'] = df_3['AREA_NAME'].str.replace('\d+','').str.replace("(","").str.replace(')', '')
df_3.columns = map(str.lower, df_3.columns)
df_3.columns = df_3.columns.str.replace('area_name', 'neighborhood')
df_3 = df_3.sort_values(by=['neighborhood'])
df_3 = df_3.reset_index()
df_3 = df_3.drop(['index'], axis=1)
df_3.head()

Unnamed: 0,neighborhood,area_desc,longitude,latitude
0,Agincourt North,Agincourt North (129),-79.266712,43.805441
1,Agincourt South-Malvern West,Agincourt South-Malvern West (128),-79.265612,43.788658
2,Alderwood,Alderwood (20),-79.541611,43.604937
3,Annex,Annex (95),-79.404001,43.671585
4,Banbury-Don Mills,Banbury-Don Mills (42),-79.349718,43.737657


Merge the two dataframes together to get a working dataframe

In [43]:
df_pos = pd.merge(df_3, df_xy, left_index=True, right_index=True, how='inner')
df_pos = df_pos.drop(['neighborhood_y'], axis=1)
df_pos.columns = df_pos.columns.str.replace('neighborhood_x', 'neighborhood')
df_pos = df_pos[['borough','neighborhood','area_desc', 'longitude','latitude','population', 'male', 'female', 'higher_education', 'after_tax_income']]
df_pos = df_pos.drop(['neighborhood'], axis=1)
df_pos.columns = df_pos.columns.str.replace('area_desc', 'neighborhood') 
df_pos.head()

Unnamed: 0,borough,neighborhood,longitude,latitude,population,male,female,higher_education,after_tax_income
0,Scarborough,Agincourt North (129),-79.266712,43.805441,29113,2530,2715,4240,26955
1,Scarborough,Agincourt South-Malvern West (128),-79.265612,43.788658,23757,2490,2525,4615,27928
2,Etobicoke,Alderwood (20),-79.541611,43.604937,12054,1220,1230,1980,39159
3,Old City of Toronto,Annex (95),-79.404001,43.671585,30526,4745,4980,12640,80138
4,North York,Banbury-Don Mills (42),-79.349718,43.737657,27695,2115,2500,8060,51874


Calculate medians for Toronto neighborhoods to assist in creating the scoring system

In [44]:
pop_med = df_pos['population'].median()
male_med = df_pos['male'].median()
female_med = df_pos['female'].median()
edu_med = df_pos['higher_education'].median()
income_med = df_pos['after_tax_income'].median()

print('Median Population:',pop_med)
print('Median Male: ',male_med)
print('Median Female: ',female_med)
print('Median Higher Education: ',edu_med)
print('Median After Tax Income: ',income_med)

Median Population: 16749.5
Median Male:  1800.0
Median Female:  1952.5
Median Higher Education:  4122.5
Median After Tax Income:  36538.5


Calculate Neighboorhood Scores based off Toronto neighboorhood medians
   Scores are calulated based on the importance of each category established by the investor group

In [45]:
df_score = pd.DataFrame(columns=["neighborhood","pop_score", "male_score", "female_score", "edu_score", "income_score", "total_score"])
df_score['neighborhood'] = df_pos['neighborhood']

# Each score category was derived by taking the value of the category, dividing by the median of said category, 
# and multiplying value by the importance factor given by the investors. 

pop_score = [] # 15% importance
for x in df_pos['population']:
  if x / pop_med >0:
    pop_score.append((x / pop_med)*.15)
  else:
    pop_score.append(0)
df_score['pop_score'] = pop_score

male_score = [] # 25% importance
for z in df_pos['male']:
  if z / male_med >0:
    male_score.append((z / male_med)*.25)
  else:
    male_score.append(0)
df_score['male_score'] = male_score

female_score = [] # 25% importance
for z in df_pos['female']:
  if z / female_med >0:
    female_score.append((z / female_med)*.25)
  else:
    female_score.append(0)
df_score['female_score'] = female_score

edu_score = [] # 15% importance
for z in df_pos['higher_education']:
  if z / edu_med >0:
    edu_score.append((z / edu_med)*.15)
  else:
    edu_score.append(0)
df_score['edu_score'] = edu_score

income_score = [] # 20% importance
for z in df_pos['after_tax_income']:
  if z / income_med >0:
    income_score.append((z / income_med)*.2)
  else:
    income_score.append(0)
df_score['income_score'] = income_score

# Add each category to get overall neighborhood score
df_score['total_score'] = round(df_score.iloc[:,-7:].sum(axis=1),2)

df_score.head()

Unnamed: 0,neighborhood,pop_score,male_score,female_score,edu_score,income_score,total_score
0,Agincourt North (129),0.260721,0.351389,0.347631,0.154275,0.147543,1.26
1,Agincourt South-Malvern West (128),0.212756,0.345833,0.323303,0.16792,0.152869,1.2
2,Alderwood (20),0.107949,0.169444,0.15749,0.072044,0.214344,0.72
3,Annex (95),0.273375,0.659028,0.637644,0.459915,0.43865,2.47
4,Banbury-Don Mills (42),0.248022,0.29375,0.320102,0.293269,0.283942,1.44


Merge Into full Toronto neighborhoods working dataframe and drop unnecessary data

In [46]:
df_final = pd.merge(df_pos, df_score, left_index=True, right_index=True, how='inner')
df_final = df_final.drop(['neighborhood_y'], axis=1)
df_final.columns = df_final.columns.str.replace('neighborhood_x', 'neighborhood')
df_final = df_final.drop(columns = ['population','male','female','higher_education','after_tax_income'])
df_final.head()

Unnamed: 0,borough,neighborhood,longitude,latitude,pop_score,male_score,female_score,edu_score,income_score,total_score
0,Scarborough,Agincourt North (129),-79.266712,43.805441,0.260721,0.351389,0.347631,0.154275,0.147543,1.26
1,Scarborough,Agincourt South-Malvern West (128),-79.265612,43.788658,0.212756,0.345833,0.323303,0.16792,0.152869,1.2
2,Etobicoke,Alderwood (20),-79.541611,43.604937,0.107949,0.169444,0.15749,0.072044,0.214344,0.72
3,Old City of Toronto,Annex (95),-79.404001,43.671585,0.273375,0.659028,0.637644,0.459915,0.43865,2.47
4,North York,Banbury-Don Mills (42),-79.349718,43.737657,0.248022,0.29375,0.320102,0.293269,0.283942,1.44


Create a map to evaluate neighborhood scores

In [47]:
!wget --quiet https://ckan0.cf.opendata.inter.sandbox-toronto.ca/download_resource/1d02b0f0-d735-4469-8f71-ea6d96b319e4?format=geojson&projection=4326 
print('GeoJSON file downloaded!')

GeoJSON file downloaded!


In [48]:
toronto_geo = r'Neighbourhoods.geojson'

address = 'Toronto'

geolocator = Nominatim(user_agent="torcan_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10.5, tiles = 'Mapbox Bright')

In [49]:
# create a numpy array of length 5 and has linear spacing from the minium total score to the maximum total score
threshold_scale = np.linspace(df_final['total_score'].min(),
                              df_final['total_score'].max(),
                              5, dtype=float)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 1 # make sure that the last value of the list is greater than the maximum score

# let Folium determine the scale.
map_toronto.choropleth(
    geo_data=toronto_geo,
    data=df_final,
    columns=['neighborhood', 'total_score'],
    key_on='feature.properties.AREA_NAME',
    threshold_scale=threshold_scale,
    fill_color='Blues', 
    fill_opacity=0.8, 
    line_opacity=0.5,
    legend_name='Toronto Neighborhood Scores',
    reset=True
)

for lat, lng, neighborhood, total_score in zip(df_final['latitude'], df_final['longitude'], df_final['neighborhood'], df_final['total_score']):
    label = '{}, Total Score: {}'.format(neighborhood, total_score)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='#071521',
        fill=False,
        fill_color='#071521',
        fill_opacity=0.1,
        parse_html=False).add_to(map_toronto) 
    
    map_toronto

FileNotFoundError: [Errno 2] No such file or directory: 'Neighbourhoods.geojson'

Create dataframe to review the top 15 of the neighborhood scores

In [50]:
top15_hoods = df_final.sort_values(by=['total_score'],ascending=False).head(15)
top15_hoods = top15_hoods.reset_index().drop('index', axis=1)
top15_hoods

Unnamed: 0,borough,neighborhood,longitude,latitude,pop_score,male_score,female_score,edu_score,income_score,total_score
0,Old City of Toronto,Waterfront Communities-The Island (77),-79.377202,43.63388,0.590283,2.525,2.151729,1.298059,0.297177,6.86
1,North York,Niagara (82),-79.41242,43.636681,0.279232,1.230556,1.142125,0.573257,0.299974,3.53
2,North York,Willowdale East (51),-79.401484,43.770602,0.451661,1.027083,1.013444,0.771195,0.201103,3.46
3,Old City of Toronto,Church-Yonge Corridor (75),-79.379017,43.659649,0.280665,0.961806,0.748399,0.500849,0.235582,2.73
4,Scarborough,Islington-City Centre West (14),-79.543317,43.633463,0.393728,0.796528,0.748399,0.455549,0.232905,2.63
5,Scarborough,Dovercourt-Wallace Emerson-Junction (93),-79.438541,43.665677,0.327995,0.831944,0.78169,0.352213,0.184884,2.48
6,Old City of Toronto,Mount Pleasant West (104),-79.39336,43.704435,0.265602,0.74375,0.756722,0.462644,0.248729,2.48
7,Old City of Toronto,Annex (95),-79.404001,43.671585,0.273375,0.659028,0.637644,0.459915,0.43865,2.47
8,Scarborough,Woburn (137),-79.228586,43.76674,0.478984,0.739583,0.762484,0.318011,0.149656,2.45
9,Scarborough,Mimico (includes Humber Bay Shores) (17),-79.500137,43.615924,0.304164,0.711806,0.659411,0.352577,0.240683,2.27


Define Foursquare credentials and version

In [51]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 
CLIENT_SECRET:


Get the top venues for all neighborhoods within a radius of 1610 meters

In [56]:
def getNearbyVenues(names, latitudes, longitudes, LIMIT = 100, radius=1610): # 1610 meters = 1.00 mile
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['neighborhood', 
                  'neighborhood_latitude', 
                  'neighborhood_longitude', 
                  'venue', 
                  'venue_latitude', 
                  'venue_longitude', 
                  'venue_category']
    return(nearby_venues)

Problem Statement

Settling down in Toronto – Best place for an Indian based on number of Indian restaurants

Toronto is a large city and it is not easy for an Indian migrating to Toronto to determine the neighbourhood he should choose to settle down as it has a lot of diversity in demographics and various other factors. Let us take the number of Indian restaurants in the neighbourhood to choose which neighbourhood would be best for him to settle down.

We visualise all neighbourhoods of Toronto and also compare the number of Indian restaurants as a parameter to choose the best place to settle down.

Data

For this project we need the following data:
    
Data source : http://cocl.us/Geospatial_data

Description : We will use this data for Toronto City if required, which contains a list of Neighbourhoods along with their latitude and longitude.

Data source : Fousquare API

Description : Using Foursquare api, we will get all the venues in each neighbourhood and then we will filter these venues for only Indian restaurants. I will use the Kmeans

Solution methodology

From the above 2 data sources, I will be conducting neighborhood analysis leveraging primarily Foursquare APIs to deliver recommendations options to the target user community primarily comprising of the Indian community.

In [1]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    ca-certificates-2019.3.9   |       hecc5488_0         146 KB  conda-forge
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

   

In [2]:
CLIENT_ID = '1UKCODX3PQXELJTD4TVKCTIQAXGVDZ3JPZXGTTEPZGGRTXLY' # your Foursquare ID
CLIENT_SECRET = 'NQJ2ZKJXTR1BYOIDLE3HZK1RH3NQNSWLMZSBUB3IN5ASJQ3E' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 1UKCODX3PQXELJTD4TVKCTIQAXGVDZ3JPZXGTTEPZGGRTXLY
CLIENT_SECRET:NQJ2ZKJXTR1BYOIDLE3HZK1RH3NQNSWLMZSBUB3IN5ASJQ3E


In [3]:
address = 'Toronto, Ontario'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

  app.launch_new_instance()


43.653963 -79.387207


In [4]:
search_query = '"Indian Restaurant"'
radius = 10000
LIMIT=100

In [5]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
print(url)

https://api.foursquare.com/v2/venues/search?client_id=1UKCODX3PQXELJTD4TVKCTIQAXGVDZ3JPZXGTTEPZGGRTXLY&client_secret=NQJ2ZKJXTR1BYOIDLE3HZK1RH3NQNSWLMZSBUB3IN5ASJQ3E&ll=43.653963,-79.387207&v=20180605&query="Indian Restaurant"&radius=10000&limit=100


In [6]:
# Get the json output for the search query "Indian"
result = requests.get(url).json()  
result

{'meta': {'code': 200, 'requestId': '5cf196739fb6b7757ed704a4'},
 'response': {'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/indian_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d10f941735',
      'name': 'Indian Restaurant',
      'pluralName': 'Indian Restaurants',
      'primary': True,
      'shortName': 'Indian'}],
    'hasPerk': False,
    'id': '4aef8854f964a5201cd921e3',
    'location': {'address': '287 King St. W',
     'cc': 'CA',
     'city': 'Toronto',
     'country': 'Canada',
     'crossStreet': 'at John St.',
     'distance': 857,
     'formattedAddress': ['287 King St. W (at John St.)',
      'Toronto ON M5V 1J5',
      'Canada'],
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.646462521503445,
       'lng': -79.38964414801342}],
     'lat': 43.646462521503445,
     'lng': -79.38964414801342,
     'postalCode': 'M5V 1J5',
     'state': 'ON'},
    'name': 'Aroma Fine Indian Restaurant',
    'referralId

In [7]:
# assigning relevant part of JSON to venues
venues = result['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d10f941735', 'pluralNam...",False,4aef8854f964a5201cd921e3,287 King St. W,CA,Toronto,Canada,at John St.,857,"[287 King St. W (at John St.), Toronto ON M5V ...","[{'lat': 43.646462521503445, 'label': 'display...",43.646463,-79.389644,,M5V 1J5,ON,Aroma Fine Indian Restaurant,v-1559336564,
1,"[{'id': '4bf58dd8d48988d10f941735', 'pluralNam...",False,5165c333e4b07a7ad88d8a69,,CA,,Canada,,651,[Canada],"[{'lat': 43.65814977325445, 'label': 'display'...",43.65815,-79.381563,,,,Joe's Indian Restaurant,v-1559336564,
2,"[{'id': '4bf58dd8d48988d16e941735', 'pluralNam...",False,53a07ba3498ee8946e98a7de,552 Mt Pleasant,CA,Toronto,Canada,,1098,"[552 Mt Pleasant, Toronto ON M4S 2M6, Canada]","[{'lat': 43.64430171166487, 'label': 'display'...",43.644302,-79.390002,,M4S 2M6,ON,Marigold Indian Bistro | Indian Restaurants in...,v-1559336564,
3,"[{'id': '4bf58dd8d48988d10f941735', 'pluralNam...",False,4ae4c793f964a5201b9e21e3,1410 Gerrard St E,CA,Toronto,Canada,at Ashdale Ave,5639,"[1410 Gerrard St E (at Ashdale Ave), Toronto O...","[{'lat': 43.672339, 'label': 'display', 'lng':...",43.672339,-79.321941,"Little India, Toronto, ON",M4L 1Z2,ON,The Famous Indian Restaurant,v-1559336564,
4,"[{'id': '4bf58dd8d48988d144941735', 'pluralNam...",False,4dbb05154b222080d36d3d2f,,CA,Toronto,Canada,,6690,"[Toronto ON, Canada]","[{'lat': 43.69659146611285, 'label': 'display'...",43.696591,-79.445784,,,ON,Roti King West Indian Restaurant,v-1559336564,


In [8]:
# Obtaining only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Aroma Fine Indian Restaurant,Indian Restaurant,287 King St. W,CA,Toronto,Canada,at John St.,857,"[287 King St. W (at John St.), Toronto ON M5V ...","[{'lat': 43.646462521503445, 'label': 'display...",43.646463,-79.389644,,M5V 1J5,ON,4aef8854f964a5201cd921e3
1,Joe's Indian Restaurant,Indian Restaurant,,CA,,Canada,,651,[Canada],"[{'lat': 43.65814977325445, 'label': 'display'...",43.65815,-79.381563,,,,5165c333e4b07a7ad88d8a69
2,Marigold Indian Bistro | Indian Restaurants in...,Fast Food Restaurant,552 Mt Pleasant,CA,Toronto,Canada,,1098,"[552 Mt Pleasant, Toronto ON M4S 2M6, Canada]","[{'lat': 43.64430171166487, 'label': 'display'...",43.644302,-79.390002,,M4S 2M6,ON,53a07ba3498ee8946e98a7de
3,The Famous Indian Restaurant,Indian Restaurant,1410 Gerrard St E,CA,Toronto,Canada,at Ashdale Ave,5639,"[1410 Gerrard St E (at Ashdale Ave), Toronto O...","[{'lat': 43.672339, 'label': 'display', 'lng':...",43.672339,-79.321941,"Little India, Toronto, ON",M4L 1Z2,ON,4ae4c793f964a5201b9e21e3
4,Roti King West Indian Restaurant,Caribbean Restaurant,,CA,Toronto,Canada,,6690,"[Toronto ON, Canada]","[{'lat': 43.69659146611285, 'label': 'display'...",43.696591,-79.445784,,,ON,4dbb05154b222080d36d3d2f
5,Patio Indian Restaurant,Indian Restaurant,15 Gervais Dr.,CA,Toronto,Canada,across superstore,8646,"[15 Gervais Dr. (across superstore), Toronto O...","[{'lat': 43.722103, 'label': 'display', 'lng':...",43.722103,-79.335655,,M3C 1Y8,ON,59e94c0260255e613b025e38
6,Blue Water Curry & Roti West Indian Restaurant,Indian Restaurant,1646 Victoria Park Ave,CA,North York,Canada,,10788,"[1646 Victoria Park Ave, North York ON M1R 1P7...","[{'lat': 43.7309565, 'label': 'display', 'lng'...",43.730956,-79.305799,,M1R 1P7,ON,5c7da4b08ad62e0039395a08
7,Hemispheres Restaurant & Bistro,American Restaurant,110 Chestnut Street,CA,Toronto,Canada,,145,"[110 Chestnut Street, Toronto ON M5G 1R3, Canada]","[{'lat': 43.65488413420439, 'label': 'display'...",43.654884,-79.385931,,M5G 1R3,ON,4ad4c05ff964a52048f720e3
8,Indian Biriyani House,Indian Restaurant,181 Dundas St W,CA,Toronto,Canada,W of Chestnut St,136,"[181 Dundas St W (W of Chestnut St), Toronto O...","[{'lat': 43.65511996683289, 'label': 'display'...",43.65512,-79.386645,,M5G 1C7,ON,4afd920ff964a520ad2822e3
9,Hong Shing Chinese Restaurant,Chinese Restaurant,195 Dundas St W,CA,Toronto,Canada,at University Ave,107,"[195 Dundas St W (at University Ave), Toronto ...","[{'lat': 43.65492521335936, 'label': 'display'...",43.654925,-79.387089,,M5G 1C7,ON,4b2027b5f964a520f82d24e3


In [9]:
dataframe_filtered.describe()

Unnamed: 0,distance,lat,lng
count,50.0,50.0,50.0
mean,1503.2,43.657617,-79.387907
std,2180.361691,0.016769,0.022916
min,96.0,43.63906,-79.459608
25%,541.0,43.650117,-79.397295
50%,892.0,43.654453,-79.387066
75%,1198.5,43.656496,-79.382412
max,10788.0,43.730956,-79.305799


In [10]:
dataframe_cleaned = dataframe_filtered[dataframe_filtered['address'].notnull()]  # get rid records with address "Not available" 
dataframe_TOR=dataframe_cleaned[dataframe_cleaned.state == 'ON']   # get rid of "non ON" states
df_withpostcode=dataframe_TOR[dataframe_TOR['postalCode'].notnull()]  # get rid records with no Postcode
df_withpostcode

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Aroma Fine Indian Restaurant,Indian Restaurant,287 King St. W,CA,Toronto,Canada,at John St.,857,"[287 King St. W (at John St.), Toronto ON M5V ...","[{'lat': 43.646462521503445, 'label': 'display...",43.646463,-79.389644,,M5V 1J5,ON,4aef8854f964a5201cd921e3
2,Marigold Indian Bistro | Indian Restaurants in...,Fast Food Restaurant,552 Mt Pleasant,CA,Toronto,Canada,,1098,"[552 Mt Pleasant, Toronto ON M4S 2M6, Canada]","[{'lat': 43.64430171166487, 'label': 'display'...",43.644302,-79.390002,,M4S 2M6,ON,53a07ba3498ee8946e98a7de
3,The Famous Indian Restaurant,Indian Restaurant,1410 Gerrard St E,CA,Toronto,Canada,at Ashdale Ave,5639,"[1410 Gerrard St E (at Ashdale Ave), Toronto O...","[{'lat': 43.672339, 'label': 'display', 'lng':...",43.672339,-79.321941,"Little India, Toronto, ON",M4L 1Z2,ON,4ae4c793f964a5201b9e21e3
5,Patio Indian Restaurant,Indian Restaurant,15 Gervais Dr.,CA,Toronto,Canada,across superstore,8646,"[15 Gervais Dr. (across superstore), Toronto O...","[{'lat': 43.722103, 'label': 'display', 'lng':...",43.722103,-79.335655,,M3C 1Y8,ON,59e94c0260255e613b025e38
6,Blue Water Curry & Roti West Indian Restaurant,Indian Restaurant,1646 Victoria Park Ave,CA,North York,Canada,,10788,"[1646 Victoria Park Ave, North York ON M1R 1P7...","[{'lat': 43.7309565, 'label': 'display', 'lng'...",43.730956,-79.305799,,M1R 1P7,ON,5c7da4b08ad62e0039395a08
7,Hemispheres Restaurant & Bistro,American Restaurant,110 Chestnut Street,CA,Toronto,Canada,,145,"[110 Chestnut Street, Toronto ON M5G 1R3, Canada]","[{'lat': 43.65488413420439, 'label': 'display'...",43.654884,-79.385931,,M5G 1R3,ON,4ad4c05ff964a52048f720e3
8,Indian Biriyani House,Indian Restaurant,181 Dundas St W,CA,Toronto,Canada,W of Chestnut St,136,"[181 Dundas St W (W of Chestnut St), Toronto O...","[{'lat': 43.65511996683289, 'label': 'display'...",43.65512,-79.386645,,M5G 1C7,ON,4afd920ff964a520ad2822e3
9,Hong Shing Chinese Restaurant,Chinese Restaurant,195 Dundas St W,CA,Toronto,Canada,at University Ave,107,"[195 Dundas St W (at University Ave), Toronto ...","[{'lat': 43.65492521335936, 'label': 'display'...",43.654925,-79.387089,,M5G 1C7,ON,4b2027b5f964a520f82d24e3
10,Rol San Restaurant 龍笙棧,Dim Sum Restaurant,323 Spadina Ave.,CA,Toronto,Canada,at D'Arcy St.,922,"[323 Spadina Ave. (at D'Arcy St.), Toronto ON ...","[{'lat': 43.65431754076345, 'label': 'display'...",43.654318,-79.39865,Kensington Market,M5T 2E9,ON,4ad4c060f964a5207ff720e3
11,360 Restaurant,Wine Bar,301 Front St W,CA,Toronto,Canada,301 Front St. W,1271,"[301 Front St W (301 Front St. W), Toronto ON ...","[{'lat': 43.642537317144566, 'label': 'display...",43.642537,-79.387042,,M5V 2T6,ON,4ad4c05cf964a520dff520e3


In [11]:
# define the dataframe columns
column_names = ['postalcode', 'Latitude', 'Longitude'] 

# instantiate the dataframe
df_postcode = pd.DataFrame(columns=column_names)
df_postcode

Unnamed: 0,postalcode,Latitude,Longitude


In [12]:
#Filling data for each row
df_postcode = df_withpostcode[['postalCode','lat','lng']]
df_postcode.head()

Unnamed: 0,postalCode,lat,lng
0,M5V 1J5,43.646463,-79.389644
2,M4S 2M6,43.644302,-79.390002
3,M4L 1Z2,43.672339,-79.321941
5,M3C 1Y8,43.722103,-79.335655
6,M1R 1P7,43.730956,-79.305799


In [13]:
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto City are 43.653963, -79.387207.


In [14]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, postalCode in zip(df_postcode['lat'], df_postcode['lng'], df_postcode['postalCode']):
    label = '{}'.format(postalCode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [15]:
#Filling data for each row
df_postcode = df_withpostcode[['postalCode','lat','lng']]
df_postcode.head()

Unnamed: 0,postalCode,lat,lng
0,M5V 1J5,43.646463,-79.389644
2,M4S 2M6,43.644302,-79.390002
3,M4L 1Z2,43.672339,-79.321941
5,M3C 1Y8,43.722103,-79.335655
6,M1R 1P7,43.730956,-79.305799


In [16]:
df_postcode['postalCode']=df_postcode['postalCode'].str[:3]
df_postcode.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Unnamed: 0,postalCode,lat,lng
0,M5V,43.646463,-79.389644
2,M4S,43.644302,-79.390002
3,M4L,43.672339,-79.321941
5,M3C,43.722103,-79.335655
6,M1R,43.730956,-79.305799


In [17]:
df_postcode=df_postcode.groupby('postalCode').count() 
df_postcode

Unnamed: 0_level_0,lat,lng
postalCode,Unnamed: 1_level_1,Unnamed: 2_level_1
M1R,1,1
M3C,1,1
M4L,1,1
M4S,1,1
M4W,1,1
M5C,1,1
M5E,1,1
M5G,6,6
M5H,4,4
M5J,2,2


Results
Highest count of Indian Restaurants are in M5G (6), M5T (5) and M5V (5)

We can see that postal code starting with M5G is the best place. Second best place for an indian would be M5T and M5V. Quick search in wikipedia "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M" indicates that

for M5G Borough is "DownTown Toronto" for M5T Borough is "DownTown Toronto" for M5V Borough is "DownTown Toronto"

Recommendations
With this we conculde that for any new Indian, it would be best to settle down near "DownTown Toronto" on the basis of the number of Indian Resturants.