## Introduction

### 1. Background

East York, Etobicoke, North York, Scarborough, York, and Old Toronto are the six boroughs that together form the City of Toronto.

One of these boroughs, Etobicoke, is home to several lakefront parks, golf courses, and the vast Centennial Park, which has a conservatory featuring tropical plants. The 1830s Montgomery’s Inn has a museum, tea room, and pub, and hosts a weekly farmers’ market. The Islington-City Centre West area is a busy commercial hub with shopping complexes and casual chain eateries, plus history-themed murals along Dundas Street West.

Affordable housing has long been a problem in Toronto. Mr. X is very interested in buying a home in Etobicoke but is unsure which neighbourhood to choose. He has asked us to find and suggest the neighbourhood that best suits his needs.

### 2. Business Problem

Mr. X is interested in a neighbourhood that meets the following criteria:
- average home price between CAD 400,000 and CAD 500,000
- a shopping centre nearby
- restaurants and eateries nearby
- a park or green area nearby


We need to find a neighbourhood in Etobicoke that fulfils these conditions and make our recommendation.
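
The criteria above can be written down as a small check, which makes them easier to apply consistently. The sketch below is illustrative only: the function names and the keyword tuples are assumptions rather than part of the original analysis, and the substring matching is deliberately crude.

```python
# Hypothetical helper that encodes Mr. X's four criteria as a single check.
# The keyword tuples are assumptions chosen for illustration.
PRICE_RANGE = (400_000, 500_000)  # CAD, inclusive
SHOPPING = ("store", "supermarket", "mall")
EATERIES = ("restaurant", "pizza", "café", "bakery")
GREEN = ("park",)


def has_any(categories, keywords):
    """True if any category name contains any of the keywords."""
    return any(k in c.lower() for c in categories for k in keywords)


def meets_criteria(price, categories):
    """Check the price band plus one shopping, eatery and green-space match."""
    low, high = PRICE_RANGE
    return (
        low <= price <= high
        and has_any(categories, SHOPPING)
        and has_any(categories, EATERIES)
        and has_any(categories, GREEN)
    )


print(meets_criteria(450000, ["Park", "Pizza Place", "Grocery Store"]))
```

A real check would need a curated mapping from venue categories to the three groups, since simple substring matching can misclassify categories.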

## Data Section

The data needed for our analysis will be collected from various sources. 

#### 1. Toronto neighbourhood geo location and boundaries
This dataset gives the latitude and longitude of every neighbourhood. We will later pass these coordinates to the Foursquare API to find nearby venues.
We use the Toronto Open Data catalogue for this.

Website url - https://open.toronto.ca/dataset/neighbourhoods/

#### 2. Toronto Housing data
The data came from various sources including Toronto Community Housing Corporation, City of Toronto's Shelter, Support and Housing Administration, City of Toronto Affordable Housing Office and Statistics Canada. Average Home Price data was taken from Realosophy.com.

Website url - https://open.toronto.ca/dataset/wellbeing-toronto-housing/

#### 3. Toronto borough-neighbourhood data
This wiki page provides a list of neighbourhoods for each borough in Toronto City.

Website url - https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto

Before we can use this data for our analysis, it must be cleaned. Once it is cleaned, we will use it to:
- find the neighbourhoods in Etobicoke
- find the average house price in each neighbourhood
- compare the neighbourhoods using the Foursquare API

### Preprocessing and Data wrangling

In [1]:
# import libraries
import pandas as pd
import requests as req

%pip install lxml


Collecting lxml
  Downloading https://files.pythonhosted.org/packages/79/37/d420b7fdc9a550bd29b8cfeacff3b38502d9600b09d7dfae9a69e623b891/lxml-4.5.2-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
Installing collected packages: lxml
Successfully installed lxml-4.5.2
Note: you may need to restart the kernel to use updated packages.


#### Read Neighbourhoods data from Wikipedia

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto'
pageData = req.get(url)


df_wiki = pd.read_html(pageData.content, header=0)[11]
df_wiki.rename(columns = {'Former city/borough':'Borough'}, inplace = True)
df_wiki.rename(columns = {'City-designated neighbourhood':'Neighborhood'}, inplace = True)
df_wiki.head()

Unnamed: 0,CDN number,Neighborhood,Borough,Neighbourhoods covered,Map,Unnamed: 5
0,129,Agincourt North,Scarborough,Agincourt and Brimwood,,
1,128,Agincourt South-Malvern West,Scarborough,Agincourt and Malvern,,
2,20,Alderwood,Etobicoke,Alderwood,,
3,95,Annex,Old City of Toronto,The Annex and Seaton Village,,
4,42,Banbury-Don Mills,North York,Don Mills,,


#### Get Neighborhood Coordinates

In [3]:
# Get the geographical coordinates of the neighbourhoods from Neighbourhoods.csv,
# which was downloaded from the Toronto Open Data website
url = 'Neighbourhoods.csv'
df_raw_csv = pd.read_csv(url)
# .copy() gives an independent frame, so the assignment below does not
# trigger pandas' SettingWithCopyWarning
df_cord = df_raw_csv[['AREA_NAME', 'LONGITUDE', 'LATITUDE']].copy()
# strip the trailing parenthesised suffix from AREA_NAME
df_cord['Neighborhood'] = df_cord['AREA_NAME'].str.replace(r'\s*\(.*$', '', regex=True)
df_cord = df_cord.drop(columns=['AREA_NAME'])
df_cord.head()


Unnamed: 0,LONGITUDE,LATITUDE,Neighborhood
0,-79.425515,43.676919,Wychwood
1,-79.40359,43.704689,Yonge-Eglinton
2,-79.397871,43.687859,Yonge-St.Clair
3,-79.488883,43.765736,York University Heights
4,-79.457108,43.714672,Yorkdale-Glen Park


In [4]:
df_cord.shape

(140, 3)

In [5]:
#merge two dataframes to generate a single dataframe
df_wiki_cord = pd.merge(df_wiki, df_cord, on='Neighborhood')

In [6]:
df_wiki_cord.head()

Unnamed: 0,CDN number,Neighborhood,Borough,Neighbourhoods covered,Map,Unnamed: 5,LONGITUDE,LATITUDE
0,129,Agincourt North,Scarborough,Agincourt and Brimwood,,,-79.266712,43.805441
1,128,Agincourt South-Malvern West,Scarborough,Agincourt and Malvern,,,-79.265612,43.788658
2,20,Alderwood,Etobicoke,Alderwood,,,-79.541611,43.604937
3,95,Annex,Old City of Toronto,The Annex and Seaton Village,,,-79.404001,43.671585
4,42,Banbury-Don Mills,North York,Don Mills,,,-79.349718,43.737657
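
An inner merge silently drops any neighbourhood whose name is spelled differently in the two sources. One way to see what is lost is a left merge with `indicator=True`. The sketch below uses small made-up frames in place of `df_wiki` and `df_cord`; the names and values are for illustration only.

```python
import pandas as pd

# Toy stand-ins for df_wiki and df_cord; the rows are made up for illustration.
wiki = pd.DataFrame({
    "Neighborhood": ["Alderwood", "Annex", "Mimico"],
    "Borough": ["Etobicoke", "Old City of Toronto", "Etobicoke"],
})
cord = pd.DataFrame({
    "Neighborhood": ["Alderwood", "Mimico (includes Humber Bay Shores)"],
    "LATITUDE": [43.60, 43.62],
})

# A left merge with indicator=True keeps every row of `wiki` and records
# in the `_merge` column whether a match was found in `cord`.
checked = pd.merge(wiki, cord, on="Neighborhood", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Neighborhood"].tolist()
print(unmatched)
```

Any names that show up in `unmatched` can then be reconciled by hand before the final merge.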


#### Get Housing Information

In [7]:
# Read housing prices from the Excel file downloaded from the Toronto Open Data website
url = 'wellbeing-toronto-housing.xlsx'

df_housing_raw = pd.read_excel(url, sheet_name='RawDataRef_2011')
# rename() without inplace returns a new frame, so no SettingWithCopyWarning is raised
df_housing = df_housing_raw[['Neighbourhood', 'Home Prices']].rename(columns={'Neighbourhood': 'Neighborhood'})
df_housing.head()


Unnamed: 0,Neighborhood,Home Prices
0,West Humber-Clairville,317508
1,Mount Olive-Silverstone-Jamestown,251119
2,Thistletown-Beaumond Heights,414216
3,Rexdale-Kipling,392271
4,Elms-Old Rexdale,233832


In [8]:
#merge two dataframes

df_neighborhood_data = pd.merge(df_wiki_cord, df_housing, on='Neighborhood')
df_neighborhood_data.head()

Unnamed: 0,CDN number,Neighborhood,Borough,Neighbourhoods covered,Map,Unnamed: 5,LONGITUDE,LATITUDE,Home Prices
0,129,Agincourt North,Scarborough,Agincourt and Brimwood,,,-79.266712,43.805441,375307
1,128,Agincourt South-Malvern West,Scarborough,Agincourt and Malvern,,,-79.265612,43.788658,332710
2,20,Alderwood,Etobicoke,Alderwood,,,-79.541611,43.604937,504233
3,95,Annex,Old City of Toronto,The Annex and Seaton Village,,,-79.404001,43.671585,993491
4,42,Banbury-Don Mills,North York,Don Mills,,,-79.349718,43.737657,613647


#### Get Neighborhoods for Etobicoke

Keep only the rows whose borough is Etobicoke.

In [9]:
# select rows where Borough is 'Etobicoke'
df_Etobicoke = df_neighborhood_data[df_neighborhood_data['Borough'] == 'Etobicoke']
df_Etobicoke.head()

Unnamed: 0,CDN number,Neighborhood,Borough,Neighbourhoods covered,Map,Unnamed: 5,LONGITUDE,LATITUDE,Home Prices
2,20,Alderwood,Etobicoke,Alderwood,,,-79.541611,43.604937,504233
32,9,Edenbridge-Humber Valley,Etobicoke,Humber Valley,,,-79.522458,43.670886,873268
34,5,Elms-Old Rexdale,Etobicoke,The Elms and Rexdale,,,-79.548983,43.721519,233832
36,11,Eringate-Centennial-West Deane,Etobicoke,Centennial Park and West Deane Park,,,-79.580445,43.658017,423034
37,13,Etobicoke West Mall,Etobicoke,Centennial Park and Eatonville,,,-79.568939,43.645063,298426


In [10]:
# drop unwanted columns and normalise the coordinate column names

df_Etobicoke = (
    df_Etobicoke
    .drop(columns=['CDN number', 'Neighbourhoods covered', 'Map', 'Unnamed: 5'])
    .rename(columns={'LONGITUDE': 'Longitude', 'LATITUDE': 'Latitude'})
)
df_Etobicoke.head()


Unnamed: 0,Neighborhood,Borough,Longitude,Latitude,Home Prices
2,Alderwood,Etobicoke,-79.541611,43.604937,504233
32,Edenbridge-Humber Valley,Etobicoke,-79.522458,43.670886,873268
34,Elms-Old Rexdale,Etobicoke,-79.548983,43.721519,233832
36,Eringate-Centennial-West Deane,Etobicoke,-79.580445,43.658017,423034
37,Etobicoke West Mall,Etobicoke,-79.568939,43.645063,298426


In [11]:
df_Etobicoke.shape

(20, 5)

#### Import required libraries

In [12]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


## Methodology

The data is now ready and we can proceed with the analysis, which has two steps:
1. Exploratory data analysis - We analyse the data and select the candidate neighbourhoods for our model.
2. Modelling - We build a machine learning model using k-means clustering and try to identify the most suitable neighbourhood among the selected ones. Clustering the neighbourhoods by their venue profiles lets us compare them and make better suggestions. We will create 5 clusters.
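
The number of clusters is a modelling choice. A common way to sanity-check it is to compare the k-means inertia (within-cluster sum of squared distances) across several values of k. The sketch below does this on a made-up 7 x 4 feature matrix; the data and the range of k are assumptions for illustration, not results from this analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy feature matrix: 7 rows (neighbourhoods) x 4 columns (venue frequencies).
# The values are made up for illustration.
X = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.4, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
])

# Inertia shrinks as k grows and reaches zero once every row
# has a cluster of its own.
inertias = {}
for k in range(1, len(X) + 1):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 3))
```

With only a handful of rows, a large k leaves most clusters with a single member, so the inertia curve is worth reading alongside the cluster sizes.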

### Exploratory data analysis

#### Use geopy library to get the latitude and longitude values of Toronto

In [13]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### Filtering Neighborhoods based on House Prices

In [14]:
# select neighbourhoods whose home price is between CAD 400,000 and CAD 500,000 (inclusive)
etobicokeNeighborhood = df_Etobicoke[df_Etobicoke['Home Prices'].between(400000, 500000)]
etobicokeNeighborhood.head()


Unnamed: 0,Neighborhood,Borough,Longitude,Latitude,Home Prices
36,Eringate-Centennial-West Deane,Etobicoke,-79.580445,43.658017,423034
49,Humber Heights-Westmount,Etobicoke,-79.522416,43.692233,491396
54,Islington-City Centre West,Etobicoke,-79.543317,43.633463,491678
68,Long Branch,Etobicoke,-79.533345,43.592362,459088
73,Mimico,Etobicoke,-79.500137,43.615924,429941


##### Descriptive statistical analysis of the data

In [15]:
etobicokeNeighborhood.describe()

Unnamed: 0,Longitude,Latitude,Home Prices
count,7.0,7.0,7.0
mean,-79.536216,43.647225,452880.571429
std,0.028634,0.052811,31628.292755
min,-79.580445,43.592362,414216.0
25%,-79.553404,43.608256,426487.5
50%,-79.533345,43.633463,459088.0
75%,-79.516387,43.675125,476103.5
max,-79.500137,43.737988,491678.0


#### Show Etobicoke's qualified Neighborhoods on the map

In [39]:
# create map of Toronto using latitude and longitude values and show Etobicoke's selected Neighborhoods
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(etobicokeNeighborhood['Latitude'], etobicokeNeighborhood['Longitude'], etobicokeNeighborhood['Borough'], etobicokeNeighborhood['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Modelling

#### Foursquare credentials

In [17]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
radius = 500
LIMIT = 100
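
Credentials are generally safer kept out of the notebook itself. A minimal sketch of reading them from environment variables follows; the function name and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for this example, not a Foursquare convention.

```python
import os


def load_foursquare_credentials(env=None):
    """Read the client id and secret from environment variables.

    The variable names used here are hypothetical; pick whatever names
    suit your own setup.
    """
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None
```

The result can be unpacked straight into the notebook's variables, for example `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.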

#### Explore the selected neighborhoods in Etobicoke

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
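
`getNearbyVenues` indexes straight into the response, so a venue without a category or a response without groups would raise an exception. The sketch below shows a more defensive way to flatten one response. It assumes the same nested layout that `getNearbyVenues` relies on; the helper name `extract_venues` and the sample payload are hypothetical and hand-written, not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Turn one explore-style response into flat rows.

    Assumes the layout response -> groups[0] -> items -> venue.
    Missing pieces yield an empty list or a None field instead of raising.
    """
    groups = payload.get("response", {}).get("groups") or []
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0].get("name") if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written sample payload for illustration only.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Eringate park",
               "location": {"lat": 43.661668, "lng": -79.581093},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed spot", "location": {}, "categories": []}},
]}]}}

rows = extract_venues(sample, "Eringate-Centennial-West Deane", 43.658017, -79.580445)
print(rows)
```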

In [19]:
# get all venues 

etobicoke_venues = getNearbyVenues(names=etobicokeNeighborhood['Neighborhood'],
                                   latitudes=etobicokeNeighborhood['Latitude'],
                                   longitudes=etobicokeNeighborhood['Longitude']
                                  )



Eringate-Centennial-West Deane
Humber Heights-Westmount
Islington-City Centre West
Long Branch
Mimico
New Toronto
Thistletown-Beaumond Heights


In [20]:
print(etobicoke_venues.shape)
etobicoke_venues.head()

(63, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Eringate-Centennial-West Deane,43.658017,-79.580445,Pizza Pizza,43.660392,-79.582686,Pizza Place
1,Eringate-Centennial-West Deane,43.658017,-79.580445,Tim Hortons,43.660425,-79.583034,Coffee Shop
2,Eringate-Centennial-West Deane,43.658017,-79.580445,Golden Wok Chinese Restaurant,43.660491,-79.582319,Chinese Restaurant
3,Eringate-Centennial-West Deane,43.658017,-79.580445,Eringate park,43.661668,-79.581093,Park
4,Eringate-Centennial-West Deane,43.658017,-79.580445,Mac's,43.661684,-79.582728,Convenience Store


In [21]:
#Count no. of venues for each neighborhood
etobicoke_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Eringate-Centennial-West Deane,6,6,6,6,6,6
Humber Heights-Westmount,3,3,3,3,3,3
Islington-City Centre West,15,15,15,15,15,15
Long Branch,15,15,15,15,15,15
Mimico,4,4,4,4,4,4
New Toronto,6,6,6,6,6,6
Thistletown-Beaumond Heights,14,14,14,14,14,14


In [22]:
#unique venue categories
print('There are {} unique categories.'.format(len(etobicoke_venues['Venue Category'].unique())))

There are 34 unique categories.


#### Analyze each neighborhood

In [23]:
# one hot encoding
etobioke_onehot = pd.get_dummies(etobicoke_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
etobioke_onehot['Neighborhood'] = etobicoke_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [etobioke_onehot.columns[-1]] + list(etobioke_onehot.columns[:-1])
etobioke_onehot = etobioke_onehot[fixed_columns]

etobioke_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Bakery,Bank,Bar,Beer Store,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,Convenience Store,Fast Food Restaurant,Fried Chicken Joint,Garden Center,Gas Station,Greek Restaurant,Grocery Store,Hockey Arena,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Park,Pharmacy,Pizza Place,Pub,Restaurant,Sandwich Place,Skating Rink,Supermarket,Thai Restaurant,Turkish Restaurant,Vietnamese Restaurant,Wings Joint,Women's Store
0,Eringate-Centennial-West Deane,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
1,Eringate-Centennial-West Deane,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Eringate-Centennial-West Deane,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Eringate-Centennial-West Deane,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
4,Eringate-Centennial-West Deane,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [24]:
etobioke_onehot.shape

(63, 35)

##### group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [25]:
etobioke_grouped = etobioke_onehot.groupby('Neighborhood').mean().reset_index()
etobioke_grouped.head()

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Bakery,Bank,Bar,Beer Store,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,Convenience Store,Fast Food Restaurant,Fried Chicken Joint,Garden Center,Gas Station,Greek Restaurant,Grocery Store,Hockey Arena,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Park,Pharmacy,Pizza Place,Pub,Restaurant,Sandwich Place,Skating Rink,Supermarket,Thai Restaurant,Turkish Restaurant,Vietnamese Restaurant,Wings Joint,Women's Store
0,Eringate-Centennial-West Deane,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Humber Heights-Westmount,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Islington-City Centre West,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.066667,0.066667,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.133333,0.066667,0.0,0.0,0.066667,0.066667,0.066667,0.0,0.066667
3,Long Branch,0.0,0.0,0.0,0.133333,0.066667,0.066667,0.066667,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.066667,0.133333,0.0,0.0,0.0,0.066667,0.0,0.066667,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0
4,Mimico,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
etobioke_grouped.shape

(7, 35)
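
To make the one-hot and group-by steps concrete, here is the same transformation on a tiny made-up venue list. The neighbourhood names `A` and `B` and their venues are placeholders for illustration.

```python
import pandas as pd

# Toy venue list; the rows are made up for illustration.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Pizza Place", "Pizza Place", "Bank", "Park", "Bank"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators is the share of a neighbourhood's venues
# that fall in each category, so every row sums to 1.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Because the values are shares rather than counts, a neighbourhood with 3 venues and one with 15 are compared on composition, not on how much is available.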

##### print each neighborhood along with the top 5 most common venues

In [27]:
num_top_venues = 5

for hood in etobioke_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = etobioke_grouped[etobioke_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Eringate-Centennial-West Deane----
                venue  freq
0        Hockey Arena  0.17
1  Chinese Restaurant  0.17
2                Park  0.17
3         Pizza Place  0.17
4   Convenience Store  0.17


----Humber Heights-Westmount----
                 venue  freq
0          Gas Station  0.33
1                 Park  0.33
2          Pizza Place  0.33
3  American Restaurant  0.00
4           Restaurant  0.00


----Islington-City Centre West----
                  venue  freq
0            Restaurant  0.13
1  Fast Food Restaurant  0.13
2        Sandwich Place  0.07
3        Ice Cream Shop  0.07
4           Pizza Place  0.07


----Long Branch----
                venue  freq
0                Bank  0.13
1         Coffee Shop  0.13
2       Grocery Store  0.13
3  Italian Restaurant  0.07
4          Restaurant  0.07


----Mimico----
            venue  freq
0    Skating Rink  0.25
1          Bakery  0.25
2   Grocery Store  0.25
3             Bar  0.25
4  Sandwich Place  0.00


----New Toront

In [28]:
#function to sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
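
When a neighbourhood has fewer venue categories than `num_top_venues`, `return_most_common_venues` pads the result with categories whose frequency is zero. A variant that skips those could look like the sketch below. The function `top_nonzero_venues` is a suggestion, not the function used in this analysis; the example row is modelled on the Humber Heights-Westmount frequencies shown earlier.

```python
import pandas as pd


def top_nonzero_venues(row, num_top_venues):
    """Return up to num_top_venues categories, skipping zero frequencies.

    `row` is expected to hold the neighbourhood name first, followed by
    the per-category frequencies.
    """
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])


# Three venues, one per category; Women's Store is absent (frequency 0).
row = pd.Series({"Neighborhood": "Humber Heights-Westmount",
                 "Gas Station": 1 / 3, "Park": 1 / 3,
                 "Pizza Place": 1 / 3, "Women's Store": 0.0})
result = top_nonzero_venues(row, 10)
print(result)
```

The result can be shorter than `num_top_venues`, so a table built from it would need blank cells for the missing positions.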

In [29]:
#create dataframe and display top 10 venues for each neighborhood

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the 3rd all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = etobioke_grouped['Neighborhood']

for ind in np.arange(etobioke_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(etobioke_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Eringate-Centennial-West Deane,Chinese Restaurant,Hockey Arena,Convenience Store,Park,Coffee Shop,Pizza Place,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Women's Store
1,Humber Heights-Westmount,Gas Station,Park,Pizza Place,Greek Restaurant,Garden Center,Fried Chicken Joint,Fast Food Restaurant,Convenience Store,Coffee Shop,Women's Store
2,Islington-City Centre West,Restaurant,Fast Food Restaurant,Women's Store,Garden Center,Greek Restaurant,Ice Cream Shop,Pizza Place,Bank,Fried Chicken Joint,Sandwich Place
3,Long Branch,Grocery Store,Bank,Coffee Shop,Restaurant,Greek Restaurant,Wings Joint,Italian Restaurant,Pharmacy,Pizza Place,Café
4,Mimico,Grocery Store,Bakery,Bar,Skating Rink,Chinese Restaurant,Garden Center,Fried Chicken Joint,Fast Food Restaurant,Convenience Store,Coffee Shop


#### Cluster Neighborhood

##### create 5 clusters in the neighborhood

In [30]:
# set number of clusters
kclusters = 5

etobioke_grouped_clustering = etobioke_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(etobioke_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 0, 3, 3, 2, 1, 1], dtype=int32)
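
The label array is easier to read when paired with the neighbourhood names. The sketch below uses the labels printed above and lists the seven neighbourhoods in the alphabetical order produced by `groupby('Neighborhood')`; the variable names are chosen for this example.

```python
import pandas as pd

# Labels as printed above, paired with the neighbourhoods in the
# alphabetical order of the grouped dataframe.
names = ["Eringate-Centennial-West Deane", "Humber Heights-Westmount",
         "Islington-City Centre West", "Long Branch", "Mimico",
         "New Toronto", "Thistletown-Beaumond Heights"]
labels = [4, 0, 3, 3, 2, 1, 1]

members = (pd.DataFrame({"Neighborhood": names, "Cluster Labels": labels})
           .groupby("Cluster Labels")["Neighborhood"]
           .apply(list))
print(members)
```

With 7 neighbourhoods spread over 5 clusters, three of the clusters contain a single neighbourhood.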

In [31]:
#create dataset with cluster and top 10 venues
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

etobioke_merged = etobicokeNeighborhood

# join the selected neighbourhoods with their cluster labels and top venues
etobioke_merged = etobioke_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

etobioke_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Borough,Longitude,Latitude,Home Prices,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Eringate-Centennial-West Deane,Etobicoke,-79.580445,43.658017,423034,4,Chinese Restaurant,Hockey Arena,Convenience Store,Park,Coffee Shop,Pizza Place,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Women's Store
49,Humber Heights-Westmount,Etobicoke,-79.522416,43.692233,491396,0,Gas Station,Park,Pizza Place,Greek Restaurant,Garden Center,Fried Chicken Joint,Fast Food Restaurant,Convenience Store,Coffee Shop,Women's Store
54,Islington-City Centre West,Etobicoke,-79.543317,43.633463,491678,3,Restaurant,Fast Food Restaurant,Women's Store,Garden Center,Greek Restaurant,Ice Cream Shop,Pizza Place,Bank,Fried Chicken Joint,Sandwich Place
68,Long Branch,Etobicoke,-79.533345,43.592362,459088,3,Grocery Store,Bank,Coffee Shop,Restaurant,Greek Restaurant,Wings Joint,Italian Restaurant,Pharmacy,Pizza Place,Café
73,Mimico,Etobicoke,-79.500137,43.615924,429941,2,Grocery Store,Bakery,Bar,Skating Rink,Chinese Restaurant,Garden Center,Fried Chicken Joint,Fast Food Restaurant,Convenience Store,Coffee Shop


#### Show the neighborhood clusters on the map

In [32]:
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

etobioke_merged = etobioke_merged.dropna(subset=['Cluster Labels'])

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(etobioke_merged['Latitude'], etobioke_merged['Longitude'], etobioke_merged['Neighborhood'], etobioke_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # labels run from 0 to kclusters-1, so they index rainbow directly
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Analysis of each cluster

#### Cluster 1

In [33]:
etobioke_merged.loc[etobioke_merged['Cluster Labels'] == 0, etobioke_merged.columns[[1] + list(range(5, etobioke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
49,Etobicoke,0,Gas Station,Park,Pizza Place,Greek Restaurant,Garden Center,Fried Chicken Joint,Fast Food Restaurant,Convenience Store,Coffee Shop,Women's Store


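Cluster 1 (Humber Heights-Westmount) nominally covers Mr. X's criteria, with a park and a pizza place nearby, but only three venues were found there (the third is a gas station). The remaining table entries, including the Women's Store, have zero frequency and only pad the top-10 list, so it does not provide a full shopping experience to its residents.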

#### Cluster 2

In [34]:
etobioke_merged.loc[etobioke_merged['Cluster Labels'] == 1, etobioke_merged.columns[[1] + list(range(5, etobioke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
80,Etobicoke,1,Pub,Coffee Shop,Supermarket,Indian Restaurant,Italian Restaurant,Park,Caribbean Restaurant,Fast Food Restaurant,Convenience Store,Chinese Restaurant
110,Etobicoke,1,Indian Restaurant,Caribbean Restaurant,American Restaurant,Pharmacy,Coffee Shop,Ice Cream Shop,Pizza Place,Bank,Supermarket,Thai Restaurant


Cluster 2 (New Toronto and Thistletown-Beaumond Heights) looks like a suitable match for Mr. X. It has a park and offers many different restaurant options. It has a supermarket and convenience stores to meet most daily needs, and it is also served by banks and a pharmacy.

#### Cluster 3

In [35]:
etobioke_merged.loc[etobioke_merged['Cluster Labels'] == 2, etobioke_merged.columns[[1] + list(range(5, etobioke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
73,Etobicoke,2,Grocery Store,Bakery,Bar,Skating Rink,Chinese Restaurant,Garden Center,Fried Chicken Joint,Fast Food Restaurant,Convenience Store,Coffee Shop


Cluster 3 (Mimico) doesn't have a park and is therefore not considered an option for Mr. X.

#### Cluster 4

In [36]:
etobioke_merged.loc[etobioke_merged['Cluster Labels'] == 3, etobioke_merged.columns[[1] + list(range(5, etobioke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,Etobicoke,3,Restaurant,Fast Food Restaurant,Women's Store,Garden Center,Greek Restaurant,Ice Cream Shop,Pizza Place,Bank,Fried Chicken Joint,Sandwich Place
68,Etobicoke,3,Grocery Store,Bank,Coffee Shop,Restaurant,Greek Restaurant,Wings Joint,Italian Restaurant,Pharmacy,Pizza Place,Café


Cluster 4 (Islington-City Centre West and Long Branch) also provides a wide variety of facilities to residents. It has a lot to offer in eateries but lacks a park, which is one of Mr. X's criteria, and is therefore not recommended.

#### Cluster 5

In [37]:
etobioke_merged.loc[etobioke_merged['Cluster Labels'] == 4, etobioke_merged.columns[[1] + list(range(5, etobioke_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Etobicoke,4,Chinese Restaurant,Hockey Arena,Convenience Store,Park,Coffee Shop,Pizza Place,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Women's Store


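Cluster 5 (Eringate-Centennial-West Deane) covers Mr. X's criteria with a park, several eateries and a convenience store. Only six venues were found there, so the Gas Station and later entries in the table have zero frequency and are not actually present.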

## Results and discussion

The aim of this project was to identify and suggest a neighbourhood where Mr. X could buy a home within his budget. We first identified the neighbourhoods where houses were affordable enough, then clustered them so they could be compared more easily. We also used Foursquare data to understand the surroundings of each cluster. In the process we found that clusters 1, 2 and 5 look suitable for Mr. X's preferences, each with its own advantages.

## Conclusion

Although Clusters 1 and 5 satisfy Mr. X's criteria, they offer residents fewer options than Cluster 2. Cluster 2 has a variety of restaurants to choose from, as well as a park and stores. It also has the added advantage of banks and a pharmacy in the area.

Our suggestion is to buy a home in a Cluster 2 neighbourhood. Clusters 1 and 5 can also be considered if needed.