# Capstone Project - Evaluating the best Neighborhood in Ontario, Canada to immigrate to

## Shaun Diplock - 22nd March 2021

## 1. Introduction

The problem addressed in this report is a very personal one: the target audience is myself, my future wife and our family. This 'business problem' is therefore a fairly unconventional one, as I am performing this study with myself as the main stakeholder. I'll attempt to keep personal details to a minimum throughout this report; however, as a natural consequence of the subject matter, I will occasionally reference items of personal interest.

I met my fiancée over 5 years ago in Ottawa, Canada, whilst I was working in the area on a business trip. I live in England and work as a system development manager and engineer, and I frequently have to travel abroad for client site visits and business meetings. Our long-distance relationship has developed to the point where we are now engaged, and I am actively trying to immigrate to Canada so we can seriously start our life together.

Moving to another country is a daunting prospect, and one that merits time and research into identifying the best area for both of us. Naturally, I am very anxious (but also excited) for what the future holds - this project represents a very real, genuine attempt to evaluate some areas that will maximise the chance of our move and choice being a success.

The minimum criteria for us are as follows:

1. Within a 60 minute drive (approximately 70 km) of Smiths Falls, Ontario (this is where my fiancée currently works).
2. We do not want to live in Quebec.
3. We cannot live in the United States (Smiths Falls is relatively close to the border).
4. We do not want to live in a very small town - the town / city must contain more than 2,500 residents.
5. We do not want to live in a major city or urban area - the town / city must contain fewer than 50,000 residents.


Provided the above minimum criteria are met, towns / cities / neighborhoods will be ranked using the following attributes:

1. The number of restaurants in the town / city (excluding fast food restaurants) - high priority.
2. The number of bars in the town / city - high priority.
3. The number of gyms / fitness studios in the town / city - medium priority.
4. The number of entertainment venues in the town / city - medium priority.
5. The number of outdoor spaces, such as parks and trails, in the town / city - medium priority.
6. The number of shopping outlets and retail stores - low priority.
7. The distance from Smiths Falls, Ontario - high priority.

With the above criteria evaluated, we will hopefully be able to identify some suitable areas to move to when I immigrate, to maximise the chance of our future life together being happy and successful.


## 2. Data Acquisition and Preparation


The source data for this problem will be acquired from [Distantias's location proximity tool](https://www.distantias.com/towns-radius-smiths_falls-ontario-canada.htm). This website provides an easy-to-use tool that can quickly search for all towns, cities and inhabited places in proximity to another town or place. It then provides the location data in a convenient .csv file, which will be easy to process for the study. Unfortunately, this tool charges a small fee to use. I searched thoroughly for equivalent libraries, functions and data sources, but the convenience of this tool outweighed the drawback of a small charge to access the data.

The reliability of this data seems very good, with transparency about some population data that may be missing from the returned query: *'We don't have data for every town and city in Canada and we specify this with NA in our data table. Population data is sourced from a variety of national and international databases some of which are more current than others. The oldest data set is from 2011 but we do make ongoing updates as new census data is released'.* With this in mind, a provisional review of my queried data did indeed reveal hamlets and settlements with no population data; evaluating these manually showed that the settlements are so tiny that no census data has ever been collected from them. These can therefore simply be dropped from the data, as they do not fulfill criterion 4 as detailed above.

The shortlisted locations will then be passed to Foursquare to retrieve venue data for each area, town, city and neighborhood that meets the minimum acceptable criteria. This Foursquare location and venue data will be used to evaluate our preferred neighborhood attributes.

This data can be queried to provide lots of meaningful and relevant information. For instance, it can be used to examine and cluster the frequency of various amenities in an area, as shown by the following results from a previous, related exercise:

![alt text](https://github.com/ShaunDiplock/Coursera_Capstone/blob/main/Neighborhood%20data%20example.PNG?raw=true "Example clustered neighborhood data")

Ultimately, I will be using all of this data to assign each area a 'weighted score' to help form a list of the three most suitable neighborhoods, with some final discussion about the pros and cons of each area.

This study does not take into consideration housing prices, venue ratings, proximity to other services (hospitals, schools, train stations, etc.) or other important factors that we may also look at when conducting a more forensic study.
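As a rough illustration of the weighted score described above, the sketch below assumes numeric weights of 3, 2 and 1 for high, medium and low priority, and min-max normalises each attribute so that venue counts and distances are comparable. The weights, the column names and the venue counts are hypothetical placeholders, not results from this study.

```python
import pandas as pd

# Hypothetical weights: high = 3, medium = 2, low = 1 (an assumption, not final values)
WEIGHTS = {
    'Restaurants': 3,
    'Bars': 3,
    'Fitness': 2,
    'Entertainment': 2,
    'Outdoor': 2,
    'Shopping': 1,
    'Distance KM': 3,
}

def weighted_score(df, weights=WEIGHTS):
    """Return one score per row; a higher score means a better match."""
    score = pd.Series(0.0, index=df.index)
    for col, weight in weights.items():
        spread = df[col].max() - df[col].min()
        # Min-max normalise to the range 0..1 (all zeros if the column is constant)
        norm = (df[col] - df[col].min()) / spread if spread else df[col] * 0.0
        if col == 'Distance KM':
            norm = 1.0 - norm  # a shorter distance should score higher
        score += weight * norm
    return score

# Made-up counts for three placeholder towns, purely to show the mechanics
example = pd.DataFrame({
    'Town Name': ['Town A', 'Town B', 'Town C'],
    'Restaurants': [10, 4, 0],
    'Bars': [5, 2, 0],
    'Fitness': [2, 1, 0],
    'Entertainment': [3, 1, 0],
    'Outdoor': [4, 2, 0],
    'Shopping': [6, 3, 0],
    'Distance KM': [10.0, 40.0, 70.0],
})
example['Score'] = weighted_score(example)
print(example.sort_values('Score', ascending=False)[['Town Name', 'Score']])
```

Normalising before weighting stops an attribute with naturally large counts (such as shops) from dominating the score simply because of its scale.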

### Import required libraries

In [134]:
import pandas as pd
import numpy as np

### Import Distantias data

In [135]:
path = r'D:\GithubProjects\Coursera_Capstone\smiths_falls_distantias_data.csv'  # raw string so the backslashes are not treated as escape sequences

df_ontariodata = pd.read_csv(path)

df_ontariodata.head(10)

Unnamed: 0,Town Name,Web Link,Distance Miles,Distance KM,Precise Drive time and Directions URL,Approx Drive miles,Approx Drive Time,Assumed Average MPH,Total Minutes,Region,Country,Population,Direction,Latitude,Longitude,Date
0,Smiths Falls,https://www.distantias.com/towns-radius-smiths...,0.0,0.0,https://www.distantias.com/distance-from-smith...,0.0,0 hour(s) and 0 minutes,0,0.0,Ontario,Canada,9403,NE,44.9,-76.0167,2011
1,Merrickville,https://www.distantias.com/towns-radius-merric...,9.05,14.564527,https://www.distantias.com/distance-from-smith...,10.77,0 hour(s) and 25 minutes,26,24.9,Ontario,Canada,3067,NE,44.9167,-75.8333,2011
2,Perth,https://www.distantias.com/towns-radius-perth-...,10.67,17.171658,https://www.distantias.com/distance-from-smith...,13.22,0 hour(s) and 21 minutes,37,21.4,Ontario,Canada,6211,SW,44.8833,-76.2333,2011
3,Eloida,https://www.distantias.com/towns-radius-eloida...,15.17,24.413688,https://www.distantias.com/distance-from-smith...,18.8,0 hour(s) and 30 minutes,37,30.5,Ontario,Canada,not available,SE,44.6833,-75.9667,2011
4,Carleton Place,https://www.distantias.com/towns-radius-carlet...,17.09,27.503621,https://www.distantias.com/distance-from-smith...,21.17,0 hour(s) and 34 minutes,37,34.3,Ontario,Canada,10013,NW,45.1333,-76.1333,2011
5,Kemptville,https://www.distantias.com/towns-radius-kemptv...,20.39,32.814443,https://www.distantias.com/distance-from-smith...,24.77,0 hour(s) and 27 minutes,55,27.0,Ontario,Canada,3532,NE,45.0167,-75.6333,2011
6,Crosby,https://www.distantias.com/towns-radius-crosby...,20.74,33.377712,https://www.distantias.com/distance-from-smith...,25.2,0 hour(s) and 27 minutes,55,27.5,Ontario,Canada,not available,SW,44.65,-76.25,2011
7,Richmond,https://www.distantias.com/towns-radius-richmo...,21.52,34.632997,https://www.distantias.com/distance-from-smith...,26.15,0 hour(s) and 29 minutes,55,28.5,Ontario,Canada,3797,NE,45.1833,-75.8333,2011
8,Newboro,https://www.distantias.com/towns-radius-newbor...,22.71,36.548111,https://www.distantias.com/distance-from-smith...,27.59,0 hour(s) and 30 minutes,55,30.1,Ontario,Canada,not available,SW,44.65,-76.3167,2011
9,Almonte,https://www.distantias.com/towns-radius-almont...,23.63,38.028704,https://www.distantias.com/distance-from-smith...,28.71,0 hour(s) and 31 minutes,55,31.3,Ontario,Canada,4752,NW,45.2167,-76.2,2011


In [136]:
df_ontariodata.dtypes

Town Name                                 object
Web Link                                  object
Distance Miles                           float64
Distance KM                              float64
Precise Drive time and Directions URL     object
Approx Drive miles                       float64
Approx Drive Time                         object
Assumed Average MPH                        int64
Total Minutes                            float64
Region                                    object
Country                                   object
Population                                object
Direction                                 object
Latitude                                 float64
Longitude                                float64
Date                                       int64
dtype: object

In [137]:
df_ontariodata.shape

(134, 16)
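The Population column is read as object because missing values are stored as the text 'not available'. One possible alternative, sketched below on a three-row inline sample rather than the real file, is to tell pd.read_csv to treat that text as missing, so the column is numeric from the start. The inline sample is an assumption made to keep the example self-contained; its rows are copied from the table above.

```python
import io
import pandas as pd

# Inline stand-in for the Distantias .csv file (a subset of its columns)
sample_csv = io.StringIO(
    "Town Name,Distance KM,Region,Country,Population\n"
    "Smiths Falls,0.0,Ontario,Canada,9403\n"
    "Eloida,24.413688,Ontario,Canada,not available\n"
    "Perth,17.171658,Ontario,Canada,6211\n"
)

# na_values converts the placeholder text to NaN while the file is parsed
df_sample = pd.read_csv(sample_csv, na_values=['not available'])

print(df_sample.dtypes)
print(df_sample['Population'].isna().sum())
```

With NaN in place of the placeholder text, dropna(subset=['Population']) could replace the string comparison used later to drop the tiny settlements.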

### Filter for towns that meet criterion 1 (within 70 km)

In [138]:
df1=df_ontariodata[df_ontariodata['Distance KM'] < 70]

df1.shape

(61, 16)

In [139]:
df1

Unnamed: 0,Town Name,Web Link,Distance Miles,Distance KM,Precise Drive time and Directions URL,Approx Drive miles,Approx Drive Time,Assumed Average MPH,Total Minutes,Region,Country,Population,Direction,Latitude,Longitude,Date
0,Smiths Falls,https://www.distantias.com/towns-radius-smiths...,0.0,0.0,https://www.distantias.com/distance-from-smith...,0.0,0 hour(s) and 0 minutes,0,0.0,Ontario,Canada,9403,NE,44.9,-76.0167,2011
1,Merrickville,https://www.distantias.com/towns-radius-merric...,9.05,14.564527,https://www.distantias.com/distance-from-smith...,10.77,0 hour(s) and 25 minutes,26,24.9,Ontario,Canada,3067,NE,44.9167,-75.8333,2011
2,Perth,https://www.distantias.com/towns-radius-perth-...,10.67,17.171658,https://www.distantias.com/distance-from-smith...,13.22,0 hour(s) and 21 minutes,37,21.4,Ontario,Canada,6211,SW,44.8833,-76.2333,2011
3,Eloida,https://www.distantias.com/towns-radius-eloida...,15.17,24.413688,https://www.distantias.com/distance-from-smith...,18.8,0 hour(s) and 30 minutes,37,30.5,Ontario,Canada,not available,SE,44.6833,-75.9667,2011
4,Carleton Place,https://www.distantias.com/towns-radius-carlet...,17.09,27.503621,https://www.distantias.com/distance-from-smith...,21.17,0 hour(s) and 34 minutes,37,34.3,Ontario,Canada,10013,NW,45.1333,-76.1333,2011
5,Kemptville,https://www.distantias.com/towns-radius-kemptv...,20.39,32.814443,https://www.distantias.com/distance-from-smith...,24.77,0 hour(s) and 27 minutes,55,27.0,Ontario,Canada,3532,NE,45.0167,-75.6333,2011
6,Crosby,https://www.distantias.com/towns-radius-crosby...,20.74,33.377712,https://www.distantias.com/distance-from-smith...,25.2,0 hour(s) and 27 minutes,55,27.5,Ontario,Canada,not available,SW,44.65,-76.25,2011
7,Richmond,https://www.distantias.com/towns-radius-richmo...,21.52,34.632997,https://www.distantias.com/distance-from-smith...,26.15,0 hour(s) and 29 minutes,55,28.5,Ontario,Canada,3797,NE,45.1833,-75.8333,2011
8,Newboro,https://www.distantias.com/towns-radius-newbor...,22.71,36.548111,https://www.distantias.com/distance-from-smith...,27.59,0 hour(s) and 30 minutes,55,30.1,Ontario,Canada,not available,SW,44.65,-76.3167,2011
9,Almonte,https://www.distantias.com/towns-radius-almont...,23.63,38.028704,https://www.distantias.com/distance-from-smith...,28.71,0 hour(s) and 31 minutes,55,31.3,Ontario,Canada,4752,NW,45.2167,-76.2,2011
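The Distance KM column appears to be a straight-line distance rather than a driving distance. As a cross-check, the sketch below computes the great-circle (haversine) distance from the coordinates in the table, assuming a mean Earth radius of 6371 km. The function name haversine_km is my own and is not part of the Distantias tool.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two latitude/longitude points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# Smiths Falls to Merrickville, using the coordinates from the table above
d = haversine_km(44.9, -76.0167, 44.9167, -75.8333)
print(round(d, 2))
```

The result should land close to the 14.56 km listed for Merrickville, which supports reading Distance KM as a straight-line figure.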


### Filter for towns that meet criterion 2 (not in Quebec)

In [140]:
#First remove whitespace which is erroneously present in the .csv file; this step is needed so that the filter works

df1['Region']=df1['Region'].str.strip()

#Then filter for regions not called Quebec
df2=df1[df1['Region'] != "Quebec"]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df1['Region']=df1['Region'].str.strip()


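The warning above can be ignored in this case. pandas raises it because df1 was created by filtering df_ontariodata, so it cannot be sure whether the assignment is meant to affect that filtered copy or the original dataframe. Here the stripped values are stored in df1, which is our desired result and *not* a mistake. Creating df1 with .copy() after the filter would avoid the warning altogether.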
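A minimal sketch of how the warning could be avoided, using a small made-up dataframe with placeholder town names: calling .copy() after filtering makes df1 an independent dataframe, so the later assignment is unambiguous.

```python
import pandas as pd

# Made-up rows with the same leading whitespace problem as the real Region column
df_all = pd.DataFrame({
    'Town Name': ['Town A', 'Town B', 'Town C'],
    'Distance KM': [0.0, 68.0, 17.0],
    'Region': [' Ontario', ' Quebec', ' Ontario'],
})

# .copy() makes df1 independent of df_all, so assigning to it raises no warning
df1 = df_all[df_all['Distance KM'] < 70].copy()
df1['Region'] = df1['Region'].str.strip()

# Filter for regions not called Quebec
df2 = df1[df1['Region'] != 'Quebec']
print(df2)
```

The original dataframe keeps its untouched values, which makes it easy to re-run any single step without reloading the file.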

In [141]:
df2.shape

(57, 16)

In [142]:
df2.tail(10)

Unnamed: 0,Town Name,Web Link,Distance Miles,Distance KM,Precise Drive time and Directions URL,Approx Drive miles,Approx Drive Time,Assumed Average MPH,Total Minutes,Region,Country,Population,Direction,Latitude,Longitude,Date
49,Rockcliffe Park,https://www.distantias.com/towns-radius-rockcl...,41.3,66.465742,https://www.distantias.com/distance-from-smith...,50.18,0 hour(s) and 55 minutes,55,54.7,Ontario,Canada,1932,NE,45.45,-75.6833,2011
50,Chesterville,https://www.distantias.com/towns-radius-cheste...,41.4,66.626676,https://www.distantias.com/distance-from-smith...,50.3,0 hour(s) and 55 minutes,55,54.9,Ontario,Canada,1514,NE,45.1,-75.2167,2011
51,Madrid,https://www.distantias.com/towns-radius-madrid...,41.69,67.093385,https://www.distantias.com/distance-from-smith...,50.65,0 hour(s) and 55 minutes,55,55.3,New York,United States,1651,SE,44.7756,-75.1851,2018
52,Waddington,https://www.distantias.com/towns-radius-waddin...,41.75,67.189945,https://www.distantias.com/distance-from-smith...,50.73,0 hour(s) and 55 minutes,55,55.3,New York,United States,2214,SE,44.8474,-75.1678,2018
53,Redwood,https://www.distantias.com/towns-radius-redwoo...,42.19,67.898055,https://www.distantias.com/distance-from-smith...,51.26,0 hour(s) and 56 minutes,55,55.9,New York,United States,1510,SE,44.3237,-75.735,2011
54,Mountain Grove,https://www.distantias.com/towns-radius-mounta...,42.5,68.39695,https://www.distantias.com/distance-from-smith...,51.64,0 hour(s) and 56 minutes,55,56.3,Ontario,Canada,not available,SW,44.7333,-76.85,2011
55,Thousand Island Park,https://www.distantias.com/towns-radius-thousa...,42.51,68.413043,https://www.distantias.com/distance-from-smith...,51.65,0 hour(s) and 56 minutes,55,56.4,New York,United States,31,SW,44.2849,-76.0299,2011
58,Fishers Landing,https://www.distantias.com/towns-radius-fisher...,42.97,69.15334,https://www.distantias.com/distance-from-smith...,52.21,0 hour(s) and 57 minutes,55,57.0,New York,United States,89,SE,44.2782,-76.0083,2011
59,Braeside,https://www.distantias.com/towns-radius-braesi...,43.34,69.748796,https://www.distantias.com/distance-from-smith...,52.66,0 hour(s) and 57 minutes,55,57.5,Ontario,Canada,7178,NW,45.4667,-76.4,2011
60,Plessis,https://www.distantias.com/towns-radius-plessi...,43.35,69.764889,https://www.distantias.com/distance-from-smith...,52.67,0 hour(s) and 57 minutes,55,57.5,New York,United States,164,SE,44.2847,-75.8458,2011


### Filter for towns that meet criterion 3 (not in the United States)

In [143]:
#Filter for countries not called United States
df3=df2[df2['Country'] != "United States"]
df3.shape

(40, 16)

In [144]:
df3.tail(10)

Unnamed: 0,Town Name,Web Link,Distance Miles,Distance KM,Precise Drive time and Directions URL,Approx Drive miles,Approx Drive Time,Assumed Average MPH,Total Minutes,Region,Country,Population,Direction,Latitude,Longitude,Date
42,Gananoque,https://www.distantias.com/towns-radius-ganano...,39.85,64.132199,https://www.distantias.com/distance-from-smith...,48.42,0 hour(s) and 53 minutes,55,52.8,Ontario,Canada,5194,SW,44.3333,-76.1667,2011
44,Russell,https://www.distantias.com/towns-radius-russel...,40.49,65.162177,https://www.distantias.com/distance-from-smith...,49.2,0 hour(s) and 54 minutes,55,53.7,Ontario,Canada,3759,NE,45.26,-75.36,2011
45,Arnprior,https://www.distantias.com/towns-radius-arnpri...,40.57,65.290924,https://www.distantias.com/distance-from-smith...,49.29,0 hour(s) and 54 minutes,55,53.8,Ontario,Canada,10099,NW,45.4333,-76.3667,2011
46,Ompah,https://www.distantias.com/towns-radius-ompah-...,40.7,65.500138,https://www.distantias.com/distance-from-smith...,49.45,0 hour(s) and 54 minutes,55,54.0,Ontario,Canada,1675,NW,45.0167,-76.8333,2011
47,Morrisburg,https://www.distantias.com/towns-radius-morris...,40.79,65.644979,https://www.distantias.com/distance-from-smith...,49.56,0 hour(s) and 54 minutes,55,54.1,Ontario,Canada,2756,NE,44.9,-75.1833,2011
48,Gloucester,https://www.distantias.com/towns-radius-glouce...,41.03,66.03122,https://www.distantias.com/distance-from-smith...,49.85,0 hour(s) and 54 minutes,55,54.4,Ontario,Canada,133280,NE,45.4167,-75.6,2011
49,Rockcliffe Park,https://www.distantias.com/towns-radius-rockcl...,41.3,66.465742,https://www.distantias.com/distance-from-smith...,50.18,0 hour(s) and 55 minutes,55,54.7,Ontario,Canada,1932,NE,45.45,-75.6833,2011
50,Chesterville,https://www.distantias.com/towns-radius-cheste...,41.4,66.626676,https://www.distantias.com/distance-from-smith...,50.3,0 hour(s) and 55 minutes,55,54.9,Ontario,Canada,1514,NE,45.1,-75.2167,2011
54,Mountain Grove,https://www.distantias.com/towns-radius-mounta...,42.5,68.39695,https://www.distantias.com/distance-from-smith...,51.64,0 hour(s) and 56 minutes,55,56.3,Ontario,Canada,not available,SW,44.7333,-76.85,2011
59,Braeside,https://www.distantias.com/towns-radius-braesi...,43.34,69.748796,https://www.distantias.com/distance-from-smith...,52.66,0 hour(s) and 57 minutes,55,57.5,Ontario,Canada,7178,NW,45.4667,-76.4,2011


### Filter for towns that meet criterion 4 (more than 2,500 residents)

First, we must drop any tiny settlements or hamlets (those with 'not available' listed in the Population column).

In [145]:
df4 = df3[df3['Population'] != 'not available']
df4.shape

(34, 16)

In [146]:
df4.head(10)

Unnamed: 0,Town Name,Web Link,Distance Miles,Distance KM,Precise Drive time and Directions URL,Approx Drive miles,Approx Drive Time,Assumed Average MPH,Total Minutes,Region,Country,Population,Direction,Latitude,Longitude,Date
0,Smiths Falls,https://www.distantias.com/towns-radius-smiths...,0.0,0.0,https://www.distantias.com/distance-from-smith...,0.0,0 hour(s) and 0 minutes,0,0.0,Ontario,Canada,9403,NE,44.9,-76.0167,2011
1,Merrickville,https://www.distantias.com/towns-radius-merric...,9.05,14.564527,https://www.distantias.com/distance-from-smith...,10.77,0 hour(s) and 25 minutes,26,24.9,Ontario,Canada,3067,NE,44.9167,-75.8333,2011
2,Perth,https://www.distantias.com/towns-radius-perth-...,10.67,17.171658,https://www.distantias.com/distance-from-smith...,13.22,0 hour(s) and 21 minutes,37,21.4,Ontario,Canada,6211,SW,44.8833,-76.2333,2011
4,Carleton Place,https://www.distantias.com/towns-radius-carlet...,17.09,27.503621,https://www.distantias.com/distance-from-smith...,21.17,0 hour(s) and 34 minutes,37,34.3,Ontario,Canada,10013,NW,45.1333,-76.1333,2011
5,Kemptville,https://www.distantias.com/towns-radius-kemptv...,20.39,32.814443,https://www.distantias.com/distance-from-smith...,24.77,0 hour(s) and 27 minutes,55,27.0,Ontario,Canada,3532,NE,45.0167,-75.6333,2011
7,Richmond,https://www.distantias.com/towns-radius-richmo...,21.52,34.632997,https://www.distantias.com/distance-from-smith...,26.15,0 hour(s) and 29 minutes,55,28.5,Ontario,Canada,3797,NE,45.1833,-75.8333,2011
9,Almonte,https://www.distantias.com/towns-radius-almont...,23.63,38.028704,https://www.distantias.com/distance-from-smith...,28.71,0 hour(s) and 31 minutes,55,31.3,Ontario,Canada,4752,NW,45.2167,-76.2,2011
10,Westport,https://www.distantias.com/towns-radius-westpo...,24.06,38.72072,https://www.distantias.com/distance-from-smith...,29.23,0 hour(s) and 32 minutes,55,31.9,Ontario,Canada,590,SW,44.6833,-76.4,2011
11,Stittsville,https://www.distantias.com/towns-radius-stitts...,24.67,39.702418,https://www.distantias.com/distance-from-smith...,29.97,0 hour(s) and 33 minutes,55,32.7,Ontario,Canada,41350,NE,45.25,-75.9167,2011
12,Brockville,https://www.distantias.com/towns-radius-brockv...,27.35,44.015449,https://www.distantias.com/distance-from-smith...,33.23,0 hour(s) and 36 minutes,55,36.3,Ontario,Canada,23354,SE,44.5833,-75.6833,2011


Next, we need to convert the Population column to numeric values.

In [147]:
df4["Population"] = pd.to_numeric(df4["Population"])
df4.dtypes

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df4["Population"] = pd.to_numeric(df4["Population"])


Town Name                                 object
Web Link                                  object
Distance Miles                           float64
Distance KM                              float64
Precise Drive time and Directions URL     object
Approx Drive miles                       float64
Approx Drive Time                         object
Assumed Average MPH                        int64
Total Minutes                            float64
Region                                    object
Country                                   object
Population                                 int64
Direction                                 object
Latitude                                 float64
Longitude                                float64
Date                                       int64
dtype: object

Now that the population values are integers, we can remove the towns that contain 2,500 residents or fewer.

In [148]:
df5=df4[df4['Population'] > 2500]
df5.shape

(23, 16)

In [149]:
df5

Unnamed: 0,Town Name,Web Link,Distance Miles,Distance KM,Precise Drive time and Directions URL,Approx Drive miles,Approx Drive Time,Assumed Average MPH,Total Minutes,Region,Country,Population,Direction,Latitude,Longitude,Date
0,Smiths Falls,https://www.distantias.com/towns-radius-smiths...,0.0,0.0,https://www.distantias.com/distance-from-smith...,0.0,0 hour(s) and 0 minutes,0,0.0,Ontario,Canada,9403,NE,44.9,-76.0167,2011
1,Merrickville,https://www.distantias.com/towns-radius-merric...,9.05,14.564527,https://www.distantias.com/distance-from-smith...,10.77,0 hour(s) and 25 minutes,26,24.9,Ontario,Canada,3067,NE,44.9167,-75.8333,2011
2,Perth,https://www.distantias.com/towns-radius-perth-...,10.67,17.171658,https://www.distantias.com/distance-from-smith...,13.22,0 hour(s) and 21 minutes,37,21.4,Ontario,Canada,6211,SW,44.8833,-76.2333,2011
4,Carleton Place,https://www.distantias.com/towns-radius-carlet...,17.09,27.503621,https://www.distantias.com/distance-from-smith...,21.17,0 hour(s) and 34 minutes,37,34.3,Ontario,Canada,10013,NW,45.1333,-76.1333,2011
5,Kemptville,https://www.distantias.com/towns-radius-kemptv...,20.39,32.814443,https://www.distantias.com/distance-from-smith...,24.77,0 hour(s) and 27 minutes,55,27.0,Ontario,Canada,3532,NE,45.0167,-75.6333,2011
7,Richmond,https://www.distantias.com/towns-radius-richmo...,21.52,34.632997,https://www.distantias.com/distance-from-smith...,26.15,0 hour(s) and 29 minutes,55,28.5,Ontario,Canada,3797,NE,45.1833,-75.8333,2011
9,Almonte,https://www.distantias.com/towns-radius-almont...,23.63,38.028704,https://www.distantias.com/distance-from-smith...,28.71,0 hour(s) and 31 minutes,55,31.3,Ontario,Canada,4752,NW,45.2167,-76.2,2011
11,Stittsville,https://www.distantias.com/towns-radius-stitts...,24.67,39.702418,https://www.distantias.com/distance-from-smith...,29.97,0 hour(s) and 33 minutes,55,32.7,Ontario,Canada,41350,NE,45.25,-75.9167,2011
12,Brockville,https://www.distantias.com/towns-radius-brockv...,27.35,44.015449,https://www.distantias.com/distance-from-smith...,33.23,0 hour(s) and 36 minutes,55,36.3,Ontario,Canada,23354,SE,44.5833,-75.6833,2011
13,Prescott,https://www.distantias.com/towns-radius-presco...,27.63,44.466064,https://www.distantias.com/distance-from-smith...,33.57,0 hour(s) and 37 minutes,55,36.6,Ontario,Canada,4284,SE,44.7167,-75.5167,2011


### Filter for towns that meet criterion 5 (fewer than 50,000 residents)

In [150]:
df6=df5[df5['Population'] < 50000]
df6.shape

(19, 16)

In [151]:
df6.head(10)

Unnamed: 0,Town Name,Web Link,Distance Miles,Distance KM,Precise Drive time and Directions URL,Approx Drive miles,Approx Drive Time,Assumed Average MPH,Total Minutes,Region,Country,Population,Direction,Latitude,Longitude,Date
0,Smiths Falls,https://www.distantias.com/towns-radius-smiths...,0.0,0.0,https://www.distantias.com/distance-from-smith...,0.0,0 hour(s) and 0 minutes,0,0.0,Ontario,Canada,9403,NE,44.9,-76.0167,2011
1,Merrickville,https://www.distantias.com/towns-radius-merric...,9.05,14.564527,https://www.distantias.com/distance-from-smith...,10.77,0 hour(s) and 25 minutes,26,24.9,Ontario,Canada,3067,NE,44.9167,-75.8333,2011
2,Perth,https://www.distantias.com/towns-radius-perth-...,10.67,17.171658,https://www.distantias.com/distance-from-smith...,13.22,0 hour(s) and 21 minutes,37,21.4,Ontario,Canada,6211,SW,44.8833,-76.2333,2011
4,Carleton Place,https://www.distantias.com/towns-radius-carlet...,17.09,27.503621,https://www.distantias.com/distance-from-smith...,21.17,0 hour(s) and 34 minutes,37,34.3,Ontario,Canada,10013,NW,45.1333,-76.1333,2011
5,Kemptville,https://www.distantias.com/towns-radius-kemptv...,20.39,32.814443,https://www.distantias.com/distance-from-smith...,24.77,0 hour(s) and 27 minutes,55,27.0,Ontario,Canada,3532,NE,45.0167,-75.6333,2011
7,Richmond,https://www.distantias.com/towns-radius-richmo...,21.52,34.632997,https://www.distantias.com/distance-from-smith...,26.15,0 hour(s) and 29 minutes,55,28.5,Ontario,Canada,3797,NE,45.1833,-75.8333,2011
9,Almonte,https://www.distantias.com/towns-radius-almont...,23.63,38.028704,https://www.distantias.com/distance-from-smith...,28.71,0 hour(s) and 31 minutes,55,31.3,Ontario,Canada,4752,NW,45.2167,-76.2,2011
11,Stittsville,https://www.distantias.com/towns-radius-stitts...,24.67,39.702418,https://www.distantias.com/distance-from-smith...,29.97,0 hour(s) and 33 minutes,55,32.7,Ontario,Canada,41350,NE,45.25,-75.9167,2011
12,Brockville,https://www.distantias.com/towns-radius-brockv...,27.35,44.015449,https://www.distantias.com/distance-from-smith...,33.23,0 hour(s) and 36 minutes,55,36.3,Ontario,Canada,23354,SE,44.5833,-75.6833,2011
13,Prescott,https://www.distantias.com/towns-radius-presco...,27.63,44.466064,https://www.distantias.com/distance-from-smith...,33.57,0 hour(s) and 37 minutes,55,36.6,Ontario,Canada,4284,SE,44.7167,-75.5167,2011
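The five filtering steps above can also be expressed as a single boolean mask. The sketch below assumes the same column names as the Distantias file and uses a few rows copied from the tables above. errors='coerce' turns 'not available' into NaN, which then fails both population comparisons.

```python
import pandas as pd

# A handful of rows from the tables above, one for each way a town can be excluded
rows = pd.DataFrame({
    'Town Name': ['Smiths Falls', 'Eloida', 'Westport', 'Gloucester', 'Madrid'],
    'Distance KM': [0.0, 24.413688, 38.72072, 66.03122, 67.093385],
    'Region': ['Ontario', 'Ontario', 'Ontario', 'Ontario', 'New York'],
    'Country': ['Canada', 'Canada', 'Canada', 'Canada', 'United States'],
    'Population': ['9403', 'not available', '590', '133280', '1651'],
})

# Non-numeric population entries become NaN instead of raising an error
population = pd.to_numeric(rows['Population'], errors='coerce')

mask = (
    (rows['Distance KM'] < 70)                    # criterion 1
    & (rows['Region'].str.strip() != 'Quebec')    # criterion 2
    & (rows['Country'] != 'United States')        # criterion 3
    & (population > 2500)                         # criterion 4
    & (population < 50000)                        # criterion 5
)

shortlist = rows[mask].assign(Population=population[mask].astype(int))
print(shortlist)
```

The step-by-step version used in this notebook is easier to inspect after each filter; the single mask is more compact once the criteria are settled.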


### Drop unneeded columns, sort by town name, then reset the index

In [278]:
df_shortlist=df6[['Town Name','Population','Distance KM','Latitude','Longitude']]
df_shortlist=df_shortlist.sort_values('Town Name').reset_index(drop=True)
df_shortlist

Unnamed: 0,Town Name,Population,Distance KM,Latitude,Longitude
0,Almonte,4752,38.028704,45.2167,-76.2
1,Arnprior,10099,65.290924,45.4333,-76.3667
2,Braeside,7178,69.748796,45.4667,-76.4
3,Brockville,23354,44.015449,44.5833,-75.6833
4,Carleton Place,10013,27.503621,45.1333,-76.1333
5,Gananoque,5194,64.132199,44.3333,-76.1667
6,Greely,9049,53.1565,45.26,-75.57
7,Kemptville,3532,32.814443,45.0167,-75.6333
8,Manotick,4486,46.091498,45.24,-75.68
9,Merrickville,3067,14.564527,44.9167,-75.8333


This represents our final processed and cleaned dataframe, ready for evaluation using Foursquare.

## Import additional required libraries

In [153]:
import numpy as np # library to handle data in a vectorized manner

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available from pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


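### Create a map centered on Smiths Falls, with all towns that meet the minimum criteria superimposed on top

Obtain the latitude and longitude of Smiths Falls from the dataframe

In [154]:
# Look up Smiths Falls by name: df_shortlist is sorted alphabetically, so its top row is Almonte
smiths_falls = df_shortlist[df_shortlist['Town Name'] == 'Smiths Falls'].iloc[0]
latitude = smiths_falls['Latitude']
longitude = smiths_falls['Longitude']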

Create map with suitable zoom level

In [155]:
map_smithsFalls = folium.Map(location=[latitude, longitude], zoom_start=8)

Add markers to map

In [156]:
for lat, lng, town in zip(df_shortlist['Latitude'], df_shortlist['Longitude'], df_shortlist['Town Name']):
    label = '{}'.format(town)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_smithsFalls)
    
map_smithsFalls

### Define my Foursquare Credentials and Version

In [157]:
CLIENT_ID = 'CCKOQHSERN1KJLQR44C4RZBZ4NH4QO4ELA3EU4FBLIMKBEGL'
CLIENT_SECRET = 'E0FLMAGE44ZKOJ44VXSHPX2J4NZDCPX34NJRKQIFDKEEJ2HO' 
VERSION = '20180605'
LIMIT = 100 
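API credentials are generally safer kept out of a notebook that is shared publicly. A minimal sketch, assuming the credentials have been stored in environment variables named FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET (names chosen here purely for illustration):

```python
import os

# Hypothetical environment variable names; set them in the shell before starting Jupyter
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')
VERSION = '20180605'
LIMIT = 100

# Report a missing credential early rather than failing inside the API request
if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials were not found in the environment.')
```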

### Create a function to iteratively explore the Towns

In [158]:
def getNearbyVenues(names, latitudes, longitudes, radius=3000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Town', 
                  'Town Latitude', 
                  'Town Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
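The function above assumes that every request succeeds and that every venue has at least one category. A more defensive variant could separate the parsing from the HTTP call, as sketched below. parse_explore_items is a hypothetical helper name, and the payload at the bottom is a hand-written stand-in that mirrors only the fields the function above reads.

```python
import pandas as pd

COLUMNS = ['Town', 'Town Latitude', 'Town Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

def parse_explore_items(name, lat, lng, payload):
    """Turn one explore-style JSON payload into a list of row tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # Fall back to 'Unknown' when a venue has no category listed
        categories = venue.get('categories') or [{'name': 'Unknown'}]
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name']))
    return rows

# Hand-written stand-in for an API response (not real Foursquare data)
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 44.90, 'lng': -76.02},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Example Venue',
               'location': {'lat': 44.89, 'lng': -76.03},
               'categories': []}},
]}]}}

rows = parse_explore_items('Smiths Falls', 44.9, -76.0167, fake_payload)
# A response with no groups yields no rows instead of raising an error
rows += parse_explore_items('Empty Town', 0.0, 0.0, {'response': {}})
venues = pd.DataFrame(rows, columns=COLUMNS)
print(venues)
```

In the real loop, the payload would come from requests.get(url, timeout=...) followed by raise_for_status() and .json(), so that a failed request is reported clearly rather than surfacing as a KeyError.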

Run the above function on each town and create a new dataframe called shortlist_venues

In [159]:
shortlist_venues = getNearbyVenues(names=df_shortlist['Town Name'],
                                   latitudes=df_shortlist['Latitude'],
                                   longitudes=df_shortlist['Longitude']
                                  )

Smiths Falls
Merrickville
Perth
Carleton Place
Kemptville
Richmond
Almonte
Stittsville
Brockville
Prescott
Manotick
Mississippi Mills
Greely
Rockport
Gananoque
Russell
Arnprior
Morrisburg
Braeside


Examine the size and print out a sample of the resulting dataframe

In [160]:
print(shortlist_venues.shape)
shortlist_venues

(302, 7)


Unnamed: 0,Town,Town Latitude,Town Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Smiths Falls,44.9,-76.0167,Coffee Culture Cafe & Eatery,44.901396,-76.021224,Café
1,Smiths Falls,44.9,-76.0167,TD Canada Trust,44.900042,-76.020867,Bank
2,Smiths Falls,44.9,-76.0167,Pizza Hut,44.891743,-76.030096,Pizza Place
3,Smiths Falls,44.9,-76.0167,Andress' Your Independent Grocer,44.892481,-76.026833,Grocery Store
4,Smiths Falls,44.9,-76.0167,The Beer Store,44.893694,-76.027683,Beer Store
5,Smiths Falls,44.9,-76.0167,Canadian Tire,44.892104,-76.029183,Hardware Store
6,Smiths Falls,44.9,-76.0167,Burger King,44.902312,-76.022032,Fast Food Restaurant
7,Smiths Falls,44.9,-76.0167,Dairy Queen,44.893304,-76.028558,Ice Cream Shop
8,Smiths Falls,44.9,-76.0167,Rob Roy's Pub,44.897986,-76.019312,Pub
9,Smiths Falls,44.9,-76.0167,Tim Hortons,44.90263,-76.022064,Coffee Shop


Examine how many venues were returned for each town

In [161]:
shortlist_venues.groupby('Town').count()

Town,Town Latitude,Town Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Almonte,7,7,7,7,7,7
Arnprior,20,20,20,20,20,20
Braeside,4,4,4,4,4,4
Brockville,48,48,48,48,48,48
Carleton Place,30,30,30,30,30,30
Gananoque,20,20,20,20,20,20
Greely,5,5,5,5,5,5
Kemptville,23,23,23,23,23,23
Manotick,16,16,16,16,16,16
Merrickville,6,6,6,6,6,6


Check how many unique categories can be curated from all the returned venues

In [162]:
print('There are {} unique categories.'.format(len(shortlist_venues['Venue Category'].unique())))

There are 82 unique categories.


Print out all the unique venue categories

In [163]:
shortlist_venues['Venue Category'].unique()

array(['Café', 'Bank', 'Pizza Place', 'Grocery Store', 'Beer Store',
       'Hardware Store', 'Fast Food Restaurant', 'Ice Cream Shop', 'Pub',
       'Coffee Shop', 'Paper / Office Supplies Store', 'Pharmacy',
       'Big Box Store', 'Department Store', 'Supermarket', 'Skating Rink',
       'Hotel', 'Gas Station', 'Discount Store', 'Liquor Store',
       'Shopping Mall', 'Gourmet Shop', 'Restaurant', 'Canal Lock',
       'American Restaurant', 'Gastropub', 'BBQ Joint', 'Bar',
       'Mexican Restaurant', 'Convenience Store', 'Park',
       'Sandwich Place', 'Diner', 'Clothing Store',
       'Middle Eastern Restaurant', 'Auto Garage', 'Thai Restaurant',
       'Brewery', 'Food Truck', 'Gym', 'Italian Restaurant',
       'Asian Restaurant', 'Bridal Shop', 'History Museum',
       'Mobile Phone Shop', 'Golf Course', 'Fish & Chips Shop', 'Theater',
       'Historic Site', 'Bakery', 'Steakhouse', 'Harbor / Marina',
       'Tex-Mex Restaurant', 'Sushi Restaurant', 'Smoothie Shop',
       'Fo

The venue categories returned are very specific, which is great for typical Foursquare users but not easy for us to use. Referring back to the attributes we want to rate the towns by, we need to re-classify these specific venue types into general ones so we can count how many venues of each kind a town contains. The six categories we are interested in ranking are restaurants (excluding fast food restaurants or similar, which will be classed as 'Other'), bars, fitness venues, entertainment venues, outdoor spaces and shopping outlets. Therefore, the unique categories need to be re-classified as follows:

-Restuarant: 'Café', 'Pizza Place', 'Coffee Shop', 'Restaurant', 'American Restaurant', 'Gastropub', 'BBQ Joint', 'Mexican Restaurant', 'Sandwich Place', 'Diner', 'Middle Eastern Restaurant', 'Thai Restaurant', 'Italian Restaurant', 'Asian Restaurant', 'Wings Joint', 'Bakery', 'Steakhouse', 'Tex-Mex Restaurant', 'Sushi Restaurant', 'Burger Joint', 'Breakfast Spot'

-Bar: 'Pub', 'Bar', 'Brewery', 'Sports Bar'

-Fitness: 'Skating Rink', 'Gym', 'Golf Course', 'Athletics & Sports'

-Entertainment: 'Theater', 'Historic Site', 'Movie Theater', 'Hockey Arena', 'Art Gallery', 'Castle', 'Casino', 'Recreation Center', 'History Museum'

-Outdoor Space: 'Harbor / Marina', 'Park', 'Plaza', 'Beach', 'Scenic Lookout', 'Lake', 'Waterfront', 'Canal Lock'

-Shopping Outlet: 'Grocery Store', 'Beer Store', 'Hardware Store', 'Ice Cream Shop', 'Paper / Office Supplies Store', 'Big Box Store', 'Department Store', 'Supermarket', 'Discount Store', 'Liquor Store', 'Shopping Mall', 'Gourmet Shop', 'Video Game Store', 'Clothing Store', 'Sporting Goods Shop', 'Thrift / Vintage Store', 'Electronics Store', 'Smoothie Shop', 'Fish & Chips Shop', 'Food & Drink Shop', 'Convenience Store', 'Market', 'Gift Shop', 'Mobile Phone Shop', 'Video Store', 'Optical Shop', 'Pet Store'

-Other: 'Bank', 'Fast Food Restaurant', 'Pharmacy', 'Hotel', 'Gas Station', 'Auto Garage', 'Food Truck', 'Bridal Shop', 'Train Station', 'Light Rail Station', 'Construction & Landscaping', 'Resort', 'Boat or Ferry', 'Truck Stop', 'Post Office', 'Business Service'
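As an aside, the re-classification listed above can also be written as an exact-match lookup. The following is only a minimal sketch, not the approach used in the rest of this notebook: `CATEGORY_GROUPS`, `SPECIFIC_TO_GENERAL` and `generalise` are hypothetical names, and the mapping is deliberately abbreviated to a few entries from the lists above.

```python
import pandas as pd

# Hypothetical, abbreviated mapping: general category -> specific Foursquare categories
CATEGORY_GROUPS = {
    'Restuarant': ['Café', 'Pizza Place', 'Coffee Shop', 'Restaurant'],
    'Bar': ['Pub', 'Bar', 'Brewery', 'Sports Bar'],
    'Fitness': ['Skating Rink', 'Gym', 'Golf Course'],
    'Other': ['Bank', 'Fast Food Restaurant', 'Pharmacy'],
}

# Invert the mapping to specific -> general so each category is looked up exactly
SPECIFIC_TO_GENERAL = {
    specific: general
    for general, specifics in CATEGORY_GROUPS.items()
    for specific in specifics
}

def generalise(categories: pd.Series) -> pd.Series:
    """Map specific categories to general ones; unknown values are kept unchanged."""
    return categories.map(SPECIFIC_TO_GENERAL).fillna(categories)

sample = pd.Series(['Café', 'Bank', 'Pub', 'Fast Food Restaurant', 'Canal Lock'])
print(generalise(sample).tolist())
```

Because the lookup matches whole category names, 'Fast Food Restaurant' lands in 'Other' without any special handling, and anything missing from the mapping (here 'Canal Lock') passes through untouched so it is easy to spot.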

First, we need to make a list of all the restaurant venue types we want to replace with the general category 'Restuarant':

In [172]:
restuarant_venue_list = '|'.join(['Café', 'Pizza Place', 'Coffee Shop', 'Restaurant', 'American Restaurant', 'Gastropub', 'BBQ Joint', 'Mexican Restaurant', 'Sandwich Place', 'Diner', 'Middle Eastern Restaurant', 'Thai Restaurant', 'Italian Restaurant', 'Asian Restaurant', 'Wings Joint', 'Bakery', 'Steakhouse', 'Tex-Mex Restaurant', 'Sushi Restaurant', 'Burger Joint', 'Breakfast Spot'])

And then replace them accordingly:

In [173]:
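# regex=True so the '|' in the pattern is treated as alternation (newer pandas versions default to literal matching)
shortlist_venues['Venue Category'] = shortlist_venues['Venue Category'].str.replace(restuarant_venue_list, 'Restuarant', regex=True)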

Check to see if this worked by examining some of the table and printing out the new unique categories:

In [174]:
shortlist_venues.head(20)

Unnamed: 0,Town,Town Latitude,Town Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Smiths Falls,44.9,-76.0167,Coffee Culture Cafe & Eatery,44.901396,-76.021224,Restuarant
1,Smiths Falls,44.9,-76.0167,TD Canada Trust,44.900042,-76.020867,Other
2,Smiths Falls,44.9,-76.0167,Pizza Hut,44.891743,-76.030096,Restuarant
3,Smiths Falls,44.9,-76.0167,Andress' Your Independent Grocer,44.892481,-76.026833,Shopping Outlet
4,Smiths Falls,44.9,-76.0167,The Beer Store,44.893694,-76.027683,Shopping Outlet
5,Smiths Falls,44.9,-76.0167,Canadian Tire,44.892104,-76.029183,Shopping Outlet
6,Smiths Falls,44.9,-76.0167,Burger King,44.902312,-76.022032,Other
7,Smiths Falls,44.9,-76.0167,Dairy Queen,44.893304,-76.028558,Shopping Outlet
8,Smiths Falls,44.9,-76.0167,Rob Roy's Pub,44.897986,-76.019312,Bar
9,Smiths Falls,44.9,-76.0167,Tim Hortons,44.90263,-76.022064,Restuarant


In [175]:
shortlist_venues['Venue Category'].unique()

array(['Restuarant', 'Other', 'Shopping Outlet', 'Bar', 'Fitness',
       'Canal Lock', 'Outdoor Space', 'History Museum',
       'Mobile Phone Shop', 'Entertainment', 'Video Store', 'Post Office',
       'Optical Shop', 'Pet Store', 'Business Service'], dtype=object)

It seems to have worked! Now we just need to repeat for the other categories. First make the remaining lists:

In [180]:
bar_venue_list = '|'.join(['Pub', 'Bar', 'Brewery', 'Sports Bar'])

fitness_venue_list = '|'.join(['Skating Rink', 'Gym', 'Golf Course', 'Athletics & Sports'])

entertainment_venue_list = '|'.join(['Theater', 'Historic Site', 'Movie Theater', 'Hockey Arena', 'Art Gallery', 'Castle', 'Casino', 'Recreation Center', 'History Museum'])

outdoor_venue_list = '|'.join(['Harbor / Marina', 'Park', 'Plaza', 'Beach', 'Scenic Lookout', 'Lake', 'Waterfront', 'Canal Lock'])

shopping_venue_list = '|'.join(['Grocery Store', 'Beer Store', 'Hardware Store', 'Ice Cream Shop', 'Paper / Office Supplies Store', 'Big Box Store', 'Department Store', 'Supermarket', 'Discount Store', 'Liquor Store', 'Shopping Mall', 'Gourmet Shop', 'Video Game Store', 'Clothing Store', 'Sporting Goods Shop', 'Thrift / Vintage Store', 'Electronics Store', 'Smoothie Shop', 'Fish & Chips Shop', 'Food & Drink Shop', 'Convenience Store', 'Market', 'Gift Shop', 'Mobile Phone Shop', 'Video Store', 'Optical Shop', 'Pet Store'])

other_venue_list = '|'.join(['Bank', 'Fast Food Restuarant', 'Pharmacy', 'Hotel', 'Gas Station', 'Auto Garage', 'Food Truck', 'Bridal Shop', 'Train Station', 'Light Rail Station', 'Construction & Landscaping', 'Resort', 'Boat or Ferry', 'Truck Stop', 'Post Office', 'Business Service'])
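A note on the 'Fast Food Restuarant' entry in the 'Other' list above: it looks like a typo, but it is probably needed. The earlier pattern-based replacement rewrites any matching substring, so 'Fast Food Restaurant' will already have become 'Fast Food Restuarant' by this point. The sketch below illustrates this on a made-up three-row series (`cats`, `pattern` and `anchored` are illustrative names only), and shows one possible way to restrict the replacement to whole-string matches instead.

```python
import re
import pandas as pd

cats = pd.Series(['Café', 'Restaurant', 'Fast Food Restaurant'])
pattern = '|'.join(['Café', 'Restaurant'])

# With regex=True the pattern is an alternation, and a match is replaced
# wherever it occurs inside the string, not only when it is the whole string
substring = cats.str.replace(pattern, 'Restuarant', regex=True)
print(substring.tolist())

# Anchoring the (escaped) alternatives with ^...$ only replaces whole-string matches
anchored = '^(?:' + '|'.join(re.escape(c) for c in ['Café', 'Restaurant']) + ')$'
whole = cats.str.replace(anchored, 'Restuarant', regex=True)
print(whole.tolist())
```

With the unanchored pattern the last entry becomes 'Fast Food Restuarant'; with the anchored pattern it is left as 'Fast Food Restaurant'.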

And now replace them with the relevant generalised classifications:

In [181]:
# regex=True so each '|'-joined list is treated as a pattern of alternatives
shortlist_venues['Venue Category'] = shortlist_venues['Venue Category'].str.replace(bar_venue_list, 'Bar', regex=True)

shortlist_venues['Venue Category'] = shortlist_venues['Venue Category'].str.replace(fitness_venue_list, 'Fitness', regex=True)

shortlist_venues['Venue Category'] = shortlist_venues['Venue Category'].str.replace(entertainment_venue_list, 'Entertainment', regex=True)

shortlist_venues['Venue Category'] = shortlist_venues['Venue Category'].str.replace(outdoor_venue_list, 'Outdoor Space', regex=True)

shortlist_venues['Venue Category'] = shortlist_venues['Venue Category'].str.replace(shopping_venue_list, 'Shopping Outlet', regex=True)

shortlist_venues['Venue Category'] = shortlist_venues['Venue Category'].str.replace(other_venue_list, 'Other', regex=True)

Examine the new dataframe and unique values again to check it has worked:

In [182]:
shortlist_venues.head(20)

Unnamed: 0,Town,Town Latitude,Town Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Smiths Falls,44.9,-76.0167,Coffee Culture Cafe & Eatery,44.901396,-76.021224,Restuarant
1,Smiths Falls,44.9,-76.0167,TD Canada Trust,44.900042,-76.020867,Other
2,Smiths Falls,44.9,-76.0167,Pizza Hut,44.891743,-76.030096,Restuarant
3,Smiths Falls,44.9,-76.0167,Andress' Your Independent Grocer,44.892481,-76.026833,Shopping Outlet
4,Smiths Falls,44.9,-76.0167,The Beer Store,44.893694,-76.027683,Shopping Outlet
5,Smiths Falls,44.9,-76.0167,Canadian Tire,44.892104,-76.029183,Shopping Outlet
6,Smiths Falls,44.9,-76.0167,Burger King,44.902312,-76.022032,Other
7,Smiths Falls,44.9,-76.0167,Dairy Queen,44.893304,-76.028558,Shopping Outlet
8,Smiths Falls,44.9,-76.0167,Rob Roy's Pub,44.897986,-76.019312,Bar
9,Smiths Falls,44.9,-76.0167,Tim Hortons,44.90263,-76.022064,Restuarant


In [183]:
shortlist_venues['Venue Category'].unique()

array(['Restuarant', 'Other', 'Shopping Outlet', 'Bar', 'Fitness',
       'Outdoor Space', 'Entertainment'], dtype=object)

Great! All the categories have now been generalised nicely so we can use them for the ranking evaluations.
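The visual check above could also be automated, so that any category left un-mapped is reported rather than spotted by eye. This is just a sketch on a made-up series; `EXPECTED` and `unmapped_categories` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# The seven general categories every venue should end up in
EXPECTED = {'Restuarant', 'Bar', 'Fitness', 'Entertainment',
            'Outdoor Space', 'Shopping Outlet', 'Other'}

def unmapped_categories(categories: pd.Series) -> set:
    """Return any category values that are not one of the general categories."""
    return set(categories.unique()) - EXPECTED

# Made-up example in which 'Canal Lock' has not been re-classified yet
demo = pd.Series(['Restuarant', 'Other', 'Canal Lock', 'Bar'])
leftover = unmapped_categories(demo)
print(leftover)
```

An empty set means every venue has been assigned to one of the general categories.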

Next, we can group by town name and count the venues in each category in a single line of Python:

In [184]:
venue_count = shortlist_venues.groupby(['Town', 'Venue Category']).size().reset_index(name='Count')

In [185]:
pd.set_option('display.max_rows', None)
venue_count

Unnamed: 0,Town,Venue Category,Count
0,Almonte,Bar,1
1,Almonte,Other,1
2,Almonte,Restuarant,4
3,Almonte,Shopping Outlet,1
4,Arnprior,Entertainment,1
5,Arnprior,Fitness,1
6,Arnprior,Other,7
7,Arnprior,Restuarant,5
8,Arnprior,Shopping Outlet,6
9,Braeside,Restuarant,1


This dataframe contains all the information we need to rank and score the venue categories. What we need to do now is break this data down into the six different categories, then merge it into a final 'scoring' dataframe.
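For reference, a town-by-category count table can also be built in one step with `pd.crosstab`. This is an alternative sketch on a small made-up dataframe (`venues`, `categories` and `counts` are illustrative names), not the method followed in the cells below.

```python
import pandas as pd

# Made-up venue data in the same shape as the Town / Venue Category columns
venues = pd.DataFrame({
    'Town': ['Almonte', 'Almonte', 'Arnprior', 'Arnprior', 'Arnprior'],
    'Venue Category': ['Bar', 'Restuarant', 'Restuarant', 'Restuarant', 'Fitness'],
})
categories = ['Restuarant', 'Bar', 'Fitness']

# One row per town, one column per category; missing combinations are filled with 0
counts = pd.crosstab(venues['Town'], venues['Venue Category'])
counts = counts.reindex(columns=categories, fill_value=0)
print(counts)
```

The `reindex` call fixes the column order and guarantees a column for every category of interest, even one that no town happens to contain.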

In [268]:
restuarant_count=venue_count.loc[venue_count['Venue Category'] == 'Restuarant']
restuarant_count=restuarant_count[['Town','Count']]

bar_count=venue_count.loc[venue_count['Venue Category'] == 'Bar']
bar_count=bar_count[['Town','Count']]

fitness_count=venue_count.loc[venue_count['Venue Category'] == 'Fitness']
fitness_count=fitness_count[['Town','Count']]

entertainment_count=venue_count.loc[venue_count['Venue Category'] == 'Entertainment']
entertainment_count=entertainment_count[['Town','Count']]

outdoor_count=venue_count.loc[venue_count['Venue Category'] == 'Outdoor Space']
outdoor_count=outdoor_count[['Town','Count']]

shopping_count=venue_count.loc[venue_count['Venue Category'] == 'Shopping Outlet']
shopping_count=shopping_count[['Town','Count']]

shopping_count

Unnamed: 0,Town,Count
3,Almonte,1
8,Arnprior,6
10,Braeside,3
17,Brockville,12
21,Carleton Place,9
27,Gananoque,4
30,Greely,3
34,Kemptville,5
40,Manotick,3
44,Merrickville,1


Create a final 'scoring' dataframe to merge the results into

In [304]:
town_list=shortlist_venues['Town'].unique()
scoring=pd.DataFrame(data=town_list, index=None, columns=['Town']) 
scoring=scoring.sort_values('Town').reset_index(drop=True)
scoring

Unnamed: 0,Town
0,Almonte
1,Arnprior
2,Braeside
3,Brockville
4,Carleton Place
5,Gananoque
6,Greely
7,Kemptville
8,Manotick
9,Merrickville


Now we need to merge the separate category count dataframes into the final scoring one:

In [305]:
# Name each count column before merging so repeated merges never produce clashing 'Count' columns
scoring = pd.merge(scoring, restuarant_count.rename(columns={'Count': 'Restuarant Count'}), on=['Town'], how='outer').fillna(0)
scoring = pd.merge(scoring, bar_count.rename(columns={'Count': 'Bar Count'}), on=['Town'], how='outer').fillna(0)
scoring = pd.merge(scoring, fitness_count.rename(columns={'Count': 'Fitness Count'}), on=['Town'], how='outer').fillna(0)
scoring = pd.merge(scoring, entertainment_count.rename(columns={'Count': 'Entertainment Count'}), on=['Town'], how='outer').fillna(0)
scoring = pd.merge(scoring, outdoor_count.rename(columns={'Count': 'Outdoor Count'}), on=['Town'], how='outer').fillna(0)
scoring = pd.merge(scoring, shopping_count.rename(columns={'Count': 'Shopping Count'}), on=['Town'], how='outer').fillna(0)

scoring

Unnamed: 0,Town,Restuarant Count,Bar Count,Fitness Count,Entertainment Count,Outdoor Count,Shopping Count
0,Almonte,4.0,1.0,0.0,0.0,0.0,1
1,Arnprior,5.0,0.0,1.0,1.0,0.0,6
2,Braeside,1.0,0.0,0.0,0.0,0.0,3
3,Brockville,14.0,2.0,2.0,4.0,3.0,12
4,Carleton Place,12.0,2.0,0.0,0.0,0.0,9
5,Gananoque,7.0,2.0,0.0,3.0,2.0,4
6,Greely,1.0,0.0,0.0,0.0,1.0,3
7,Kemptville,10.0,1.0,0.0,0.0,0.0,5
8,Manotick,4.0,2.0,0.0,1.0,2.0,3
9,Merrickville,3.0,1.0,0.0,0.0,1.0,1


Excellent! However, there is still one missing attribute that we need to consider for the final scoring: the distance from Smiths Falls. We can easily get these distances from the 'df_shortlist' dataframe and merge them in:

In [306]:
distances=df_shortlist[['Town Name','Distance KM']]
distances.columns=['Town','Distance from Smiths Falls (KM)']
scoring = pd.merge(scoring, distances, on=['Town'],how='right')
scoring

Unnamed: 0,Town,Restuarant Count,Bar Count,Fitness Count,Entertainment Count,Outdoor Count,Shopping Count,Distance from Smiths Falls (KM)
0,Almonte,4.0,1.0,0.0,0.0,0.0,1,38.028704
1,Arnprior,5.0,0.0,1.0,1.0,0.0,6,65.290924
2,Braeside,1.0,0.0,0.0,0.0,0.0,3,69.748796
3,Brockville,14.0,2.0,2.0,4.0,3.0,12,44.015449
4,Carleton Place,12.0,2.0,0.0,0.0,0.0,9,27.503621
5,Gananoque,7.0,2.0,0.0,3.0,2.0,4,64.132199
6,Greely,1.0,0.0,0.0,0.0,1.0,3,53.1565
7,Kemptville,10.0,1.0,0.0,0.0,0.0,5,32.814443
8,Manotick,4.0,2.0,0.0,1.0,2.0,3,46.091498
9,Merrickville,3.0,1.0,0.0,0.0,1.0,1,14.564527


To turn these raw counts into comparable scores, we need to normalise them. To do this, we simply divide every value in a column by that column's maximum, which gives each town a score between 0 and 1 for each category (and for distance).
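The same max-normalisation can be applied to all columns at once. Below is a minimal sketch using a few of the restaurant and bar counts from the table above; `counts` and `normalised` are illustrative names.

```python
import pandas as pd

counts = pd.DataFrame(
    {'Restuarant Count': [4, 14, 7], 'Bar Count': [1, 2, 2]},
    index=['Almonte', 'Brockville', 'Gananoque'],
)

# counts.max() is the per-column maximum; the division is broadcast down each column
normalised = counts / counts.max()
print(normalised.round(3))
```

One caveat: if a column's maximum were 0 (no town has any venue of that type), this division would produce NaN values for that column, so that case would need handling separately.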

In [307]:
scoring['Restuarant Count']=scoring['Restuarant Count']/scoring['Restuarant Count'].max()
scoring['Bar Count']=scoring['Bar Count']/scoring['Bar Count'].max()
scoring['Fitness Count']=scoring['Fitness Count']/scoring['Fitness Count'].max()
scoring['Entertainment Count']=scoring['Entertainment Count']/scoring['Entertainment Count'].max()
scoring['Outdoor Count']=scoring['Outdoor Count']/scoring['Outdoor Count'].max()
scoring['Shopping Count']=scoring['Shopping Count']/scoring['Shopping Count'].max()
scoring['Distance from Smiths Falls (KM)']=scoring['Distance from Smiths Falls (KM)']/scoring['Distance from Smiths Falls (KM)'].max()

scoring.columns=['Town','Restaurant Score', 'Bar Score','Fitness Score','Entertainment Score', 'Outdoor Score', 'Shopping Score', 'Distance Score']

scoring

We are getting close, but now we need to turn these initial scores into **weighted** scores that reflect the priorities outlined initially, as follows:

1. Restaurants - high priority.
2. Bars - high priority.
3. Fitness - medium priority.
4. Entertainment - medium priority.
5. Outdoor - medium priority.
6. Shopping Outlets - low priority.
7. Distance from Smiths Falls - high priority.

To apply these weightings we will simply multiply the scores by the following amounts:

-High priority: Weighting of 2
-Medium priority: Weighting of 1.5
-Low priority: Weighting of 1 (i.e. no change)
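The weightings above can be kept in a single dictionary and applied in one multiplication. This is a sketch with made-up scores for two placeholder towns and only three of the columns; `WEIGHTS`, `scores` and `weighted` are illustrative names.

```python
import pandas as pd

# One weight per score column: high = 2, medium = 1.5, low = 1
WEIGHTS = {'Restaurant Score': 2.0, 'Fitness Score': 1.5, 'Shopping Score': 1.0}

# Made-up normalised scores for two placeholder towns
scores = pd.DataFrame(
    {'Restaurant Score': [1.0, 0.5], 'Fitness Score': [1.0, 0.0], 'Shopping Score': [1.0, 0.25]},
    index=['Town A', 'Town B'],
)

# Multiplying by a Series aligns its index with the column names
weighted = scores * pd.Series(WEIGHTS)
print(weighted)
```

Keeping the weights in one place makes it easy to try a different weighting later without editing several lines.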

In [314]:
scoring['Restaurant Score']=scoring['Restaurant Score']*2
scoring['Bar Score']=scoring['Bar Score']*2
scoring['Fitness Score']=scoring['Fitness Score']*1.5
scoring['Entertainment Score']=scoring['Entertainment Score']*1.5
scoring['Outdoor Score']=scoring['Outdoor Score']*1.5
scoring['Distance Score']=scoring['Distance Score']*2

scoring.columns=['Town','Final Restaurant Score', 'Final Bar Score','Final Fitness Score','Final Entertainment Score', 'Final Outdoor Score', 'Final Shopping Score', 'Final Distance Deduction']

scoring

Unnamed: 0,Town,Restaurant Score,Bar Score,Fitness Score,Entertainment Score,Outdoor Score,Shopping Score,Distance Score
0,Almonte,0.571429,1.0,0.0,0.0,0.0,0.083333,1.090448
1,Arnprior,0.714286,0.0,0.75,0.375,0.0,0.5,1.872174
2,Braeside,0.142857,0.0,0.0,0.0,0.0,0.25,2.0
3,Brockville,2.0,2.0,1.5,1.5,0.642857,1.0,1.262114
4,Carleton Place,1.714286,2.0,0.0,0.0,0.0,0.75,0.788648
5,Gananoque,1.0,2.0,0.0,1.125,0.428571,0.333333,1.838948
6,Greely,0.142857,0.0,0.0,0.0,0.214286,0.25,1.524227
7,Kemptville,1.428571,1.0,0.0,0.0,0.0,0.416667,0.940932
8,Manotick,0.571429,2.0,0.0,0.375,0.428571,0.25,1.321643
9,Merrickville,0.428571,1.0,0.0,0.0,0.214286,0.083333,0.417628


To calculate the final scoring for each town, we need to add a new 'Overall Score' column and then evaluate the results. To do this, we sum the scores across all the category columns and then subtract the distance deduction: the further a town is from Smiths Falls, the harder my fiance's commute will be (i.e. this is a negative attribute!).
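As a rough illustration of that calculation, here is a sketch on made-up values for two placeholder towns, using only two of the score columns. The numbers are not results from this study, and `final` and `positive_cols` are illustrative names.

```python
import pandas as pd

# Made-up weighted scores for two placeholder towns (not real results)
final = pd.DataFrame({
    'Town': ['Town A', 'Town B'],
    'Final Restaurant Score': [2.0, 0.5],
    'Final Bar Score': [2.0, 1.0],
    'Final Distance Deduction': [1.25, 0.5],
})

# Sum every '... Score' column, then subtract the distance deduction
positive_cols = [c for c in final.columns if c.endswith('Score')]
final['Overall Score'] = final[positive_cols].sum(axis=1) - final['Final Distance Deduction']
print(final[['Town', 'Overall Score']])
```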