# Capstone Project - Home is here too
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will find **similar neighbourhoods** between two cities. Specifically, this report is aimed at people who are being relocated and are interested in finding a **new home in a strange city**. The idea behind finding the most similar neighbourhood is to ease adaptation and foster a sense of belonging in the new city.

For this project we will find the neighbourhoods in **Toronto** that are most similar to **Brooklyn’s Sunset Park, New York**. We restrict the search to neighbourhoods **within 7.5 km of the new workplace**, which will be located at **129 Spadina Ave**.

We will use our data science powers to generate a short list of the most promising neighborhoods. The advantages of each area will then be clearly expressed so that the best possible final location can be chosen.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of each kind of venue per neighbourhood: restaurants, grocery stores, parks, etc.
* distance from the new workplace

The following data sources will be needed to extract/generate the required information:
* the locations of Toronto's neighbourhoods, from **Wikipedia and an external csv file shared on this course**
* the location and type of restaurants, parks and grocery stores in every neighborhood, obtained using the **Foursquare API**
* the coordinates of the new workplace, obtained using **Geocoder** or hard-coded as a fallback, since the geocoding service is unreliable

### Neighborhood Candidates

Let's create latitude & longitude coordinates for the candidate neighborhoods. We will create a list of neighbourhoods covering our area of interest, which is approx. 7.5 kilometers around 129 Spadina Ave.

First, let's import all the packages we will need

In [1]:
import math
import time
import pyproj
import folium
import geocoder
import requests
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

Let's find the latitude & longitude of **129 Spadina Ave, Toronto**

In [2]:
# give a time tolerance
timeout = time.time() + 60*0.5   # 0.5 minutes from now

# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
while(lat_lng_coords is None):
    g = geocoder.google('129 Spadina Ave, Toronto, Ontario')
    lat_lng_coords = g.latlng
    if time.time() > timeout:
        lat_lng_coords = [43.647500, -79.395430]

latitude_work = lat_lng_coords[0]
longitude_work = lat_lng_coords[1]
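
The retry-until-timeout pattern above can be factored into a small helper. The sketch below is one possible shape, not the notebook's own code: `geocode_with_fallback` and its parameters are hypothetical names, and `lookup` stands for any zero-argument callable that returns `[lat, lng]` or `None` (for example a lambda wrapping the geocoder call). The stand-in lookup used here is made up so the example runs without network access.

```python
import time

def geocode_with_fallback(lookup, fallback, timeout_s=30.0, pause_s=1.0):
    """Call `lookup` until it returns coordinates or `timeout_s` elapses.

    `lookup` is a hypothetical zero-argument callable returning [lat, lng] or None.
    `fallback` is returned when no answer arrives before the deadline.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        coords = lookup()
        if coords is not None:
            return coords
        time.sleep(pause_s)  # short pause between attempts
    return fallback

# Stand-in lookup that fails twice before answering (illustrative only)
answers = iter([None, None, [43.6475, -79.39543]])
coords = geocode_with_fallback(lambda: next(answers), fallback=[0.0, 0.0],
                               timeout_s=5.0, pause_s=0.0)
print(coords)
```

Keeping the fallback as an explicit argument makes it clear which coordinates were typed in by hand and which came from the service.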

Let's import the locations of Toronto's neighbourhoods from Wikipedia and the external csv file

In [3]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df=pd.read_html(url, header=0)[0]
df.columns =['Postal Code', 'Borough', 'Neighborhood']
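# drop postal codes without an assigned borough; copy so later assignments modify this frame
df = df[df['Borough'] != 'Not assigned'].copy()
# where no neighborhood is assigned, fall back to the borough name
filtered_indexes = df['Neighborhood'] == 'Not assigned'
df.loc[filtered_indexes, 'Neighborhood'] = df.loc[filtered_indexes, 'Borough']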
location_data = pd.read_csv('Geospatial_Coordinates.csv')
df = pd.merge(df, location_data, on="Postal Code")
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494


To calculate distances accurately we need to express our locations in a 2D Cartesian coordinate system, which lets us calculate distances in meters (not in latitude/longitude degrees). Then we'll project those coordinates back to latitude/longitude degrees to be shown on the Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [4]:
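# NOTE: UTM zone 33 covers central Europe, while Toronto lies in UTM zone 17,
# so distances computed in this projection are distorted this far from the zone.
# The outputs in this notebook were produced with zone=33.
def lonlat_to_xy(lon, lat):
    # project WGS84 longitude/latitude (degrees) to UTM X/Y (meters)
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # inverse projection: UTM X/Y (meters) back to longitude/latitude (degrees)
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]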

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Workplace longitude={}, latitude={}'.format(longitude_work, latitude_work))
x, y = lonlat_to_xy(longitude_work, latitude_work)
print('Workplace UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Workplace longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Workplace longitude=-79.39543, latitude=43.6475
Workplace UTM X=-5311331.858992651, Y=10508971.996873861
Workplace longitude=-79.39543000000047, latitude=43.647499999999766
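
A great-circle distance is a simple, projection-independent cross-check for distances computed from projected coordinates. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km (an approximation); `haversine_m` is a hypothetical helper, not part of the notebook, and the two points are the workplace and the M5A postal code from the table above.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

workplace = (43.6475, -79.39543)   # fallback coordinates used in this notebook
m5a = (43.65426, -79.360636)       # M5A, Regent Park / Harbourfront
distance_m5a = haversine_m(*workplace, *m5a)
print(round(distance_m5a))
```

Comparing this figure with the projected distance for the same postal code is a quick way to spot a mismatched UTM zone.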


Filter Toronto's postal codes and neighbourhoods located in a 7.5 km range from the workplace

In [5]:
df['Distance from worplace'] = np.nan
df['Inside range'] = False
for i in range(len(df)):
    x2, y2 = lonlat_to_xy(df.iloc[i,4], df.iloc[i,3])
    distance = calc_xy_distance(x, y, x2, y2)
    df.iloc[i,5] = distance
    if distance <= 7500:
        df.iloc[i,6] = True
df = df[df['Inside range'] == True].drop('Inside range', axis=1)
df.shape
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distance from worplace
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,4186.777641
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,2467.71541
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2463.167415
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,2412.581046
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,2608.91864


Let's visualize the candidate neighbourhoods so far

In [6]:
# folium expects locations as [latitude, longitude]
map_workplace = folium.Map(location=[latitude_work, longitude_work], zoom_start=13)
folium.Marker([latitude_work, longitude_work], popup='Workplace').add_to(map_workplace)
for lat, lon in zip(df['Latitude'], df['Longitude']):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_workplace)
map_workplace

### Current neighborhood

Let's create latitude & longitude coordinates for our current neighborhood. 

In [7]:
# give a time tolerance
timeout = time.time() + 60*0.5   # 0.5 minutes from now

# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
while(lat_lng_coords is None):
    g = geocoder.google('Brooklyn’s Sunset Park, New York')
    lat_lng_coords = g.latlng
    if time.time() > timeout:
        lat_lng_coords = [40.639030, -73.998720]

latitude_home = lat_lng_coords[0]
longitude_home = lat_lng_coords[1]

### Foursquare
Now that we have our location candidates, let's use the Foursquare API to get info on restaurants, parks, grocery stores and other venues in each neighborhood.

In [8]:
import os

# read the Foursquare credentials from environment variables rather than hard-coding them
# (the variable names below are placeholders; use whatever names your environment defines)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [9]:
def getNearbyVenues(df, radius=500, LIMIT = 100):
    
    venues_list=[]
    
    for index, row in df.iterrows():
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            row['Latitude'], 
            row['Longitude'], 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
              
        # return only relevant information for each nearby venue
        venues_list.append([(
            row['Neighborhood'],
            row['Borough'],
            row['Postal Code'], 
            row['Latitude'], 
            row['Longitude'],
            v['venue']['name'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])  
    
    nearby_venues.columns = ['Neighborhood',
                             'Borough',
                             'Postal Code', 
                             'Postal Code Latitude', 
                             'Postal Code Longitude', 
                             'Venue', 
                             'Venue Category']
    
    return(nearby_venues)
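
The function above indexes straight into the JSON response, so a venue without categories or a response without groups would raise an error. A more defensive parsing step could look like the sketch below. It is an illustration only: `extract_venues` is a hypothetical helper, the `'Uncategorized'` label is an assumption, and `sample_response` is made-up data that mirrors just the fields the notebook reads.

```python
def extract_venues(response_json):
    """Return (venue name, first category name) pairs from an explore-style response dict."""
    groups = response_json.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        # fall back to a placeholder label when a venue has no category
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((venue.get('name'), category))
    return rows

# Made-up response containing only the fields accessed above
sample_response = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Example Pizza', 'categories': [{'name': 'Pizza Place'}]}},
        {'venue': {'name': 'Unlabelled Spot', 'categories': []}},
    ]}]}
}
print(extract_venues(sample_response))
```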

First, we will explore the venues in the current neighborhood: **Brooklyn’s Sunset Park, New York**

In [10]:
df_home = pd.DataFrame({'Neighborhood':['Sunset Park'], 'Borough':['Brooklyn'], 'Postal Code':['11220'], 'Latitude':[latitude_home], 'Longitude':[longitude_home]})
home_venues = getNearbyVenues(df_home)
print(home_venues.shape)
home_venues.head()

(21, 7)


Unnamed: 0,Neighborhood,Borough,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue Category
0,Sunset Park,Brooklyn,11220,40.63903,-73.99872,Jentana's Pizza,Pizza Place
1,Sunset Park,Brooklyn,11220,40.63903,-73.99872,Kai Feng Fu Dumpling House,Dumpling Restaurant
2,Sunset Park,Brooklyn,11220,40.63903,-73.99872,Boat House,Seafood Restaurant
3,Sunset Park,Brooklyn,11220,40.63903,-73.99872,D&D Coffee Shop,Diner
4,Sunset Park,Brooklyn,11220,40.63903,-73.99872,Lucky Vegetarian,Vegetarian / Vegan Restaurant


Then we will explore the venues in the neighborhoods of Toronto which are close to the new workplace

In [11]:
toronto_venues = getNearbyVenues(df)
print(toronto_venues.shape)
toronto_venues.head()

(1445, 7)


Unnamed: 0,Neighborhood,Borough,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue Category
0,Regent Park / Harbourfront,Downtown Toronto,M5A,43.65426,-79.360636,Roselle Desserts,Bakery
1,Regent Park / Harbourfront,Downtown Toronto,M5A,43.65426,-79.360636,Tandem Coffee,Coffee Shop
2,Regent Park / Harbourfront,Downtown Toronto,M5A,43.65426,-79.360636,Morning Glory Cafe,Breakfast Spot
3,Regent Park / Harbourfront,Downtown Toronto,M5A,43.65426,-79.360636,Cooper Koo Family YMCA,Distribution Center
4,Regent Park / Harbourfront,Downtown Toronto,M5A,43.65426,-79.360636,Body Blitz Spa East,Spa


Use venue categories as features for the postal codes

In [12]:
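# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames the same way
combi_venues = pd.concat([home_venues, toronto_venues], ignore_index=True)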

In [13]:
combi_venues.groupby('Postal Code').count()

Unnamed: 0_level_0,Neighborhood,Borough,Postal Code Latitude,Postal Code Longitude,Venue,Venue Category
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
11220,21,21,21,21,21,21
M4K,43,43,43,43,43,43
M4M,41,41,41,41,41,41
M4T,1,1,1,1,1,1
M4V,17,17,17,17,17,17
M4W,4,4,4,4,4,4
M4X,46,46,46,46,46,46
M4Y,78,78,78,78,78,78
M5A,48,48,48,48,48,48
M5B,100,100,100,100,100,100


In [14]:
print('There are {} unique categories in the neighborhoods.'.format(len(combi_venues['Venue Category'].unique())))

There are 219 unique categories in the neighborhoods.


In [15]:
# one hot encoding
combi_onehot = pd.get_dummies(combi_venues[['Venue Category']], prefix="", prefix_sep="")

# add the postal code column back to the dataframe
combi_onehot['Postal Code'] = combi_venues['Postal Code'] 

# move the postal code column to the first position
fixed_columns = [combi_onehot.columns[-1]] + list(combi_onehot.columns[:-1])
combi_onehot = combi_onehot[fixed_columns]

combi_onehot.head()

combi_grouped = combi_onehot.groupby('Postal Code').mean().reset_index()
print(combi_grouped.shape)

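# groupby sorts by postal code, and '11220' sorts before the Toronto codes (which start with 'M'),
# so the first row is the current neighborhood and the remaining rows are the candidates
current_nb = combi_grouped[:1]
candidates_nb = combi_grouped[1:]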
pd.set_option('display.max_columns', None)
combi_grouped

(29, 220)


Unnamed: 0,Postal Code,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Moving Target,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,11220,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.238095,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.047619,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.0,0.0
1,M4K,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.093023,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.023256,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.186047,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.023256,0.0,0.0,0.0,0.069767,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256
2,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.097561,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.073171,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439
3,M4T,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M4V,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0
5,M4W,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,M4X,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.043478,0.0,0.021739,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.043478,0.021739,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.021739,0.021739,0.043478,0.0,0.021739,0.021739,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,M4Y,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.012821,0.0,0.012821,0.0,0.012821,0.012821,0.0,0.012821,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.089744,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,0.012821,0.012821,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,0.0,0.025641,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.064103,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.025641,0.025641,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.025641,0.012821,0.0,0.0,0.0,0.038462,0.0,0.012821,0.0,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.012821,0.0,0.076923,0.0,0.0,0.0,0.0,0.012821,0.012821,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641
8,M5A,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0625,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.020833,0.020833,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.020833,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833
9,M5B,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.09,0.0,0.08,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0
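
Each row of the table above is the share of a postal code's venues that falls in each category: the one-hot columns are averaged per postal code. The same computation can be sketched without pandas, which may make the feature definition easier to see. The toy postal codes and venues below are made up, and `category_frequencies` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def category_frequencies(rows):
    """Map each postal code to its per-category venue shares.

    `rows` is a list of (postal code, venue category) pairs.
    Returns the sorted category names and a dict of frequency vectors.
    """
    counts = defaultdict(Counter)
    for code, category in rows:
        counts[code][category] += 1
    categories = sorted({category for _, category in rows})
    freqs = {}
    for code, counter in counts.items():
        total = sum(counter.values())
        # a Counter returns 0 for categories the postal code does not have
        freqs[code] = [counter[c] / total for c in categories]
    return categories, freqs

# Toy data (not from the notebook)
toy_rows = [
    ('A1', 'Park'), ('A1', 'Park'), ('A1', 'Bakery'), ('A1', 'Pizza Place'),
    ('B2', 'Pizza Place'), ('B2', 'Bakery'),
]
categories, freqs = category_frequencies(toy_rows)
print(categories)
print(freqs)
```

Because every row sums to 1, a postal code with a single venue ends up with one category at 1.0, as M4T does in the table above.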


Explore the top venues in current neighborhood: Brooklyn's Sunset Park, New York

In [16]:
num_top_venues = 5

for hood in current_nb['Postal Code']:
    print("----"+hood+"----")
    temp = current_nb[current_nb['Postal Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----11220----
             venue  freq
0      Pizza Place  0.24
1         Tea Room  0.10
2  Thai Restaurant  0.05
3    Deli / Bodega  0.05
4            Diner  0.05




Explore the top venues for the candidate neighborhoods in Toronto:

In [17]:
num_top_venues = 5

for hood in candidates_nb['Postal Code']:
    print("----"+hood+"----")
    temp = candidates_nb[candidates_nb['Postal Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M4K----
                venue  freq
0    Greek Restaurant  0.19
1         Coffee Shop  0.09
2  Italian Restaurant  0.07
3      Ice Cream Shop  0.05
4  Frozen Yogurt Shop  0.05


----M4M----
                 venue  freq
0                 Café  0.10
1          Coffee Shop  0.07
2              Brewery  0.05
3  American Restaurant  0.05
4            Gastropub  0.05


----M4T----
                      venue  freq
0                      Park   1.0
1                   Airport   0.0
2                    Museum   0.0
3  Mediterranean Restaurant   0.0
4               Men's Store   0.0


----M4V----
              venue  freq
0       Coffee Shop  0.12
1               Pub  0.12
2              Bank  0.06
3      Liquor Store  0.06
4  Sushi Restaurant  0.06


----M4W----
           venue  freq
0           Park  0.50
1     Playground  0.25
2          Trail  0.25
3        Airport  0.00
4  Moving Target  0.00


----M4X----
                venue  freq
0         Coffee Shop  0.07
1                Park 

## Methodology <a name="methodology"></a>

In our first step we obtained the venue data for the Toronto neighborhoods that lie within 7.5 km of the new workplace, 129 Spadina Ave:
1. We obtained Toronto's postal codes and neighborhoods by scraping Wikipedia
2. We obtained an approximate latitude and longitude for each postal code from a csv file shared on this course
3. We converted the latitude and longitude information to a distance from the workplace, in meters, with the pyproj library
4. We kept only the postal codes within 7.5 km of the workplace
5. We added venue information for each postal code with Foursquare

Now, with this information we will find the most similar neighborhoods of Toronto in terms of venues to the current neighborhood: Brooklyn's Sunset Park in New York. 

## Analysis <a name="analysis"></a>

We will propose Toronto neighborhoods based on similarity.

Purely for exploration, we will also create some clusters to look at venue trends across Toronto's neighborhoods and to check our proposed Toronto neighborhoods against them.

### Similarity

In our case we will use a simple measure for finding the most similar neighbourhoods: the dot product between each candidate's venue-frequency vector and that of the current neighbourhood.

In [18]:
similarity = candidates_nb.iloc[:,1:] @ current_nb.iloc[:,1:].T
maxSimilarity = similarity.sort_values(axis=0, by=0, ascending=False)[:5]
maxSimilarity

Unnamed: 0,0
3,0.047619
17,0.028139
4,0.02521
5,0.02381
6,0.02381
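
With frequency vectors, the dot product rewards a candidate that concentrates all of its venues in one category the home neighborhood also has, which is how a postal code with a single park can rank first. The sketch below contrasts the dot product with cosine similarity on toy vectors. Cosine similarity is a swapped-in alternative shown for comparison only, not the measure used in this notebook, and all vectors are made up.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Dot product normalised by both vector lengths (0.0 if either is all zeros)."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

# Toy frequency vectors over the categories [Bakery, Park, Pizza Place]
home = [0.25, 0.5, 0.25]
park_only = [0.0, 1.0, 0.0]    # a single venue, and it is a park
mixed = [0.25, 0.25, 0.5]      # a spread of venues similar to home

for name, candidate in [('park_only', park_only), ('mixed', mixed)]:
    print(name, dot(home, candidate), round(cosine(home, candidate), 4))
```

On this toy data the dot product prefers `park_only`, while cosine similarity prefers `mixed`; which behaviour is preferable depends on what "similar" should mean for the person moving.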


In [19]:
# select by index label; the similarity index carries the same labels as candidates_nb
proposed_nb = candidates_nb.loc[maxSimilarity.index]
proposed = pd.merge(proposed_nb.iloc[:,0].to_frame(), df, how='left', on=['Postal Code'])
proposed

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distance from worplace
0,M4T,Central Toronto,Moore Park / Summerhill East,43.689574,-79.38316,6882.554126
1,M5R,Central Toronto,The Annex / North Midtown / Yorkville,43.67271,-79.405678,4207.02223
2,M4V,Central Toronto,Summerhill West / Rathnelly / South Hill / For...,43.686412,-79.400049,6250.484361
3,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,5537.060134
4,M4X,Downtown Toronto,St. James Town / Cabbagetown,43.667967,-79.367675,4597.425075


In [20]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames the same way
combi2_venues = pd.concat([home_venues, toronto_venues[toronto_venues['Postal Code'].isin(proposed_nb['Postal Code'])]], ignore_index=True)

# one hot encoding
combi2_onehot = pd.get_dummies(combi2_venues[['Venue Category']], prefix="", prefix_sep="")

# add the postal code column back to the dataframe
combi2_onehot['Postal Code'] = combi2_venues['Postal Code'] 

# move the postal code column to the first position
fixed_columns = [combi2_onehot.columns[-1]] + list(combi2_onehot.columns[:-1])
combi2_onehot = combi2_onehot[fixed_columns]

combi2_grouped = combi2_onehot.groupby('Postal Code').sum().reset_index()

pd.set_option('display.max_columns', None)
combi2_grouped

Unnamed: 0,Postal Code,American Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Beer Store,Breakfast Spot,Burger Joint,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Coffee Shop,Convenience Store,Cosmetics Shop,Deli / Bodega,Diner,Donut Shop,Dumpling Restaurant,Farmers Market,Flower Shop,Fried Chicken Joint,Gastropub,General Entertainment,Gift Shop,Grocery Store,Health & Beauty Service,History Museum,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Light Rail Station,Liquor Store,Market,Middle Eastern Restaurant,Moving Target,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Snack Place,Sports Bar,Supermarket,Sushi Restaurant,Taiwanese Restaurant,Tea Room,Thai Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,11220,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,5,0,0,0,1,1,1,0,0,0,0,0,2,1,0,1,1
1,M4T,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,M4V,1,0,1,0,1,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,2,1,0,0,0,1,1,1,0,0,0,0,0,1
3,M4W,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
4,M4X,1,0,0,2,1,1,1,0,1,2,1,1,3,1,0,1,1,0,0,1,1,0,1,1,1,1,0,0,1,2,1,1,0,1,1,0,0,2,1,1,2,1,1,2,2,1,0,1,0,0,0,1,0,1,0,0,0
5,M5R,1,1,0,0,0,0,0,1,0,3,0,0,2,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0,1,1,0,0,1,0,3,0,0,0,0,0,0,0,0,0,1,0


Just out of curiosity, I will create clusters of the neighborhoods, including the current neighborhood.

I will do this to see which neighborhoods are chosen to be in the same cluster as the current neighborhood.

In [21]:
# set number of clusters
kclusters = 5

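combi_grouped_clustering = combi_grouped.drop(columns='Postal Code')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(combi_grouped_clustering)

# add clustering labels
# NOTE: kmeans.labels_ follows the row order of combi_grouped (sorted by postal code),
# while df_combi keeps the row order of df with the home row last, so this positional
# assignment does not guarantee that each row receives its own postal code's label
df_combi = pd.concat([df, df_home], sort=True)
df_combi['Cluster Labels'] = kmeans.labels_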

df_combi.sort_values('Cluster Labels')


Unnamed: 0,Borough,Distance from worplace,Latitude,Longitude,Neighborhood,Postal Code,Cluster Labels
97,Downtown Toronto,1535.898078,43.648429,-79.38228,First Canadian Place / Underground city,M5X,0
91,Downtown Toronto,5537.060134,43.679563,-79.377529,Rosedale,M4W,0
15,Downtown Toronto,2412.581046,43.651494,-79.375418,St. James Town,M5C,1
48,Downtown Toronto,1818.490132,43.648198,-79.379817,Commerce Court / Victoria Hotel,M5L,2
96,Downtown Toronto,4597.425075,43.667967,-79.367675,St. James Town / Cabbagetown,M4X,2
92,Downtown Toronto,2399.069349,43.646435,-79.374846,Stn A PO Boxes,M5W,2
87,Downtown Toronto,2973.057253,43.628947,-79.39442,CN Tower / King and Spadina / Railway Lands / ...,M5V,2
86,Central Toronto,6250.484361,43.686412,-79.400049,Summerhill West / Rathnelly / South Hill / For...,M4V,2
84,Downtown Toronto,1059.526074,43.653206,-79.400049,Kensington Market / Chinatown / Grange Park,M5T,2
83,Central Toronto,6882.554126,43.689574,-79.38316,Moore Park / Summerhill East,M4T,2
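
Cluster labels come back in the row order of the matrix passed to `fit`, so attaching them to another table is safer by key than by position. The sketch below shows the idea on toy data; `attach_labels`, the rows, and the labels are all hypothetical. With pandas, a merge on `Postal Code` would play the same role.

```python
def attach_labels(rows, model_keys, labels):
    """Attach each label to the row with the matching postal code.

    `model_keys` lists the postal codes in the order the model saw them;
    `labels` is the label sequence in that same order.
    """
    label_by_key = dict(zip(model_keys, labels))
    return [dict(row, **{'Cluster Labels': label_by_key.get(row['Postal Code'])})
            for row in rows]

# Toy data: the table order differs from the order seen by the model
rows = [{'Postal Code': 'M5A'}, {'Postal Code': 'M4K'}, {'Postal Code': '11220'}]
model_keys = ['11220', 'M4K', 'M5A']   # sorted order, as a groupby would produce
labels = [2, 0, 1]

labelled = attach_labels(rows, model_keys, labels)
for row in labelled:
    print(row)
```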


## Results and Discussion <a name="results"></a>

Our analysis shows that the Toronto neighborhoods most similar to Brooklyn's Sunset Park, New York are:
* Moore Park
* Summerhill East
* The Annex 
* North Midtown 
* Yorkville
* Summerhill West 
* Rathnelly 
* South Hill
* Forest Hill SE
* Deer Park
* Rosedale
* St. James Town
* Cabbagetown

However, the similarity found was small: the highest similarity score was only about 0.048. Other areas, farther from the workplace, could well be more similar. In any case, a follow-up analysis could let the person choose whether to give more weight to particular venues found in their current neighborhood: a specific kind of restaurant they like, the amount of green space, etc.
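
One way to give more emphasis to certain venue types is a weighted dot product, where each category carries a user-chosen weight. This is only a sketch of that idea, not something the analysis above implements: `weighted_dot`, the weights, and the toy vectors are all assumptions.

```python
def weighted_dot(u, v, weights):
    """Dot product in which each category contributes in proportion to its weight."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

# Toy frequency vectors over the categories [Bakery, Park, Pizza Place]
home = [0.25, 0.5, 0.25]
candidate_a = [0.0, 1.0, 0.0]
candidate_b = [0.25, 0.25, 0.5]

equal_weights = [1.0, 1.0, 1.0]
pizza_lover = [1.0, 1.0, 4.0]   # hypothetical preference: pizza counts four times as much

for name, weights in [('equal', equal_weights), ('pizza_lover', pizza_lover)]:
    print(name,
          weighted_dot(home, candidate_a, weights),
          weighted_dot(home, candidate_b, weights))
```

With equal weights the toy candidate A scores higher, and with the pizza-heavy weights candidate B does, so the ranking follows the stated preference.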

When creating the clusters we found that most of Toronto's neighborhoods were assigned to the same cluster, which did not give us much insight for verifying the results obtained with our similarity score.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to propose neighborhoods in Toronto, close to the workplace, that are similar to the person's neighborhood back home: Brooklyn's Sunset Park, New York.

We proposed a total of 13 neighborhoods in Toronto that are within 7.5 km of the workplace.

However, the similarity found was small, so more work is needed to refine and explore the proposed neighborhoods according to what the client values most in a home. Some exploration into what people tend to value most is also needed. In addition, the type of dwelling (house, apartment, etc.), property prices and property sizes should be included in further analysis.

With the proposed neighborhoods in hand, the person who is moving can start by visiting those places first when looking for a new home in a strange city.